Developer Documentation

Learn how to integrate CacheLayer into your application and start saving on your LLM API costs in minutes.

Quick Start

Get up and running with CacheLayer in under 5 minutes. All you need is your existing OpenAI or Anthropic Claude code and a CacheLayer API key.

Step 1: Get Your API Keys

  1. CacheLayer API Key: Sign up at cachelayer.io/register and create an API key from your dashboard
  2. Provider API Key: Use your existing OpenAI or Anthropic API key (we never store it)

Step 2: Update Your Code

Just change your base URL and add the CacheLayer API key header. That's it!

Before:

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://api.openai.com/v1'
});

After:

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://api.cachelayer.io/v1',
  defaultHeaders: {
    'x-cachelayer-api-key': process.env.CACHELAYER_API_KEY
  }
});

Step 3: Start Saving!

Your requests now automatically benefit from intelligent caching. Monitor your savings and cache performance in real-time from your analytics dashboard.

Integration Examples

CacheLayer works with OpenAI and Anthropic SDKs or any HTTP client. Here are examples for popular languages and providers.

Node.js / TypeScript

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://api.cachelayer.io/v1',
  defaultHeaders: {
    'x-cachelayer-api-key': process.env.CACHELAYER_API_KEY
  }
});

// Chat completions - works exactly like OpenAI
const completion = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }]
});

// Embeddings - fully supported
const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
});

Python

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://api.cachelayer.io/v1",
    default_headers={
        "x-cachelayer-api-key": os.environ.get("CACHELAYER_API_KEY")
    }
)

# Chat completions
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Embeddings
embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here"
)

Anthropic Claude (Python)

import os

from anthropic import Anthropic

client = Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    base_url="https://api.cachelayer.io/v1",
    default_headers={
        "x-cachelayer-api-key": os.environ.get("CACHELAYER_API_KEY")
    }
)

# Messages API
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

Anthropic Claude (Node.js)

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  baseURL: 'https://api.cachelayer.io/v1',
  defaultHeaders: {
    'x-cachelayer-api-key': process.env.CACHELAYER_API_KEY
  }
});

// Messages API
const message = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});

cURL / HTTP (OpenAI)

curl https://api.cachelayer.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-cachelayer-api-key: YOUR_CACHELAYER_KEY" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

cURL / HTTP (Anthropic)

curl https://api.cachelayer.io/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-cachelayer-api-key: YOUR_CACHELAYER_KEY" \
  -H "x-api-key: YOUR_ANTHROPIC_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

💡 Pro Tip: All OpenAI and Anthropic SDK features, including streaming, function calling, and vision, work seamlessly. Check the API Reference for complete endpoint documentation.
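To make the streaming claim concrete, here is a minimal sketch using the OpenAI Node SDK pointed at CacheLayer, reusing the client setup from the Quick Start. The prompt is illustrative; nothing about the streaming API changes compared to calling OpenAI directly.

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://api.cachelayer.io/v1',
  defaultHeaders: {
    'x-cachelayer-api-key': process.env.CACHELAYER_API_KEY
  }
});

// stream: true yields chunks as they arrive, exactly as with api.openai.com
const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Write a haiku about caching.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}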

How It Works

CacheLayer sits between your application and LLM providers (OpenAI, Anthropic), intelligently caching responses to reduce costs and improve performance.

Cache Hit

When we find a cached response, we return it instantly from Redis.

  • Response time: <5ms
  • Cost: $0 (no provider charges)
  • Carbon footprint: 99% reduction

Cache Miss

When there's no cached response, we forward to the provider (OpenAI/Anthropic) and cache the result.

  • Response time: Normal provider latency
  • Cost: Standard provider pricing
  • Result cached for future requests

Request Flow

1. Your app sends a request to api.cachelayer.io
2. CacheLayer checks the exact cache (Redis) for an identical request
3. If enabled, CacheLayer checks the semantic cache for similar requests
4. On a cache miss, the request is forwarded to the provider (OpenAI/Anthropic) and the response is cached
5. The response is returned with cache headers (X-CacheLayer-*)
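Step 5 above mentions cache headers. Here is a minimal sketch of reading them with plain fetch; only the X-CacheLayer-* prefix is documented in this guide, so the specific header name below is hypothetical. Check the API Reference for the real names.

// Send a chat completion through CacheLayer with plain fetch
const res = await fetch('https://api.cachelayer.io/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-cachelayer-api-key': process.env.CACHELAYER_API_KEY,
    'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }]
  })
});

// 'x-cachelayer-cache' is a hypothetical header name used for illustration
console.log('cache header:', res.headers.get('x-cachelayer-cache'));
const data = await res.json();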

Cache Strategies

CacheLayer offers two caching strategies to maximize your cache hit rate.

Exact Cache (Default)

Matches requests whose parameters are identical: model, messages, temperature, and so on. A key-derivation sketch follows the lists below.

Best For:
  • FAQ bots
  • Repeated queries
  • Deterministic outputs

Hit Rate:
  • 20-40% typical
  • Higher for static content
  • Zero cost on hits
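To make the matching rule concrete, here is an illustrative sketch, not CacheLayer's actual implementation, of how an exact-cache key can be derived: canonicalize the request body and hash it, so any change to model, messages, or temperature yields a different key.

import { createHash } from 'node:crypto';

// Recursively sort object keys so property order never changes the key.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([k, v]) => [k, canonicalize(v)])
    );
  }
  return value;
}

// Illustrative only: one plausible exact-cache key derivation.
// CacheLayer's real key scheme may differ.
function exactCacheKey(body: Record<string, unknown>): string {
  return createHash('sha256')
    .update(JSON.stringify(canonicalize(body)))
    .digest('hex');
}

const a = exactCacheKey({ model: 'gpt-4', temperature: 0, messages: [{ role: 'user', content: 'Hello!' }] });
const b = exactCacheKey({ temperature: 0, model: 'gpt-4', messages: [{ role: 'user', content: 'Hello!' }] });
console.log(a === b); // true: same parameters, so same cache key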

Semantic Cache (Enable in Dashboard)

Matches semantically similar requests, even when the wording differs, by using embeddings to find queries above your similarity threshold. A similarity sketch follows the example below.

Best For:
  • Conversational AI
  • Customer support
  • Natural language queries

Hit Rate:
  • 40-70% typical
  • Adjustable threshold (0.7-0.95)
  • 2x-3x more hits than exact

Example: "How do I reset my password?" matches "What's the process to change my password?" at 0.87 similarity
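To make the similarity score concrete, this is a minimal sketch of the cosine-similarity comparison a semantic cache performs between query embeddings. The vectors are toy values chosen to reproduce the 0.87 score above, and the 0.85 threshold is an assumption within the documented 0.7-0.95 range; CacheLayer's actual embedding model and matching logic are not specified in this guide.

// Cosine similarity between two embedding vectors: 1.0 means identical
// direction, 0.0 means unrelated. A semantic cache serves the cached
// response when the score clears your configured threshold.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 2-dimensional stand-ins for real embeddings of the two questions.
const resetPassword = [1.0, 0.0];
const changePassword = [0.87, 0.49];

const score = cosineSimilarity(resetPassword, changePassword);
console.log(score.toFixed(2)); // 0.87, matching the example above

if (score >= 0.85) { // assumed threshold for illustration
  console.log('Semantic cache hit: serve the cached answer');
}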

⚙️ Configure in Dashboard: Adjust your semantic threshold, cache TTL, and strategy (conservative, balanced, aggressive) from your settings page.

Dashboard & Analytics

Monitor your cache performance, cost savings, and usage in real-time.

Cost Savings

Track total savings, cache hit rate, and cost per request over time

Performance Metrics

Monitor response times, cache hit rates, and request volume

Usage Analytics

View requests by model, endpoint, and time period with detailed breakdowns

Next Steps