Learn how to integrate CacheLayer into your application and start saving on your LLM API costs.
Get up and running in under 5 minutes. All you need is your existing OpenAI or Anthropic Claude code and a CacheLayer API key.
Just change your base URL and add the CacheLayer API key header. That's it!
Before:

```javascript
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://api.openai.com/v1'
});
```

After:

```javascript
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://api.cachelayer.io/v1',
  defaultHeaders: {
    'x-cachelayer-api-key': process.env.CACHELAYER_API_KEY
  }
});
```

Your requests now automatically benefit from intelligent caching. Monitor your savings and cache performance in real time from your analytics dashboard.
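To confirm caching is working end to end, send the same request twice and compare latencies; this is a minimal sketch, assuming identical requests hit the exact cache, reusing the `openai` client configured above:

```javascript
// Quick sanity check: the second identical request should return from the
// cache far faster than the first. Assumes the exact cache is enabled.
const ask = () =>
  openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }]
  });

let t = Date.now();
await ask();
console.log(`first call (expected miss): ${Date.now() - t}ms`);

t = Date.now();
await ask();
console.log(`second call (expected hit): ${Date.now() - t}ms`);
```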
CacheLayer works with OpenAI and Anthropic SDKs or any HTTP client. Here are examples for popular languages and providers.
OpenAI (Node.js):

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://api.cachelayer.io/v1',
  defaultHeaders: {
    'x-cachelayer-api-key': process.env.CACHELAYER_API_KEY
  }
});

// Chat completions - works exactly like OpenAI
const completion = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }]
});

// Embeddings - fully supported
const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
});
```

OpenAI (Python):

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://api.cachelayer.io/v1",
    default_headers={
        "x-cachelayer-api-key": os.environ.get("CACHELAYER_API_KEY")
    }
)

# Chat completions
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Embeddings
embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here"
)
```

Anthropic (Python):

```python
import os

from anthropic import Anthropic

client = Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    base_url="https://api.cachelayer.io/v1",
    default_headers={
        "x-cachelayer-api-key": os.environ.get("CACHELAYER_API_KEY")
    }
)

# Messages API
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Anthropic (Node.js):

```javascript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  baseURL: 'https://api.cachelayer.io/v1',
  defaultHeaders: {
    'x-cachelayer-api-key': process.env.CACHELAYER_API_KEY
  }
});

// Messages API
const message = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});
```

cURL (OpenAI):

```bash
curl https://api.cachelayer.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-cachelayer-api-key: YOUR_CACHELAYER_KEY" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

cURL (Anthropic):

```bash
curl https://api.cachelayer.io/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-cachelayer-api-key: YOUR_CACHELAYER_KEY" \
  -H "x-api-key: YOUR_ANTHROPIC_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

💡 Pro Tip: All OpenAI and Anthropic SDK features work seamlessly, including streaming, function calling, vision, and more. Check the API Reference for complete endpoint documentation.
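For example, streaming requires no changes beyond the base URL swap; this is a minimal sketch using the standard OpenAI SDK streaming pattern with the CacheLayer-configured `openai` client from the Node.js example above:

```javascript
// Streaming passes through CacheLayer unchanged; `openai` is the
// CacheLayer-configured client from the Node.js example above.
const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```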
CacheLayer sits between your application and LLM providers (OpenAI, Anthropic), intelligently caching responses to reduce costs and improve performance.
When we find a cached response, we return it instantly from Redis; when there's no cached response, we forward the request to the provider (OpenAI/Anthropic) and cache the result. The full flow:

1. Your app sends a request to api.cachelayer.io
2. CacheLayer checks the exact cache (Redis) for identical requests
3. If enabled, it checks the semantic cache for similar requests
4. On a cache miss, it forwards the request to the provider (OpenAI/Anthropic) and caches the response
5. It returns the response with cache headers (X-CacheLayer-*), as shown in the sketch below
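To see those headers, a raw `fetch` call works well. This is a minimal sketch: only the `X-CacheLayer-*` prefix is documented above, so it simply prints whatever headers under that prefix the API actually returns rather than assuming specific names:

```javascript
// Inspect cache metadata on the raw HTTP response. Only the X-CacheLayer-*
// prefix is documented; specific header names may vary.
const res = await fetch('https://api.cachelayer.io/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-cachelayer-api-key': process.env.CACHELAYER_API_KEY,
    'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }]
  })
});

res.headers.forEach((value, name) => {
  if (name.toLowerCase().startsWith('x-cachelayer-')) {
    console.log(`${name}: ${value}`);
  }
});
```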
CacheLayer offers two caching strategies to maximize your cache hit rate.
Exact caching matches requests with identical parameters: model, messages, temperature, etc.
Semantic caching matches semantically similar requests, even with different wording, using embeddings to find similar queries above your similarity threshold.
Example: "How do I reset my password?" matches "What's the process to change my password?" at 0.87 similarity
⚙️ Configure in Dashboard: Adjust your semantic threshold, cache TTL, and strategy (conservative, balanced, aggressive) from your settings page.
Monitor your cache performance, cost savings, and usage in real time.
Track total savings, cache hit rate, and cost per request over time
Monitor response times, cache hit rates, and request volume
View requests by model, endpoint, and time period with detailed breakdowns
Complete technical documentation for all endpoints, parameters, and responses
Adjust cache strategy, semantic threshold, TTL, and other performance settings
Create, rotate, and manage your CacheLayer API keys securely
Need help? Contact our team for technical support and implementation guidance