Caching Guide
Redefining API efficiency with Prompt Caching.
Prompt caching is one of the most effective cost optimizations available for LLM workloads. By caching the static prefix of your prompts, you can reduce latency by up to 80% and input-token costs by up to 90% on repeated context loads.
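To see where those savings come from, here is a rough cost estimate for a single request. The rates are illustrative (cache reads are typically billed at around 10% of the base input rate; your provider's actual pricing will differ):

```python
def estimated_input_cost(prompt_tokens: int, cached_tokens: int,
                         base_rate_per_mtok: float = 3.00,
                         cache_read_multiplier: float = 0.10) -> float:
    """Estimate input cost in dollars for one request.

    base_rate_per_mtok and cache_read_multiplier are illustrative
    placeholders, not any provider's actual price list.
    """
    uncached = prompt_tokens - cached_tokens
    cost = (uncached * base_rate_per_mtok
            + cached_tokens * base_rate_per_mtok * cache_read_multiplier)
    return cost / 1_000_000  # rates are quoted per million tokens

# 100k-token prompt: no cache hits vs. a 90k-token cached prefix.
full = estimated_input_cost(100_000, 0)
cached = estimated_input_cost(100_000, 90_000)
```

With these sample rates, the cached request costs $0.057 instead of $0.30 — an 81% reduction on input tokens alone.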
Supported Providers
AIWorkbench.dev supports tracking and utilizing caching for the following providers:
Anthropic Claude
Anthropic uses ephemeral prompt caching.
- Mechanism: Exact prefix matching — a cache hit requires the request prefix to match the cached prefix exactly.
- Cache Breakpoints: You manually mark points in your prompt up to which content should be cached.
- Cost Savings: Cached input tokens are read back at a fraction of the standard input rate; writing to the cache carries a small premium over the base rate.
- TTL: 5 minutes (standard), refreshed on each cache hit; extendable to 1 hour.
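A breakpoint is set with a `cache_control` marker, following the shape of Anthropic's Messages API. The sketch below builds the raw request body as a plain dict; the model id and document text are placeholders:

```python
# Everything up to and including the block carrying `cache_control`
# becomes the cached prefix; only the trailing user turn is "fresh".
request_body = {
    "model": "claude-sonnet-4-20250514",  # example model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<large reference document goes here>",
            # Marks the end of the cached prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Summarize section 2."}
    ],
}
```

Because matching is exact-prefix, keep the content before the breakpoint byte-for-byte stable between requests — any edit above it invalidates the cache.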
Google Gemini
Gemini offers high-performance Context Caching.
- Mechanism: Automatically detects repeated long context (Gemini 1.5 Pro/Flash).
- Manual Caching: Create an explicit cache resource with a configurable time-to-live (one hour by default).
- Best For: Large documents, video analysis, or complex system instructions used across multiple sessions.
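An explicit Gemini cache is created as a `cachedContents` resource. The sketch below shows the REST-style payload as a plain dict; field names follow the public API, while the model name, document text, and TTL value are illustrative:

```python
# Payload sketch for creating an explicit Gemini cache.
# The returned cache name would then be referenced by later
# generateContent requests instead of resending the document.
cache_request = {
    "model": "models/gemini-1.5-pro",  # example model name
    "contents": [
        {"role": "user", "parts": [{"text": "<large document to reuse>"}]}
    ],
    "systemInstruction": {
        "parts": [{"text": "You are a contract analyst."}]
    },
    "ttl": "3600s",  # one hour; the TTL is configurable
}
```

This pattern pays off when the same large context (a contract, a video transcript, a long system instruction) is queried many times across sessions.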
How to use Caching in the Workbench
- Toggle the Caching switch in the model parameters panel.
- For Claude, place stable content (system prompt, reference documents) at the start of the prompt, since caching requires an exact prefix match.
- Observe the Cache Hit indicators in the response metadata to verify savings.
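Those indicators are derived from the usage block of each response. The helper below is a minimal sketch of that check, assuming Anthropic-style field names (`cache_read_input_tokens`, `cache_creation_input_tokens`); other providers report similar counters under different names:

```python
def cache_hit_summary(usage: dict) -> dict:
    """Summarize cache behavior from a response's usage metadata."""
    read = usage.get("cache_read_input_tokens", 0)     # served from cache
    written = usage.get("cache_creation_input_tokens", 0)  # newly cached
    fresh = usage.get("input_tokens", 0)               # uncached input
    total = read + written + fresh
    return {
        "total_input_tokens": total,
        "cache_hit": read > 0,
        "cached_fraction": read / total if total else 0.0,
    }

# Example: a follow-up request where most of the prompt was cached.
usage = {"input_tokens": 20, "cache_read_input_tokens": 980}
summary = cache_hit_summary(usage)
```

A `cached_fraction` near 1.0 on follow-up requests confirms the cache is being hit; a persistent 0.0 usually means the prefix changed between requests.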