
Caching Guide

Redefining API efficiency with Prompt Caching.

Prompt caching is one of the most significant advancements in LLM cost optimization. By caching static prefixes of your prompts, you can reduce latency by up to 80% and costs by up to 90% for repetitive context loads.
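To make the savings concrete, here is a minimal back-of-the-envelope calculation. The prices below are hypothetical placeholders, not any provider's actual rates; the assumed cache-read discount (10% of the base input rate) is in line with what major providers advertise.

```python
# Illustrative savings estimate for prompt caching.
# Prices are hypothetical placeholders, not real provider rates.
BASE_INPUT_PRICE = 3.00   # $ per million input tokens (assumed)
CACHE_READ_PRICE = 0.30   # $ per million cached tokens (assumed 10% of base)

def request_cost(static_tokens: int, dynamic_tokens: int, cached: bool) -> float:
    """Input-token cost of one request, in dollars."""
    if cached:
        # Static prefix is served from cache at the discounted rate.
        return (static_tokens * CACHE_READ_PRICE
                + dynamic_tokens * BASE_INPUT_PRICE) / 1_000_000
    return (static_tokens + dynamic_tokens) * BASE_INPUT_PRICE / 1_000_000

uncached = request_cost(100_000, 500, cached=False)
cached = request_cost(100_000, 500, cached=True)
print(f"savings: {1 - cached / uncached:.0%}")  # → savings: 90%
```

With a 100k-token static prefix and only 500 dynamic tokens per request, nearly the entire input bill is eligible for the cached rate, which is where the "up to 90%" figure comes from.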

Supported Providers

AIWorkbench.dev supports tracking and using caching for the following providers:

Anthropic Claude

Anthropic uses Ephemeral Caching.

  • Mechanism: Exact prefix matching.
  • Cache Breakpoints: You manually set points in your prompt that should be cached.
  • Cost Savings: Cache reads are billed at roughly 10% of the standard input rate; cache writes carry a one-time surcharge over the base rate.
  • TTL: 5 minutes (standard), extendable to 1 hour.
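The breakpoints above are set by attaching a `cache_control` marker to a content block. The sketch below builds a Messages API request body with one breakpoint; the field names follow Anthropic's documented schema, but the model id and document text are placeholders.

```python
# Sketch of an Anthropic Messages API request body with a cache breakpoint.
# Model id and document text are placeholders.
request_body = {
    "model": "claude-sonnet-placeholder",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<large static reference document goes here>",
            # Breakpoint: everything up to and including this block
            # is eligible for caching on subsequent requests.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Summarize section 3."}
    ],
}
```

Because matching is exact-prefix, everything before the breakpoint must be byte-identical across requests for the cache to hit.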

Google Gemini

Gemini offers high-performance Context Caching.

  • Mechanism: Repeated long context is cached either implicitly (automatic, on supported models) or via explicitly created cache resources.
  • Manual Caching: Create a cache resource with a configurable TTL (1 hour by default).
  • Best For: Large documents, video analysis, or complex system instructions used across multiple sessions.
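For the manual path, Gemini exposes a `cachedContents` resource: you create it once with the large context and a TTL, then reference it from later generation requests. The sketch below builds the creation request body; field names follow Google's public REST schema, while the model id and document text are placeholders.

```python
# Sketch of a request body for Gemini's explicit context-caching
# ("cachedContents") REST endpoint. Model id and text are placeholders.
cache_request = {
    "model": "models/gemini-flash-placeholder",  # placeholder model id
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "<large document to reuse across sessions>"}],
        }
    ],
    "ttl": "3600s",  # lifetime of the cache resource (1 hour here)
}
```

Subsequent generation requests then point at the returned cache name instead of resending the document, so you pay full input price for the context only once per TTL window.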

How to use Caching in the Workbench

  1. Toggle the Caching switch in the model parameters panel.
  2. For Claude, place static content (system prompt, reference documents) at the start of the prompt, so the cached prefix matches exactly across requests.
  3. Observe the Cache Hit indicators in the response metadata to verify savings.
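Step 3 can also be done programmatically. The usage fields below mirror the cache counters Anthropic reports in its Messages API responses; the values here are mocked rather than taken from a live call.

```python
# Sketch: verifying a cache hit from response usage metadata.
# The field names mirror Anthropic's usage counters; values are mocked.
response_usage = {
    "input_tokens": 500,                 # uncached (dynamic) portion
    "cache_read_input_tokens": 100_000,  # served from cache -> a hit
    "cache_creation_input_tokens": 0,    # nonzero only when the cache is written
}

def cache_hit(usage: dict) -> bool:
    """A request hit the cache if any input tokens were read from it."""
    return usage.get("cache_read_input_tokens", 0) > 0

print("cache hit" if cache_hit(response_usage) else "cache miss")  # → cache hit
```

A first request typically shows a nonzero `cache_creation_input_tokens` (the write), and only later requests show `cache_read_input_tokens` (the savings).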