Anthropic Claude · 5-min & 1-hour TTL · 90% off cached reads

Prompt Caching Calculator: see your real Claude savings

Calculate exactly how much Anthropic Claude prompt caching will save you. 5-minute writes cost 1.25× the normal input rate, 1-hour writes cost 2×, and cached reads cost 0.1× (a 90% discount). Enter your prompt structure and call volume — get break-even point, daily savings, and monthly projection.

Caching scenario

Model

Cache strategy

Currency

Cached prompt tokens (system / docs)

Fresh tokens per call (user msg / RAG)

Output tokens per call

Calls per day

Calls per cache TTL window

Claude full calculator

Without caching

—

With caching

—

Savings

—

Break-even

—

Detailed breakdown

Use cases

When prompt caching pays off

Caching shines whenever you have a large, stable prefix reused across many smaller queries — the 90%-off cached read rate compounds fast.

RAG cost reduction

Cache the retrieved documents — pay 1.25× once, then 0.1× on every follow-up question. Often cuts cost 60–80%.

Agent loops

Agents reuse the same tool definitions and instructions across many steps. Caching turns each step's overhead into a 90%-off line item.

Long-doc Q&A

Upload a 100K-token document once, ask 20 questions. Without caching: $50. With caching: $5. The math sells itself.

Customer support bots

Cache the company knowledge base and FAQ. Every user message gets near-instant, near-free context.

Step by step

How to model your caching scenario

Estimate your cached prefix

Count tokens in your stable content: system prompt + tool definitions + retrieved documents that don't change per user message.

Estimate fresh tokens

Count what changes every call — usually just the user message and any per-query RAG. Output tokens never cache.

Pick TTL + volume

5-min if interactive; 1-hour if batch. Set calls per TTL window — that's how many reads you'll get per cache write.

FAQ

Frequently asked questions

How does Anthropic prompt caching work?

You mark sections of your prompt as cacheable. The first call writes them to a cache (at 1.25× the normal input rate for 5-min TTL, or 2× for 1-hour TTL). Every subsequent call within the TTL reads from the cache at 0.1× the normal input rate — a 90% discount on those tokens. The cache expires after 5 minutes (or 1 hour) of inactivity.

When does caching actually save money?

Caching wins when you have a large stable prefix (system prompt, tool defs, knowledge base, long document) that's reused across many short queries. The 5-min cache pays off after roughly 2 reads; the 1-hour cache pays off after roughly 12 reads. Below that break-even, caching costs more than no caching.

Should I use 5-minute or 1-hour TTL?

5-minute is the right default for interactive chat where users rapid-fire questions. 1-hour is better for background agents, long sessions, or batch processing where calls are spaced out over many minutes. The 1-hour write costs more upfront (2× vs 1.25×) but you get more reads before the cache expires.

Can I cache the system prompt and the user message?

Yes — caching is per-block. You can cache the system prompt (stable across all users), then cache a per-user knowledge base, then send a fresh user message. Each block has its own cache breakpoint. The output tokens are never cached and always bill at the normal output rate.

About

About Claude prompt caching

Anthropic Claude's prompt caching feature lets you trade a small upfront premium for a massive discount on repeated content. It's the single biggest cost lever available for production Claude apps — typical savings range from 50% to 90% of input cost.

The mechanics

Cache write (first call) — 1.25× the base input rate for a 5-minute TTL, 2.00× for a 1-hour TTL. You pay this premium once per cache breakpoint.
Cache read (subsequent calls) — 0.10× the base input rate. A 90% discount on cached tokens. This is the win.
Fresh tokens — anything not cached (your variable user message, RAG results that change per query) bills at the normal input rate.
Output tokens — never cached. Always full output rate.

Break-even math

For a 5-minute cache to pay off, you need at least 2 reads in 5 minutes. For 1-hour cache, you need ~12 reads. Below break-even, caching costs more than no caching. Above it, the savings compound fast.

Pricing per model (2026)

Claude Opus 4.7 — $15.00 base input. Cache write 5m: $18.75. Write 1h: $30.00. Read: $1.50.
Claude Sonnet 4.6 — $3.00 base input. Cache write 5m: $3.75. Write 1h: $6.00. Read: $0.30.
Claude Haiku 4.5 — $0.80 base input. Cache write 5m: $1.00. Write 1h: $1.60. Read: $0.08.

Output rates are unchanged: Opus $75 / Sonnet $15 / Haiku $4 per million.