Anthropic · Balanced · Production workhorse

Claude Sonnet Cost Calculator: estimate Sonnet 4.6 API pricing

Estimate API costs for Claude Sonnet 4.6 — Anthropic's balanced cost-quality tier. $3 input / $15 output per million tokens, 200K context. The default pick for most production workloads.

Pricing reference

Claude Sonnet 4.6 pricing

Source: Anthropic pricing page (2026). One-fifth the price of Opus and three times the price of Haiku.

Model	Provider	Input / 1M tokens	Output / 1M tokens	Context
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	200K
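The per-call cost multiplies each token count by its own per-million rate and adds the two. Below is a minimal Python sketch that assumes the list prices in the table above and ignores caching and any surcharges. `cost_per_call` is a hypothetical helper, not the calculator's actual source.

```python
# Sonnet 4.6 list prices from the pricing table, in USD per 1M tokens.
INPUT_PER_M = 3.00
OUTPUT_PER_M = 15.00

def cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: input and output billed at separate per-million rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A 2,000-token prompt with a 500-token reply.
print(f"${cost_per_call(2_000, 500):.4f}")  # $0.0135
```

Output is five times the input rate, so a short prompt with a long reply can cost more than a long prompt with a short reply.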

Use cases

What you'll use this for

LLM costs scale fast. A quick estimate before you ship saves real money in production.

Budget planning

Project monthly and annual API spend before you ship. Catch surprises before billing does.

Comparing models

Pick the right model for your use case. The cheapest model that passes your evals usually wins.

Cost optimization

Estimate caching impact, identify the cheapest viable model, find the breakeven for a smaller tier.

Pricing transparency

Show stakeholders concrete numbers for "what does AI cost us?" — no vendor pitch deck required.

Step by step

How to estimate LLM cost

Paste a representative prompt

Use something real from your app, not a toy example. Token count scales with input length.

Set expected output tokens

How long is the model's reply? 500 tokens is a typical chat response; classification might be 5; agents can run thousands.

Set calls per day

A conservative estimate is fine: multiply users × actions per user per day. The yearly figure is usually the eye-opener.

Read totals

Per call, per day, per month, per year. Toggle caching to see the discount impact.
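Each total in the last step is the per-call cost multiplied by call volume. The Python sketch below assumes a 30-day month and a 365-day year. Those conventions are hypothetical, and the calculator may round differently. `project_totals` is an illustrative name.

```python
# Assumed calendar conventions (hypothetical; the calculator may use others).
DAYS_PER_MONTH = 30
DAYS_PER_YEAR = 365

def project_totals(cost_per_call: float, calls_per_day: float) -> dict[str, float]:
    """Scale a per-call cost (USD) to daily, monthly, and yearly totals."""
    per_day = cost_per_call * calls_per_day
    return {
        "per_call": cost_per_call,
        "per_day": per_day,
        "per_month": per_day * DAYS_PER_MONTH,
        "per_year": per_day * DAYS_PER_YEAR,
    }

# Example: 1,000 users x 5 actions per user per day, at $0.0135 per call
# (a 2,000-token prompt and a 500-token reply at $3 / $15 per 1M tokens).
calls_per_day = 1_000 * 5
totals = project_totals(0.0135, calls_per_day)
print(f"${totals['per_month']:,.2f} / month, ${totals['per_year']:,.2f} / year")
# $2,025.00 / month, $24,637.50 / year
```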

FAQ

Frequently asked questions

How accurate is the token estimate?

This calculator uses a rough heuristic of ~4 characters per token. Real tokenizers vary. Code and non-English text usually produce more tokens per character than English prose, and each provider has its own tokenizer. Expect roughly ±20% accuracy. For exact counts, use the model-specific token counters; Anthropic, for example, offers a token-counting endpoint in its API.
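The heuristic fits in a few lines. This minimal Python sketch assumes the estimate rounds up to a whole token; the calculator's exact rounding is not stated. It also turns the ±20% figure into a low/high band. `estimate_tokens` and `estimate_range` are hypothetical names.

```python
import math

CHARS_PER_TOKEN = 4  # rough English-prose heuristic; real tokenizers differ

def estimate_tokens(text: str) -> int:
    """Approximate token count as characters / 4, rounded up (assumed rounding)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def estimate_range(text: str, tolerance: float = 0.20) -> tuple[int, int]:
    """Low/high bounds reflecting the heuristic's roughly +/-20% error."""
    n = estimate_tokens(text)
    return math.floor(n * (1 - tolerance)), math.ceil(n * (1 + tolerance))

prompt = "Summarize the attached support ticket in two sentences."
print(estimate_tokens(prompt), estimate_range(prompt))
```

For code or non-English prompts, the high end of the range is usually the safer number to budget against.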

Where does the pricing come from?

From each provider's public pricing page as of 2026. Rates are subject to change; always verify on the provider site before relying on these numbers for procurement decisions.

Is it free?

Yes. No signup, no limits, no ads, no data leaves your browser.

What about caching?

Toggle "Input is cached" to apply the model's published cached-input rate, read from the pricing JSON shipped with this page. Rates differ by provider (Anthropic ~90% off, OpenAI ~50–87% off depending on model, Google ~75% off, DeepSeek ~90% off).
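For Sonnet 4.6, "~90% off" means cached input tokens are billed at about 10% of the $3 base rate. The Python sketch below assumes that 10% multiplier and lets part or all of the input be cached. It does not model cache writes, which Anthropic bills at a premium over the base input rate. `cost_per_call` and `CACHE_READ_MULTIPLIER` are illustrative names.

```python
# Sonnet 4.6 rates in USD per 1M tokens (from the pricing table on this page).
INPUT_PER_M = 3.00
OUTPUT_PER_M = 15.00
# Assumption: cache reads billed at 10% of the base input rate ("~90% off").
CACHE_READ_MULTIPLIER = 0.10

def cost_per_call(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Cost of one call when `cached_tokens` of the input are served from cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh = input_tokens - cached_tokens
    input_cost = fresh * INPUT_PER_M + cached_tokens * INPUT_PER_M * CACHE_READ_MULTIPLIER
    return (input_cost + output_tokens * OUTPUT_PER_M) / 1_000_000

uncached = cost_per_call(10_000, 500)
cached = cost_per_call(10_000, 500, cached_tokens=10_000)
print(f"uncached ${uncached:.4f}, fully cached ${cached:.4f}")
# uncached $0.0375, fully cached $0.0105
```

Caching only discounts input, so the savings are largest for long, repeated prompts with short replies.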

Why does my actual bill differ?

A few reasons. (1) The provider's token counts may differ from our ~4-characters-per-token estimate, by roughly ±20%. (2) System prompts, tool definitions, and retrieved context all count toward input. (3) Provider rates may have changed since this tool was last updated. (4) Some providers add surcharges for premium tiers, regional endpoints, or long-context overflow. Use this for ballpark estimates, not for final accounting.
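Point (2) is often the largest gap: the billed input covers everything sent to the model, not just the user's message. The Python sketch below sums the parts of a request using the same ~4 characters-per-token heuristic. The function and parameter names are hypothetical.

```python
import math

def billed_input_tokens(user_prompt: str, system_prompt: str = "",
                        tool_definitions: str = "", retrieved_context: str = "") -> int:
    """Estimate billed input tokens: every part of the request counts, ~4 chars per token."""
    parts = (system_prompt, tool_definitions, retrieved_context, user_prompt)
    return sum(math.ceil(len(part) / 4) for part in parts)

user = "x" * 200        # a short user message (~50 tokens)
system = "s" * 2_000    # a long system prompt (~500 tokens)
context = "c" * 8_000   # retrieved documents (~2,000 tokens)
print(billed_input_tokens(user), billed_input_tokens(user, system, retrieved_context=context))
# 50 2550
```

An estimate based only on the user's text would undercount this request's input by about 50×.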

About

About Claude Sonnet 4.6

Sonnet 4.6 is the model many teams settle on. At $3 / $15 per 1M tokens it's one-fifth the price of Opus while delivering the bulk of its capability for code, chat, RAG, and agentic workflows. Anthropic positions it as the default tier, and most of the ecosystem treats it that way.

Where Sonnet shines

Customer-facing chat agents, coding assistants, RAG over docs, structured extraction, most tool-use workflows. The 200K context comfortably handles long conversations and substantial retrieved context.

When to step up to Opus

If Sonnet repeatedly fails the same hard prompt (multi-step reasoning, subtle code refactors, nuanced instruction following), Opus is the next stop. Paying 5× per call is cheaper than shipping a wrong answer.

When to step down to Haiku

High-volume classifications, routing decisions, simple summarization, anything where latency and unit economics matter more than ceiling capability.
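To see what stepping up or down costs for a specific workload, price the same traffic on each tier. In the Python sketch below, Opus and Haiku prices are derived from the ratios stated in the pricing reference (Opus 5×, Haiku one-third). They are hypothetical placeholders, so check Anthropic's pricing page for current rates before relying on them. `TIERS` and `monthly_cost` are illustrative names.

```python
SONNET = (3.00, 15.00)  # USD per 1M input / output tokens, from the pricing table
# Hypothetical tier prices derived from the ratios stated on this page
# (Opus 5x Sonnet, Haiku 1/3 of Sonnet). Verify current rates before use.
TIERS = {
    "haiku": (SONNET[0] / 3, SONNET[1] / 3),
    "sonnet": SONNET,
    "opus": (SONNET[0] * 5, SONNET[1] * 5),
}

def monthly_cost(tier: str, calls_per_month: int, input_tokens: int, output_tokens: int) -> float:
    """Monthly USD cost of a workload with fixed per-call token counts on one tier."""
    in_rate, out_rate = TIERS[tier]
    return calls_per_month * (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A classification-style workload: 100,000 calls/month, 1,000 tokens in, 50 out.
for tier in TIERS:
    print(f"{tier:>6}: ${monthly_cost(tier, 100_000, 1_000, 50):,.2f}")
```

Because the sketch uses fixed ratios, the gap between tiers is the same multiple at any volume. What changes with volume is the absolute dollar difference, which is what makes Haiku worth testing for high-volume traffic.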