LLM Cost Comparison
Compare the API cost of every major LLM side-by-side for your exact prompt. Ranked cheapest to most expensive, per call / day / month. No signup, runs in your browser.
Pricing reference (per 1M tokens)
All rates are list pricing as of 2026. Volume discounts and committed-spend tiers are not included.
| Model | Provider | Input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | $15.00 | $75.00 | 200K |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K |
| GPT-5 | OpenAI | $5.00 | $25.00 | 256K |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1,000K |
| Llama 3.3 | Meta | $0.20 | $0.60 | 128K |
| DeepSeek V3 | DeepSeek | $0.27 | $1.10 | 64K |
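As a quick sanity check on how these rates turn into a per-call figure, a minimal sketch (the token counts below are illustrative, not a recommendation):

```python
# Cost of one API call given per-1M-token list rates (USD).
def call_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# 2,000 input + 500 output tokens at Claude Sonnet 4.6 rates ($3 in / $15 out):
call_cost(2_000, 500, 3.00, 15.00)  # about $0.0135 per call
```

Note that output tokens dominate here despite being a quarter of the volume, because the output rate is 5x the input rate.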
What you'll use this for
LLM costs scale fast. A quick estimate before you ship saves real money in production.
Budget planning
Project monthly and annual API spend before you ship. Catch surprises before billing does.
Comparing models
Pick the right model for your use case. Cheapest that passes your evals usually wins.
Cost optimization
Estimate caching impact, identify the cheapest viable model, find the breakeven for a smaller tier.
Pricing transparency
Show stakeholders concrete numbers for "what does AI cost us?" — no vendor pitch deck required.
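The "breakeven for a smaller tier" idea above reduces to pricing the same workload at two rate pairs. A sketch, using list rates from the table and a hypothetical workload:

```python
# Monthly cost for one workload under a given rate pair (USD per 1M tokens).
# The workload figures (2K in / 500 out, 10K calls/day) are hypothetical.
def monthly_cost(in_tok, out_tok, in_rate, out_rate, calls_per_day, days=30):
    per_call = (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return per_call * calls_per_day * days

large = monthly_cost(2_000, 500, 3.00, 15.00, 10_000)  # Sonnet-class rates
small = monthly_cost(2_000, 500, 0.15, 0.60, 10_000)   # mini-class rates
# large is roughly $4,050/month, small roughly $180/month:
# a ~22x gap for identical traffic, before any quality comparison.
```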
How to estimate LLM cost
Paste a representative prompt
Use something real from your app, not a toy example. Token count scales with input length.
Set expected output tokens
How long is the model's reply? 500 tokens is a typical chat response; classification might be 5; agents can run thousands.
Set calls per day
Conservative is fine. Multiply by users × actions/user/day. The yearly figure is usually the eye-opener.
Read totals
Per call, per day, per month, per year. Toggle caching to see the discount impact.
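The steps above boil down to one multiplication chain. A minimal sketch, assuming a 30-day month and a 365-day year (the tool itself may round differently):

```python
# Scale one call's cost to daily / monthly / yearly spend.
def project_spend(cost_per_call, calls_per_day):
    daily = cost_per_call * calls_per_day
    return {"day": daily, "month": daily * 30, "year": daily * 365}

# 10,000 calls/day at $0.0135 per call:
totals = project_spend(0.0135, 10_000)
# roughly $135/day, $4,050/month, $49,275/year
```

The yearly line is where a "cheap" per-call cost usually stops looking cheap.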
Frequently asked questions
How accurate are the token estimates?
This calculator uses a rough heuristic of ~4 characters per token. Real tokenizers vary: code is often denser (more characters per token), non-English text is often sparser (fewer characters per token), and each provider uses its own tokenizer. Expect ±20% accuracy. For exact counts, use the model-specific token counters.
Where does the pricing data come from?
From each provider's public pricing page as of 2026. Rates are subject to change; always verify on the provider site before relying on these numbers for procurement decisions.
Is it free?
Yes. No signup, no limits, no ads, and no data leaves your browser.
Toggle "Input is cached" for an approximate 90% input-side discount. The exact discount varies by provider (Anthropic: 90%; OpenAI/Google: 50-75%). This calculator uses the Anthropic figure as a useful upper bound.
Why is my actual bill different?
A few reasons. (1) The provider's token counts may differ from our ~4-characters-per-token estimate. (2) System prompts, tool definitions, and retrieved context all count toward input. (3) Provider rates may have changed since this tool was last updated. (4) Some providers add surcharges for premium tiers, regional endpoints, or long-context overflow. Use this for ballpark estimates, not final accounting.
About LLM cost comparison
The cheapest model isn't always the right pick — but for many production workloads the cost gap is wider than the quality gap. Comparing per-call pricing across providers surfaces where you're overpaying and where a smaller model would do.
How to use this
Paste a representative prompt — ideally a real one from your application. Set the expected output length. The table sorts cheapest first. If the cheapest model also passes your evals, switch.
Why the same prompt costs different amounts
Each provider tokenizes text differently. A 1,000-character prompt might be 220 tokens on one tokenizer and 260 on another. This calculator estimates uniformly (~4 chars/token) — for exact counts use each provider's tokenizer or our per-model token counters.
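The uniform estimate is simple enough to write down. This is the ~4-characters-per-token heuristic as an assumption, not any provider's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate at ~4 characters per token (expect +/-20%)."""
    return max(1, round(len(text) / 4))

estimate_tokens("Summarize the following support ticket in one sentence.")
# 55 characters -> about 14 tokens
```

A real tokenizer could return anywhere from roughly 11 to 17 tokens for the same string, which is why the ±20% caveat matters.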
What this doesn't include
This estimate counts only the prompt you paste. System prompts, tool definitions, retrieved RAG context, and conversation history all add to the input cost on the provider side. Output cost also varies with response length: a chat agent generates far more tokens than a one-shot classifier.