
AI Cost Calculator

Paste a prompt and see what it costs to run across every major LLM. Side-by-side per call, per day, per month. Browser-only, no signup, no API key needed.

Pricing reference (per 1M tokens)

Provider pricing as of 2026. Verify on each provider's pricing page before relying on these numbers.

Model                Provider    Input / 1M   Output / 1M   Context
Claude Opus 4.7      Anthropic   $15.00       $75.00        200K
Claude Sonnet 4.6    Anthropic   $3.00        $15.00        200K
Claude Haiku 4.5     Anthropic   $1.00        $5.00         200K
GPT-5                OpenAI      $5.00        $25.00        256K
GPT-4o               OpenAI      $2.50        $10.00        128K
GPT-4o mini          OpenAI      $0.15        $0.60         128K
Gemini 2.5 Pro       Google      $1.25        $10.00        1,000K
Llama 3.3            Meta        $0.20        $0.60         128K
DeepSeek V3          DeepSeek    $0.27        $1.10         64K
What you'll use this for

LLM costs scale fast. A quick estimate before you ship saves real money in production.

Budget planning

Project monthly and annual API spend before you ship. Catch surprises before billing does.

Comparing models

Pick the right model for your use case. The cheapest model that passes your evals usually wins.

Cost optimization

Estimate caching impact, identify the cheapest viable model, find the breakeven for a smaller tier.

Pricing transparency

Show stakeholders concrete numbers for "what does AI cost us?" — no vendor pitch deck required.

How to estimate LLM cost

1. Paste a representative prompt

Use something real from your app, not a toy example. Token count scales with input length.

2. Set expected output tokens

How long is the model's reply? 500 tokens is a typical chat response; classification might be 5; agents can run thousands.

3. Set calls per day

Conservative is fine: estimate users × actions per user per day. The yearly figure is usually the eye-opener.

4. Read totals

Per call, per day, per month, per year. Toggle caching to see the discount impact.
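
Putting the four steps together, here is a minimal sketch of the arithmetic, assuming a flat 90% cached-input discount and a 30-day month. The `estimateCost` function and its option names are hypothetical, for illustration only.

```ts
// Minimal sketch of the four steps above. Rates and counts are
// example inputs; plug in your own numbers.
function estimateCost(opts: {
  inputTokens: number;     // step 1: tokens in your representative prompt
  outputTokens: number;    // step 2: expected reply length
  callsPerDay: number;     // step 3: traffic estimate
  inputPerMTok: number;    // USD per 1M input tokens
  outputPerMTok: number;   // USD per 1M output tokens
  cachedInput?: boolean;   // step 4: assumed flat 90% input-side discount
}) {
  const inputRate = opts.inputPerMTok * (opts.cachedInput ? 0.1 : 1);
  const perCall =
    (opts.inputTokens / 1e6) * inputRate +
    (opts.outputTokens / 1e6) * opts.outputPerMTok;
  const perDay = perCall * opts.callsPerDay;
  return { perCall, perDay, perMonth: perDay * 30, perYear: perDay * 365 };
}

// Example: a 1,200-token prompt, 500-token replies, 100 calls/day,
// at $3 in / $15 out per 1M tokens (the Sonnet row above).
console.log(estimateCost({
  inputTokens: 1200, outputTokens: 500, callsPerDay: 100,
  inputPerMTok: 3, outputPerMTok: 15,
}));
// → perCall ≈ $0.0111, perDay ≈ $1.11, perMonth ≈ $33.30
```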

Frequently asked questions

How accurate is the token count?

This calculator uses a rough heuristic of ~4 characters per token. Real tokenizers vary: code tends to pack more characters into each token, text in languages other than English usually needs more tokens per character, and each provider has its own tokenizer. Expect ±20% accuracy. For exact counts, use the model-specific token counters each provider offers.
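
In code, the heuristic described above is a one-liner. `estimateTokens` is an illustrative sketch, not the tool's actual implementation.

```ts
// The ~4-characters-per-token heuristic. Expect roughly ±20% error;
// use a real tokenizer for exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

estimateTokens("Summarize the following support ticket in one sentence.");
// 55 chars → ~14 tokens (a real tokenizer may report a different count)
```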

Where does the pricing data come from?

From each provider's public pricing page as of 2026. Rates are subject to change; always verify on the provider site before relying on these numbers for procurement decisions.

Is it free?

Yes. No signup, no limits, no ads, and no data leaves your browser.

How is prompt caching modeled?

Toggle "Input is cached" for an approximate 90% input-side discount. The exact discount varies by provider (Anthropic: 90%; OpenAI/Google: 50-75%). This calculator uses the Anthropic figure as a useful upper bound.
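
A sketch of how that upper-bound discount plays out on the input side, assuming the flat 90% figure; `inputCost` is a hypothetical helper, not the calculator's code.

```ts
// Cached-input discount as modeled here: a flat 90% off the input side
// (the Anthropic figure, used as an upper bound; OpenAI/Google are lower).
function inputCost(tokens: number, ratePerMTok: number, cached: boolean): number {
  const discount = cached ? 0.9 : 0;
  return (tokens / 1e6) * ratePerMTok * (1 - discount);
}

inputCost(10_000, 3.0, false); // $0.030 per call
inputCost(10_000, 3.0, true);  // $0.003 per call
```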

Why might my actual bill differ from this estimate?

A few reasons. (1) The provider's token counts may differ slightly from our ~4-character estimate. (2) System prompts, tool definitions, and retrieved context all count toward input. (3) Provider rates may have changed since this tool was last updated. (4) Some providers add surcharges for premium tiers, regional endpoints, or long-context overflow. Use this for ballpark estimates, not for final accounting.

About this AI cost calculator

Most LLM APIs price by tokens — not characters, not words. A token is roughly four characters of English text, but the exact mapping depends on each provider's tokenizer. Input tokens (your prompt) and output tokens (the model's response) are priced separately, and output tokens almost always cost more — typically 3-5× the input rate. That asymmetry matters: a chat with long replies costs far more than a summarizer with one-line outputs.
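
A quick worked comparison makes that asymmetry concrete. Using the Sonnet-tier rates from the table above ($3 in / $15 out per 1M tokens), two workloads with the same total tokens land at very different costs; the `perCall` helper is illustrative only.

```ts
// Same 2,500 total tokens per call, very different bills,
// because output tokens cost 5× input tokens at these rates.
const perCall = (inTok: number, outTok: number) =>
  (inTok / 1e6) * 3 + (outTok / 1e6) * 15;

perCall(500, 2000);  // chat with long replies:     $0.0315
perCall(2000, 500);  // summarizer, short output:   $0.0135
```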

Why pricing is per million tokens

Providers quote rates per 1,000,000 tokens because typical workloads burn through millions in a day. Dividing by a million keeps the numbers readable: $3.00 / 1M is more intuitive than $0.000003 / token. This calculator uses the same convention.
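The conversion is simple division, as this sketch shows (the rates are the example figures from the paragraph above).

```ts
// Per-million rates convert to per-token rates by dividing by 1e6.
const ratePerMTok = 3.0;                      // $3.00 per 1M tokens
const ratePerTok = ratePerMTok / 1e6;         // $0.000003 per token
const costFor1500Tokens = 1500 * ratePerTok;  // $0.0045
```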

Input vs output costs

The output rate is higher because generation is more computationally expensive than reading. Every output token requires a full forward pass; input tokens are processed in a single batched pass. For chat agents the output dominates; for classification, the input dominates.

Prompt caching

Anthropic, OpenAI, and Google all offer some form of prompt caching: when you re-send the same prefix, the input side is discounted (Anthropic: 90% off; others: 50-75%). Toggle "Input is cached" to model this. It rarely makes sense for one-off prompts; it shines for repeated system prompts, RAG contexts, and coding agents.
