Context Window Visualizer
Visualize how much of a model's context window your prompt fills. Paste text, pick the model, see an estimate of tokens used and a fill bar.
Paste your prompt to see how much of the context window it uses.
One prompt, every model
The same 50K-token prompt fills 25% of Claude Opus 4.7's window, 39% of GPT-4o's, and just 5% of Gemini 2.5 Pro's. Knowing the percentages up front spares you a rejected over-limit request.
[~50,000 tokens of docs + chat history]
Claude Opus 4.7  ▓▓▓░░░░░░░ 25%
GPT-4o           ▓▓▓▓░░░░░░ 39%
Gemini 2.5 Pro   ▓░░░░░░░░░  5%
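The math behind those bars is just prompt tokens divided by window size. A minimal Python sketch, using the window sizes listed further down this page:

```python
# Fill percentage = prompt tokens / context window, per model.
# Window sizes match the "Window sizes (today)" list on this page.
WINDOWS = {
    "Claude Opus 4.7": 200_000,
    "GPT-4o": 128_000,
    "Gemini 2.5 Pro": 1_000_000,
}

def fill_percent(prompt_tokens: int, window: int) -> float:
    """Percentage of the context window a prompt occupies."""
    return 100 * prompt_tokens / window

for model, window in WINDOWS.items():
    print(f"{model}: {fill_percent(50_000, window):.0f}%")
    # 50K tokens -> 25%, 39%, 5% respectively
```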
What you'll use this for
Anywhere a long prompt is at risk of brushing the context limit — RAG, agents, multi-turn chat, document Q&A.
Cost planning
Bigger contexts cost more. Confirm you actually need that 1M window before paying for it.
RAG context budgeting
Reserve space for retrieved chunks + system prompt + response — visualize it before you ship.
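A back-of-the-envelope version of that budget, sketched in Python. The 2K system prompt, 4K response reserve, and 1K chunk size are illustrative assumptions, not tool defaults:

```python
def rag_budget(window: int, system: int, response_reserve: int, per_chunk: int) -> int:
    """How many retrieved chunks fit after reserving space for the
    system prompt and the model's response."""
    available = window - system - response_reserve
    return max(0, available // per_chunk)

# Assumed example: 128K window, 2K system prompt,
# 4K reserved for the response, 1K-token chunks.
print(rag_budget(128_000, 2_000, 4_000, 1_000))  # → 122 chunks
```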
Fitting docs into prompt
Will the whole PDF fit, or do you need to chunk? Paste it and find out.
Model selection
Pick the smallest model whose context window comfortably holds your typical request.
How to visualize context fill
Paste your prompt
System + user + retrieved context — paste everything that will go on the wire.
Pick the model
The dropdown lists current limits for major models.
Read the fill bar
Green under 80%, amber 80–100%, red over the limit.
Adjust as needed
Trim the prompt, swap to a larger-window model, or chunk the input.
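The color bands in step 3 are simple thresholds; a sketch:

```python
def band(prompt_tokens: int, window: int) -> str:
    """Green under 80% fill, amber from 80% to 100%, red over the limit."""
    fill = prompt_tokens / window
    if fill < 0.8:
        return "green"
    if fill <= 1.0:
        return "amber"
    return "red"

print(band(50_000, 200_000))   # → green (25% fill)
print(band(180_000, 200_000))  # → amber (90% fill)
print(band(250_000, 200_000))  # → red (over the limit)
```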
Frequently asked questions
How accurate is the estimate?
It's a heuristic: characters divided by 4, usually within ±20% of the real tokenizer. Good for capacity planning, not for billing precision; use the LLM token counter for exact counts.
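The heuristic itself is one line; a sketch, with the same ±20% caveat:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token.
    Expect ±20% versus a real tokenizer; use it for capacity
    planning, not billing."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("a" * 400))  # → 100
```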
Why does the same text count differently per model?
Tokenizers differ. The same text becomes a different number of tokens under tiktoken (OpenAI), Claude's tokenizer, SentencePiece (Gemini, Llama), and DeepSeek's. The heuristic averages across them.
Is it free?
Yes. No signup, no limits. Everything runs in your browser.
What counts toward the context window?
System prompt, user messages, tool definitions, retrieved context: everything you send. Output tokens come out of a separate budget on most providers, but watch out: a few share one pool.
Does the bar update as I type?
Yes. With auto-update on (the default), the bar redraws as you type or paste.
About context windows
A model's context window is the maximum number of tokens it can attend to in a single request — system prompt, history, retrieved docs and the model's own output, all combined.
Window sizes (today)
- 1M — Gemini 2.5 Pro.
- 256K — GPT-5.
- 200K — Claude Opus 4.7, Sonnet 4.6, Haiku 4.5.
- 128K — GPT-4o family, Llama 3.3.
- 64K — DeepSeek V3.
Why fill matters
- Quality often degrades past ~70% fill (the "lost in the middle" effect).
- Latency and cost scale with tokens — big windows aren't free.
- Hard caps trigger 400-class errors when exceeded.