Context Window Visualizer
Visualize how much of a model's context window your prompt fills. Paste text, pick the model, see an estimate of tokens used and a fill bar.
Paste your prompt to see how much of the context window it uses.
One prompt, every model
The same 50K-token prompt fills 25% of Claude Opus 4.7's window, 39% of GPT-4o's, and just 5% of Gemini 2.5 Pro's. Knowing the percentages up front spares you a rejected over-limit request.
[~50,000 tokens of docs + chat history]
Claude Opus 4.7  ▓▓▓░░░░░░░ 25%
GPT-4o           ▓▓▓▓░░░░░░ 39%
Gemini 2.5 Pro   ▓░░░░░░░░░  5%
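The math behind those bars is just prompt tokens divided by window size. A minimal Python sketch, using the window sizes listed further down this page:

```python
# Fill percentage = prompt tokens / context window, per model.
# Window sizes match the "Window sizes (today)" list on this page.
WINDOWS = {
    "Claude Opus 4.7": 200_000,
    "GPT-4o": 128_000,
    "Gemini 2.5 Pro": 1_000_000,
}

def fill_percent(prompt_tokens: int, window: int) -> float:
    """Percentage of the context window a prompt occupies."""
    return 100 * prompt_tokens / window

for model, window in WINDOWS.items():
    print(f"{model}: {fill_percent(50_000, window):.0f}%")
    # 50K tokens -> 25%, 39%, 5% respectively
```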
What you'll use this for
Anywhere a long prompt is at risk of brushing the context limit — RAG, agents, multi-turn chat, document Q&A.
Cost planning
Bigger contexts cost more. Confirm you actually need that 1M window before paying for it.
RAG context budgeting
Reserve space for retrieved chunks + system prompt + response — visualize it before you ship.
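A back-of-the-envelope version of that budget, sketched in Python. The 2K system prompt, 4K response reserve, and 1K chunk size are illustrative assumptions, not tool defaults:

```python
def rag_budget(window: int, system: int, response_reserve: int, per_chunk: int) -> int:
    """How many retrieved chunks fit after reserving space for the
    system prompt and the model's response."""
    available = window - system - response_reserve
    return max(0, available // per_chunk)

# Assumed example: 128K window, 2K system prompt,
# 4K reserved for the response, 1K-token chunks.
print(rag_budget(128_000, 2_000, 4_000, 1_000))  # → 122 chunks
```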
Fitting docs into prompt
Will the whole PDF fit, or do you need to chunk? Paste it and find out.
Model selection
Pick the smallest model whose context window comfortably holds your typical request.
How to visualize context fill
Paste your prompt
System + user + retrieved context — paste everything that will go on the wire.
Pick the model
The dropdown lists current limits for major models.
Read the fill bar
Green under 80%, amber 80–100%, red over the limit.
Adjust as needed
Trim the prompt, swap to a larger-window model, or chunk the input.
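The color bands in step 3 are simple thresholds; a sketch:

```python
def band(prompt_tokens: int, window: int) -> str:
    """Green under 80% fill, amber from 80% to 100%, red over the limit."""
    fill = prompt_tokens / window
    if fill < 0.8:
        return "green"
    if fill <= 1.0:
        return "amber"
    return "red"

print(band(50_000, 200_000))   # → green (25% fill)
print(band(180_000, 200_000))  # → amber (90% fill)
print(band(250_000, 200_000))  # → red (over the limit)
```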
Frequently asked questions
How accurate is the estimate?
It's a heuristic: characters divided by 4, usually within ±20% of the real tokenizer. Good for capacity planning, not for billing precision; use the LLM token counter for exact counts.
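The heuristic itself is one line; a sketch, with the same ±20% caveat:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token.
    Expect ±20% versus a real tokenizer; use it for capacity
    planning, not billing."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("a" * 400))  # → 100
```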
Why does the same text count differently per model?
Tokenizers differ. The same text becomes a different number of tokens under tiktoken (OpenAI), Claude's tokenizer, SentencePiece (Gemini, Llama), and DeepSeek's. The heuristic averages across them.
Is it free?
Yes. No signup, no limits. Everything runs in your browser.
What counts toward the context window?
System prompt, user messages, tool definitions, retrieved context: everything you send. Output tokens come out of a separate budget on most providers, but watch out: a few share one pool.
Does the bar update as I type?
Yes. With auto-update on (the default), the bar redraws as you type or paste.
About context windows
A model's context window is the maximum number of tokens it can attend to in a single request — system prompt, history, retrieved docs and the model's own output, all combined.
Window sizes (today)
- 1M — Gemini 2.5 Pro.
- 256K — GPT-5.
- 200K — Claude Opus 4.7, Sonnet 4.6, Haiku 4.5.
- 128K — GPT-4o family, Llama 3.3.
- 64K — DeepSeek V3.
Why fill matters
- Quality often degrades past ~70% fill (the "lost in the middle" effect).
- Latency and cost scale with tokens — big windows aren't free.
- Hard caps trigger 400-class errors when exceeded.