100% browser-based · Token estimate · Model-specific limits

Context Window Visualizer

Visualize how much of a model's context window your prompt fills. Paste text, pick the model, see an estimate of tokens used and a fill bar.

[Interactive tool: paste your prompt, pick a model, and the panel shows tokens used, the model limit, fill %, and a status indicator.]
Example

One prompt, every model

The same 50K-token prompt fills 25% of Claude Opus 4.7's window, 39% of GPT-4o's, and just 5% of Gemini 2.5 Pro's. Knowing the percentages up front saves a context-overflow 400.

Your prompt
[~50,000 tokens of docs + chat history]
Fill
Claude Opus 4.7  ▓▓▓░░░░░░░  25%
GPT-4o           ▓▓▓▓░░░░░░  39%
Gemini 2.5 Pro   ▓░░░░░░░░░   5%
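
The percentages above are one division per model. A minimal TypeScript sketch, using the window sizes listed under "About context windows" below and the ~50K estimate from this example (the variable names are ours):

  // Fill % for one prompt across several models.
  const tokensUsed = 50_000; // the ~50K prompt from the example above

  const windows: Record<string, number> = {
    "Claude Opus 4.7": 200_000,
    "GPT-4o": 128_000,
    "Gemini 2.5 Pro": 1_000_000,
  };

  for (const [model, limit] of Object.entries(windows)) {
    const fill = Math.round((tokensUsed / limit) * 100);
    console.log(`${model}: ${fill}%`); // 25%, 39%, 5%
  }
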
Use cases

What you'll use this for

Anywhere a long prompt is at risk of brushing the context limit — RAG, agents, multi-turn chat, document Q&A.

Cost planning

Bigger contexts cost more. Confirm you actually need that 1M window before paying for it.

RAG context budgeting

Reserve space for retrieved chunks, the system prompt, and the response; visualize the split before you ship (see the budgeting sketch after this list).

Fitting docs into prompt

Will the whole PDF fit, or do you need to chunk? Paste it and find out.

Model selection

Pick the smallest model whose context window comfortably holds your typical request.
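
For the RAG budgeting case above, the arithmetic is a few subtractions. A minimal TypeScript sketch; every reserve size here is an illustrative assumption, not a recommendation:

  // Hypothetical RAG token budget: all numbers are assumptions.
  const windowSize = 128_000;  // e.g. a 128K-window model
  const systemPrompt = 1_500;  // estimated system-prompt tokens
  const maxResponse = 4_000;   // reserved for the model's reply
  const headroom = Math.floor(windowSize * 0.1); // keep ~10% free

  const chunkBudget = windowSize - systemPrompt - maxResponse - headroom;
  console.log(`Tokens left for retrieved chunks: ${chunkBudget}`); // 109700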

Step by step

How to visualize context fill

1

Paste your prompt

System + user + retrieved context — paste everything that will go on the wire.

2

Pick the model

The dropdown lists current limits for major models.

3

Read the fill bar

Green under 80%, amber 80–100%, red over the limit (see the sketch after these steps).

4

Adjust as needed

Trim the prompt, swap to a larger-window model, or chunk the input.
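
The thresholds in step 3 reduce to two comparisons. A minimal TypeScript sketch using the cutoffs stated above:

  // Step 3 cutoffs: green under 80%, amber 80-100%, red over the limit.
  type Status = "green" | "amber" | "red";

  function fillStatus(tokensUsed: number, limit: number): Status {
    const fill = tokensUsed / limit;
    if (fill > 1) return "red";      // past the hard cap
    if (fill >= 0.8) return "amber"; // approaching the limit
    return "green";                  // comfortable headroom
  }

  fillStatus(50_000, 200_000);  // "green" (25%)
  fillStatus(120_000, 128_000); // "amber" (~94%)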

FAQ

Frequently asked questions

How accurate is the token estimate?

It's a heuristic: chars / 4, usually within ±20% of the real tokenizer. Good for capacity planning, not billing precision; use the LLM token counter for exact counts.
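
That heuristic is a one-liner. A minimal TypeScript sketch of the estimate described above; the function name is ours:

  // chars / 4 heuristic from the answer above; ceil so short text never rounds to 0.
  function estimateTokens(text: string): number {
    return Math.ceil(text.length / 4);
  }

  estimateTokens("How many tokens is this sentence?"); // 9 (33 chars, rounded up)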

Why do token counts differ across models?

Tokenizers differ. The same text becomes a different number of tokens under tiktoken (OpenAI), Claude's tokenizer, SentencePiece (Gemini, Llama), and DeepSeek's. The heuristic averages across them.

Is it free to use?

Yes. No signup, no limits. Everything runs in your browser.

What counts toward the context window?

System prompt, user messages, tool definitions, and retrieved context. Everything you send counts. Output tokens come out of a separate budget on most providers, but watch out: a few share one pool.

Does the bar update as I type?

Yes: with auto-update on (the default), the bar redraws as you type or paste.
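
Under the hood, auto-update is just a debounced input listener. A minimal sketch; the element ID, the 200 ms delay, and the renderer stub are all assumptions:

  // Hypothetical wiring for auto-update; ID, delay, and renderer are assumptions.
  function redrawFillBar(tokens: number): void {
    console.log(`tokens: ${tokens}`); // stand-in for the real bar renderer
  }

  const input = document.querySelector<HTMLTextAreaElement>("#prompt")!;
  let timer: number | undefined;

  input.addEventListener("input", () => {
    clearTimeout(timer);
    timer = window.setTimeout(() => {
      redrawFillBar(Math.ceil(input.value.length / 4)); // same chars/4 heuristic
    }, 200); // debounce so the bar doesn't redraw on every keystroke
  });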

About

About context windows

A model's context window is the maximum number of tokens it can attend to in a single request — system prompt, history, retrieved docs and the model's own output, all combined.

Window sizes (today)

  • 64K — DeepSeek V3.
  • 128K — GPT-4o family, Llama 3.3.
  • 200K — Claude Opus 4.7, Sonnet 4.6, Haiku 4.5.
  • 256K — GPT-5.
  • 1M — Gemini 2.5 Pro.
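
Those limits make the "smallest sufficient model" check from the use-case list mechanical. A minimal TypeScript sketch; the 0.8 comfort factor is an assumption borrowed from the fill bar's 80% amber threshold:

  // Smallest window that holds the request with headroom (0.8 is an assumption).
  const limits: [string, number][] = [
    ["DeepSeek V3", 64_000],
    ["GPT-4o", 128_000],
    ["Claude Opus 4.7", 200_000],
    ["GPT-5", 256_000],
    ["Gemini 2.5 Pro", 1_000_000],
  ];

  function smallestFittingModel(tokensNeeded: number): string | undefined {
    return limits.find(([, limit]) => tokensNeeded <= limit * 0.8)?.[0];
  }

  smallestFittingModel(60_000);  // "GPT-4o" (fits under 102,400)
  smallestFittingModel(180_000); // "GPT-5"  (fits under 204,800)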

Why fill matters

  • Quality often degrades past ~70% fill (the "lost in the middle" effect).
  • Latency and cost scale with tokens — big windows aren't free.
  • Hard caps trigger 400-class errors when exceeded.