Prompt Cleaner
Clean LLM prompts before sending. Strips zero-width characters, smart quotes, non-breaking spaces, BOMs, and redundant whitespace — the invisible junk that wastes tokens and confuses tokenizers.
Noisy in, clean out
A pasted prompt usually carries invisible junk: zero-width joiners, smart quotes, NBSPs, double-spaces. Strip them and the same prompt becomes shorter and more predictable.
Before: Hello  world This is “smart” quote ‘test’.  Extra  space.
After: Hello world This is "smart" quote 'test'. Extra space.
What you'll use this for
Anywhere a prompt leaves a Google Doc, Notion page, Slack message, or PDF and lands in your LLM — clean it first.
Trim noisy paste
Strip the artifacts that come along when you paste into GPT, Claude, or any LLM playground.
Fix smart-quote chaos
Curly quotes break code snippets in prompts. Normalize them to ASCII in one click.
Reduce token cost
Trimmed whitespace and stripped zero-width characters mean fewer billed tokens per call.
Debug weird tokenization
If your tokenizer is producing more tokens than expected, hidden characters are usually why.
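If you want to confirm what is hiding before you clean, a few lines of Python can list every suspicious codepoint in a pasted prompt. This is a minimal sketch using only the standard library; the SUSPECTS table and report_hidden helper are illustrative names, not part of the tool.

```python
import unicodedata

# Invisible or near-invisible codepoints named on this page.
SUSPECTS = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\ufeff": "BYTE ORDER MARK (U+FEFF)",
    "\u00a0": "NO-BREAK SPACE",
}

def report_hidden(prompt: str) -> None:
    """Print the offset and name of every hidden character found."""
    for i, ch in enumerate(prompt):
        # Catch the usual suspects plus any other format (Cf) character.
        if ch in SUSPECTS or unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, SUSPECTS.get(ch, "UNNAMED"))
            print(f"offset {i}: U+{ord(ch):04X} {name}")

report_hidden("Hello\u200b world\u00a0test")
# offset 5: U+200B ZERO WIDTH SPACE
# offset 12: U+00A0 NO-BREAK SPACE
```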
How to clean a prompt
Paste the prompt
Drop it into the left editor. The cleaner runs entirely locally — nothing is uploaded.
Pick options
Defaults strip zero-width, normalize quotes, replace NBSP, and collapse whitespace. Loosen any toggle you don't want.
Click Clean
Or leave auto-clean on for live updates.
Copy and send
Paste the cleaned prompt into your LLM. Check the estimated tokens saved in the stats bar.
Frequently asked questions
What are zero-width characters?
Codepoints like U+200B (zero-width space), U+200C/U+200D (zero-width joiners), and U+FEFF (BOM) render as nothing but still consume tokens. They sneak in from PDFs, Word docs, and copy-paste across apps.
Why normalize smart quotes?
Curly quotes (U+2018, U+2019, U+201C, U+201D) often tokenize differently from straight ASCII quotes. Normalizing them gives more predictable token counts and avoids breaking code-style prompts.
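To see the effect on your own prompts, you can compare token counts for the two quote styles with a tokenizer library such as tiktoken (assuming it is installed; exact counts depend on the encoding you choose):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

curly = "She said \u201cuse the \u2018raw\u2019 flag\u201d."
straight = "She said \"use the 'raw' flag\"."

print(len(enc.encode(curly)))     # token count with curly quotes
print(len(enc.encode(straight)))  # token count with straight ASCII quotes
```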
Is the tool free to use?
Yes. No signup, no limits, no ads. Runs entirely in your browser.
Does it remove the UTF-8 BOM?
Yes. If you paste content saved as UTF-8 with BOM, the leading U+FEFF is removed when the Strip BOM toggle is on.
How is the tokens-saved estimate calculated?
As a rough approximation: removed characters divided by four. Real savings depend on the tokenizer, but this gives a useful order-of-magnitude estimate. For exact counts, use the LLM Token Counter.
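The same heuristic is easy to reproduce in a script. A small sketch, assuming the roughly four-characters-per-token rule of thumb described above (the helper name is hypothetical):

```python
def estimated_tokens_saved(original: str, cleaned: str) -> int:
    """Rough estimate: about 4 characters per token, so removed chars // 4."""
    removed = len(original) - len(cleaned)
    return max(removed, 0) // 4

# 20 characters removed -> roughly 5 tokens saved
print(estimated_tokens_saved("x" * 120, "x" * 100))  # 5
```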
About prompt cleaning
LLM tokenizers see every character, including the ones you can't. A pasted prompt is rarely just text — it's text plus zero-width spaces, smart quotes, non-breaking spaces, BOMs, and stray double-spaces left over from rich-text formatting. All of these consume tokens and can confuse the model.
What this tool removes
- Zero-width characters — U+200B, U+200C, U+200D, U+FEFF.
- Smart quotes — normalized to ASCII ' and ".
- Non-breaking spaces — U+00A0 collapsed to regular space.
- Redundant whitespace — repeated spaces, trailing line whitespace, excessive blank lines.
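If you need the same cleanup in a scripted pipeline rather than the browser, the default passes are easy to approximate. A minimal Python sketch, assuming the behavior described in the list above (clean_prompt and its regexes are illustrative, not the tool's actual code):

```python
import re

ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\ufeff]")
SMART_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})

def clean_prompt(text: str) -> str:
    text = ZERO_WIDTH.sub("", text)                  # strip zero-width chars and BOM
    text = text.translate(SMART_QUOTES)              # curly quotes -> ASCII
    text = text.replace("\u00a0", " ")               # non-breaking space -> regular space
    text = re.sub(r"[ \t]{2,}", " ", text)           # collapse repeated spaces/tabs
    text = re.sub(r"[ \t]+$", "", text, flags=re.M)  # drop trailing line whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)           # cap excessive blank lines
    return text

print(clean_prompt("Hello\u00a0 world \u201ctest\u201d\u200b"))
# Hello world "test"
```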
When not to clean
- Code samples with deliberate indentation — turn off Trim each line.
- ASCII art or fixed-width tables — turn off Collapse spaces.