Text Cleaner
Clean text by removing non-printable characters, zero-width spaces, and BOMs, and by converting smart quotes and other look-alike characters that break parsers and stylesheets.
Invisible junk in, clean text out
Pasting text from Word, Outlook, or a PDF often introduces zero-width characters, smart quotes, and non-breaking spaces. This tool strips them.
Hello world test Smart “quotes” and ‘more’ Normal text.
Hello world test Smart "quotes" and 'more' Normal text.
What you'll use this for
Invisible characters break parsers, search, copy-paste, and CI pipelines. Clean them out before anything else.
Pasted Word/Outlook text
Word documents are full of smart quotes, NBSPs, and en and em dashes. Clean them in one click.
Copy-paste from PDFs
PDF text extraction often introduces zero-width characters and ligatures that confuse downstream tools.
Web scraping
Scraped HTML strings carry NBSPs, BOMs, and stray control characters. Strip them before storing.
CSV cleanup
A strict parser does not treat smart quotes as quote characters, so a CSV cell wrapped in them is split on any commas inside it. Convert them to straight quotes first.
How to clean text
Paste your text
Drop the messy text into the left editor. Any size, any language.
Pick what to strip
Sensible defaults cover most copy-paste mishaps. Toggle dashes if you also want em/en dashes flattened.
Click Clean
Or leave auto-clean on for live updates. Runs locally — no upload.
Copy or download
Copy to clipboard or save as cleaned.txt. Follow up with trim-whitespace for fully tidy text.
Frequently asked questions
Zero-width characters are invisible Unicode code points that take up no visual space: zero-width space (U+200B), zero-width non-joiner (U+200C), zero-width joiner (U+200D), and the BOM (U+FEFF). They often sneak in via word processors or copy-paste from messaging apps and break parsers, regex matching, and string comparison.
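To illustrate how a zero-width character breaks comparison, here is a minimal Python sketch. The helper name `strip_zero_width` is hypothetical and is not the tool's own code; it only covers the four code points named above.

```python
import re

# The four code points named in the text: ZWSP, ZWNJ, ZWJ, and U+FEFF.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")

def strip_zero_width(text: str) -> str:
    """Remove the four zero-width code points anywhere in the string."""
    return ZERO_WIDTH.sub("", text)

pasted = "hel\u200blo"  # renders like "hello" but holds six code points

print(pasted == "hello")                    # False
print(len(pasted))                          # 6
print(strip_zero_width(pasted) == "hello")  # True
```

ZWJ and ZWNJ carry meaning in some scripts and in emoji sequences, so blanket removal is an assumption that suits ASCII-oriented data such as identifiers, config files, and CSV cells.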
JSON, CSV, YAML, and most programming languages require ASCII straight quotes (" and '). Smart quotes (“”‘’) look almost identical but are different Unicode code points and will cause parse errors, syntax errors, or silent mismatches.
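As a small demonstration of the parse error, the Python sketch below feeds a JSON object written with smart quotes to the standard `json` module, then retries after conversion. `straighten_quotes` is a hypothetical helper written for this example.

```python
import json

# Map the four curly quote code points to their ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
})

def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ASCII quotes."""
    return text.translate(SMART_QUOTES)

pasted = "{\u201cname\u201d: \u201cAda\u201d}"

try:
    json.loads(pasted)
except json.JSONDecodeError as err:
    print("parse failed:", err.msg)

print(json.loads(straighten_quotes(pasted)))  # {'name': 'Ada'}
```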
Yes, it is free. No signup, no limits, no ads. Runs entirely in your browser.
A byte order mark (BOM) is a single character (U+FEFF) at the start of a text file that signals the encoding. In UTF-8 it is stored as the bytes EF BB BF. UTF-8 BOMs are common in files written by Windows tools and confuse Unix utilities that expect a clean file start.
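The Python sketch below shows what a UTF-8 BOM looks like once a file has been read, and two ways to drop it. `strip_bom` is a hypothetical helper; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the Python standard library.

```python
import codecs

# Simulate a file written with a UTF-8 BOM in front of its first line.
raw = codecs.BOM_UTF8 + "id,name\n".encode("utf-8")

plain = raw.decode("utf-8")    # the BOM survives as a leading U+FEFF
sig = raw.decode("utf-8-sig")  # this codec drops a leading BOM while decoding

def strip_bom(text: str) -> str:
    """Drop a single leading U+FEFF if present."""
    return text[1:] if text.startswith("\ufeff") else text

print(plain.startswith("\ufeff"))  # True
print(sig == strip_bom(plain))     # True
```

Decoding with `utf-8-sig` is usually the simpler choice when you control how the file is read; the string-level helper is for text that has already been decoded or pasted.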
No, cleaning cannot be undone. It is lossy: once a smart quote is converted to a straight quote, the original code point is gone. Keep a copy of the original if you might need it.
About cleaning text
Text from word processors, web pages, and PDFs is rarely as clean as it looks. It carries invisible Unicode characters — zero-width spaces, joiners, BOMs, control characters — and visible-but-non-ASCII replacements like smart quotes, em dashes, and non-breaking spaces. This tool strips or normalizes all of them.
What each option does
- Strip zero-width: removes U+200B, U+200C, U+200D, U+FEFF anywhere in the string.
- Strip control chars: removes U+0000–U+001F and U+007F, but preserves \n and \t.
- Smart → straight quotes: flattens “”‘’ to " and '.
- Smart dashes → regular: converts en (–) and em (—) dashes to -.
- NBSP → space: replaces U+00A0 (non-breaking space) with a regular space.
- Strip BOM: drops a leading U+FEFF byte order mark.
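The options above could be approximated in Python roughly as follows. This is a sketch, not the tool's implementation: `CleanOptions`, its field names, and `clean_text` are hypothetical, and the defaults are an assumption (everything on except dashes, which the steps above describe as an extra toggle).

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class CleanOptions:
    # Hypothetical field names mirroring the toggles listed above.
    strip_bom: bool = True
    strip_zero_width: bool = True
    strip_control: bool = True
    straighten_quotes: bool = True
    flatten_dashes: bool = False  # assumed to be opt-in
    nbsp_to_space: bool = True

ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")
# U+0000-U+001F and U+007F, skipping \t (U+0009) and \n (U+000A).
# Note that \r (U+000D) falls inside the removed range.
CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")
QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
DASHES = str.maketrans({"\u2013": "-", "\u2014": "-"})

def clean_text(text: str, opts: CleanOptions = CleanOptions()) -> str:
    """Apply each enabled rule in turn and return the cleaned string."""
    if opts.strip_bom and text.startswith("\ufeff"):
        text = text[1:]
    if opts.strip_zero_width:
        text = ZERO_WIDTH.sub("", text)
    if opts.strip_control:
        text = CONTROL.sub("", text)
    if opts.straighten_quotes:
        text = text.translate(QUOTES)
    if opts.flatten_dashes:
        text = text.translate(DASHES)
    if opts.nbsp_to_space:
        text = text.replace("\u00a0", " ")
    return text

messy = "\ufeffHello\u200b world\x07 \u201cquotes\u201d\u00a0and \u2018more\u2019\n"
print(clean_text(messy) == 'Hello world "quotes" and \'more\'\n')  # True
```

Because the control-character range as described includes the carriage return, this sketch turns CRLF line endings into LF when that rule is enabled.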
Recommended flow
- Clean first to remove invisible junk.
- Then run trim-whitespace to tidy spaces and line endings.
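One way to see why the order matters is the Python sketch below. Both functions are hypothetical stand-ins: `clean` is a reduced version of the cleaning step, and `trim_whitespace` only assumes that the trim-whitespace tool normalizes line endings and trims spaces and tabs from each line.

```python
import re

ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")

def clean(text: str) -> str:
    """Step 1 (reduced): drop zero-width characters, turn NBSP into a space."""
    return ZERO_WIDTH.sub("", text).replace("\u00a0", " ")

def trim_whitespace(text: str) -> str:
    """Step 2 (assumed behavior): normalize line endings, trim each line."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.strip(" \t") for line in lines)

messy = "  total \u200b\r\nvalue\u00a0\r\n"

# Cleaning first exposes the trailing spaces so the trim can remove them.
print(trim_whitespace(clean(messy)) == "total\nvalue\n")  # True

# Trimming alone stops at the invisible characters and leaves them in place.
print(trim_whitespace(messy) == "total\nvalue\n")         # False
```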