HCODX/Text Cleaner
100% browser-based · Non-printable · Smart quotes · BOMs

Text Cleaner

Clean text by stripping non-printable characters, zero-width spaces, and BOMs, and by converting smart quotes to straight quotes, so that invisible junk stops breaking parsers and stylesheets.

Example

Invisible junk in, clean text out

Pasting text from Word, Outlook, or a PDF often introduces zero-width characters, smart quotes, and non-breaking spaces. This tool strips the invisible ones and converts the rest to plain ASCII equivalents.

Input (with invisible junk)
Hello​ world test
Smart “quotes” and ‘more’
Normal text.
Cleaned
Hello world test
Smart "quotes" and 'more'
Normal text.
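To see what is hiding in a string before cleaning it, it can help to list every character outside printable ASCII with its code point and Unicode name. The following is a minimal sketch, not part of the tool; the function name `describe_suspects` is hypothetical, and the sample string is only assumed to resemble the input above (one zero-width space and one non-breaking space).

```python
import unicodedata


def describe_suspects(text: str) -> list[str]:
    """List characters outside printable ASCII with index, code point, and name."""
    report = []
    for index, ch in enumerate(text):
        if ch in "\n\t" or " " <= ch <= "~":
            continue  # printable ASCII, newline, and tab are left alone
        name = unicodedata.name(ch, "<unnamed>")
        report.append(f"{index}: U+{ord(ch):04X} {name}")
    return report


# Assumed sample: a zero-width space after "Hello", an NBSP before "test".
sample = "Hello\u200b world\u00a0test"
for line in describe_suspects(sample):
    print(line)
# 5: U+200B ZERO WIDTH SPACE
# 12: U+00A0 NO-BREAK SPACE
```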
Use cases

What you'll use this for

Invisible characters break parsers, search, copy-paste, and CI pipelines. Clean them out before anything else.

Pasted Word/Outlook text

Word documents are full of smart quotes, NBSPs, and en and em dashes. Clean them in one click.

Copy-paste from PDFs

PDF text extraction often introduces zero-width characters and ligatures that confuse downstream tools.

Web scraping

Scraped HTML strings carry NBSPs, BOMs, and stray control characters. Strip them before storing.

CSV cleanup

A strict CSV parser does not recognize smart quotes as quote characters, so a cell wrapped in them is split at its embedded commas or rejected outright. Convert them to straight quotes first.
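A small sketch of the failure mode, using Python's standard `csv` module as a stand-in for a downstream parser (the sample row is made up). With default settings the module does not raise an error here; it silently produces the wrong number of fields, which can be harder to notice than a crash.

```python
import csv
import io

# A cell wrapped in smart quotes (U+201C / U+201D) with a comma inside it.
row = "\u201cSmith, John\u201d,42\n"

# Smart quotes are ordinary characters to the parser, so the comma splits the cell.
fields = next(csv.reader(io.StringIO(row)))
print(len(fields))  # 3

# After converting to straight quotes, the cell is read as one field.
fixed = row.replace("\u201c", '"').replace("\u201d", '"')
fixed_fields = next(csv.reader(io.StringIO(fixed)))
print(fixed_fields)  # ['Smith, John', '42']
```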

Step by step

How to clean text

1. Paste your text

Drop the messy text into the left editor. Any size, any language.

2. Pick what to strip

Sensible defaults cover most copy-paste mishaps. Toggle dashes if you also want em/en dashes flattened.

3. Click Clean

Or leave auto-clean on for live updates. Runs locally, with no upload.

4. Copy or download

Copy to clipboard or save as cleaned.txt. Follow up with trim-whitespace for fully tidy text.

FAQ

Frequently asked questions

What are zero-width characters?

Invisible Unicode characters that take up no visible space: zero-width space (U+200B), zero-width non-joiner (U+200C), zero-width joiner (U+200D), and BOM (U+FEFF). They often sneak in via word processors or copy-paste from messaging apps and break parsers, regex matching, and downstream comparison.
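The breakage described in this answer can be reproduced in a few lines. This is an illustrative sketch with a made-up string; the constant name `ZERO_WIDTH` is an assumption, not something the tool exposes.

```python
import re

# The four zero-width code points listed in the answer above.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"

pasted = "user\u200bname"  # looks like "username" on screen

print(pasted == "username")            # False
print(re.search("username", pasted))   # None
print(len(pasted))                     # 9

# Mapping each code point to None makes str.translate delete it.
cleaned = pasted.translate({ord(ch): None for ch in ZERO_WIDTH})
print(cleaned == "username")           # True
```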

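Why convert smart quotes to straight quotes?

JSON, CSV, YAML, and most programming languages use ASCII straight quotes (" and ') as their quoting characters. Smart quotes (“”‘’) look almost identical but are different Unicode code points, so a parser either rejects them with a syntax error or treats them as ordinary text, which causes silent mismatches.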
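As a quick check of the JSON case, the sketch below feeds a smart-quoted document to Python's standard `json` module. The sample data is invented for illustration.

```python
import json

# A JSON-looking document typed with smart double quotes (U+201C / U+201D).
smart = "{\u201cname\u201d: \u201cAda\u201d}"

try:
    json.loads(smart)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print("parse error:", err)

# After converting to straight quotes, the same text parses.
straight = smart.replace("\u201c", '"').replace("\u201d", '"')
data = json.loads(straight)
print(data)  # {'name': 'Ada'}
```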

Is this tool free?

Yes. No signup, no limits, no ads. Runs entirely in your browser.

What is a BOM?

A byte order mark (BOM) is a single character (U+FEFF) at the start of a text file that signals the encoding and byte order. In UTF-8 it is stored as the three bytes EF BB BF. UTF-8 BOMs are commonly written by Windows tools and confuse Unix utilities that expect a clean file start.
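A short sketch of what a leading BOM does to the first field of a file, using made-up CSV bytes. Python's built-in `utf-8-sig` codec is one way to drop the BOM at decode time; it is shown here as a comparison, not as how this tool works.

```python
# A UTF-8 file that starts with the BOM bytes EF BB BF.
raw = b"\xef\xbb\xbfid,name\n1,Ada\n"

# Plain UTF-8 decoding keeps the BOM as an invisible first character.
plain = raw.decode("utf-8")
print(plain.startswith("\ufeff"))   # True
print(plain.split(",")[0] == "id")  # False: the header is "\ufeffid"

# The utf-8-sig codec skips a leading BOM if one is present.
stripped = raw.decode("utf-8-sig")
print(stripped.split(",")[0])       # id
```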

Can I undo a clean?

No. Cleaning is lossy: once a smart quote is converted to a straight quote, the original code point is gone. Keep your original if you need to undo.

About

About cleaning text

Text from word processors, web pages, and PDFs is rarely as clean as it looks. It carries invisible Unicode characters (zero-width spaces, joiners, BOMs, control characters) and typographic non-ASCII substitutes such as smart quotes, em dashes, and non-breaking spaces. This tool strips the former and normalizes the latter.

What each option does

  • Strip zero-width — removes U+200B, U+200C, U+200D, U+FEFF anywhere in the string.
  • Strip control chars — removes U+0000–U+001F and U+007F, but preserves \n and \t.
  • Smart → straight quotes — flattens “”‘’ to " and '.
  • Smart dashes → regular — converts en (–, U+2013) and em (—, U+2014) dashes to -.
  • NBSP → space — replaces U+00A0 (non-breaking space) with a regular space.
  • Strip BOM — drops a leading U+FEFF byte order mark.
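The options above can be sketched as a single function. This is an approximation built only from the descriptions in the list, not the tool's actual code: the function name `clean_text`, the keyword names, the order of the steps, and the defaults (dashes off, everything else on) are all assumptions.

```python
import re

# Code points taken from the option list above.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\ufeff]")
# U+0000-U+001F and U+007F, skipping \t (U+0009) and \n (U+000A).
# Note that \r (U+000D) falls inside the removed range.
CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")
QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
DASHES = str.maketrans({"\u2013": "-", "\u2014": "-"})


def clean_text(text, *, zero_width=True, control=True, quotes=True,
               dashes=False, nbsp=True, bom=True):
    """Hypothetical cleaner; each flag mirrors one option in the list."""
    if bom and text.startswith("\ufeff"):
        text = text[1:]                      # Strip BOM (leading only)
    if zero_width:
        text = ZERO_WIDTH.sub("", text)      # Strip zero-width, anywhere
    if control:
        text = CONTROL.sub("", text)         # Strip control chars
    if quotes:
        text = text.translate(QUOTES)        # Smart -> straight quotes
    if dashes:
        text = text.translate(DASHES)        # En/em dashes -> hyphen
    if nbsp:
        text = text.replace("\u00a0", " ")   # NBSP -> space
    return text


print(clean_text("Smart \u201cquotes\u201d and \u2018more\u2019"))
# Smart "quotes" and 'more'
```

Keeping each option behind its own flag means a caller can, for example, turn off quote conversion for prose while still stripping zero-width characters.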

Recommended flow

  • Clean first to remove invisible junk.
  • Then run trim-whitespace to tidy spaces and line endings.