Text Similarity
Compute similarity between two texts with three metrics: Levenshtein ratio (character-level edit distance), Jaccard (shared word set), and cosine (word-frequency vectors). All three run locally.
Two texts, three scores
Each metric measures a different kind of similarity. Pick the one that matches your question: typos, content overlap, or weighted vocabulary.
The quick brown fox jumps over the lazy dog.
A quick brown dog runs over the lazy fox.
What you'll use this for
Wherever you need a quantitative answer to "how close are these two strings?"
Dedupe similar entries
Find near-duplicates in a list of titles, snippets, or descriptions.
Plagiarism detection
Spot suspiciously similar essays or code comments.
Prompt drift tracking
Detect when an LLM prompt has been edited away from its original version.
Version compare
Quick numeric answer to "how different is v2 from v1?"
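For the dedupe use case, here is a minimal Python sketch. It assumes Jaccard on lowercase \w+ word sets and a hypothetical cutoff of 0.7. The tool itself does not define a duplicate threshold, so tune it for your data. Every pair of titles is compared, and pairs at or above the cutoff are reported.

```python
import re
from itertools import combinations

def word_set(text: str) -> frozenset[str]:
    # Lowercase, then keep runs of word characters (\w+) as words
    return frozenset(re.findall(r"\w+", text.lower()))

def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = a | b
    # Assumption: two texts with no words are treated as identical
    return len(a & b) / len(union) if union else 1.0

def near_duplicates(titles: list[str], threshold: float = 0.7) -> list[tuple[str, str, float]]:
    # threshold is a hypothetical cutoff, not a value defined by the tool
    sets = [word_set(t) for t in titles]
    pairs = []
    # All-pairs comparison is O(n^2), which is fine for short lists
    for i, j in combinations(range(len(titles)), 2):
        score = jaccard(sets[i], sets[j])
        if score >= threshold:
            pairs.append((titles[i], titles[j], score))
    return pairs

titles = ["Reset your password", "reset your password now", "Quarterly sales report"]
print(near_duplicates(titles))
# [('Reset your password', 'reset your password now', 0.75)]
```

The first two titles share 3 of 4 distinct words, which gives 0.75. For very large lists, all-pairs comparison gets slow, and approximate methods such as MinHash are the usual next step.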
How to compare two texts
Paste two texts
Text A on the left, Text B on the right.
Read three scores
Levenshtein (char-level), Jaccard (word-set), and cosine (word-frequency vector). Higher = more similar.
Pick the metric that fits
Typo-level: Levenshtein. Bag-of-words overlap: Jaccard. Weighted vocabulary: cosine.
Iterate
Edit either text and watch the scores update live.
Frequently asked questions
Which metric to use depends on the question. Levenshtein is best for typo-level differences, because it counts edit operations. Jaccard captures content overlap regardless of word order or repetition. Cosine weights repeated words and works well for longer texts where vocabulary balance matters.
It runs entirely in your browser, and no signup is required.
Jaccard and cosine tokenize with a basic word regex (\w+) and lowercase before comparing. Levenshtein operates on raw characters — case-sensitive, whitespace-sensitive.
Jaccard and cosine are case-insensitive. Levenshtein is case-sensitive — lowercase both inputs first if you want a case-insensitive edit distance.
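Here is a sketch of the tokenization rule described above. This is an assumption about the exact implementation, based only on the \w+ description. One consequence is that \w+ splits on apostrophes, so contractions become two tokens. Also, Python's \w is Unicode-aware for str patterns, while JavaScript's \w matches essentially only ASCII letters, digits, and underscore. Accented words may therefore tokenize differently in a browser implementation than in this sketch.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase first, then keep runs of word characters (\w+)
    return re.findall(r"\w+", text.lower())

print(tokenize("Don't STOP, now!"))  # ['don', 't', 'stop', 'now']
print(tokenize("Café au lait"))      # ['café', 'au', 'lait'] (Unicode \w in Python)
```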
Similarity is symmetric — A↔B equals B↔A. There's no "reverse" because both texts are first-class inputs. Use the swap button to flip them.
About text similarity
"Text similarity" isn't one thing — it's a family of measures, each appropriate for a different question.
Levenshtein ratio
- Operates on raw characters. Counts the minimum number of single-character insertions, deletions, and substitutions needed to transform A into B (the edit distance).
- Reported as 1 - (edits / max(len(A), len(B))), so 100% = identical and 0% = the edit distance equals the length of the longer text.
- Best for short strings (titles, IDs, file names, single sentences).
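Here is a minimal sketch of the Levenshtein ratio in Python. It uses the standard two-row dynamic-programming edit distance and compares raw characters, so it is case- and whitespace-sensitive, as described above. Returning 1.0 for two empty strings is an assumption, since the page does not say how that edge case is handled.

```python
def levenshtein(a: str, b: str) -> int:
    # Keep the shorter string in b so each row is as small as possible
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

def levenshtein_ratio(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein(a, b) / longest
```

For example, "kitten" to "sitting" needs 3 edits (two substitutions and one insertion), so the ratio is 1 - 3/7, about 0.571.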
Jaccard similarity
- Tokenizes both texts into word sets, then computes |A∩B| / |A∪B|.
- Word order and repetition don't matter; only whether a word is present.
- Best for content overlap on medium-length texts (paragraphs, descriptions).
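Here is a minimal Jaccard sketch in Python. It assumes the lowercase \w+ tokenization described in the FAQ. Treating two texts with no words as identical (1.0) is an assumption.

```python
import re

def word_set(text: str) -> set[str]:
    # Lowercase, then collect the distinct \w+ tokens
    return set(re.findall(r"\w+", text.lower()))

def jaccard(a: str, b: str) -> float:
    sa, sb = word_set(a), word_set(b)
    union = sa | sb
    if not union:
        return 1.0  # assumption: two texts with no words count as identical
    return len(sa & sb) / len(union)

text_a = "The quick brown fox jumps over the lazy dog."
text_b = "A quick brown dog runs over the lazy fox."
print(jaccard(text_a, text_b))  # 0.7
```

The two sample sentences share 7 words (quick, brown, fox, over, the, lazy, dog) out of 10 distinct words in total, so the score is 7/10. Reordering words or repeating them leaves the score unchanged.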
Cosine similarity
- Tokenizes both texts into term-frequency vectors, then computes the cosine of the angle between them: (A·B) / (‖A‖ ‖B‖).
- Repeated words matter — texts with similar vocabulary balance score higher.
- Best for longer texts where vocabulary distribution is meaningful.
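Here is a minimal cosine sketch in Python over raw term counts, matching the plain term-frequency vectors described above (no TF-IDF weighting). Tokenization again assumes lowercase \w+. Returning 0.0 when either text has no words is an assumption.

```python
import math
import re
from collections import Counter

def term_vector(text: str) -> Counter:
    # Lowercase, tokenize on \w+, and count each term
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: str, b: str) -> float:
    va, vb = term_vector(a), term_vector(b)
    # Dot product over terms in A (Counter returns 0 for terms missing from B)
    dot = sum(count * vb[term] for term, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty text has no similarity
    return dot / (norm_a * norm_b)

text_a = "The quick brown fox jumps over the lazy dog."
text_b = "A quick brown dog runs over the lazy fox."
print(round(cosine(text_a, text_b), 3))  # 0.804
```

In the sample sentences, "the" appears twice in A and once in B. The dot product is therefore 8, and the norms are √11 and 3, giving 8/√99, roughly 0.804. Because the repeated word counts twice, this score is higher than the 0.7 Jaccard score for the same pair.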