
HCODX/PDF OCR
Local-only · Tesseract.js WebAssembly · 100+ languages

PDF OCR: extract text or make scanned PDFs searchable

Free in-browser PDF OCR. Extract plain text from any scan, or rebuild your PDF as a searchable PDF with an invisible text layer over the original page image. 100+ languages via Tesseract.js running fully on your device — no upload, no signup.

Drop a PDF to OCR

Or click to choose. One file at a time. Files stay on your device.

OCR options

  • Language: the first use downloads a language pack of up to ~10 MB, which is cached afterwards.
  • Render DPI: higher DPI improves accuracy but uses more memory.
  • Pages: leave blank for all pages, or list pages and ranges such as 1, 3, 5-9.
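The Pages field takes single page numbers and hyphenated ranges separated by commas. A minimal sketch of how such input could be parsed follows. `parsePageRanges` is a hypothetical name. Rejecting out-of-range or reversed entries is also an assumption, because the page doesn't say how the tool handles them.

```typescript
// Hypothetical helper: parse a page selection such as "1, 3, 5-9" into a
// sorted list of 1-based page numbers. Blank input means "all pages".
function parsePageRanges(input: string, pageCount: number): number[] {
  const trimmed = input.trim();
  if (trimmed === "") {
    // Blank field: every page from 1 to pageCount.
    return Array.from({ length: pageCount }, (_, i) => i + 1);
  }
  const pages = new Set<number>();
  for (const part of trimmed.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate "1,,3" or a trailing comma
    // Either a single number ("3") or a range ("5-9").
    const match = /^(\d+)\s*(?:-\s*(\d+))?$/.exec(token);
    if (!match) throw new Error(`Invalid page token: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    // Assumption: reversed or out-of-range entries are errors, not silently clipped.
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${token}"`);
    }
    for (let p = start; p <= end; p++) pages.add(p);
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

For example, `parsePageRanges("1, 3, 5-9", 12)` gives `[1, 3, 5, 6, 7, 8, 9]`, and a blank field gives every page. Collecting into a `Set` means overlapping entries such as `3, 1-3` are processed only once.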
Use cases

When to OCR a PDF

Scanned documents

Turn flatbed scans, photographed forms or fax PDFs into selectable, searchable text.

Archive search

Make a folder of legacy PDFs searchable by indexing the OCR text in your tool of choice.

Receipts & invoices

Extract totals, dates and vendor info from photographed receipts into a CSV or accounting app.

Study notes

OCR textbook scans so you can search, highlight and quote passages without retyping them.

Step by step

How to OCR a PDF online

1. Open the PDF

Drag it in or click Choose PDF. The page count and file size appear in the file card.

2. Pick language & DPI

Use the main language of the document. 200 DPI is a good balance of speed and accuracy.

3. Choose output

Pick Plain text to paste into the clipboard or a spreadsheet, or Searchable PDF to keep the original look and add an invisible text layer.

4. Run OCR & download

Tesseract.js processes each page in your browser. Progress is shown page by page; download the result when it finishes.
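Page-by-page progress can be folded into a single overall figure with simple arithmetic. This assumes pages are processed one after another and that the current page's progress is available as a fraction between 0 and 1. `overallProgress` and `progressLabel` are hypothetical helpers, not part of Tesseract.js.

```typescript
// Hypothetical helper: combine "page i (0-based) of n, x through that page"
// into one overall fraction, assuming pages are processed strictly in sequence.
function overallProgress(pageIndex: number, pageFraction: number, totalPages: number): number {
  if (totalPages <= 0) return 0;
  // Clamp the per-page fraction so a noisy progress report can't overshoot.
  const clamped = Math.min(Math.max(pageFraction, 0), 1);
  return (pageIndex + clamped) / totalPages;
}

// Hypothetical status line for a progress display.
function progressLabel(pageIndex: number, pageFraction: number, totalPages: number): string {
  const pct = Math.round(overallProgress(pageIndex, pageFraction, totalPages) * 100);
  return `Page ${pageIndex + 1} of ${totalPages} (${pct}%)`;
}
```

`progressLabel(1, 0.5, 4)` returns `"Page 2 of 4 (38%)"`. That is one full page plus half of the second, out of four.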

FAQ

PDF OCR — frequently asked questions

How do I OCR a PDF?

Drop the PDF in, pick the language the document uses, choose whether you want plain text or a searchable PDF, then click Run OCR. Each page is rendered at the DPI you choose (200 DPI is a good starting point) and processed by Tesseract.js in your browser. Nothing is uploaded.

What is a searchable PDF?

A searchable PDF keeps the visible page image, so it looks exactly like the original scan, and adds an invisible text layer on top. You can select, copy and search the text in any PDF reader. This tool produces standard searchable PDFs that work in Adobe Acrobat, macOS Preview, Foxit and the built-in viewers in Chrome, Edge and Firefox.
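At the PDF level, the invisible layer relies on text rendering mode 3. This mode is set with the `Tr` operator and draws glyphs with neither fill nor stroke, so the text is present for selection and search but paints nothing. A PDF library normally writes these operators for you. The hand-rolled sketch below only shows what they look like. It assumes a font resource named `/F1` already exists on the page and handles plain ASCII words only; both function names are hypothetical.

```typescript
// Escape a string for a PDF literal string: backslash first, then parentheses.
function escapePdfString(s: string): string {
  return s.replace(/\\/g, "\\\\").replace(/\(/g, "\\(").replace(/\)/g, "\\)");
}

// Build content-stream operators that place one word as invisible text.
// Assumption: a font resource named /F1 is already defined on the page.
function invisibleWordOps(word: string, x: number, y: number, fontSize: number): string {
  return [
    "BT",                            // begin text object
    `/F1 ${fontSize} Tf`,            // select font and size
    "3 Tr",                          // rendering mode 3: neither fill nor stroke (invisible)
    `1 0 0 1 ${x} ${y} Tm`,          // text matrix: baseline origin at (x, y) in points
    `(${escapePdfString(word)}) Tj`, // show the string
    "ET",                            // end text object
  ].join("\n");
}
```

`invisibleWordOps("Total", 72, 700, 12)` yields six lines: `BT`, `/F1 12 Tf`, `3 Tr`, `1 0 0 1 72 700 Tm`, `(Total) Tj` and `ET`.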

Is my PDF uploaded anywhere?

No. Tesseract.js runs as a WebAssembly worker in your browser, and PDF.js renders the pages locally. The PDF and the extracted text never leave your device.

Which languages are supported?

More than 100, including English, Spanish, French, German, Italian, Portuguese, Russian, Chinese (simplified and traditional), Japanese, Korean, Arabic, Hebrew, Hindi and Vietnamese. Each language pack is roughly 3–10 MB and is downloaded on demand the first time you use it. Later runs load it from the cache.
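Tesseract identifies each language pack by a short code, which is also the name of its `.traineddata` file. The table below maps the languages listed above to those codes. `languageSpec` is a hypothetical helper. Joining codes with `+` follows Tesseract's convention for multi-language recognition, although the page doesn't say whether this tool's menu allows combining languages.

```typescript
// Tesseract traineddata codes for the languages named above.
const TESSERACT_CODES: Record<string, string> = {
  English: "eng",
  Spanish: "spa",
  French: "fra",
  German: "deu",
  Italian: "ita",
  Portuguese: "por",
  Russian: "rus",
  "Chinese (simplified)": "chi_sim",
  "Chinese (traditional)": "chi_tra",
  Japanese: "jpn",
  Korean: "kor",
  Arabic: "ara",
  Hebrew: "heb",
  Hindi: "hin",
  Vietnamese: "vie",
};

// Hypothetical helper: turn display names into a Tesseract language string.
// Several languages are joined with "+", e.g. "chi_sim+eng".
function languageSpec(names: string[]): string {
  return names
    .map((name) => {
      const code = TESSERACT_CODES[name];
      if (!code) throw new Error(`No Tesseract code known for ${name}`);
      return code;
    })
    .join("+");
}
```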

How accurate is the OCR?

Tesseract is most accurate on clean, high-contrast, well-aligned text. Clean scans of printed documents typically reach 95–99% accuracy. Phone photos of receipts and handwritten notes are harder and may need manual cleanup.
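The page doesn't say how the 95–99% figure is measured. One common metric is character accuracy, defined as 1 minus the character error rate. The error rate is the Levenshtein edit distance between the OCR output and a hand-checked reference, divided by the reference length. At 97%, a page of 2,000 characters would still contain about 60 character errors. Below is a minimal sketch for checking your own documents. The function names are hypothetical, and strings are compared by UTF-16 code unit, which is fine for most Latin text.

```typescript
// Levenshtein edit distance (insertions, deletions, substitutions),
// using a single rolling row of the dynamic-programming table.
function editDistance(a: string, b: string): number {
  const row = new Array<number>(b.length + 1);
  for (let j = 0; j <= b.length; j++) row[j] = j;
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // dp[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // dp[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Character accuracy = 1 - CER, floored at 0 when the OCR output is much longer.
function characterAccuracy(reference: string, ocrText: string): number {
  if (reference.length === 0) return ocrText.length === 0 ? 1 : 0;
  const cer = editDistance(reference, ocrText) / reference.length;
  return Math.max(0, 1 - cer);
}
```

For example, `characterAccuracy("Invoice total: 1,250.00", "Invoice tota1: 1,250.0O")` has two misread characters in a 23-character reference: `l` read as `1`, and `0` read as `O`. That gives 21/23, or about 91.3% accuracy.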

How long does OCR take?

Expect roughly 5–15 seconds per page for English on a modern laptop. The first run also downloads the ~10 MB language pack, which is a one-time download cached afterwards. Languages with larger packs, such as Chinese, take longer.
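Those figures turn into a rough estimate for a whole document with simple arithmetic. The sketch below assumes a constant per-page cost. It converts the one-time download from megabytes to seconds at a given connection speed (megabytes × 8 ÷ Mbit/s). Every input is an assumption you supply, not something the tool reports, and `estimateOcrTime` is a hypothetical name.

```typescript
interface TimeEstimate {
  minSeconds: number;
  maxSeconds: number;
}

// Hypothetical estimate: per-page OCR time range plus an optional one-time
// language-pack download (size in MB, connection speed in Mbit/s).
function estimateOcrTime(
  pages: number,
  secPerPageMin: number,
  secPerPageMax: number,
  downloadMB = 0,
  bandwidthMbps = 1,
): TimeEstimate {
  if (bandwidthMbps <= 0) throw new Error("bandwidth must be positive");
  // 1 byte = 8 bits, so MB * 8 / (Mbit/s) gives seconds.
  const downloadSeconds = (downloadMB * 8) / bandwidthMbps;
  return {
    minSeconds: pages * secPerPageMin + downloadSeconds,
    maxSeconds: pages * secPerPageMax + downloadSeconds,
  };
}
```

A 20-page English scan at 5–15 s per page takes roughly 100–300 s, or about 2–5 minutes. On a first run with a 10 MB pack over a 20 Mbit/s connection, add about 4 s.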

About

About PDF OCR

OCR (optical character recognition) turns an image of text into actual text characters a computer can read, search and edit. Most modern PDFs contain real text, but scans, photos and faxes contain only pixel images of text — that's where OCR helps.

How this tool runs OCR

  1. Render. PDF.js rasterises every requested page into a canvas at the chosen DPI.
  2. Recognise. A Tesseract.js WebAssembly worker is created for the selected language pack, then run against each canvas. It returns text plus per-word bounding boxes and confidence scores.
  3. Assemble. For plain text output, we concatenate page results. For searchable PDF, we embed each page's rendered image with pdf-lib and overlay the OCR words as invisible text (PDF text rendering mode 3), positioned by their bounding boxes so they are selectable but not visible.
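Positioning words by their bounding boxes needs a coordinate conversion. OCR boxes are in pixels of the rendered image, with the origin at the top-left. PDF user space is in points (1/72 inch), with the origin at the bottom-left. Below is a minimal sketch. It assumes the page image exactly fills an unrotated page and that each word box is a left/top/right/bottom pixel rectangle. `PixelBox` and `boxToPdf` are hypothetical names.

```typescript
// Assumed shape of an OCR word box: pixel coordinates in the rendered image,
// origin at the top-left corner, y growing downwards.
interface PixelBox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

// Placement in PDF user space: points (1/72 inch), origin at the bottom-left.
interface PdfPlacement {
  x: number;
  y: number;
  width: number;
  height: number;
}

function boxToPdf(box: PixelBox, dpi: number, pageHeightPt: number): PdfPlacement {
  // The page was rasterised at `dpi` pixels per inch and a point is 1/72 inch.
  const toPt = (px: number): number => (px * 72) / dpi;
  return {
    x: toPt(box.x0),
    // Flip the vertical axis: measure the box's bottom edge from the page bottom.
    y: pageHeightPt - toPt(box.y1),
    width: toPt(box.x1 - box.x0),
    height: toPt(box.y1 - box.y0),
  };
}
```

Take a US Letter page (792 pt tall) rendered at 200 DPI. A box from (200, 100) to (400, 150) pixels becomes a 72 × 18 pt rectangle whose bottom-left corner is at (72, 738). Using the box's bottom edge as the text baseline is a simplification; a more careful layer would offset for descenders.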

Tips for better accuracy

  • Select the language your document is written in instead of leaving the menu on English.
  • Set DPI to 300 if accuracy matters more than speed.
  • Crooked or low-contrast scans can be improved first with our Image Cropper or Image Compressor tools.
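This is why DPI costs memory. PDF page sizes are measured in points (1/72 inch), so rendering at a given DPI means a scale factor of dpi / 72. In PDF.js, this is the value passed to `page.getViewport({ scale })`. Pixel count therefore grows with the square of the DPI. Below is a minimal sketch. It assumes one RGBA canvas at 4 bytes per pixel and ignores any other buffers the browser or Tesseract.js allocate. `renderFootprint` is a hypothetical name.

```typescript
interface RenderFootprint {
  scale: number; // multiplier from PDF points to canvas pixels
  widthPx: number;
  heightPx: number;
  bytes: number; // RGBA canvas: 4 bytes per pixel
}

// Estimate canvas size and memory for one page rendered at `dpi`.
function renderFootprint(pageWidthPt: number, pageHeightPt: number, dpi: number): RenderFootprint {
  // 1 point = 1/72 inch, so points * dpi / 72 = pixels.
  const widthPx = Math.round((pageWidthPt * dpi) / 72);
  const heightPx = Math.round((pageHeightPt * dpi) / 72);
  return { scale: dpi / 72, widthPx, heightPx, bytes: widthPx * heightPx * 4 };
}
```

Consider a US Letter page (612 × 792 pt). At 200 DPI it gives a 1700 × 2200 canvas of about 15 MB. At 300 DPI it gives 2550 × 3300 and about 34 MB, which is 2.25 times as much.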