PDF OCR: extract text or make scanned PDFs searchable
Free in-browser PDF OCR. Extract plain text from any scan, or rebuild your PDF as a searchable PDF with an invisible text layer over the original page image. 100+ languages via Tesseract.js running fully on your device — no upload, no signup.
When to OCR a PDF
Scanned documents
Turn flatbed scans, photographed forms or fax PDFs into selectable, searchable text.
Archive search
Make a folder of legacy PDFs searchable by indexing the OCR text in your tool of choice.
Receipts & invoices
Extract totals, dates and vendor info from photographed receipts into a CSV or accounting app.
Study notes
OCR textbook scans so you can search and quote-highlight without retyping.
How to OCR a PDF online
Open the PDF
Drag it in or click Choose. The page count and size appear in the file card.
Pick language & DPI
Use the main language of the document. 200 DPI is a good balance of speed and accuracy.
Choose output
Plain text for clipboard / spreadsheet use, or Searchable PDF to keep the original look and add a text layer.
Run OCR & download
Tesseract.js processes each page in your browser. Progress shows page-by-page; download when done.
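The DPI choice in the steps above comes down to simple arithmetic: PDF page sizes are measured in points (72 per inch), so a DPI target becomes a scale factor for the renderer. The sketch below is only an illustration of that relationship, not this tool's code; the function names `dpiToScale` and `canvasSizeFor` are made up for the example.

```typescript
// PDF user space uses 72 points per inch.
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;
  height: number;
}

// Scale factor for a renderer, e.g. the `scale` passed to PDF.js
// page.getViewport({ scale }). Hypothetical helper name.
function dpiToScale(dpi: number): number {
  if (!(dpi > 0)) throw new RangeError("dpi must be positive");
  return dpi / POINTS_PER_INCH;
}

// Canvas size in pixels for a page whose size is given in points.
// Multiplying before dividing keeps common page sizes exact.
function canvasSizeFor(widthPt: number, heightPt: number, dpi: number): PixelSize {
  if (!(dpi > 0)) throw new RangeError("dpi must be positive");
  return {
    width: Math.round((widthPt * dpi) / POINTS_PER_INCH),
    height: Math.round((heightPt * dpi) / POINTS_PER_INCH),
  };
}

// US Letter is 612 x 792 points.
console.log(canvasSizeFor(612, 792, 200)); // { width: 1700, height: 2200 }
console.log(canvasSizeFor(612, 792, 300)); // { width: 2550, height: 3300 }
```

A 300 DPI render has 2.25 times as many pixels as a 200 DPI render of the same page, which is why the higher setting costs noticeably more time and memory.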
PDF OCR — frequently asked questions
Drop the PDF in, pick the language the document uses, choose whether you want plain text or a searchable PDF, then click Run OCR. Each page is rendered at the chosen DPI (200 DPI is the suggested setting) and processed by Tesseract.js in your browser; nothing is uploaded.
A searchable PDF keeps the visible page image (so it looks exactly like the original scan) and adds an invisible text layer on top. You can select, copy and search the text in any PDF reader. This tool produces standard searchable PDFs that work in Adobe Acrobat, macOS Preview, Foxit and the built-in viewers in Chrome / Edge / Firefox.
No, your PDF is not uploaded. Tesseract.js runs as a WebAssembly worker in your browser, and PDF.js handles the rendering. The PDF and the extracted text never leave your device.
100+ languages including English, Spanish, French, German, Italian, Portuguese, Russian, Chinese (simplified and traditional), Japanese, Korean, Arabic, Hebrew, Hindi and Vietnamese. Each language pack is ~3–10 MB and is downloaded on demand the first time you use it, then cached for later runs.
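For readers scripting Tesseract themselves, the languages named above correspond to Tesseract's traineddata codes, and several codes can be joined with `+` to recognise mixed-language documents. The lookup below is a minimal sketch covering only the languages listed in this answer; the table name and the `toTesseractLang` helper are hypothetical and are not part of Tesseract.js.

```typescript
// Display name -> Tesseract traineddata code, for the languages named above.
const TESSERACT_CODES: Record<string, string> = {
  "english": "eng",
  "spanish": "spa",
  "french": "fra",
  "german": "deu",
  "italian": "ita",
  "portuguese": "por",
  "russian": "rus",
  "chinese (simplified)": "chi_sim",
  "chinese (traditional)": "chi_tra",
  "japanese": "jpn",
  "korean": "kor",
  "arabic": "ara",
  "hebrew": "heb",
  "hindi": "hin",
  "vietnamese": "vie",
};

// Builds a language string such as "eng+deu" from display names.
// Unknown names throw instead of silently falling back to English.
function toTesseractLang(names: string[]): string {
  const codes = names.map((name) => {
    const code = TESSERACT_CODES[name.trim().toLowerCase()];
    if (code === undefined) {
      throw new Error(`No language code known for "${name}"`);
    }
    return code;
  });
  // Drop duplicates while keeping the first-seen order.
  return codes.filter((c, i) => codes.indexOf(c) === i).join("+");
}

console.log(toTesseractLang(["English", "German"])); // eng+deu
```

Each extra code means another language pack to download and more work per page, so it is usually worth listing only the languages that actually appear in the document.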
Tesseract is most accurate on clean, high-contrast, well-aligned text. Scanned documents typically achieve 95–99% accuracy. Phone photos of receipts or handwritten notes are harder and may need manual cleanup.
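One way to decide which pages need that manual cleanup is to look at the per-word confidence scores the OCR engine reports (0 to 100 in Tesseract.js). The sketch below averages them per page and flags pages under a cutoff. It is an illustration only: the `OcrWord` shape, the function names and the threshold of 80 are assumptions, and confidence is the engine's own estimate, not a measured accuracy.

```typescript
// Minimal word shape for this example; real OCR results carry more fields.
interface OcrWord {
  text: string;
  confidence: number; // 0-100, as reported by the OCR engine
}

// Mean word confidence for one page; an empty page scores 0.
function meanConfidence(words: OcrWord[]): number {
  if (words.length === 0) return 0;
  const total = words.reduce((sum, w) => sum + w.confidence, 0);
  return total / words.length;
}

// Returns 1-based page numbers whose mean confidence is below the threshold.
// The default of 80 is an arbitrary illustrative cutoff.
function pagesNeedingReview(pages: OcrWord[][], threshold: number = 80): number[] {
  const flagged: number[] = [];
  pages.forEach((words, index) => {
    if (meanConfidence(words) < threshold) flagged.push(index + 1);
  });
  return flagged;
}

const samplePages: OcrWord[][] = [
  [{ text: "Invoice", confidence: 96 }, { text: "Total", confidence: 94 }],
  [{ text: "T0tal", confidence: 60 }, { text: "1O.50", confidence: 70 }],
];
console.log(pagesNeedingReview(samplePages)); // [ 2 ]
```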
Expect 5–15 seconds per page for English on a modern laptop. The first run also downloads the ~10 MB language pack once; it is cached afterwards. Larger languages such as Chinese take longer.
About PDF OCR
OCR (optical character recognition) turns an image of text into actual text characters a computer can read, search and edit. Most modern PDFs contain real text, but scans, photos and faxes contain only pixel images of text — that's where OCR helps.
How this tool runs OCR
- Render. PDF.js rasterises every requested page into a canvas at the chosen DPI.
- Recognise. A Tesseract.js WebAssembly worker is created for the selected language pack, then run against each canvas. It returns text plus per-word bounding boxes and confidence scores.
- Assemble. For plain text output, we concatenate page results. For searchable PDF, we embed each page's rendered image with pdf-lib and overlay the OCR words as invisible (rendering-mode 3) text positioned by their bounding boxes: selectable but not visible.
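The positioning part of the Assemble step can be sketched as a coordinate conversion. OCR bounding boxes are in canvas pixels with the origin at the top-left, while PDF page coordinates are in points with the origin at the bottom-left. The code below is a minimal sketch under those assumptions, not this tool's actual implementation; `toPdfPlacement` and its types are hypothetical names, and the box shape follows the `x0, y0, x1, y1` layout that Tesseract.js uses for word boxes.

```typescript
// Word box in canvas pixels, origin at the top-left of the rendered page.
interface PixelBox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

// Where to place the word in PDF points, origin at the bottom-left.
interface PdfTextPlacement {
  x: number;      // left edge
  y: number;      // bottom edge of the word box
  width: number;
  height: number; // a reasonable starting point for the font size
}

// Pixels to points at a given render DPI (72 points per inch).
function pxToPt(px: number, dpi: number): number {
  return (px * 72) / dpi;
}

// Converts a pixel box to PDF coordinates, flipping the vertical axis.
function toPdfPlacement(box: PixelBox, pageHeightPt: number, dpi: number): PdfTextPlacement {
  if (!(dpi > 0)) throw new RangeError("dpi must be positive");
  return {
    x: pxToPt(box.x0, dpi),
    y: pageHeightPt - pxToPt(box.y1, dpi),
    width: pxToPt(box.x1 - box.x0, dpi),
    height: pxToPt(box.y1 - box.y0, dpi),
  };
}

// A word on a US Letter page (792 pt tall) rendered at 200 DPI.
const placement = toPdfPlacement({ x0: 200, y0: 400, x1: 600, y1: 450 }, 792, 200);
console.log(placement); // { x: 72, y: 630, width: 144, height: 18 }
```

The text drawn at that position would then use PDF text rendering mode 3 (neither filled nor stroked), so it can be selected and searched without covering the page image.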
Tips for better accuracy
- Select the language that matches your document instead of leaving the menu on English.
- Set DPI to 300 if accuracy matters more than speed.
- Crooked or low-contrast scans can be improved first with our Image Cropper or Image Compressor tools.