How to Extract Text from Scanned PDFs Using OCR

Scanned PDFs store text as images — you cannot select, search, or copy it. OCR technology converts that image-text back into real, editable characters. Here is how.

To extract text from a scanned PDF, you need OCR (Optical Character Recognition). When you scan a paper document to PDF, the result is a photograph stored inside a PDF container. The text is not real text; it is just pixels shaped like letters. You cannot select it, search it, or copy it. OCR analyses that image and converts it into actual, editable text. The OCR PDF tool does this for free, in your browser.

How to Extract Text from a Scanned PDF — Step by Step

  1. Open the OCR PDF tool and upload your scanned PDF.
  2. Select the language of the document from the dropdown. This significantly improves recognition accuracy.
  3. Click "Extract Text." The tool renders each page and runs OCR on it.
  4. Review the extracted text displayed on screen. Look for common OCR errors: 0/O, 1/l, rn/m substitutions.
  5. Click "Copy" to paste the text elsewhere, or "Download" to save it as a .txt file.

How OCR Works

OCR software analyses a scanned image to find regions of text, identifies the shapes that correspond to characters, and outputs the most probable text sequence. Older engines matched each shape against a database of known letter forms. Modern OCR tools, including the Tesseract.js engine used in this tool, rely on neural networks trained on large collections of text images for high accuracy.

The OCR PDF tool renders each PDF page as a high-resolution image, then runs Tesseract.js on each page. Everything happens locally in your browser — your scanned PDF is never uploaded to any server.
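
To make "high-resolution image" concrete: PDF page sizes are measured in points (72 per inch), so the render scale for a given DPI is simply DPI / 72. The sketch below shows that arithmetic. The function names are illustrative, and the article does not say which DPI the tool actually renders at; 300 is used here only because it is the recommended scan resolution.

```typescript
// PDF page dimensions are expressed in points; one point is 1/72 of an inch.
const POINTS_PER_INCH = 72;

/** Scale factor needed to rasterise a page at the requested DPI. */
function scaleForDpi(targetDpi: number): number {
  if (!Number.isFinite(targetDpi) || targetDpi <= 0) {
    throw new RangeError("targetDpi must be a positive number");
  }
  return targetDpi / POINTS_PER_INCH;
}

/** Pixel dimensions of a rendered page, given its size in points. */
function renderedSize(
  widthPt: number,
  heightPt: number,
  targetDpi: number
): { width: number; height: number } {
  const scale = scaleForDpi(targetDpi);
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
console.log(renderedSize(612, 792, 300)); // { width: 2550, height: 3300 }
```

In a PDF.js-based pipeline, a scale like this would typically be passed to `page.getViewport({ scale })` before rendering the page to a canvas.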

What Affects OCR Accuracy

The quality of the extracted text depends heavily on the quality of the original scan:

  • Resolution: 200 DPI is the minimum for good results; 300 DPI is recommended. Below 150 DPI, characters are too small to recognise reliably.
  • Contrast: text should be dark on a light background. Faded, coloured, or low-contrast pages significantly reduce accuracy.
  • Page alignment: a page tilted even 5–10 degrees reduces accuracy noticeably. Use your scanner's auto-straighten feature if available.
  • Font style: clean serif and sans-serif fonts are recognised very accurately. Decorative fonts, handwriting, and script fonts are more difficult.
  • Page condition: crumpled, torn, or water-damaged pages produce more errors.

Tip: If your scan quality is poor, increase the contrast to near-maximum in an image editor before running OCR. This single step can dramatically improve results on faded or washed-out documents.
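
For readers who prefer to script this instead of using an image editor, here is a minimal sketch of one common contrast technique: a linear stretch that maps the darkest pixel to black and the lightest to white. It assumes the page has already been converted to a flat array of grayscale values (0 to 255). The function name is illustrative, and this is not necessarily what the OCR PDF tool does internally.

```typescript
/**
 * Linearly stretch grayscale values so the darkest input pixel becomes 0
 * and the lightest becomes 255. Returns a new array; the input is untouched.
 */
function stretchContrast(gray: Uint8ClampedArray): Uint8ClampedArray {
  const out = new Uint8ClampedArray(gray.length);
  if (gray.length === 0) return out;

  // Find the darkest and lightest values present in the image.
  let min = 255;
  let max = 0;
  for (let i = 0; i < gray.length; i++) {
    if (gray[i] < min) min = gray[i];
    if (gray[i] > max) max = gray[i];
  }

  // A completely flat image has no contrast to stretch; return a copy.
  if (max === min) {
    out.set(gray);
    return out;
  }

  const range = max - min;
  for (let i = 0; i < gray.length; i++) {
    out[i] = Math.round(((gray[i] - min) * 255) / range);
  }
  return out;
}

// A faded scan: "dark" text at 90, mid-tone at 120, "light" paper at 200.
const fadedPixels = new Uint8ClampedArray([90, 120, 200]);
console.log(Array.from(stretchContrast(fadedPixels)));
```

A single stray black or white pixel is enough to defeat a plain min/max stretch, so real pipelines often clip a small percentage of the darkest and lightest values first.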

OCR vs. PDF to Word — Which Should You Use?

Both tools extract text from scanned PDFs, but they serve different purposes:

  • OCR PDF → extracts raw text only. Fast, simple. Use when you need to copy specific text, numbers, or names from a scanned page.
  • PDF to Word → extracts text and attempts to preserve document structure (headings, paragraphs, tables). Use when you need a fully formatted, editable document.
  • For quick copying: OCR PDF is faster. For full document editing: PDF to Word gives better structure.

Supported Languages

The OCR PDF tool supports English, Spanish, French, German, and Simplified Chinese. Always select the correct language from the dropdown before extracting — using the wrong language model significantly degrades accuracy.
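
As a sketch of what the language choice means under the hood, each dropdown label would need to correspond to a Tesseract language model. The codes below are the standard Tesseract traineddata names for the five listed languages. The lookup function and the exact label strings are assumptions for illustration; the article does not describe how the tool wires up its dropdown.

```typescript
type SupportedLanguage =
  | "English"
  | "Spanish"
  | "French"
  | "German"
  | "Simplified Chinese";

// Standard Tesseract language-model codes for the languages listed above.
const TESSERACT_LANG_CODES: Record<SupportedLanguage, string> = {
  English: "eng",
  Spanish: "spa",
  French: "fra",
  German: "deu",
  "Simplified Chinese": "chi_sim",
};

/** Look up the model code for a dropdown label; reject anything unsupported. */
function langCodeFor(label: string): string {
  if (Object.prototype.hasOwnProperty.call(TESSERACT_LANG_CODES, label)) {
    return TESSERACT_LANG_CODES[label as SupportedLanguage];
  }
  throw new Error(`Unsupported OCR language: ${label}`);
}

console.log(langCodeFor("German")); // "deu"
```

Failing loudly on an unknown label is deliberate here: silently falling back to English would produce exactly the degraded output the paragraph above warns about.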

Tips for Better OCR Results

  • Scan at 300 DPI where possible (200 DPI is the workable minimum). Resolution is the single most impactful improvement you can make.
  • Scan in grayscale or black-and-white for text documents (colour scans are larger and sometimes lower contrast).
  • Flatten the document before scanning. Pages that curve or bend, such as those near the spine of a book, reduce accuracy along the distorted edge.
  • After extraction, use Find & Replace in Word to check for common substitutions: search for the numeral "1" to catch places where the letter "l" was intended, and vice versa.
  • For multi-column documents, OCR may merge columns incorrectly. Manually check the reading order of extracted text.
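
The substitution check in the tips above can also be partly automated. The following sketch flags tokens that mix letters and digits and contain one of the easily confused characters (0/O, 1/l/I), so you know where to look first. It is a rough heuristic with a hypothetical function name: it will flag some legitimate tokens such as "10th", and it cannot catch rn/m errors, which still need a read-through.

```typescript
// Characters OCR engines commonly confuse across the digit/letter boundary.
const CONFUSABLE_DIGITS = /[01]/;
const CONFUSABLE_LETTERS = /[OolI]/;

/**
 * Return whitespace-separated tokens that contain both letters and digits
 * and include at least one easily confused character.
 */
function findSuspiciousTokens(text: string): string[] {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  return tokens.filter((token) => {
    const hasDigit = /\d/.test(token);
    const hasLetter = /[A-Za-z]/.test(token);
    if (!(hasDigit && hasLetter)) return false;
    return CONFUSABLE_DIGITS.test(token) || CONFUSABLE_LETTERS.test(token);
  });
}

const extracted = "Invoice 2O24 total l50 paid on 12 May";
console.log(findSuspiciousTokens(extracted)); // [ "2O24", "l50" ]
```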

Frequently Asked Questions

How do I extract text from a scanned PDF for free?

Open the OCR PDF tool, upload your scanned PDF, select the document language, and click "Extract Text." The text is extracted and displayed on screen as soon as processing finishes. Copy it directly or download it as a .txt file. It is completely free.

Why can't I select text in my PDF?

If you cannot select text in a PDF, it is most likely a scanned PDF: the pages are images, not searchable text. OCR converts those images back into selectable, editable text.

How accurate is OCR on scanned documents?

On clean, high-resolution scans (300 DPI, good contrast), accuracy is typically 95–99% for standard fonts. On poor-quality scans, accuracy drops. Always proofread extracted text, especially numbers, names, and technical terms.
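
To put those percentages in perspective, 95% character accuracy on a 2,000-character page still leaves roughly 100 characters wrong. If you have a hand-corrected copy of a page, you can measure accuracy yourself. The sketch below uses edit distance (the minimum number of single-character insertions, deletions, and substitutions), which is one common way to define character accuracy; the article does not say how its own figures were measured, and the function names are illustrative.

```typescript
/** Levenshtein edit distance between two strings. */
function editDistance(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 chars of a and first j chars of b.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // delete a character from a
          curr[j - 1] + 1, // insert a character into a
          prev[j - 1] + cost // substitute, or keep if the characters match
        )
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Fraction of the reference text that the OCR output got right (0 to 1). */
function characterAccuracy(reference: string, ocrOutput: string): number {
  if (reference.length === 0) return ocrOutput.length === 0 ? 1 : 0;
  const errors = editDistance(reference, ocrOutput);
  return Math.max(0, 1 - errors / reference.length);
}

// The classic rn/m confusion: "modern" misread as "modem" is 2 edits over 6 characters.
console.log(editDistance("modern", "modem"));
console.log(characterAccuracy("modern", "modem").toFixed(2));
```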

Which languages does the OCR tool support?

The tool supports English, Spanish, French, German, and Simplified Chinese. Select the correct language from the dropdown for best results.

Is my scanned PDF uploaded to a server for OCR?

No. OCR processing runs entirely in your browser using Tesseract.js and PDF.js. Your scanned document never leaves your device.
