Avoid OCR when digital extraction is enough.
Check whether the PDF already has text
- Use when
  - Use this path before OCR when the PDF may contain selectable text, generated text, or a mixed digital/scanned page set.
- Avoid when
  - Do not treat a text-layer signal as a perfect layout guarantee. Tables, columns, and reading order still need review.
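
A minimal sketch of the check described above, assuming per-page text has already been pulled by some extractor. The names `classify_pages`, `extract_page_texts`, and the `min_chars` threshold of 20 are illustrative assumptions, not part of the original guidance; `pypdf` is one possible extractor and is imported lazily so the classifier itself has no third-party dependency.

```python
from typing import Iterable, List, Optional


def classify_pages(page_texts: Iterable[Optional[str]], min_chars: int = 20) -> List[str]:
    """Label each page 'text' (digital extraction is likely enough) or 'ocr'.

    min_chars is a hypothetical threshold; tune it for your documents.
    """
    labels = []
    for text in page_texts:
        # Count only non-whitespace characters so blank or near-empty
        # text layers do not count as usable text.
        visible = sum(1 for ch in (text or "") if not ch.isspace())
        labels.append("text" if visible >= min_chars else "ocr")
    return labels


def extract_page_texts(pdf_path: str) -> List[str]:
    """Pull per-page text with pypdf (assumed installed); any extractor would do."""
    from pypdf import PdfReader  # lazy import: only needed when reading a real PDF

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

In a mixed digital/scanned set, only the pages labeled `ocr` would be routed to OCR. A `text` label says that a text layer exists, not that its layout is right, so tables, columns, and reading order still need review.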