I have a PDF which contains some scanned documents and screenshots, whose contents I want to be able to search for. Is there software for Linux that can apply OCR to images in a PDF and make them searchable in a PDF reader?
OCRmyPDF can do this. Its help text lists options for several cases:
--remove-vectors
    EXPERIMENTAL. Mask out any vector objects in the PDF so that they will not be included in OCR. This can eliminate false characters.

-f, --force-ocr
    Rasterize any text or vector objects on each page, apply OCR, and save the rastered output (this rewrites the PDF).

-s, --skip-text
    Skip OCR on any pages that already contain text, but include the page in final output; useful for PDFs that contain a mix of images, text pages, and/or previously OCRed pages.

--redo-ocr
    Attempt to detect and remove the hidden OCR layer from files that were previously OCRed with OCRmyPDF or another program. Apply OCR to text found in raster images. Existing visible text objects will not be changed. If there is no existing OCR, OCR will be added.
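If you want to script this, here is a rough sketch in Python that picks one of the flags quoted above and calls the `ocrmypdf` command. The function names, the mode names, and the file names are made up for illustration, and it assumes `ocrmypdf` is installed and on your PATH. As far as I know, only one of `--force-ocr`, `--skip-text`, and `--redo-ocr` should be passed at a time, so the sketch takes a single mode.

```python
import shutil
import subprocess

# Flags taken from the option list above. The dictionary keys are
# hypothetical mode names invented for this sketch.
MODE_FLAGS = {
    "plain": [],                    # no extra flag
    "skip-text": ["--skip-text"],   # leave pages that already contain text alone
    "force-ocr": ["--force-ocr"],   # rasterize everything and OCR it
    "redo-ocr": ["--redo-ocr"],     # replace an existing hidden OCR layer
}


def build_ocr_command(input_pdf, output_pdf, mode="skip-text"):
    """Return the argument list for an ocrmypdf run (hypothetical helper)."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["ocrmypdf", *MODE_FLAGS[mode], input_pdf, output_pdf]


def run_ocr(input_pdf, output_pdf, mode="skip-text"):
    """Run ocrmypdf if it is installed; raise if the command fails."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    subprocess.run(build_ocr_command(input_pdf, output_pdf, mode), check=True)
```

For a PDF that mixes scans with ordinary text pages, something like `run_ocr("input.pdf", "searchable.pdf", mode="skip-text")` is probably the safest starting point, since it leaves the pages that already have text untouched.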
Thanks!