PDF to text with OCR

When a PDF file contains only a scanned image of text, you cannot select or copy that text. You can extract it from the image with OCR as follows:

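  • convert -density 300 foo.pdf -normalize foo.png
  • tesseract foo.png foo -l eng

The -density option has to come before the input file so that the PDF is rendered at 300 dpi. tesseract appends .txt to the output name itself, so the second command writes foo.txt. If the PDF has more than one page, convert writes one image per page (foo-0.png, foo-1.png, ...) and each image has to be passed to tesseract separately.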

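You need to install the packages imagemagick, tesseract-ocr and tesseract-ocr-eng (on Debian: aptitude install imagemagick tesseract-ocr tesseract-ocr-eng). ImageMagick relies on Ghostscript to read PDF files, so the ghostscript package has to be present too.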
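
For a PDF with several pages, the two commands can be wrapped in a small shell function that converts every page and collects the recognized text in one file. This is only a sketch: the function name pdf_ocr and the zero-padded page numbering are assumptions made for this example, and it expects convert and tesseract to be on the PATH.

```shell
# Sketch: OCR every page of a PDF into a single text file.
# pdf_ocr is a hypothetical helper name, not part of ImageMagick or tesseract.
pdf_ocr() {
    pdf=$1
    base=${pdf%.pdf}

    # Render each page at 300 dpi. The %03d in the output name makes
    # convert write one numbered image per page: foo-000.png, foo-001.png, ...
    convert -density 300 "$pdf" -normalize "$base-%03d.png" || return 1

    # Start with an empty result file, then append the text page by page.
    : > "$base.txt"
    for png in "$base"-[0-9][0-9][0-9].png; do
        [ -e "$png" ] || continue
        # tesseract adds the .txt suffix to the output base itself.
        tesseract "$png" "${png%.png}" -l eng || return 1
        cat "${png%.png}.txt" >> "$base.txt"
    done
}
```

After loading the function into the shell, pdf_ocr foo.pdf leaves the combined text in foo.txt. The per-page images and text files (foo-000.png, foo-000.txt, ...) are kept so that a badly recognized page can be inspected afterwards.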