python pdf line by line

0

How can I get the content of a PDF file line by line in Python? I have searched Stack Overflow but could not find a good answer. Notes: pyPdf gives an assertion error; if possible, I would like something with slate and pdfminer.

2012-04-03 21:17
by user873286


0

From the command line:

    python /path/to/pdf2txt.py -o text.txt /path/to/yourpdf.pdf

You can then open the text file it produces and read it with a for line in file: loop.
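A minimal sketch of that loop, assuming the command above wrote its output to text.txt in UTF-8. The helper name read_lines is made up for illustration and is not part of pdfminer.

```python
def read_lines(path, encoding="utf-8"):
    """Yield each line of a text file with its trailing newline removed."""
    # The encoding is an assumption; change it if pdf2txt.py was run
    # with a different output codec.
    with open(path, encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # text.txt is the file produced by the pdf2txt.py command above.
    for number, line in enumerate(read_lines("text.txt"), start=1):
        print(number, line)
```

Because read_lines is a generator, the file is read lazily, so a large extraction does not have to fit in memory at once.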

If you want to be more efficient, you would have to modify pdf2txt.py so that outfp is an in-memory StringIO object, which avoids writing a file and then reading it back.
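Here is a rough sketch of the in-memory idea. Instead of editing pdf2txt.py, it assumes the maintained pdfminer.six fork is installed and uses its extract_text_to_fp function, which may not exist in the older pdfminer release. The names pdf_lines and lines_from_buffer are invented for this example.

```python
import io


def lines_from_buffer(buffer):
    """Yield lines from an in-memory text buffer, without trailing newlines."""
    buffer.seek(0)  # rewind, since the writer leaves the position at the end
    for line in buffer:
        yield line.rstrip("\n")


def pdf_lines(pdf_path):
    """Extract text from a PDF into a StringIO and yield it line by line."""
    # Assumption: pdfminer.six is installed (pip install pdfminer.six).
    # Imported here so the buffer helper above works without it.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    outfp = io.StringIO()  # plays the role of outfp in pdf2txt.py
    with open(pdf_path, "rb") as pdf_file:
        # LAParams() turns on layout analysis so characters are grouped into lines.
        extract_text_to_fp(pdf_file, outfp, laparams=LAParams())
    return lines_from_buffer(outfp)
```

Usage would look like for line in pdf_lines("/path/to/yourpdf.pdf"): print(line). Note that the lines follow pdfminer's layout analysis, so they may not match the visual lines of the page exactly.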

2012-04-04 02:10
by apple16