How can I get the content of a PDF file line by line in Python? I have searched on Stack Overflow but could not find a good answer. Notes: pyPdf gives an assertion error; if possible, I would like something that uses slate or pdfminer.
Run the pdf2txt.py script that ships with pdfminer from the command line:
python /path/to/pdf2txt.py -o text.txt /path/to/yourpdf.pdf
You can then open the text file it produces and iterate over its lines with
for line in file:
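A minimal sketch of that loop, assuming the output was written to text.txt as in the command above and is UTF-8 encoded. The helper name read_lines is made up for illustration:

```python
def read_lines(path, encoding="utf-8"):
    """Yield each line of a text file with the trailing newline removed."""
    # The file object is itself an iterator over lines, so the file is
    # read lazily instead of being loaded into memory all at once.
    with open(path, encoding=encoding) as fh:
        for line in fh:
            yield line.rstrip("\n")

# Usage (assumes pdf2txt.py has already produced text.txt):
#     for line in read_lines("text.txt"):
#         print(line)
```

Lines in the output follow the layout pdfminer detected, so they may not match the visual lines of the PDF exactly.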
If you want to be more efficient, you would have to modify pdf2txt.py so that
outfp is an in-memory Python StringIO object, which avoids writing a file and then reading it back.
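One way to get the same effect without editing pdf2txt.py is sketched below. It assumes the pdfminer.six fork is installed and uses its extract_text_to_fp helper with a StringIO as outfp; this is a substitute for the script modification described above, not the same thing. The function names pdf_lines and split_lines are made up for illustration:

```python
from io import StringIO

def split_lines(buffer):
    """Return the lines held in an in-memory text buffer."""
    return buffer.getvalue().splitlines()

def pdf_lines(pdf_path):
    """Extract text from a PDF into memory and return it as a list of lines."""
    # Assumption: pdfminer.six is installed (pip install pdfminer.six).
    # Imported here so the rest of the module loads without it.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    outfp = StringIO()  # in-memory replacement for the output file
    with open(pdf_path, "rb") as inf:
        # LAParams() turns on layout analysis so text is grouped into lines.
        extract_text_to_fp(inf, outfp, laparams=LAParams())
    return split_lines(outfp)

# Usage (the path is a placeholder):
#     for line in pdf_lines("/path/to/yourpdf.pdf"):
#         print(line)
```

This keeps the whole extracted text in memory, which is fine for ordinary documents but may be a poor fit for very large PDFs.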