Friday, March 4, 2016

Parsing PDFs

I am trying to write a PDF parser to parse and sort my huge directory of academic and conference papers. I tried most of the PDF reference managers and organizers (maybe a forthcoming blog post review), but none of them was intuitive, did exactly what I wanted, and left my files alone without moving them or creating new ones.

I wanted to write the parser in C# on .NET. Here are my preliminary search results:
  • iTextSharp - GOOD
      • Restrictive license, but still the best option
  • PDFSharp - FAIL
      • Failed to parse most PDFs, newer ones in particular; worked around it with iTextSharp
      
Do you have an open-source PDF parser you can recommend?
Please comment below, thanks!