PDFTOHTML is a very good open-source tool for converting PDF files to HTML so that they can then be parsed with other tools. It can be found here: http://pdftohtml.sourceforge.net/
One problem you will run into is processing some, if not all, double-column articles. My specific task is automatically converting PDFs to text files, and I need access to the source code so that I can connect the converter to the rest of my application. I gave up on the alternatives and am currently writing my own. PDFTOTEXT handles all the other, single-column documents.
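To make the double-column problem concrete, here is a minimal sketch of one way to restore reading order. It is not the converter I am writing, only an illustration. It assumes you already have text fragments with page coordinates (for example, taken from the XML output mode of PDFTOHTML) and that the page has exactly two columns split at the middle. The `Fragment` shape and the function name are made up for this example.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (left, top, text), in page units.
Fragment = Tuple[float, float, str]


def two_column_reading_order(fragments: List[Fragment], page_width: float) -> List[str]:
    """Return texts with the left column first, then the right column.

    Assumption: exactly two columns, separated at the horizontal
    middle of the page. Each column is read top to bottom.
    """
    middle = page_width / 2

    # Assign each fragment to a column by where it starts horizontally.
    left_column = [f for f in fragments if f[0] < middle]
    right_column = [f for f in fragments if f[0] >= middle]

    # Within a column, sort by vertical position, then horizontal position.
    def position(fragment: Fragment) -> Tuple[float, float]:
        return (fragment[1], fragment[0])

    ordered = sorted(left_column, key=position) + sorted(right_column, key=position)
    return [text for _, _, text in ordered]


if __name__ == "__main__":
    sample = [
        (320.0, 100.0, "right, line 1"),
        (50.0, 100.0, "left, line 1"),
        (50.0, 120.0, "left, line 2"),
        (320.0, 120.0, "right, line 2"),
    ]
    for line in two_column_reading_order(sample, page_width=600.0):
        print(line)
```

A fixed split at the middle will misplace anything that spans both columns, such as titles, wide figures, or footers, so a real converter would need to detect those separately.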