I have a large scientific literature dataset (>20,000 entries) in which the references were entered in a very rough format: each one is a single text string containing authors, year, title, etc.
We need to convert them into a clean format (BibTeX or similar). Do you know of any automatic system that could help with this work? Several systems extract information from a title or from a PDF (e.g. Mendeley), but I haven't found anything that works on reference strings.