My aim is to extract useful information from prospectuses (a prospectus is a document that describes a financial security). In other words, I need to build a metadata repository about financial securities by extracting information from the documents that describe them.
If you want to get started with Information Extraction, I recommend taking a look at "The Handbook of Computational Linguistics and Natural Language Processing" (Wiley-Blackwell), Chapter 18: Information Extraction.
More formally: financial securities (financial instruments to be traded) are defined by organizations in digital documents. The documents have different structures but similar content, since each one specifies a financial instrument. I need to extract information from those documents.
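To make the task concrete, here is a minimal rule-based sketch of the kind of extraction I have in mind. It is only a baseline under assumptions of my own: the field names (`isin`, `coupon_percent`, `maturity_date`), the regular expressions, and the sample sentence are all hypothetical and are not taken from any real prospectus. Real documents will phrase these fields in many more ways.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for three fields a prospectus often states.
# Each pattern has exactly one capture group holding the value.
FIELD_PATTERNS = {
    # "ISIN", optionally "code"/"number", then a 12-character identifier.
    "isin": re.compile(
        r"\bISIN\s*(?:code|number)?\s*[:\-]?\s*([A-Z]{2}[A-Z0-9]{9}[0-9])\b",
        re.IGNORECASE,
    ),
    # "coupon" or "coupon rate", then a number followed by "%" or "per cent".
    "coupon_percent": re.compile(
        r"\bcoupon(?:\s+rate)?\s*(?:of|:)?\s*(\d+(?:\.\d+)?)\s*(?:%|per\s*cent)",
        re.IGNORECASE,
    ),
    # "maturity" or "maturity date", then "15 March 2030" or "2030-03-15".
    "maturity_date": re.compile(
        r"\bmaturity(?:\s+date)?\s*(?:is|of|:)?\s*"
        r"(\d{1,2}\s+[A-Za-z]+\s+\d{4}|\d{4}-\d{2}-\d{2})",
        re.IGNORECASE,
    ),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when nothing matches."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


if __name__ == "__main__":
    # Invented sample text, for illustration only.
    sample = (
        "The Notes (ISIN code XS1234567890) bear a coupon of 3.5 per cent "
        "and the maturity date is 2031-06-30."
    )
    print(extract_fields(sample))
```

Because every document is mapped to the same set of keys, the resulting dictionaries can be stored directly as rows of the metadata repository, with `None` marking fields that still need a better pattern or a different technique.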