I want to build a retrieval system for research papers in the computer science domain. The CSO domain ontology will be used for weighting.
While processing a document, I want to extract expressions that match ontology concepts (e.g. "information retrieval system"), index those expressions, and weight them using the ontology. It is essential to weight each expression as a whole, not each word separately.
The index should also include expressions that only partially match an ontology concept (e.g. "retrieval system"), because they are also important and will be weighted using the ontology as well.
Terms that don't (fully or partially) match ontology concepts should also be indexed and weighted in a classical way (e.g. TF-IDF).
Queries will be processed in the same way to extract expressions.
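To make the extraction step concrete, here is a rough sketch of what I have in mind. It is only an illustration: the `CONCEPTS` list is a hypothetical stand-in for labels loaded from the ontology, the function names are made up, tokenization is plain whitespace splitting, and "partial match" is assumed to mean any contiguous sub-sequence of at least two words of a concept label. The scan is greedy and keeps the longest match at each position.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for concept labels loaded from the ontology.
CONCEPTS = ["information retrieval system", "inverted index"]

Tokens = Tuple[str, ...]


def build_lookup(concepts: List[str], min_partial_len: int = 2) -> Tuple[Set[Tokens], Set[Tokens]]:
    """Return (full, partial) sets of token tuples.

    Assumption: a partial match is any contiguous sub-sequence of a
    concept label with at least `min_partial_len` words.
    """
    full: Set[Tokens] = set()
    partial: Set[Tokens] = set()
    for label in concepts:
        toks = tuple(label.lower().split())
        full.add(toks)
        n = len(toks)
        for size in range(min_partial_len, n):
            for start in range(n - size + 1):
                partial.add(toks[start:start + size])
    # A sub-sequence that is itself a full concept counts as full.
    return full, partial - full


def extract(text: str, full: Set[Tokens], partial: Set[Tokens]) -> List[Tuple[str, str]]:
    """Greedy longest-match scan; returns (expression, kind) pairs,
    where kind is "full", "partial", or "term" (no ontology match)."""
    tokens = text.lower().split()  # simplistic tokenizer, for illustration only
    max_len = max((len(t) for t in full), default=1)
    units: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, shrinking down to a single word.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            cand = tuple(tokens[i:i + size])
            if cand in full:
                kind = "full"
            elif cand in partial:
                kind = "partial"
            elif size == 1:
                kind = "term"
            else:
                continue
            units.append((" ".join(cand), kind))
            i += size
            break
    return units


full, partial = build_lookup(CONCEPTS)
units = extract("An information retrieval system uses a retrieval system", full, partial)
```

With this input, `units` contains "information retrieval system" tagged as a full match, "retrieval system" tagged as a partial match, and the remaining words tagged as plain terms.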
How can I build such an index? Should I treat each multi-word expression as a single term and add an entry for it in the inverted index?
And how should queries be matched against documents?
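For reference, this is the kind of structure I am picturing, as a minimal sketch rather than a settled design. Each expression, single-word or multi-word, gets one entry in the inverted index, and a query unit is scored with TF-IDF multiplied by a boost. The `ONTO_WEIGHT` table is a hypothetical placeholder: in the real system the boost would be derived from the ontology, not from fixed constants. The `(expression, kind)` pairs are assumed to come from an earlier extraction step.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Unit = Tuple[str, str]  # (expression, kind) with kind in {"full", "partial", "term"}

# Hypothetical boosts; placeholder for weights derived from the ontology.
ONTO_WEIGHT = {"full": 2.0, "partial": 1.5, "term": 1.0}


def build_index(docs: Dict[str, List[Unit]]) -> Dict[str, Dict[str, int]]:
    """Map each expression (treated as one indexing unit) to {doc_id: tf}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, units in docs.items():
        counts = Counter(expr for expr, _ in units)
        for expr, tf in counts.items():
            index[expr][doc_id] = tf
    return index


def rank(query_units: List[Unit], index: Dict[str, Dict[str, int]],
         n_docs: int) -> List[Tuple[str, float]]:
    """Score documents as the sum over query units of boost * tf * idf."""
    scores: Dict[str, float] = defaultdict(float)
    for expr, kind in query_units:
        postings = index.get(expr, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        boost = ONTO_WEIGHT.get(kind, 1.0)
        for doc_id, tf in postings.items():
            scores[doc_id] += boost * tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "d1": [("information retrieval system", "full"), ("uses", "term")],
    "d2": [("retrieval system", "partial"), ("uses", "term")],
}
index = build_index(docs)
ranked = rank([("retrieval system", "partial")], index, n_docs=len(docs))
```

Here only `d2` is returned for the query "retrieval system". If only the longest match is indexed, `d1` is missed even though "information retrieval system" contains the query expression, so I am unsure whether sub-expressions should get their own postings too.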