I want to build a retrieval system for research papers in the computer science domain. The CSO domain ontology will be used for weighting.

While processing a document, I want to extract expressions that match ontology concepts (e.g. "information retrieval system"), index those expressions, and weight them using the ontology. It is essential to weight the expression as a whole, not each word separately.

The index should also include expressions that partially match ontology concepts (e.g. "retrieval system"), because they are also important and will be weighted using the ontology.

Terms that don't (fully or partially) match ontology concepts should also be indexed and weighted in a classical way (e.g. TF-IDF).
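
To make the extraction step concrete, here is a rough sketch of what I have in mind. It is only an illustration under several assumptions: the tiny `ONTOLOGY` set stands in for the real CSO concept labels, a "partial match" is taken to mean a contiguous sub-phrase of at least two words, tokenization is a plain whitespace split, and the names `extract`, `PARTIAL` and `ngrams` are my own.

```python
# Hypothetical toy ontology; a real system would load concept labels from CSO.
ONTOLOGY = {"information retrieval system", "inverted index"}


def ngrams(tokens, n):
    """Return all contiguous n-word sequences of tokens, joined as strings."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


# Assumption: every contiguous sub-phrase (2+ words) of a concept is a partial match.
PARTIAL = set()
for concept in ONTOLOGY:
    words = concept.split()
    for n in range(2, len(words)):
        PARTIAL |= ngrams(words, n)
PARTIAL -= ONTOLOGY

MAX_LEN = max(len(c.split()) for c in ONTOLOGY)


def extract(text):
    """Greedy longest-match segmentation into (expression, kind) pairs.

    kind is "full" (ontology concept), "partial" (sub-phrase of a concept)
    or "term" (ordinary word, to be weighted classically, e.g. with TF-IDF).
    """
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink down to a single word.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            cand = " ".join(tokens[i:i + n])
            if cand in ONTOLOGY:
                kind = "full"
            elif cand in PARTIAL:
                kind = "partial"
            elif n == 1:
                kind = "term"
            else:
                continue
            out.append((cand, kind))
            i += n
            break
    return out
```

With this sketch, `extract("fast retrieval system")` yields the single word "fast" as an ordinary term and "retrieval system" as one partial-match expression, so the expression is kept whole rather than split into words.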

Queries will be processed in the same way to extract expressions.

How can I do such indexing? Should I treat each multi-word expression as a single word and add an entry for it in the inverted index?
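
The "one entry per expression" option could look roughly like the sketch below. It assumes documents have already been segmented into (expression, kind) pairs, where kind is "full", "partial" or "term". The `KIND_BOOST` values are hypothetical placeholders for whatever weight the ontology would actually supply, and the smoothed IDF formula is just one common variant, not a requirement.

```python
import math
from collections import Counter, defaultdict

# Hypothetical multipliers standing in for ontology-derived weights.
KIND_BOOST = {"full": 2.0, "partial": 1.5, "term": 1.0}


def build_index(docs):
    """Build an inverted index with one entry per expression.

    docs:    {doc_id: [(expression, kind), ...]}
    returns: {expression: {doc_id: weight}}
    """
    # Term frequency of each expression per document.
    tf = {doc_id: Counter(expr for expr, _ in items)
          for doc_id, items in docs.items()}
    # Kind of each expression (assumed consistent across documents).
    kinds = {expr: kind for items in docs.values() for expr, kind in items}
    # Document frequency: number of documents containing each expression.
    df = Counter(expr for counts in tf.values() for expr in counts)
    n_docs = len(docs)

    index = defaultdict(dict)
    for doc_id, counts in tf.items():
        for expr, freq in counts.items():
            idf = math.log((1 + n_docs) / (1 + df[expr])) + 1.0  # smoothed IDF
            index[expr][doc_id] = freq * idf * KIND_BOOST[kinds[expr]]
    return index
```

Here a multi-word expression such as "information retrieval system" is a single key in the index, exactly like a single word, so it gets one posting list and one weight per document.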

And how should I match queries against documents?
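
One simple option I am considering is to look up each query expression in the index and sum the stored weights per document. The sketch below assumes an index of the form {expression: {doc_id: weight}} and a query that has already been segmented into expressions. The function name `search` and the additive scoring are my own assumptions, not an established method for ontology-based ranking.

```python
from collections import defaultdict


def search(index, query_expressions):
    """Rank documents by the summed weights of the query expressions they contain.

    index:             {expression: {doc_id: weight}}
    query_expressions: list of expressions extracted from the query
    returns:           [(doc_id, score), ...] sorted by descending score
    """
    scores = defaultdict(float)
    for expr in query_expressions:
        # Expressions missing from the index simply contribute nothing.
        for doc_id, weight in index.get(expr, {}).items():
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because expressions are matched as whole index keys, a query containing "retrieval system" only hits documents where that exact expression was indexed. Whether it should also match documents indexed under "information retrieval system" is part of what I am asking.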
