02 September 2020

I developed an information retrieval system for my master's thesis, and I need to evaluate its term weighting function. The function takes the paragraphs of each document into account when computing weights. To evaluate it, I need a publicly available dataset with relevance judgments (query-document pairs) in which each document consists of multiple paragraphs rather than a single paragraph, as in the "Reuters" dataset, for example. The dataset should also provide a ground truth for the document retrieval task, so that I can compare my system's results against it.
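To make the intended comparison concrete, here is a minimal sketch of scoring a ranked result list against relevance judgments. Everything in it is an assumption for illustration only: the `qrels` and `run` dictionaries, the document and query ids, and the function names are hypothetical, and mean average precision is just one common metric, not necessarily the one that will be used.

```python
from typing import Dict, List, Set


def average_precision(ranked_doc_ids: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the relevant document ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(run.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    )
    return total / len(qrels)


# Hypothetical toy data: ground-truth judgments and one system's rankings.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
run = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d2"]}

print(f"MAP = {mean_average_precision(run, qrels):.4f}")
```

A dataset that fits this setup would supply the judgments in `qrels`, while `run` would come from the system under evaluation.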

Can you suggest such a dataset?

The image shows a sample document whose structure meets my needs.
