I want to set up a system for automatically estimating the performance of a search engine.
Say we have a dataset of more than 30,000 documents; for some queries it's hard to get relevance judgments.
Which measure is better for this task: MAP, nDCG, or bpref?
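To make the difference concrete, here is a minimal Python sketch of how the three measures treat unjudged documents. It assumes judgments for one query are stored in a dict mapping doc id to a graded label (0 = judged non-relevant), with unjudged documents simply absent. The bpref variant follows the min(R, N) normalization used by trec_eval, and nDCG uses linear gains with a log2 discount and no rank cutoff. These are illustrative choices, not a reference implementation.

```python
import math

# Assumed format: qrels maps doc_id -> graded relevance (0 = judged non-relevant).
# Documents missing from qrels are unjudged.


def average_precision(ranking, qrels):
    """AP with unjudged documents treated as non-relevant (the usual convention)."""
    num_rel = sum(1 for g in qrels.values() if g > 0)
    if num_rel == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            total += hits / k  # precision at rank k
    return total / num_rel


def ndcg(ranking, qrels):
    """nDCG over the full ranking; unjudged documents get gain 0."""
    def dcg(gains):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

    ideal = dcg(sorted((g for g in qrels.values() if g > 0), reverse=True))
    if ideal == 0:
        return 0.0
    return dcg([qrels.get(doc, 0) for doc in ranking]) / ideal


def bpref(ranking, qrels):
    """bpref (trec_eval-style): unjudged documents are skipped entirely."""
    num_rel = sum(1 for g in qrels.values() if g > 0)
    num_nonrel = sum(1 for g in qrels.values() if g == 0)
    if num_rel == 0:
        return 0.0
    score, nonrel_above = 0.0, 0
    for doc in ranking:
        if doc not in qrels:
            continue  # unjudged: neither rewarded nor penalized
        if qrels[doc] > 0:
            if nonrel_above == 0:
                score += 1.0
            else:
                # nonrel_above > 0 implies num_nonrel >= 1, so no division by zero
                score += 1.0 - min(nonrel_above, num_rel) / min(num_rel, num_nonrel)
        else:
            nonrel_above += 1
    return score / num_rel


if __name__ == "__main__":
    qrels = {"d1": 1, "d2": 0, "d3": 1, "d4": 0}
    # u1 and u2 are unjudged documents ranked at the top
    ranking = ["u1", "u2", "d1", "d3", "d2"]
    print("AP   ", average_precision(ranking, qrels))  # ~0.417
    print("nDCG ", ndcg(ranking, qrels))
    print("bpref", bpref(ranking, qrels))  # 1.0
```

In this toy case the two unjudged documents at the top pull AP and nDCG down, while bpref ignores them. That robustness to missing judgments is the usual argument for bpref when the judgment pool is incomplete.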
Some say bpref is better, but paper [1] says it has only average performance. Can somebody suggest which measure is better and can be set up with less effort?
[1] https://pdfs.semanticscholar.org/c02b/3ec648307a759604c4374b162e9c02a58667.pdf