I need to evaluate a pure content-based recommender system for document retrieval (it can also be viewed as a search engine) that returns the top-N results ranked by a similarity score. I know there are metrics such as HR@k, accuracy@k, NDCG@k, and CTR. However, if I understand correctly, all of these metrics require prior relevance judgments from expert coders (e.g., rating documents on a scale from 1 to 5) or click data from users.
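To make the dependency on human judgments concrete, here is a minimal sketch of NDCG@k in Python. The `labels_for_ranked_results` list is a hypothetical set of graded relevance labels for one query's top results; without such labels (or clicks as a proxy), there is nothing to plug into the metric.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items in ranked order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    if rel.size == 0:
        return 0.0
    discounts = np.log2(np.arange(2, rel.size + 2))  # positions 1..k -> log2(2..k+1)
    return float(np.sum(rel / discounts))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the system's ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical expert-assigned relevance grades (0-3) for the top 5
# documents returned for a single query -- exactly the judgments the
# metric cannot be computed without.
labels_for_ranked_results = [3, 2, 3, 0, 1]
print(ndcg_at_k(labels_for_ranked_results, k=5))
```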

This content-based recommender system has no users (yet) to rate or click on query results, and I do not see how expert coders could realistically provide relevance ratings for every document against every possible query.

Are there any ways to evaluate this type of content-based recommender system?
