I would like to explore methods for re-ranking result sets retrieved using a term-based query against a database of bibliographic records. I believe that this additional layer of processing could improve a user's information-seeking experience by helping them find articles relevant to their needs more easily.
An alternative implementation is to exclude records from the result set that, although they contain the search term, fail to meet other criteria.
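To make the distinction concrete, here is a minimal sketch of the two approaches; the record fields (citation count, presence of an abstract) are hypothetical stand-ins for whatever secondary signals a bibliographic database might actually expose, not part of any particular system.

```python
# Minimal sketch: re-ranking vs. filtering a retrieved result set.
# The fields below ("cited_by", "has_abstract") are hypothetical examples.

records = [
    {"id": "rec_01", "cited_by": 12,  "has_abstract": True},
    {"id": "rec_02", "cited_by": 340, "has_abstract": True},
    {"id": "rec_03", "cited_by": 57,  "has_abstract": False},
]

# Re-ranking: keep every record, but order by a secondary signal
# (here, citation count) instead of the original term-match order.
reranked = sorted(records, key=lambda r: r["cited_by"], reverse=True)

# Filtering: drop records that match the search term but fail some
# other criterion (here, records without an abstract).
filtered = [r for r in records if r["has_abstract"]]

print([r["id"] for r in reranked])  # -> ['rec_02', 'rec_03', 'rec_01']
print([r["id"] for r in filtered])  # -> ['rec_01', 'rec_02']
```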
In either case, I am looking for existing literature that could help me identify a suitable method of analysis for comparing one set of ranked results to another. I have found studies in which a subject matter expert codes each record returned in a result set as relevant or not, in order to compute precision and recall. This may be one strategy, but I am not sure it alone can describe the differences between two result sets, or the differences in how they are ranked (at least for some arbitrary number of results returned; it could become infeasible for a human to evaluate thousands of results, for example). I am also considering the value of a mixed-methods approach, in which I would integrate more qualitative assessments of user satisfaction with what users feel to be the quality of the retrieved results.
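For illustration, here is a minimal sketch of how such expert relevance judgments could be turned into cutoff-based precision and recall for two orderings of the same result set; the record IDs, judgments, and cutoff are invented purely for the example.

```python
# Sketch: precision and recall at a cutoff k, given a set of record IDs
# an expert has judged relevant. All IDs below are hypothetical.

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results judged relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for r in top_k if r in relevant_ids) / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant records that appear in the top-k results."""
    top_k = ranked_ids[:k]
    return sum(1 for r in top_k if r in relevant_ids) / len(relevant_ids)

# Hypothetical expert judgments: which record IDs are relevant to the query.
relevant = {"rec_02", "rec_05", "rec_09"}

# Two rankings of the same retrieved records: the original term-based order
# and the order produced by the additional re-ranking layer.
baseline = ["rec_01", "rec_02", "rec_03", "rec_04", "rec_05", "rec_09"]
reranked = ["rec_02", "rec_05", "rec_09", "rec_01", "rec_03", "rec_04"]

for name, ranking in [("baseline", baseline), ("re-ranked", reranked)]:
    print(name,
          "P@5 =", precision_at_k(ranking, relevant, 5),
          "R@5 =", recall_at_k(ranking, relevant, 5))
```

A comparison like this only captures relevance within the cutoff, which is part of why I suspect it will not fully express differences in how two result sets are ordered.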
I would appreciate any suggestions for literature or methods to consider for this type of research. Thank you!