I'm developing a general-purpose recommender system (RS), and I need to evaluate its results with metrics such as precision, recall, and unexpectedness. Is there a testbed or a gold-standard technique that would let me compare the results of my RS?
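To make concrete what I mean by precision and recall, here is a minimal sketch. It assumes the RS returns a ranked top-k list per user and that a held-out set of relevant items is available for that user. The function name `precision_recall_at_k` and the sample data are hypothetical, made up for illustration. Unexpectedness is left out because I don't have a single agreed definition for it.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for a single user.

    recommended: ranked list of item ids produced by the RS (assumed format)
    relevant:    set of held-out item ids the user actually liked (assumed format)
    """
    top_k = recommended[:k]
    # A "hit" is a recommended item that also appears in the relevant set.
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data for one user.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}

p, r = precision_recall_at_k(recommended, relevant, k=5)
print(f"precision@5 = {p:.2f}, recall@5 = {r:.2f}")
```

In practice these per-user values would be averaged over all test users; what I'm missing is a standard dataset and protocol to run this on.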