When evaluating a recommender system with binary rating data, which other evaluation metrics can one use, aside from the F1 measure (precision and recall), to better assess accuracy?
The Matthews Correlation Coefficient (MCC) is a useful metric because it takes all parts of the confusion matrix into account at the same time (unlike, e.g., F1, precision, recall, or AUC). MCC is bounded between -1 and +1, with random performance corresponding to a value of 0.
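As a minimal sketch (assuming scikit-learn is available and using made-up binary relevance labels), here is how MCC compares against precision, recall, and F1 on the same predictions:

```python
from sklearn.metrics import matthews_corrcoef, precision_score, recall_score, f1_score

# Hypothetical data: 1 = relevant item, 0 = not relevant
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0]

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
# bounded in [-1, 1], with 0 corresponding to random performance.
print("MCC:      ", matthews_corrcoef(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
```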
I would strongly recommend Cohen's Kappa, as it accounts for the possibility of the agreement occurring by chance; MCC may be even better. I also find it important to replicate the measures and cross-validation methods used by prior work in your field, so that your results can easily be placed in context.
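A minimal sketch of the chance-correction idea (again assuming scikit-learn and the same hypothetical labels as above):

```python
from sklearn.metrics import cohen_kappa_score, matthews_corrcoef

# Hypothetical data: 1 = relevant item, 0 = not relevant
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0]

# kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
# p_e is the agreement expected by chance from the marginal frequencies.
print("Cohen's kappa:", cohen_kappa_score(y_true, y_pred))
print("MCC:          ", matthews_corrcoef(y_true, y_pred))
```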