In case of balanced classes, what is the best metric to evaluate a supervised binary classifier that predicts if a tweet will be relevant or not to a user: MCC (Matthews correlation coefficient) or F1-Score (F-Measure)?
It depends on the problem domain. There exist several off-the-shelf metrics, e.g. accuracy, precision, recall, and so on. Each of these metrics captures a different aspect of performance, so the best metric cannot be generalized; it depends on what the point of interest is in your problem domain. That said, the most commonly used metric for evaluating a supervised binary classifier with balanced classes is accuracy.
If your classes are balanced, why not use the most intuitive measure, accuracy? If you are not sure that your classifier chose the best decision boundary, you should go for AUC, which is equivalent to the probability that your classifier assigns a higher score to a relevant tweet than to an irrelevant one. Be careful with other metrics, because they often do not have a clear meaning, and as soon as you get into the regime of unbalanced classes, most metrics become misleading; the F-Measure in particular should not be used at all (proof in the attached paper).
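The ranking interpretation of AUC mentioned above can be verified directly. Here is a minimal sketch, using made-up scores and labels, that computes AUC with scikit-learn and compares it to the fraction of (relevant, irrelevant) pairs that the classifier ranks correctly:

```python
# Sketch: AUC equals the probability that a randomly chosen relevant
# tweet receives a higher score than a randomly chosen irrelevant one.
# The labels and scores below are illustrative, not real data.
from itertools import product

from sklearn.metrics import roc_auc_score

y_true = [1, 1, 1, 0, 0, 0]               # 1 = relevant, 0 = irrelevant
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # hypothetical classifier scores

auc = roc_auc_score(y_true, scores)

# Pairwise check: count (relevant, irrelevant) pairs ranked correctly,
# with ties counted as half a correct pair.
pos = [s for s, y in zip(scores, y_true) if y == 1]
neg = [s for s, y in zip(scores, y_true) if y == 0]
pairs = list(product(pos, neg))
correct = sum(p > n for p, n in pairs) + 0.5 * sum(p == n for p, n in pairs)
pairwise = correct / len(pairs)

print(auc, pairwise)  # both print the same value
```

The two quantities coincide by definition; this is why AUC is insensitive to the choice of a single decision threshold.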
In the case of a balanced dataset, precision, recall, and F1 are good measures. If your dataset is unbalanced, you are better off using the ROC curve or macro/micro-averaged precision.
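To make the comparison in this thread concrete, here is a minimal sketch that computes accuracy, precision, recall, F1, and MCC on a small, balanced set of made-up predictions (3 relevant tweets, 3 irrelevant):

```python
# Sketch: the metrics discussed in this thread, computed with
# scikit-learn on a balanced toy example (illustrative data only).
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)

y_true = [1, 1, 1, 0, 0, 0]  # balanced: 3 relevant, 3 irrelevant tweets
y_pred = [1, 1, 0, 0, 0, 1]  # hypothetical classifier output

acc = accuracy_score(y_true, y_pred)       # (TP + TN) / total
prec = precision_score(y_true, y_pred)     # TP / (TP + FP)
rec = recall_score(y_true, y_pred)         # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)              # harmonic mean of prec and rec
mcc = matthews_corrcoef(y_true, y_pred)    # correlation over all 4 cells

print(f"accuracy={acc:.3f} precision={prec:.3f} "
      f"recall={rec:.3f} F1={f1:.3f} MCC={mcc:.3f}")
```

Note that MCC uses all four confusion-matrix cells (including true negatives), while F1 ignores true negatives; with balanced classes the two usually tell a similar story, but they are not interchangeable.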
This paper provides a detailed explanation, with numerical examples, of many classification assessment methods (classification measures) such as accuracy, sensitivity, specificity, the ROC curve, the Precision-Recall curve, the AUC score, and many other metrics. The paper covers the ROC curve, PR curve, and Detection Error Trade-off (DET) curve in detail, and also explains several measures that are suitable for imbalanced data.