I need to know how aspect-based sentiment analysis methods are evaluated. Precision, recall, and F1 seem to be the only measures used. However, I have also seen the Rand index reported, and I don't know why it is used.
You may need to go a bit wider or deeper, but these are the traditional measures. To get an idea of specificity and sensitivity, and of the pros and cons of each of the basic measures, you can read Sokolova's papers.
As mentioned by Mr. Rafal, recall and precision are the basic measures (in information retrieval and in other fields); most other measures are combinations of recall and precision.
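To make the "combination" point concrete, here is a minimal sketch of how precision, recall, and F1 (their harmonic mean) relate for one target class. The function name `precision_recall_f1` and the toy labels are made up for illustration; they do not come from any particular paper or library.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for a single target class (illustrative helper)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (assumed): gold and predicted polarity for six aspect mentions.
gold = ["pos", "pos", "neg", "neg", "pos", "neg"]
pred = ["pos", "neg", "neg", "pos", "pos", "neg"]
p, r, f = precision_recall_f1(gold, pred, positive="pos")
print(p, r, f)
```

With several polarity classes you would typically compute this per class and then average the results (macro- or micro-averaging).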
There are other forms of recall and precision, such as R-precision, which is the precision over the top R returned items, where R is the number of relevant items. This measure takes the rank of the returned elements into account, which does not apply in sentiment analysis, where the output is not a ranked list.
Following on from the answers above: classification criteria include precision, recall, accuracy, F1, etc., while clustering criteria include the Rand index, the Jaccard coefficient, the Fowlkes–Mallows index, and more.
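This may also explain why the Rand index shows up: when aspects are discovered by clustering, the cluster IDs are arbitrary, so they cannot be matched to gold labels directly, but you can still ask whether each pair of items is grouped the same way in both partitions. A minimal sketch, assuming toy data and a hypothetical helper named `rand_index`:

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (illustrative helper)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    # A pair counts as an agreement if both partitions put the two items
    # together, or both put them apart.
    agree = sum(
        1 for i, j in pairs
        if (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
    )
    return agree / len(pairs)


# Toy data (assumed): gold aspect of five opinion phrases vs. cluster IDs.
gold_aspects = ["food", "food", "service", "service", "price"]
clusters = [0, 0, 1, 2, 2]
ri = rand_index(gold_aspects, clusters)
print(ri)  # 0.8
```

Here 8 of the 10 pairs are treated consistently; the two disagreements are the split "service" pair and the "service"/"price" items that were merged into cluster 2.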
I don't think that standard ML measures such as recall, precision, and F1 can adequately reflect classification results in sentiment analysis. For this reason, I studied other measures in section 5.3, "Interpreting Classification Results in Emotional Corpora", of my PhD thesis, Opinion Mining and Lexical Affect Sensing: for example, measures that consider the similarity between sentiment outcomes, the costs of misclassification, or the number of sentiment outcomes.
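As a rough illustration of the cost-of-misclassification idea (this is not the measure from the thesis, only a generic sketch under assumed names and data), one can charge each error by the distance between the gold and predicted classes on an ordinal scale, so that confusing positive with negative costs more than confusing positive with neutral:

```python
# Assumed ordinal scale for illustration; the cost of an error is the
# distance between the gold and the predicted class on this scale.
ORDER = {"negative": 0, "neutral": 1, "positive": 2}


def mean_misclassification_cost(gold, predicted, order=ORDER):
    """Average ordinal distance between gold and predicted labels (illustrative)."""
    costs = [abs(order[g] - order[p]) for g, p in zip(gold, predicted)]
    return sum(costs) / len(costs)


def accuracy(gold, predicted):
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


gold = ["positive", "neutral", "negative", "positive"]
near = ["neutral", "neutral", "negative", "positive"]   # one near miss
far = ["negative", "neutral", "negative", "positive"]   # one polarity flip

print(accuracy(gold, near), accuracy(gold, far))  # 0.75 0.75
print(mean_misclassification_cost(gold, near))    # 0.25
print(mean_misclassification_cost(gold, far))     # 0.5
```

Both predictions have the same accuracy, but the cost-based score separates the near miss from the polarity flip.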