Take the example of predicting a disease. Say that only 10% of the instances in your dataset actually have the disease. This means you could get 90% accuracy simply by predicting the negative class all the time. But how useful is that? Not very, since you wouldn't have predicted a single instance of the actual disease. This is where the F1-score is very helpful: in this example, the recall for the positive class would be 0, and hence the F1-score would also be 0.
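For illustration, here is a minimal sketch of that situation, assuming scikit-learn and a made-up 100-sample dataset (the 90/10 split and the always-negative "classifier" are just placeholders for the argument above):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical dataset: 10% positive (disease), 90% negative
y_true = np.array([1] * 10 + [0] * 90)

# A "classifier" that always predicts the negative class
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))             # 0.9 -> looks great
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0 -> recall for the positive class is 0
```

Accuracy rewards the trivial predictor, while the F1-score exposes that it never finds a positive case.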
When building and optimizing any supervised learning model, measuring how accurately it classifies data is crucial, especially when the developer must choose between two or more algorithms. It is an easy question to ask but a difficult one to answer which algorithm should be chosen when one performs better on one class and the other on the other class. In most cases, a high classification accuracy gives a misleading indication of the model's classification ability, especially on the imbalanced datasets common in real life. Overcoming this problem is particularly important in applications where misclassifying instances from the minority class is more costly. Hence, the ability to evaluate classification models independently of the size of the dataset and the distribution of data across its classes is pivotal to selecting the most appropriate model. Note that the choice of evaluation metric depends on many factors, such as the size of the dataset, the distribution of data across classes, and which class matters most to the end user and thus to the developer.

The Matthews Correlation Coefficient (MCC) is regarded as the most informative single score for judging the quality of a binary classifier's predictions in a confusion-matrix context. It is a correlation coefficient between the observed and predicted binary classifications and returns a value between −1 and +1: +1 represents a perfect prediction, 0 a random prediction, and −1 total disagreement between prediction and observation. Despite its advantages over other metrics, MCC has the limitation of requiring arbitrary assumptions to overcome the divide-by-zero problem, and it cannot assess the class-based performance of classification algorithms.

We introduced a performance evaluation framework based on a new evaluation metric we name the "Multidimensional Classification Assessment Score (MCAS)". MCAS evaluates the performance of learning algorithms by measuring how good a classification algorithm is in the presence of errors. It overcomes the limitations of existing metrics because it works independently of the size of the dataset and the distribution of samples across its classes, and it scores how efficiently a classification algorithm handles binary classification problems in the presence of errors. For further details, see https://dspace.library.uvic.ca/bitstream/handle/1828/13219/Elhaddad_Mohamed_PhD_2021.pdf?sequence=3&isAllowed=y
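As a small, hedged sketch of the MCC behaviour described above (not of MCAS, which is defined in the linked dissertation), scikit-learn's `matthews_corrcoef` can be run on the same hypothetical 100-sample dataset; the two predictors here are made up for illustration:

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

# Same hypothetical imbalanced labels: 10 positives, 90 negatives
y_true = np.array([1] * 10 + [0] * 90)

# The all-negative predictor from the disease example
y_all_negative = np.zeros_like(y_true)

# A predictor with 8 true positives, 2 false negatives, 5 false positives, 85 true negatives
y_better = y_true.copy()
y_better[:2] = 0       # 2 false negatives
y_better[10:15] = 1    # 5 false positives

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
# For the all-negative predictor the denominator is 0; scikit-learn returns 0.0 in that case,
# which is the kind of divide-by-zero convention mentioned above.
print(matthews_corrcoef(y_true, y_all_negative))  # 0.0
print(matthews_corrcoef(y_true, y_better))        # ~0.66
```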
Article: One-class ensemble classifier for data imbalance problems
Here are some articles you can read for a good overview of the performance metrics. Relying on any single metric is not a healthy practice when dealing with imbalanced datasets. You should also keep in mind the degree of imbalance in your dataset, i.e., the imbalance ratio.
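As a quick aside, one common way to quantify that degree of imbalance is the ratio of the majority-class count to the minority-class count; a tiny sketch with made-up labels:

```python
from collections import Counter

# Hypothetical label column: 10 positives, 90 negatives
y = [1] * 10 + [0] * 90

counts = Counter(y)
# Imbalance ratio: size of the majority class over the size of the minority class
print(max(counts.values()) / min(counts.values()))  # 9.0 for this 90/10 split
```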