What are the best evaluation metrics for a classifier trained on unbalanced data?

Andrey Davydenko

I recommend the Brier score, estimated with leave-one-out cross-validation. But more details about the dataset are needed to better understand your concerns.

Omar Elzeki

Andrey Davydenko, thanks a lot.

I am talking about the UNSW intrusion dataset, which has millions of normal records and hundreds of thousands of anomalous records.

Ferdib Al Islam

You can check this article - https://machinelearningmastery.com/tour-of-evaluation-metrics-for-imbalanced-classification/

If there are millions of records, the leave-one-out approach may not be feasible, but it depends on the settings you have.

I prefer the Brier score because it lets you evaluate the quality of the uncertainty estimates. It should suit both types of records you mentioned. The link provided by Ferdib Al Islam is great; I think it describes the Brier score very well.
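To make the Brier score concrete, here is a minimal sketch in plain Python. The toy labels and probabilities are made up for illustration (1 = anomalous, 0 = normal), and `brier_score` is just a helper name used here. Lower is better. Note that on heavily imbalanced data a model that always predicts "normal" also gets a fairly low score, so it helps to compare against such a baseline.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = anomalous record, 0 = normal record.
y_true = [0, 0, 0, 0, 1]
informative = [0.1, 0.1, 0.1, 0.1, 0.9]    # probabilities close to the outcomes
always_normal = [0.0, 0.0, 0.0, 0.0, 0.0]  # baseline that ignores the minority class

print("informative model:", brier_score(y_true, informative))
print("always-normal baseline:", brier_score(y_true, always_normal))
```

If scikit-learn is available, `sklearn.metrics.brier_score_loss` computes the same quantity for binary labels.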

Let me also recommend this work:

Chapter: "Forecast Evaluation Techniques for I4.0 Systems"

It describes relevant data formats to store your predictions and also presents error metrics for numeric predictions.

Alaa Alhowaide

I suggest you use both the PR curve and ROC-AUC.
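As a rough illustration of the two summaries, the sketch below computes ROC-AUC (the chance that a random positive is scored above a random negative) and average precision (a common single-number summary of the PR curve) in plain Python. The function names and toy data are assumptions for this example; the pairwise AUC loop is quadratic and only meant for small inputs, and the average-precision code assumes there are no tied scores.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Hypothetical toy data: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 0.8, 0.95]

print("ROC-AUC:", roc_auc(y_true, scores))
print("Average precision:", average_precision(y_true, scores))
```

On this toy data the average precision comes out lower than the ROC-AUC, which is typical when positives are rare: a single highly ranked negative costs more precision than it costs AUC.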

Mohammed Elmogy

Many performance metrics can be used to evaluate classifiers on unbalanced datasets, such as sensitivity, specificity, false-positive rate, false-negative rate, geometric mean (G-mean), positive likelihood ratio, Diagnostic Odds Ratio (DOR), Discriminant Power (DP), and Youden's index (YI).
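All of these can be derived from the four confusion-matrix counts. The sketch below uses the definitions as they are commonly given (for example, DP as (sqrt(3)/pi) times the log of the diagnostic odds ratio); the function name and the example counts are hypothetical, and the code assumes every count is non-zero so that no division by zero occurs.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Derive common imbalance-aware metrics from confusion-matrix counts.

    Assumes all four counts are non-zero (no zero-division handling).
    """
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    fpr = fp / (fp + tn)                  # false-positive rate
    fnr = fn / (fn + tp)                  # false-negative rate
    dor = (tp * tn) / (fp * fn)           # diagnostic odds ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": fpr,
        "fnr": fnr,
        "g_mean": math.sqrt(sensitivity * specificity),
        "lr_plus": sensitivity / fpr,     # positive likelihood ratio
        "dor": dor,
        "dp": (math.sqrt(3) / math.pi) * math.log(dor),  # discriminant power
        "youden": sensitivity + specificity - 1,         # Youden's index
    }


# Hypothetical counts: 100 positives, 1000 negatives.
for name, value in confusion_metrics(tp=80, fp=100, fn=20, tn=900).items():
    print(f"{name}: {value:.4f}")
```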

Shaker El-Sappagh

It's better to address the class imbalance before evaluating, so that the results are more reasonable. Try a SMOTE-based oversampling technique.

Mohamed A. Kassem

Try sensitivity, specificity, false-positive rate, and false-negative rate.

For reliable results, though, the best solution is to fix the imbalance between the classes.

Ahmed M. Anter

Please check the following:

The F1_score gives a more balanced view than the accuracy (ACC), precision, and recall metrics, but it can still give a biased result because it does not include true negatives (TN). Recently, several scientists have highlighted drawbacks of the F1_score measure [1-3], and in fact Hand and Christen [4] argue that alternative measures should be used instead of F1_score.

On the other hand, MCC considers all four entries of the confusion matrix (TP, TN, FP, FN). It is much less sensitive to class imbalance than accuracy or F1_score, and it is preferred over F1_score because it gives a more balanced assessment of a classifier no matter which class is labeled positive. MCC produces a more informative and truthful score than accuracy and F1_score.

[1] Chicco, D., Tötsch, N., & Jurman, G. (2021). The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining, 14(1), 1-22.

[2] Naulaerts, S., Dang, C. C., & Ballester, P. J. (2017). Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours. Oncotarget, 8(57), 97025.

[3] Chicco, D., & Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1), 1-13.

[4] Hand, D., & Christen, P. (2018). A note on using the F-measure for evaluating record linkage algorithms. Statistics and Computing, 28, 539-547.
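A small numeric sketch of the F1 versus MCC point, using made-up confusion-matrix counts: when the positive class is the large one, F1 looks excellent even though the classifier gets almost none of the few negatives right, and F1 changes drastically if the class labels are swapped, while MCC stays the same. The helper names are chosen for this example only.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; ignores true negatives."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: 95 actual positives, only 5 actual negatives.
tp, fn, fp, tn = 90, 5, 4, 1
print("F1 :", round(f1_score(tp, fp, fn), 4))
print("MCC:", round(mcc(tp, fp, fn, tn), 4))

# Same predictions with the two classes swapped (positive <-> negative).
print("F1 swapped :", round(f1_score(tn, fn, fp), 4))
print("MCC swapped:", round(mcc(tn, fn, fp, tp), 4))
```

If scikit-learn is available, `sklearn.metrics.matthews_corrcoef` and `sklearn.metrics.f1_score` compute these directly from label arrays.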

Surendrabikram Thapa

Accuracy does not hold up well on unbalanced data. Sensitivity-specificity and precision-recall metrics, however, can give a good overview of performance. Similarly, true-positive and false-positive rates, along with ROC-AUC or G-mean, can also provide a good overview.

Al Amin Biswas

Try to handle the class imbalance problem first. You may use the synthetic minority over-sampling technique (SMOTE) on the training sample. You can find useful information about evaluating model performance on imbalanced data at the following link: https://machinelearningmastery.com/tour-of-evaluation-metrics-for-imbalanced-classification/
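For readers unfamiliar with SMOTE, its core idea is to create synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea in plain Python, not a full implementation; `smote_sketch`, its parameters, and the toy points are hypothetical. It should be applied to the training split only, so that the test data keeps its real class ratio.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours.

    Simplified illustration: brute-force neighbour search, numeric features only,
    and at least two minority points are required.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical minority-class training points with two features.
minority_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote_sketch(minority_train, n_new=3, k=2, seed=42):
    print(point)
```

In practice, a maintained implementation such as `imblearn.over_sampling.SMOTE` from the imbalanced-learn package is the safer choice.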

Md Shoaib Ahmed

Omar Elzeki, it depends on the level of imbalance in your dataset. For a modestly imbalanced dataset (say 4:7), the Fowlkes-Mallows index may perform better. For a drastically imbalanced dataset (say 1:900), the Jaccard index may be better. Sometimes, though, the Jaccard index performs worse than the Fowlkes-Mallows index even on a drastically imbalanced dataset.

I think that you should apply both evaluation metrics and see the performance according to your dataset.
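Both indices can be computed from the confusion-matrix counts for the positive class, which makes it easy to report them side by side. The sketch below assumes the usual binary-classification forms: Fowlkes-Mallows as the geometric mean of precision and recall, and Jaccard as TP / (TP + FP + FN). The counts are made up for illustration.

```python
import math


def fowlkes_mallows(tp, fp, fn):
    """Geometric mean of precision and recall for the positive class."""
    return tp / math.sqrt((tp + fp) * (tp + fn))


def jaccard_index(tp, fp, fn):
    """Overlap between predicted and actual positives: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)


# Hypothetical counts for the minority (positive) class.
tp, fp, fn = 8, 4, 2
print("Fowlkes-Mallows:", round(fowlkes_mallows(tp, fp, fn), 4))
print("Jaccard:", round(jaccard_index(tp, fp, fn), 4))
```

Neither index uses true negatives, so both focus on how well the positive class is recovered.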

Nicola Procopio

Omar Elzeki, in addition to everything already recommended, there is also balanced accuracy:

https://statisticaloddsandends.wordpress.com/2020/01/23/what-is-balanced-accuracy/
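A quick sketch of why balanced accuracy helps, using made-up counts: a classifier that labels every record as negative reaches 99% plain accuracy on a 1:99 dataset, but its balanced accuracy (the mean of sensitivity and specificity) is only 0.5. The function names are chosen for this example.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all records classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def balanced_accuracy(tp, fp, fn, tn):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Hypothetical "always predict negative" classifier: 10 positives, 990 negatives.
tp, fn, fp, tn = 0, 10, 0, 990
print("accuracy:", accuracy(tp, fp, fn, tn))
print("balanced accuracy:", balanced_accuracy(tp, fp, fn, tn))
```

With scikit-learn, `sklearn.metrics.balanced_accuracy_score` gives the same value from label arrays.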

Emerson Nithiyaraj

For unbalanced data, try Matthews correlation coefficient (MCC).

Check https://en.wikipedia.org/wiki/Matthews_correlation_coefficient.

Ashwani Kumar

AUC

Francis Jeomoan Kurian

A better metric for unbalanced data would be the Kolmogorov-Smirnov (K-S) statistic, which tells you the separation power of the model.
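For reference, the K-S statistic here is the largest gap between the cumulative score distributions of the two classes: 0 means the score distributions are identical, 1 means they are perfectly separated. Below is a minimal plain-Python sketch with hypothetical toy scores; `ks_statistic` is a name picked for this example.

```python
from bisect import bisect_right


def ks_statistic(y_true, scores):
    """Maximum distance between the empirical score CDFs of positives and negatives."""
    pos = sorted(s for y, s in zip(y_true, scores) if y == 1)
    neg = sorted(s for y, s in zip(y_true, scores) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = bisect_right(pos, threshold) / len(pos)  # share of positives <= threshold
        cdf_neg = bisect_right(neg, threshold) / len(neg)  # share of negatives <= threshold
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical toy data: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 0.8, 0.95]
print("K-S statistic:", ks_statistic(y_true, scores))
```

If SciPy is available, `scipy.stats.ks_2samp` applied to the positive-class and negative-class scores returns the same statistic along with a p-value.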