Which metric (precision, recall, F1, or accuracy) is best for evaluating a machine learning / deep learning model on imbalanced data? And how should the results be explained and presented in a research paper using accuracy, precision, recall, and F1?
In my opinion, all the elements you mention for evaluating the performance of DL models are very good for judging the robustness of your model; together they are known as the classification report. We can also plot the confusion matrix to visualize the confusion between the classes.
I hope that is clear for you. @Ibrahim mohamed Gad
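A minimal sketch of the classification report and confusion matrix mentioned above, assuming scikit-learn is available; the toy labels are made up for illustration:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Illustrative imbalanced toy labels (class 1 is the minority).
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 0, 1, 1, 0, 0, 1, 0, 0]

# Per-class precision, recall, F1, and support in one table.
print(classification_report(y_true, y_pred, digits=3))

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
```

The classification report gives the per-class breakdown a paper would typically tabulate, while the confusion matrix shows exactly which classes get confused.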
See the article “Classification Assessment Methods: a detailed tutorial”.
There is no such thing as an absolute measure of performance; at the end of the day, it all depends on your application. For instance, if you know the costs of TP, FP, TN, and FN, then the expected cost should be your preferred metric.
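The expected-cost idea can be sketched in a few lines; the per-outcome cost values below are hypothetical assumptions, chosen to make false negatives expensive:

```python
def expected_cost(tp, fp, tn, fn,
                  c_tp=0.0, c_fp=1.0, c_tn=0.0, c_fn=5.0):
    """Average cost per prediction, given outcome counts and per-outcome costs.

    The default costs are illustrative: correct predictions cost nothing,
    a false positive costs 1, a false negative costs 5.
    """
    total = tp + fp + tn + fn
    return (tp * c_tp + fp * c_fp + tn * c_tn + fn * c_fn) / total

# (30 * 1 + 10 * 5) / 1000 predictions:
print(expected_cost(tp=20, fp=30, tn=940, fn=10))  # → 0.08
```

With application-specific costs plugged in, two models with identical accuracy can have very different expected costs, which is the point of the comment above.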
Consider performing a pre-processing step to "balance" the dataset so that bias in the network's performance is eliminated, or at least reduced. This can be done, for example, by removing some examples from the over-represented (majority) class. I would argue it is better to train the network on a small, balanced dataset than on a large, unbalanced one.
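A minimal random-undersampling sketch of this pre-processing step, using only the standard library; the class sizes below are illustrative assumptions:

```python
import random

def undersample(samples, labels, seed=0):
    """Keep every minority-class example and an equal-size random
    subset of each larger class, returning shuffled (x, y) pairs."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    rng = random.Random(seed)
    out = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):  # random subset of size n_min
            out.append((x, y))
    rng.shuffle(out)
    return out

# 90 majority-class vs 10 minority-class examples -> 10 of each kept.
balanced = undersample(list(range(100)), [0] * 90 + [1] * 10)
print(len(balanced))  # → 20
```

The trade-off named above is visible here: the balanced set is much smaller, so undersampling discards information from the majority class in exchange for removing the bias.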
You can't rely on the accuracy measure when you have imbalanced data, because it can be deceiving.
But you should still report a high accuracy to show that your model is working well overall.
Then come the most important measures: precision, recall, and F1 score.
Your model is making a lot of false positive predictions; that's why your precision is not high (that's not a good thing).
Your model is not making a lot of false negative predictions; that's why your recall is higher (that's the good thing).
The F1 score is the harmonic mean of precision and recall (keep in mind it is not an ordinary average: the more general F-beta score can weight precision or recall more heavily depending on the beta value).
Your F1 score looks decent because it also reflects your recall (which is already good).
For me, though, your precision is not good, and that can't be overlooked.
My recently published paper, “Applying Separately Cost-sensitive Learning and Fisher's Discriminant Analysis to Address the Class Imbalance Problem: A Case Study Involving a Virtual Gas Pipeline SCADA System” (https://www.sciencedirect.com/science/article/pii/S1874548220300214), may be helpful.