Hello,

I am working with a database of facial expressions that is imbalanced. For example, there are four times as many examples of the expression "happiness" as of the expression "disgust".

I am using the libsvm library to train the model. When I train an SVM on the imbalanced dataset, I get an accuracy of 45%. But when I artificially balance the data by duplicating (copy-pasting) examples of the under-represented expressions, I get an accuracy of 80%.
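
To make clear what I mean by balancing through copy-pasting, here is a minimal sketch in plain Python. It is only an illustration, not my actual libsvm pipeline: the function name `oversample_by_duplication`, the fixed seed, and the toy labels and feature values are all made up for this example. It duplicates randomly chosen examples of each smaller class until every class is as large as the largest one.

```python
import random
from collections import Counter


def oversample_by_duplication(samples, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until
    every class has as many examples as the largest class.

    Hypothetical helper for illustration; not part of libsvm.
    """
    rng = random.Random(seed)
    target = max(Counter(labels).values())

    # Group the indices of the examples by class label.
    indices_by_class = {}
    for i, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(i)

    # Keep every original example, then append duplicates where needed.
    chosen = list(range(len(labels)))
    for label, indices in indices_by_class.items():
        missing = target - len(indices)
        chosen.extend(rng.choice(indices) for _ in range(missing))

    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# Toy data (assumed): four times as many "happiness" as "disgust" examples.
labels = ["happiness"] * 8 + ["disgust"] * 2
samples = [[float(i)] for i in range(len(labels))]

bal_samples, bal_labels = oversample_by_duplication(samples, labels)
print(Counter(bal_labels))
```

The balanced lists contain all original examples plus exact copies of minority-class examples, so no new information is added; only the class proportions change.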

Now my questions are:

1. Is this way of balancing the data acceptable in the scientific community?

2. Should I report both accuracies or just the best one?

3. How can this phenomenon be explained?

Thank you in advance.  
