Hello,
I am working with a database of facial expressions that is imbalanced. For example, there are four times more examples of the expression "happiness" than of the expression "disgust".
I am using the libsvm library to train the model. When I train an SVM on the imbalanced dataset, I get an accuracy of 45%. But when I artificially balance the data by duplicating samples of the under-represented expressions, I get an accuracy of 80%.
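For reference, balancing by duplication amounts to something like the following minimal Python sketch. The function name oversample_by_duplication, the seed parameter, and the toy data are illustrative assumptions, not the code actually used with libsvm.

```python
import random
from collections import Counter


def oversample_by_duplication(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many examples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw the missing examples (with replacement) from the class itself.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Toy data with four times more "happiness" than "disgust" (made-up features).
samples = [[float(i)] for i in range(10)]
labels = ["happiness"] * 8 + ["disgust"] * 2

bal_x, bal_y = oversample_by_duplication(samples, labels)
print(Counter(bal_y))
```

Note that if the duplication is applied before the data is split into training and test sets, copies of the same sample can end up in both sets, so the order of these two steps matters when interpreting the accuracy.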
Now my questions are:
1. Is this way of balancing the data acceptable in the scientific community?
2. Should I report both accuracies or just the best one?
3. How can this phenomenon be explained?
Thank you in advance.