Hello everyone,

I have an imbalanced collection of 10k tweets that will be assigned to 4 classes. Class A has 4k tweets, class B has 1.5k, class C has 2k, and class D has 2.5k.

For my research, I need the classes to be balanced. What do you think is the best method for calculating how many tweets each class should contain?

I am thinking about combining proportionate stratified random sampling with quota sampling, but I am still not sure how to proceed.
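To make the question concrete, here is a minimal sketch of one possible option: equal allocation, where every class is randomly undersampled to the size of the smallest class (1.5k here). This is only an assumption for illustration and not necessarily the right method; the function names and the placeholder tweet IDs are hypothetical.

```python
import random

# Class sizes taken from the description above.
class_sizes = {"A": 4000, "B": 1500, "C": 2000, "D": 2500}


def equal_allocation(sizes):
    """Give every class the same quota: the size of the smallest class."""
    quota = min(sizes.values())
    return {label: quota for label in sizes}


def sample_balanced(items_by_class, seed=0):
    """Randomly undersample each class (without replacement) to the smallest class size."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    quota = min(len(items) for items in items_by_class.values())
    return {
        label: rng.sample(items, quota)
        for label, items in items_by_class.items()
    }


quotas = equal_allocation(class_sizes)

# Placeholder tweet IDs standing in for the real tweets (hypothetical data).
tweets_by_class = {
    label: [f"{label}-{i}" for i in range(n)]
    for label, n in class_sizes.items()
}
balanced = sample_balanced(tweets_by_class)
```

With these numbers each class would keep 1.5k tweets, so the balanced set would contain 6k of the original 10k. Whether discarding the remaining 4k is acceptable is part of what I am unsure about.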

Thank you,

Diky
