Hello everyone,
I have an imbalanced collection of 10k tweets that will be assigned to 4 classes. Class A has 4k tweets, class B has 1.5k, class C has 2k, and class D has 2.5k.
For my research, I need every class to contain the same amount of data. What do you think is the best method for calculating how many tweets each class should contain?
I am thinking about combining proportionate stratified random sampling with quota sampling, but I still have no clear idea how to do it.
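To make the question concrete, here is a rough Python sketch of the two allocations I am comparing. It is only an illustration under my own assumptions: the function names, the overall sample size of 1000, and the placeholder tweet IDs are made up. Proportionate allocation keeps each class's share of the collection, so it preserves the imbalance, while an equal quota capped at the smallest class gives balanced classes by undersampling.

```python
import random

# Class sizes from my collection; the tweet IDs below are placeholders.
class_sizes = {"A": 4000, "B": 1500, "C": 2000, "D": 2500}

def proportionate_allocation(sizes, n):
    """Proportionate stratified allocation: each class keeps its share of the total.
    Rounding means the result may not sum exactly to n in general."""
    total = sum(sizes.values())
    return {c: round(n * s / total) for c, s in sizes.items()}

def equal_quota(sizes):
    """Equal quota per class, capped by the smallest class (undersampling)."""
    quota = min(sizes.values())
    return {c: quota for c in sizes}

def sample_per_class(items_by_class, quotas, seed=0):
    """Simple random sample without replacement inside each class."""
    rng = random.Random(seed)
    return {c: rng.sample(items, quotas[c]) for c, items in items_by_class.items()}

# Placeholder tweets, one list per class.
tweets = {c: [f"{c}{i}" for i in range(n)] for c, n in class_sizes.items()}

prop = proportionate_allocation(class_sizes, 1000)  # assumed sample size of 1000
quotas = equal_quota(class_sizes)
balanced = sample_per_class(tweets, quotas)

print("proportionate:", prop)
print("equal quota:  ", quotas)
```

With these numbers the equal quota would be 1.5k tweets per class (6k in total), which discards 4k tweets, so I am not sure whether this is better than oversampling the smaller classes.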
Thank you,
Diky