I will briefly outline the pitfalls and advantages of each of the common ways to handle class imbalance. I am looking for recommendations on how to improve each approach, or for any recent developments that manage this issue.

1. Resampling

   I believe this is the most common method in the literature, but there are many reports on its disadvantages, especially with SMOTE. Random undersampling results in the loss of valuable data.
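   For reference, a minimal resampling sketch using the imbalanced-learn package; the synthetic dataset and the default sampling ratios are placeholders, not a recommendation:

```python
# Resampling sketch with imbalanced-learn (placeholder synthetic data).
from collections import Counter

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)
print("original:", Counter(y))

# SMOTE: synthesize minority samples by interpolating between nearest neighbours.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)
print("after SMOTE:", Counter(y_sm))

# Random undersampling: drop majority samples (risks discarding informative points).
X_us, y_us = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("after undersampling:", Counter(y_us))
```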

2. Class weighting

   In my own experience, I have seen good results with this method.
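   For class weighting, a minimal scikit-learn sketch; logistic regression is just an arbitrary choice of estimator here, and the dataset is a synthetic placeholder:

```python
# Class-weighting sketch: reweight the loss instead of resampling the data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" scales each class by n_samples / (n_classes * n_class_samples).
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```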

3. Boosting

   Certain algorithms (XGBoost, EasyEnsemble...) perform well.
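   For boosting, a minimal sketch with XGBoost's `scale_pos_weight`; the negative-to-positive ratio is a common heuristic rather than a tuned value, and the data is again a synthetic placeholder. imbalanced-learn's `EasyEnsembleClassifier` offers a similar drop-in interface.

```python
# Boosting sketch: scale_pos_weight rescales the gradient contribution of the
# positive (minority) class; a common heuristic is (negative count / positive count).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # assumes the xgboost package is installed

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

ratio = float(np.sum(y_tr == 0)) / np.sum(y_tr == 1)
model = XGBClassifier(scale_pos_weight=ratio, n_estimators=200)
model.fit(X_tr, y_tr)
print("test ROC AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```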

I would appreciate anything more that can be added to this discussion.
