Hello,
I am running logistic regression, random forest, and support vector machine models to predict whether a loan will default. My data is highly imbalanced, which I have addressed with resampling (Random Over Sampler). I am getting unusually high ROC-AUC scores (very close to 1) that do not match the results of similar studies. I have already dealt with the highly correlated variables. I am at a bit of a loss as to what to try next, and am looking for some guidance.
Update: I removed some variables, and the Logistic Regression and LinearSVC models improved significantly. However, the Random Forest is still returning scores equal to 1 for all metrics (accuracy, precision, recall, AUC, and F1). Any suggestions for addressing the Random Forest results?
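One thing that may be worth ruling out is whether the oversampling happens before the train/test split: if it does, duplicated minority rows can end up in both sets, and a Random Forest can simply memorize them. Below is a minimal sketch of the check, not my actual pipeline. It assumes synthetic data from `make_classification` in place of the loan data, uses `sklearn.utils.resample` by hand as a stand-in for Random Over Sampler, and all parameter values are arbitrary placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, imbalanced stand-in for the loan data (roughly 5% positives).
# This is an assumption for illustration, not the real dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# Split first, so the test set holds only original, unduplicated rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Randomly oversample the minority class in the training split only
# (hand-rolled stand-in for a random oversampler).
minority = y_train == 1
X_min_up, y_min_up = resample(
    X_train[minority],
    y_train[minority],
    replace=True,
    n_samples=int((~minority).sum()),
    random_state=0,
)
X_bal = np.vstack([X_train[~minority], X_min_up])
y_bal = np.concatenate([y_train[~minority], y_min_up])

# Fit on the balanced training data, score on the untouched test data.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_bal, y_bal)

test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test ROC-AUC on untouched data: {test_auc:.3f}")
```

If the scores stay at exactly 1 even with the resampling confined to the training split, the cause is probably elsewhere, for example a feature that encodes the outcome.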
Thanks!