I am working with an extremely imbalanced dataset of 44 samples in total for my research project. It is a binary classification problem with 3/44 samples in the minority class, and I am using leave-one-out cross-validation (LOOCV). If I apply SMOTE oversampling to the entire dataset before the LOOCV loop, prediction accuracy and ROC AUC are close to 90% and 0.9 respectively. However, if I oversample only the training set inside the LOOCV loop, which seems like the more logical approach, the ROC AUC falls as low as 0.3.
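
To make the second setup concrete, here is a minimal sketch of what I mean by oversampling inside the loop. The data, classifier, and `k_neighbors=1` setting are placeholders for my actual pipeline (with only 2–3 minority samples left in each training fold, SMOTE's default `k_neighbors=5` would fail), and I'm assuming imbalanced-learn's `SMOTE.fit_resample`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut
from imblearn.over_sampling import SMOTE

# Toy data mimicking my setting: 44 samples, 3 of them in the minority class
X, y = make_classification(n_samples=44, n_features=10, n_informative=5,
                           n_clusters_per_class=1, weights=[41 / 44],
                           flip_y=0, random_state=0)

probs = []  # predicted minority-class probability for each held-out sample
for train_idx, test_idx in LeaveOneOut().split(X):
    X_train, y_train = X[train_idx], y[train_idx]
    # Resample only the training fold; k_neighbors must stay below the number
    # of minority samples left in the fold (at most 3 here)
    X_res, y_res = SMOTE(k_neighbors=1, random_state=0).fit_resample(X_train, y_train)
    clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
    probs.append(clf.predict_proba(X[test_idx])[0, 1])

print("LOOCV ROC AUC:", roc_auc_score(y, probs))
```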

I also tried precision-recall curves and stratified k-fold cross-validation (sketched below), but saw a similar discrepancy between oversampling outside and inside the loop. Where is the right place to oversample, and what explains the discrepancy?
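
For completeness, this is roughly how I kept SMOTE inside the folds for the stratified k-fold runs, continuing the sketch above (same `X`, `y`, `SMOTE`, and `LogisticRegression`). Using imbalanced-learn's `Pipeline` and `average_precision` scoring are my choices here, not necessarily the only way; the pipeline resamples each training fold during fitting and leaves the validation fold untouched:

```python
from imblearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score

# SMOTE is fit and applied on each training fold only; the validation fold
# is never resampled, so the scores reflect the original class balance
pipe = Pipeline([("smote", SMOTE(k_neighbors=1, random_state=0)),
                 ("clf", LogisticRegression(max_iter=1000))])
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
print(cross_val_score(pipe, X, y, cv=cv, scoring="average_precision"))
```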
