I am working with an extremely unbalanced dataset of only 44 samples for my research project. It is a binary classification problem in which just 3 of the 44 samples belong to the minority class, and I am evaluating it with Leave-One-Out Cross-Validation (LOOCV). If I apply SMOTE oversampling to the entire dataset before the LOOCV loop, prediction accuracy is close to 90% and the ROC AUC is close to 0.9. However, if I oversample only the training set inside the LOOCV loop, which seems to be the more logical approach, the ROC AUC falls as low as 0.3.
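To make the two setups concrete, here is a minimal pure-Python sketch. It assumes 2-D features and uses a toy distance-based scorer in place of a real classifier. `smote` is a simplified from-scratch version of the SMOTE interpolation idea, not the imbalanced-learn implementation, and `score`, `auc` and `loocv` are hypothetical helpers written only for this illustration. The data are made up: both classes are drawn from the same distribution, so an honest estimate should sit near AUC 0.5.

```python
import math
import random


def smote(minority, n_new, k, rng):
    """Simplified SMOTE-style oversampling (from scratch, not imbalanced-learn):
    each synthetic point lies on the segment between a random minority point
    and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = sorted((p for j, p in enumerate(minority) if j != i),
                        key=lambda p: math.dist(x, p))
        nb = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def score(train_X, train_y, x):
    """Hypothetical toy scorer: higher means closer to the minority class."""
    d_min = min(math.dist(x, p) for p, t in zip(train_X, train_y) if t == 1)
    d_maj = min(math.dist(x, p) for p, t in zip(train_X, train_y) if t == 0)
    return d_maj - d_min


def auc(scores, labels):
    """ROC AUC as the chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loocv(X, y, oversample_first, k=5, seed=0):
    """Leave-one-out CV returning (scores, labels) for every evaluated sample."""
    rng = random.Random(seed)
    if oversample_first:
        # LEAKY: synthetic points are built from every minority sample,
        # including each one that is later held out as a test point.
        minority = [x for x, t in zip(X, y) if t == 1]
        extra = smote(minority, y.count(0) - y.count(1), k, rng)
        X, y = X + extra, y + [1] * len(extra)
    scores = []
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if not oversample_first:
            # CORRECT: synthesize only from the minority points in this training fold.
            minority = [x for x, t in zip(train_X, train_y) if t == 1]
            extra = smote(minority, train_y.count(0) - train_y.count(1), k, rng)
            train_X, train_y = train_X + extra, train_y + [1] * len(extra)
        scores.append(score(train_X, train_y, X[i]))
    return scores, y


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 41 majority + 3 minority samples from the SAME distribution,
    # so there is no real signal and an honest AUC should be near 0.5.
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(44)]
    y = [0] * 41 + [1] * 3
    for flag in (True, False):
        s, labels = loocv(X, y, oversample_first=flag)
        print(f"oversample_first={flag}: AUC = {auc(s, labels):.2f}")
```

With `oversample_first=True`, every test point has near-duplicates in its training fold: synthetic points derived from a held-out minority sample stay in training, and the synthetic points are themselves tested against their own neighbours. On data like this the leaky run should therefore report an AUC well above chance even though the classes are indistinguishable by construction, while the inside-the-loop run stays near chance.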
I also tried precision-recall curves and stratified k-fold cross-validation, but saw the same gap between oversampling outside and inside the loop. Where is the right place to oversample, and, if possible, could you explain why the results differ so much?