I need to calculate the confidence interval (CI) of the ROC AUC for a series of classifiers (e.g. Lasso, Random Forest, SVM) learned on the same dataset, in order to identify the best model for this problem (prediction of a dichotomous variable).
Given the small size of the dataset, I used the less well-known but nearly unbiased leave-pair-out cross-validation (LPOCV; Airola et al., 2010). In brief, you train and validate the model holding out every possible pair made of one case from one class and one case from the other class. The (many) folds of the cross-validation procedure therefore overlap, with each case reused across several validation folds.
The AUC is calculated according to the Wilcoxon statistic (i.e., as the average over all folds, scoring 1 if p(C1) > p(C2) and 0 otherwise), as indicated in the papers where LPOCV was proposed. However, I couldn't find any formal procedure in the literature for calculating the CI.
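To make the procedure concrete, here is a minimal sketch of the LPOCV AUC estimate as I understand it, using scikit-learn with logistic regression as a stand-in classifier (the helper name `lpocv_auc` and the toy data are just for illustration; note I score ties as 0.5, following the usual Wilcoxon convention):

```python
import numpy as np
from itertools import product
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

def lpocv_auc(X, y, make_model):
    """Leave-pair-out CV: hold out every (positive, negative) pair,
    train on the remaining cases, and score the held-out pair."""
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    scores = []
    for i, j in product(pos, neg):
        train = np.setdiff1d(np.arange(len(y)), [i, j])
        model = make_model().fit(X[train], y[train])
        p = model.predict_proba(X[[i, j]])[:, 1]  # P(class 1) for the pair
        # Wilcoxon-style indicator: 1 if the positive case is ranked
        # above the negative one, 0.5 for a tie, 0 otherwise.
        scores.append(1.0 if p[0] > p[1] else 0.5 if p[0] == p[1] else 0.0)
    return np.mean(scores)

# Toy data just to show the call signature.
X, y = make_classification(n_samples=40, random_state=0)
print(lpocv_auc(X, y, lambda: LogisticRegression(max_iter=1000)))
```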
I therefore decided to use a bootstrap procedure, but I'm not sure that the resampling design I applied is correct given the structure of LPOCV.
I independently resampled (with replacement) the cases belonging to each of the two classes. Then I formed the folds from all combinations of the two resamples, calculating the AUC as before. I repeated this many times to obtain a distribution of bootstrapped AUCs.
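As a sketch, the class-wise resampling scheme I used looks roughly like this (reusing the hypothetical `lpocv_auc` helper from above; the number of replicates and the percentile CI are illustrative choices, and this refits the model for every pair in every replicate, so it is slow on anything but tiny data):

```python
def bootstrap_lpocv_ci(X, y, make_model, n_boot=200, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    aucs = []
    for _ in range(n_boot):
        # Resample each class independently, with replacement.
        bp = rng.choice(pos, size=len(pos), replace=True)
        bn = rng.choice(neg, size=len(neg), replace=True)
        idx = np.concatenate([bp, bn])
        # LPOCV AUC over all pairs formed from the two resamples.
        aucs.append(lpocv_auc(X[idx], y[idx], make_model))
    # Percentile confidence interval from the bootstrap distribution.
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

print(bootstrap_lpocv_ci(X, y, lambda: LogisticRegression(max_iter=1000),
                         n_boot=100))
```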
Is this correct? Is there a better resampling scheme to apply in this case?
Thank you!
Reference: Airola et al. (2010), http://www.jmlr.org/proceedings/papers/v8/airola10a/airola10a.pdf