Hello everyone, and thank you in advance for your help!

I'm building a screening tool with a machine learning algorithm. The model provides a probabilistic prediction (e.g. logistic regression, decision tree), and a threshold is applied to assign each case to one of the two classes.

I'm using a leave-pair-out cross-validation (LPOCV) strategy to choose the best set of model hyperparameters (I have fewer than 50 training cases, so it is computationally feasible, and it is known to be a better strategy than LOOCV).

As it will serve as a screening tool, I'm actually not interested in identifying the model with the best AUC, but rather in finding the model that, at a sensitivity of 1, shows the best specificity.

 

Considering the cross-validation strategy I'm using, I planned:

  • for each hyperparameter combination, to identify the best threshold (i.e. the one that yields the highest achievable specificity while keeping sensitivity at 1), pooling ALL hold-out predictions together to calculate sensitivity and specificity (each case will appear several times given the LPOCV);
  • to choose the hyperparameter combination, together with its best threshold, that provides the highest specificity.

Is this a correct strategy? Is a better one available?
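To make the planned procedure concrete, here is a minimal sketch of it in Python. It assumes scikit-learn, a logistic regression with an example `C` grid standing in for the hyperparameter search, and a synthetic dataset in place of the real training cases; all function names are hypothetical.

```python
# Sketch of the procedure described above (hypothetical names; assumes
# scikit-learn and a small synthetic binary-classification dataset).
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeavePOut

X, y = make_classification(n_samples=30, n_features=5, random_state=0)

def pooled_lpocv_scores(estimator, X, y):
    """Pool hold-out probability scores over all leave-pair-out splits."""
    scores, labels = [], []
    for train_idx, test_idx in LeavePOut(2).split(X):
        est = clone(estimator).fit(X[train_idx], y[train_idx])
        scores.extend(est.predict_proba(X[test_idx])[:, 1])
        labels.extend(y[test_idx])
    return np.array(scores), np.array(labels)

def best_specificity_at_full_sensitivity(scores, labels):
    """Set the threshold at the lowest score among true positives, so every
    positive is classified positive (sensitivity = 1); report specificity."""
    thr = scores[labels == 1].min()
    pred = scores >= thr
    spec = np.mean(~pred[labels == 0])   # true-negative rate
    return thr, spec

best = None
for C in [0.01, 0.1, 1.0, 10.0]:         # example hyperparameter grid
    s, l = pooled_lpocv_scores(LogisticRegression(C=C, max_iter=1000), X, y)
    thr, spec = best_specificity_at_full_sensitivity(s, l)
    if best is None or spec > best[2]:
        best = (C, thr, spec)

print(best)  # (chosen C, its threshold, pooled specificity at sensitivity 1)
```

Note that because every case appears in many LPOCV test folds, the pooled sensitivity/specificity weight each case by how often it is held out, which is one of the points the question asks about.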

Thank you!
