31 May 2022

I have a logistic regression (LR) model built on texture features, and my study population is very small because the disease is rare.

We cannot validate the model on an external test set, and reviewers do not accept results from 10-fold cross-validation alone.

So I need to set aside a test set from the study group and cross-validate the model on the rest. I recently came across nested cross-validation (nested CV), which is meant to protect the model from a biased training set. The method has two levels of resampling: outer folds to estimate accuracy and inner bootstraps to tune the model.

I have to find a solution in R. I found three libraries that touch on the subject, "rsample", "TANDEM" and "tidyverse", but I cannot make any of them work for my purpose.

So far I have found:

Nested CV coupled with LASSO to choose the best feature(s).

Nested CV coupled with model tuning to pick the best hyperparameter from a limited set of values (0.01, 1, 10, 100, 1000, etc.), which suits algorithms such as SVM.

So my searches have not turned up anything that fits my purpose.

Can nested CV help me find the best hyperparameters (Hp) for a three-feature model such as:

x = Intercept + Hp1*Feature1 + Hp2*Feature2 + Hp3*Feature3
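For reference, if the model above is a standard logistic regression, it can be written as below, with x read as the log-odds (linear predictor) and Hp1 to Hp3 corresponding to the coefficients β1 to β3. Under that reading the three weights are estimated from the training data by maximum likelihood. A hyperparameter in the tuning sense would then be a separate quantity fixed before fitting, for example a ridge penalty weight λ. The penalty is an assumption made here for illustration and is not something stated in this post.

```latex
\operatorname{logit} P(y = 1 \mid \text{features})
  = \beta_0 + \beta_1\,\text{Feature1} + \beta_2\,\text{Feature2} + \beta_3\,\text{Feature3}

% Assumed ridge-penalised fit: \ell is the log-likelihood, \lambda \ge 0 is the hyperparameter
\hat{\beta}(\lambda) = \arg\min_{\beta}
  \Big[ -\ell(\beta) + \lambda \sum_{j=1}^{3} \beta_j^{2} \Big]
```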

In fact, I have already fitted one model using a cost function. However, I cannot work out whether nested CV can be used to tune an LR model at all.

I can create the outer and inner folds with the rsample::nested_cv function, but I cannot get any further.
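A minimal sketch of that first step, assuming 5 outer folds and 5 inner bootstrap resamples. The data frame `dat`, its size and the simulated outcome are hypothetical stand-ins, not my real data; only the column names Feature1 to Feature3 follow the model above.

```r
library(rsample)

set.seed(2022)

# Hypothetical stand-in data: 60 patients, three texture features, binary outcome
n <- 60
dat <- data.frame(
  Feature1 = rnorm(n),
  Feature2 = rnorm(n),
  Feature3 = rnorm(n)
)
dat$y <- rbinom(n, 1, plogis(0.8 * dat$Feature1 - 0.5 * dat$Feature2))

# Outer resampling: 5-fold CV. Inner resampling: 5 bootstraps of each outer training set.
folds <- nested_cv(
  dat,
  outside = vfold_cv(v = 5),
  inside  = bootstraps(times = 5)
)

# Outer fold 1: the training part and the held-out test part
outer_train <- analysis(folds$splits[[1]])
outer_test  <- assessment(folds$splits[[1]])

# Inner resamples belonging to outer fold 1 (drawn from outer_train only)
inner_1     <- folds$inner_resamples[[1]]
inner_train <- analysis(inner_1$splits[[1]])    # bootstrap sample
inner_val   <- assessment(inner_1$splits[[1]])  # out-of-bag rows
```

Each row of `folds` is one outer fold, and its `inner_resamples` entry holds the bootstraps built from that fold's training part, so the outer test rows never enter the tuning step.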

l------l------l------l------l------l------l------l------l------l------l------l------l------l------l Whole study population

l------l------l------l------l------l------l------l------l------l------l------l------l Train l------l------l Test

l------l------l------l------Training--l------l------l------l------l l------l------l Crossvalidate (Fold 1)

I can build the inner CV sets by bootstrapping, and I can repeat the cross-validation 5 times. But then how do I calculate the best hyperparameters for an LR model? As a mean over all bootstrap resamples? Or from the resample with the best results only?

I can do the calculations by hand for every single fold, for example for 5 outer and 5 inner folds.

The problem is the inner folds: how do I calculate the best hyperparameters from the inner folds?
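To make the question concrete, here is a sketch of the convention I have usually seen described: for each candidate value, average the validation loss over all inner resamples, keep the candidate with the best mean, refit on the whole outer training part, and score once on the outer test part. Everything specific here is an assumption for illustration: the simulated data, the ridge penalty `lambda` as the hyperparameter, the candidate grid, and the helper names `fit_lr` and `log_loss`.

```r
library(rsample)

set.seed(2022)

# Hypothetical stand-in data, not the real study population
n <- 60
dat <- data.frame(Feature1 = rnorm(n), Feature2 = rnorm(n), Feature3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$Feature1 - 0.5 * dat$Feature2))
features <- c("Feature1", "Feature2", "Feature3")

# Ridge-penalised logistic regression fitted with optim();
# lambda is the assumed hyperparameter, the intercept is not penalised
fit_lr <- function(d, lambda) {
  X <- cbind(1, as.matrix(d[, features]))
  loss <- function(b) {
    eta <- drop(X %*% b)
    sum(log1p(exp(eta)) - d$y * eta) + lambda * sum(b[-1]^2)
  }
  optim(rep(0, ncol(X)), loss, method = "BFGS")$par
}

# Mean log-loss of coefficients b on data d (lower is better)
log_loss <- function(b, d) {
  p <- plogis(drop(cbind(1, as.matrix(d[, features])) %*% b))
  p <- pmin(pmax(p, 1e-8), 1 - 1e-8)
  -mean(d$y * log(p) + (1 - d$y) * log(1 - p))
}

lambdas <- c(0.01, 0.1, 1, 10, 100)  # assumed candidate grid
folds <- nested_cv(dat, outside = vfold_cv(v = 5), inside = bootstraps(times = 5))

results <- lapply(seq_len(nrow(folds)), function(i) {
  inner <- folds$inner_resamples[[i]]
  # Matrix of inner losses: rows = inner resamples, columns = candidate lambdas
  inner_loss <- sapply(lambdas, function(lam) {
    sapply(inner$splits, function(s) log_loss(fit_lr(analysis(s), lam), assessment(s)))
  })
  # Choose the lambda with the lowest MEAN loss across the inner resamples
  best <- lambdas[which.min(colMeans(inner_loss))]
  # Refit on the full outer training part, then score once on the outer test part
  outer <- folds$splits[[i]]
  b <- fit_lr(analysis(outer), best)
  data.frame(fold = i, best_lambda = best, outer_log_loss = log_loss(b, assessment(outer)))
})
results <- do.call(rbind, results)
print(results)
```

In this sketch the chosen `lambda` may differ between outer folds. The outer losses estimate how well the whole tuning procedure performs, not the performance of one fixed set of coefficients.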

Sincerely,
