I understand that we use cross-validation to estimate uncertainty in the model selection phase. Can the same technique also be used to produce confidence intervals?

I am trying to estimate the planted area of a crop in a region using satellite data. I want to choose a model first and then quantify the uncertainty of the final model at the end of the season. I have settled on K-fold cross-validation on a training set [sampled with stratification from the region to approximate the real situation] for model selection, and the bootstrap on a test set [separately and randomly sampled from the region] for confidence intervals.
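For concreteness, here is a minimal sketch of the workflow I have in mind, using scikit-learn on synthetic stand-in data. The candidate models, the feature matrix, and `total_area_ha` are placeholders for illustration, not my actual setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for per-pixel satellite features and crop/non-crop labels.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
total_area_ha = 10_000.0  # placeholder: known total area of the region

# --- Step 1: model selection via K-fold CV on the training set ---
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_means = {
    name: cross_val_score(m, X_train, y_train, cv=cv, scoring="accuracy").mean()
    for name, m in candidates.items()
}
best_name = max(cv_means, key=cv_means.get)
best_model = candidates[best_name].fit(X_train, y_train)

# --- Step 2: confidence interval via the bootstrap on the held-out test set ---
# Area estimate = predicted crop fraction * total area; resampling test
# pixels with replacement gives a percentile CI for that estimate.
pred = best_model.predict(X_test)
n = len(pred)
boot = [pred[rng.integers(0, n, size=n)].mean() * total_area_ha for _ in range(2000)]
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"selected: {best_name}, 95% CI for area: [{ci_low:.0f}, {ci_high:.0f}] ha")
```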

What confuses me is why I shouldn't merge the training set with the test set and use cross-validation to estimate the uncertainty. Would the cross-validation estimate be biased in that case because the training set is not randomly sampled, or for other reasons? On the other hand, could it be a more reliable estimate, since merging the two sets gives a larger sample size?
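To make the alternative concrete, here is a sketch (continuing the snippet above, so `cv`, `X_train`, etc. are reused) of what I mean by merging the sets: pool everything, run K-fold CV, and look at the spread of per-fold area estimates. Whether that spread can be turned into a valid confidence interval is exactly what I am unsure about:

```python
# Pool the stratified training sample and the random test sample.
X_all = np.vstack([X_train, X_test])
y_all = np.concatenate([y_train, y_test])

# One area estimate per CV fold, computed on that fold's held-out part.
fold_areas = []
for tr_idx, te_idx in cv.split(X_all, y_all):
    m = RandomForestClassifier(n_estimators=200, random_state=0)
    m.fit(X_all[tr_idx], y_all[tr_idx])
    fold_areas.append(m.predict(X_all[te_idx]).mean() * total_area_ha)
print("per-fold area estimates (ha):", np.round(fold_areas, 0))
```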

Any resources or suggestions on this topic are highly appreciated. Thanks.
