Multiple Regression

Right now I'm using a training partition to do [partial] correlation analysis.

Once that is done, I split the data into folds and run backward stepwise cross-validation, using MAPE to prune factors.

My question is specific to correlation analysis.

If I do correlation analysis on the training partition (for factor reduction) and then run cross-validation on that same partition, am I not introducing a form of overfitting?

Should I use a different partition for correlation analysis than the one I cross-validate on?

I can't find any examples in my texts (Applied Regression Modelling and Data Mining for Business Analytics). All the examples I see online run cross-validation on the training partition, but they do the correlation analysis on the entire dataset (including the test partition).
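To make the concern concrete, here is a minimal sketch of the two setups I'm comparing. It is not my actual pipeline: the data are pure noise (no factor is truly related to the response), the "correlation analysis" is reduced to keeping the single factor with the largest absolute Pearson correlation, the model is a one-factor least-squares line, and every name in it (`best_feature`, `cv_mape`, `simulate`) is made up for illustration. It only uses the standard library.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def best_feature(X, y, rows):
    """Column index with the largest |correlation| with y, using only `rows`."""
    ys = [y[i] for i in rows]
    return max(
        range(len(X[0])),
        key=lambda j: abs(pearson([X[i][j] for i in rows], ys)),
    )


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def cv_mape(X, y, k, select_inside):
    """k-fold cross-validated MAPE (in percent).

    select_inside=False: the factor is chosen once, on all rows, before CV.
    select_inside=True:  the factor is re-chosen on each fold's training rows.
    """
    rows = list(range(len(y)))
    folds = [rows[f::k] for f in range(k)]
    fixed = None if select_inside else best_feature(X, y, rows)
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in rows if i not in held]
        j = best_feature(X, y, train) if select_inside else fixed
        slope, intercept = fit_line([X[i][j] for i in train], [y[i] for i in train])
        for i in held_out:
            pred = intercept + slope * X[i][j]
            errors.append(abs((y[i] - pred) / y[i]))
    return 100 * sum(errors) / len(errors)


def simulate(trials=20, n=40, p=300, k=5, seed=0):
    """Average both CV estimates over several pure-noise datasets."""
    rng = random.Random(seed)
    outside = inside = 0.0
    for _ in range(trials):
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        y = [100 + rng.gauss(0, 10) for _ in range(n)]  # unrelated to X
        outside += cv_mape(X, y, k, select_inside=False)
        inside += cv_mape(X, y, k, select_inside=True)
    return outside / trials, inside / trials


if __name__ == "__main__":
    outside, inside = simulate()
    print(f"CV MAPE, factor chosen before CV:     {outside:.2f}%")
    print(f"CV MAPE, factor chosen inside each fold: {inside:.2f}%")
```

Since the factors are noise, any gap between the two numbers comes only from where the selection step sits relative to the folds, which is exactly the leakage I'm asking about. The many-factors, few-rows setting is chosen deliberately to exaggerate the effect; with fewer candidate factors or more rows I would expect the gap to shrink.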
