Multiple Regression
Right now I'm using a training partition to do (partial) correlation analysis.
Once that's done, I split the data into folds and run backward stepwise selection under cross-validation, using MAPE to prune factors.
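A minimal sketch of that pruning step, to make the setup concrete. Everything here is an assumption for illustration only: the data is synthetic, the model is plain OLS fitted with NumPy, and the names `cv_mape` and `backward_eliminate` are hypothetical helpers, not taken from any library.

```python
import numpy as np

def cv_mape(X, y, cols, n_folds=5, seed=0):
    """Mean k-fold MAPE of an OLS fit that uses only the columns in `cols`."""
    rng = np.random.default_rng(seed)  # fixed seed: every candidate sees the same folds
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for k in range(n_folds):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Design matrices with an intercept column.
        A_trn = np.column_stack([np.ones(len(trn)), X[np.ix_(trn, cols)]])
        A_val = np.column_stack([np.ones(len(val)), X[np.ix_(val, cols)]])
        beta, *_ = np.linalg.lstsq(A_trn, y[trn], rcond=None)
        pred = A_val @ beta
        scores.append(np.mean(np.abs((y[val] - pred) / y[val])))
    return float(np.mean(scores))

def backward_eliminate(X, y, cols):
    """Drop one factor at a time while the cross-validated MAPE keeps improving."""
    cols = list(cols)
    best = cv_mape(X, y, cols)
    while len(cols) > 1:
        # Score every model that has exactly one factor removed.
        trials = [(cv_mape(X, y, [c for c in cols if c != d]), d) for d in cols]
        score, drop = min(trials)
        if score >= best:  # no single removal helps, so stop
            break
        best = score
        cols.remove(drop)
    return cols, best

# Synthetic training partition (assumption): only columns 0 and 1 drive y.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = 50 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)

kept, best_mape = backward_eliminate(X, y, range(X.shape[1]))
print(kept, round(best_mape, 4))
```

Note that MAPE divides by `y`, so the sketch assumes the target is never zero.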
My question is specific to correlation analysis.
If I do the correlation analysis on the training partition (for factor reduction) and then run cross-validation on that same partition, am I not introducing a form of overfitting?
Should the correlation analysis use a different partition from the one I cross-validate on?
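To spell out the two options being compared, here is a rough sketch under several assumptions: simple Pearson correlation with the target stands in for the partial correlation analysis, factors are kept by a hypothetical top-`k` rule rather than a threshold, the data is synthetic noise, and the helper names (`top_k_by_corr`, `ols_mape`, `cv_mape_screened`) are made up for illustration. With `screen_inside_folds=False` the factors are picked once from the whole training partition, so every validation fold has already influenced the selection; with `True` the screening is redone on each fold's training rows only.

```python
import numpy as np

def top_k_by_corr(X, y, k):
    """Indices of the k columns with the largest |Pearson r| against y."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(-np.abs(r))[:k]

def ols_mape(X_trn, y_trn, X_val, y_val):
    """Fit OLS with an intercept on the training rows, return MAPE on the validation rows."""
    A_trn = np.column_stack([np.ones(len(y_trn)), X_trn])
    A_val = np.column_stack([np.ones(len(y_val)), X_val])
    beta, *_ = np.linalg.lstsq(A_trn, y_trn, rcond=None)
    return float(np.mean(np.abs((y_val - A_val @ beta) / y_val)))

def cv_mape_screened(X, y, k, screen_inside_folds, n_folds=5, seed=0):
    """k-fold MAPE where factor screening happens outside or inside the folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    # Option A: screen once, using every row of the training partition.
    cols_all = top_k_by_corr(X, y, k)
    scores = []
    for i in range(n_folds):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        # Option B: screen again using only this fold's training rows.
        cols = top_k_by_corr(X[trn], y[trn], k) if screen_inside_folds else cols_all
        scores.append(ols_mape(X[trn][:, cols], y[trn], X[val][:, cols], y[val]))
    return float(np.mean(scores))

# Synthetic training partition (assumption): 50 factors, none related to y.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))
y = 100 + rng.normal(scale=10, size=100)

mape_outside = cv_mape_screened(X, y, k=5, screen_inside_folds=False)
mape_inside = cv_mape_screened(X, y, k=5, screen_inside_folds=True)
print(mape_outside, mape_inside)
```

Comparing the two printed values on data like this is one way to see how much, if at all, screening on the full partition flatters the cross-validated error.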
I can't find any examples in my texts (Applied Regression Modelling and Data Mining for Business Analytics). All the examples I see online do cross-validation on the training partition, but they run the correlation analysis on the entire dataset, including the test partition.