Dear Madam/Sir,
I am developing predictive models from a data set that comprises more than 50 explanatory variables and one explained variable.
The study involves descriptive, exploratory and predictive analysis of the data.
For the predictive analysis, I plan to perform regression. However, due to the nature of the data, I am trying several different modeling techniques, for instance ordinary least squares (OLS), principal component regression (PCR) and partial least squares (PLS) regression.
Hence I would like to ask the following questions:
1. How can I find the modeling technique that is most appropriate for my data set?
2. Should my choice of model be made 'a priori', based on the structure and characteristics of the data set? Or should it be 'a posteriori', i.e., based on the quality of the obtained models?
3. In case there is not one single best modeling technique, would it be appropriate to present, in the manuscript for submission, models obtained from more than one modeling technique? Or would this be characterized as "cherry picking"?
Thank you in advance for taking the time to read and answer this.
Kind regards,
Luciano