I need a way to handle all these variables in a stepdown logistic regression, so that in the end I can build a predictive model.
An additional problem is that some of these variables are highly rank-correlated (Spearman, since the variables are not normally distributed).
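To make the correlation issue concrete, here is a minimal Python sketch of the kind of check I mean: it computes Spearman correlations as Pearson correlations of the ranks and lists the pairs above a cutoff. The data are simulated, and the variable names (`voc_a`, `voc_b`, `voc_c`), the helper functions and the 0.8 cutoff are assumptions for illustration only, not my real data or settings.

```python
import numpy as np

def spearman_matrix(X):
    """Spearman correlation matrix of the columns of X (assumes no tied values)."""
    # argsort of argsort gives 0-based ranks per column when there are no ties.
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    # Spearman's rho is the Pearson correlation computed on the ranks.
    return np.corrcoef(ranks, rowvar=False)

def correlated_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, rho) for every pair with |rho| >= threshold."""
    rho = spearman_matrix(X)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(rho[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(rho[i, j])))
    return pairs

# Simulated, skewed data standing in for VOC intensities (not real measurements).
rng = np.random.default_rng(0)
base = rng.lognormal(size=200)
X = np.column_stack([
    base,
    base ** 2,                 # monotone transform of base, so same ranks
    rng.lognormal(size=200),   # generated independently of base
])
names = ["voc_a", "voc_b", "voc_c"]  # hypothetical variable names
pairs = correlated_pairs(X, names)
print(pairs)
```

With real data containing ties, a rank function that averages tied ranks (for example `scipy.stats.rankdata`) would be the safer choice.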
For these reasons, I first tried to group the variables according to their eigenvector loadings from a covariance-based Principal Components Analysis, associating each variable with one or more principal components (I kept the first 15 components).
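The grouping step can be sketched roughly as follows. This is a simplification under stated assumptions: each variable is assigned to the single component on which its absolute loading is largest (whereas I allowed more than one component per variable), the data are simulated from two latent factors, and `group_by_loading` is a hypothetical helper name.

```python
import numpy as np

def group_by_loading(X, n_components):
    """Assign each column of X to the principal component (of the covariance
    matrix) on which its absolute eigenvector loading is largest."""
    cov = np.cov(X, rowvar=False)                 # np.cov centres the columns
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]                  # shape (n_variables, n_components)
    return np.argmax(np.abs(loadings), axis=1)

# Simulated data: six variables driven by two independent latent factors.
rng = np.random.default_rng(1)
n = 300
f1 = rng.normal(scale=3.0, size=n)   # high-variance factor
f2 = rng.normal(scale=1.0, size=n)   # low-variance factor
noise = rng.normal(scale=0.1, size=(n, 6))
X = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise

groups = group_by_loading(X, n_components=2)
print(groups)  # component index assigned to each variable
```

Because the PCA is run on the covariance matrix, variables with larger variance dominate the leading components, which is why the high-variance factor is simulated separately here.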
I then carried out several stepdown logistic regressions, one for each "group" of variables. In fact, where the regressions held, I combined variables from several groups into one regression.
Finally, I kept only the variables retained by these stepdown procedures and entered them into a final model (again a stepdown logistic regression).
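For clarity, this is roughly what I mean by a stepdown procedure, as a minimal sketch rather than what my software does exactly. I am assuming backward elimination on Wald p-values with a 0.05 cutoff; the logistic fit is a plain Newton-Raphson written with NumPy, the data are simulated, and `fit_logistic`, `backward_eliminate` and the `voc_*` names are hypothetical.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Fit a logistic regression with intercept by Newton-Raphson.
    Returns coefficients and two-sided Wald p-values (intercept first)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        hessian = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, Xd.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    # Recompute the information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    cov = np.linalg.inv(Xd.T @ (Xd * (p * (1.0 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, pvals

def backward_eliminate(X, y, names, alpha=0.05):
    """Repeatedly drop the predictor with the largest p-value above alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        _, pvals = fit_logistic(X[:, keep], y)
        worst = int(np.argmax(pvals[1:]))   # index 0 is the intercept
        if pvals[1:][worst] <= alpha:
            break
        del keep[worst]
    return [names[i] for i in keep]

# Simulated data: only the first two predictors affect the outcome.
rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 4))
logit = 2.0 * X[:, 0] - 2.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
names = ["voc_1", "voc_2", "voc_3", "voc_4"]  # hypothetical variable names

selected = backward_eliminate(X, y, names)
print(selected)
```

In my procedure this kind of elimination is run once per group and then once more on the pooled survivors.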
Do you think this approach is sound? Any suggestions?
The outcome: cases are IBD patients (Crohn's disease and ulcerative colitis); controls are both surgical controls and gastrointestinal controls. The variables are VOCs (volatile organic compounds, molecules identified mainly by their molecular weight).