I need to find a way to handle all these variables in a stepdown logistic regression, so that I can eventually build a predictive model.

An additional problem is that some of these variables are highly rank-correlated (Spearman; the variables are not normally distributed).
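A minimal sketch of how such a rank-correlation screen might look, using NumPy only. The function names, the toy data, and the 0.8 cut-off are illustrative assumptions, not part of the analysis described here.

```python
import numpy as np

def average_ranks(x):
    """Rank the values of a 1-D array (1 = smallest), giving ties their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks of tied values by the average rank of each tie group.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def spearman_matrix(X):
    """Spearman correlation matrix: Pearson correlation of the column-wise ranks."""
    R = np.column_stack([average_ranks(col) for col in np.asarray(X).T])
    return np.corrcoef(R, rowvar=False)

def highly_correlated_pairs(X, threshold=0.8):
    """List (i, j, rho) for every column pair i < j with |rho| >= threshold.

    The threshold is an arbitrary illustrative choice.
    """
    rho = spearman_matrix(X)
    n_vars = rho.shape[0]
    return [(i, j, rho[i, j])
            for i in range(n_vars)
            for j in range(i + 1, n_vars)
            if abs(rho[i, j]) >= threshold]

# Toy data (made up): column 1 is a monotone transform of column 0.
base = np.arange(10, dtype=float)
X_demo = np.column_stack([base, np.exp(base), [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]])
for i, j, r in highly_correlated_pairs(X_demo):
    print(f"variables {i} and {j}: rho = {r:.2f}")
```

Because Spearman's coefficient works on ranks, any monotone transformation of a variable leaves it unchanged, which is why it suits variables that are not normally distributed.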

For these reasons, I first tried to group the variables according to their eigenvector loadings from a Principal Component Analysis of the covariance matrix, assigning each variable to one or more principal components (I kept the first 15 components).
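A minimal sketch of this kind of grouping, assuming each variable is assigned to the single retained component on which it has the largest absolute loading. That rule is a simplification (it does not allow a variable to belong to several components), and the function name and toy data are hypothetical.

```python
import numpy as np

def group_by_dominant_component(X, n_components):
    """Assign each column of X to the retained principal component
    on which its absolute eigenvector loading is largest.

    PCA is done on the covariance matrix, so the data are centred
    but not rescaled.
    """
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest ones
    loadings = eigvecs[:, order]                      # shape: (n_variables, n_components)
    return np.argmax(np.abs(loadings), axis=1)

# Toy data (made up): two pairs of variables, each pair driven by its own latent factor.
rng = np.random.default_rng(0)
factor_a = rng.normal(scale=3.0, size=500)
factor_b = rng.normal(scale=1.0, size=500)
noise = rng.normal(scale=0.1, size=(500, 4))
X_demo = np.column_stack([factor_a, factor_a, factor_b, factor_b]) + noise
print(group_by_dominant_component(X_demo, n_components=2))
```

With a covariance-based PCA, variables with larger variance dominate the leading components, so the grouping depends on the measurement scale of each variable.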

I then carried out several stepdown logistic regressions, one for each "group" of variables. In practice, when the regressions held up, I combined variables from several groups.

Finally, I kept only the variables retained by the stepdown procedures and entered them into a final model (again a stepdown logistic regression).
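A minimal sketch of one stepdown pass, assuming "stepdown" means backward elimination in which the variable with the smallest absolute Wald z-score is dropped until every remaining variable reaches a cut-off. The cut-off of 1.96, the plain Newton-Raphson fit, and the function names are illustrative assumptions; statistical packages may use likelihood-ratio tests or AIC instead.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficients and their Wald z-scores, intercept first.
    """
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])

    def prob_and_hessian(b):
        eta = np.clip(A @ b, -30, 30)          # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        return p, A.T @ (A * w[:, None])

    for _ in range(n_iter):
        p, hessian = prob_and_hessian(beta)
        beta = beta + np.linalg.solve(hessian, A.T @ (y - p))

    # Standard errors from the inverse information matrix at the final estimate.
    _, hessian = prob_and_hessian(beta)
    std_err = np.sqrt(np.diag(np.linalg.inv(hessian)))
    return beta, beta / std_err

def backward_eliminate(X, y, z_crit=1.96):
    """Drop the weakest variable until all remaining |z| >= z_crit.

    Returns the column indices of X that are kept.
    """
    kept = list(range(X.shape[1]))
    while kept:
        _, z = fit_logistic(X[:, kept], y)
        abs_z = np.abs(z[1:])                  # skip the intercept
        worst = int(np.argmin(abs_z))
        if abs_z[worst] >= z_crit:
            break
        kept.pop(worst)
    return kept

# Toy data (made up): only column 0 is related to the outcome.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(400, 3))
p_true = 1.0 / (1.0 + np.exp(-2.0 * X_demo[:, 0]))
y_demo = (rng.random(400) < p_true).astype(float)
print(backward_eliminate(X_demo, y_demo))
```

Running this once per group and then once more on the pooled survivors would mirror the two-stage procedure described above.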

Do you think this approach is sound? Any suggestions?

The outcome: cases are patients with IBD (Crohn's disease and ulcerative colitis); controls are both surgical controls and gastrointestinal controls. The variables are VOCs (volatile organic compounds, molecules identified mainly by their molecular weight).
