I'm trying to run a multivariable logistic regression model in R with backward selection, where many of my variables are dichotomous (e.g. diabetes = yes/no). The problem is that I have 34 patients and want to test approximately 25 variables, which, if I'm not mistaken, is far too many for that sample size. Alternatively, I could reduce the number of candidate variables by pre-selecting them for the multivariable model based on univariable analyses, but that also introduces bias. Or I could simply not run a multivariable model because the sample is too small, but where's the fun in that?
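To make the sample-size concern concrete, here is a minimal R sketch on simulated data (34 rows, a handful of hypothetical yes/no predictors, not the real study). It computes events per variable (EPV), i.e. the size of the smaller outcome group divided by the number of candidate predictors; a commonly quoted rule of thumb asks for roughly 10. With 34 patients the smaller group has at most 17 patients, so 25 candidates give an EPV of at most 17 / 25 = 0.68. It then runs backward elimination with base R's `step()`, which drops terms by AIC; that may differ from a p-value-based backward selection.

```r
# Sketch only: simulated data with hypothetical variable names
set.seed(1)
n <- 34
d <- data.frame(
  diabetes     = rbinom(n, 1, 0.3),
  smoker       = rbinom(n, 1, 0.4),
  hypertension = rbinom(n, 1, 0.5),
  age          = round(rnorm(n, mean = 60, sd = 10))
)
# Simulated outcome that depends on diabetes and age only
d$outcome <- rbinom(n, 1, plogis(-1 + 1.2 * d$diabetes + 0.03 * (d$age - 60)))

candidates <- c("diabetes", "smoker", "hypertension", "age")

# Events per variable: smaller outcome group / number of candidate predictors
events <- min(sum(d$outcome), sum(1 - d$outcome))
epv <- events / length(candidates)
cat("events:", events, " EPV:", round(epv, 2), "\n")

# Best case with 25 candidates and 34 patients: 17 events / 25 variables
max_epv_25 <- floor(n / 2) / 25

# Full model, then AIC-based backward elimination
full <- glm(reformulate(candidates, response = "outcome"),
            family = binomial, data = d)
reduced <- step(full, direction = "backward", trace = 0)
print(summary(reduced))
```

With so few events, the selected model can change a lot from one simulated sample to the next, which is one way to see the instability the question is worried about.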

When running the model in R (especially when computing the confidence intervals), I get the following warning:

glm.fit: fitted probabilities numerically 0 or 1 occurred

which led me to this discussion:

http://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression
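For reference, a minimal R sketch (toy data, not the real patients) of what perfect separation looks like: a single predictor `x` splits the outcome perfectly, so `glm()` emits this warning and returns a huge slope with an even larger standard error. With 34 patients and many yes/no variables, a combination of predictors can separate the outcome in the same way.

```r
# Toy data with perfect separation: every x <= 5 has y = 0, every x >= 6 has y = 1
d <- data.frame(x = 1:10, y = rep(c(0, 1), each = 5))

# Collect warnings instead of letting them print
warns <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial, data = d),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(warns)

# Slope and standard error are both enormous: the maximum-likelihood estimate
# is infinite, so glm() simply stops where its convergence criterion is met
print(summary(fit)$coefficients)
```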

I've now applied Firth's correction to the regression model (using the logistf package), and the output appears without warnings; however, it's completely different from my original model. Further, I'm not completely sure this is what my data need. I also ran bayesglm() from the arm package, and here too the estimates look completely different.
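Under separation, very different numbers are expected: the plain `glm()` estimate is not a real finite estimate, whereas Firth's penalized likelihood and `bayesglm()` (which by default puts a weakly informative Cauchy prior with scale 2.5 on the coefficients) both pull the estimates back to finite values. The sketch below is a hand-rolled illustration of Firth's method on the same kind of toy data; `firth_logit()` is a hypothetical helper written here, not logistf's code, which adds further safeguards and profile penalized-likelihood confidence intervals. The packaged calls run only if logistf and arm are installed.

```r
# Hypothetical helper: Firth-penalized logistic regression by modified Newton steps
firth_logit <- function(X, y, max_iter = 100, tol = 1e-8, max_step = 5) {
  beta <- rep(0, ncol(X))
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    p <- as.vector(plogis(X %*% beta))
    w <- p * (1 - p)
    info_inv <- solve(crossprod(X, X * w))            # (X'WX)^{-1}
    # Diagonal of the hat matrix W^{1/2} X (X'WX)^{-1} X' W^{1/2}
    h <- rowSums((X %*% info_inv) * X) * w
    # Firth-modified score: X'(y - p + h * (1/2 - p))
    score <- crossprod(X, y - p + h * (0.5 - p))
    step <- as.vector(info_inv %*% score)
    # Cap very large steps
    if (max(abs(step)) > max_step) step <- step * max_step / max(abs(step))
    beta <- beta + step
    if (max(abs(step)) < tol) {
      converged <- TRUE
      break
    }
  }
  names(beta) <- colnames(X)
  list(coefficients = beta, converged = converged, iterations = iter)
}

# Perfectly separated toy data
d <- data.frame(x = 1:10, y = rep(c(0, 1), each = 5))
X <- model.matrix(~ x, data = d)
firth <- firth_logit(X, d$y)
print(firth$coefficients)   # finite, unlike the plain glm() slope

# Packaged versions, only if installed
if (requireNamespace("logistf", quietly = TRUE)) {
  print(coef(logistf::logistf(y ~ x, data = d)))
}
if (requireNamespace("arm", quietly = TRUE)) {
  print(coef(arm::bayesglm(y ~ x, family = binomial, data = d)))
}
```

The penalty term `h * (0.5 - p)` is what keeps the slope finite here, so a large gap between these results and the original `glm()` output is a symptom of separation rather than a sign that either method is broken.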

I'm pretty new to stats, so any explanation of how to deal with these kinds of problems would probably need to be as simple as possible. In other words, I'm not too strong in math and think I might have gotten myself up the creek with this whole issue.

Regardless of that last comment, if anyone has any ideas or feedback, that would be great.

Based on this discussion, http://andrewgelman.com/2011/05/04/whassup_with_gl/, could this also simply be a "bug" in the glm() function?
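One informal way to tell separation apart from a fit that is merely slow to converge is to refit with increasing iteration limits: under separation the estimate keeps drifting outward, whereas a genuine finite estimate stops changing once `glm()` has converged. A heuristic sketch on toy data (`slopes_for()` is a hypothetical helper; this is not a formal separation test):

```r
# Heuristic check: does the slope keep growing as glm() is allowed more iterations?
slopes_for <- function(data, maxits = c(5, 10, 25)) {
  sapply(maxits, function(m) {
    fit <- suppressWarnings(   # non-convergence warnings are expected here
      glm(y ~ x, family = binomial, data = data,
          control = glm.control(maxit = m))
    )
    coef(fit)[["x"]]
  })
}

separated   <- data.frame(x = 1:10, y = rep(c(0, 1), each = 5))
overlapping <- data.frame(x = 1:10, y = c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1))

s_sep <- slopes_for(separated)    # keeps drifting upward
s_ovl <- slopes_for(overlapping)  # settles once converged
print(rbind(separated = s_sep, overlapping = s_ovl))
```

If the real data behave like `separated`, the warning reflects the data rather than a bug in `glm()`.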
