I have a data set with quite a lot of missing values, more in some variables than in others. I have performed bivariate analyses and now want to select variables for a multivariate analysis. I understand that some authors suggest entering every variable with a p-value < 0.25 in the bivariate analysis into the multivariate model and then performing backward elimination. I'm using binary logistic regression (the outcome variable is binary).
My problem is that one variable is a strong independent predictor but has many missing values. Because cases with a missing value on any variable in the model are dropped, the number of cases included in the analysis would almost double if I omitted this variable. If I did omit it, some other variables would reach statistical significance in the regression.
My questions are:
1. What are your thoughts on whether this variable should go into the regression?
2. What are your thoughts on why those other variables reach statistical significance once this variable is omitted? Do you think it's simply because of the larger number of cases included in the analysis?
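
To make the sample-size issue concrete, here is a minimal sketch of how listwise deletion changes the number of analysed cases depending on which variables are in the model. The data and the variable names (`age`, `smoker`, `biomarker`) are made up purely for illustration and are not from my data set; `biomarker` stands in for the predictor with many missing values.

```python
# Hypothetical records; None marks a missing value.
# All names and values are illustrative assumptions, not real data.
rows = [
    {"outcome": 1, "age": 54,   "smoker": 1, "biomarker": 2.1},
    {"outcome": 0, "age": 61,   "smoker": 0, "biomarker": None},
    {"outcome": 1, "age": 47,   "smoker": 1, "biomarker": 3.4},
    {"outcome": 0, "age": 39,   "smoker": 0, "biomarker": None},
    {"outcome": 0, "age": 58,   "smoker": 1, "biomarker": 1.2},
    {"outcome": 1, "age": None, "smoker": 0, "biomarker": 2.8},
    {"outcome": 1, "age": 66,   "smoker": 1, "biomarker": None},
    {"outcome": 0, "age": 50,   "smoker": 0, "biomarker": 0.9},
]


def complete_cases(rows, variables):
    """Keep only rows with no missing value in any listed variable
    (listwise deletion, as a complete-case regression would do)."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


full_model_vars = ["outcome", "age", "smoker", "biomarker"]
reduced_model_vars = ["outcome", "age", "smoker"]

# Cases available to each model under listwise deletion.
n_full = len(complete_cases(rows, full_model_vars))
n_reduced = len(complete_cases(rows, reduced_model_vars))

# Rows on which the reduced model could be refitted so that both
# models are compared on exactly the same cases.
same_sample = complete_cases(rows, full_model_vars)

print(f"cases with the variable included: {n_full}")
print(f"cases with the variable omitted:  {n_reduced}")
print(f"cases for a same-sample refit:    {len(same_sample)}")
```

One way I could imagine separating the two explanations in question 2 is to refit the reduced model on `same_sample` only: if the other variables still become significant there, the larger sample alone would not explain the change.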