I am trying to impute a continuous variable before running a logistic regression, and I am having trouble with it. Multiple imputation of the continuous variable works perfectly; however, when I categorize it after imputation but before adding it to the logistic model, my results change. There appears to be a substantial loss of power, for some reason, such that my highly significant variable is no longer significant.

For instance, when I impute blood sugar values and then run a logistic model against a binary outcome, the parameter estimates for all covariates, including blood sugar, are very comparable with or without the imputed data. However, when I categorize blood sugar into 4 categories in the imputed dataset and then run the model, the variable is no longer significant. If I instead run the logistic model with the blood sugar categories but exclude patients with missing values (i.e., on the pre-imputation data), I get very plausible, logical results that are comparable to results from an entirely different dataset.

Is it wrong to categorize the imputed variable in the imputed dataset?
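For concreteness, here is a minimal sketch of one way to run the workflow described above: impute, categorize blood sugar inside each completed dataset, fit the logistic model to each, and pool the coefficients with Rubin's rules. It uses NumPy only. Everything in it is an assumption for illustration: the cutpoints, the simulated data, and the helper names (`categorize`, `design_matrix`, `fit_logistic`, `pool_rubin`) are hypothetical, and the imputation step is a crude draw within each outcome group that stands in for a real multiple-imputation model. It is not the poster's actual method or software.

```python
import numpy as np

# Hypothetical cutpoints (mg/dL) giving 4 blood-sugar categories. They are fixed
# in advance so every completed dataset uses identical category boundaries.
CUTPOINTS = np.array([100.0, 126.0, 200.0])

def categorize(glucose):
    # Returns 0..3: below 100, 100 to <126, 126 to <200, 200 and above
    return np.digitize(glucose, CUTPOINTS)

def design_matrix(cat, n_levels=4):
    # Intercept plus dummy columns; category 0 is the reference level
    cols = [np.ones(len(cat))] + [(cat == k).astype(float) for k in range(1, n_levels)]
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=30):
    # Plain Newton-Raphson for logistic regression; returns estimates and covariance
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1 - p))[:, None])
    return beta, np.linalg.inv(info)

def pool_rubin(estimates, covariances):
    # Rubin's rules: average the estimates; total variance = within + (1 + 1/m) * between
    q = np.asarray(estimates)
    m = q.shape[0]
    within = np.mean([np.diag(c) for c in covariances], axis=0)
    between = q.var(axis=0, ddof=1)
    total = within + (1 + 1 / m) * between
    return q.mean(axis=0), np.sqrt(total)

# --- Synthetic illustration only (simulated, not real patient data) ---
rng = np.random.default_rng(0)
n, m = 2000, 5
glucose = rng.normal(130, 40, n)  # hypothetical glucose values
true_logit = -2 + 0.015 * (glucose - 130)
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)
missing = rng.random(n) < 0.2  # about 20% of glucose values treated as missing

estimates, covariances = [], []
for _ in range(m):
    completed = glucose.copy()
    # Stand-in imputation (NOT a proper MI model): draw from a normal fitted to the
    # observed values within the same outcome group, so the outcome is used.
    for g in (0.0, 1.0):
        donors = glucose[~missing & (y == g)]
        target = missing & (y == g)
        completed[target] = rng.normal(donors.mean(), donors.std(), target.sum())
    # Categorize inside each completed dataset, then fit the logistic model
    X = design_matrix(categorize(completed))
    beta, cov = fit_logistic(X, y)
    estimates.append(beta)
    covariances.append(cov)

pooled, se = pool_rubin(estimates, covariances)
print("pooled log-odds:", np.round(pooled, 3))
print("pooled SE:      ", np.round(se, 3))
```

Because `CUTPOINTS` is fixed before imputation, category k covers the same glucose range in every completed dataset, which is what makes the per-category coefficients comparable enough to pool.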

Question by Shveta S Motwani