I have a dataset with some data missing completely at random (most variables have < 10% missing). So far, so good. I am planning to compute regression models. However, the variables in my regression model, and in the dataset as a whole, are at best only weakly correlated (if correlated at all). The total N of my sample is 95. I am using SPSS 27.

As far as I know, multiple imputation is a regression-based technique. So I am wondering: does it make any sense to impute data? I read that one can use 'auxiliary' variables from the entire dataset for imputation (usually people would, for instance, use questionnaire items with complete data to impute the missing values, but I don't have that luxury). In my case, the only auxiliary variables I could use from my dataset are correlated at around .20 to .30 at best; they also measure different constructs and were often measured at different time points than the data I would need to impute.

Additional question: is median- or mean-based imputation for covariates (control variables) acceptable in this case, where I really cannot (or should not) impute based on regression because of the lack of correlation in my dataset, and where missingness is < 10%?
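
To make the comparison concrete, here is a minimal simulation sketch in Python (not SPSS, and not the SPSS multiple imputation procedure). Everything in it is an assumption chosen to mirror my situation: the function name `simulate`, N = 95, one auxiliary variable correlated about .25 with the target, and 10% of values missing completely at random. It contrasts mean imputation with a single stochastic regression imputation, which is only one building block of multiple imputation, by looking at the standard deviation of the imputed variable.

```python
import random
import statistics


def simulate(n=95, rho=0.25, miss_rate=0.10, seed=0):
    """Compare mean imputation with one stochastic regression imputation.

    All parameter values are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)

    # Auxiliary variable x and target y with population correlation rho.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for xi in x]

    # MCAR: each y value is missing independently with probability miss_rate.
    missing = [rng.random() < miss_rate for _ in range(n)]
    obs_x = [xi for xi, m in zip(x, missing) if not m]
    obs_y = [yi for yi, m in zip(y, missing) if not m]

    # Mean imputation: every missing y is replaced by the observed mean.
    mean_y = statistics.fmean(obs_y)
    y_mean = [mean_y if m else yi for yi, m in zip(y, missing)]

    # Stochastic regression imputation (a single draw, not full MI):
    # fit y = a + b * x on the observed cases, then add residual noise.
    mean_x = statistics.fmean(obs_x)
    sxx = sum((xi - mean_x) ** 2 for xi in obs_x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(obs_x, obs_y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    resid = [yi - (a + b * xi) for xi, yi in zip(obs_x, obs_y)]
    resid_sd = (sum(r * r for r in resid) / (len(resid) - 2)) ** 0.5
    y_reg = [
        a + b * xi + rng.gauss(0, resid_sd) if m else yi
        for xi, yi, m in zip(x, y, missing)
    ]

    return {
        "n_missing": sum(missing),
        "sd_observed": statistics.stdev(obs_y),
        "sd_mean_imputed": statistics.stdev(y_mean),
        "sd_regression_imputed": statistics.stdev(y_reg),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(key, round(value, 3))
```

By construction, mean imputation can never increase the standard deviation relative to the observed cases, because it adds values exactly at the mean. The stochastic regression draw adds residual noise, so it is not constrained in that way even when the auxiliary variable is only weakly related to the target. Whether that difference matters at < 10% missingness is exactly what I am unsure about.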

I find these issues are really under-reported in publications that use imputation techniques.
