Multiple Imputation (in R with mice). Is multicollinearity a problem in the imputation model?

Xenis Benevolenskaya @Xenis-Benevolenskaya

24 June 2020

I want to impute missing data in several variables and am currently setting up my imputation model. Which auxiliary variables, besides the variables of the analysis model, should I include in the imputation model?

I have read that auxiliary variables should be correlated with the variables that have missing values (the recommendation is r > 0.4; https://stats.idre.ucla.edu/stata/seminars/mi_in_stata_pt1_new/), but if I understand Stef van Buuren correctly, multicollinearity could also be a problem.

"For datasets containing hundreds or thousands of variables, using all predictors may not be feasible (because of multicollinearity and computational problems) to include all these variables. It is also not necessary. In my experience, the increase in explained variance in linear regression is typically negligible after the best, say, 15 variables have been included." (https://stefvanbuuren.name/fimd/sec-modelform.html)
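To make the two concerns concrete (correlation with the incomplete variable on the one hand, collinearity among the predictors on the other), here is a rough base R sketch of how I imagine such a screening. Everything in it is an assumption for illustration: the data are simulated, the variable names `y`, `x1`, `x2`, `x3` are hypothetical, the 0.4 cutoff is the one from the UCLA page, and the 0.9 cutoff for "nearly collinear" is an arbitrary choice of mine, not a published recommendation.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)                  # unrelated to everything else
y  <- 0.8 * x1 + rnorm(n)
y[sample(n, 40)] <- NA          # make y incomplete

dat <- data.frame(y, x1, x2, x3)

# Pairwise-complete correlations: each pair uses all rows where both are observed
r <- cor(dat, use = "pairwise.complete.obs")

# Candidate auxiliaries whose absolute correlation with y exceeds 0.4
candidates <- setdiff(names(dat), "y")
strong <- candidates[abs(r["y", candidates]) > 0.4]

# Pairs of candidates that are nearly collinear with each other (|r| > 0.9, assumed cutoff)
rc <- abs(r[candidates, candidates])
rc[upper.tri(rc, diag = TRUE)] <- 0   # keep each pair once, drop the diagonal
idx <- which(rc > 0.9, arr.ind = TRUE)
collinear_pairs <- data.frame(
  var1 = rownames(rc)[idx[, "row"]],
  var2 = colnames(rc)[idx[, "col"]]
)

print(strong)
print(collinear_pairs)
```

In this toy example both `x1` and `x2` pass the correlation screen, but they also show up as a nearly collinear pair, which is exactly the situation I am unsure how to handle.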

What is the maximum correlation that variables in the imputation model should have with the incomplete variables and with the other auxiliary variables? And, more generally, how do I find suitable auxiliary variables for my imputation model?
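For reference, this is the kind of setup I have in mind, as a minimal sketch and not a recommendation. It uses `quickpred()` from mice to build the predictor matrix from a minimum correlation and then passes that matrix to `mice()`. The `nhanes` example data ship with the package and stand in for my own data. Setting `mincor = 0.4` simply mirrors the threshold mentioned above, and forcing `age` in via `include` is an arbitrary illustrative choice.

```r
library(mice)

# nhanes: small example dataset from mice (age, bmi, hyp, chl); age is complete
# quickpred() selects, for each incomplete variable, predictors whose absolute
# correlation with the variable itself or with its response indicator is at
# least mincor; variables listed in `include` are used as predictors everywhere
pred <- quickpred(nhanes, mincor = 0.4, include = "age")
print(pred)

# Impute using the reduced predictor matrix
imp <- mice(nhanes, predictorMatrix = pred, m = 5, seed = 123, printFlag = FALSE)

# mice records variables it dropped (e.g. as constant or collinear) here;
# NULL means nothing was logged
print(imp$loggedEvents)
```

As far as I understand, `mice()` already removes constant and collinear predictors before imputing and reports this in `loggedEvents`, so part of my question is whether that automatic check is sufficient or whether I should screen the auxiliary variables myself beforehand.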
