What about the complete() function of the mice package in R?

27 March 2023

Hi everyone,

I would like your opinion on using the complete() function of the mice package in R.

Theoretically, after multiple imputations, analyses should be performed on each imputed dataset, and then the results of the analyses should be pooled (see attached diagram).

However, I am considering the complete() function instead. In summary, it generates a single final dataset from the multiple imputations previously performed with the mice() function. This strategy is easy and "inexpensive," and it leaves us with only one dataset to manipulate.

Here is a concrete example of why this strategy would be useful. I am conducting mixed-methods research in which I want to interview some participants after analyzing their responses to my survey. If my respondent John Doe did not answer an item of a scale, multiple imputation would leave me with 5 plausible answers for him (if m = 5; 20 plausible answers if m = 20, and so on). The complete() function, however, would summarize the different estimates into one dataset (instead of 5, 20, etc.). During an interview, I could then question John Doe based on his scale score computed with the NA values replaced. So I lose precision but gain the ability to use the answers.

However, this approach seems problematic, as the literature does not support it well. In fact, except for this paper by van Buuren and Groothuis-Oudshoorn (2011, cf. section 5.2), I cannot find any source that supports this approach:

van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. https://doi.org/10.18637/jss.v045.i03

Well, I'm stuck between a rock (a more rigorous approach, i.e. the pooling) and a hard place (a more practical approach, i.e. the complete() function). What do you think?

I look forward to reading your replies (and my apologies for my broken English).

Jochen Wilhelm

Doing multiple imputations and pooling their results will give you more robust, less noisy results. But this is "just" a technical issue. Whatever leads you from incomplete data to your final analysis results, you should repeat the whole process a few times to see how much the imputation affects the final results. That is a form of sensitivity analysis.
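A minimal sketch of such a sensitivity check, under assumptions that are not from the thread: it uses the small nhanes example data shipped with mice, an arbitrary model (bmi ~ age + chl), and three arbitrary seeds. The whole impute-analyze-pool pipeline is rerun per seed, and the pooled estimates are put side by side.

```r
library(mice)

# Hypothetical setup: example data, model, and seeds chosen only for illustration.
seeds <- c(1, 2, 3)

pooled_estimates <- sapply(seeds, function(s) {
  # Rerun the entire pipeline with a different random seed.
  imp <- mice(nhanes, m = 5, seed = s, printFlag = FALSE)
  fit <- with(imp, lm(bmi ~ age + chl))
  res <- summary(pool(fit))
  # Return the pooled coefficients as a named vector.
  setNames(res$estimate, as.character(res$term))
})
colnames(pooled_estimates) <- paste0("seed_", seeds)

# One row per coefficient, one column per rerun.
print(round(pooled_estimates, 3))
```

If the columns differ by more than you are comfortable with, increasing m is a common first step.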

Peter Edelsbrunner

I believe that complete() just extracts the imputed datasets, and if you don't specify which one you want, it will give you the first imputed dataset. It does not combine or average the imputations.

So it seems obvious that you shouldn't use this function and then run analyses or present descriptive statistics based on this one dataset alone: you will underestimate the standard errors, and with them the confidence intervals and p-values, possibly drastically (depending on the amount of uncertainty in the imputations). You would basically be doing a weird kind of single imputation instead of multiple imputation.
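This can be checked directly. The sketch below assumes the nhanes example data from mice and m = 5 (neither is from the question). It compares the default call with an explicit request for the first imputation, then shows the "long" format, which stacks all m completed datasets.

```r
library(mice)

# Hypothetical setup: nhanes ships with mice; m and seed are arbitrary.
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Default call versus an explicit request for the first imputed dataset.
first_default  <- complete(imp)
first_explicit <- complete(imp, 1)
print(identical(first_default, first_explicit))

# Any other single imputed dataset can be requested by number.
third <- complete(imp, 3)

# "long" stacks all m completed datasets; .imp marks the imputation
# and .id marks the original row.
long <- complete(imp, action = "long")
print(table(long$.imp))
```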

So: Don't.
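To make the difference concrete, here is a rough sketch under assumed inputs (the nhanes example data and an arbitrary model, neither taken from the question). It fits the same model once on a single completed dataset and once on all m datasets with pooling, and it recomputes the pooled standard errors by hand with Rubin's rules to show where the extra uncertainty comes from.

```r
library(mice)

# Hypothetical setup: example data and model chosen only for illustration.
m <- 5
imp <- mice(nhanes, m = m, seed = 123, printFlag = FALSE)

# Single-dataset analysis: treats the imputed values as if they were observed.
single_fit <- lm(bmi ~ age + chl, data = complete(imp, 1))
single_se  <- sqrt(diag(vcov(single_fit)))

# Multiple-imputation analysis: fit on each dataset, then pool.
fit    <- with(imp, lm(bmi ~ age + chl))
pooled <- summary(pool(fit))

# Rubin's rules by hand.
ests  <- sapply(fit$analyses, coef)                        # one column per imputation
vars  <- sapply(fit$analyses, function(f) diag(vcov(f)))   # squared standard errors
ubar  <- rowMeans(vars)               # average within-imputation variance
b     <- apply(ests, 1, var)          # between-imputation variance
total <- ubar + (1 + 1 / m) * b       # total variance
manual_se <- sqrt(total)

print(data.frame(term = pooled$term,
                 single_se = unname(single_se),
                 pooled_se = pooled$std.error,
                 manual_se = unname(manual_se)))
```

The between-imputation term b is what a single completed dataset cannot capture, which is why its standard errors tend to be too optimistic.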
