Hi everyone,
I would like your opinion on using the complete() function of the mice package in R.
In theory, after multiple imputation, the analysis should be performed on each imputed dataset, and the results of these analyses should then be pooled (see attached diagram).
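For reference, the analyze-then-pool workflow looks roughly like this in mice. This is a minimal sketch, assuming the mice package is installed; it uses the `nhanes` example data shipped with mice and an arbitrary linear model chosen only for illustration, not the survey data from this post. `pool()` combines the m sets of estimates with Rubin's rules.

```r
# Sketch of the "analyze each imputed dataset, then pool" workflow.
# Assumption: nhanes (bundled with mice) stands in for real survey data.
library(mice)

# Create m = 5 imputed datasets (seed fixed for reproducibility)
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Fit the same model separately on each of the 5 completed datasets
fits <- with(imp, lm(chl ~ age + bmi))

# Combine the 5 sets of estimates and standard errors (Rubin's rules)
pooled <- pool(fits)
summary(pooled)
```

The pooled point estimate is simply the mean of the m estimates; the pooled standard error also adds the between-imputation variance, which is the uncertainty that a single completed dataset cannot reflect.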
However, I am considering using the complete() function instead. In short, it produces a single final dataset from the multiple imputations previously performed with the mice() function. This strategy is easy and "inexpensive", and it lets us work with only one dataset.
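A minimal sketch of what complete() actually returns, again using the `nhanes` example data from mice as a stand-in (an assumption for illustration). With its default `action = 1`, complete() returns the first completed dataset; other `action` values return a chosen imputation or all m datasets stacked or as a list, but none of them averages the imputations.

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Default action = 1: one completed dataset, i.e. the FIRST imputation
first <- complete(imp)

# Any single imputation can be requested explicitly
third <- complete(imp, action = 3)

# "long" stacks all m datasets and adds .imp and .id columns
long <- complete(imp, action = "long")
nrow(long)  # 5 * 25 = 125 rows

# "all" returns a list of the m completed data frames
all_list <- complete(imp, action = "all")
```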
Here is a concrete example of why this strategy is useful. I am conducting mixed-methods research in which I want to interview some participants after analyzing their survey responses. If my respondent John Doe did not answer an item of a scale, I end up with 5 plausible answers for him after multiple imputation (with m = 5; 20 plausible answers with m = 20, etc.). With complete(), however, I work with a single dataset instead of 5 (or 20, etc.), although by default this is simply the first imputation rather than a summary of all of them. Basically, during the interview, I will be able to question John Doe based on one scale score, computed with his missing answer replaced. So I lose precision, but I gain the ability to make use of the answers.
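To make the "5 plausible answers" situation concrete, the sketch below uses a small simulated survey (hypothetical data: 40 respondents, four items scored 1 to 5, with respondent 1 standing in for John Doe) and computes the scale score within each imputed dataset. The final averaging step is only one possible way to get a single value per person for interview purposes; it is an illustrative assumption, not something complete() does, and it does not replace pooling for inferential analyses.

```r
library(mice)

# Hypothetical survey data (simulated, not real responses)
set.seed(42)
survey <- as.data.frame(matrix(sample(1:5, 40 * 4, replace = TRUE), ncol = 4,
                               dimnames = list(NULL, paste0("item", 1:4))))
survey$item2[c(1, 7, 15)] <- NA  # respondent 1 plays "John Doe"
survey$item4[c(3, 22)] <- NA

imp <- mice(survey, m = 5, seed = 123, printFlag = FALSE)

# Stack the 5 completed datasets and compute the scale score in each
long <- complete(imp, action = "long")
long$score <- rowSums(long[, paste0("item", 1:4)])

# John Doe's 5 plausible scale scores (one per imputation)
john <- long$score[long$.id == 1]
john

# Illustrative assumption: average each respondent's scores over the
# m imputations to get one value per person for the interview
per_person <- tapply(long$score, long$.id, mean)
head(per_person)
```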
However, this approach seems problematic, as the literature does not support it well. In fact, apart from the paper by van Buuren and Groothuis-Oudshoorn (2011, section 5.2), I cannot find any source that supports this approach:
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. https://doi.org/10.18637/jss.v045.i03
Well, I'm stuck between a rock (the more rigorous approach, i.e., pooling) and a hard place (the more practical approach, i.e., the complete() function). What do you think?
I look forward to reading your replies (and my apologies for my broken English).
FM