I remember from my MA class on advanced multivariate methods that multiple imputation in Mplus was one of the methods, but it seemed far too much for me to grasp. Is there an easier way? I am working in R.
Probably not. The realistic options are multiple imputation or an estimation method that "borrows" existing information to compensate for the missing values (e.g., full information maximum likelihood estimation, FIML). So far this is possible at least in Mplus and Stata. In my opinion FIML is more straightforward, but if you have a large amount of missing data I would compare the results from imputed data with those from FIML. I'm sorry I can't help you regarding R...
I agree with Tina. Multiple imputation and raw-data methods such as FIML are the two major defensible ways to handle data missing at random (the compromise assumption most often used between plausibility and practicality, at least in my fields).
FIML has the advantage of usually being trivially easy to implement in almost any SEM software, including lavaan in R (and likely in other SEM R packages, but I'm not expert in R).
MI has the advantage that all variables in the imputation dataset contribute to the MAR assumption, rather than just those in the model. Mplus gets around that with the AUXILIARY option, but I don't think lavaan has that facility yet.
If the variables contributing to MAR are the same, the two methods are asymptotically equivalent.
If your missing data exceeds 15%, you need to remove the observation. If your missing data is less than 5%, you may use mean replacement. If it is more than 5%, you can first determine the demographic profile of the respondents with missing data and then calculate the mean for the sample subgroup representing the identified demographic profile.
If your question with missing data is associated with a construct with multiple items, you can use the average of the responses to all the items associated with the construct.
For more info, take a look at Hair's "A primer on partial least squares structural equation modeling" (pp. 50-51).
Whether you need MI depends on what you want to do with the imputed data. If you're only interested in estimating totals or means (and not in estimating variances or confidence intervals), you can, for instance, use hot deck imputation or regression imputation.
If you want to use multiple imputation and are looking for software to do so (e.g. in R), you can look at the website of Stef van Buuren: http://www.stefvanbuuren.nl/mi/Software.html.
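To make concrete why MI matters once you care about variances: after analysing each imputed dataset separately, the results are combined with Rubin's rules, which add the between-imputation spread to the average within-imputation variance. Below is a minimal sketch in plain Python, for illustration only. The function name `pool_rubin` and the numbers are made up, and in practice your MI software does this pooling for you.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, total

# Hypothetical results from m = 3 imputed datasets
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, total_variance)
```

With a single imputed dataset there is no between-imputation term, so the uncertainty due to the missing values is simply left out of the standard errors.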
Lavaan indeed. I am curious about deleting data if more than 15% of the answers are missing. I see that other people do not fully agree with this method. Thoughts?
I read Hosein's 15% suggestion as a heuristic for within-case missing. I sometimes use cutoffs like that in an attempt to omit cases who didn't comply with the research protocol. In a recent example, I dropped cases who completely skipped more than 1/3 of the individual questionnaires in a large packet, using my judgment (and also looking at the frequency distribution of such skipping) as to whether I trusted the remaining responses to be valid vs. largely arbitrary. But it's context-specific -- I would not argue for a firm number, or using such a rule at all, across all studies.
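For what it's worth, the screening step described above (compute the share of missing items per case, look at its distribution, then flag cases above a cutoff) can be sketched as follows. This is plain Python for illustration; the data, the helper names and the 1/3 cutoff are hypothetical, and the cutoff remains a judgment call.

```python
from collections import Counter

def missing_fraction(case):
    """Share of items in one case that are missing (coded as None)."""
    return sum(v is None for v in case) / len(case)

def flag_cases(cases, cutoff):
    """Indices of cases whose missing share exceeds the cutoff."""
    return [i for i, c in enumerate(cases) if missing_fraction(c) > cutoff]

# Hypothetical data: 4 respondents, 6 items each
cases = [
    [3, 4, 2, 5, 1, 4],
    [3, None, 2, 5, 1, 4],
    [None, None, None, 5, 1, 4],
    [None, None, None, None, None, 2],
]

# Frequency distribution of within-case missingness, to inspect before choosing a cutoff
distribution = Counter(round(missing_fraction(c), 2) for c in cases)
flagged = flag_cases(cases, cutoff=1 / 3)
print(distribution)
print(flagged)
```

Flagged cases are candidates for a closer look, not automatic deletions.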
I agree with Patrick that it is context-specific, and advising a firm number is not wise. Moreover, if a person has 20% missing scores (or, in the same way, a variable has 20% missing), 80% is still observed. Removing these observed scores is wasteful. And you should not forget that deleting incomplete respondents or variables is also a missing data treatment: complete-case analysis (listwise deletion) or available-case analysis (pairwise deletion). These are usually not the best treatments in terms of power and especially bias, as they assume that data are missing completely at random. The suggested methods (MI and model-based methods like FIML) work under the weaker assumption of Missing at Random (MAR) and can correct some biases.
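A small simulation may make the bias point concrete. It is a sketch in plain Python with made-up parameters: y depends on x, and y is more likely to be missing when x is large, so the data are MAR given x but not MCAR. Listwise deletion then gives a biased mean of y, while a method that uses x (here a simple regression fill-in, standing in for what MI or FIML do more properly) recovers it.

```python
import random
from statistics import mean

random.seed(42)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]   # true mean of y is 0

# MAR mechanism (assumed for the demo): y is missing with probability 0.8 when x > 0
observed = [not (xi > 0 and random.random() < 0.8) for xi in x]

# Listwise deletion: use only cases where y is observed
cc_x = [xi for xi, o in zip(x, observed) if o]
cc_y = [yi for yi, o in zip(y, observed) if o]
complete_case_mean = mean(cc_y)

# Regression of y on x fitted on the complete cases (ordinary least squares)
mx, my = mean(cc_x), mean(cc_y)
slope = sum((a - mx) * (b - my) for a, b in zip(cc_x, cc_y)) / sum((a - mx) ** 2 for a in cc_x)
intercept = my - slope * mx

# Fill in missing y with the prediction from x, then average over all cases
filled = [yi if o else intercept + slope * xi for xi, yi, o in zip(x, y, observed)]
regression_imputed_mean = mean(filled)

print(round(complete_case_mean, 3), round(regression_imputed_mean, 3))
```

The fill-in used here is a single deterministic imputation, so it fixes the mean but would understate variances; that is the part multiple imputation adds.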
Thanks a lot, your answers are really helpful. One last thing: I am using Mplus 6, and I can send you the script I am using. Given that the software handles the missing data, do I still have to address the problem when I present the results in my paper?
Yes. You should always discuss the nature and extent of your missing data problem, your approach to dealing with it, and the assumptions that approach requires. In psychology papers these days, that information is typically included in an analysis strategy section in the methods.
For multiple imputation there are good packages in R. norm is one, if your data are (approximately) multivariate normal (for the variables with missing values). mice is another, based on chained equations, where you can specify a different model (distribution) for each incomplete variable. For the latter, you may want to check the paper by White, Royston and Wood in Statistics in Medicine (2010).