I have tried different methods for running a factor analysis on data with missing values, and I get very different results.
Here is a comparison of a two-factor analysis of a 73x40 data set with 43% missing values, using four different methods:
Method: cumulative variance explained by the first factor, then by both factors:
A: 0.285 0.408
B: 0.425 0.591
C: 0.193 0.258
D: 0.414 0.636
Method A: Missing values are replaced by the mean.
Method B: Based on covariance matrix computed by CRAN package 'norm'.
Method C: Based on covariance matrix computed by CRAN package 'norm2'.
Method D: Function umxEFA in CRAN package 'umx'.
The software packages in B, C, and D are all designed to analyze data with missing values by means of structural equation modeling.
(The functions in B and D needed minor modifications to overcome limits on the size of the data set, but nothing that changed the algorithm.)
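To make Method A concrete, here is a minimal Python sketch of mean imputation followed by a cumulative-variance calculation. Everything in it is an assumption for illustration: the data are synthetic (two made-up latent factors, 73x40, about 43% of cells deleted at random), the function names mean_impute and cumulative_variance are hypothetical, and the variance shares come from the eigenvalues of the correlation matrix, which is a principal-component stand-in and not the factor analysis routine I actually used.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def cumulative_variance(X, n_components=2):
    """Cumulative share of total variance carried by the leading
    eigenvalues of the correlation matrix (a PCA-style approximation)."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return np.cumsum(eigvals[:n_components]) / eigvals.sum()

# Synthetic stand-in for the real data: two latent factors plus noise.
rng = np.random.default_rng(0)
n_rows, n_cols = 73, 40
factors = rng.normal(size=(n_rows, 2))
loadings = rng.normal(size=(2, n_cols))
complete = factors @ loadings + rng.normal(size=(n_rows, n_cols))

# Delete roughly 43% of the cells completely at random.
mask = rng.random((n_rows, n_cols)) < 0.43
with_gaps = np.where(mask, np.nan, complete)

imputed = mean_impute(with_gaps)
cv_complete = cumulative_variance(complete)
cv_imputed = cumulative_variance(imputed)
print("complete data:", cv_complete)
print("mean-imputed :", cv_imputed)
```

Comparing the two printed lines shows how much the imputation step alone can move the cumulative variance on data where the true structure is known. Note that the real data are not missing at random by cell but by survey, so this sketch probably understates the problem.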
Other methods:
E: Random imputation: This gives non-reproducible results with excessive weight on variables with few missing values.
F: Multiple imputation with an auxiliary variable (Hot deck method). Missing values are replaced by values from another observation with the same value of the auxiliary variable. This method is useful, but I suspect that subsequent correlation tests will be invalid because the auxiliary variable must be something that is assumed a priori to correlate with everything.
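For clarity, this is roughly what I mean by the hot deck step in F, written as a small Python sketch. The function name hot_deck_impute, the toy values, and the auxiliary labels "a", "b", "c" are all hypothetical. It performs a single imputation pass; running it several times with different random seeds would give the multiple imputed data sets.

```python
import numpy as np

def hot_deck_impute(X, aux, rng):
    """Fill each missing cell with the value of the same variable taken
    from a randomly chosen donor row that has the same auxiliary value.
    Cells with no eligible donor are left as NaN."""
    X = np.asarray(X, dtype=float).copy()
    aux = np.asarray(aux)
    for j in range(X.shape[1]):
        # Only originally observed values may act as donors.
        missing = np.isnan(X[:, j])
        for i in np.flatnonzero(missing):
            donors = np.flatnonzero((aux == aux[i]) & ~missing)
            if donors.size:
                X[i, j] = X[rng.choice(donors), j]
    return X

# Toy example: four observations, two variables, one auxiliary label each.
values = np.array([
    [1.0, np.nan],
    [np.nan, 5.0],
    [3.0, 6.0],
    [np.nan, np.nan],
])
aux = np.array(["a", "a", "b", "c"])

filled = hot_deck_impute(values, aux, np.random.default_rng(1))
print(filled)
```

In this toy case the two observations labelled "a" fill each other's gaps, while the observation labelled "c" has no donor and stays missing. It also shows my concern: every imputed value is tied to the auxiliary variable by construction.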
I would appreciate an explanation of why the results are so different, and how to get useful and valid results.
FYI: My data set consists of data from different cross-cultural surveys. Each survey reports various cultural variables for a number of different countries. Each survey covers a different subset of countries, and no survey covers all the countries. This is the reason for the missing values.
See also the answers to https://www.researchgate.net/post/Any_suggestions_on_missing_values_in_factor_analysis