SPSS for multiple imputation to have a dataset with no missing values

Patrick, it is a bit irresponsible to suggest that "SPSS does only single imputation", when even a very cursory Google search shows that it does multiple imputation. See the links below, for example. The 3rd link is for a PDF of Chapter 5 from John Graham's book on missing data (see the 4th link). HTH.

http://www.appliedmissingdata.com/spss-multiple-imputation.pdf

https://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/MissingDataSPSS.html

http://download.springer.com/static/pdf/693/chp%253A10.1007%252F978-1-4614-4018-5_5.pdf?auth66=1424962623_ea0c6fccebd2bb80112734fd3fbe1a60&ext=.pdf

http://methodology.psu.edu/pubs/books/missing

Patrick S Malone

Bruce, Thank you. It seems my information is out of date. My apologies for any confusion.

Bruce Weaver

No worries, Patrick--and thanks for taking my remarks in such an even-handed manner. It just irks me (as a long-time SPSS user) that many people form impressions about what SPSS can and cannot do on the basis of very little or very weak evidence. I posted a mini rant on this to the SPSSX-L discussion forum back in January--see the link below if you're interested. :-)

p.s. - As a long-time SPSS user, I am certainly aware that in comparison to some of the other major packages, SPSS often does take its sweet time developing procedures for newer methods. E.g., the generalized linear model procedure (GENLIN) appeared LONG after the corresponding PROC was available in SAS (and probably Stata too). So I'm not arguing SPSS is perfect, just that it's not as limited as some users of other packages often think. ;-)

http://spssx-discussion.1045642.n5.nabble.com/SPSS-Statistics-Survey-td5728513.html#a5728533

M. Reza Hosseini

Dear All,

Thank you for all your comments. Yes, I have used SPSS for multiple imputation of data. However, the results are only usable inside SPSS and you cannot export the outcome to any other package, because SPSS performs the analysis on 5 imputed versions and then creates a pooled answer for the analysis (not a pooled dataset).

I am also aware that AMOS can handle missing data using maximum likelihood estimation, but the point is that some features of AMOS (such as modification indices) do not work with missing data and are available only with complete data. Besides, missing data can sometimes be the source of problems with model identification (according to the AMOS 22 user guide, pages 272-273). Hence, I decided to complete the data and then use the outcome in AMOS.

I guess I should go with single imputation as I have around 2-5% missing data or use R.

I am happy to have your comments on this matter.

Cheers,

Patrick S Malone

Ah, if SPSS can only export a single dataset, that was probably the source of my confusion! Also, I'll note that 5 imputations--the traditional recommendation to minimize bias--is not very many for purposes of maximizing power. Unless the analysis takes a long time per imputation, there's not really a good reason to not use far more (other than software limitations).

That said, with 2-5% missing data, either single imputation or listwise deletion is unlikely to be harmful. Single imputation will minimize bias but underestimate standard errors, thereby inflating power. Listwise deletion has the potential to introduce bias and will reduce power. Depending on your sample size, that may not be a concern.

Listwise deletion assumes the data are missing completely at random, which is testable. If you go that route, you might try a comparison between the retained sample and the omitted sample on the non-missing variables. If the differences are trivial, so will be the bias.
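As a rough illustration of that comparison, here is a minimal Python sketch. It assumes the data are held as a list of dictionaries with `None` marking a missing value; the function name `standardized_difference` and the variable names `age` and `income` are hypothetical, and the standardized mean difference is just one reasonable way to judge whether a gap is trivial.

```python
import math
import statistics


def standardized_difference(rows, variable, check_vars):
    """Compare retained vs. omitted cases on a fully observed variable.

    rows: list of dicts, with None marking a missing value (an assumption).
    variable: a variable with no missing values, used for the comparison.
    check_vars: variables whose missingness would trigger listwise deletion.
    """
    retained = [r[variable] for r in rows
                if all(r[v] is not None for v in check_vars)]
    omitted = [r[variable] for r in rows
               if any(r[v] is None for v in check_vars)]
    if len(retained) < 2 or len(omitted) < 2:
        raise ValueError("need at least two cases in each group")
    # Pool the two sample variances with equal weight.
    pooled_sd = math.sqrt(
        (statistics.variance(retained) + statistics.variance(omitted)) / 2
    )
    if pooled_sd == 0:
        raise ValueError("variable has no variation in either group")
    return (statistics.mean(retained) - statistics.mean(omitted)) / pooled_sd


# Made-up data: income is missing for two of the five cases.
sample = [
    {"age": 31, "income": 52000},
    {"age": 45, "income": 61000},
    {"age": 38, "income": 48000},
    {"age": 52, "income": None},
    {"age": 60, "income": None},
]
print(standardized_difference(sample, "age", ["income"]))
```

A value near zero on each fully observed variable is consistent with listwise deletion doing little harm; a large value suggests the omitted cases differ systematically from the retained ones.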

But again, you're relatively unlikely to be in much danger either way with such a small degree of missingness. The exception to that generalization is if you have highly imbalanced categorical variables, in which case listwise has the potential to do real harm.

Pat

Bruce Weaver

Let's go back to the original question. Reza wrote (with emphasis added):

I used SPSS for multiple imputation to have a dataset with no missing values (for AMOS). How should I save and use the pooled outcome in AMOS?

As far as I can see, we don't know the answers to these important questions:

What type of analysis did you perform with SPSS? (This will determine what the "pooled outcome" you refer to consists of. E.g., if you were estimating some kind of regression model, the outcome will be a table of pooled estimates of the regression coefficients.)
What did you hope to do with the "pooled outcome" in AMOS?

In a later post, Reza wrote this:

However, results are only usable inside SPSS and you cannot export the outcome to any other package because SPSS performs analysis based on 5 imputed versions and then creates a pooled answer for the analysis (not a pooled dataset).

And then Pat wrote:

Ah, if SPSS can only export a single dataset, that was probably the source of my confusion!

I'd like to clarify that regardless of what software one is using, multiple imputation is never about generating one "pooled" data set. It is about generating multiple imputed data sets, performing the desired analysis on each of those data sets, and then generating pooled estimates of the results (e.g., regression coefficients & their associated SEs). So this is not some limitation of SPSS--it would apply equally to any other package.
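The pooling step is usually done with Rubin's rules. The sketch below is a plain Python illustration of those rules for a single parameter, not a reproduction of what SPSS does internally; the function name `pool_rubin` and the example numbers are made up.

```python
import math


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: the m point estimates (e.g., one regression coefficient).
    std_errors: the m standard errors of those estimates.
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")
    pooled = sum(estimates) / m
    # Within-imputation variance: average of the squared standard errors.
    within = sum(se ** 2 for se in std_errors) / m
    # Between-imputation variance: sample variance of the m estimates.
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    # Total variance includes a correction for the finite number of imputations.
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical coefficient estimates and SEs from m = 5 imputed data sets.
coefs = [0.42, 0.47, 0.39, 0.45, 0.44]
ses = [0.10, 0.11, 0.10, 0.12, 0.10]
estimate, se = pool_rubin(coefs, ses)
print(estimate, se)
```

The between-imputation term is what a single imputed data set cannot supply, which is why standard errors from single imputation come out too small.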

Second, the 5 (or however many) imputed data sets generated by the MULTIPLE IMPUTATION procedure in SPSS are all in the same data file. There is a variable called Imputation_ that is equal to 0 for the original data set, and to 1 through m for the m imputed data sets. (When you perform the actual analysis, the file is SPLIT BY Imputation_. This is what tells SPSS to generate the pooled estimates.) So if you are trying to use those multiple imputed data sets in AMOS, I don't see what the problem is.
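For anyone who wants the imputed data sets as separate files outside SPSS, the stacked layout is easy to pull apart. This is a minimal Python sketch that assumes the stacked file has already been read into a list of dictionaries (for example from a CSV export); the function name `split_by_imputation` and the example rows are hypothetical, and only the Imputation_ variable comes from the description above.

```python
from collections import defaultdict


def split_by_imputation(rows, index_name="Imputation_"):
    """Split a stacked file into one complete data set per imputation.

    rows: list of dicts, each holding the Imputation_ value for that case.
    Rows with Imputation_ equal to 0 are the original data (still containing
    missing values) and are skipped.
    Returns {imputation_number: list of rows without the index variable}.
    """
    groups = defaultdict(list)
    for row in rows:
        number = int(row[index_name])
        if number == 0:
            continue
        groups[number].append(
            {name: value for name, value in row.items() if name != index_name}
        )
    return dict(sorted(groups.items()))


# Made-up stacked rows: one case, original plus two imputations.
stacked = [
    {"Imputation_": 0, "id": 1, "score": None},
    {"Imputation_": 1, "id": 1, "score": 4.2},
    {"Imputation_": 2, "id": 1, "score": 3.9},
]
datasets = split_by_imputation(stacked)
for number, data in datasets.items():
    print(number, len(data))
```

Each entry could then be written to its own file and analysed separately, with the per-imputation results pooled afterwards.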

Finally, for the sake of those who do not use SPSS (and those who do, but don't bother to read the documentation), a list of SPSS procedures that support computation of pooled estimates from multiple imputed data sets can be viewed at the first link below--click on Procedures That Support Pooling.

HTH.

p.s. - For questions about how to do things with SPSS (and other related products such as AMOS), I would recommend posting to the SPSSX-L mailing list (second link below) rather than to RG. Actual SPSS users participate in that forum, so you might have better luck getting answers to your questions. ;-)

http://www-01.ibm.com/support/knowledgecenter/SSLVMB_20.0.0/com.ibm.spss.statistics.help/mi_analysis.htm

http://spssx-discussion.1045642.n5.nabble.com/
