My dataset in SPSS has over 40% missing values. It seems unrealistic to present the findings without adjusting for the missingness. Are there any methods to adjust for the missing values?
Of course there is a way in SPSS to deal with missing values, through Transform > Replace Missing Values. But I believe that replacing 40% missing values can significantly affect the results.
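For anyone working outside SPSS, the same series-mean replacement can be sketched in Python with pandas. The column name and values here are invented for illustration; this mirrors Transform > Replace Missing Values > Series mean, not any SPSS internals:

```python
import pandas as pd

# Hypothetical survey item with missing responses (NaN).
df = pd.DataFrame({"score": [4.0, None, 3.0, None, 5.0]})

# Series-mean substitution: every missing value is replaced by the
# mean of the observed values, analogous to SPSS's "Series mean".
df["score_filled"] = df["score"].fillna(df["score"].mean())

print(df["score_filled"].tolist())  # [4.0, 4.0, 3.0, 4.0, 5.0]
```

Note that this shrinks the variance of the variable, which is one reason it distorts results badly at 40% missingness.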
Yes, 40% missing data largely affects the results. I am not in a position to repeat the data collection, so I have to deal with it through statistical procedures.
Given the rather large amount of missing data, I recommend that you do some sort of sensitivity analysis, regardless of which method you finally choose. There are different strategies for handling your data set.
First you should look at the data, analyse the missing-data pattern, and then decide which method to use. There is a presentation from the University of Texas [http://www.utexas.edu/cola/centers/prc/_files/cs/Missing-Data.pdf] with some general advice, and a reference [Schafer, Joseph L., and John W. Graham. 2002. “Missing Data: Our View of the State of the Art.” Psychological Methods] where they mention methods that should give good results even for MNAR data.
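A quick way to inspect the missing-data pattern outside of SPSS's MVA dialog, sketched in Python with pandas (the item names and values here are hypothetical):

```python
import pandas as pd

# Hypothetical questionnaire items with missing responses.
df = pd.DataFrame({
    "q1": [1, None, 3, 4, None],
    "q2": [2, 2, None, 4, 5],
    "q3": [1, 2, 3, None, None],
})

# Per-item missingness rate.
rates = df.isna().mean()
print(rates)

# Missingness patterns: each row becomes a tuple of True/False flags,
# so identical patterns can be counted (similar to SPSS MVA's
# tabulated-patterns output).
patterns = df.isna().apply(tuple, axis=1).value_counts()
print(patterns)
```

If a few patterns dominate (e.g. whole blocks of items skipped together), that already hints at a non-random mechanism worth investigating.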
Try different methods, including the standard procedure, and see how each affects the analysis. You could also run the analysis on bootstrap samples to see how different subsamples affect the results.
Such a sensitivity analysis should give you a good picture of the consequences of the missing data. It requires some work, but then you do have a lot of missing data.
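The bootstrap part of that sensitivity analysis can be sketched in plain Python. The data values here are invented; in practice you would resample your real cases and rerun your actual analysis on each resample:

```python
import random
import statistics

# Hypothetical observed (non-missing) values of the variable of interest.
observed = [3.1, 2.8, 4.0, 3.5, 2.9, 3.7, 4.2, 3.3]

random.seed(42)
boot_means = []
for _ in range(1000):
    # Resample with replacement, same size as the original sample.
    resample = random.choices(observed, k=len(observed))
    boot_means.append(statistics.mean(resample))

# The spread of the bootstrap estimates shows how sensitive the
# result is to which cases happen to be in the sample.
boot_means.sort()
lo, hi = boot_means[25], boot_means[974]  # rough 95% percentile interval
print(f"mean={statistics.mean(observed):.2f}, approx 95% CI ({lo:.2f}, {hi:.2f})")
```

If the interval widens dramatically, or different imputation methods land outside each other's intervals, the conclusions are fragile with respect to the missing data.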
According to Hair et al. (2010), missing data is one of the most persistent problems in data analysis and may affect the results of the research objectives. It is important to determine the type of missingness, i.e. whether the data are missing randomly or non-randomly (Pallant, 2010). In this regard, if the missing values are randomly distributed within the items of the questionnaire, then they can be ignored. However, if the missing values are non-randomly distributed, then the generalizability of the results will be affected (Tabachnick and Fidell, 2007). Schumacker and Lomax (2004) suggest that missing data up to 5% is acceptable. In my own study, I applied the 'mean substitution' method to replace missing data for the categorical variables, while missing data for nominal variables were excluded later during the data analysis.

In other words, in your case you CAN'T take your results into consideration, as the missing data will badly affect them. So you now have to identify the reason behind this high rate: you may, for example, be asking very sensitive questions that potential participants opt out of answering.
1.0 40% is a considerably large amount of missing data. I tend to agree with Ali on Schumacker and Lomax's (2004) suggestion of a 5% cut-off point. This should apply to your variables of interest (IVs and DVs), but not to your demographic data such as gender and age (unless these variables are part of your predictor/criterion variables).
2.0 Even if you run Missing Value Analysis (MVA) in SPSS and request Expectation Maximization (EM), and Little's MCAR test gives you p > .05 (which is good), your final results in subsequent analyses are still likely to be biased.
3.0 My suggestion: Look at the pattern of missing values, address the issues surrounding the questions for those missing values and re-collect the data.
4.0 This document can be helpful: https://www.researchgate.net/publication/262151892_Introduction_to_SPSS
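On point 2.0 above, here is a tiny Python sketch of why imputation cannot remove the bias when the data are missing not at random. The scores and the missingness rule are invented purely for illustration:

```python
import statistics

# Hypothetical MNAR scenario: high scorers tend not to respond,
# so missingness depends on the unobserved value itself.
true_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
observed = [s for s in true_scores if s <= 6]  # values above 6 go missing

true_mean = statistics.mean(true_scores)   # 5.5 (what we want to recover)
complete_case = statistics.mean(observed)  # 3.5 (biased downward)

# Mean substitution fills every gap with the observed mean,
# so the estimate stays at 3.5: the MNAR bias is untouched.
imputed = observed + [complete_case] * (len(true_scores) - len(observed))
print(true_mean, complete_case, statistics.mean(imputed))  # 5.5 3.5 3.5
```

Because the high values are missing precisely because they are high, both complete-case analysis and mean substitution recover 3.5 rather than the true 5.5, and nothing in the observed data alone can reveal this. That is why re-examining the questions and re-collecting data, as suggested in 3.0, is often the only real remedy.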