How can I account for missing data on an 18 Item Scale?

Good question. I would suggest imputation, and then contrasting the results with a complete-case analysis (listwise deletion). There is plenty of theory on why available-case analysis (pairwise deletion) is a poor choice. An undifferentiated random hot deck is acceptable if missingness is low. I would, however, recommend a nearest-neighbor hot deck, in which you substitute the missing values in one object (the recipient) with the observed values from a similar object (the donor). The donor can be selected with a simple distance function or from within clusters. Regression imputation is certainly also possible. MCMC-based methods can be hard to understand and thus hard to explain in a paper.
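To illustrate the nearest-neighbor hot deck, here is a minimal sketch in plain Python. It is only one possible reading of the idea: the function name, the toy data, and the choice of mean squared difference as the distance are my own assumptions, and donors are restricted to complete cases for simplicity.

```python
def nearest_neighbor_hot_deck(rows):
    """Fill None entries in each row with values from the closest complete row.

    Assumption: distance is the mean squared difference over the items
    that the recipient actually answered.
    """
    donors = [r for r in rows if None not in r]
    if not donors:
        raise ValueError("need at least one complete case to act as donor")

    completed = []
    for row in rows:
        observed = [i for i, v in enumerate(row) if v is not None]
        # Complete rows need no donor; fully empty rows cannot be matched.
        if len(observed) == len(row) or not observed:
            completed.append(list(row))
            continue

        def distance(donor):
            return sum((row[i] - donor[i]) ** 2 for i in observed) / len(observed)

        donor = min(donors, key=distance)
        completed.append([donor[i] if v is None else v for i, v in enumerate(row)])
    return completed


# Toy data: four respondents, four items, None marks a missing answer.
data = [
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [4, None, 4, 5],
    [2, 2, None, 1],
]
filled = nearest_neighbor_hot_deck(data)
for row in filled:
    print(row)
```

With real data you would probably want to standardize the items first and decide what to do about ties between equally close donors.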

There are many pieces of literature that deal with missing data. Some of the more common ones include:

Enders, C. (2010) Applied Missing Data Analysis. Guilford Press: London.

Little, R.J.A. and Rubin, D.B. (2002) Statistical Analysis with Missing Data. Wiley: Hoboken.

Noel Lacey

Gentlemen,

Thank you very much for your input; it is much appreciated.

Marie Fongwa

It is best to make every effort to avoid missing data in the collected survey, even if that means calling the respondents to obtain the missing information. Some statistical programs include only cases with complete information in the analysis. If the program allows it, I would use imputation, and it is a good idea to report both sets of results: before and after imputation.

Ruben Fernández-Alonso

Prof Lacey:

To decide on the best method for recovering lost data, it is necessary to answer three questions:

What is the percentage of missing data?

How were the data lost: Missing Completely At Random (MCAR) or Not Missing At Random (NMAR)? In the first case the data are lost at random; in the second, the loss is biased toward one side of the distribution. The best situation is MCAR. However, in social and psychological studies it is common for the loss to be NMAR. For example, with an item such as "I am a good student," the students who tend not to answer are precisely those who have more difficulties. In this case the data tend to be lost at the bottom of the distribution of the variable "academic performance". In such situations (NMAR) it is not a good decision to replace the missing data with the scale mean: you would probably overestimate the population mean.
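The overestimation argument can be checked with a small simulation. This is only a sketch using the Python standard library; the score distribution (mean 50, SD 10) and the loss rates are invented for illustration and do not come from any real study.

```python
import random
import statistics

random.seed(42)

# Hypothetical "academic performance" scores for 20,000 students.
scores = [random.gauss(50, 10) for _ in range(20000)]
true_mean = statistics.mean(scores)

# MCAR: every score has the same 30% chance of being lost.
mcar_observed = [s for s in scores if random.random() > 0.30]

# NMAR: low scores (below 50) are lost 50% of the time, the rest 10%.
nmar_observed = [s for s in scores
                 if random.random() > (0.50 if s < 50 else 0.10)]

mcar_mean = statistics.mean(mcar_observed)
nmar_mean = statistics.mean(nmar_observed)

# Mean replacement: every lost value is filled with the observed mean,
# so the overall mean stays at the (biased) observed mean.
n_lost = len(scores) - len(nmar_observed)
imputed_mean = statistics.mean(nmar_observed + [nmar_mean] * n_lost)

print(f"true mean:            {true_mean:.2f}")
print(f"observed mean (MCAR): {mcar_mean:.2f}")
print(f"observed mean (NMAR): {nmar_mean:.2f}")
print(f"after mean imputation (NMAR): {imputed_mean:.2f}")
```

Under MCAR the observed mean stays close to the true mean. Under this NMAR mechanism it is shifted upward, and filling the gaps with that mean does nothing to correct the shift.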

Do you have incomplete data or missing data? Let me explain. Suppose you have 5 Likert items to measure a psychological construct. If a person responds to 3 items and leaves 2 unanswered, you have incomplete data. If instead the person does not answer any of the 5 items, you have missing (lost) data.

Depending on your answers to these three questions, you have several options to choose from.

Our group has compared different methods for recovering missing data. Unfortunately our work is in Spanish. We compared seven methods: listwise deletion, replacement by the scale mean, by the item mean, by the subject mean, by the corrected subject mean, multiple regression, and Expectation-Maximization (EM). They all have advantages and disadvantages, but some are better than others.

The handiest option (the default in programs such as SPSS) is to eliminate the cases with missing data (listwise deletion). Today we know that listwise deletion is the least desirable method for handling missing data. I do not recommend it, since it will bias your results, especially if the losses are not random.

The simplest methods replace the missing value with a mean (the scale mean, the item mean, or the subject mean). For you this would be a simple and basic option. If you have incomplete data and the loss is not very large (less than 10%), replacement by the subject mean seems to recover the population values best. It rests on a basic principle in psychology: usually the best predictor of a person's future behavior is that person's past behavior.
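As a rough sketch of subject-mean replacement in plain Python: the function name and the example responses are hypothetical, and the 10% cut-off is simply the rule of thumb mentioned above, applied here per respondent.

```python
def person_mean_impute(responses, max_missing_share=0.10):
    """Replace a respondent's missing items (None) with that respondent's own mean.

    Returns None when the respondent left too many items blank
    (share of missing items at or above max_missing_share) or answered nothing.
    """
    answered = [v for v in responses if v is not None]
    missing_share = 1 - len(answered) / len(responses)
    if not answered or missing_share >= max_missing_share:
        return None
    person_mean = sum(answered) / len(answered)
    return [person_mean if v is None else v for v in responses]


# Hypothetical respondent on an 18-item scale with one unanswered item (about 5.6%).
one_missing = [4] * 9 + [2] * 8 + [None]
filled_one = person_mean_impute(one_missing)

# Two unanswered items out of 18 (about 11.1%) exceeds the cut-off.
two_missing = [4] * 9 + [2] * 7 + [None, None]
filled_two = person_mean_impute(two_missing)

print(filled_one[-1])   # the respondent's own mean replaces the blank
print(filled_two)       # None: left for another method or reported as missing
```

If the scale has subscales, it is usually more sensible to take the mean within the subscale the missing item belongs to rather than across the whole questionnaire.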

Finally, there are what might be called "more refined methods", such as regression and multiple imputation based on iterative procedures. These methods, which use auxiliary variables, are better than mean replacement, especially if your data are completely lost and the percentage of loss exceeds 20%. Of course, they are more complex. However, if you can, I encourage you to look into them.
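To show the basic idea behind regression imputation with an auxiliary variable, here is a deliberately simple sketch: one fully observed auxiliary variable, ordinary least squares, plain Python. The function name and the data are made up. Note that this is single, deterministic imputation; multiple imputation goes further by adding random variation and repeating the process several times, which this sketch does not do.

```python
def regression_impute(target, auxiliary):
    """Fill None values in `target` using a least-squares line fitted on `auxiliary`.

    Assumption: `auxiliary` is fully observed and has the same length as `target`.
    """
    pairs = [(x, y) for x, y in zip(auxiliary, target) if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Observed values are kept; missing ones get the predicted value.
    return [intercept + slope * x if y is None else y
            for x, y in zip(auxiliary, target)]


# Hypothetical data: the third respondent skipped the target item.
auxiliary = [1, 2, 3, 4, 5]
target = [2, 4, None, 8, 10]
imputed = regression_impute(target, auxiliary)
print(imputed)
```

Because every imputed value lies exactly on the regression line, this approach understates the variability of the data, which is one reason the iterative and multiple-imputation procedures are preferred when they are available.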

I do not know what program you use. In any case, commercial software (SPSS, Mplus ...) has specific modules for recovering lost data. If you tell me what software you use, and I know it, I can be more precise in my answer.

Finally, let me suggest one reading on the subject.

Willms, J. D., and Smith, T. (2006). A Manual for Conducting Analyses with Data from TIMSS and PISA (Report prepared for the UNESCO Institute for Statistics). New Brunswick: Canadian Research Institute for Social Policy. Retrieved May 17, 2011 from: http://www.unb.ca/crisp/pdf/Manual_TIMSS_PISA2005_0503.pdf. The first pages of the document are a good introduction to the methods of recovering missing data.

Regards

Kathy Sias

Todd Little at TTU covers this; it is his area of expertise. He is on ResearchGate, and you might be able to see his articles there. He has also recently published a book, Longitudinal SEM. If you are still stuck next summer, he teaches a Stats Camp and addresses this in his intro to SEM class.

Catia Duro

Prof. Ruben Fernández-Alonso

I have a similar question. I hope you can help me.

How can I account for missing data on a 22-item scale (7-point Likert)? It is the MBI for burnout, which has 3 dimensions, so the scale is divided into 3 subscales, and with the scores I can classify respondents as low, medium, or high risk and assess burnout.

Which is the best method of recovering lost data: item mean, subject mean, or deleting the cases (rather than only the missing values)?

I use SPSS version 22.

Here are my answers to your 3 questions.

1. What is the percentage of missing data? 4.5% per case (1 of 22 items), or 9% with 2 items (I have just one case with 2 missing values). And
