How to eliminate Bad Respondents from Survey Data, in a reasonable and widely acceptable way? How to clean Data from Amazon M-Turk?

08 August 2018

Dear All,

I have been analyzing survey data collected on Amazon M-Turk for the last year, and a lot of the time it is obvious (and understandable) that people do a pretty awful job of responding. I can completely understand that people will often be tired, drunk or stoned, and will be filling in surveys to make ends meet, but I need a widely accepted way of dealing with these responses so they don't add noise to the results.

I come from a neuroscience/psychophysics background where I had loads of freedom with cleaning data (as long as I did it transparently), but now in Consumer Research & Marketing a justified but somewhat arbitrary cleaning of the data is less accepted, both in terms of the reports I produce and the journals I am targeting.

I have an open question at the end of the survey, for ethical reasons, where I ask people what they think the purpose of the study was. These are some of the responses I get (real responses):

- NOTHING

- i ave impressed

- no

- NOTHING FOR LIKE THAT UNCLEAR. IT'S ALMOST FOR SATISFIED.

Clearly one cannot expect anything from a respondent who answers in such a way and, in fact, when I eliminate such respondents the results make much more sense. I have already restricted my sample to US residents and stated that I want English speakers, but linguistically impaired or non-English speakers seem to wriggle their way in.

What do you advise me to do? What is acceptable in science and business, in terms of dealing with random, bad, nonsensical responses?

Some people tell me that they eliminate up to 50% of data from M-turk because it is crappy, and that is normal to them. Other people say that is unacceptable. The people who eliminate up to 50% of data seem to not report it. I would like to have a reasonable procedure that most reasonable people would see as acceptable, and report it.

I am thinking about investing time in creating a little program that processes English text and flags answers that cannot be considered functional, grammatically sound English statements. Is that something someone has tried?
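To make the idea concrete, here is a minimal sketch of what such a screen might look like. It is not a grammar checker: it only applies three crude heuristics (very short answers, few recognizable words, all capitals), and the function name, thresholds and word list are all assumptions made up for illustration. It would be safer to use it to flag answers for manual review than to exclude people automatically.

```python
import re

# Hypothetical, deliberately tiny vocabulary for illustration only;
# a real screen would need a full English word list.
COMMON_WORDS = {
    "i", "think", "the", "study", "was", "about", "how", "people",
    "choose", "products", "it", "is", "to", "of", "and", "a", "for",
    "what", "that", "in", "purpose", "no", "nothing",
}

def flag_open_text(answer, min_words=3, min_known_ratio=0.5):
    """Return a list of reasons the answer looks low-effort (empty list = no flag).

    The thresholds are arbitrary assumptions and should be fixed before
    looking at the results, then reported.
    """
    reasons = []
    words = re.findall(r"[a-z']+", answer.lower())

    # Heuristic 1: answers with very few words.
    if len(words) < min_words:
        reasons.append("too_short")

    # Heuristic 2: most words are not in the reference vocabulary.
    if words:
        known = sum(w in COMMON_WORDS for w in words)
        if known / len(words) < min_known_ratio:
            reasons.append("few_known_words")

    # Heuristic 3: every letter is upper case.
    letters = [c for c in answer if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        reasons.append("all_caps")

    return reasons

if __name__ == "__main__":
    for text in ["NOTHING", "i ave impressed", "no"]:
        print(repr(text), flag_open_text(text))
```

Short answers such as "no" are not necessarily careless, so a flag from this sketch is only a prompt to look at the rest of that respondent's data.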

Lastly, I have heard about an elusive statistical procedure that detects random responses when rating items on a 5- or 7-point scale. I cannot find anything concrete on this, which makes me think it's not widely accepted, well known or generalizable.

Any tips or thoughts on the matter will be much appreciated.

Michael

Ian Kennedy

I was once interviewed on the South bank of the Thames by a Sky TV reporter. He did not like my reply or thought it was an outlier, so it never appeared on TV that night, and no apology was broadcast. Supermarket market researchers too do not include responses by drunks and people who cannot answer the questions posed. The less control the researcher has over the interviewee or respondent, the more wild the responses will be. As a researcher today, I would be quite happy to (responsibly) throw away a full half of the responses.

How to determine which? Include a testing filter question early. Ask 'Does “I” always come before “E”?' 'Yes'. 'Thank you for participating.' Continue through to your last question, and reject their other responses if the filter answer demonstrates cluelessness.

As I used to tell my colleagues: 'It is by failing students that we maintain our standards'. You maintain the integrity of your data only by deleting the unthinking or illiterate responses.

Asterios Chardalias

If you are working in a KAP (Knowledge-Attitude-Perception) framework, it makes sense to use your Knowledge questions to rank response adequacy. From there on, whether you proceed with an outlier analysis followed by normalization of some kind, or go with a weighted scheme, is up to you.

You can detect (or avoid) random answers by incorporating 'validation questions' (rephrasing of questions asked previously). If respondents answer both the original question and its validation counterpart consistently, you're good; if not, you have grounds to suspect they are not doing their best.

Having optional questions means you already have a way to handle imbalanced data. Not all respondents will opt to answer. It's similar with open questions with no word limit.
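As a rough illustration of the validation-question check, the sketch below compares each original item with its rephrased counterpart and lists the pairs that disagree. The item names, the 1-point tolerance and the function name are assumptions for the example, and it presumes both items in a pair are worded in the same direction on the same scale.

```python
def inconsistent_pairs(responses, pairs, tolerance=1):
    """Return the (original, validation) pairs whose ratings differ
    by more than `tolerance` scale points.

    responses: dict mapping item name -> numeric rating for one respondent
    pairs:     list of (original_item, validation_item) name tuples
    """
    return [
        (orig, check)
        for orig, check in pairs
        if abs(responses[orig] - responses[check]) > tolerance
    ]

if __name__ == "__main__":
    # Hypothetical respondent on a 5-point scale.
    respondent = {"q3": 5, "q3_check": 1, "q7": 4, "q7_check": 4}
    pairs = [("q3", "q3_check"), ("q7", "q7_check")]
    print(inconsistent_pairs(respondent, pairs))
```

How many inconsistent pairs justify suspicion is a judgment call that is best decided, and written down, before the data come in.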

Using linguistic criteria doesn't seem appropriate, unless the survey itself is linguistics-oriented.

You can always report versions of your analysis on both 'as-is' and 'cleaned' data.

Michael Puntiroli

Asterios Chardalias and Ian Kennedy

Thanks for that. I will be incorporating a simple validation question in the future, such as the "Does I always come before E" example.

However, the data from many studies has already been collected, so it is too late now to relaunch them all with the validation questions. For that reason I need to use what I have, which is good/interesting, especially when all the bad respondents are eliminated.

Regarding outlier detection: I am a fan of these methods, but they might not be entirely relevant, since a person who barely understands the questions could give mid-range answers to most things, making it difficult for them to be picked up as an outlier. Also, they might be an outlier on one dimension, because of their meek mid-range answers on a topic people usually feel more strongly about, but they won't necessarily be outliers on another dimension.
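One way to look for this pattern without relying on between-person outlier detection is to compute per-respondent indices, for example the longest run of identical consecutive ratings and the spread of one person's ratings across items. The sketch below is only an illustration under that assumption; the function names are made up, and it catches invariant answering rather than genuinely random answering.

```python
from statistics import pstdev

def longest_run(ratings):
    """Length of the longest run of identical consecutive ratings."""
    if not ratings:
        return 0
    best = run = 1
    for prev, cur in zip(ratings, ratings[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def screening_indices(ratings):
    """Per-respondent indices; expects a non-empty list of numeric ratings.

    A long run or a within-person SD near zero suggests the same answer
    was given to nearly everything, whatever the item content.
    """
    return {
        "longest_run": longest_run(ratings),
        "within_sd": pstdev(ratings),
    }

if __name__ == "__main__":
    # Hypothetical respondent who picks the midpoint almost throughout.
    print(screening_indices([3, 3, 3, 3, 3, 3, 4, 3]))
```

Any cut-off on these indices would still be somewhat arbitrary, so it would need to be stated alongside the number of respondents it removes.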

You are right, Asterios Chardalias, that using linguistic screening might not be ideal in an investigation that has little or nothing to do with linguistics. It is true, however, that judging the quality of the responses to the open question seems to be the most telling method.

I should probably simply eliminate everyone who gives nonsensical answers, report what percentage of the sample they make up, and openly share the data with whoever wants to verify.
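A minimal sketch of that bookkeeping, assuming each respondent is a dict and that some flagging rule has already been decided (the `nonsense` field and the function name are hypothetical):

```python
def split_and_report(rows, is_flagged):
    """Split respondents into kept and excluded, and compute the
    percentage excluded so it can be reported with the results.

    rows:       list of per-respondent records
    is_flagged: function taking one record and returning True to exclude it
    """
    kept = [r for r in rows if not is_flagged(r)]
    excluded = [r for r in rows if is_flagged(r)]
    pct_excluded = 100 * len(excluded) / len(rows) if rows else 0.0
    return kept, excluded, pct_excluded

if __name__ == "__main__":
    # Hypothetical data: 'nonsense' marks a nonsensical open-text answer.
    data = [
        {"id": 1, "nonsense": False},
        {"id": 2, "nonsense": True},
        {"id": 3, "nonsense": False},
        {"id": 4, "nonsense": True},
    ]
    kept, excluded, pct = split_and_report(data, lambda r: r["nonsense"])
    print(f"Excluded {len(excluded)} of {len(data)} respondents ({pct:.1f}%)")
```

Keeping the excluded rows rather than deleting them means the analysis can be rerun on both the full and the cleaned data, and the shared dataset shows exactly who was removed.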

Ian Kennedy, your final statement seems to be spot on. I'll abide by that.

Asterios Chardalias

That's why it's important to meticulously design and pilot-test survey instruments in advance. Pre-registration may also be a good idea in order to avoid ad hoc decisions when the data start pouring in.

A general word of caution: Think twice before dumping data out the window. Mining dark data is now a lucrative business. Most data points do have value, even if it is more difficult to extract from some than from others.

Equating 'unthinking' and 'illiteracy' is somewhat highbrow for my taste. Especially in Consumer Research & Marketing where, more often than not, you do want to get a feel for some mainstream crowd mentality.

Ian Kennedy

Edited to 'unthinking' or 'illiteracy'

Md Zafar Alam Bhuiyan

I think this is also part of the research: how, and how much, people respond to my survey.

So elimination is not always wise.
