How do I remove bad scorers from a dataset?

09 September 2019

We had about 60 short stories rated on valence and arousal.

We suspect that a few scorers did not read the short stories and chose their answers more or less at random or according to a pattern. My goal is to identify these bad scorers and remove them from the dataset. I want to be very careful and keep scorers in the dataset when in doubt.

I have chosen the following two criteria for exclusion: 1. Deviation from a range of expected values. 2. Deviation from the expected distribution of all rating values.

Criterion 1: If a scorer's rating deviates by more than one standard deviation from the mean rating across all scorers on more than 6 stories, that scorer's ratings are removed from the dataset. If we assume a given probability per rating that a scorer falls outside the expected range purely by chance, the probability of this happening on 6 short stories is less than one percent.
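A minimal sketch of Criterion 1, assuming the ratings sit in a complete scorers × stories table (every scorer rated every story) and using the sample standard deviation per story. The function names are hypothetical. `prob_more_than` gives the binomial chance that a scorer exceeds the threshold purely by chance, given an assumed per-rating probability `p`; the value of `p` is not stated in the post and has to be chosen.

```python
from math import comb
from statistics import mean, stdev


def count_deviations(ratings):
    """For each scorer, count the stories on which their rating lies more than
    one (sample) SD away from that story's mean rating across all scorers.

    ratings[r][s] is the rating of scorer r for story s (assumed complete).
    """
    counts = [0] * len(ratings)
    for column in zip(*ratings):  # one column = all scorers' ratings of one story
        m, sd = mean(column), stdev(column)
        for r, value in enumerate(column):
            if abs(value - m) > sd:
                counts[r] += 1
    return counts


def flag_criterion1(ratings, max_deviations=6):
    """Indices of scorers with more than `max_deviations` deviant ratings."""
    return [r for r, c in enumerate(count_deviations(ratings)) if c > max_deviations]


def prob_more_than(k, n, p):
    """P(X > k) for X ~ Binomial(n, p): the chance of more than k deviant ratings
    out of n if each rating independently deviates with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))
```

For calibration: if an honest scorer's ratings were normally distributed around each story's mean, roughly 32% of them would fall outside ±1 SD. Evaluating `prob_more_than(6, 60, 0.32)` before fixing the threshold at 6 shows how often an honest scorer would be flagged.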

Criterion 2: Pooled across all scorers, the ratings were uniformly distributed over the scale values. For each scale value, we compare how often a scorer used it with its frequency in the pooled distribution of all scorers, allowing an average deviation of 4 points. Scorers whose average deviation exceeds 4 points are dropped from the dataset.
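A minimal sketch of Criterion 2. The post does not say what unit "4 points" refers to. This version assumes percentage points, comparing each scorer's share of each scale value with the pooled share. The function names and the `scale` argument (the list of possible rating values) are hypothetical.

```python
from collections import Counter


def mean_abs_deviation_from_pooled(ratings, scale):
    """For each scorer, the mean absolute difference (in percentage points,
    an assumption) between the scorer's own distribution over the scale values
    and the pooled distribution of all scorers."""
    pooled = Counter(v for row in ratings for v in row)
    total = sum(pooled.values())
    pooled_pct = {v: 100 * pooled[v] / total for v in scale}
    result = []
    for row in ratings:
        own = Counter(row)  # missing scale values count as 0
        diffs = [abs(100 * own[v] / len(row) - pooled_pct[v]) for v in scale]
        result.append(sum(diffs) / len(scale))
    return result


def flag_criterion2(ratings, scale, max_mean_dev=4.0):
    """Indices of scorers whose mean deviation exceeds `max_mean_dev`."""
    devs = mean_abs_deviation_from_pooled(ratings, scale)
    return [r for r, d in enumerate(devs) if d > max_mean_dev]
```

Because the pooled distribution came out uniform, one could instead compare each scorer against the uniform share `100 / len(scale)`. The pooled version above also works when the pooled distribution is not uniform.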

If either criterion (or both) applies to a scorer, that scorer's ratings are removed from the dataset.

Is this legitimate?
Are there better practices?

Thank you very much for your answers.

Yours sincerely

David Morse

Hello Egon,

If your goal is to make the data set more homogeneous with respect to the ratings assigned, you will at least partially accomplish that goal by using the proposed method.

The problem is, do the two criteria actually capture those instances wherein raters were not appropriately awarding ratings (true positives), and, if they do, how many instances of valid responses will end up being dropped (false positives)? To me, the ideal way to approach this would be to debrief individual participants, to try to determine whether they approached each rating with the intended level of diligence to the task at hand. If participants had been allowed to "confess" without any negative consequences, you might have had a more valid basis for deciding whether to exclude responses.

Should you decide to impose a screening for data points to be included, be sure to describe it in your write-up. You might also try running the analysis with and without the suspect responses, reporting both and discussing what difference/s, if any, the decision to exclude cases makes.
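One way to run the with/without comparison is to compute each story's mean rating from all scorers and from the reduced set, then correlate the two and check the largest shift. This is a minimal sketch assuming the same complete scorers × stories table as above. The function names are hypothetical.

```python
from statistics import mean


def story_means(ratings, keep):
    """Mean rating per story, using only the scorers whose indices are in `keep`."""
    rows = [ratings[r] for r in keep]
    return [mean(col) for col in zip(*rows)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def compare_with_without(ratings, excluded):
    """Return (correlation, largest absolute difference) between the per-story
    means of the full sample and of the sample without the `excluded` scorers."""
    excluded_set = set(excluded)
    all_idx = range(len(ratings))
    kept = [r for r in all_idx if r not in excluded_set]
    full = story_means(ratings, all_idx)
    reduced = story_means(ratings, kept)
    max_diff = max(abs(a - b) for a, b in zip(full, reduced))
    return pearson(full, reduced), max_diff
```

Reporting the largest shift alongside the correlation matters: two sets of means can correlate almost perfectly and still differ by a constant offset.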

Good luck with your work.

Egon Werlen

Hello David

Thank you for your response.

In fact, I compared the data with and without the 8 excluded cases.

The differences are rather small, and all correlations for valence and for arousal between the whole sample and the 'reduced' sample are >.99 for all rating groups.

Best regards

Egon
