How does a single researcher establish intrarater validity/reliability of the coding in a qualitative study?

You may need to think about how different your Time 1 and Time 2 scores are. For example, if 20% or more of the scores are different, then I would revise and refine the scoring scheme, particularly for the data where the coding discrepancies occur, and then retest for intra-rater reliability until you've ironed out the discrepancies sensibly. However, unless you are a specialist in the area, I would still consider asking someone else to score the data for inter-rater reliability. Maybe someone else from your course needs an inter-rater reliability score, and you could share the burden.
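As a rough illustration of the 20% check described above, here is a minimal Python sketch. The function name `disagreement_rate` and the example codes are hypothetical, invented only for illustration; the sketch assumes both coding passes cover the same items in the same order.

```python
def disagreement_rate(time1, time2):
    """Return the share of items coded differently at the two times."""
    if len(time1) != len(time2):
        raise ValueError("both coding passes must cover the same items")
    if not time1:
        raise ValueError("no items to compare")
    differing = sum(1 for a, b in zip(time1, time2) if a != b)
    return differing / len(time1)


# Hypothetical codes for ten items (invented, not real data).
time1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
time2 = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

rate = disagreement_rate(time1, time2)

# Positions of the items that were coded differently, for later review.
discrepant_items = [i for i, (a, b) in enumerate(zip(time1, time2)) if a != b]

# The 20% figure is the rule of thumb suggested in the post, not a standard.
needs_revision = rate >= 0.20

print(f"{rate:.0%} of items differ; revisit items {discrepant_items}")
```

Listing the discrepant items is the useful part in practice, because those are the cases to look at when refining the scoring scheme.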

Michelle B. Cowley-Cunningham

See also this link for how to calculate inter-rater/intra-rater reliability in SPSS and MS Excel.

http://www.statstutor.ac.uk/resources/uploaded/coventryreliability.pdf

David L Morgan

I agree that you should go beyond saying that your results are "slightly different." In particular, I would recommend calculating an inter-rater reliability index such as Krippendorff's alpha.

You might also consider whether doing this kind of re-rating is really necessary for your work. In some fields, such as communication studies, inter-rater reliability is almost a requirement if you are doing content analysis on media, but in other fields where qualitative research is more interpretive, it is not considered to be useful.

Maria Samodra

Hello everyone,

Thank you very much for your advice; I am glad to receive all your answers.

Maria

Robert Trevethan

Hello Maria,

I think you have received some good suggestions from the above contributors. When dealing with those suggestions, however, I think you need to be aware that the analysis you choose might depend on how you have coded your original data. For example, even if the original data were qualitative in nature, have you converted those data to something that is categorical / nominal in nature, or, rather, something that is ordinal or equal interval in nature? The nature of the data you want to work with in this second "round" will determine the kind of analysis you can aim for. Also, consideration needs to be given to the number of data points (people? essays with subcategories?) you have for analysis. In addition, some analyses would require larger numbers than would others to be valid.

In essence, I think that, for you to receive really helpful and apposite suggestions, it would be necessary for you to provide a little more information about the nature of your research design / situation. For example, intraclass correlation coefficients might be exactly the thing you should use under some circumstances, but totally inappropriate under other circumstances. The same goes for Krippendorff's alpha.

I hope that's helpful - but you might have received enough advice already to know what you should do.

Robt.

Maria Samodra

Hi Robert,

Thank you very much for the suggestions. I am doing research in (corpus) linguistics, specifically on how writers express doubt in their research papers, by looking at how many times, for example, the modal verb "may" appears in their texts. Since "may" can have meanings other than expressing doubt (e.g. expressing permission, as in "You may go now"), I need to exclude the instances that do not reflect uncertainty. I have tried converting them into categorical data (e.g. 1 for expressions of doubt and 0 for non-expressions of doubt), and I am thinking of using Cohen's Kappa as a reliability test of my coding at Time 1 and Time 2. And perhaps I can try to resolve the small differences between the two times by asking other people to help me judge/decide the definitive set of data to use.
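A minimal Python sketch of that idea, treating the Time 1 and Time 2 codes as the two "raters" in Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohens_kappa` and the example codes are hypothetical and invented for illustration only.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two equally long lists of categorical codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(codes_a)

    # Observed agreement: share of items given the same code both times.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Chance agreement: product of the marginal proportions per category.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical codes: 1 = "may" expresses doubt, 0 = it does not.
time1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
time2 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

kappa = cohens_kappa(time1, time2)
print(f"Cohen's kappa = {kappa:.3f}")
```

With these invented codes, 8 of 10 items agree (p_o = 0.80) and p_e = 0.52, so kappa is 0.28 / 0.48, roughly 0.58.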

Maria

Robert Trevethan

Hello again, Maria,

The extra information you've provided is useful - and very interesting. (I'm going to go away and wonder whether "may" can be used in other ways - or even in overlapping ways - than the categories you have used above. I think your idea of asking others (trusted judges) to help you out is a good one.

Robt.

David L Morgan

Given the kind of work that you are doing, calculating an inter-rater reliability does make sense. There are a number of choices for calculating the reliability coefficient, and Cohen's Kappa has come under some criticism over the years, so I would recommend Krippendorff's alpha as an alternative.
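For readers who want to see what that calculation involves, here is a minimal Python sketch of Krippendorff's alpha for the simplest case only: nominal codes, exactly two sets of codes, and no missing values. The function name `krippendorff_alpha_nominal` and the example codes are hypothetical. Treating one coder's Time 1 and Time 2 passes as the two sets of codes is an assumption made for illustration, not something the thread has settled.

```python
from collections import Counter


def krippendorff_alpha_nominal(codes_a, codes_b):
    """Krippendorff's alpha, nominal data, two complete sets of codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")

    # Every item contributes two values, so n counts all pairable values.
    n_values = 2 * len(codes_a)

    # Each disagreeing item adds two off-diagonal entries (a,b) and (b,a)
    # to the coincidence matrix.
    off_diagonal = 2 * sum(a != b for a, b in zip(codes_a, codes_b))

    # Category totals pooled over both sets of codes.
    totals = Counter(codes_a) + Counter(codes_b)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    )
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")

    # alpha = 1 - D_o / D_e, written out for the nominal metric.
    return 1 - (n_values - 1) * off_diagonal / expected


# Hypothetical codes: 1 = expresses doubt, 0 = does not.
first_pass = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
second_pass = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

alpha = krippendorff_alpha_nominal(first_pass, second_pass)
print(f"Krippendorff's alpha = {alpha:.3f}")
```

For these invented codes the result is 1 - (19 × 4) / 192, roughly 0.60. Data with missing values, more than two sets of codes, or ordinal/interval codes need the general form of the coefficient, which this sketch does not cover.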

Robert Trevethan

Hello Maria - again,

First, I need to confess to not closing some of the parenthesised text in my last post. It was after 1 a.m. (I am in Australia), and I must not have been functioning as well as I should have.

More importantly, I have been wondering what the purpose of your research is and therefore what is best for you to aim for. Under some circumstances, either intra- or inter-rater reliability might be less important than simply reaching a decision that is justifiable / valid. For example, when some researchers are deciding which articles to include in a meta-analysis, two researchers might evaluate all of the prospective articles and, if there are disagreements, they resolve those disagreements by joint discussion or by calling in a third party. In my experience, they don't report how many times disagreement occurred - only how any discrepancies were resolved. Here's an example of that: doi:10.1177/1358863X16645854.

In your situation, this might be paralleled by, in a way, your talking to yourself, i.e., having a careful look at why you made a different decision at the two different times and discovering that you might not have been thinking clearly at one point in time, OR by, as you suggest above, asking someone else to help you make a decision. After that bit of a flurry, you could get down to what might be the substantial focus of your research without, as I have suggested above, needing to report the extent of discrepancy.

If, however, you think you really do need to produce some kind of statistic that indicates the extent of disagreement (with yourself), David's suggestion about using Krippendorff's alpha might well be the way to go. I am not familiar with that statistic, so David might be so good as to reassure you that it would be appropriate if you were essentially conducting intrarater, not interrater, reliability (David is referring solely to the latter, which I think you are not able to entertain). If that statistic is appropriate, the next issue is obtaining it using your data. I'm sorry, but I can't help with that - though I did notice a YouTube site that might (may?) be helpful.

Robt.

Maria Samodra

Dear David and Robt,

Thank you for the helpful suggestions--greatly appreciated!

Maria

Zhuo Jing-Schmidt

I see that you are concerned with intra-coder reliability. While this notion is not common, I understand what you are trying to get at. For qualitative research, the key is to establish a set of coding criteria that are explicitly articulated -- what data is collected, how it is catalogued and annotated, and how it is analyzed. That way, you provide a level of reliability in that the way you handle your qualitative data can be compared with others' coding criteria. If other researchers want to compare or replicate your research, they can look up the criteria by which you code your data.

Brian R. Urlacher

So inter-coder reliability is important to ensure that the coding procedures, and any bias in applying them, are consistent. When you have multiple researchers working with data, you want to make sure that your coding processes are understood by all involved (so you test to make sure everyone is coding things similarly). In your case, you are the only one who is doing the coding. In theory, you are applying the coding procedures consistently (although it might be useful to review the first 1/4 of your coding after you have gone through everything). Inter-coder reliability is still useful for purposes of establishing the validity of your procedures. In other words, would a reasonable person look at the same data and the same coding protocols and arrive at the same conclusion? I've ordered pizza, explained my protocols, and had fellow grad students code a small sample. As long as your results are in line with others' and the results are consistent, you can be reasonably confident in your procedures and coding.
