Interrater agreement/reliability - What is the best method to deal with multiple rater pairs?

More Dirk Richter's questions See All

What is the appropriate method for risk adjustment when conducting provider profiling with small sample sizes?

I am involved in research which aims at comparing health outcomes of pediatric hospitals units. The specific problem in pediatrics is that both the sample of hospitals and the sample size of...

31 December 2013 4,863 13 View

What are the long-term impacts of incarceration on youths' developing brain?

I want to explore the long-term effects of incarceration on a youth's developing brain. I also want to explore research that looks critically at incarceration and punitive measures as the primary...

12 August 2024 862 0 View

Is this a facetotecta nauplius?

This larva was captured using a plankton net in the Persian Gulf during the summer. I believe it may be a Facetotecta nauplius.

08 August 2024 3,746 4 View

May members post flyers about opportunities to present at a conference? If so, where to post?

May members post flyers about opportunities to present at a conferehttps://veraeducation.com/nce? If so, where to post for the Virginia Educational Research Association? https://veraeducation.com/

08 August 2024 4,585 1 View

Hello all, Looking for international reviewer to review Ph.D thesis in wireless sensor network.Can anybody help?

My name is Apurva Saoji. I am a Ph.D scholar in Computer engineering in India. I am looking for international expert in reviewing my PhD thesis, "Competitive Optimization Techniques to Minimize...

07 August 2024 4,600 2 View

Research Methodology - Impact of Corporate Reputation on Stakeholders Behaviors?

Please can anyone support with the survey questions based on RQ measures and propose how to do it in FMCG industry and include as well the role of brand equity Thanks

06 August 2024 949 0 View

How does one derive the standard deviation of a scale?

Dear all, I am working on analyzing data from a survey on student satisfaction. The survey contains items with a 7-point Likert response format that produce 12 scales related to different areas...

05 August 2024 2,141 4 View

How to report results of Generalised Linear Mixed Models in a journal article?

Hi everyone, If you have written or come across any papers where Generalised Linear Mixed Models are used to examine intervention (e.g., in mental health) efficacy, could you please share the...

04 August 2024 4,130 4 View

Why results of ROS flurescence are negative as there was no bacteria within?

Hello. I am working on ROS production of two systems: system A is cerium oxide and hydrogen peroxide, system B is cerium oxide nanoparticle, hydrogen peroxide and potassium bromide. I did some...

04 August 2024 5,974 3 View

What should Berlin do as a city to become as impactful as London and Paris in World Football?

Please go through my Abstract. I can also share a proposed Thesis Outline.

04 August 2024 2,077 0 View

Radiogenomics Cancer Research Challenges?

what are the top 3 challenges to the advancement of the field of Radiogenomics in cancer research? is it the availability of easily available low-cost matched imaging and biosamples with clinical...

03 August 2024 5,828 4 View

Stephen Joy

This may or may not be helpful, but here goes: I assume that the raters are receiving some sort of training in how to use the rating instrument. At the end of the training period (whether this is done in a workshop setting or as an on-line activity), they should all be presented with a set of "cases" to rate. Those cases (whether real or fictitious) should contain enough material to allow for ratings to be made, but in a format similar to what might be available in the small facilities (e.g., doctor's reports, bloodwork, nurses' progress notes). Then calculate the inter-rater agreement based on those training cases and use that figure. As long as you are up-front about what you did and why you did it this way, it seems acceptable. Not optimal, maybe, but since you can't have the same raters see all the cases, what more could be expected?

Claudio S Hutz

The ideal situation would be that all cases would be examined by the same raters. Sometimes it is difficult because people who are being assessed are in differente places. One possibility would be to videotape the assessment session and have some raters do the assessment of all cases.

Dirk Richter

Thanks, Stephen and Claudio. We have decided to work with real cases and the staff's notes are (as quite often in those institutions) not very reliable. Thus, the assessment has to be done by staff members who know the person to be assessed for quite a time. So, the problems remains and is now more a statistical problem about which Interrater measure and which procedure (e.g., calculating the mean of all rater pairs) seems to be most appropriate.

About what I thought. By fictitious cases, all I meant was ones that can be used to establish inter-rater reliability. Those pseudo-cases could be documented well. They wouldn't be part of the study sample.

If you don't have a subset of cases (real or otherwise) rated by everyone, then you have no way of knowing whether some raters are systematically more lenient or more stringent than the rest. Again, those can be training cases you make up. But there's no statistic I ever heard of that can tell you whether two raters agree when they're rating entirely different samples.