Interrater agreement/reliability - What is the best method to deal with multiple rater pairs?

Stephen Joy

This may or may not be helpful, but here goes: I assume that the raters are receiving some sort of training in how to use the rating instrument. At the end of the training period (whether this is done in a workshop setting or as an on-line activity), they should all be presented with a set of "cases" to rate. Those cases (whether real or fictitious) should contain enough material to allow for ratings to be made, but in a format similar to what might be available in the small facilities (e.g., doctor's reports, bloodwork, nurses' progress notes). Then calculate the inter-rater agreement based on those training cases and use that figure. As long as you are up-front about what you did and why you did it this way, it seems acceptable. Not optimal, maybe, but since you can't have the same raters see all the cases, what more could be expected?

Claudio S Hutz

Ideally, all cases would be examined by the same raters. That is sometimes difficult because the people being assessed are in different places. One possibility would be to videotape the assessment sessions and have a few raters assess all of the cases.

Dirk Richter

Thanks, Stephen and Claudio. We have decided to work with real cases, and the staff's notes are (as is quite often the case in those institutions) not very reliable. Thus, the assessment has to be done by staff members who have known the person to be assessed for quite some time. So the problem remains, and it is now more of a statistical problem: which interrater measure and which procedure (e.g., calculating the mean of all rater pairs) is most appropriate?
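To make the "mean of all rater pairs" option concrete, here is a minimal sketch, not a recommendation. It assumes each pair of raters scored its own set of cases on a categorical scale, computes an unweighted Cohen's kappa per pair, and averages the results. The rater labels, the ratings, and the function name `cohen_kappa` are all made up for illustration. Unweighted kappa treats the categories as nominal, so for an ordinal instrument a weighted kappa or an intraclass correlation may be the better fit.

```python
from collections import Counter
from statistics import mean


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters who scored the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(a)
    # Observed proportion of cases on which the two raters agree.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1.0:
        return float("nan")  # both raters used one identical category throughout
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical data: each rater pair scored a different set of six cases.
pairs = {
    ("A", "B"): ([1, 2, 2, 3, 1, 2], [1, 2, 3, 3, 1, 2]),
    ("C", "D"): ([1, 1, 2, 3, 3, 2], [1, 2, 2, 3, 3, 1]),
}

kappas = {pair: cohen_kappa(r1, r2) for pair, (r1, r2) in pairs.items()}
mean_kappa = mean(kappas.values())

for pair, k in kappas.items():
    print(pair, round(k, 3))
print("mean kappa over pairs:", round(mean_kappa, 3))
```

The averaged figure summarizes agreement within pairs only. It says nothing about whether one pair as a whole rates differently from another pair.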

About what I thought. By fictitious cases, all I meant was ones that can be used to establish inter-rater reliability. Those pseudo-cases could be documented well. They wouldn't be part of the study sample.

If you don't have a subset of cases (real or otherwise) rated by everyone, then you have no way of knowing whether some raters are systematically more lenient or more stringent than the rest. Again, those can be training cases you make up. But there's no statistic I ever heard of that can tell you whether two raters agree when they're rating entirely different samples.
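As a rough illustration of the leniency check, the sketch below assumes every rater scored the same small set of training cases on a numeric scale where a higher score is the more lenient rating. The rater labels, the scores, and the helper `leniency_offsets` are invented for the example. It compares each rater's scores with the per-case average across raters.

```python
from statistics import mean

# Hypothetical ratings: every rater scored the same five training cases.
training = {
    "A": [3, 4, 2, 5, 3],
    "B": [3, 4, 3, 5, 3],
    "C": [4, 5, 3, 5, 4],
}


def leniency_offsets(ratings):
    """Average deviation of each rater from the per-case mean across raters."""
    n_cases = len(next(iter(ratings.values())))
    if any(len(scores) != n_cases for scores in ratings.values()):
        raise ValueError("every rater must score the same set of cases")
    # Mean score each case received across all raters.
    case_means = [
        mean(scores[i] for scores in ratings.values()) for i in range(n_cases)
    ]
    # Positive offset: rater tends to score above the group; negative: below.
    return {
        rater: mean(s - m for s, m in zip(scores, case_means))
        for rater, scores in ratings.items()
    }


offsets = leniency_offsets(training)
for rater, offset in sorted(offsets.items(), key=lambda item: item[1]):
    print(rater, round(offset, 3))
```

A rater whose offset sits well away from zero is a candidate for retraining. How large an offset matters depends on the scale and on the study, and this sketch does not test whether the difference is statistically reliable.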