Hello!

I carried out a small post-mortem computed tomography study on 5 corpses with 5 different scan protocols to evaluate the "best" protocol (= best image quality) for a native cardiac examination. For this I created a questionnaire with 8 questions (for every corpse) and a 5-point rating scale (very good; good; acceptable; bad; very bad). I have at least 4 raters, so I need an inter-rater reliability measure for multiple raters, and I also have many ratings (1 observer gives 200 ratings: 5 corpses × 5 protocols × 8 questions).

There are a few options, but I don't know which one I should use: Fleiss' kappa, Krippendorff's alpha, the intraclass correlation coefficient (ICC), or Kendall's W.
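
For comparison, here is a minimal sketch of how one of the rank-based options, Kendall's W with the usual tie correction, could be computed for several raters. The ratings are made up (not the study data), the function names `average_ranks` and `kendalls_w` are hypothetical, and the 5-point scale is assumed to be coded 1 (very good) to 5 (very bad).

```python
def average_ranks(values):
    """Return average ranks (ties share the mean rank) and the tie term sum(t^3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items that share the same rating.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    return ranks, tie_term


def kendalls_w(ratings_by_rater):
    """Kendall's W with tie correction; one list of item ratings per rater."""
    m = len(ratings_by_rater)      # number of raters
    n = len(ratings_by_rater[0])   # number of rated items
    rank_sums = [0.0] * n
    total_ties = 0
    for ratings in ratings_by_rater:
        ranks, tie_term = average_ranks(ratings)
        total_ties += tie_term
        rank_sums = [s + r for s, r in zip(rank_sums, ranks)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Undefined (division by zero) if every rater gives all items the same rating.
    return 12 * s / (m**2 * (n**3 - n) - m * total_ties)


# Made-up example: 4 raters, 6 rated items, scale 1..5.
example = [
    [1, 2, 2, 4, 5, 3],
    [1, 1, 3, 4, 4, 3],
    [2, 2, 3, 5, 5, 4],
    [1, 3, 2, 4, 5, 3],
]
print(round(kendalls_w(example), 3))
```

W measures agreement in how the raters order the items, so it uses the ordinal information but does not penalise a rater who is consistently stricter than the others.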

At first I tried Fleiss' kappa, but the value was very low (0.12). The reason is probably that I would have to calculate a weighted Fleiss' kappa, but many scientific papers say that Fleiss' kappa is for nominal data.
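
To make the "nominal" point concrete, here is a minimal sketch of the standard (unweighted) Fleiss' kappa on made-up ratings (not the study data). The function name `fleiss_kappa` and the coding 1 (very good) to 5 (very bad) are assumptions for illustration. The second table only swaps the labels 2 and 5, which turns some near-misses into large disagreements, yet the kappa value stays the same because the statistic ignores the order of the categories.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Unweighted Fleiss' kappa; `ratings` holds one row per item, one entry per rater."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(row) for row in ratings]

    # Mean observed agreement over items.
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Chance agreement from the overall category proportions.
    p_e = sum(
        (sum(c[cat] for c in counts) / (n_items * n_raters)) ** 2
        for cat in categories
    )
    return (p_bar - p_e) / (1 - p_e)


categories = [1, 2, 3, 4, 5]

# Made-up ratings: 5 items, 4 raters; disagreements are mostly one step apart.
near = [
    [1, 1, 2, 2],
    [3, 3, 3, 4],
    [5, 5, 4, 4],
    [2, 2, 2, 2],
    [4, 3, 4, 3],
]

# Swap the labels 2 and 5: "1 vs 2" becomes "1 vs 5", "5 vs 4" becomes "2 vs 4".
swap = {1: 1, 2: 5, 3: 3, 4: 4, 5: 2}
far = [[swap[r] for r in row] for row in near]

kappa_near = fleiss_kappa(near, categories)
kappa_far = fleiss_kappa(far, categories)
print(kappa_near, kappa_far)  # the two values agree up to floating-point rounding
```

A coefficient that takes the distance between categories into account would treat these two tables differently.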

So I'm a bit confused right now, because I don't think there is really a consensus on which inter-rater reliability measure is the "best".

I hope someone has more experience and could give me some advice for my study.

Best regards,

David Riegler
