Hello!
I carried out a small post-mortem computed tomography study on 5 corpses with 5 different scan protocols to find the "best" protocol (= best image quality) for a native cardiac examination. For this I created a questionnaire with 8 questions (for every corpse) and a 5-point rating scale (very good; good; acceptable; bad; very bad). I have at least 4 raters, so I need an inter-rater reliability measure for multiple raters. I also have many ratings (each observer gives 200 ratings, i.e. 5 corpses x 5 protocols x 8 questions).
There are a few options, but I don't know which one I should use: Fleiss' kappa, Krippendorff's alpha, the intraclass correlation (ICC), or Kendall's W.
At first I tried Fleiss' kappa, but the value was very low (0.12). The reason is probably that my ratings are ordinal and I would have to calculate a weighted Fleiss' kappa, but many scientific papers say that Fleiss' kappa is for nominal data.
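To illustrate the nominal-data point, here is a minimal sketch of the standard (unweighted) Fleiss' kappa in plain Python. The function name `fleiss_kappa` and the toy ratings are hypothetical and are not data from the study. The sketch assumes every item is rated by the same number of raters. It shows that relabelling the categories, so that "near misses" become large disagreements, leaves the statistic unchanged, which is why it gives no partial credit on an ordinal scale.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Unweighted Fleiss' kappa.

    ratings: list of items, each a list with one category label per rater
             (assumes the same number of raters for every item).
    categories: list of all possible category labels.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # counts[i][j] = number of raters who put item i into category j
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement per item, then averaged over items
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(categories))]
    p_j = [t / (n_items * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 4 items, 4 raters, scale 1 (very good) .. 5 (very bad).
scale = [1, 2, 3, 4, 5]
near_misses = [[1, 1, 2, 2], [2, 2, 3, 3], [3, 3, 4, 4], [4, 4, 5, 5]]

# Relabel the categories so that the same disagreements are now far apart.
relabel = {1: 1, 2: 4, 3: 2, 4: 5, 5: 3}
far_misses = [[relabel[r] for r in item] for item in near_misses]

kappa_near = fleiss_kappa(near_misses, scale)
kappa_far = fleiss_kappa(far_misses, scale)
print(kappa_near, kappa_far)  # the two values are identical
```

Because the ordering of the categories never enters the calculation, raters who differ by only one step are penalised as heavily as raters who differ by four steps. Measures that take the ordering into account (for example Krippendorff's alpha with an ordinal distance, Kendall's W, or the ICC) treat the two toy data sets differently.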
So I'm a bit confused right now, because I think there is no real consensus on which inter-rater reliability measure is the "best".
I hope someone with more experience can give me some advice for my study.
Best regards,
David Riegler