I am aiming to look at interrater reliability.

The exact tests are yet to be decided, but I'm working on the assumption that the resultant data will be ordinal (or can be coded as such).

I want to look at the interrater reliability between two groups (expert and novice).

I believe I'll be using Krippendorff's alpha, based on its flexibility with missing data, its suitability for multiple raters, and the fact that it's what others in the field have used.
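For context, here's a minimal sketch of how I'd expect to compute it in Python, assuming the `krippendorff` package; the data layout and scores below are made up purely for illustration:

```python
import numpy as np
import krippendorff  # pip install krippendorff

# Hypothetical data: rows are raters, columns are rated units (tests).
# np.nan marks a missing rating -- alpha handles these natively.
reliability_data = np.array([
    [1,      2, 3, 3, np.nan, 4],  # rater 1
    [1,      2, 3, 4, 2,      4],  # rater 2
    [np.nan, 3, 3, 3, 2,      4],  # rater 3
])

# Ordinal level of measurement, since the scores are ranked categories.
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha (ordinal): {alpha:.3f}")
```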

For my proposal they want a sample size estimation. I can find examples for Cohen's kappa and the usual power/sample size calculations, but I'm not sure which applies here.

Also, would the sample size be expressed in terms of raters or data points? I.e., 4 raters scoring 100 tests would give more data points than 100 raters scoring 3 tests.
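Absent a closed-form power formula for alpha, one way I thought I might explore this is a small Monte Carlo sketch: simulate ordinal ratings at an assumed true agreement level, vary the number of raters and the number of units, and see where the spread of the estimated alpha becomes acceptably narrow. Everything below (the agreement model, the parameter values) is an assumption for illustration, not a recommended design:

```python
import numpy as np
import krippendorff

rng = np.random.default_rng(42)

def simulate_alpha(n_raters, n_units, p_agree=0.7, n_levels=5):
    """Simulate one study: each rater reports a unit's true ordinal score
    with probability p_agree, otherwise a random score. Returns alpha."""
    truth = rng.integers(1, n_levels + 1, size=n_units)
    ratings = np.tile(truth, (n_raters, 1)).astype(float)
    noise = rng.random((n_raters, n_units)) > p_agree
    ratings[noise] = rng.integers(1, n_levels + 1, size=noise.sum())
    return krippendorff.alpha(reliability_data=ratings,
                              level_of_measurement="ordinal")

# Compare designs: few raters / many units vs many raters / few units.
for n_raters, n_units in [(4, 100), (100, 3), (10, 40)]:
    alphas = [simulate_alpha(n_raters, n_units) for _ in range(200)]
    lo, hi = np.percentile(alphas, [2.5, 97.5])
    print(f"{n_raters:>3} raters x {n_units:>3} units: "
          f"alpha 95% range [{lo:.2f}, {hi:.2f}]")
```

My (possibly naive) reading is that a sketch like this would let me compare the precision of the two designs directly, rather than choosing between a kappa formula and a generic power calculation.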

Any signposting or comments would be great. Thanks
