Hello,

I have a dataset of 850 subjects and 6 potential raters. Each subject can be classified into one of 4 nominal categories. My plan is to split the dataset equally among the raters, while holding out a subset that all raters classify for an initial calculation of agreement.

I am wondering how large this subset should be so that all raters can evaluate it and agreement can be calculated. From what I have read, the appropriate statistic would be Light's kappa, since there are more than two raters in a fully crossed design (Hallgren, 2012).
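For context, my understanding is that Light's kappa is simply the mean of Cohen's kappa over all pairs of raters. A minimal sketch of that calculation in Python (using scikit-learn's cohen_kappa_score as one possible tool; the toy data are placeholders, not my dataset):

```python
# Minimal sketch: Light's kappa as the mean of all pairwise Cohen's kappas.
# Assumes a ratings array of shape (n_subjects, n_raters) with nominal codes 0-3.
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score

def lights_kappa(ratings):
    """Average Cohen's kappa over every pair of raters."""
    n_raters = ratings.shape[1]
    pairwise = [
        cohen_kappa_score(ratings[:, i], ratings[:, j])
        for i, j in combinations(range(n_raters), 2)
    ]
    return np.mean(pairwise)

# Toy example: 50 subjects, 6 raters, 4 categories of random ratings
rng = np.random.default_rng(0)
toy = rng.integers(0, 4, size=(50, 6))
print(lights_kappa(toy))
```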

Is there any software package that can help me estimate the size of that subset?
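To make the question concrete, this is roughly the kind of estimate I am after. A rough Monte Carlo sketch in Python of how the precision of Light's kappa might vary with the subset size; the assumed agreement level and category probabilities are placeholders, not values from my data:

```python
# Rough Monte Carlo sketch: variability of Light's kappa at different subset
# sizes. P_AGREE and CATEGORY_P below are assumptions for illustration only.
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
N_RATERS, N_CATEGORIES = 6, 4
P_AGREE = 0.75            # assumed chance a rater reproduces the "true" category
CATEGORY_P = [0.25] * 4   # assumed marginal distribution over the 4 categories

def lights_kappa(ratings):
    """Mean of pairwise Cohen's kappas over all rater pairs (same helper as above)."""
    pairs = combinations(range(ratings.shape[1]), 2)
    return np.mean([cohen_kappa_score(ratings[:, i], ratings[:, j]) for i, j in pairs])

def simulate_ratings(n_subjects):
    """Each rater reports the true category with prob P_AGREE, otherwise a random one."""
    truth = rng.choice(N_CATEGORIES, size=n_subjects, p=CATEGORY_P)
    noise = rng.integers(0, N_CATEGORIES, size=(n_subjects, N_RATERS))
    agree = rng.random((n_subjects, N_RATERS)) < P_AGREE
    return np.where(agree, truth[:, None], noise)

for n in (30, 50, 100, 150):
    kappas = [lights_kappa(simulate_ratings(n)) for _ in range(500)]
    print(f"n = {n}: mean kappa = {np.mean(kappas):.2f}, SD = {np.std(kappas):.3f}")
```

The idea would be to pick the smallest subset size at which the spread of the simulated kappa values is acceptably narrow, but I would prefer an existing package over rolling my own simulation.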

Best,

Bruno

Hallgren, K. A. (2012). Computing inter-rater reliability for observational data: An overview and tutorial. Tutorials in Quantitative Methods for Psychology, 8(1), 23–34.
