I have a question about reliability/agreement.
100 raters (teachers) evaluated a single student presentation on 5 criteria (introduction, eye contact, etc.); each rater rated every criterion on a 5-point interval scale.
I would now like to calculate the inter-rater reliability for each criterion individually, i.e. to what extent the teachers agree on a given criterion, for example eye contact.
Since I only want to analyze the reliability of a single item, and only one presentation (one target) was rated, I believe that many common reliability methods (Krippendorff's alpha, ICC) are not applicable: with no variance between targets, reliability coefficients of that kind seem undefined.
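One candidate I came across for this single-item, single-target situation is the rWG agreement index (James, Demaree & Wolf, 1984), which compares the observed variance of the ratings against the variance expected under a uniform "no agreement" null distribution. Here is a minimal sketch with made-up ratings for one criterion (the numbers are purely illustrative, not my data):

```python
import numpy as np

# Hypothetical example: 100 teachers rate "eye contact" on a 1-5 scale.
# Made-up distribution: 60 fours, 30 fives, 10 threes.
ratings = np.array([4] * 60 + [5] * 30 + [3] * 10)

A = 5                                # number of scale points
sigma2_null = (A**2 - 1) / 12        # variance of a uniform null on 1..5 -> 2.0
s2_obs = ratings.var(ddof=1)         # observed sample variance of the ratings

# rWG: 1 = perfect agreement, 0 = agreement no better than the uniform null
r_wg = 1 - s2_obs / sigma2_null
print(round(r_wg, 3))                # prints 0.818 for this made-up data
```

Would an index of this kind be appropriate here, or is there a better-established measure for one item rated by many raters?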
So my question is: how could I calculate this agreement/reliability per criterion? Thank you very much for your help.