I have classified a set of 417 code review comments on my own and now need to check the inter-rater reliability between myself and my two supervisors. They cannot reasonably categorize all 417 comments. How can I determine a suitable number of comments for them to categorize so that I can calculate the inter-rater reliability?
