We want to assess the interrater agreement/reliability (IRR) of a screening tool with categorical answers, designed to detect health-related issues in older persons with intellectual impairments. The difficulty is that we have about 30 rater pairs rather than one fixed set of raters. The reason is that the persons to be assessed live in many small institutions, and a rater needs some familiarity with a person to rate these issues properly, so each pair of raters can only assess the residents they know. I have searched the IRR literature and found little on this, which is not surprising given that this is a rather uncommon IRR design. I would be grateful for suggestions on the most adequate IRR measure for this situation and on how to combine the results into one final IRR value.
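
For illustration only, here is a minimal Python sketch of one way a single summary value could be obtained in such a design: compute Cohen's kappa for each rater pair, then pool the observed and chance-expected agreement across pairs, weighted by the number of persons each pair rated. This is an assumption about a possible approach, not an established recommendation for this situation. The function names and the toy ratings are hypothetical.

```python
from collections import Counter


def agreement_components(ratings_a, ratings_b):
    """Return (observed agreement, chance-expected agreement, n) for one rater pair."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same, non-empty set of persons.")
    n = len(ratings_a)
    # Proportion of persons on which the two raters chose the same category.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return p_o, p_e, n


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for a single rater pair (nan if chance agreement is 1)."""
    p_o, p_e, _ = agreement_components(ratings_a, ratings_b)
    if p_e == 1.0:
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def pooled_kappa(pairs):
    """Pool agreement over rater pairs, weighting each pair by its sample size.

    `pairs` is a list of (ratings_a, ratings_b) tuples, one per rater pair.
    This pooling rule is an illustrative assumption, not the only option.
    """
    components = [agreement_components(a, b) for a, b in pairs]
    total_n = sum(n for _, _, n in components)
    p_o_bar = sum(p_o * n for p_o, _, n in components) / total_n
    p_e_bar = sum(p_e * n for _, p_e, n in components) / total_n
    if p_e_bar == 1.0:
        return float("nan")
    return (p_o_bar - p_e_bar) / (1.0 - p_e_bar)


# Hypothetical toy data: two rater pairs, one item, four persons each.
pair_1 = (["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
pair_2 = (["yes", "no", "yes", "no"], ["yes", "no", "yes", "no"])

per_pair = [cohen_kappa(a, b) for a, b in (pair_1, pair_2)]
overall = pooled_kappa([pair_1, pair_2])
print(per_pair, overall)
```

Pooling the agreement components rather than averaging the per-pair kappas keeps pairs with very few rated persons, whose kappa can be unstable or undefined, from dominating the result. Whether this weighting is appropriate here is itself part of the question.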