I would like to test the reliability of a new questionnaire assessing global physical function in elderly patients. The questionnaire consists of 20 items, each with a 5-point (0-4) Likert response, and all responses are oriented in the same direction. There are five subdomains. To assess reliability, I would like to use the weighted kappa statistic, because our group considers the questionnaire responses to be ordinal data.
In pretesting, 47 patients completed a test and a retest of the questionnaire, with an interval of 2 weeks between them. I have calculated an individual weighted kappa statistic for each patient's test-retest pair. At this point I am unclear on how to proceed in demonstrating good or bad reliability of the questionnaire.
Firstly, is it valid to take the mean of all the per-patient weighted kappa scores and report that? Or is it preferable to obtain the mean scores of all patients first and then compute the weighted kappa?
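To make the first option concrete, here is a minimal sketch of the calculation as I understand it. It assumes the data are stored as one list of 20 item scores (0-4) per patient for each occasion. The function names, the choice of linear weights as the default, and the alternative per-item pairing (one kappa per item, computed across patients) are my own illustrative assumptions, not an established procedure.

```python
import math

def weighted_kappa(a, b, n_categories=5, weights="linear"):
    """Cohen's weighted kappa for two equal-length lists of ratings in 0..n_categories-1."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n, k = len(a), n_categories
    # Cross-tabulate the first rating (rows) against the second rating (columns).
    obs = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x][y] += 1
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight: |i - j| (linear) or (i - j)^2 (quadratic).
            w = abs(i - j) if weights == "linear" else (i - j) ** 2
            num += w * obs[i][j]              # observed weighted disagreement
            den += w * row[i] * col[j] / n    # disagreement expected by chance
    # Kappa is undefined when no disagreement is expected by chance,
    # e.g. when every rating on both occasions is the same category.
    return float("nan") if den == 0 else 1.0 - num / den

def per_patient_kappas(test, retest):
    """One kappa per patient, computed across that patient's items."""
    return [weighted_kappa(t, r) for t, r in zip(test, retest)]

def per_item_kappas(test, retest):
    """One kappa per item, computed across patients (an alternative pairing)."""
    n_items = len(test[0])
    return [
        weighted_kappa([t[j] for t in test], [r[j] for r in retest])
        for j in range(n_items)
    ]

def mean_defined(values):
    """Mean of the values that are not NaN; NaN if none are defined."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else float("nan")
```

With `test` and `retest` each holding 47 lists of 20 scores, `mean_defined(per_patient_kappas(test, retest))` would be the mean of the per-patient kappas I describe above, while `per_item_kappas(test, retest)` gives 20 item-level kappas instead.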
Secondly, I note that in the literature the intraclass correlation coefficient (ICC) is often used instead of kappa, even for Likert-type questionnaires. Could I be pointed toward the reasoning behind this?
Thank you.