How to obtain an overall weighted kappa for test-retest reliability to validate a novel questionnaire? Am I doing it correctly?

11 November 2019

I would like to test the reliability of a new questionnaire assessing global physical function in elderly patients. The questionnaire consists of 20 items, each with a 5-point (0-4) Likert response, and all responses are oriented in the same direction. There are five subdomains. To test reliability I would like to use the weighted kappa statistic, as our group considers the questionnaire data to be ordinal.

In the pretesting we have 47 patients who have completed a test and a retest of the questionnaire with an interval of 2 weeks. I have calculated an individual weighted kappa statistic for each patient's test-retest pair. At this point I am unclear on how to proceed with demonstrating good or bad reliability of the questionnaire.

Firstly, is it valid to obtain the mean of all the weighted kappa scores and use this? Or is it preferred to obtain the mean scores of all patients first and then perform the weighted kappa?
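To make the first option concrete, here is a minimal sketch of a per-patient weighted kappa followed by the mean across patients. It assumes responses are coded 0-4 and uses linear disagreement weights by default. The function name `weighted_kappa`, the `patients` dictionary, and the example responses are hypothetical and only illustrate the calculation; they are not real study data, and the sketch does not settle whether averaging kappas is appropriate.

```python
def weighted_kappa(test, retest, n_categories=5, weights="linear"):
    """Cohen's weighted kappa for two equal-length lists of ratings coded 0..n_categories-1."""
    if len(test) != len(retest) or len(test) == 0:
        raise ValueError("test and retest must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(test)
    k = n_categories
    # Observed joint proportions: rows = test category, columns = retest category.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(test, retest):
        if not (0 <= a < k and 0 <= b < k):
            raise ValueError("ratings must lie in 0..n_categories-1")
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight: 0 on the diagonal, 1 for the most distant categories.
            w = (abs(i - j) / (k - 1)) ** power
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    if weighted_expected == 0.0:
        # All ratings fall in a single category, so kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical responses (20 items each) for two patients: (test, retest).
patients = {
    "patient_01": ([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
                   [0, 1, 2, 3, 4, 0, 1, 2, 3, 3, 0, 1, 2, 3, 4, 1, 1, 2, 3, 4]),
    "patient_02": ([2, 2, 3, 1, 0, 4, 3, 2, 2, 1, 0, 0, 1, 3, 4, 4, 2, 1, 3, 2],
                   [2, 3, 3, 1, 1, 4, 2, 2, 2, 1, 0, 1, 1, 3, 3, 4, 2, 1, 2, 2]),
}

kappas = {name: weighted_kappa(t, r) for name, (t, r) in patients.items()}
mean_kappa = sum(kappas.values()) / len(kappas)
print(kappas, mean_kappa)
```

Passing `weights="quadratic"` switches to squared distance weights, which penalise large test-retest shifts more heavily than small ones.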

Secondly, I note that in the literature the intraclass correlation coefficient (ICC) is often used instead of kappa, even for Likert-type questionnaires. Could someone point me toward the reasoning behind this?
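For context, a minimal sketch of how an intraclass correlation coefficient could be computed on total questionnaire scores. It assumes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1). The function name `icc_2_1` and the example totals are hypothetical, and other ICC forms would give different values.

```python
def icc_2_1(scores):
    """ICC(2,1) from a list of rows, one row per patient, one column per occasion."""
    n = len(scores)        # number of patients
    k = len(scores[0])     # number of occasions (2 for test and retest)
    if n < 2 or k < 2 or any(len(r) != k for r in scores):
        raise ValueError("need at least 2 patients and 2 occasions, with complete rows")
    grand = sum(sum(r) for r in scores) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Raises ZeroDivisionError if every score is identical (no variance to partition).
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical total scores (possible range 0-80) for five patients: [test, retest].
totals = [[42, 45], [60, 58], [31, 35], [75, 74], [50, 49]]
print(icc_2_1(totals))
```

Because the absolute-agreement form includes the between-occasion variance in the denominator, a systematic shift between test and retest lowers the coefficient even when patients keep the same rank order.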

Thank you.

Christopher Koch

Weighted kappa is used to assess inter-rater reliability. That does not seem to be the best option for a questionnaire. You might try this article for determining reliability with ordinal scales.

Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide (Gadermann and Zumbo, 2012)

https://pareonline.net/getvn.asp?v=17&n=3

Elisabeth Svensson

If you want to perform good statistics, you should take into account the fact that you have ordered categorical data. This means that you should use rank-based statistical methods.
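As a rough illustration of what a rank-based look at paired ordinal data can involve, the sketch below computes two quantities for a single item: the percentage of patients giving the identical response on both occasions, and the difference between the estimated probabilities P(test < retest) and P(retest < test) taken from the two marginal distributions. The second quantity is intended to correspond to a relative-position type measure of systematic disagreement, but that correspondence is an assumption and should be checked against the guidelines listed below. The function names and the data are hypothetical.

```python
def percentage_agreement(test, retest):
    """Percentage of pairs with the identical category on both occasions."""
    if len(test) != len(retest) or len(test) == 0:
        raise ValueError("test and retest must be non-empty and of equal length")
    same = sum(1 for a, b in zip(test, retest) if a == b)
    return 100.0 * same / len(test)


def relative_position(test, retest, n_categories=5):
    """P(X < Y) - P(Y < X) for independent draws X, Y from the two marginal distributions."""
    if len(test) != len(retest) or len(test) == 0:
        raise ValueError("test and retest must be non-empty and of equal length")
    n = len(test)
    k = n_categories
    x = [0] * k  # marginal frequencies at test
    y = [0] * k  # marginal frequencies at retest
    for a, b in zip(test, retest):
        x[a] += 1
        y[b] += 1
    p_xy = 0  # count of (X, Y) combinations with X in a lower category than Y
    p_yx = 0  # count of (X, Y) combinations with Y in a lower category than X
    cum_x = 0  # cumulative test frequency strictly below category i
    cum_y = 0  # cumulative retest frequency strictly below category i
    for i in range(k):
        p_xy += y[i] * cum_x
        p_yx += x[i] * cum_y
        cum_x += x[i]
        cum_y += y[i]
    return (p_xy - p_yx) / (n * n)


# Hypothetical responses of eight patients to one item (coded 0-4).
item_test = [0, 1, 1, 2, 2, 3, 3, 4]
item_retest = [0, 1, 2, 2, 3, 3, 4, 4]
print(percentage_agreement(item_test, item_retest))
print(relative_position(item_test, item_retest))
```

A positive value of the second quantity indicates that retest responses tend toward higher categories than test responses, and a value near zero indicates no systematic shift between the two marginal distributions.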

Svensson E. Guidelines to statistical evaluation of data from rating scales and questionnaires. Journal of Rehabilitation Medicine 2001;33(1):47-8. doi:10.1080/165019701300006542

Svensson E, Avdic A. Guidelines to calculation by the free software and interpretation of the measures of disagreement applied to reliability studies [homepage on the Internet]. Available from http://www.oru.se/hh/Elisabeth_Svensson/Svenssons_metod.

Svensson E. A coefficient of agreement adjusted for bias in paired ordered categorical data. Biometrical Journal 1997;39:643-57.

Svensson, E. Application of a rank-invariant method to evaluate reliability of ordered categorical assessments. Journal of Epidemiology and Biostatistics, 1998; 3 (4):403-409.

Svensson E. Statistical methods for repeated qualitative assessments on scales. Int J Audiol. 2003; 42 Suppl 1:13-22.

Svensson E, Schillberg B, Kling AM, Nyström B. Reliability of the Balanced Inventory for Spinal Disorders, a questionnaire for evaluation of outcomes in patients with various spinal disorders. Journal of Spinal Disorders & Techniques 2012; 25:196-204. doi:10.1097/BSD.0b013e31821534da.

Gosman-Hedström G, Svensson E. Parallel reliability of the Functional Independence Measure and the Barthel ADL index. Disability and Rehabilitation 2000; 22 (16), 702-15.

Svensson E. Different ranking approaches defining association and agreement measures of paired ordinal data. Statistics in Medicine, 2012;31:3104-3117. (wileyonlinelibrary.com) DOI:10.1002/sim.5382.

Allvin R., Ehnfors M., Rawal N, Svensson E, Idvall E. Development of a questionnaire to measure patient-reported postoperative recovery: content validity and intra-patient reliability. Journal of Evaluation in Clinical Practice 2009;15(3):411-419. doi: 10.1111/j.1365-2753.2008.01027
