I am examining the test–retest reliability of a questionnaire that only has categorical variables. As a result I will be using kappa statistics, but I am unclear how to determine the sample size. Any suggestions?
Kappa is a slightly tricky case for sample size calculation. Unlike, say, a t-test, there is no standard null hypothesis value. To test that kappa is greater than zero is a bit of nonsense – why would two raters who know anything at all about the topic not agree better than chance?
So you have to start by defining a level of kappa that is your 'null hypothesis' level. You might decide that a kappa of 0·4 was the upper limit of what would be unacceptably low.
You then have to define what would be the minimum acceptable level of agreement. You might decide that a kappa of 0·75 was the lowest value that would indicate "substantial" agreement.
Finally, you need to have an idea of the prevalence of the feature being rated.
Because there are so many potential scenarios, it's hard to give even a rough estimate of the required sample size.
You can use R to calculate sample sizes with the N.cohen.kappa function in the irr package.
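If it helps to see what drives the numbers, here is a Monte Carlo sketch of the same logic in Python, under the simplifying assumptions of binary ratings and equal marginal rates for both raters. This is not necessarily the algorithm N.cohen.kappa uses internally; the function names and the scenario (null kappa 0.4, alternative 0.75, prevalence 0.3) are illustrative.

```python
import random

def simulate_kappa(n, true_kappa, prevalence, rng):
    """Draw n paired binary ratings with the given true kappa and
    prevalence (equal marginals for both raters), then return the
    sample Cohen's kappa."""
    p, q = prevalence, 1.0 - prevalence
    # Cell probabilities for a 2x2 table with equal marginals p:
    #   P(yes,yes) = p^2 + kappa*p*q, P(no,no) = q^2 + kappa*p*q,
    #   each discordant cell = (1 - kappa)*p*q
    p_yy = p * p + true_kappa * p * q
    p_d = (1.0 - true_kappa) * p * q
    a = b = c = d = 0
    for _ in range(n):
        u = rng.random()
        if u < p_yy:
            a += 1                         # both say yes
        elif u < p_yy + p_d:
            b += 1                         # rater 1 yes, rater 2 no
        elif u < p_yy + 2 * p_d:
            c += 1                         # rater 1 no, rater 2 yes
        else:
            d += 1                         # both say no
    po = (a + d) / n                       # observed agreement
    m1 = (a + b) / n                       # rater 1 marginal "yes"
    m2 = (a + c) / n                       # rater 2 marginal "yes"
    pe = m1 * m2 + (1 - m1) * (1 - m2)     # chance agreement
    return (po - pe) / (1 - pe) if pe < 1 else 0.0

def power_for_n(n, k0, k1, prevalence, trials=2000, alpha=0.05, seed=1):
    """Monte Carlo power: the critical value is the (1 - alpha)
    quantile of sample kappas simulated under H0 (kappa = k0); power
    is the share of kappas simulated under kappa = k1 that exceed it."""
    rng = random.Random(seed)
    null = sorted(simulate_kappa(n, k0, prevalence, rng) for _ in range(trials))
    crit = null[int((1 - alpha) * trials)]
    alt = [simulate_kappa(n, k1, prevalence, rng) for _ in range(trials)]
    return sum(k > crit for k in alt) / trials

# Power to distinguish kappa = 0.75 from kappa = 0.4 at prevalence 0.3
for n in (25, 50, 100):
    print(n, round(power_for_n(n, 0.4, 0.75, 0.3), 2))
```

Running the loop over candidate sample sizes shows how power climbs with n; you would pick the smallest n whose estimated power reaches your target (commonly 0.8). It also makes the prevalence point concrete: rerun with a rarer feature and the same n buys you less power.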
For all kappa-like agreement coefficients, the required sample size n depends on the relative error r and on the difference Pa − Pe between the overall agreement probability Pa and the chance-agreement probability Pe, where:
n is the required sample size,
N is the population size,
r is the relative precision.
The method is described in the reference below.
Ref: Inter-Rater Reliability Discussion Corner by Kilem L. Gwet, Ph.D.
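For concreteness, here is how the quantities named above (Pa, Pe, and kappa itself) are computed for two raters on a binary item; the counts in the table are hypothetical.

```python
# Overall agreement Pa, chance agreement Pe, and Cohen's kappa
# for two raters on a binary item. The counts are made up for
# illustration.
#              Rater 2: yes  no
table = [[40, 9],    # Rater 1: yes
         [6, 45]]    # Rater 1: no

n = sum(sum(row) for row in table)
pa = (table[0][0] + table[1][1]) / n                 # Pa: observed agreement
r1_yes = (table[0][0] + table[0][1]) / n             # rater 1 marginal "yes"
r2_yes = (table[0][0] + table[1][0]) / n             # rater 2 marginal "yes"
pe = r1_yes * r2_yes + (1 - r1_yes) * (1 - r2_yes)   # Pe: chance agreement
kappa = (pa - pe) / (1 - pe)
print(round(pa, 3), round(pe, 3), round(kappa, 3))   # → 0.85 0.501 0.7
```

The Pa − Pe gap in the numerator is exactly the quantity the sample-size formulas above depend on: the closer observed agreement sits to chance agreement, the more subjects you need for a given relative precision.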
Thank you so much for the responses. I was pretty sure it would be difficult to figure out a sample size; I just need to be able to defend my choices. My study is trying to create a questionnaire that examines whether people with chronic obstructive pulmonary disease are engaging in personal disaster preparedness. While it would be interesting to know, on a scale of one to five, how important having three days of food and water is, what I really want to know is whether they actually have three days of food and water, and that is the categorical-variable snag. So, to examine test–retest reliability I need to use kappa, and finding the sample size needed to show that my test–retest reliability holds up is going to be fun.