I'm using the kappa statistic to assess the test-retest reliability of a questionnaire that consists entirely of categorical data. I'm wondering whether I should be calculating confidence intervals and, if so, how do I go about doing that?
As far as I know, there are no tests of significance for kappa, which is why people typically use a cut-off value to determine whether the degree of agreement is "adequate."
The kappa function in the free psych package for R (http://www.personality-project.org/r/html/kappa.html) also provides an estimate of the confidence interval for kappa. Alternatively, you can always construct a bootstrap interval.
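To illustrate the bootstrap approach: resample subjects (rating pairs) with replacement, recompute kappa on each resample, and take percentiles of the resulting distribution. Here is a minimal sketch in plain Python (no external packages); the helper names `cohen_kappa` and `bootstrap_kappa_ci` are just for this example, and in practice you would use an established implementation such as the one in psych.

```python
import random
from collections import Counter

def cohen_kappa(r1, r2):
    """Cohen's kappa for two paired lists of categorical ratings."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n      # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # chance agreement from each rater's marginal category frequencies
    p_e = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / n**2
    if p_e == 1:          # degenerate resample: a single category everywhere
        return 1.0
    return (p_o - p_e) / (1 - p_e)

def bootstrap_kappa_ci(r1, r2, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI: resample subjects (pairs) with replacement."""
    rng = random.Random(seed)
    n = len(r1)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohen_kappa([r1[i] for i in idx],
                                 [r2[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical test-retest data: 'y'/'n' answers from the same respondents.
test   = ['y','n','y','y','n','y','n','n','y','y',
          'n','y','y','n','y','n','y','n','n','y']
retest = ['y','n','n','y','n','y','n','y','y','y',
          'n','y','n','n','y','n','y','n','y','y']
lo, hi = bootstrap_kappa_ci(test, retest)
```

With a test-retest design the unit of resampling is the respondent, so each draw keeps the (test, retest) pair together; resampling the two columns independently would destroy the agreement structure you are trying to measure.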
There is a good explanation in Altman, D.G. et al. (2000), Statistics with Confidence, 2nd ed. Bristol: BMJ Books (which is rather cheap). It cites the formula for the kappa confidence limits and works through an example.
Along with the book comes an excellent program which can calculate all sorts of confidence limits, including for Kappa.
As some people have problems with installing this now quite old program, you can contact me if needed.
I dug a little deeper, and it turns out that there is an exact confidence interval for kappa, as opposed to the approximation originally proposed by Cohen -- although his approximation is often good enough.
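For reference, the simple large-sample approximation from Cohen (1960) uses the standard error SE = sqrt(p_o(1 - p_o) / (n(1 - p_e)^2)), giving a Wald interval kappa ± z·SE. A minimal sketch (the function name `kappa_wald_ci` is mine; note that Fleiss, Cohen and Everitt (1969) give a more accurate standard-error formula):

```python
import math
from collections import Counter

def kappa_wald_ci(r1, r2, z=1.96):
    """Approximate large-sample CI for Cohen's kappa.

    Uses the simple standard-error formula from Cohen (1960):
        SE = sqrt( p_o * (1 - p_o) / (n * (1 - p_e)**2) )
    """
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n      # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa - z * se, kappa + z * se

# Hypothetical test-retest data, as before.
test   = ['y','n','y','y','n','y','n','n','y','y',
          'n','y','y','n','y','n','y','n','n','y']
retest = ['y','n','n','y','n','y','n','y','y','y',
          'n','y','n','n','y','n','y','n','y','y']
lo, hi = kappa_wald_ci(test, retest)
```

Unlike an exact interval, a Wald interval can spill outside [-1, 1] for small samples or extreme kappa, which is one reason to prefer the exact or bootstrap approach when n is small.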
According to my search, the PASS statistical software will do the necessary calculations.
I'd go for either a bootstrap estimate or a Bayesian model of your problem, where kappa is a derived statistic computed for each posterior sample (but watch your priors!).