I'm attempting to code open-ended responses to a survey asking respondents to list their likes and dislikes of different modes of videogame play. I want to get a sense of the distribution of these likes and dislikes across the different modes of play. Each response is relatively short, from a few words to a few sentences, and each response can generate one or multiple codes. I'm attempting to assess agreement on the codes by comparing the codes generated by two raters on a random 10% sample, and I've then computed a kappa for each code. The problems I'm running into are these:
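For context, here's a minimal sketch of roughly how I'm computing the per-code kappas, assuming one binary (applied / not applied) column per code for each rater; the code names and data below are made up for illustration:

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score

codes = ["challenge", "social_play", "no_dislike"]  # hypothetical code names

# Hypothetical 10% sample: one row per response, 0/1 per code for each rater
rater1 = pd.DataFrame({"challenge":   [1, 0, 1, 0, 0],
                       "social_play": [0, 0, 1, 0, 1],
                       "no_dislike":  [0, 1, 0, 0, 0]})
rater2 = pd.DataFrame({"challenge":   [1, 0, 1, 1, 0],
                       "social_play": [0, 0, 1, 0, 1],
                       "no_dislike":  [0, 1, 0, 0, 0]})

for code in codes:
    # Cohen's kappa for this code, treating presence/absence as a binary rating
    print(code, cohen_kappa_score(rater1[code], rater2[code]))
```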
1. Some codes are so poorly represented in the sample that they generate either perfect agreement (a kappa of 1) or no agreement (see the toy example after this list).
2. Some codes aren't represented in the sample at all.
3. Some codes are so obvious that they generate a kappa of 1.
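To make problems 1 and 2 concrete, here's a toy illustration (with made-up binary presence/absence vectors) of the degenerate values a sparse code can produce:

```python
from sklearn.metrics import cohen_kappa_score

# A code applied to a single response, with both raters agreeing -> kappa = 1.0
print(cohen_kappa_score([0, 0, 0, 0, 1], [0, 0, 0, 0, 1]))

# The same single occurrence, but the raters disagree -> kappa = 0.0
print(cohen_kappa_score([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]))

# A code never applied by either rater -> kappa is undefined (returns nan)
print(cohen_kappa_score([0, 0, 0, 0, 0], [0, 0, 0, 0, 0]))
```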
My questions are:
1. Should I be concerned about perfect agreement when the code is blindingly obvious, e.g. 'no dislike'?
2. Can I generate a new random 10% sample and rate it only for the codes that were not significant or that produced perfect agreement due to poor representation? Or do I have to rate it for all the codes, including the ones that previously generated acceptable, non-controversial statistics?
I'm finding it hard to locate information on this kind of analysis and any help would be greatly appreciated.
Thanks!