I have been coding the likes and dislikes of playing videogames in different modes of play. After merging categories, dropping some that had too few cases, coding and recoding, and finally generating Kappas between two raters, I have some codes with non-significant Kappas, some that are significant but low (under .4), and some that are significant and high (over .8).
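To make concrete what a per-code Kappa means here, below is a minimal sketch of Cohen's kappa for a single code. It assumes each rater marks the code as present (1) or absent (0) for every response. The function name `cohens_kappa` and the sample ratings are made up for illustration and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return float("nan")

    return (observed - expected) / (1 - expected)


# Hypothetical ratings for one code across ten responses (1 = code applied).
rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
rater_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

With these made-up ratings the raters agree on 8 of 10 responses, but because chance agreement is already 0.52, the Kappa comes out at about 0.58. A code that is rarely applied can show high raw agreement and still produce a low Kappa.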

Two questions:

1. I would assume that the non-significant Kappas indicate that those codes need to be looked at again and possibly eliminated or merged where possible. Would the same hold true for codes that generated significant but low Kappas?

2. How would you treat the difference in agreement between the low-Kappa and high-Kappa codes when interpreting and applying them?


