I have 2 raters who are coding 197 entries. There are 6 codes, including NA (not enough information to code), and the codes are essentially categorical. To make it more complicated, each entry can have 1 or 2 codes; every code except NA can be paired with another code. For the paired codes, I would like to look at consistency, not just absolute agreement, because agreement on one of the two codes is better than agreement on neither.
I kept NA as a legitimate code rather than treating it as missing data because there were instances in which one rater coded an entry as NA while the other assigned a different code.
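To make concrete what I mean by partial credit, here is a minimal sketch, assuming each rater's codes for an entry are stored as a set of 1 or 2 labels. The Jaccard overlap is just one possible scoring choice, not something I have settled on, and the labels and data below are made up for illustration (they are not my real entries). Neither number is corrected for chance agreement.

```python
def jaccard(a, b):
    """Overlap between two code sets: 1.0 = identical, 0.0 = nothing shared."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Should not happen here (every entry has at least one code, NA included),
        # but treat two empty sets as full agreement to avoid dividing by zero.
        return 1.0
    return len(a & b) / len(union)


def agreement_summary(rater1, rater2):
    """Return (exact agreement rate, mean Jaccard overlap) across entries."""
    if len(rater1) != len(rater2):
        raise ValueError("Both raters must code the same number of entries")
    n = len(rater1)
    exact = sum(set(a) == set(b) for a, b in zip(rater1, rater2)) / n
    partial = sum(jaccard(a, b) for a, b in zip(rater1, rater2)) / n
    return exact, partial


# Hypothetical data: code labels "A".."D" plus "NA" are placeholders.
rater1 = [{"A"}, {"A", "B"}, {"NA"}, {"C", "D"}]
rater2 = [{"A"}, {"A", "C"}, {"B"}, {"C", "D"}]

exact, partial = agreement_summary(rater1, rater2)
print(f"exact agreement: {exact:.3f}")
print(f"mean partial agreement (Jaccard): {partial:.3f}")
```

In this toy example the second entry shares one code out of three distinct codes, so it earns partial credit of 1/3 instead of 0, while the NA-versus-other-code entry counts as a full disagreement because NA is kept as a real code.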