I am working on improving inter-rater reliability for a video coding project, and my advisor and I concluded that a weighted kappa would be the appropriate measure to use (raters watch video clips and rate levels of different attitudes/behaviors on a 1-5 scale, so the measure is ordinal). I installed the SPSS extension to calculate weighted kappa through point-and-click.
But when I checked it on an example, I got kappa values that were inconsistent with what I expected, and I am wondering whether the program is wrong or my understanding is off (and this measure may not be appropriate). I attached a picture showing the ratings from 3 raters (A, M, & E) on one variable, along with the kappa output I received for each pairwise comparison. When comparing the other raters to A, both M and E disagree with A only once, and each time by only 1 point. Yet the weighted kappas for the pairs involving A are different, apparently because the 1-point disagreement involves a higher or lower rating (4 vs. 3 in one pair, 2 vs. 3 in the other). Is this correct? Shouldn't weighted kappa treat all 1-point differences equally and only consider how many points apart the two ratings are (1, 2, or 3)?
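For reference, my understanding of linearly weighted kappa is the standard formula

$$\kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, p_{o,ij}}{\sum_{i,j} w_{ij}\, p_{e,ij}}, \qquad w_{ij} = \frac{|i-j|}{k-1},$$

where $p_{o,ij}$ are the observed cell proportions, $p_{e,ij}$ are the expected proportions based on each rater's marginal totals, and $k$ is the number of categories (5 here). The weights themselves depend only on how far apart the two ratings are, so I'm not sure whether the expected-agreement term is what explains the difference I'm seeing.

To sanity-check the SPSS extension, I also tried recomputing the pairwise kappas outside SPSS. Below is just a sketch with made-up ratings standing in for the actual values in my attachment, using scikit-learn's `cohen_kappa_score` with linear weights:

```python
# Sketch for cross-checking the SPSS weighted-kappa output.
# The rating vectors below are illustrative placeholders, NOT the
# actual data from my attachment.
from sklearn.metrics import cohen_kappa_score

# Hypothetical 1-5 ratings of the same 10 clips by three raters
rater_A = [3, 4, 2, 5, 3, 1, 4, 3, 2, 5]
rater_M = [3, 4, 2, 5, 3, 1, 4, 4, 2, 5]  # disagrees with A once (4 vs. 3)
rater_E = [3, 4, 2, 5, 3, 1, 4, 2, 2, 5]  # disagrees with A once (2 vs. 3)

# Passing labels ensures all five categories are counted even if a
# rater never used one of them
labels = [1, 2, 3, 4, 5]

for name, ratings in [("M", rater_M), ("E", rater_E)]:
    kw = cohen_kappa_score(rater_A, ratings, labels=labels, weights="linear")
    print(f"A vs. {name}: linear weighted kappa = {kw:.3f}")
```

Even in this toy setup, the two pairwise kappas need not be identical, which mirrors what I'm seeing in the SPSS output.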