I have a question about assessing the change in decision reliability before and after the introduction of a system to support decision-making.
I have planned a fully crossed study design in which 5 raters rate a set of indicators, each of which can take one of 3 categories (Peak, Trough, Normal). First, the set of indicators is rated and the inter-rater reliability is assessed in the current state (pre). Next, a digital system is introduced to support decision-making, and after a specific time period (4 weeks) the same set of indicators is rated again by the same 5 raters to determine the reliability in the post state.
To determine the reliability at the pre and post states, I have specifically chosen Cohen's Kappa, since the data are nominal. As advised by other researchers, and since the raters are not assigned randomly (hence Fleiss' Kappa was ruled out), the final reliability for each state will be determined by forming all pairs of the 5 raters and taking the mean Kappa across those pairs.
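To make the setup concrete, here is a minimal Python sketch of what I mean by the mean pairwise Kappa for one state; the function name `mean_pairwise_kappa` and the small example data are just made up for illustration:

```python
from itertools import combinations
import numpy as np
from sklearn.metrics import cohen_kappa_score

def mean_pairwise_kappa(ratings):
    """Mean Cohen's kappa over all rater pairs.

    ratings: array-like of shape (n_items, n_raters) holding the
    nominal labels ("Peak", "Trough", "Normal") for one state.
    """
    ratings = np.asarray(ratings)
    kappas = [
        cohen_kappa_score(ratings[:, i], ratings[:, j])
        for i, j in combinations(range(ratings.shape[1]), 2)
    ]
    return float(np.mean(kappas))

# Hypothetical example: 4 indicators rated by the 5 raters (pre state).
pre = [
    ["Peak",   "Peak",   "Peak",   "Normal", "Peak"],
    ["Trough", "Trough", "Normal", "Trough", "Trough"],
    ["Normal", "Normal", "Normal", "Normal", "Normal"],
    ["Peak",   "Trough", "Peak",   "Peak",   "Normal"],
]
print(mean_pairwise_kappa(pre))  # one multi-rater Kappa for the pre state
```

The same computation would be repeated on the post-state ratings, giving the two scores I want to compare.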
Hence, the proposed study design will yield one multi-rater Kappa score at each of the two time points. However, I am unsure which test could be used to assess whether the two Kappa scores are significantly different from each other. Any thoughts would be very helpful for my research.
Thank you very much!