I am currently working on my MA dissertation and am required to code my data, but I don't have other coders to ensure inter-rater reliability (due to time constraints). As Mackey and Gass (2005) suggest, I coded the data at two different times (Time 1 and Time 2) to check intra-rater reliability; however, the results at Time 1 and Time 2 were slightly different. With multiple coders, they could discuss the disagreements in their coding and settle on one definitive set of coded data. As I am the only researcher, and negotiation with other coders isn't possible, how can I decide which coding to use in my research? Thank you.
Additional info: I am doing research in (corpus) linguistics, specifically on how writers express doubt in their research papers, by counting how many times, for example, the modal verb "may" appears in their texts. Since "may" can have meanings other than expressing doubt (e.g. permission, as in "You may go now"), I need to exclude the instances that do not express uncertainty. I have tried converting the codes into categorical data (e.g. 1 for an expression of doubt and 0 for a non-expression of doubt), and I am thinking of using Cohen's Kappa to test the reliability of my coding between Time 1 and Time 2. Perhaps I could then resolve the few differences between the two times by asking other people to help me decide on the definitive set of data to use.
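To make the idea concrete, here is a rough sketch in Python of what I have in mind: Cohen's Kappa computed by hand on the 0/1 codes from the two times, plus a list of the items where the two codings differ (these are the ones I would ask someone else to judge). The function names `cohens_kappa` and `disagreements` are my own, and the two lists are made-up illustration data, not my real codes.

```python
from collections import Counter


def cohens_kappa(time1, time2):
    """Cohen's Kappa for two codings of the same items (any category labels)."""
    if len(time1) != len(time2) or len(time1) == 0:
        raise ValueError("Both codings must be non-empty and of equal length.")
    n = len(time1)
    # Observed agreement: share of items given the same code both times.
    observed = sum(a == b for a, b in zip(time1, time2)) / n
    # Chance agreement: from the marginal frequency of each category.
    counts1, counts2 = Counter(time1), Counter(time2)
    categories = set(counts1) | set(counts2)
    expected = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Only one category was ever used, so Kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


def disagreements(time1, time2):
    """Zero-based positions of items coded differently at the two times."""
    return [i for i, (a, b) in enumerate(zip(time1, time2)) if a != b]


# Made-up example: 1 = "may" expresses doubt, 0 = it does not.
time1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
time2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

print("kappa:", round(cohens_kappa(time1, time2), 3))
print("items to re-check:", disagreements(time1, time2))
```

Listing the disagreeing items separately means a second person would only need to look at those few instances instead of the whole data set.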