I'm developing a questionnaire to assess observational data in a research area that is notoriously prone to low inter-rater reliability. I'm therefore looking for questionnaire design factors that are generally known to improve inter-rater agreement. Unfortunately, I cannot interact with the raters, either in person or verbally, before data collection, so methods such as Frame of Reference (FOR) training are ruled out in my case.
My initial choice of response format would be a behaviorally anchored rating scale, but previous research has shown that such scales do not improve rater agreement sufficiently. I don't want to open a discussion on idiosyncratic rater variation, which will remain a problem either way. Instead, I'd like to focus on possible improvements in response format and scale construction. Can anyone point me to relevant research on this topic?
Many thanks!