In trying to establish inter-rater reliability, what are the appropriate statistics to be employed for a tool with a 5-point Likert rating scale?

Ronán Michael Conroy

Kappa with inverse-square weights gives you the same answer as the type 1 ICC, which I think is probably the best option simply because it's hard to articulate an interpretation for a given value of Kendall's W (though I would love to hear people try!).

Jochen Wilhelm

Another option for ordinal data is the rank-correlation (Spearman).
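A minimal sketch of the rank-correlation idea for two raters, in plain Python. The ratings and the helper names (`average_ranks`, `pearson`, `spearman`) are made up for illustration. Tied ratings, which are unavoidable on a 5-point scale, receive the mean of the ranks they span.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical ratings of eight subjects by two raters on a 5-point scale.
rater_a = [1, 2, 3, 4, 5, 3, 2, 4]
rater_b = [2, 2, 3, 5, 5, 3, 1, 4]
print(round(spearman(rater_a, rater_b), 3))
```

Note that this measures consistency of ordering, not absolute agreement: a rater who always scores exactly one point higher than the other would still produce a correlation of 1.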

I also suggest making a graphical representation of the results and interpreting the picture. Examples are heatmaps or scatterplots (see attachments).

Note: the symbol size in the scatterplot and the color intensity in the heatmap indicate the number of corresponding ratings.
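The counts behind such a heatmap or sized scatterplot are simply a cross-tabulation of the two raters' scores. A small sketch with made-up ratings; the `crosstab` helper and the `CATEGORIES` constant are hypothetical names, not taken from the attachments:

```python
from collections import Counter

CATEGORIES = (1, 2, 3, 4, 5)  # the five Likert response options

def crosstab(rater_a, rater_b, categories=CATEGORIES):
    """Return a square table: rows are rater A's score, columns rater B's."""
    counts = Counter(zip(rater_a, rater_b))
    return [[counts[(a, b)] for b in categories] for a in categories]

# Hypothetical ratings of eight subjects by two raters.
rater_a = [1, 2, 3, 4, 5, 3, 2, 4]
rater_b = [2, 2, 3, 5, 5, 3, 1, 4]

table = crosstab(rater_a, rater_b)
for category, row in zip(CATEGORIES, table):
    print(category, row)  # one line per score given by rater A
```

Each cell count would drive the color intensity (heatmap) or symbol size (scatterplot). Counts on the diagonal are exact agreements, and counts far from it are the large disagreements.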

Robert James McClelland

Cohen's Kappa may suit?

Cohen's kappa (1960) is one of the most widely used indices of rater agreement because it corrects for chance agreement. It usually gives a value between 0 and 1, with 0 indicating agreement no better than chance and 1 indicating perfect agreement (negative values are possible when agreement is worse than chance). Clear benchmarks for the strength of agreement have been constructed to aid communicability. However, it is susceptible to marginal homogeneity.

There is also Gwet's AC1 (2008), an alternative statistic. Like kappa, it takes a value between 0 and 1, and it is recommended that the same benchmark scales as those for kappa be used to communicate levels of agreement. It is not sensitive to marginal homogeneity and is positively biased for trait prevalence. It can be extended to multiple raters, it can deal with both nominal and ordinal data, and it can deal with missing data.
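To make the difference between the two statistics concrete, here is a rough sketch of the unweighted, two-rater versions of both. The function name `kappa_and_ac1` and the data are invented for illustration. Both statistics share the form (p_o - p_e) / (1 - p_e) and differ only in how the chance-agreement term p_e is computed.

```python
from collections import Counter

def kappa_and_ac1(rater_a, rater_b, categories):
    """Unweighted Cohen's kappa and Gwet's AC1 for two raters."""
    n = len(rater_a)
    q = len(categories)
    # Observed agreement: share of subjects given identical ratings.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_a = [freq_a[c] / n for c in categories]
    p_b = [freq_b[c] / n for c in categories]
    # Kappa: chance agreement from the product of the raters' marginals.
    pe_kappa = sum(x * y for x, y in zip(p_a, p_b))
    # AC1: chance agreement from the average marginal of each category.
    pi = [(x + y) / 2 for x, y in zip(p_a, p_b)]
    pe_ac1 = sum(p * (1 - p) for p in pi) / (q - 1)
    kappa = (p_o - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1

# Hypothetical skewed data: each rater puts nine of ten subjects in
# category 5, and the raters agree on eight of the ten subjects.
rater_a = [5, 5, 5, 5, 5, 5, 5, 5, 5, 1]
rater_b = [5, 5, 5, 5, 5, 5, 5, 5, 1, 5]
kappa, ac1 = kappa_and_ac1(rater_a, rater_b, categories=(1, 2, 3, 4, 5))
print(round(kappa, 3), round(ac1, 3))
```

With ratings this concentrated in one category, kappa drops below zero despite 80% raw agreement while AC1 stays high, which is the sensitivity to the marginal distributions discussed above. Both versions here treat every disagreement as equally serious, so they ignore the ordering of the Likert categories.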

Eve B Carlson

I think what kind of data you are analyzing might matter here. What kinds of items? What construct? What makes most sense might vary depending on whether the ratings are opinions that could reasonably vary or whether there is some known value that all raters should agree upon. You might be setting too high a bar if you use a statistic like Kappa, depending on what is being measured.

Paul E. Spector

The following paper might be helpful:

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428. doi: http://dx.doi.org/10.1037/0033-2909.86.2.420
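As a rough illustration of the simplest case in that paper, the one-way, single-rater intraclass correlation (usually written ICC(1,1)) can be computed from the between-subject and within-subject mean squares of a one-way ANOVA. The ratings and the function name `icc_1_1` below are made up for the example:

```python
def icc_1_1(ratings):
    """One-way random-effects, single-rater ICC.

    ratings: one row per rated subject, each row holding k raters' scores.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of ratings per subject
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-subjects mean square (BMS).
    bms = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-subjects mean square (WMS).
    wms = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)

# Hypothetical 5-point ratings: five subjects, each scored by three raters.
ratings = [
    [4, 4, 5],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
print(round(icc_1_1(ratings), 3))
```

This form treats raters as interchangeable and penalises systematic differences between them. The two-way forms in the same paper separate out a rater effect and need a slightly longer calculation.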

Sravanti Ghosh

My readings suggest that Cohen's kappa gives the level of agreement between raters for nominal ratings, and that it may be limited to two raters. What about cases with multiple raters, and Likert rating scale items that are treated as interval or ratio data? I am primarily interested in the degree of consistency in ratings per item across the raters. Kendall's coefficient of concordance was suggested to me. Is that appropriate? Please help ...
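For reference, a minimal sketch of Kendall's coefficient of concordance (W) for several raters, using the usual correction for tied ranks, which matters a great deal with only five response options. The data and function names are hypothetical:

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kendalls_w(ratings_by_rater):
    """Kendall's W with tie correction.

    ratings_by_rater: one list per rater, each scoring the same n items.
    Undefined (division by zero) if every rater gives a constant rating.
    """
    m = len(ratings_by_rater)       # number of raters
    n = len(ratings_by_rater[0])    # number of items rated
    rank_rows = [average_ranks(r) for r in ratings_by_rater]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over each group of tied ratings.
    ties = sum(
        t ** 3 - t for r in ratings_by_rater for t in Counter(r).values()
    )
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)

# Hypothetical data: three raters scoring the same six items on a 1-5 scale.
ratings_by_rater = [
    [1, 2, 3, 4, 5, 3],
    [2, 2, 3, 5, 5, 3],
    [1, 3, 3, 4, 4, 2],
]
print(round(kendalls_w(ratings_by_rater), 3))
```

W runs from 0 (no concordance) to 1 (identical rankings). It summarises how similarly the raters order the whole set of items, so it gives one value for the set, not a separate agreement figure for each item.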

Your readings, then, don't include the calculation of weighted kappa. Inverse-square weighting is commonly used, as it greatly increases the penalties for extreme disagreements and gives a statistic that can be interpreted as an intraclass correlation.
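A sketch of weighted kappa for two raters, assuming that "inverse-square" weighting here means the usual quadratic scheme, in which a disagreement of d categories on a k-point scale is penalised by (d / (k - 1))^2. The function name, the `power` parameter and the data are illustrative assumptions:

```python
def weighted_kappa(rater_a, rater_b, categories, power=2):
    """Weighted kappa for two raters on an ordered scale.

    power=2 gives quadratic disagreement weights, power=1 linear ones.
    """
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    # Observed proportions for every pair of categories.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal
            weighted_observed += w * observed[i][j]
            # Expected proportion if the raters were independent.
            weighted_expected += w * row[i] * col[j]
    return 1 - weighted_observed / weighted_expected

# Hypothetical ratings with one extreme disagreement (1 versus 5).
rater_a = [1, 2, 3, 4, 5, 1]
rater_b = [1, 2, 3, 4, 5, 5]
scale = (1, 2, 3, 4, 5)
print(round(weighted_kappa(rater_a, rater_b, scale, power=2), 3))
print(round(weighted_kappa(rater_a, rater_b, scale, power=1), 3))
```

In this example the only disagreement is the most extreme one possible, and the quadratic version comes out lower than the linear one, which reflects the heavier penalty for extreme disagreements.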

Thank you, João Tiago Oliveira; that's one of the papers I have read in this respect.