For a multilevel analysis, we need to aggregate a lower-level variable to a higher-level variable. What are the cutoff points for the intraclass correlations ICC(1) and ICC(2)? Is there any agreement on these cutoffs?
LeBreton and Senter (2008) suggested that an ICC(1) = .05 represents a small to medium effect (p. 838). Bliese (1998) simulated conditions in which only 1% of the variance was attributable to group membership (ICC(1) = .01) and, still, strong group-level relationships were detected that were not evident in the lower-level data.
For assessing the reliability of group-level means, ICC(2) values above 0.75 are considered excellent (Fleiss, 1986).
References:
Bliese, P. D. (1998). Group size, ICC values, and group-level correlations: A simulation. Organizational Research Methods, 1, 355–373.
Fleiss, J. (1986). The Design and Analysis of Clinical Experiments. Wiley, New York.
James, L. R., Demaree, R. G., & Wolf, G. (1984). Estimating within-group interrater reliability with and without response bias. Journal of Applied Psychology, 69, 85–98.
LeBreton, J. M., & Senter, J. L. (2008). Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods, 11, 815–852.
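The ICC(1) and ICC(2) values discussed above can be estimated from a one-way random-effects ANOVA. A minimal sketch follows (the input layout is hypothetical; for unbalanced designs Bliese uses an adjusted group size, while this sketch simply uses the average):

```python
def icc_from_groups(groups):
    """Estimate ICC(1) and ICC(2) from a list of per-group score lists.

    ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW), with k the (average)
    group size; ICC(2) = (MSB - MSW) / MSB.
    """
    n_groups = len(groups)
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    grand_mean = sum(all_scores) / n_total

    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    k = n_total / n_groups  # average group size (simplification)
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2
```

Equivalent estimates are available ready-made, e.g. via the `ICC1` and `ICC2` functions in the R `multilevel` package.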
This paper (sadly not yet complete!) discusses some of these issues, especially in relation to measurement error in aggregated variables. I know of no rule of thumb; educational studies typically have about 10 percent of the variation at the higher level and do use aggregate variables.
The paper is on ResearchGate:
Do multilevel models ever give different results?
Yes. The intraclass correlations ICC(1) and ICC(2) vary a lot depending on group size (the number of people who assess the higher-level variables). That is, with a small group it is very difficult to obtain relatively good values. Is there any justification for this situation?
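On the group-size dependence: ICC(2) is the reliability of the group mean, and it is tied to ICC(1) through the Spearman–Brown formula, ICC(2) = k·ICC(1) / (1 + (k − 1)·ICC(1)), where k is group size. A small sketch (the ICC(1) value of .10 is purely illustrative):

```python
def icc2_from_icc1(icc1, k):
    """Reliability of the k-person group mean via Spearman-Brown."""
    return k * icc1 / (1 + (k - 1) * icc1)

# With ICC(1) = .10, five raters yield a weak group-mean reliability,
# while fifty raters yield a strong one.
small_group = icc2_from_icc1(0.10, 5)   # about .36
large_group = icc2_from_icc1(0.10, 50)  # about .85
```

This is why, for a fixed non-trivial ICC(1), small groups almost inevitably produce unimpressive ICC(2) values.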
It shows one way to ameliorate this problem: use another multilevel model with the level-1 explanatory variable as the response variable, and use that model to obtain precision-weighted estimates at level 2, which are then used with the original response.
The importance of using such 'contextual' or aggregate level-2 variables is considered here.
I would like to provide some help here, but I confess to not understanding the original question fully. I believe that by intraclass correlations, reference is being made to intraclass correlation coefficients (ICCs) - the statistic that is used to determine the amount of consistency / agreement between sets of scores / data.
However, from that point I am lost. ICCs are often identified by two integers within parentheses - e.g., ICC(1,1) or ICC(2,1) where the first integer refers to the model of the ICC and the second integer refers to the form of the ICC. Furthermore, ICCs need to be identified in terms of their type (consistency or absolute) in addition to their model and form. It is important to be clear about what model, form, and type of ICC is being referred to at any point in time because there are 10 different kinds of ICC depending on those three variables.
So when someone refers to something like ICC(1), I am unsure what is being referred to.
If someone could set me on the right path regarding this, I might be able to offer additional help. In the interim, I would recommend a cutoff of at least .70 before agreement between variables could be considered "respectable", and at least .90 for clinical situations.
I tentatively suggest that an article that I wrote recently might be helpful. The reference is:
Trevethan, R. Intraclass correlation coefficients: clearing the air, extending some cautions, and making some requests. Health Services Outcomes and Research Methodology. doi:10.1007/s10742-016-0156-6
This article has not been allocated a volume and page numbers yet, but it is available online.
In it, I refer to different cutoffs for ICCs and provide references that indicate different sets of cutoffs as being suitable depending on particular contexts.
I would be happy to provide more advice if someone could clarify the original question for me.
I want to assess patient satisfaction with radiology services in 40 hospitals (primary, referral, and teaching). As there is a clustering effect, could anyone comment on the cut-off point of the ICC used to calculate the design effect?
Araya Mesfin Nigatu The standard approach these days is to model the 'clustering' effect directly: the standard errors are automatically corrected for dependency while the size of the between-hospital variation is estimated simultaneously. Clustering is then seen not as a nuisance but as being of potential substantive importance.
Article Multilevel approaches to modelling contexuality: from nuisan...
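If a design-effect correction is nonetheless wanted, the standard formula is DEFF = 1 + (m − 1)·ICC, where m is the average cluster size; there is no universal ICC cutoff, since even a small ICC inflates variance substantially when clusters are large. A sketch with hypothetical numbers (ICC = .05, 30 patients per hospital):

```python
def design_effect(icc, m):
    """Variance inflation due to clustering: DEFF = 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc

def effective_sample_size(n, icc, m):
    """Nominal sample size discounted by the design effect."""
    return n / design_effect(icc, m)

# e.g. 40 hospitals x 30 patients, ICC = .05
deff = design_effect(0.05, 30)                  # 1 + 29 * 0.05 = 2.45
n_eff = effective_sample_size(1200, 0.05, 30)   # about 490 effective cases
```

Note how an ICC of only .05 more than halves the effective sample size here, which is the argument for modelling the clustering rather than hunting for a cutoff below which it can be ignored.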
@Robert. Hi, Robert. Regarding your question: as my research is multilevel and involves emergent constructs, I might be able to clarify the confusion.
When testing an emergent construct's consistency and agreement, the model is by default a one-way random-effects model, so ICC(1) and ICC(2) in this context are ICC(1,1) and ICC(1,2) respectively. ICC(1) is for the within-group agreement test; ICC(2) is for the between-group disagreement test. Bliese (2000) explained this very well in his paper.
I am not sure whether I have made this clear. Sorry, I am self-taught and English is not my first language. Bliese (1998) and LeBreton (2003) proposed ways to calculate ICCs independent of group size.
My question is: are there clear cutoff values for ICC(1) and ICC(2) for emergent constructs in psychological research? Also, for a random-coefficient model, ICC(1) can test nonindependence; is there a clear cutoff value for that ICC?
Liwei Xiao, I guess you are asking your question of me. I'm sorry, but I don't understand what you've written above as:
"ICC (1) and ICC(2) in this context are ICC(1,1) and ICC(1,2) respectively. ICC (1) is for within agreement test, ICC(2) is for between group disagreement test."
However, maybe the following would help:
1. I think there are not clear cutoff values for ICCs but, rather, recommended values. Different people make different recommendations. In my article, I mention the values that I believe are best. Those values are not as "generous" as the values that some people use. In my view, some researchers use more lenient values, and I suspect that's because they want their ICCs to look more impressive than they really are. I demonstrate that in my article.
2. I wonder whether my article about ICCs would help you. I have placed a copy on RG because it's no longer under a copyright embargo.
All the best as you continue to learn English. I very much respect people who take on a new language, particularly if they do so to work effectively within an academic context.
Robert Trevethan Hi Robert, that is using ICCs to test the absolute agreement/consistency of emergent constructs. To justify the reliability of an emergent construct, James's rWG and the ICCs are recommended to be used in conjunction.
There is a recommended value of ICC(1) for the nonindependence test, but I have not found any recommended cutoffs for ICCs used to test the reliability of emergent constructs.
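For reference, the rWG index mentioned above (James, Demaree, & Wolf, 1984) compares the observed within-group variance with the variance expected under a "no agreement" null; for a uniform null on an A-point scale that null variance is (A² − 1)/12. A minimal single-item sketch (the ratings below are hypothetical):

```python
def rwg(ratings, n_scale_points):
    """Single-item within-group agreement: r_WG = 1 - S^2 / sigma_E^2.

    sigma_E^2 = (A^2 - 1) / 12 is the uniform-null variance for an
    A-point response scale (James, Demaree, & Wolf, 1984).
    """
    n = len(ratings)
    mean = sum(ratings) / n
    s2 = sum((x - mean) ** 2 for x in ratings) / (n - 1)  # sample variance
    sigma_e2 = (n_scale_points ** 2 - 1) / 12             # uniform null
    return 1 - s2 / sigma_e2

# Four raters on a 5-point scale, in close agreement
agreement = rwg([4, 4, 5, 5], 5)  # 1 - (1/3)/2 = 5/6, about .83
```

Perfect agreement (all ratings identical) gives rWG = 1; ratings as dispersed as the uniform null drive it toward 0, which is why rWG and the ICCs answer complementary questions about an emergent construct.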