I am working with a medical doctor to assess the quality of measurements made by different types of instruments, i.e., a method comparison study. We have measured the value of a particular protein (I am not sure whether it is actually a protein; I handle the statistics part!) using 4 different measurement techniques. One of the techniques is the gold standard, and we want to know whether the other techniques agree with it. We have more than 100 patients and we made two measurements with each technique. Hence we have more than 100 x 2 x 4 = 800 measurements. My questions are:

1. I checked the literature and found Bland-Altman plots. They were originally designed for a single measurement per method (Bland and Altman 1986) and were extended to repeated measurements in 1999. I will use them. However, I also found another family of measures, the intraclass correlation coefficient (ICC), which is very nicely explained by Koo and Li (2016) in "A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research". I couldn't figure out whether I can also use the ICC to compare these methods, since the paper always speaks of "raters", and I guess a "rater" always refers to a clinician, not to a measurement method.

2. We have a similar problem (4 different techniques, more than 100 patients, two measurements per patient), but the output is categorical, i.e., disease or no disease. I understand that I cannot use Bland-Altman here, and I found Cohen's kappa, which is used to measure inter-rater (and also intra-rater) reliability for qualitative (categorical) items. However, as in my first question, I am not sure whether I can use it here.
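
To make the two questions concrete, here is a minimal Python sketch of the calculations I have in mind: the basic Bland-Altman bias and 95% limits of agreement, and Cohen's kappa for two binary outcomes. The function names and all numbers are made up for illustration and are not our study data. The Bland-Altman part covers only the single-measurement case from 1986 and does not handle the repeated measurements, and treating a measurement method as a "rater" in the kappa part is exactly the assumption I am asking about.

```python
import statistics
from collections import Counter


def bland_altman_limits(method, reference, z=1.96):
    """Return (bias, lower, upper) for paired single measurements.

    bias is the mean difference (method - reference); the limits of
    agreement are bias +/- z * SD of the differences.
    """
    if len(method) != len(reference):
        raise ValueError("paired measurements must have equal length")
    diffs = [m - r for m, r in zip(method, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - z * sd, bias + z * sd


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two sets of categorical labels on the same patients."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    # Observed proportion of patients on which the two methods agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each method's marginal frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical continuous values: gold standard vs. one candidate technique.
gold_values = [10.0, 12.0, 14.0, 16.0, 18.0]
new_values = [11.0, 12.0, 15.0, 16.0, 19.0]
bias, lower, upper = bland_altman_limits(new_values, gold_values)

# Hypothetical binary outcomes (1 = disease, 0 = no disease).
gold_labels = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
new_labels = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
kappa = cohens_kappa(new_labels, gold_labels)
```

Each candidate technique would be compared with the gold standard in this pairwise way. Cohen's kappa as written compares only two sets of labels at a time, so comparing all 4 techniques at once would need a different statistic.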
