I’ve performed a study in which 53 subjects each classified every one of a set of 25 imaging studies as normal or abnormal. I also have the “gold standard” for whether each of the 25 imaging studies was truly normal or abnormal. What is the best way to assess the overall accuracy of the subjects’ evaluations relative to the gold standard? Can I just report raw accuracy (% correct) with a confidence interval? Can I calculate the sensitivity and specificity for each subject and then report the mean sensitivity and specificity?
I don’t want to report inter-rater reliability (kappa, ICC) because it doesn’t matter if the subjects agree with each other, only if they got it right.
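To make the per-subject idea concrete, here is a minimal sketch of what is being asked about, not a recommendation of a particular method. It assumes the ratings are stored as a 53 x 25 array coded 1 = abnormal and 0 = normal; the data below are simulated placeholders, and the function names `per_subject_sens_spec` and `bootstrap_ci_mean` are hypothetical. The bootstrap resamples subjects only, so it treats the 25 imaging studies as fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data, NOT the study's results: 25 gold-standard labels
# (1 = abnormal, 0 = normal) and a 53 x 25 matrix of subject ratings
# that match the gold standard about 80% of the time.
gold = rng.integers(0, 2, size=25)
gold[:2] = [0, 1]  # guarantee both classes are present
ratings = np.where(rng.random((53, 25)) < 0.8, gold, 1 - gold)


def per_subject_sens_spec(ratings, gold):
    """Return (sensitivity, specificity) arrays with one entry per subject."""
    ratings = np.asarray(ratings)
    gold = np.asarray(gold).astype(bool)
    sens = (ratings[:, gold] == 1).mean(axis=1)   # abnormal called abnormal
    spec = (ratings[:, ~gold] == 0).mean(axis=1)  # normal called normal
    return sens, spec


def bootstrap_ci_mean(values, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the mean, resampling subjects."""
    boot_rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = boot_rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])


sens, spec = per_subject_sens_spec(ratings, gold)
accuracy = (ratings == gold).mean(axis=1)  # % correct per subject

for name, vals in [("accuracy", accuracy), ("sensitivity", sens), ("specificity", spec)]:
    lo, hi = bootstrap_ci_mean(vals)
    print(f"mean {name}: {vals.mean():.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Because every subject rated the same 25 studies, the ratings are not independent, so a binomial interval on the pooled 53 x 25 responses would likely be too narrow. Resampling subjects is one simple way to respect that, under the assumption stated above.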
Part 2 of this study involves 53 subjects who each assigned every one of a set of 25 imaging studies to one of three categories (low, medium, high). The gold-standard category is also known for each of the 25 imaging studies. What would be the best way to assess how accurately the subjects identified the correct category for each study?
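For the three-category part, a rough sketch of some candidate summaries follows. It assumes the categories are coded 0 = low, 1 = medium, 2 = high and that treating them as ordered is acceptable; the data are simulated placeholders and the function names are hypothetical. It computes per-subject exact agreement with the gold standard, the mean number of categories each subject was off by, and a pooled confusion matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data, NOT the study's results.
# Categories are coded 0 = low, 1 = medium, 2 = high.
gold3 = rng.integers(0, 3, size=25)
noise = rng.integers(-1, 2, size=(53, 25))  # each rating is off by -1, 0 or +1
ratings3 = np.clip(gold3 + noise, 0, 2)


def per_subject_ordinal_summary(ratings, gold):
    """Return (exact-match rate, mean absolute category error) per subject."""
    ratings = np.asarray(ratings)
    gold = np.asarray(gold)
    exact = (ratings == gold).mean(axis=1)
    mean_abs_err = np.abs(ratings - gold).mean(axis=1)
    return exact, mean_abs_err


def confusion_matrix(ratings, gold, n_cat=3):
    """Pooled counts: rows = gold-standard category, columns = rated category."""
    ratings = np.asarray(ratings)
    gold = np.broadcast_to(np.asarray(gold), ratings.shape)
    cm = np.zeros((n_cat, n_cat), dtype=int)
    np.add.at(cm, (gold.ravel(), ratings.ravel()), 1)
    return cm


exact, mean_abs_err = per_subject_ordinal_summary(ratings3, gold3)
print(f"mean exact agreement: {exact.mean():.3f}")
print(f"mean absolute category error: {mean_abs_err.mean():.3f}")
print(confusion_matrix(ratings3, gold3))
```

The confusion matrix shows whether errors are mostly between adjacent categories (low vs. medium) or across the full range (low vs. high), which a single % correct figure hides.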
Thanks in advance for any advice.