As you know, in the epidemiology literature, the prevalence of a disorder such as depression in a population is usually reported using the best cut point of a screening tool, chosen by the Youden index or from the ROC curve (AUC). Here is an example:
Consider a two-question tool for detecting depression (PHQ2) whose total score ranges from 0 to 6. In the psychometric evaluation of this tool, sensitivity and specificity were computed for each cut point from 1 to 6, and the cut point with the best combination of sensitivity and specificity was selected. For example, for a score of 3 or above, the sensitivity and specificity are 0.61 and 0.92, respectively, and this is taken as the best cut point for reporting the prevalence of depression.
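To make the selection step concrete, here is a minimal sketch of choosing a cut point by the Youden index, J = sensitivity + specificity - 1. Only the pair for cut point 3 (0.61, 0.92) comes from the example above; all other sensitivity and specificity values, and the names `SENS`, `SPEC`, `youden_index` and `best_cut_point`, are made-up placeholders for illustration.

```python
# Hypothetical sensitivity/specificity per cut point ("score >= cut" is a positive screen).
# Only the pair for cut point 3 (0.61, 0.92) is taken from the example in the text;
# every other value is an invented placeholder.
SENS = {1: 0.95, 2: 0.80, 3: 0.61, 4: 0.40, 5: 0.20, 6: 0.08}
SPEC = {1: 0.40, 2: 0.70, 3: 0.92, 4: 0.97, 5: 0.99, 6: 0.995}


def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_cut_point(sens, spec):
    """Return the cut point with the largest Youden index and all J values."""
    j_values = {cut: youden_index(sens[cut], spec[cut]) for cut in sens}
    best = max(j_values, key=j_values.get)
    return best, j_values


best_cut, j_by_cut = best_cut_point(SENS, SPEC)
print(f"best cut point: {best_cut}, J = {j_by_cut[best_cut]:.2f}")
```

With these placeholder values the maximum falls at cut point 3, matching the example; with real validation data the result depends entirely on the reported sensitivities and specificities.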
So far, I have described the usual method.
Below I describe a problem with this method and propose a solution. My questions are: is the problem I raise valid? And is the proposed solution conceptually and theoretically consistent with epidemiology, disorder diagnosis, and the laws of probability, or does it contain a fundamental conceptual contradiction?
The drawback of the usual method is that it treats everyone with a PHQ2 score of 3 or more as depressed and everyone with a PHQ2 score of 2 or less as non-depressed. Yet according to the studies that have compared PHQ2 scores with a gold standard, only 16 to 36% of those whose PHQ2 score is 3 are truly depressed, and only 40 to 60% of those whose PHQ2 score is 4 or more are truly depressed. The proportion of truly depressed people differs in the same way at the other scores.
Solution: When we have the sensitivity and specificity at every cut point from 1 to 6, we can calculate, for each individual score, the probability that a person with that score is depressed. For example, using the sensitivity and specificity reported in one study, the probability of depression given a PHQ2 score of zero comes out at 0.4%. That is, in this study only 0.4% of those with a PHQ2 score of zero were depressed. The corresponding probabilities are 3.4%, 10.5%, 24.1% and 42% for scores of 1, 2, 3, and 4 or more, respectively.
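One way this per-score calculation could be carried out is sketched below. It is not the computation from any particular study. It relies on two identities: P(score = s | depressed) = Se(s) - Se(s+1) and P(score = s | not depressed) = Sp(s+1) - Sp(s), where Se(c) and Sp(c) are the sensitivity and specificity at cut point c, followed by Bayes' rule. Note that sensitivity and specificity alone are not enough for this step: the prevalence of depression in the validation sample is also needed. The value `STUDY_PREVALENCE = 0.10` and all sensitivity and specificity values except the pair for cut point 3 are hypothetical.

```python
# Hypothetical inputs: only the pair for cut point 3 (0.61, 0.92) is from the text.
SENS = {1: 0.95, 2: 0.80, 3: 0.61, 4: 0.40, 5: 0.20, 6: 0.08}
SPEC = {1: 0.40, 2: 0.70, 3: 0.92, 4: 0.97, 5: 0.99, 6: 0.995}
STUDY_PREVALENCE = 0.10  # assumed prevalence in the validation sample
MAX_SCORE = 6


def score_likelihoods(sens, spec, max_score=MAX_SCORE):
    """Convert cut-point sensitivity/specificity into per-score probabilities."""
    # Add the trivial cut points: cut 0 flags everyone, cut max_score + 1 flags no one.
    se = {0: 1.0, **sens, max_score + 1: 0.0}
    sp = {0: 0.0, **spec, max_score + 1: 1.0}
    # P(score = s | depressed) = Se(s) - Se(s + 1)
    p_given_dep = {s: se[s] - se[s + 1] for s in range(max_score + 1)}
    # P(score = s | not depressed) = Sp(s + 1) - Sp(s)
    p_given_not = {s: sp[s + 1] - sp[s] for s in range(max_score + 1)}
    return p_given_dep, p_given_not


def posterior_by_score(sens, spec, prevalence, max_score=MAX_SCORE):
    """Bayes' rule: P(depressed | score = s) for every score s."""
    p_dep, p_not = score_likelihoods(sens, spec, max_score)
    posterior = {}
    for s in range(max_score + 1):
        numerator = prevalence * p_dep[s]
        denominator = numerator + (1.0 - prevalence) * p_not[s]
        # If nobody can obtain this score, the posterior is undefined.
        posterior[s] = numerator / denominator if denominator > 0 else float("nan")
    return posterior


POSTERIOR = posterior_by_score(SENS, SPEC, STUDY_PREVALENCE)
for score, prob in POSTERIOR.items():
    print(f"score {score}: P(depressed | score) = {prob:.3f}")
```

Because the prevalence enters Bayes' rule directly, the resulting per-score probabilities are tied to the sample in which they were estimated.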
Given these probabilities for each score and the PHQ2 scores of the people in a community, we obtain the expected proportion of depressed people, which is the same as the prevalence of depression in that community.
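As a rough illustration of this step, the sketch below weights the number of people at each score by the per-score probability of depression and compares the result with the usual "score of 3 or more" count. The per-score probabilities are the ones quoted above (0.4%, 3.4%, 10.5%, 24.1%, and 42% for 4 or more). The community score counts in `SCORE_COUNTS` are invented for illustration, and the sketch assumes that the per-score probabilities from the validation study carry over to the target community.

```python
from collections import Counter

# Per-score probabilities of depression quoted in the text
# (scores 4, 5 and 6 all use the "4 or more" value).
P_DEPRESSED = {0: 0.004, 1: 0.034, 2: 0.105, 3: 0.241, 4: 0.42, 5: 0.42, 6: 0.42}

# Hypothetical community sample: number of people observed at each PHQ2 score.
SCORE_COUNTS = Counter({0: 500, 1: 200, 2: 150, 3: 80, 4: 40, 5: 20, 6: 10})


def weighted_prevalence(score_counts, p_depressed):
    """Expected proportion depressed: sum over scores of n_s * P(depressed | s) / N."""
    total = sum(score_counts.values())
    expected_cases = sum(n * p_depressed[s] for s, n in score_counts.items())
    return expected_cases / total


def cutoff_prevalence(score_counts, cut=3):
    """Usual method: everyone at or above the cut point is counted as depressed."""
    total = sum(score_counts.values())
    positives = sum(n for s, n in score_counts.items() if s >= cut)
    return positives / total


weighted = weighted_prevalence(SCORE_COUNTS, P_DEPRESSED)
naive = cutoff_prevalence(SCORE_COUNTS, cut=3)
print(f"weighted estimate: {weighted:.4f}")
print(f"cut-point estimate: {naive:.4f}")
```

In this made-up sample the two estimates differ considerably, which is the discrepancy the question is about; how large the gap is in practice depends on the real score distribution.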
Therefore, in epidemiology (as opposed to in-office diagnosis), it seems that to estimate the prevalence of depression, instead of counting everyone whose PHQ2 score is 3 or more, it is better to weight the proportion of people at each PHQ2 score by the probability of depression for that score.