It appears that what you are looking for is the receiver operating characteristic (ROC). See for example https://en.wikipedia.org/wiki/Receiver_operating_characteristic
You can compare two sensitivities the same way you compare two proportions. If you need an understanding of the predictive ability of these scales, then compare AUC values instead (because sensitivity and specificity depend on the cut-off).
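As a minimal sketch of the two-proportions idea (the counts below are made up purely for illustration, and an independent-groups test is assumed; if the scales were applied to the same cases, a paired test such as McNemar's is needed instead, as discussed further down):

```python
# Hypothetical counts, for illustration only: scale A detected 67 of
# 100 gold-standard positives, scale B detected 80 of 100
# (independent groups assumed).
from statsmodels.stats.proportion import proportions_ztest

detected = [67, 80]   # true positives found by each scale
n_cases = [100, 100]  # gold-standard positives evaluated per scale

stat, p_value = proportions_ztest(detected, n_cases)
print(f"z = {stat:.2f}, p = {p_value:.3f}")
```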
I think Mr. Mossong is correct. It seems you're looking to map the Receiver Operating Characteristic (ROC) curve(s).
I've used this sort of thing in the context of radiation detection. A ROC curve maps out the trade-off between:
- low threshold of detection = high sensitivity to small signals + high false alarm rate
- high threshold of detection = low sensitivity to small signals + low false alarm rate
The other consideration is signal size: one usually maps out ROC curves for various levels of known signal. Large signals are easy to detect, so the ROC curve will indicate that you can operate with a fairly high threshold -- you'll detect the large signal when it's there, and not be bothered with false alarms. Smaller signals are more difficult. So ROC curves help you understand the trade-off between "how low a signal do I need to reliably detect...and what true positive rate do I consider 'reliable detection'?" and "how many false alarms can I live with?"
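A minimal sketch of that threshold trade-off, using simulated signal and noise scores (the distributions below are invented for illustration) and scikit-learn's roc_curve:

```python
# Simulated detection scores: a "noise" class and a "signal" class
# whose overlap controls how detectable the signal is.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 500)    # scores when no signal is present
signal = rng.normal(1.5, 1.0, 500)   # scores when a signal is present

scores = np.concatenate([noise, signal])
labels = np.concatenate([np.zeros(500), np.ones(500)])

# Each point on the ROC curve corresponds to one detection threshold:
# lower thresholds give higher sensitivity but more false alarms.
fpr, tpr, thresholds = roc_curve(labels, scores)
print(f"AUC = {roc_auc_score(labels, scores):.3f}")
for f, t, th in zip(fpr[::20], tpr[::20], thresholds[::20]):
    print(f"threshold {th:6.2f}: TPR {t:.2f}, FPR {f:.2f}")
```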
I agree with the previous suggestions: you can plot the sensitivities and specificities at different cut-points of your scales for predicting the outcome as separate ROC curves, and then compare the areas under each curve (AUCs). If the scales are all scored on the same cases, use a comparison method for correlated ROC curves (such as DeLong's test), because you need to take the correlations between the scales into account.
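DeLong's test is the classical way to compare correlated AUCs; a simpler alternative that illustrates the same idea is a paired bootstrap of the AUC difference. A sketch, assuming hypothetical arrays y_true, score_a and score_b where both scales were scored on the same cases:

```python
# Paired bootstrap of the AUC difference between two scales scored on
# the SAME cases, which preserves the correlation between them.
import numpy as np
from sklearn.metrics import roc_auc_score

def paired_auc_bootstrap(y_true, score_a, score_b, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    score_a, score_b = np.asarray(score_a), np.asarray(score_b)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample cases, keeping pairs intact
        if len(np.unique(y_true[idx])) < 2:
            continue  # both classes are needed to compute an AUC
        diffs.append(roc_auc_score(y_true[idx], score_a[idx])
                     - roc_auc_score(y_true[idx], score_b[idx]))
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return lo, hi  # 95% CI for AUC_A - AUC_B; if it excludes 0, they differ
```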
So I compared the three AUC values, and they did not statistically differ from each other. On the other hand, one scale has a sensitivity of 67% whereas the other's sensitivity is 80%. Does this mean that, although the sensitivity values look quite different, the scales are statistically indistinguishable in their sensitivities?
Sedat, you don't really need to do any statistical test in this case... just look at the highest AUC (or, alternatively, the best balance between sensitivity and specificity for your purposes: e.g. for screening tests, what matters is a higher sensitivity, because you don't care that much about false positives; for a confirmatory test, on the other hand, you need high specificity).
Well, I compared the ROC curves, and none of the AUCs were statistically different from each other.
The research is about three scales for the detection of individuals at high risk of suicide (predictors), and there is also a clinical assessment to identify the high-risk patients (gold standard). How well the scales performed in identifying the individuals at risk was determined by ROC analyses, and the cut-off scores for each scale were calculated. The statistical analyses also yielded sensitivity and specificity values for each scale. I was wondering whether these sensitivity and specificity values were statistically different from each other. Such a difference would help choose the scale to be used for identifying the cases at risk. This is what I am trying to do: I need to demonstrate whether the sensitivity and specificity of the scales are statistically different.
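Since all three scales were applied to the same patients, paired sensitivities can be compared with McNemar's test restricted to the gold-standard positives (and specificities likewise on the negatives). A minimal sketch, where truth, pred_a and pred_b are hypothetical 0/1 arrays for the clinical assessment and for two scales dichotomized at their cut-offs:

```python
# McNemar's test on paired classifications: among gold-standard
# positives, count cases where the two scales disagree and test whether
# the disagreement is symmetric. The same recipe applied to the
# gold-standard negatives compares specificities.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def compare_sensitivities(truth, pred_a, pred_b):
    truth, pred_a, pred_b = map(np.asarray, (truth, pred_a, pred_b))
    pos = truth == 1                      # gold-standard positives only
    a, b = pred_a[pos], pred_b[pos]
    table = [[np.sum((a == 1) & (b == 1)), np.sum((a == 1) & (b == 0))],
             [np.sum((a == 0) & (b == 1)), np.sum((a == 0) & (b == 0))]]
    return mcnemar(table, exact=True)     # exact test suits small counts

# result = compare_sensitivities(truth, pred_a, pred_b)
# print(result.pvalue)
```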
Do you think that the AUCs would be enough to say that the scales are equally effective at detecting the cases at risk? Or do I still need to compare the sensitivity and specificity values calculated at the cut-off values?
Yes, but how do you decide which sensitivity and specificity is better than the other, say, when one scale's sensitivity is 70% and specificity is 80%, whereas the other's sensitivity is 80% and specificity is 70%?
To answer this question, you should be clear in your mind about the purpose of your questionnaire (and especially of your threshold score). Do you want to use it as a screening tool? Then you want the sensitivity to be higher, because for screening purposes it is better to be sure to catch all the true positives (even though it "costs" accepting a certain number of false positives): this way, you are sure that you are disclosing all the subjects at risk of suicide and you can take care of them. If it is intended to be a diagnostic tool, perhaps you'd rather be sure about the diagnosis (i.e. that if the questionnaire says a subject is high-risk, he actually is), and thus you want the higher specificity.
There's no sens/spec better than another - it all depends on what you need.
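One way to make "what you need" concrete is to assign explicit costs to the two kinds of error and rank the scales by expected cost. A minimal sketch; the prevalence and cost numbers below are made-up placeholders, not values from this study:

```python
# Expected cost per subject for a scale with given sensitivity and
# specificity, under assumed error costs and prevalence. All numbers
# here are illustrative placeholders.
def expected_cost(sens, spec, prevalence, cost_fn, cost_fp):
    fn_rate = prevalence * (1 - sens)        # missed high-risk cases
    fp_rate = (1 - prevalence) * (1 - spec)  # false alarms
    return fn_rate * cost_fn + fp_rate * cost_fp

# Missing a high-risk individual is assumed far costlier than a false alarm.
prev, c_fn, c_fp = 0.10, 100.0, 1.0
scale_1 = expected_cost(0.70, 0.80, prev, c_fn, c_fp)
scale_2 = expected_cost(0.80, 0.70, prev, c_fn, c_fp)
print(f"scale 1: {scale_1:.2f}  scale 2: {scale_2:.2f}")
# With these weights the higher-sensitivity scale wins; flip the cost
# ratio and the higher-specificity scale does.
```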