In Leon Gordis's textbook, these measures were explained under validity, which I felt was logically correct, but a few other books cover them under diagnostic agreement.
Sensitivity and specificity are used in the context of assessing a method of measurement, and in that sense they measure validity. Usually a new test is assessed against a gold standard, in which case sensitivity and specificity are about diagnostic agreement, so both places where they are described have legitimacy.
Sensitivity and specificity are measures of validity: they capture the accuracy and power of the test, assessed through the degree of agreement between the test and a gold standard. Reliability is a measure of the precision of a test, assessed through the width of the confidence intervals around those accuracy measures.
Sensitivity and specificity, along with the two predictive values, are measures of the validity of a screening test. These four measures can be assessed by comparing the results of the screening test against a gold standard. Reliability means consistency of the screening test's results across time and across observers; it can be assessed by measures of agreement.
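To make the four measures concrete, here is a minimal sketch in Python, using made-up counts for a hypothetical screening test scored against a gold standard; the confidence-interval width at the end is one simple way to express the precision (reliability) mentioned above:

import math

# Hypothetical 2x2 table of a screening test against a gold standard (made-up counts).
tp, fp = 90, 30    # test positive: true positives, false positives
fn, tn = 10, 170   # test negative: false negatives, true negatives

sensitivity = tp / (tp + fn)   # P(test positive | disease present)
specificity = tn / (tn + fp)   # P(test negative | disease absent)
ppv = tp / (tp + fp)           # P(disease present | test positive)
npv = tn / (tn + fn)           # P(disease absent | test negative)

# Normal-approximation 95% CI for sensitivity; its width is one way to express precision.
half_width = 1.96 * math.sqrt(sensitivity * (1 - sensitivity) / (tp + fn))

print(f"Se={sensitivity:.2f} Sp={specificity:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
print(f"Se 95% CI: {sensitivity - half_width:.2f} to {sensitivity + half_width:.2f}")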
Validity is the extent to which a measurement is well-founded and corresponds accurately to the real world. In that regard, specificity and sensitivity assess the validity of your test.
Validity means that a test is measuring what it purports to measure. Reliability is the extent to which the test gives the same result on repeated measurements. Sensitivity and specificity together determine diagnostic power, i.e. the validity and reliability of a test.
A dichotomous diagnosis problem gives rise to a four-cell contingency table with three degrees of freedom, given the constraint TP+FP+TN+FN=N. This means that any three independent measures suffice to specify the outcomes completely; any further measures will necessarily be dependent on those first three.
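As a small illustration of that point, a sketch in Python: given N and any three independent quantities (here prevalence, sensitivity and specificity, with made-up values), all four cells are fixed:

# Made-up values; any three independent measures plus N pin down the whole table.
n, prevalence, sensitivity, specificity = 1000, 0.10, 0.90, 0.85

tp = n * prevalence * sensitivity               # diseased and test positive
fn = n * prevalence * (1 - sensitivity)         # diseased but test negative
tn = n * (1 - prevalence) * specificity         # healthy and test negative
fp = n * (1 - prevalence) * (1 - specificity)   # healthy but test positive

assert abs((tp + fp + tn + fn) - n) < 1e-9      # the constraint TP+FP+TN+FN = N holds
print(tp, fp, fn, tn)                           # 90.0 135.0 10.0 765.0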
Specificity and Sensitivity are measures of validity, or informedness with respect to the disease, rather than of the accuracy or reliability of the test, and they are commonly plotted against each other in an ROC (Receiver Operating Characteristic) curve, allowing optimization of the operating point, e.g. by tuning or setting a threshold. However, what is usually plotted is the mirror image: Sensitivity (True Positive Rate) vs 1-Specificity (False Positive Rate). The best operating point in terms of maximizing informedness is the one furthest from the chance diagonal (TPR=FPR). Under the assumption that the (error) cost associated with the set of all negative cases matches that for all positive cases, this is also the minimum-cost solution. Informedness = ΔP = TPR-FPR = Specificity+Sensitivity-1 is the distance of the operating point above the chance line, and gives the probability of an informed decision (as opposed to guessing). Informedness = 0 corresponds to guessing, that is, operating at chance level, on the chance line. Different cost assumptions change the gradient of the equal-cost lines away from the 45 degrees of the chance line, but these isocost lines remain parallel to each other, and the optimum-cost points are those that lie on the isocost line tangent to the curve closest to the point (TPR,FPR)=(1,0), i.e. (Sp,Se)=(1,1).
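A sketch of threshold tuning along those lines, in Python, with invented scores and labels; it evaluates Informedness = TPR - FPR at each candidate threshold and keeps the threshold furthest above the chance line:

# Invented test scores with true labels (1 = diseased, 0 = healthy).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    1,    0,    1,    0,    0,    1,    0,    0]

def informedness(threshold):
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    tpr = tp / (tp + fn)   # Sensitivity
    fpr = fp / (fp + tn)   # 1 - Specificity
    return tpr - fpr       # = Specificity + Sensitivity - 1

best = max(scores, key=informedness)   # operating point furthest above the chance diagonal
print(best, informedness(best))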
The area under the curve (ROC AUC) shows not only how good this optimum operating point is under the current cost and prevalence conditions, but also how much leeway there is for effective operation as those conditions change, which goes to reliability. The Gini coefficient, Gini = 2AUC-1, corresponds to the Informedness of the triangular curve joining the operating point to the chance line at (0,0) and (1,1). To the extent that the actual operating-point tuning curve dominates this triangle, so that there is a region between the curve and the triangle, it identifies operating points that do better than would be expected by chance-level interpolation: the Informedness component of Gini or AUC corresponds to Certainty, and the extratriangular component corresponds to Consistency (this ROC ConCert approach thus goes beyond evaluating a single operating point with its individual Specificity, Sensitivity, Positive Predictive Value and Negative Predictive Value, and addresses reliability).
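Using the standard equivalence between ROC AUC and the probability that a randomly chosen positive case scores above a randomly chosen negative case, AUC and Gini can be estimated from the same invented data as in the previous sketch:

scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    1,    0,    1,    0,    0,    1,    0,    0]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

# AUC = P(random positive scores above random negative), ties counted as half.
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
gini = 2 * auc - 1

print(f"AUC = {auc:.2f}, Gini = {gini:.2f}")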
An equivalent graphical analysis can be done with PPV and NPV, defining Markedness = PPV+NPV-1. Whereas Informedness reflects the probability of an informed diagnosis of the true condition, Markedness reflects the probability of the true condition affecting or marking the fused set of variables (test outcomes, symptoms, etc.) used for diagnosis. It tells you more about the test than the patient!
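A short sketch of Markedness next to Informedness, using the same kind of made-up 2x2 counts as earlier:

# Made-up 2x2 counts, same layout as in the earlier sketch.
tp, fp, fn, tn = 90, 30, 10, 170

informedness = tp / (tp + fn) + tn / (tn + fp) - 1   # Sensitivity + Specificity - 1
markedness   = tp / (tp + fp) + tn / (tn + fn) - 1   # PPV + NPV - 1

print(f"Informedness = {informedness:.2f}, Markedness = {markedness:.2f}")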
This actually depends on whether your test and the reference have correlated errors, and on how close your reference is to the truth. If the errors are uncorrelated or the reference is close to the truth, sensitivity and specificity are good measures of validity. If the errors are correlated, sensitivity and specificity only indicate how close your test is to the imperfect reference, i.e. they become more a measure of reliability.
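A rough simulation of that point, with all prevalences and error rates invented for illustration: the test below is essentially a noisy copy of an imperfect reference, so its errors are strongly correlated with the reference's errors, and its apparent sensitivity against the reference overstates its sensitivity against the truth:

import random
random.seed(0)

n = 100_000
truth = [random.random() < 0.2 for _ in range(n)]            # true status, 20% prevalence

# Imperfect reference standard: disagrees with the truth 10% of the time.
reference = [t != (random.random() < 0.10) for t in truth]

# Test whose errors are correlated with the reference's errors: a noisy copy of the
# reference (5% independent noise), so it tends to repeat the reference's mistakes.
test = [r != (random.random() < 0.05) for r in reference]

def sensitivity(pred, gold):
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fn = sum(1 for p, g in zip(pred, gold) if not p and g)
    return tp / (tp + fn)

print("apparent sensitivity (vs imperfect reference):", round(sensitivity(test, reference), 3))
print("true sensitivity (vs actual truth):", round(sensitivity(test, truth), 3))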
This is covered with a lot of clarity in the linked text.
The basic components of validity are sensitivity and specificity. Related measures are the false negative and false positive rates, in addition to overall agreement. These reflect the test's ability to measure what it is supposed to measure. Reliability is a measure of the internal consistency of the test with itself when it is repeatedly used to measure the same "event".
Thanks for starting this discussion, Umesh. Glad to see all these answers. One additional point: the outcome should be truly dichotomous for sensitivity and specificity to work as indicators of validity. That is, the outcome needs to be valid before you can test the validity of the predictor. Many outcome variables are treated as dichotomous when they are not, such as mental disorders, which are sometimes coded as diagnosed vs. not diagnosed, or similar. In such cases, where there are many mid-points, sensitivity and specificity, and other checks, will not be entirely accurate checks of validity, but they might give some rough indication of the association between variables.
Hello, I have a question: what level of sensitivity, specificity, and positive and negative predictive values is accepted as sufficient (70%, 80%, or 90%)?