Suppose we have a new, cheap and simple diagnostic test that we want to evaluate against the expensive and complex gold standard for a highly lethal disease.
The gold standard test is dichotomous (positive or negative), but the new test returns two continuous results: let's call them "Result A" and "Result B".
Assuming the disease can be accurately diagnosed with the gold standard test, we want to:
1) estimate the posterior probability of disease given the prior probability and the new test results A and B, i.e. P(D+|A,B);
2) define the best threshold values for both A and B.
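To make goal 1 explicit, this is the quantity written out with Bayes' theorem. The notation is my own assumption: P(D+) is the prior (pre-test) probability of disease, and p(A,B|D+) and p(A,B|D-) are the joint densities of the two results among diseased and non-diseased patients.

$$
P(D^+ \mid A, B) = \frac{p(A, B \mid D^+)\,P(D^+)}{p(A, B \mid D^+)\,P(D^+) + p(A, B \mid D^-)\,\bigl(1 - P(D^+)\bigr)}
$$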
Given the high lethality, we care more about avoiding false negatives than false positives, i.e. we favour sensitivity over specificity.
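To illustrate what I mean by goal 2 under this constraint, here is a minimal sketch of one possible approach, not necessarily the right one. It assumes that higher values of A and B both point towards disease, and that a patient is called positive if either result reaches its threshold. The function names, the 95% default sensitivity and the toy data are all hypothetical.

```python
from itertools import product

def rates(data, t_a, t_b):
    """Return (sensitivity, specificity) for the rule:
    call a patient positive if A >= t_a or B >= t_b.
    Assumes data contains at least one diseased and one healthy patient."""
    tp = fn = tn = fp = 0
    for a, b, diseased in data:
        predicted = a >= t_a or b >= t_b
        if diseased and predicted:
            tp += 1
        elif diseased:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

def best_thresholds(data, min_sensitivity=0.95):
    """Exhaustive search over observed values of A and B.
    Returns (t_a, t_b, sensitivity, specificity) with the highest
    specificity among pairs reaching min_sensitivity, or None."""
    candidates_a = sorted({a for a, _, _ in data})
    candidates_b = sorted({b for _, b, _ in data})
    best = None
    for t_a, t_b in product(candidates_a, candidates_b):
        sens, spec = rates(data, t_a, t_b)
        if sens >= min_sensitivity and (best is None or spec > best[3]):
            best = (t_a, t_b, sens, spec)
    return best

# Hypothetical toy data: (result A, result B, gold standard positive?)
toy = [
    (5.1, 2.0, True), (4.8, 3.5, True), (1.2, 3.9, True), (4.0, 0.5, True),
    (1.0, 1.1, False), (2.0, 0.8, False), (1.5, 2.1, False), (3.0, 1.0, False),
]
print(best_thresholds(toy, min_sensitivity=1.0))
```

Thresholds chosen and evaluated on the same patients will look better than they really are, so the search would need to be validated on separate data.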
Suppose we have data like those in Figure 1 (randomly generated). Big red dots are patients whose gold standard test was positive; small grey dots are patients whose gold standard test was negative.
Which model is best suited to evaluating such a test?
Logistic regression and ROC curve?
Clustering in machine learning?
Other?
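For concreteness, this is roughly what I have in mind for the logistic regression option: a minimal sketch in plain Python that models P(D+|A,B) as a logistic function of a linear combination of A and B. The data are simulated under my own assumptions (diseased patients have higher A and B on average, with equal numbers in each group), and the function names and settings are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit P(D+|A,B) = sigmoid(w0 + w1*A + w2*B) by batch
    gradient descent on the mean log-loss."""
    w = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (a, b), y in zip(rows, labels):
            err = sigmoid(w[0] + w[1] * a + w[2] * b) - y
            grad[0] += err
            grad[1] += err * a
            grad[2] += err * b
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w

def predict(w, a, b):
    """Estimated probability of disease for results (a, b)."""
    return sigmoid(w[0] + w[1] * a + w[2] * b)

# Simulated data (assumption): 100 diseased and 100 healthy patients.
random.seed(0)
rows, labels = [], []
for _ in range(100):
    rows.append((random.gauss(2.0, 1.0), random.gauss(2.0, 1.0)))
    labels.append(1)
    rows.append((random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)))
    labels.append(0)

w = fit_logistic(rows, labels)
print(w, predict(w, 3.0, 3.0), predict(w, -1.0, -1.0))
```

The fitted probabilities reflect the proportion of diseased patients in the sample (50% here), so I assume the intercept would have to be adjusted if the real prior differs. A ROC curve could then be drawn on the predicted probability rather than on A and B separately.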
Thank you.