I plan to use a combination of markers in my thesis. Is it correct to use the values predicted by a logistic regression with the markers as predictor variables?
If I follow, you first need to estimate a logistic regression model with your two biomarkers as the explanatory variables, and save the predicted probabilities from that model. Then use the predicted probability as the "test" variable in the ROC procedure. The link below has an example. It is a bit long-winded, but gets there in the end. ;-)
Hello Mara. Hanley & McNeil (1983) described one method for comparing AUCs (1st link below). But I believe the test due to DeLong et al. (1988) may be more commonly used nowadays (2nd link). However, before going ahead, you may wish to take a look at Demler et al. (2012), which I just found this morning, and need to read myself (3rd link).
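For readers who want to see what the DeLong et al. (1988) comparison involves, here is a minimal Python sketch for two markers measured on the same subjects. The marker values are invented for illustration, and the function names (`components`, `delong_test`) are my own, not taken from any package. It is a sketch of the textbook formulas, not a validated implementation.

```python
import math

def psi(x, y):
    # Kernel of the Mann-Whitney statistic: 1 if the diseased value is higher,
    # 0.5 for a tie, 0 otherwise.
    return 1.0 if x > y else 0.5 if x == y else 0.0

def components(pos, neg):
    # Structural components: one value per diseased subject (v10)
    # and one per non-diseased subject (v01).
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def delong_test(pos_a, neg_a, pos_b, neg_b):
    """Compare two correlated AUCs; returns (auc_a, auc_b, z, two-sided p)."""
    m, n = len(pos_a), len(neg_a)
    v10_a, v01_a = components(pos_a, neg_a)
    v10_b, v01_b = components(pos_b, neg_b)
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    # Variance of (auc_a - auc_b), allowing for the correlation between markers.
    var_diff = (
        (cov(v10_a, v10_a) + cov(v10_b, v10_b) - 2 * cov(v10_a, v10_b)) / m
        + (cov(v01_a, v01_a) + cov(v01_b, v01_b) - 2 * cov(v01_a, v01_b)) / n
    )
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p

# Invented data: the same 5 diseased and 5 non-diseased subjects, two markers.
diseased_a = [3.1, 2.8, 3.6, 2.2, 4.0]
healthy_a = [2.0, 2.5, 1.8, 3.0, 1.5]
diseased_b = [1.9, 2.5, 1.4, 2.6, 3.0]
healthy_b = [1.5, 2.2, 2.8, 1.1, 2.0]

auc_a, auc_b, z, p = delong_test(diseased_a, healthy_a, diseased_b, healthy_b)
print("AUC A:", auc_a, "AUC B:", auc_b, "z:", round(z, 3), "p:", round(p, 3))
```

With only five subjects per group the normal approximation is shaky, so treat the p-value here purely as a demonstration of the arithmetic.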
HTH.
EDITED (22-Jul-2016): I just noticed that the link for Demler et al. (2012) is not working, so I've replaced it with another link that does (currently) work.
We can run a binary logistic regression to get the probability and then run a ROC curve using the probability as the test variable.
The whole process goes like this:
1) Analyze
2) Regression
3) Binary Logistic: put the state variable in as the dependent variable, then enter the variables you wish to combine as covariates, click "Save" and check the box "Probabilities". This creates a new variable in your SPSS dataset, named "PRE_1".
4) Run the ROC curve using "PRE_1" as the test variable.
5) Obtain the result.
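The same workflow can be sketched outside SPSS. The Python below is only an illustration on simulated data: the two markers, the sample size, the true coefficients and the helper names (`fit_logistic`, `predict_proba`, `auc`) are all assumptions, and the model is fitted by plain gradient ascent rather than by whatever routine SPSS uses internally.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Gradient ascent on the mean log-likelihood; returns [intercept, b1, b2, ...]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            err = yi - sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, X):
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row))) for row in X]

def auc(scores, y):
    """Area under the ROC curve, taking larger scores as more positive."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Simulated, standardized markers: marker 1 rises with disease, marker 2 falls.
rng = random.Random(0)
markers = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
state = [1 if rng.random() < sigmoid(1.2 * a - 1.2 * b) else 0 for a, b in markers]

beta = fit_logistic(markers, state)   # step 3: binary logistic regression
pre_1 = predict_proba(beta, markers)  # the saved "PRE_1" variable
auc_combined = auc(pre_1, state)      # step 4: ROC with PRE_1 as test variable
auc_marker1 = auc([m[0] for m in markers], state)

print("coefficients:", [round(b, 2) for b in beta])
print("AUC, marker 1 alone:", round(auc_marker1, 3))
print("AUC, combined (PRE_1):", round(auc_combined, 3))
```

Note that the AUC here is computed on the same data used to fit the model, so it is likely to be somewhat optimistic.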
I would like to know: if I have 2 markers (one that increases with the disease and one that decreases), such as prothrombin time and serum albumin in liver disease, and I save the "probabilities" from binary logistic regression, the ROC curve procedure asks whether the smallest or the largest value indicates progression of disease. Which should I choose?
@ WA Gadalla: I assume you mean you used the LOGISTIC REGRESSION procedure with the two markers as explanatory variables, and then saved the predicted probabilities to a new variable in the dataset. Further, I assume you used 0-1 coding for the outcome variable, with 1=progression, 0=no progression. If all of that is so, then the predicted probability = p(outcome=1 | marker values). Again, if all of this is so, you should "tell" the ROC procedure that a higher predicted probability is associated with progression of disease.
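A small Python illustration of why the direction is "larger = progression" for the predicted probability, even when one marker falls with disease. The eight patients and the three coefficients are invented for the example (they stand in for a fitted model) and `auc` is my own helper.

```python
import math

def auc(scores, state):
    """AUC taking larger scores as more positive (ties count one half)."""
    pos = [s for s, y in zip(scores, state) if y == 1]
    neg = [s for s, y in zip(scores, state) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical patients: (prothrombin time in s, albumin in g/dL, 1 = progression)
patients = [
    (11.0, 4.2, 0), (11.5, 4.0, 0), (12.0, 3.9, 0), (13.5, 3.6, 0),
    (12.5, 3.7, 1), (14.0, 3.0, 1), (15.0, 2.8, 1), (16.5, 2.5, 1),
]
# Hypothetical coefficients: positive for prothrombin time, negative for albumin.
B0, B_PT, B_ALB = -6.5, 0.9, -1.4

state = [s for _, _, s in patients]
pred = [1.0 / (1.0 + math.exp(-(B0 + B_PT * pt + B_ALB * alb)))
        for pt, alb, _ in patients]

auc_larger = auc(pred, state)                 # larger probability = progression
auc_smaller = auc([-p for p in pred], state)  # wrong direction
auc_albumin = auc([alb for _, alb, _ in patients], state)  # raw albumin, larger = positive

print(auc_larger)   # 0.9375
print(auc_smaller)  # 0.0625
print(auc_albumin)  # 0.0625
```

The negative coefficient for albumin already absorbs the fact that albumin falls with disease, so the predicted probability always points in the "higher = progression" direction. Choosing the opposite direction simply gives 1 minus the AUC.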
Thanks, Dr Bruce Weaver, for your help, but when I did that, these tables appeared and I don't know what they mean. In the first table, Block 0: Beginning,
it predicted that all cases are patients.
The "Variables in the Equation" table contains the constant only, and all the entered variables are in the table entitled "Variables not in the Equation". What does this mean?
In the second table, Block 1 (Method),
it predicts one observed healthy case as a patient and vice versa.
Also, all the variables are now in the table entitled "Variables in the Equation", but some variables have p>0.05 and others p<0.05.
I recommend watching a video about logistic regression output in SPSS, which you can find easily on YouTube. Some short answers here:
Those p values indicate the significance of the corresponding variables. You should include the variables that you consider necessary in the model, irrespective of their significance.
Since you are using the predicted probabilities from the logistic model, you should select the option stating that a larger test result indicates a more positive test. Those probabilities are obtained through a menu option under the Save button.
If you wish to determine a cut-off point after ROC analysis, you use that table to find the appropriate cut-off.
Guys, something is unclear to me. If we assume that an interaction may occur between the predictors (here: markers), don't we need to include an interaction term when predicting the probability? Or, when one of the variables is non-significant in the regression equation, shouldn't we exclude it so that the predicted probability is correct?
And another question: what does the best cut-point for the resulting ROC curve mean? We already have the probability, and we already have the difference between the groups (grouping/state variable). So what does a new cut-point bring us here?
"If we assume that an interaction may occur between the predictors (here: markers), don't we need to include an interaction term when predicting the probability?"
You can include interactions, polynomial terms, etc. in your logistic regression model, no problem. That doesn't change anything about how you save and then use the predicted probabilities.
"Or, when one of the variables is non-significant in the regression equation, shouldn't we exclude it so that the predicted probability is correct?"
Keeping or dropping variables simply on the basis of statistical significance is generally a bad idea. It may still be important to control for a variable, even if it is not statistically significant. See the point about models with all variables being significant at the Vanderbilt U Biostats Manuscript Checklist (link below), for example.
"And another question: what does the best cut-point for the resulting ROC curve mean? We already have the probability, and we already have the difference between the groups (grouping/state variable). So what does a new cut-point bring us here?"
You are attempting to identify an appropriate cut-point for use with new cases where disease state is not known. Bear in mind that depending on what one is trying to achieve, algorithmic methods that try to maximize both sensitivity & specificity (e.g., Youden's index) may not yield the best cut-point. You must take into account the relative costs of false positives & false negatives when determining the best cut-point.
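To make that concrete, here is a minimal Python sketch with ten invented predicted probabilities. It picks one cut-point by Youden's index and another by minimizing total misclassification cost. The cost values (a false positive counted as five times as costly as a false negative) are purely an assumption for the example.

```python
# Invented predicted probabilities and true states (1 = diseased).
probs = [0.05, 0.10, 0.20, 0.30, 0.45, 0.65, 0.40, 0.55, 0.70, 0.85]
state = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

COST_FN, COST_FP = 1, 5  # hypothetical relative costs

def counts(cutoff):
    """Classify as positive when probability >= cutoff; return (tp, fp, fn, tn)."""
    tp = sum(1 for p, s in zip(probs, state) if s == 1 and p >= cutoff)
    fp = sum(1 for p, s in zip(probs, state) if s == 0 and p >= cutoff)
    fn = state.count(1) - tp
    tn = state.count(0) - fp
    return tp, fp, fn, tn

def youden(cutoff):
    tp, fp, fn, tn = counts(cutoff)
    return tp / (tp + fn) + tn / (tn + fp) - 1  # sensitivity + specificity - 1

def total_cost(cutoff):
    tp, fp, fn, tn = counts(cutoff)
    return COST_FN * fn + COST_FP * fp

cutoffs = sorted(set(probs))
youden_cut = max(cutoffs, key=youden)
cost_cut = min(cutoffs, key=total_cost)

print("Youden cut-point:", youden_cut)      # 0.4
print("Cost-based cut-point:", cost_cut)    # 0.7
```

With these numbers, Youden's index selects 0.40 (sensitivity 1.00, specificity 0.67), whereas the cost-based rule moves the cut-point up to 0.70 to avoid the expensive false positives.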
Thank you for your answer, but the predicted values saved in SPSS do differ depending on whether or not you include the interaction effect. Try it; I'm using versions 20.0-22.0. In any case, the prediction is based on the model components, so why would it give the same predicted probabilities with and without the interaction term, given that the term is a component of the model?
I understand the problems caused by excluding predictors, for example with a stepwise method, although even the publication you pointed to allows some elements of backward elimination. This is a broader problem: some theoreticians believe that if an independent variable shows no relation at all to the dependent one in a larger model, and no method points to such a relationship, then it is a source of unnecessary variance. For example, if you include the age variable in the model, you might want to control for its effect, but it might be better to look for a different model.
And about the last question. I understand the predictive value of the model, but my question was about something simpler: how do we interpret, in practice, the cut-off point for the interaction (obtained for a regression model containing two variables) that we are discussing here? What does it mean for either of the variables? It is a cut-off point for the probability, not for the variables under study. Their interaction can be presented, for example, as cluster plots that take the probability into account... but what does the cut-off point mean for data derived from such a model (I remind you: the cut-off point of the ROC curve for the predicted probability from a logistic regression model with two predictors)?
Hello Tomasz. I am a bit pressed for time today, and have not digested everything you wrote in your last post. For now, I'm just responding to the first paragraph, which has confused me a bit. At first I thought you were saying that the predicted probabilities differ depending on whether or not one includes the interaction; but then I think you said you get the same predicted probabilities with and without the interaction term. Only if the interaction effect is exactly 0 would you get the same predicted probabilities from those two models. Here is an (SPSS) example that generates a scatter-plot of predicted probabilities with and without the interaction term included. Clearly, the points do not fall on the main diagonal.
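A rough Python analogue of that comparison, as a sketch on simulated data. The data-generating coefficients (including a true interaction of 1.0), the sample size and the helper names are assumptions, and the fit uses simple gradient ascent. Instead of a scatter-plot it prints how far apart the two sets of predicted probabilities are.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Gradient ascent on the mean log-likelihood; returns [intercept, b1, ...]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            err = yi - sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, X):
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row))) for row in X]

# Simulated markers with a genuine interaction in the data-generating model.
rng = random.Random(1)
pairs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
state = [1 if rng.random() < sigmoid(0.8 * a - 0.8 * b + 1.0 * a * b) else 0
         for a, b in pairs]

X_main = [[a, b] for a, b in pairs]          # main effects only
X_int = [[a, b, a * b] for a, b in pairs]    # main effects + interaction

p_main = predict_proba(fit_logistic(X_main, state), X_main)
beta_int = fit_logistic(X_int, state)
p_int = predict_proba(beta_int, X_int)

diffs = [abs(u - v) for u, v in zip(p_main, p_int)]
print("estimated interaction coefficient:", round(beta_int[3], 2))
print("max |difference| in predicted probability:", round(max(diffs), 3))
print("mean |difference|:", round(sum(diffs) / len(diffs), 3))
```

If the two models gave identical predictions, every difference would be 0, which happens only when the estimated interaction coefficient is exactly 0.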
I appreciate your reply, but it does not answer any of my questions (and in a moment they will lose their sense in the thicket). Maybe we should come back to the conversation when you have time to read carefully? :)
Hello Tomasz. I'm on vacation now, and so still don't have a lot of time, so I'll just comment briefly. You wrote:
"And about the last question. I understand the predictive value of the model, but my question was about something simpler: how do we interpret, in practice, the cut-off point for the interaction (obtained for a regression model containing two variables) that we are discussing here?"
I don't understand what you mean when you say cut-off point for the interaction. In my mind you specify a logistic regression model you think is appropriate. It may or may not include interaction terms. In either case, you save the predicted probabilities, and use them as the "test" variable in creating the ROC curve. So cut-offs are cut-offs on the predicted probability. This is why I don't know what you mean by cut-off point for the interaction. (The only possibility that comes to mind is you're asking what is the criterion for deciding whether to retain the interaction in your logistic regression model--but I'm not at all certain that's what you mean.) Perhaps one of the other followers of this thread can help?
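One way to see what a cut-off on the predicted probability implies for the markers themselves is sketched below in Python. The coefficients are invented (they stand in for a fitted model without an interaction term, using prothrombin time and albumin as the two markers) and `pt_threshold` is a hypothetical helper. The point is that predicted probability >= c is the same as b0 + b1*x1 + b2*x2 >= logit(c), so the implied threshold on one marker depends on the value of the other.

```python
import math

# Hypothetical coefficients standing in for a fitted two-marker model.
B0, B_PT, B_ALB = -6.5, 0.9, -1.4

def predicted_probability(pt, alb):
    return 1.0 / (1.0 + math.exp(-(B0 + B_PT * pt + B_ALB * alb)))

def logit(p):
    return math.log(p / (1.0 - p))

def pt_threshold(alb, cutoff):
    """Prothrombin time at which the predicted probability equals `cutoff`
    for a given albumin level (valid here because B_PT > 0)."""
    return (logit(cutoff) - B0 - B_ALB * alb) / B_PT

for alb in (3.0, 3.5, 4.0):
    pt_star = pt_threshold(alb, 0.5)
    print("albumin", alb, "-> test positive when PT >=", round(pt_star, 2))
```

So a single cut-off on the probability corresponds to a line in the two-marker plane, not to one fixed cut-off per marker. With an interaction term in the model the same idea applies, but the boundary is no longer a straight line.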
If you do not have time even to read the question, then it does not make sense to write an answer. Please stop. :/
I asked whether, when the markers interact, an interaction term should be included in the logistic regression equation, because after all we get different predicted values depending on how the model is built. In other words: which method does the literature recommend for obtaining predicted values that account for the interaction effect? You referred to macros that generate the predicted values. (?)
https://www.medcalc.org/manual/roc-curves.php Just to show what the cut-off point is. ;)
I was interested in how to interpret the cut-off point obtained from the ROC curve for the predicted probability. How should one interpret the values for the interaction term (here, the predicted values) below and above the cut-off point obtained? The ROC curve is computed for the interaction, so how should the result be interpreted for the individual markers?
I understand that in this way you can get a three-dimensional chart for, say, both markers, which lets you simply define the areas where the markers behave well or badly... although in my opinion, in 3D space a graph for continuous variables is better. I was interested in whether the results of such an equation could be interpreted for the individual factors (markers), and whether it would be better to use separate cut-offs for the individual factors and for the interaction... But I think the topic can be closed. ;)