Interaction analyses through logistic regression: what's the correct way to build the logit model?

Glauco Akelinghton Freire Vitiello @Glauco_Vitiello

05 May 2018

I'm interested in assessing the effects of gene-gene and gene-environment interactions on disease susceptibility using binary logistic regression models. All the examples I've seen so far on this issue build their models in the following form:

logit = B0 + B1X + B2Y + B3(X*Y)

I understand this as estimating the interaction between X and Y while adjusting for their individual effects in the same model.
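To make the role of each coefficient concrete, here is a minimal sketch for the special case of two binary predictors and no other covariates. In that case the model is saturated, so the maximum-likelihood estimates are just differences of the observed cell log-odds. The counts and the helper names (`log_odds`, `saturated_coefficients`) are invented for illustration and are not data from this thread.

```python
import math

def log_odds(cases, controls):
    """Log-odds of disease within one exposure cell."""
    return math.log(cases / controls)

def saturated_coefficients(table):
    """Coefficients of logit = B0 + B1*X + B2*Y + B3*(X*Y) for binary X, Y.

    `table` maps (x, y) to (cases, controls). With no further covariates the
    model is saturated, so each estimate is a difference of cell log-odds.
    """
    l00 = log_odds(*table[(0, 0)])
    l10 = log_odds(*table[(1, 0)])
    l01 = log_odds(*table[(0, 1)])
    l11 = log_odds(*table[(1, 1)])
    return {
        "B0": l00,                    # log-odds when X = 0 and Y = 0
        "B1": l10 - l00,              # effect of X when Y = 0
        "B2": l01 - l00,              # effect of Y when X = 0
        "B3": l11 - l10 - l01 + l00,  # departure from additivity on the logit scale
    }

# Hypothetical case/control counts, invented for illustration only.
table = {
    (0, 0): (100, 100),
    (1, 0): (50, 50),
    (0, 1): (60, 60),
    (1, 1): (60, 30),
}

coefs = saturated_coefficients(table)

# Odds ratio for X within each level of Y (the "simple slopes" of X).
or_x_when_y0 = math.exp(coefs["B1"])
or_x_when_y1 = math.exp(coefs["B1"] + coefs["B3"])
```

With these made-up counts, B1 and B2 are zero and exp(B3) is 2: X has no effect when Y = 0, yet doubles the odds when Y = 1. Note that B1 and B2 are effects at the reference level of the other variable, not effects "in isolation" averaged over the sample.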

But what if I know that X and Y have no significant effect on the outcome in isolation, and I'm only interested in testing whether their interaction does? Is it reasonable to build a model with just the interaction term plus the confounding variables (logit = B0 + B1Z + B2(X*Y))?

Thanks in advance!

Fabrizio Maturo Popular answer

Dear Glauco,

I suggest you keep the direct effects of X and Y even if they are not significant, and extend simple slope analysis to the logistic regression case. Removing the single effects is questionable because they play the role of "control variables", as in any econometric model.
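One way to see the risk of dropping the single effects is sketched below with invented counts (the table and both function names are hypothetical). With binary X and Y and no confounder, a model containing only the product term compares the (X=1, Y=1) cell against the other three cells pooled together, so its odds ratio can be computed directly from the collapsed table.

```python
def product_term_or(table, flip_x=False):
    """Odds ratio for X*Y when it is the only genetic term in the model.

    `table` maps (x, y) to (cases, controls). The product equals 1 only when
    both codes are 1, so all remaining cells are pooled as the reference.
    `flip_x` recodes X as 1 - X to show the dependence on coding.
    """
    exposed = [0, 0]
    reference = [0, 0]
    for (x, y), (cases, controls) in table.items():
        if flip_x:
            x = 1 - x
        group = exposed if x * y == 1 else reference
        group[0] += cases
        group[1] += controls
    return (exposed[0] / exposed[1]) / (reference[0] / reference[1])

def full_model_interaction_or(table):
    """exp(B3) from logit = B0 + B1*X + B2*Y + B3*(X*Y) fitted to the 2x2 cells."""
    odds = {cell: cases / controls for cell, (cases, controls) in table.items()}
    return (odds[1, 1] * odds[0, 0]) / (odds[1, 0] * odds[0, 1])

# Hypothetical counts: X doubles the odds at both levels of Y, so there is
# a main effect of X and no interaction at all on the logit scale.
table = {
    (0, 0): (100, 100),
    (1, 0): (80, 40),
    (0, 1): (100, 100),
    (1, 1): (80, 40),
}

full_or = full_model_interaction_or(table)            # no interaction
product_only_or = product_term_or(table)              # absorbs the effect of X
product_only_or_recoded = product_term_or(table, flip_x=True)
```

In this constructed example the full model reports an interaction odds ratio of exactly 1, while the product-only model reports about 1.71, because the product term has absorbed the main effect of X. Recoding X as 1 - X changes the product-only estimate to about 0.69, so that estimate also depends on an arbitrary choice of reference allele.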

Regards

David Eugene Booth

I suggest you follow the usual practice in experimental design. See the attached for some examples. Best, David Booth

Paulo R Benchimol-Barbosa

Hi Glauco,

Peace and blessings!

Interesting and intriguing question.

Well, let's get things straight first.

If two variables, X and Y, each have a negligible effect on the outcome, you may already have your answer.

At first glance, it seems unreasonable to test the interaction of two variables in a multivariate model if each has a negligible effect on the outcome on its own. They will naturally be excluded as the model is developed. Furthermore, if we treat the two variables as independent of each other and simplify, the probability of their product (X and Y) is the product of their individual probabilities. One can therefore expect a zero probability (interaction, if you will) as a consequence.

That being said, such interactions may arise in deterministic chaotic systems. I am not sure that is the case you are describing in this forum.

Moreover, in deterministic nonchaotic systems applied to biomedical sciences, I cannot recall an exception in which two negligible variables yielded a significant interaction on the outcome in a multivariate model.

In genetic modeling, on the other hand, a gene-gene interaction may have a relevant impact on the outcome through an effect on a third-party metabolic pathway and/or compound, even though the hypothetical genes individually have a negligible effect on the outcome.
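As a purely hypothetical illustration of that last scenario, the sketch below uses an invented table of disease risks in which each gene, looked at on its own, shows no association at all, while the interaction on the logit scale is clearly different from zero. The risks, the assumption that the four genotype combinations are equally frequent, and the function names are all made up for this example.

```python
import math

def logit(p):
    """Log-odds corresponding to a probability p."""
    return math.log(p / (1 - p))

# Hypothetical P(disease | X, Y); the four combinations are assumed to be
# equally frequent in the population. Invented numbers, not real data.
risk = {(0, 0): 0.2, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.2}

def marginal_odds_ratio(risk, axis):
    """Odds ratio for one variable, ignoring (averaging over) the other.

    axis = 0 gives the marginal effect of X, axis = 1 that of Y.
    """
    def mean_risk(level):
        values = [p for cell, p in risk.items() if cell[axis] == level]
        return sum(values) / len(values)

    p1, p0 = mean_risk(1), mean_risk(0)
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

marginal_or_x = marginal_odds_ratio(risk, axis=0)
marginal_or_y = marginal_odds_ratio(risk, axis=1)

# exp(B3) from logit = B0 + B1*X + B2*Y + B3*(X*Y) applied to the four cells.
interaction_or = math.exp(
    logit(risk[1, 1]) - logit(risk[1, 0]) - logit(risk[0, 1]) + logit(risk[0, 0])
)
```

Here both marginal odds ratios equal 1, yet the interaction odds ratio is about 5. The B1 and B2 terms of the full model are not zero in this example, which is one reason "no effect in isolation" does not by itself justify leaving them out.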

I agree with Dr Maturo. For academic purposes, I suggest, if I may, not to exclude individual terms from the multivariate model.

Let me know your comments, if you will, and the 'outcome' of your development.

Sincerely,

Glauco Akelinghton Freire Vitiello

Dr Booth, Dr Maturo and Dr Benchimol-Barbosa, thank you so much for your responses; they were very helpful and thought-provoking.

I considered using stepwise entry methods to select the variables in the model. Indeed, when I ran backward elimination on my data with both individual variants, the interaction term, and a confounding variable, only the interaction term and the confounder were retained in the model, which is exactly why I came here with this question. However, as shown by the work Dr. Booth shared, this may result in unstable models that might not be reproducible.

The problem I'm investigating is the contribution of polymorphisms affecting the expression of a cytokine and its receptor to breast cancer susceptibility. We've seen that they are not significantly associated with breast cancer in isolation; the hypothesis is that the effects are only significant when both variants act together, increasing the expression of both proteins. Working on cancer, we also have many examples in our lab to which the same hypothesis may apply, e.g. enzymes involved in viral immunity that were associated with breast cancer and might have an effect only when there is a viral infection, or metabolizing enzymes that might have an effect only in the presence of their cognate substrate.

David Eugene Booth

Stepwise regression is very unstable and can't be trusted. See the paper I sent you. You seem to be confused about the difference between explanatory and predictive models. Please look at the previous paper I sent and the one I am sending you now. HTH, David Booth

Ajit kumar Roy

Sometimes the main effects are not significant but the interaction effect is. In such a situation, the interpretation is based on the interaction effect only.

Glauco Akelinghton Freire Vitiello

Thank you, Dr Roy!
