If both the dependent and independent variables are categorical, do I have to check for confounding? And what test should I use to identify confounding variables in SPSS?
Hi Abdualrahman, I think you still have to check for confounding. If removing a variable from a multivariate logistic regression model substantially changes the coefficients of the remaining independent variables, then that variable could be a confounder. A common rule of thumb is that a change of 10% or more counts as substantial. Cheers, Puguh from Indonesia
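A minimal R sketch of this change-in-estimate check, assuming a binary outcome y, an exposure x and a candidate confounder z in a data frame mydata (all names hypothetical):

    # Fit the model with and without the candidate confounder z.
    full    <- glm(y ~ x + z, data = mydata, family = binomial)
    reduced <- glm(y ~ x,     data = mydata, family = binomial)

    # Relative change in the exposure coefficient; a change above 10%
    # is the usual rule-of-thumb flag for confounding.
    b_full     <- coef(full)["x"]
    b_reduced  <- coef(reduced)["x"]
    pct_change <- 100 * abs(b_reduced - b_full) / abs(b_full)
    pct_change > 10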
You can calculate the Variance Inflation Factor (VIF) for each of the variables under consideration; a high VIF for any variable indicates strong collinearity. If you are using R, you can use vif(model). SPSS also computes VIF values; they appear in the Coefficients table when you request collinearity diagnostics.
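For concreteness, vif() comes from the car package; a minimal sketch with hypothetical variable names:

    library(car)  # provides vif()

    # Fit the model, then inspect the VIF of each predictor.
    model <- glm(y ~ x1 + x2 + x3, data = mydata, family = binomial)
    vif(model)    # values above roughly 5-10 are commonly flagged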
My view is that identifying confounding variables is a theoretical matter, not a statistical one.
A model you build for your data may well make you suspicious that there is a confounding variable, but no model will tell you which is the confounding variable. You will have to find that out. And to find the confounding variable you will have to turn to theory.
Let me exemplify this.
There is a huge correlation between crime rate and the number of churches across cities in the USA. Now, I hope nobody thinks that the number of churches CAUSES the crime rate. In fact there is a confounding variable that is directly related to both crime rate and number of churches. You may try to guess what this variable is, but it is easier to guess correctly if you have the theoretical knowledge required in this domain.
The answer is the size of the city. Indeed, the bigger the city, the more churches it has, and also the higher its crime rate.
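To make this concrete, here is a small R simulation with entirely made-up numbers: city size drives both the number of churches and the number of crimes, producing a strong churches-crime correlation that disappears once size is controlled for:

    set.seed(42)
    size     <- runif(500, 1e4, 1e6)               # simulated city populations
    churches <- 0.001 * size + rnorm(500, sd = 30)
    crimes   <- 0.020 * size + rnorm(500, sd = 600)

    cor(churches, crimes)                 # large, but entirely spurious

    # Once city size is in the model, churches no longer "explains" crime.
    summary(lm(crimes ~ churches + size))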
Another example is that of a quadratic relationship between an independent variable (IV) and a dependent variable (DV): e.g., very high DV values for very low IV values, then lower and lower DV values as the IV increases, a minimum DV value somewhere in the mid-range of the IV, and then DV values that increase again as the IV keeps increasing. It is very likely that such a quadratic relation is the result of a confounding variable that was not taken into account. Here again, finding out what that confounding variable is will depend on the theory of the domain this example comes from.
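A toy R construction of this pattern (my own illustration, not a real data set): a hidden variable z is nonlinearly tied to the IV and is the only real driver of the DV, so the DV looks quadratic in the IV until z enters the model:

    set.seed(1)
    x <- runif(300, 0, 10)
    z <- (x - 5)^2 + rnorm(300)  # hidden variable, nonlinearly tied to x
    y <- 2 * z + rnorm(300)      # y is driven by z alone, and linearly

    summary(lm(y ~ x + I(x^2)))  # omitting z: a strong "quadratic" x effect
    summary(lm(y ~ x + z))       # including z: x contributes almost nothing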
Anyway, once you think you "know" what the confounding variable(s) may be, then you do your modelling work to check whether you were right.
But I really do not see how one could do this the other way around. It does not make much sense...
It depends on your biological model. Before you start testing for effect modifiers or confounders you should define this model. In multivariate regression, and especially in mixed linear models, the risk of overfitting is quite large, so you should make very clear that you construct such models from an a priori theoretical model. Studies with small sample sizes are at particular risk of errors caused by overfitting (rule of thumb: 9 subjects for each included variable). To prevent overfitting it is wise to first perform univariate analyses (correlations, or difference testing when using multiple populations) to reduce the number of variables, and to insert only variables with a univariate p-value below 10% (p < 0.10) into the multivariate model. By including interaction terms you can determine what the effect of a variable is within your model.
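A rough R sketch of that screening workflow (variable names like outcome, age, sex, smoking and exposure are hypothetical; the p < 0.10 cutoff is the one mentioned above):

    # Hypothetical data frame 'mydata' with a binary 'outcome' column.
    candidates <- c("age", "sex", "smoking")

    # Univariate screening: keep predictors with p < 0.10 on their own
    # (row 2 assumes a single continuous or binary predictor term).
    keep <- sapply(candidates, function(v) {
      fit <- glm(reformulate(v, response = "outcome"),
                 data = mydata, family = binomial)
      coef(summary(fit))[2, "Pr(>|z|)"] < 0.10
    })

    # Multivariate model on the screened set, plus an interaction term
    # (exposure * age, purely illustrative) to probe effect modification.
    final <- glm(reformulate(c(candidates[keep], "exposure * age"),
                             response = "outcome"),
                 data = mydata, family = binomial)
    summary(final)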
It is quite hard to find these confounders with just one method. Remember, however, that the validity of your model is limited to your sample and that it is only a best fit of reality.
To add another textbook suggestion to Pieter Fouche's, this one by Kleinbaum and Klein is also good: http://www.amazon.co.uk/Logistic-Regression-Self-Learning-Statistics-Biology/dp/1441917411/ref=sr_1_4?s=books&ie=UTF8&qid=1399370140&sr=1-4&keywords=logistic+regression