Multicollinearity is a problem in regression analysis that occurs when two or more independent variables are highly correlated (e.g. r = 0.90 or higher).
The relationship between the independent variables and the dependent variable is distorted by the very strong relationship among the independent variables, making it likely that our interpretation of the relationships will be incorrect.
In the worst case, if the variables are perfectly correlated, the regression cannot be computed.
SPSS guards against the failure to compute a regression solution by arbitrarily omitting the collinear variable from the analysis.
Multicollinearity is detected by examining the tolerance for each independent variable. Tolerance is the proportion of variability in one independent variable that is not explained by the other independent variables (that is, 1 − R² from regressing that variable on the others).
Tolerance values less than 0.10 indicate collinearity.
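For readers working outside SPSS, here is a minimal sketch of the same diagnostic in Python, assuming scikit-learn and a pandas DataFrame `X` that holds only the independent variables (the function name is my own, not an SPSS term):

```python
# A minimal sketch of the tolerance/VIF computation for each predictor.
import pandas as pd
from sklearn.linear_model import LinearRegression

def tolerance_table(X: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for col in X.columns:
        others = X.drop(columns=[col])
        # R^2 from regressing this predictor on all the other predictors
        r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
        tol = 1.0 - r2                       # variability NOT explained by the others
        rows.append({"variable": col, "tolerance": tol, "VIF": 1.0 / tol})
    return pd.DataFrame(rows)

# tolerance < 0.10 (equivalently VIF > 10) flags a collinear predictor
```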
If we discover collinearity in the regression output, we should regard our interpretation of the relationships as untrustworthy until the issue is resolved.
Multicollinearity can be resolved by combining the highly correlated variables through principal component analysis, or by omitting one of them from the analysis.
Check the correlation between each pair of your independent variables. It should not be too high (> .7). If two independent variables have a correlation of 0.7 or higher, you can omit one of them.
You can also check your Tolerance and VIF values in SPSS. If the Tolerance value is less than .10 and the VIF is above 10, then multicollinearity is at hand.
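As an illustration only (not part of the answer above), a pairwise correlation screen like the one described could look as follows in Python, assuming a pandas DataFrame `X` of independent variables:

```python
# Hypothetical screen for predictor pairs with |r| > 0.7.
import pandas as pd

corr = X.corr()
high_pairs = [(a, b, round(corr.loc[a, b], 2))
              for i, a in enumerate(corr.columns)
              for b in corr.columns[i + 1:]
              if abs(corr.loc[a, b]) > 0.7]
print(high_pairs)   # each entry names a pair that is a candidate for dropping one variable
```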
In some fields, you sometimes need to keep a variable in your study to achieve the research objectives, even if its correlation is still higher than .7 but less than .8. Do you agree with me?
PCA is normally used for avoiding multicollinearity; a simple procedure is to perform PCA and select the most relevant variable per component (common in finance, where the interpretation of the model's variables is very important; in other disciplines you can use all the variables per component and achieve a better fit).
Let Y be the dependent variable and the X block the independent, correlated variables.
However, I always prefer PLS. It can be used instead of PCA: the result is again a set of components that summarize the information in the correlated variables, but the components are determined by their relation with the dependent variable to be studied (the relation between X and Y), so better results for prediction can be achieved. PCA determines the components using only the relations inside the X block.
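A rough sketch of that contrast, using scikit-learn on a synthetic correlated X block (an illustration only, not the poster's own workflow):

```python
# PCA components ignore y; PLS components are built from the X-y relation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n) for _ in range(4)])  # 4 highly correlated predictors
y = 2.0 * z + rng.normal(size=n)

Xs = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(Xs)                 # components from the X block alone
pls = PLSRegression(n_components=2).fit(Xs, y)    # components chosen for their relation with y
y_hat = pls.predict(Xs)                           # predictions from the PLS components
```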
Run a factor analysis and see how the variables cluster through their correlations. Then you can eliminate the variables that are superfluous because their variance is already represented by other, more relevant variables.
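A small sketch of that clustering idea, using scikit-learn's FactorAnalysis on synthetic data (the variable names and two-factor structure are made up for the example):

```python
# Inspect factor loadings to see which predictors cluster together.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 300
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = pd.DataFrame({
    "x1": f1 + 0.2 * rng.normal(size=n),
    "x2": f1 + 0.2 * rng.normal(size=n),   # clusters with x1
    "x3": f2 + 0.2 * rng.normal(size=n),
    "x4": f2 + 0.2 * rng.normal(size=n),   # clusters with x3
})

fa = FactorAnalysis(n_components=2).fit(StandardScaler().fit_transform(X))
loadings = pd.DataFrame(fa.components_.T, index=X.columns, columns=["factor1", "factor2"])
print(loadings)  # variables loading on the same factor are candidates for pruning
```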
Yes Gloriam, but the variables eliminated because they are better explained by the others in the same factor may be the very variables that best explain the DEPENDENT variable (Y). The percentage of variance you lose by eliminating them may be exactly what is needed to explain Y.
Thus, PLS is a very good solution, because it selects the factors and variables depending on their relation with the variable Y.
Morphological indices are used to describe the micro-structure of bone, but many provide redundant information. We selected the relevant indices based on how well they could help predict the mechanical behaviour of our bone samples.
We first determined the best predictor among these indices (the one that will remain in the model no matter what) and then conducted a step-wise backward selection of the other indices based on the variance inflation factor (VIF).
Article Bone Volume Fraction and Fabric Anisotropy Are Better Determ...
Article Not only stiffness, but also yield strength of the trabecula...
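A hedged sketch of a step-wise backward VIF selection like the one described above, assuming statsmodels and a pandas DataFrame `X`; the function and the kept-predictor argument are illustrative, not the authors' actual code:

```python
# Backward elimination by VIF while always keeping one chosen predictor.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def backward_vif(X: pd.DataFrame, keep: str, threshold: float = 10.0) -> pd.DataFrame:
    # note: adding a constant column to X beforehand is usually advisable for VIF
    cols = list(X.columns)
    while len(cols) > 1:
        vif = {c: variance_inflation_factor(X[cols].values, i) for i, c in enumerate(cols)}
        removable = {c: v for c, v in vif.items() if c != keep}
        worst, worst_vif = max(removable.items(), key=lambda kv: kv[1])
        if worst_vif <= threshold:
            break
        cols.remove(worst)          # drop the predictor with the largest VIF
    return X[cols]
```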
You may remove the highly correlated predictors from the model. In chemometrics, for example, the normal procedure is to calculate principal components, which reduce the number of predictors and are by definition uncorrelated. Then the PLS prediction modeling can be done.
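One way this "reduce first, then model" idea might look in Python (a sketch only; an ordinary least-squares step stands in here for the subsequent PLS modeling):

```python
# Standardize, project onto principal components (uncorrelated by construction), then regress.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 150
z = rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n) for _ in range(5)])   # redundant predictors
y = z + 0.5 * rng.normal(size=n)

model = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression()).fit(X, y)
print(model.score(X, y))    # R^2 of the regression on the component scores
```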
PCA is a good approach for PLS modeling to avoid multicollinearity. However, since PCA uses the variance-covariance matrix, all data must be standardized because of the scale bias of the covariance statistic, which makes the coefficients in the PLS model difficult to interpret directly. PLS is the better approach, but when using least-squares regression as one of many possible approaches, remove independent variables with variance inflation factors (VIF) > 10 to avoid multicollinearity among the regressors. Also consider Bayesian Additive Regression Trees (BART) as a modeling method; you may be surprised by the validation results.
A Bayesian approach may be considered when your data have a collinearity problem. Bayesian regression can be seen as ridge regression in which the ridge parameter is related to the prior on the regression parameters. You can set the prior close to 1 and not worry about the high bias of your estimate. In my simulations, the Bayesian approach had better power than ridge regression.
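As an illustration of the ridge/Bayesian connection (not the poster's simulation), scikit-learn offers both a fixed-penalty ridge and a Bayesian ridge whose shrinkage is estimated from the data:

```python
# Both models shrink the unstable coefficients of a collinear predictor block.
import numpy as np
from sklearn.linear_model import Ridge, BayesianRidge

rng = np.random.default_rng(2)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n) for _ in range(3)])  # strongly collinear block
y = 1.5 * z + rng.normal(size=n)

ridge = Ridge(alpha=1.0).fit(X, y)     # ridge parameter fixed in advance
bayes = BayesianRidge().fit(X, y)      # shrinkage governed by the prior, estimated from the data
print(ridge.coef_, bayes.coef_)
```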
None of these responses is incorrect, but there is an "art" aspect to this that I haven't seen mentioned. From a theoretical standpoint, can you argue that, of the two regressor variables, one may be explaining some of the same behavior in the dependent variable? If so, drop the other if it is less important to your story. What story you are trying to tell matters in this case. I struggled with this until I read Gujarati's thoughts on multicollinearity in his econometrics text. I suggest you read it; it will help you recast your question.
First, be careful in the selection or construction of your model: design it to avoid any form of repetition in the collection of the data. Second, if multicollinearity is present in your data, check the procedures below to establish its level, degree, and severity before you think about a solution, because the appropriate solution depends on those factors:
1-The Tolerance level
2-The Farrar-Glauber tests
3-Eigenvalues and eigenvectors (see the sketch after this list)
4-The Variance inflation factor (VIF)
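As an illustration of item 3 (not from the answer itself), the eigenvalues of the predictors' correlation matrix can be inspected as follows, assuming a numeric array `X` of independent variables:

```python
# Eigenvalue / condition-index diagnostic on the correlation matrix of the predictors.
import numpy as np

corr = np.corrcoef(X, rowvar=False)          # X: rows = observations, columns = predictors
eigvals = np.linalg.eigvalsh(corr)
condition_index = np.sqrt(eigvals.max() / eigvals.min())
# eigenvalues near zero (a condition index above roughly 30) signal serious multicollinearity
print(eigvals, condition_index)
```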
Meanwhile, note that total elimination or avoidance of multicollinearity is not possible; all procedures only reduce it to a minimal level. After you select a method of reduction, run the diagnostics again to see numerically how far it has been reduced, e.g.
1-Dropping one of the most highly correlated variables in the model
Option 1: Remove from the model one of the variables that has a high VIF (variance inflation factor), preferably the one that has the least correlation with the dependent variable (this is what is usually done).
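A possible sketch of Option 1, assuming statsmodels/pandas and objects `X` (a DataFrame of predictors) and `y` (the dependent variable); the names and the VIF > 10 cut-off are conventional choices, not prescribed by the answer:

```python
# Among the high-VIF predictors, drop the one least correlated with the dependent variable.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

vif = pd.Series({c: variance_inflation_factor(X.values, i) for i, c in enumerate(X.columns)})
high_vif = vif[vif > 10].index
if len(high_vif) > 0:
    to_drop = X[high_vif].corrwith(y).abs().idxmin()   # weakest link to y among the offenders
    X_reduced = X.drop(columns=[to_drop])
```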