I have been told that, in questionnaire design, placing similar types of items (IV and DV) side by side will contribute to a multicollinearity problem in the statistical analysis. I need advice.
It is not advisable to use the items under a construct if the multicollinearity score is high. High multicollinearity among the items means the items are not independent of each other, and therefore it will affect the reliability of the questionnaire. Therefore, only items with a low multicollinearity score should be retained for further analysis. Now, back to your question: you cannot have the same items under both the IV and the DV constructs, because that will produce a high multicollinearity score.
Likert scales assume that the items are repeated measures of the same construct. High correlations between items lead to a high reliability coefficient, which allows the researcher to sum or average the items. Each of those sums or averages can then be treated as an equal-interval scale and used in correlations and linear models.
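To make the reliability step concrete, here is a minimal Python sketch, assuming the responses are stored as a respondents-by-items table of numeric Likert codes. It uses Cronbach's alpha as the reliability coefficient (one common choice; this answer does not name a specific one) and then averages the items into one scale score per respondent. The function names and the data are hypothetical.

```python
from statistics import mean, variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of Likert scores."""
    k = len(responses[0])                         # number of items
    items = list(zip(*responses))                 # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])  # variance of summed scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

def scale_scores(responses):
    """Average the items for each respondent to get one scale score."""
    return [mean(row) for row in responses]

# Hypothetical data: 4 respondents answering 3 items of the same construct.
responses = [[1, 2, 1], [2, 2, 3], [4, 3, 4], [5, 5, 4]]
print(round(cronbach_alpha(responses), 3))  # 0.931
print(scale_scores(responses))
```

Here the items move together across respondents, so alpha is high and averaging them into a single score is defensible under the Likert assumption described above.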
Nonetheless, very high correlation between variables in linear models (of any kind) is a problem. The researcher has to consider whether the highly correlated variables are measures of the same construct, in which case using both does not improve the model fit. In such cases, you should keep only one of the highly correlated variables in your model.
To sum up, the difference is this: individual items do not enter a linear model as predictors, so they actually need to be correlated in order to form an equal-interval scale. When an equal-interval scale enters a linear model, however, it should not be extremely highly correlated with other predictors (|r| > 0.9).
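As a rough illustration of the |r| > 0.9 rule, here is a minimal Python sketch that, assuming each scale score has already been computed and stored as a list per predictor, lists the predictor pairs whose Pearson correlation exceeds the threshold. The predictor names and data are made up for the example.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def flag_collinear_pairs(predictors, threshold=0.9):
    """Return (name1, name2, r) for predictor pairs with |r| above threshold."""
    flagged = []
    for a, b in combinations(predictors, 2):
        r = pearson_r(predictors[a], predictors[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical scale scores for three predictors (one value per respondent).
predictors = {
    "trust": [1, 2, 3, 4, 5],
    "satisfaction": [1.1, 2.0, 2.9, 4.2, 5.0],
    "age_group": [3, 1, 4, 1, 5],
}
for a, b, r in flag_collinear_pairs(predictors):
    print(f"{a} / {b}: r = {r:.3f}")
```

A flagged pair is a prompt to ask whether both scales measure the same construct; it does not decide on its own which one to drop.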
I work with continuous data and did not realize you were talking about a Likert scale, but if you are, then Nikolaos's answer makes sense to me. I am not familiar with Likert scales, but it sounds like he was talking about cross-checks? (Is that right, Nikolaos?) At any rate, his advice to keep only one of the highly correlated variables sounds good. In fact, if this is anything like working with continuous data, it is usually best not to use more regressors than necessary, because of overfitting to noise and other interactions between regressors in addition to multicollinearity.
Yes, potentially. One has to judge carefully, based on the available evidence from the existing literature, and select the variables so that there is no set of highly correlated variables. Otherwise, the same exercise has to be carried out at the analysis stage, with the help of correlation tables and the significance (p-value) of each exposure variable with the outcome. In any case, at least some individual judgement is required. A common practice is to compute p-values with one independent variable at a time and select those with p-values less than 0.1. Then look at the correlation table, and if a group of variables is highly correlated, keep only the one with the lowest p-value before proceeding to multiple regression.
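The screening-then-pruning procedure could be sketched in Python as follows. This is a minimal sketch under several assumptions: the univariate p-values and the pairwise predictor correlations are computed elsewhere and passed in; "highly correlated" is taken as |r| ≥ 0.9 (the answer does not fix a cut-off, and 0.9 echoes the value quoted earlier in the thread); and within each correlated set the variable with the smallest p-value, i.e. the strongest univariate association, is the one kept. All names are hypothetical.

```python
def screen_and_prune(p_values, corr, p_cut=0.1, r_cut=0.9):
    """Univariate screening followed by pruning of highly correlated predictors.

    p_values: dict mapping predictor name -> p-value from a one-predictor model
              (assumed to be computed elsewhere).
    corr:     dict mapping frozenset({a, b}) -> Pearson r between predictors a and b;
              missing pairs are treated as uncorrelated.
    Returns the predictors to carry forward into the multiple regression.
    """
    # Step 1: keep predictors with p < p_cut, ordered from strongest to weakest.
    candidates = sorted((v for v in p_values if p_values[v] < p_cut), key=p_values.get)
    kept = []
    # Step 2: keep a predictor only if it is not highly correlated
    # with any predictor already kept.
    for v in candidates:
        if all(abs(corr.get(frozenset((v, k)), 0.0)) < r_cut for k in kept):
            kept.append(v)
    return kept

# Hypothetical example: x1 and x2 are nearly redundant; x3 fails the screen.
p_vals = {"x1": 0.01, "x2": 0.05, "x3": 0.30, "x4": 0.08}
correlations = {
    frozenset(("x1", "x2")): 0.95,
    frozenset(("x1", "x4")): 0.20,
    frozenset(("x2", "x4")): 0.10,
}
print(screen_and_prune(p_vals, correlations))  # ['x1', 'x4']
```

The loop is greedy: it walks through the candidates from strongest to weakest and drops any variable that is highly correlated with one already kept, so it does not build explicit groups. That is a simplification, and judgement is still needed when correlations chain across several variables.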
Multicollinearity is a problem that occurs when you have a high level of correlation among your independent variables. It does not refer to the correlation between your IV and DV.
I agree with you, Mr David L. Morgan, because multicollinearity is about testing whether the items under each IV construct are highly correlated. If multicollinearity is high, it shows that there is a problem with the reliability of the items.
So it is not advisable to retain items with high multicollinearity for further analysis; if multicollinearity is low, it is advisable to retain them.
David Morgan has answered your question succinctly. I am reflecting on the part of your question about the IV and DV being placed close together in a questionnaire. Is the concern there more to do with order bias (rather than multicollinearity)? In other words, the order of the IV questions leads the respondent to answer the DV questions in a pattern.
In that case, a high correlation between IV and DV is NOT multicollinearity; rather, the concern is that any relationship between the DV and IV is an artefact of the question order. One way to check would be to administer two different versions of the questionnaire, with the DV and IV items ordered differently, and then compare the mean construct scores between the two versions. If the means (medians, if the scores are not normally distributed) do not differ, there is no suggestion of order bias, and any correlation (or regression) between IV and DV can then be taken as a test of the hypothesis. Good luck with your research.
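One way to run the comparison between the two questionnaire versions is a permutation test, sketched below in Python. This is a swapped-in technique rather than something the answer specifies: it assumes each respondent has one construct score and received only one version, and it compares either means or medians (pass `stat=median` for the latter). The function name, defaults and data are hypothetical.

```python
import random
from statistics import mean, median

def permutation_test(version_a, version_b, stat=mean, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in `stat` between two groups.

    version_a, version_b: construct scores from respondents who received each
    questionnaire ordering. Returns an approximate p-value.
    """
    rng = random.Random(seed)                     # seeded for reproducibility
    observed = abs(stat(version_a) - stat(version_b))
    pooled = list(version_a) + list(version_b)
    n_a = len(version_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # relabel respondents at random
        diff = abs(stat(pooled[:n_a]) - stat(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical construct scores for the two orderings (IV-first vs DV-first).
iv_first = [3.2, 3.8, 4.1, 2.9, 3.5, 3.9, 4.0, 3.3]
dv_first = [3.4, 3.6, 4.2, 3.0, 3.7, 3.8, 3.9, 3.1]
print(permutation_test(iv_first, dv_first))               # compare means
print(permutation_test(iv_first, dv_first, stat=median))  # compare medians
```

The permutation approach avoids a normality assumption, which fits the suggestion to fall back on medians when the scores are not normally distributed.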
Sorry for taking so long to respond. James, I was referring to the construction of attitude scales in classical psychometrics. Likert's procedure aims to construct an equal-interval measure from the ordinal responses to items (see the chapter on the measurement of attitudes in the first reference below). It treats the set of items as repeated measures of the same quantity.
A check of this assumption is a high Cronbach's alpha, because (among its other properties) it is an index of the proportion of common variance among the items. Thus, high correlations between the items are desirable, because they make it more plausible that their sum is an equal-interval scale. All of this is data preparation that takes place before these scales are entered into linear models, either as IVs or DVs. Once they are entered into linear models, multicollinearity is to be avoided as with any other set of measures.
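The link between inter-item correlation and alpha can be seen in the standardized form of Cronbach's alpha, which depends only on the number of items k and their average inter-item correlation r̄: alpha = k·r̄ / (1 + (k - 1)·r̄). The sketch below is a minimal illustration of that formula, not a full reliability analysis; the function name is hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Holding the number of items fixed, alpha rises as the items correlate more strongly.
for r in (0.1, 0.3, 0.5):
    print(f"k=10, mean r={r}: alpha={standardized_alpha(10, r):.3f}")
```

With 10 items, average correlations of 0.1, 0.3 and 0.5 give alphas of roughly 0.53, 0.81 and 0.91, which is why high inter-item correlations are welcome at the scale-building stage.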
Eagly, A. H., & Chaiken, S. (1993). The Psychology of Attitudes. Wiley.
Carmines, E. G., & Zeller, R. A. (1987). Reliability and Validity Assessment. Sage.
Multicollinearity is a data problem that can adversely affect the interpretation of a regression by limiting the size of the R-squared and confounding the contributions of the independent variables. For this reason, two measures, tolerance and the variance inflation factor (VIF), are used to assess the degree of collinearity among the independent variables.
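A minimal Python sketch of both measures, assuming the predictors are supplied as equal-length numeric columns. It uses the identity that VIF_j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, which is the same as 1 / (1 - R²_j) from regressing predictor j on the other predictors; tolerance is 1 / VIF_j. Only the independent variables enter the calculation; the DV plays no part. The function names and data are hypothetical, and a statistics package would normally do this for you.

```python
from math import sqrt

def correlation_matrix(columns):
    """Pearson correlations between predictor columns (lists of equal length)."""
    n = len(columns[0])
    unit = []
    for col in columns:
        m = sum(col) / n
        d = [v - m for v in col]                  # centre the column
        s = sqrt(sum(x * x for x in d))
        unit.append([x / s for x in d])           # scale to unit length
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def vif_and_tolerance(columns):
    """VIF_j = j-th diagonal of the inverse correlation matrix; tolerance_j = 1 / VIF_j."""
    inv = invert(correlation_matrix(columns))
    vifs = [inv[j][j] for j in range(len(columns))]
    return vifs, [1.0 / v for v in vifs]

# Hypothetical predictors: x3 is nearly a rescaled copy of x1.
x1 = [1, 2, 3, 4, 5]
x2 = [3, 1, 4, 1, 5]
x3 = [2, 4, 6, 8, 11]
vifs, tols = vif_and_tolerance([x1, x2, x3])
for name, v, t in zip(("x1", "x2", "x3"), vifs, tols):
    print(f"{name}: VIF = {v:.1f}, tolerance = {t:.3f}")
```

Large VIFs (small tolerances) for x1 and x3 signal that they carry largely the same information, which matches the advice in the thread to keep only one of such a pair.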