What are the properties of instrumental variable regression and when do we say that instrumental variables are weak?
Dear Respected colleague,
First, a dataset should always be explored to see whether it meets the assumptions of the statistical methods applied. The multivariate analyses we intend to run assume normality, linearity, and the absence of multicollinearity.
Normality refers to the shape of the distribution of an individual variable and its correspondence to the normal distribution. The assumption of normality can be examined by looking at histograms of the data and by checking skewness and kurtosis. The distribution is considered normal when it is bell shaped and the values of skewness and kurtosis are close to zero.
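The skewness and kurtosis checks above can be sketched as follows. This is a minimal illustration on simulated data using NumPy only; the sample and the simple moment-based formulas are illustrative, not a specific package's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=5000)  # simulated, roughly normal sample

def skewness(a):
    """Third standardized moment; near 0 for a symmetric distribution."""
    a = np.asarray(a, dtype=float)
    z = (a - a.mean()) / a.std()
    return np.mean(z ** 3)

def excess_kurtosis(a):
    """Fourth standardized moment minus 3; near 0 for a normal distribution."""
    a = np.asarray(a, dtype=float)
    z = (a - a.mean()) / a.std()
    return np.mean(z ** 4) - 3.0

print(skewness(x), excess_kurtosis(x))  # both should be close to zero here
```

Values far from zero on either measure would suggest a departure from normality worth inspecting with a histogram.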
Linearity concerns the way changes in the dependent variable are associated with the independent variables, namely, that there is a straight-line relationship between them. This assumption is essential because regression analysis only tests for a linear relationship between the independent variables and the dependent variable. The Pearson correlation coefficient can capture the linear association between two variables.
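A quick way to gauge that linear association is the Pearson correlation, sketched here on simulated data (the slope and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)  # linear relation plus noise

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]
print(r)  # close to 1 for a strong positive linear relationship
```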
If the assumptions of regression analysis are met, the errors associated with one observation are uncorrelated with the errors of any other observation. Independence of residuals can be examined via the Durbin–Watson statistic, which tests for correlations between errors; specifically, it tests whether adjacent residuals are correlated. As a rule of thumb, researchers suggest that Durbin–Watson values less than 1 or greater than 3 are definitely cause for concern, whereas values close to 2 indicate that the residuals are acceptable.
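The Durbin–Watson statistic is simple enough to compute directly: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch on simulated, independent residuals:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: ~2 for uncorrelated residuals,
    toward 0 for positive and toward 4 for negative autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(2)
e = rng.normal(size=500)  # independent (uncorrelated) residuals
print(durbin_watson(e))   # should be close to 2
```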
Multicollinearity is the existence of a strong linear relationship among the predictor variables, and it prevents the effect of each variable from being identified. Researchers recommend examining the variance inflation factor (VIF) and the tolerance level (TOL) as multicollinearity diagnostics. VIF represents the increase in the variance of a coefficient estimate that exists due to collinearities and interrelationships among the variables. As a rule of thumb, VIFs larger than 10 indicate strong multicollinearity, and equivalently the tolerance (TOL = 1/VIF) should be greater than 0.1.
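The VIF for predictor j is 1/(1 − R²ⱼ), where R²ⱼ comes from regressing predictor j on all the other predictors. A minimal sketch, with a deliberately near-collinear pair of simulated predictors (all data and names are illustrative):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of predictor matrix X."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(3)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)  # nearly collinear with x1
x3 = rng.normal(size=300)                  # independent of the others
X = np.column_stack([x1, x2, x3])
print([vif(X, j) for j in range(3)])  # x1 and x2 large (>10), x3 near 1
```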
The next step is to assess the overall model fit in supporting the research hypotheses. This is done by, firstly, examining the adjusted R squared (R²) to see the percentage of the total variance of the dependent variable explained by the regression model. Whereas R² tells us how much variation in the dependent variable is accounted for by the regression model, the adjusted value tells us how much variance would be accounted for if the model had been derived from the population from which the sample was taken. Specifically, it reflects the goodness of fit of the model to the population, taking into account the sample size and the number of predictors used.
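The adjustment is a simple function of the sample size n and the number of predictors p: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). A minimal OLS sketch on simulated data (the coefficients and noise level are illustrative assumptions):

```python
import numpy as np

def fit_ols(X, y):
    """OLS with intercept; returns (R-squared, adjusted R-squared)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=100)
r2, adj = fit_ols(X, y)
print(r2, adj)  # the adjusted value is slightly smaller than R-squared
```

The penalty grows with the number of predictors, which is why the adjusted value is the better guide when comparing models of different sizes.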
Next, the statistical significance of the regression must be examined through ANOVA and the F ratio. Analysis of variance (ANOVA) consists of calculations that partition the variability within a regression model and form the basis for tests of significance. In this context, the F-test checks whether the regression model as a whole explains a statistically significant portion of the variance in the dependent variable.
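The overall F ratio compares explained to residual variance per degree of freedom: F = (R²/p) / ((1 − R²)/(n − p − 1)). A minimal sketch on simulated data where one predictor genuinely matters (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 2
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] + rng.normal(size=n)  # only the first predictor matters

# Fit OLS with an intercept and compute R-squared
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Overall F ratio: explained vs. residual variance per degree of freedom
F = (r2 / p) / ((1.0 - r2) / (n - p - 1))
print(F)  # a large F indicates the model is statistically significant
```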
I think the properties of instrumental variables lie in the relationships between the variables in an equation and its error term. The problem arises when a regressor X is correlated with the error term e; a valid instrument Z must then be correlated with X (relevance) but uncorrelated with e (exogeneity).
Instruments are said to be weak when they are only slightly correlated with the endogenous regressor, so that test results fall outside the ranges the model requires. Such weakness can be seen when the F-statistic on the excluded instruments in the first-stage regression is small; a common rule of thumb is that a first-stage F below about 10 signals weak instruments.
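The first-stage check can be sketched with a small simulation. Here the instrument z, the confounder u, and all coefficients are illustrative assumptions; the point is only how the first-stage F-statistic on the excluded instrument is computed:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
z = rng.normal(size=n)   # the instrument
u = rng.normal(size=n)   # unobserved confounder driving both x and the error
x = 0.8 * z + u + rng.normal(scale=0.5, size=n)  # endogenous regressor
y = 1.0 * x + u + rng.normal(scale=0.5, size=n)  # x is correlated with y's error via u

# First stage: regress the endogenous x on the instrument (with intercept)
A = np.column_stack([np.ones(n), z])
beta, *_ = np.linalg.lstsq(A, x, rcond=None)
resid = x - A @ beta
r2 = 1.0 - resid @ resid / np.sum((x - x.mean()) ** 2)

# F-statistic on the single excluded instrument (1 restriction, n - 2 df)
F = (r2 / 1) / ((1.0 - r2) / (n - 2))
print(F)  # well above 10 here; a value below ~10 would signal a weak instrument
```

Shrinking the 0.8 coefficient on z toward zero would weaken the instrument and push this F-statistic below the rule-of-thumb threshold.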