
The output of a multiple regression contains the intercept, the parameter estimates (Beta coefficients), the t and F values, R² and the tests of significance (p-values). In R, these can be displayed with functions such as summary(), coef() and lm.beta(). From these statistics and coefficients, we try to identify the variable with the highest significance in the model.
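As a rough illustration of where these quantities live in R, here is a minimal sketch using base R only. The built-in mtcars data and the choice of predictors are assumptions made purely for the example. Because lm.beta() comes from an add-on package, the standardized coefficients are computed by hand here as b * sd(x) / sd(y), which should give the same quantity for numeric predictors.

```r
# Fit a multiple regression on the built-in mtcars data (illustrative choice).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

s <- summary(fit)
coef(fit)         # intercept and unstandardized estimates
s$coefficients    # estimates, standard errors, t values and p-values
s$r.squared       # R-squared
s$fstatistic      # overall F statistic with its degrees of freedom

# Standardized (Beta) coefficients computed by hand: b * sd(x) / sd(y).
predictors <- c("wt", "hp", "qsec")
std_beta <- coef(fit)[predictors] *
  sapply(mtcars[predictors], sd) / sd(mtcars$mpg)
std_beta
```

The standardized values are unit-free, so they can be compared across predictors, but they still inherit any distortion caused by correlated predictors.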

Variable importance in the model is mostly judged from R² and the p-values. Variables with marginal or low significance have p-values above the chosen significance threshold (for instance, p = 0.05), and including or excluding them barely changes the percentage of variance explained by the model (confidence intervals can be used to be more precise).
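One way to check this claim for a single variable is to compare the model with and without it. The sketch below is only an illustration: mtcars and the dropped predictor (qsec) are assumed for the example, and the 95% level is an arbitrary choice.

```r
# Full model and a reduced model without one candidate predictor (qsec).
fit_full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
fit_reduced <- update(fit_full, . ~ . - qsec)

# 95% confidence intervals for the coefficients of the full model.
confint(fit_full, level = 0.95)

# Change in R-squared when qsec is left out of the model.
r2_full    <- summary(fit_full)$r.squared
r2_reduced <- summary(fit_reduced)$r.squared
r2_full - r2_reduced

# Partial F test comparing the two nested models.
anova(fit_reduced, fit_full)
```

For a single dropped term, the partial F test gives the same p-value as the t test reported for that coefficient in summary(fit_full).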

Insignificant variables are often eliminated from the model using backward elimination, forward selection or stepwise selection procedures.
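In base R these procedures can be sketched with step(). Note that step() selects by AIC rather than by p-values, so it is a stand-in for, not an exact copy of, a p-value based procedure. The data set and candidate predictors are assumptions for the example.

```r
# Largest and smallest models considered (mtcars is an illustrative choice).
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# Backward elimination: start from the full model and drop terms.
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: start from the intercept-only model and add terms.
forward <- step(null, scope = formula(full), direction = "forward", trace = 0)

# Stepwise selection: terms can be added or dropped at each step.
both <- step(null,
             scope = list(lower = formula(null), upper = formula(full)),
             direction = "both", trace = 0)

formula(backward)
formula(forward)
formula(both)
```

The three searches need not end at the same model, which is one reason the selected set is better treated as a candidate than as a final answer.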

The standardized coefficients and their corresponding p-values may also provide a standardized way to compare the effects of independent variables that are measured in different units. Nevertheless, because the independent variables are usually correlated, we need a more robust variable importance analysis, such as dominance analysis, elastic net, random forest or Boruta, to determine the actual importance of an independent variable. How do we select a variable importance selection criterion?
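To make the idea behind dominance analysis concrete, here is a minimal base R sketch of general dominance weights: for each predictor, the gain in R² from adding it is averaged over all subsets of the other predictors, first within each subset size and then across sizes. The data set, the predictors and the helper names (r2, general_dominance) are hypothetical choices for the example; dedicated packages offer fuller implementations.

```r
predictors <- c("wt", "hp", "qsec")   # illustrative predictors from mtcars
response   <- "mpg"

# R-squared of the model containing only the given predictors.
r2 <- function(vars) {
  if (length(vars) == 0) return(0)
  summary(lm(reformulate(vars, response), data = mtcars))$r.squared
}

general_dominance <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Mean R-squared gain from adding p, for each size of the starting subset.
  by_size <- sapply(0:length(others), function(k) {
    subsets <- if (k == 0) list(character(0)) else combn(others, k, simplify = FALSE)
    mean(sapply(subsets, function(s) r2(c(s, p)) - r2(s)))
  })
  mean(by_size)   # average across subset sizes
})

general_dominance        # one weight per predictor
sum(general_dominance)   # equals the R-squared of the full model
```

Because the weights add up to the full-model R², they split the explained variance among correlated predictors instead of crediting shared variance to whichever variable enters first. The number of models grows exponentially with the number of predictors, so this brute-force version only suits small models.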
