Dear Md Kamrul Hasan , " i.e. intercept is not different from zero at a given level of probability " should read "... at a given level of confidence". The whole testing is based on frequentist philsosophy where model parameters (like the intercept) are fixed (their values are just unknown). As they are not results or "random experiments" they have no probability assigend. Therefore we talk about confidcence and not about probability (note that confidence intervals are not probability intervals; a 95% conficence interval is not an intervan that contains the true values with 95% probability).
You can talk about the probability of parameters having certain values only in a Bayesian context, where probability is a measure of believe. But this requires a statement of these probabilities prior to the observations (it can only be calculated how observations should change given believes), and it has nothing to do with p-values from significance tests.
Nothing more than it would for any other regression coefficient. It means the constant term is hard to pin down at whatever alpha you used, i.e. we cannot reject the possibility that the regression curve passes through the origin at your chosen alpha. Best, D. Booth
Usually we can tell from the subject matter whether there should be an intercept term or not. Consider this method: when all independent variables are zero, would you, based on your subject-matter knowledge, expect y to be zero as well? If so, do not include an intercept term in your model; it is then automatically set to zero.
This could happen regardless of whatever you might consider "significant."
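To make the comparison concrete, here is a minimal sketch (not from the answer above; it assumes Python with numpy and statsmodels, and uses simulated data in which the true line really does pass through the origin):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 2.5 * x + rng.normal(0, 1, 100)        # true line passes through the origin

# With an intercept: add_constant appends a column of ones.
with_const = sm.OLS(y, sm.add_constant(x)).fit()

# Without an intercept: simply omit the constant column.
no_const = sm.OLS(y, x).fit()

print(with_const.params, with_const.pvalues)  # intercept near 0, non-significant
print(no_const.params)                        # slope only
```

With data like these, the fitted intercept hovers around zero and its p-value is large, which is exactly the "not significant" situation being discussed.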
In Ken Brewer's book (Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press), somewhere just past page 100, he notes that in survey statistics it is better not to include an intercept, just as you would not include any other independent variable unless it helps. He indicated that you should be cautious about using an intercept.
The data will tell you when there is heteroscedasticity. Fit, in general, can be examined by graphical residual analysis. Overfitting can be checked by cross-validation or something similar using other data. Graphical residual analysis can also be extended to account for heteroscedasticity.
I suppose that if you are unsure, and want to compare a model with and without an intercept, you could plot the graphical residual analyses for each on the same scatterplot for comparison: predicted y on the x-axis, estimated residuals on the y-axis, with each model colour-coded or marked by a different point shape.
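A minimal sketch of that overlaid residual plot (again an illustration with simulated data, assuming Python with statsmodels and matplotlib):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.5 * x + rng.normal(0, 1, 100)

fit_with = sm.OLS(y, sm.add_constant(x)).fit()   # model with intercept
fit_thru = sm.OLS(y, x).fit()                    # model through the origin

# One scatterplot, both models: predicted y vs. estimated residuals.
plt.scatter(fit_with.fittedvalues, fit_with.resid, marker="o", label="with intercept")
plt.scatter(fit_thru.fittedvalues, fit_thru.resid, marker="x", label="no intercept")
plt.axhline(0, color="grey", linewidth=0.8)
plt.xlabel("predicted y")
plt.ylabel("estimated residual")
plt.legend()
plt.show()
```

If one model's residuals show a trend or fan out while the other's do not, that is the kind of visual evidence this comparison is meant to surface.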
That the constant is not significant implies that the regression line passes through a point that is not statistically different from (x, y) = (0, 0), i.e. intercept is not different from zero at a given level of probability. The constant is the average value of the dependent variable when all the independent variables are zero. This is the base value, which is changed by changes in the independent variables.
Dear Md Kamrul Hasan , " i.e. intercept is not different from zero at a given level of probability " should read "... at a given level of confidence". The whole testing is based on frequentist philsosophy where model parameters (like the intercept) are fixed (their values are just unknown). As they are not results or "random experiments" they have no probability assigend. Therefore we talk about confidcence and not about probability (note that confidence intervals are not probability intervals; a 95% conficence interval is not an intervan that contains the true values with 95% probability).
You can talk about the probability of parameters having certain values only in a Bayesian context, where probability is a measure of belief. But this requires a statement of these probabilities prior to the observations (one can only calculate how beliefs should change given the observations), and it has nothing to do with p-values from significance tests.
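The frequentist reading of "95% confidence" can be made concrete with a small simulation (a sketch of my own, not part of the thread; plain numpy, known sigma, normal approximation): the procedure captures the fixed true mean in about 95% of repeated samples, but any single computed interval either contains it or does not.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, sigma, n = 10.0, 2.0, 30
z = 1.96                                  # 97.5% normal quantile, known sigma

hits = 0
for _ in range(10_000):
    sample = rng.normal(true_mu, sigma, n)
    half = z * sigma / np.sqrt(n)         # half-width of the 95% CI
    m = sample.mean()
    hits += (m - half <= true_mu <= m + half)

print(hits / 10_000)                      # close to 0.95
```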
I generally find it useful to specify the model in such a way that the intercept is a meaningful value for the problem being researched, and does not represent a value that is impossible or very extreme. For example, in a study of adult alcohol consumption I would not usually put a raw Age variable as a predictor, as the intercept would then be the mean consumption of a newborn child! (Indeed, I would not be surprised to find a negative estimate.) So I typically center Age around its median or some typical sensible value, for example Age-45, and include that variable in the model. It does not alter the slope for Age, but now the intercept gives the mean consumption for the typical adult. I think it is a pity that in many published papers the estimated intercept remains un-interpreted.
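A minimal sketch of this centering (hypothetical alcohol-consumption data invented purely for illustration; assumes Python with pandas and statsmodels):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({"age": rng.uniform(18, 80, 200)})
df["drinks"] = 1.0 + 0.05 * df["age"] + rng.normal(0, 0.5, 200)

raw = smf.ols("drinks ~ age", data=df).fit()

df["age_c"] = df["age"] - 45                 # center at a typical adult age
centered = smf.ols("drinks ~ age_c", data=df).fit()

print(raw.params)       # intercept = predicted consumption at age 0 (extrapolation)
print(centered.params)  # intercept = predicted consumption at age 45; slope unchanged
```

The slope is identical in both fits; only the intercept's meaning changes, from an impossible "age 0" value to the mean for a typical adult.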
Fair point, Kelvyn Jones. Just to make this point clear: it is surely convenient to have an intercept that is directly interpretable in a sensible way. However, this is technically irrelevant (the model is not "wrong" if the intercept has no sensible real-world interpretation, and it is not "more correct" if it does). There can be problems with the numerical stability of the calculations when the data are extremely far from zero (relative to the variance), in which case centering can improve the numerical stability (that is a purely technical issue).
Importantly, centering variables in models with interactions also changes the main-effect estimates! (In models with interactions, correlations are another source of numerical instability that centering can reduce.)
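A short sketch of that effect (simulated data for illustration only): with an interaction term, the "main effect" of x1 is its slope where x2 = 0, so centering x2 reparametrises the main effects while leaving the interaction coefficient untouched.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"x1": rng.normal(50, 5, 300), "x2": rng.normal(100, 10, 300)})
df["y"] = (1 + 0.2 * df["x1"] + 0.1 * df["x2"]
           + 0.05 * df["x1"] * df["x2"] + rng.normal(0, 1, 300))

raw = smf.ols("y ~ x1 * x2", data=df).fit()

df["x1c"] = df["x1"] - df["x1"].mean()
df["x2c"] = df["x2"] - df["x2"].mean()
cent = smf.ols("y ~ x1c * x2c", data=df).fit()

print(raw.params["x1"], cent.params["x1c"])          # different "main effects"
print(raw.params["x1:x2"], cent.params["x1c:x2c"])   # same interaction coefficient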
Jochen Wilhelm I too agree that technically it is not incorrect; I see it as substantively important - the technique is a means, not an end.
Centering continuous predictors becomes a really big issue in mixed models with allowed-to-vary intercepts and slopes. Convergence may not be easily achieved when the intercepts are way outside the range of the X data, and the estimated covariance (between intercepts and slopes) can be very misleading: apparently negative without centering, changing to positive with centering. You can even get covariances that imply impossible correlations outside the range of -1 to +1; there are published papers with this technical fault. That is why, when I (used to) teach this stuff, I stressed both sensible centering and looking at variance functions over the range of the X data, and why our MLwiN software has such elements built in for easy use.
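A sketch of the sign flip in a random-intercept, random-slope model (simulated data far from zero, invented for illustration; assumes Python with statsmodels' MixedLM rather than MLwiN, and the fits may emit convergence warnings; the point is how cov_re changes):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
g = np.repeat(np.arange(30), 20)                 # 30 districts, 20 obs each
x = rng.uniform(80, 120, g.size)                 # deliberately far from zero
u0 = rng.normal(0, 0.5, 30)                      # district deviations at x = 100
u1 = rng.normal(0, 0.05, 30)                     # district slope deviations
y = 5 + 0.3 * x + u0[g] + u1[g] * (x - 100) + rng.normal(0, 0.5, g.size)
df = pd.DataFrame({"y": y, "x": x, "xc": x - x.mean(), "g": g})

raw = smf.mixedlm("y ~ x", df, groups=df["g"], re_formula="~x").fit()
cen = smf.mixedlm("y ~ xc", df, groups=df["g"], re_formula="~xc").fit()

print(raw.cov_re)   # intercept-slope covariance with raw x: strongly negative here
print(cen.cov_re)   # with centered x the covariance is near zero
```

The raw-x fit reports an extreme negative intercept-slope correlation only because "the intercept" is an extrapolation to x = 0, far outside the data; centering moves it to where the data actually are.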
The attached diagrams show a small empirical example of house prices as a function of size in a set of districts (the key example in the book Developing multilevel models for analysing contextuality, he...). The apparently very different results (a positive versus a negative covariance between the slopes and intercepts at the district level) are a result of centering. The subsequent diagrams show the same variance function and the same varying-relations plot for both specifications. But it is easy to be misled as to what is going on.