As others have written, the intercept is the mean of the response when all predictors are zero. You may wish to test whether this estimate differs from a specific hypothesized value, which does not have to be zero; that has much to do with your theory and expectations. Naively taking the estimate and dividing it by its standard error lets you evaluate whether the estimate is significantly different from zero, which may or may not be a sensible thing to do.
You can radically change the intercept estimate by re-scaling the X variable, say by centering it, that is, subtracting its grand mean (xi - xbar); this will not change the slope term in a standard regression model.
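For illustration, here is a minimal sketch in Python (made-up data; the hypothesized value h0 is arbitrary) of both points above: the estimate divided by its standard error is the usual t-statistic, and centering X moves the intercept without touching the slope.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(10, 20, size=100)            # made-up predictor, well away from zero
y = 5 + 2 * x + rng.normal(0, 1, size=100)   # true intercept 5, true slope 2

fit = sm.OLS(y, sm.add_constant(x)).fit()
fit_centered = sm.OLS(y, sm.add_constant(x - x.mean())).fit()

# t-statistic for testing the intercept against an arbitrary hypothesized value h0
h0 = 4.0
t_stat = (fit.params[0] - h0) / fit.bse[0]
print("t vs h0:", t_stat)

print("raw fit:     ", fit.params)            # intercept near 5, slope near 2
print("centered fit:", fit_centered.params)   # intercept near mean(y), same slope
```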
There are times when you want to force the intercept to be effectively zero; this is known as regression through the origin, so that when X is 0, Y is forced to be 0. This can be a suitable procedure when Y is the gold standard of measurement and X is an easier way to measure the same thing, and you know from theory that when X = 0, Y is 0 as well. You can do this by omitting the intercept term from the model; software will have an option such as NOINTERCEPT.
What is the importance of significance of intercept only in regression analysis? - ResearchGate. Available from: https://www.researchgate.net/post/What_is_the_importance_of_significance_of_intercept_only_in_regression_analysis/1 [accessed Apr 13, 2016].
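On the NOINTERCEPT point above, a minimal sketch (Python/statsmodels with made-up data; in other packages the same thing is done with an option or a formula such as y ~ x - 1):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 3 * x + rng.normal(0, 0.5, size=50)   # made-up data with a true line through the origin

# Regression through the origin: simply do not add a constant column
fit_origin = sm.OLS(y, x).fit()
print(fit_origin.params)   # a single slope estimate, no intercept term
```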
- If the theory and the meaning of the intercept make sense to you => don't set it to 0
- If the theory and the meaning of the intercept do not make sense to you => check whether it is significant or not, then take it out of the model and recheck the consistency and efficiency of your model
I see that you work in agriculture, which may be a particularly rich field for the use of "regression through the origin" (really to the origin), as in survey statistics, where this is known as ratio estimation. Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons, is a text primarily on design-based sampling and estimation, yet he mentions model-based methods and relates ratio (regression through the origin) estimation to agricultural applications, through cluster sampling, which has origins at least as far back as Fairfield Smith, H. (1938), "An Empirical Law Describing Heterogeneity in the Yields of Agricultural Crops," Jour. Agric. Sci., Vol. 28, pp. 1-23. This relates to heteroscedasticity (heterogeneity of variances of residuals), and heteroscedasticity is a hugely important natural feature of regression to the origin in survey statistics, agriculture, and econometrics.

Further, Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press, is a text which includes an example from agriculture on page 109. It shows a scatterplot of farm production as predicted by farm size; the relationship should go to the origin, and does, but it is slightly nonlinear because, as stated there in Brewer, the quality of farm land decreases with size. The obvious heteroscedasticity in the data nearly overwhelms this difference, in my opinion, just looking at the scatterplot. (Stratifying by size in a spline-type approach could, I think, yield two very nearly linear parts, with the larger part having a non-zero intercept out of scope/range for that x-size stratum.) I have also seen electric sales data plotted against a previous electric sales census, for that same data item, where the data showed such a tight fit to a straight line through the origin that no appreciable variance, not even heteroscedasticity for y in the larger x cases, was visually evident; yet from the data, the coefficient of heteroscedasticity was estimable and, as usual, greater than that of the classical ratio estimator.
So, you may have data which from theory and practice may well be linear and even, more convincingly, include the origin, but I think you must then consider that appreciable heteroscedasticity naturally goes with that, especially, but not only, in cases with one regressor. You might find something useful in https://www.researchgate.net/publication/263036348_Properties_of_Weighted_Least_Squares_Regression_for_Cutoff_Sampling_in_Establishment_Surveys or in https://www.researchgate.net/publication/261596397_Ken_Brewer_and_the_coefficient_of_heteroscedasticity_as_used_in_sample_survey_inference
If you are estimating totals from a finite population, you might find the following helpful: https://www.researchgate.net/publication/261947825_Projected_Variance_for_the_Model-based_Classical_Ratio_Estimator_Estimating_Sample_Size_Requirements
Cheers - Jim
PS - Brewer(2002) cautions against using an intercept unless you really need one, and in survey statistics it is usually a mistake to use one in these applications. It is also usually best to use just one regressor. See Brewer (2002), chapter 7. I think that in agriculture and econometrics, as well as survey statistics, it is best to have to justify when you use an intercept. Otherwise don't use it. This is for simple models. (For more complex models, you need to know the meaning of an intercept, if used. It can become something of a "junk drawer" for omitted variables, and you'd need to be aware of that.)
PPS - When results theoretically (from the subject matter) should include the origin, looking at a confidence interval about an estimated intercept may indicate this empirically, but perhaps not, depending at least partially upon sample size.
An observation about the wording of your question: "When should we force the intercept to zero?" It is revealing that perhaps most people may think, or may be trained to think, that justification may be needed when we "force" a zero intercept, by not having such a term, just as we may not have some other term in the regression. It may just seem more natural to expect to include an intercept term. But is it more 'natural?' Perhaps, instead, we should think "When are we forced to include an intercept term?" When and why would we expect a regression to include a constant? What justifies it?
However, I have gained the impression that in some cases (e.g., allometric relationships in forestry) the data should confirm the biologically expected zero intercept. If they do not, your data might be biased, and the intercept term could be a fair estimator of a systematic error.
What about generalized linear models without an intercept? Do you know of particular applications for a GLM without an intercept, especially for gamma-distributed responses?
Don't know that I ever thought of an intercept as a place to 'accumulate' systematic measurement error, but that sounds interesting. I guess I more generally think of an intercept as more likely a place to compensate (but not very well) for omitted variables.
Reexamining my response from April 15, 2016, I said that "Cochran, W.G.(1977), Sampling Techniques, 3rd ed, John Wiley & Sons is a text primarily on design-based sampling and estimation, yet he mentions model-based methods and relates ratio (regression through the origin) estimation to agricultural applications, through cluster sampling...." But it appears that I really should have just said that he relates the variance through cluster sampling, not the lack of an intercept. (See bottom of page 256 and top of page 257, compared to page 243.)
He does discuss classical ratio estimation (which implicitly assumes a particular coefficient of heteroscedasticity) on pages 158 through 160, and says on page 160 that if a graph is consistent with this that "...the ratio estimator will be hard to beat."
So it looks like I made a jump there that he may not have made, at least that I can recall now. Sorry.
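(As a small aside on the classical ratio estimator mentioned above: its slope, the sum of the y's divided by the sum of the x's, is exactly the weighted least squares slope through the origin when the residual variance is taken proportional to x, i.e., a coefficient of heteroscedasticity of 0.5. A tiny numerical check with made-up numbers:)

```python
import numpy as np
import statsmodels.api as sm

# made-up (x, y) pairs
x = np.array([2.0, 5.0, 7.0, 12.0, 20.0])
y = np.array([4.1, 9.8, 14.5, 23.0, 41.2])

ratio_estimate = y.sum() / x.sum()

# WLS through the origin with variance proportional to x (weights are 1/x)
wls_slope = sm.WLS(y, x, weights=1.0 / x).fit().params[0]

print(ratio_estimate, wls_slope)   # identical up to floating-point error
```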
Brewer, whom I also mentioned, did take up this discussion of heteroscedasticity, and was skeptical about including intercepts in many cases, as noted previously. (When I mentioned "nonlinearity" there, I should have only noted curvature, not necessarily nonlinear regression, I suppose.)
At any rate, if you don't have a reason for an intercept, adding one in does not help. Previously I noted that "I think that in agriculture and econometrics, as well as survey statistics, it is best to have to justify when you use an intercept. Otherwise don't use it. This is for simple models. (For more complex models, you need to know the meaning of an intercept, if used. It can become something of a "junk drawer" for omitted variables [I think], and you'd need to be aware of that.)"
Regardless, you should examine heteroscedasticity. Please see https://www.researchgate.net/project/OLS-Regression-Should-Not-Be-a-Default-for-WLS-Regression, and updates.
For an intercept, zero is often exactly correct. Leaving it to be estimated in such a case just results in noise.
For one reference:
Brewer, K.R.W.(2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press, page 110: "It is more often the case than not, in survey sampling, that the most appropriate supplementary variable is close to being proportional to its corresponding survey variable, and that their natural relationship or line of best fit is a straight line through the origin."
In multiple linear regression, if your application is such that when all 'independent' variable values are zero, you would expect y to be zero, then adding an intercept term just degrades your model. (If you do so, then the estimated standard error for the estimated intercept, for a substantial sample size, will often be greater than the estimated intercept.)
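To make that parenthetical concrete, a small simulated sketch (made-up data with a true zero intercept; homoscedastic errors are used here only for simplicity): the fitted intercept tends to be smaller in magnitude than its standard error, and the slope is estimated a bit more precisely without it.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=40)
y = 2.5 * x + rng.normal(0, 1, size=40)     # true intercept is exactly zero

with_const = sm.OLS(y, sm.add_constant(x)).fit()
no_const = sm.OLS(y, x).fit()

print("intercept estimate:", with_const.params[0], " its SE:", with_const.bse[0])
print("slope SE with intercept:    ", with_const.bse[1])
print("slope SE through the origin:", no_const.bse[0])
```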
"Error" does not exactly equate to an intercept term. It would be more appropriate to say that an intercept term is more like an independent variable/predictor.
As stated earlier, if y should be zero when all x's are zero, then there should be no intercept term. The estimated residuals should add to zero, or near zero (zero in expectation), and if you think you need an intercept term to help with that, then you are probably using ordinary least squares (OLS) regression when you should be using weighted least squares (WLS) regression. See https://www.researchgate.net/project/OLS-Regression-Should-Not-Be-a-Default-for-WLS-Regression, and various updates to that project, shown there.
Think of an intercept term as just another independent variable. You don't just throw in any variable, and you shouldn't just throw in an intercept term without a good reason, whatever the application. I suspect a lot of people just use OLS all of the time too. When y-values, and more specifically predicted y-values, differ in size, as they generally do, you should have heteroscedasticity in your estimated residuals. In the project linked above, you will see links to a paper, "Essential Heteroscedasticity," which explains this. There are also problems with the model and with the data which can modify this impact. There are references to another paper, "Estimating the Coefficient of Heteroscedasticity," and an Excel tool for determining how much heteroscedasticity you have.
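As a rough sketch of the WLS idea (the power-of-x variance form and the value gamma = 0.5, which corresponds to the classical ratio estimator, are illustrative assumptions here, not the exact method of the papers mentioned):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1, 100, size=200)

# made-up data: residual standard deviation grows as x**gamma
gamma = 0.5
y = 1.8 * x + rng.normal(0, 2.0 * x**gamma)

# statsmodels WLS takes weights proportional to 1/variance
weights = 1.0 / x**(2 * gamma)
wls_fit = sm.WLS(y, x, weights=weights).fit()   # no constant: regression through the origin
ols_fit = sm.OLS(y, x).fit()

print("WLS slope:", wls_fit.params[0], " OLS slope:", ols_fit.params[0])
```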
Sometimes people have omitted variable bias when they leave out a predictor/independent variable that was needed. But if you throw in variables, or an intercept term, when not needed, then you can increase variance. (You can research "bias-variance tradeoff.") If you overfit a model to a specific data set, that can be bad too (see "cross-validation"), but otherwise you can compare models using "graphical residual analyses." It is a bad habit to throw in an intercept term, or any independent variables, without a good reason.
So, "When should we force the intercept to zero?" Don't "force" anything. Use it, like an independent variable: use it when needed, and don't when it isn't needed. If you put it in when not needed you may get an estimate which is typically low in absolute value for its standard error (though the standard error depends on sample size). But you have just added noise to your model, needlessly, when obviously the best estimate is zero. In such a case, the intercept term was a bad idea. See Brewer, K.R.W.(2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press, pages 109-110.
......
So, if you always use an intercept term, I expect that you probably always assume homoscedasticity, and you should question both of those assumptions. If this still seems OK, then I expect that your models and/or data are dealing with problematic situations where you are not able to find a very good set of predictors (not too many, not too few, and just the right ones), and/or data quality is problematic. Perhaps you also need more than one model to cover the population (in subpopulations or strata).
Primarily, don't always assume an intercept and homoscedasticity of estimated residuals. An intercept, like any independent variable, may or may not be appropriate. Homoscedasticity should generally not be expected, and apparent homoscedasticity could indicate a problem with your model. I have been working on ideas for a paper which also notes this.
If you always assume an intercept term, do you always assume homoscedasticity of estimated residuals? Perhaps you have a very complex situation where it is hard or impossible to obtain a very good mix of predictors and you are then using an intercept in place of one or more predictors that you cannot obtain? Perhaps you should read my response of a few minutes ago.
At any rate, you may not be able to obtain a very good model, but you can compare alternative model performances by using "graphical residual analyses" to attempt to arrive at the best results that you can. You can put predicted y on the x-axis of a scatterplot, with estimated residuals on the y-axis, and if you plot the performances of more than one model for a given sample on the same scatterplot, you can compare performances in a more complete manner than with any individual statistic. You can use "cross-validation" to avoid overfitting to a particular sample. Perhaps comparative graphical residual analyses on first one sample and then another would be very helpful.
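A rough sketch of that kind of plot (predicted values on the x-axis, estimated residuals on the y-axis, two models overlaid on one scatterplot; the data and the two competing models here are placeholders):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x = rng.uniform(1, 50, size=150)
y = 2.0 * x + rng.normal(0, 0.8 * np.sqrt(x))   # made-up data, roughly through the origin

model_a = sm.OLS(y, x).fit()                     # no intercept
model_b = sm.OLS(y, sm.add_constant(x)).fit()    # with intercept

plt.scatter(model_a.fittedvalues, model_a.resid, alpha=0.6, label="no intercept")
plt.scatter(model_b.fittedvalues, model_b.resid, alpha=0.6, marker="x", label="with intercept")
plt.axhline(0, color="grey", linewidth=1)
plt.xlabel("predicted y")
plt.ylabel("estimated residual")
plt.legend()
plt.show()
```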
This is very simple: whenever the theory of what you measure says so.
Example 1: if you want to determine a resistance (R) by measuring the currents (I) through a resistor at different voltages (U), you know that I = 0 whenever U = 0, for any resistor, since I = U/R.
Example 2: the force (F) needed to stretch a spring over a distance x: zero distance means zero force; F = kx.
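For Example 2, a least squares fit through the origin recovers k directly (the measurements below are made up):

```python
import numpy as np

# made-up (distance, force) measurements for a spring obeying F = k * x
x = np.array([0.01, 0.02, 0.03, 0.04, 0.05])   # metres
F = np.array([2.1, 3.9, 6.2, 8.0, 9.9])        # newtons

# least squares through the origin: k = sum(x*F) / sum(x*x)
k = (x * F).sum() / (x * x).sum()
print("estimated spring constant k:", k, "N/m")
```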
Based on my past experience, I will try to answer this.
In my medical and field research, we often come across two or more variables of interest that appear to be correlated. Let us take simple examples like height and weight of children, income and expenditure, price and demand, or expenditure on advertisement and increase in sales. In all these examples X and Y are correlated linearly, but there is no perfect correlation. Unless r is +1 or -1, we cannot expect the intercept to be zero. Consider P ∝ 1/V, thus PV = k (a constant); such relationships are called deterministic models, where r = 1 and the intercept can be zero, but not necessarily. Unless we have reason to believe that y is simply some multiple of X, we cannot expect the intercept to be zero. By mathematical definition, the expected value of y when X equals zero is taken as the intercept value. So forcing the intercept to zero is the same as assuming that the ratio of y to x is always constant. Consider the following data:
x = 1 3 5 10
y = 3 9 15 30
In this case b = 3 and the intercept is zero. Observe that Y is essentially a multiple of x (3 times).
Consider another example
x = 1 3 5 10
y = 4 10 16 31
In this case the correlation is +1 but the intercept is not 0 but 1.
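A quick check of both small examples (sketch):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 3.0, 5.0, 10.0])
y1 = np.array([3.0, 9.0, 15.0, 30.0])   # first example: y = 3x
y2 = np.array([4.0, 10.0, 16.0, 31.0])  # second example: y = 3x + 1

fit1 = sm.OLS(y1, sm.add_constant(x)).fit()
fit2 = sm.OLS(y2, sm.add_constant(x)).fit()

print(fit1.params)   # intercept 0, slope 3
print(fit2.params)   # intercept 1, slope 3
```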
So, consider data where the correlation is not 1 and the regression equation is given as y = a + bx, where a is the intercept and b is the slope. As mentioned before, a is nothing but the expected value of y when x = 0. However, its interpretation is interesting. Based on different sets of data, the intercept may mean different things. Generally, "a" can be termed the average error which you commit in predicting y based on the selected x values. In a height and weight relationship, an income and expenditure relationship, or an advertisement expenditure and gain in sales relationship, the intercept "a" certainly represents the error which we are committing in predicting the dependent variable. In a business model, "a" is taken as the constant expenditure while x is taken as the variable expenditure. So interpretations may vary based on the data, but general conclusions can be drawn. I summarize as follows:
1. The value of Y when X is zero is termed the intercept value.
2. In prediction models [y = a + bx], a is nothing but the average error which we commit in predicting y for the selected x values.
3. In business models, "a" is termed the constant expenditure while x is termed the variable expenditure.
4. If r ≠ ±1, you cannot expect the intercept to be zero.
5. Only if r = ±1 and y/x is always constant do you have reason to believe that a = 0; otherwise not.
6. For example, height in cm and height in inches should be perfectly related, and for such data you have every reason to think that a = 0. The value of a, if obtained for such data, should be taken as an average error committed in the conversion.
7. Finally, to answer your question: only when you expect X and Y to be perfectly correlated and expect the Y-to-X ratio to be a constant should you force a non-zero a to be 0.
Remember, the regression equation is just the equation of a line; it gives us a slope and an intercept. In most situations, the relevant range of the dependent and independent variable data is far away from 0, as with y = aggregate private consumption and x = aggregate income. Who cares where the intercept is? The intercept only tells us where the main line is running through the relevant range of data; it has no meaning. In fact, we're very likely linearizing something that has non-linear aspects to it anyway. It's roughly linear where we're looking, and that's it.