Andre, a regression without a constant may produce a negative R-squared. However, this shouldn't be interpreted in the usual way (the intuition breaks down) but as a signal that a constant should be added to the model. Here's a link to an explanation:
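To illustrate the point, here is a minimal sketch (my own, assuming NumPy and invented data; the thread itself shows no code) of how a least-squares fit forced through the origin can give a negative R² when R² is computed against the mean:

```python
import numpy as np

# Hypothetical data whose true relationship has a large intercept.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 10.0 - 0.1 * x  # nearly flat, far from the origin

# Least-squares slope for a model WITHOUT a constant: y = b * x
b = np.sum(x * y) / np.sum(x * x)

rss = np.sum((y - b * x) ** 2)       # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares about the mean
print(1.0 - rss / tss)               # strongly negative: the through-origin line is worse than the mean
```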
Consider the following example: we want to know whether the number of hours of study (variable a) is associated with academic performance (variable b). It could be that the variables are negatively associated (r < 0).
You're absolutely right, but the result remains the same. Indeed, R² (the coefficient of determination) measures the goodness of fit of the regression equation.
This coefficient estimates the proportion of the variance of the variable Y that is explained by the regression.
We know that the value of an observation yi can be decomposed into two parts: a part explained by the model and a residual. The dispersion of all the observations splits into the variance explained by the regression and the residual, unexplained variance. R² is then defined as the proportion of explained variance relative to the total variance, i.e. R² = 1 − (sum of squared residuals / total sum of squares).
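As a quick illustration of this decomposition (a sketch with made-up numbers, assuming NumPy; none of it is taken from the original posts):

```python
import numpy as np

# Hypothetical data; ordinary least squares WITH a constant.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

ess = np.sum((y_hat - y.mean()) ** 2)  # variance explained by the regression
rss = np.sum((y - y_hat) ** 2)         # residual, unexplained variance
tss = np.sum((y - y.mean()) ** 2)      # total dispersion of the observations

print(np.isclose(tss, ess + rss))      # the decomposition holds for OLS with a constant
print(ess / tss, 1.0 - rss / tss)      # the two expressions for R² agree here
```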
A negative R² is possible depending on the formula used. One version of calculating R² can only give positive numbers, as it is effectively the square of r. On the other hand, a common method of computing R² is 1 − (sum of squares of the model residuals / sum of squares about an uncorrelated, horizontal line); if the model is completely inappropriate, it will give a worse sum of squares than a flat line. This is not common, but I have seen it in my own datasets a few times. Basically, a negative R² means you are not on the right planet with your model, never mind in the ballpark. Either the data are complete nonsense or you should be using a different type of function to fit (e.g. trying to fit a straight line to a complex polynomial shape).
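Here is a small sketch of the "wrong type of function" case described above (my own example with a fixed, deliberately inappropriate line; the line is not re-fitted, since ordinary least squares with a constant would never do worse than the flat line on its own fitting data):

```python
import numpy as np

# Hypothetical data drawn from a parabola.
x = np.linspace(-3.0, 3.0, 25)
y = x ** 2

# A deliberately inappropriate model: a fixed increasing straight line.
y_hat = 2.0 * x + 1.0

rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)  # sum of squares around a flat (mean) line
print(1.0 - rss / tss)             # negative: the line does worse than the flat line
```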
Thank you for your response, but R² can be negative because software packages use the common method (cited by James Beattie, whom I also thank a lot for his explanation) of computing R² as 1 − (sum of squares of the model residuals / sum of squares about a horizontal line); if the model is completely inappropriate, it will give a worse sum of squares than a flat line.
I am sorry, but all these answers overlook a very elementary point:
1. Of course a squared quantity cannot be negative (for real numbers).
2. The main point is the following: you are confusing the definition of R² (which cannot be negative) with an estimate of R². In software we obtain only estimates of R², and these estimates can of course be negative in some cases.
Thank you very much, dear Gauchi, for your reply and explanation. I completely agree with you. In the end, it all depends on how we look at the problem: from a mathematical point of view or from a statistical one (using software).
The coefficient of determination (CoD) can be negative. The square of Pearson's correlation coefficient cannot be negative. The difference is that a coefficient of determination can be applied to data that were not used to fit the regression. When this happens, the sum of squares of residuals (RSS) can be greater than the total sum of squares (TSS), so 1 − RSS/TSS < 0. A negative value indicates that the data are not explained by the model; in other words, the mean of the data is a better model than the regression. If the CoD is used as an accuracy measure, it should be computed on data other than the regression data, and in that situation a negative CoD is common.
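This out-of-sample case is easy to reproduce. Here is a sketch (invented data, assuming NumPy and scikit-learn, neither of which the answer itself mentions) where the same fitted line scores a non-negative R² on the fitting data but a negative one on new data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Fit on one range of x, score on a different range where the
# underlying (nonlinear) relationship behaves differently.
x_train = np.linspace(0.0, 2.0, 50).reshape(-1, 1)
x_test = np.linspace(4.0, 6.0, 50).reshape(-1, 1)
y_train = np.sin(x_train).ravel() + rng.normal(0.0, 0.05, 50)
y_test = np.sin(x_test).ravel() + rng.normal(0.0, 0.05, 50)

model = LinearRegression().fit(x_train, y_train)

print(r2_score(y_train, model.predict(x_train)))  # non-negative on the regression data
print(r2_score(y_test, model.predict(x_test)))    # negative: the test mean beats the model
```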