Andre, a regression without a constant may produce a negative R-squared. However, this shouldn't be interpreted in the usual way (the intuition breaks down) but as a signal that a constant should be added to the model. Here's a link to an explanation:
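To illustrate the point, here is a minimal sketch (my own, assuming NumPy and invented data; the thread itself shows no code) of how a least-squares fit forced through the origin can give a negative R² when R² is computed against the mean:

```python
import numpy as np

# Hypothetical data whose true relationship has a large intercept.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 10.0 - 0.1 * x  # nearly flat, far from the origin

# Least-squares slope for a model WITHOUT a constant: y = b * x
b = np.sum(x * y) / np.sum(x * x)

rss = np.sum((y - b * x) ** 2)       # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares about the mean
print(1.0 - rss / tss)               # strongly negative: the through-origin line is worse than the mean
```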
Consider the following example: we want to know whether the number of hours of study (variable a) is associated with academic performance (variable b). It could be that the variables are negatively associated (r < 0).
You're absolutely right, but the result remains the same. Indeed, R² (the coefficient of determination) measures the goodness of fit of the regression equation.
This coefficient estimates the proportion of the variance of the variable Y that is explained by the regression.
We know that the value of an observation yi can be decomposed into two parts: a part explained by the model and a residual. The dispersion of all the observations splits into the variance explained by the regression and the residual, unexplained variance. R² is then defined as the proportion of explained variance relative to the total variance, i.e. R² = 1 − (sum of squared residuals / total sum of squares).
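As a quick illustration of this decomposition (a sketch with made-up numbers, assuming NumPy; none of it is taken from the original posts):

```python
import numpy as np

# Hypothetical data; ordinary least squares WITH a constant.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

ess = np.sum((y_hat - y.mean()) ** 2)  # variance explained by the regression
rss = np.sum((y - y_hat) ** 2)         # residual, unexplained variance
tss = np.sum((y - y.mean()) ** 2)      # total dispersion of the observations

print(np.isclose(tss, ess + rss))      # the decomposition holds for OLS with a constant
print(ess / tss, 1.0 - rss / tss)      # the two expressions for R² agree here
```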
A negative R² is possible depending on the formula used. One version of calculating R² can only give positive numbers, as it is effectively the square of r. On the other hand, a common method of computing R² is 1 − (sum of squares of the model residuals / sum of squares about an uncorrelated, horizontal line); if the model is completely inappropriate, it will give a worse sum of squares than a flat line. This is not common, but I have seen it in my own datasets a few times. Basically, a negative R² means you are not on the right planet with your model, never mind in the ballpark. Either the data are complete nonsense or you should be using a different type of function to fit (e.g. trying to fit a straight line to a complex polynomial shape).
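Here is a small sketch of the "wrong type of function" case described above (my own example with a fixed, deliberately inappropriate line; the line is not re-fitted, since ordinary least squares with a constant would never do worse than the flat line on its own fitting data):

```python
import numpy as np

# Hypothetical data drawn from a parabola.
x = np.linspace(-3.0, 3.0, 25)
y = x ** 2

# A deliberately inappropriate model: a fixed increasing straight line.
y_hat = 2.0 * x + 1.0

rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)  # sum of squares around a flat (mean) line
print(1.0 - rss / tss)             # negative: the line does worse than the flat line
```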
Thank you for your response, but R² can be negative because software packages use the common method (cited by James Beattie, whom I also thank a lot for his explanation) of computing R² as 1 − (sum of squares of the model residuals / sum of squares about a horizontal line); if the model is completely inappropriate, it will give a worse sum of squares than a flat line.
I am sorry, but all these answers overlook a very elementary point:
1. Of course a squared quantity cannot be negative (for real numbers).
2. The main point is the following: you are confusing the definition of R² (which cannot be negative) with an estimate of R². In software we obtain only estimates of R², and these estimates can of course be negative in some cases.
Thank you very much, dear Gauchi, for your reply and explanation. I completely agree with you. In the end, it all depends on how we look at the problem: from a mathematical point of view or from a statistical one (using software).
The coefficient of determination (CoD) can be negative. The square of Pearson's correlation coefficient cannot be negative. The difference is that a coefficient of determination can be applied to data that were not used to fit the regression. When this happens, the sum of squares of residuals (RSS) can be greater than the total sum of squares (TSS), so 1 − RSS/TSS < 0. A negative value indicates that the data are not explained by the model; in other words, the mean of the data is a better model than the regression. If the CoD is used as an accuracy measure, it should be computed on data other than the regression data, and in that situation a negative CoD is common.
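This out-of-sample case is easy to reproduce. Here is a sketch (invented data, assuming NumPy and scikit-learn, neither of which the answer itself mentions) where the same fitted line scores a non-negative R² on the fitting data but a negative one on new data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Fit on one range of x, score on a different range where the
# underlying (nonlinear) relationship behaves differently.
x_train = np.linspace(0.0, 2.0, 50).reshape(-1, 1)
x_test = np.linspace(4.0, 6.0, 50).reshape(-1, 1)
y_train = np.sin(x_train).ravel() + rng.normal(0.0, 0.05, 50)
y_test = np.sin(x_test).ravel() + rng.normal(0.0, 0.05, 50)

model = LinearRegression().fit(x_train, y_train)

print(r2_score(y_train, model.predict(x_train)))  # non-negative on the regression data
print(r2_score(y_test, model.predict(x_test)))    # negative: the test mean beats the model
```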