A low R-squared value indicates that your independent variable does not explain much of the variation in your dependent variable. Regardless of statistical significance, this is letting you know that the identified independent variable, even though significant, accounts for little of the variance of your dependent variable. You may want to look into adding more non-correlated independent variables to your model - variables that somehow relate to your dependent variable (context).
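For illustration, here is a minimal sketch (simulated data; the predictors x1 and x2 are hypothetical) of how adding a relevant second predictor changes R^2, using Python and statsmodels:

    import numpy as np
    import statsmodels.api as sm

    # Made-up data: y depends on x1 weakly and on x2 strongly
    rng = np.random.default_rng(0)
    n = 100
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)                  # candidate second predictor
    y = 0.3 * x1 + 1.0 * x2 + rng.normal(size=n)

    m1 = sm.OLS(y, sm.add_constant(x1)).fit()
    m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(m1.rsquared, m2.rsquared)          # R^2 rises when x2 carries real signal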
Would you expect to explain a high proportion of variance? Models with a high R^2 (explaining a high proportion of variance) are in many domains very implausible and capitalise on chance. I'd particularly expect a single predictor not to explain a lot of variation in many domains.
Also, R^2 is influenced by all sorts of things that aren't really about explanatory power (e.g., range restriction, measurement error and so forth).
As this single independent variable explains little of the variation in the outcome variable, I suggest you include other explanatory variables in the regression before accepting the result of your initial analysis.
Just because R2 is small doesn't mean that your model is bad or not worth interpreting. Even a small R2 can represent a unique contribution in relation to your field of study. I think a model with a small R2 that makes a unique contribution may be more relevant than one with a large R2 that makes none. It all depends on your field of research. In the social sciences it is not possible to include all the relevant predictors of an outcome variable, which can lead to a lower R2 value. And remember, your model has only one predictor variable. But you can still add some IVs or relevant control variables to improve the model, as suggested by others.
R-squared measures only the quality of a linear fit. However, if the data are procured from real sources, a linear fit may not be appropriate. Hence, consider AIC, SC (BIC) and HQ as measures of model fit instead; you can use RMSE, MAE and MAPE as well. If the plot indicates a polynomial fit, please use MARS (Multivariate Adaptive Regression Splines) instead of regular Gaussian regression.
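If it helps, here is a rough sketch of pulling these fit measures from a fitted model with Python's statsmodels (simulated data; note that statsmodels labels the Schwarz criterion BIC):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 100)
    y = 1.5 * x + rng.normal(size=100)

    res = sm.OLS(y, sm.add_constant(x)).fit()
    print("AIC:", res.aic, "BIC/SC:", res.bic)        # lower values indicate better fit

    resid = res.resid
    print("RMSE:", np.sqrt(np.mean(resid ** 2)))
    print("MAE: ", np.mean(np.abs(resid)))
    print("MAPE:", np.mean(np.abs(resid / y)) * 100)  # only sensible when y is never near 0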
Many earnings regressions are considered valid with a low R2. If the objective of the regression is to show the significance of a particular regressor on the regressand, a low R2 is acceptable. However, if your research question is to explain all pertinent factors in the regressand, you have a long way to go, as you need to identify a complete set of variables that affect your y variable (the regressand/DV in that case).
For example, even when you are looking only for the significance of the relationship between the chosen X and y, other significant variables (if available; they would be easy to discover in a literature review or in a discussion of your relevant theories) could function as control variables and show the robustness of the demonstrated relationship in the presence of other endogenous variables. A linear regression always needs to control for endogeneity from omitted variables, correlated variables (multicollinearity) and bidirectional causality, among others. In corporate finance that implies panel data from a cross-section of firms across industries.
A univariate regression ignores endogeneity and its causes for the purposes of establishing the choice of variables and their relationship with y, and for that purpose a univariate regression is good evidence even with a low R2. A low R2 (coefficient of determination) shows that X is one among all the independent variables affecting the dependent variable. In some cases it may not be possible to get a better R2 even with more variables (typical when explaining earnings variation). Theory needs to support your relationship completely and robustly for you to stop with a univariate regression with just one independent variable (X).
How much data do you have? What does the scatterplot look like? Is there a pattern in the data other than a linear one? If so, then explore a higher-order or nonlinear model. As far as a linear model goes, adding other independent explanatory variables certainly has merit, but the question is which one(s)? Do you have any further information on the data, for example geographic location or time - anything you can use to subgroup the data? The patterns in the subgroups may be clearer (see the sketch below). There appears to be a relationship with the explanatory variable you're using, but there's obviously much more that's unexplained by the variables you're using.
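A quick way to look for subgroup structure is to colour the scatterplot by a grouping variable; a minimal sketch with made-up data and a hypothetical `group` variable:

    import numpy as np
    import matplotlib.pyplot as plt

    # Simulated data where the slope differs by group (e.g., region or time period)
    rng = np.random.default_rng(6)
    group = rng.integers(0, 3, 200)
    x = rng.normal(size=200)
    y = x * (group + 0.2) + rng.normal(scale=0.5, size=200)

    for g in np.unique(group):
        m = group == g
        plt.scatter(x[m], y[m], label=f"group {g}", alpha=0.6)
    plt.legend()
    plt.show()    # within-group patterns may be much clearer than the pooled cloud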
The significance of r or R-squared depends on the strength of the relationship (i.e. rho) and the sample size. If the sample is very large, even a minuscule correlation coefficient may be statistically significant, yet the relationship may have no predictive value. A possible explanation may be that the relationship is non-linear, e.g. as X increases, Y increases initially, but for larger X, Y may decrease as X increases (for example, if X is the amount of fertiliser, very large values of X could mean over-fertilising and become toxic). A scatter plot of Y vs X should show this up. In the case of more than one independent variable, you will have to plot the residuals against the dependent and independent variables to check for non-linearity.
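To see the large-sample effect concretely, here is a small simulation (made-up data) in which a near-zero correlation still tends to come out statistically significant:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 100_000
    x = rng.normal(size=n)
    y = 0.01 * x + rng.normal(size=n)   # true correlation is only about 0.01

    r, p = stats.pearsonr(x, y)
    print(r, p)   # r is tiny, yet p is usually < .05 at this sample size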
I think a low R-squared is not the major concern if the tested variables are significant. However, in your own case you have a single independent variable. The best way forward is to include additional IVs and possibly control variables. You can also use a formal diagnostic to determine whether the model suffers from omitted variables or mis-specification.
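One such formal check is Ramsey's RESET test for functional-form mis-specification; a minimal sketch with simulated data, assuming a reasonably recent statsmodels (0.12+):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import linear_reset

    # Simulated example: y depends on x quadratically, but we fit a straight line
    rng = np.random.default_rng(2)
    x = rng.uniform(-2, 2, size=200)
    y = x ** 2 + rng.normal(scale=0.5, size=200)

    res = sm.OLS(y, sm.add_constant(x)).fit()
    print(linear_reset(res, power=3, use_f=True))  # a small p-value suggests mis-specification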
An R-squared of 0.05 is not low, it is very low - almost zero, I would say. Moreover, residuals analysis should show patterns or other signs suggesting that there must be other explanatory variables, at least if there are few. The question of whether to accept the regression model is up to you. You must look at the regression coefficient and decide whether its practical interpretation makes sense or not. I can imagine that with such a low R-squared your linear model is almost horizontal, so although the p-value may be significant, the practical effect is negligible.
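For the residual analysis mentioned above, a residuals-versus-fitted plot is the usual first look; a minimal sketch with simulated stand-in data:

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    # Simulated stand-in for your data
    rng = np.random.default_rng(7)
    x = rng.normal(size=100)
    y = 0.2 * x + rng.normal(size=100)

    res = sm.OLS(y, sm.add_constant(x)).fit()
    plt.scatter(res.fittedvalues, res.resid, alpha=0.6)
    plt.axhline(0, color="grey")
    plt.xlabel("fitted values")
    plt.ylabel("residuals")
    plt.show()    # visible structure here hints at missing variables or a wrong functional form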
If R^2 is .05, your r is about .224 (somewhere between .212 and .235 if .05 was rounded from a value between .045 and .055). An r of .224 has a p of about .09 (two-tailed) for n=59. You might want to clarify what you found. On your final point, you should decide whether to accept the results or not without reference to how they have come out.
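For anyone who wants to check such numbers, a small helper (the standard t-test for a Pearson correlation; the function name is mine):

    import numpy as np
    from scipy import stats

    def p_from_r(r, n):
        """Two-tailed p-value for a Pearson r with sample size n."""
        t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
        return 2 * stats.t.sf(abs(t), df=n - 2)

    print(p_from_r(np.sqrt(0.05), 59))   # ~0.089, matching the figure above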
Please let the data speak: maybe you have one or more omitted variables, or maybe the functional form of the regression is incorrect (so you have to add some quadratic or cubic terms...). A transformation can also be an alternative (if appropriate). Or it could be the effect of a group of outliers (maybe more than one...). In general, good practice in statistical/econometric model diagnostics is to look at the scatterplot of your data with a regression line or a nonparametric (lowess) regression in order to see the data structure. Of course there are cases in which the number of observations can impact the R^2, but I don't think that is the case here. More simply, you have a model which omits some relevant variables or fails to represent the real data structure.
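A minimal sketch of the suggested scatterplot with a lowess smoother, using simulated stand-in data and statsmodels:

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.nonparametric.smoothers_lowess import lowess

    # Simulated stand-in for your x and y
    rng = np.random.default_rng(3)
    x = rng.uniform(0, 10, 150)
    y = np.sin(x) + rng.normal(scale=0.4, size=150)

    smoothed = lowess(y, x)               # returns (x, fitted) pairs sorted by x
    plt.scatter(x, y, alpha=0.5)
    plt.plot(smoothed[:, 0], smoothed[:, 1], color="red")
    plt.show()    # a curved smoother over a flat OLS line signals a wrong functional form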
Maximum-accuracy machine-learning methods identify more accurate and more useful models than regression-based approaches (unless one's objective is to accurately predict responses which are at or near the sample mean response):
Here is a brief article which introduces the maximum-accuracy machine-learning methodology--for which NO distributional assumptions are required, and exact p-values are provided:
If legacy methods don't do the job for you, modern machine-learning methods may return superior solutions--that is why machine-learning is the hottest topic in statistical modeling today.
R^2 is a descriptive statistic: it indicates the proportion of sample variance explained by the predictors in the model. Adjusted R^2 is an inferential statistic: it estimates the proportion of variance explained by your predictors in the population your sample is drawn from. Specifically, it corrects for both small-sample bias and the number of predictors, as small samples tend to overestimate explained variance and adding predictors can explain variance by chance (R^2 can only increase or stay the same when you add predictors).
So you can explain around 36% of the variation with your model in the population sampled. You still need to be cautious, as this is a point estimate (with uncertainty that could be described with an interval estimate - see the sketch below) and because the explanatory power might be better or worse if your sample isn't representative of the population you want to generalise to. For instance, R^2 might get a lot worse outside the range of predictor values you have, or if you look at the general population rather than a specific sample (students at one university, patients at one hospital). Also, things like measurement error and range restriction affect R^2-type statistics.
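For reference, the usual correction (Ezekiel's formula, which most software uses) and a simple percentile-bootstrap interval for R^2 can be sketched as follows; the numbers and function names here are illustrative only:

    import numpy as np
    import statsmodels.api as sm

    def adjusted_r2(r2, n, p):
        """Ezekiel's correction: n = sample size, p = number of predictors."""
        return 1 - (1 - r2) * (n - 1) / (n - p - 1)

    print(adjusted_r2(0.40, 50, 3))   # illustrative numbers only -> ~0.36

    def boot_r2_ci(y, X, n_boot=2000, seed=0):
        """Percentile-bootstrap 95% interval for R^2; X is the design matrix."""
        rng = np.random.default_rng(seed)
        n = len(y)
        r2s = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)   # resample rows with replacement
            r2s.append(sm.OLS(y[idx], X[idx]).fit().rsquared)
        return np.percentile(r2s, [2.5, 97.5])

    # usage: boot_r2_ci(y, sm.add_constant(x)) for 1-d arrays y and x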
You have only one independent variable, but you see no heteroscedasticity? That should not happen here (especially if you have a zero intercept), unless you have omitted variables or perhaps not the best independent variable, or you actually do have heteroscedasticity that is just not apparent visually. See the recent "Examples" update in https://www.researchgate.net/project/OLS-Regression-Should-Not-Be-a-Default-for-WLS-Regression.
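If heteroscedasticity is hard to judge visually, a Breusch-Pagan test can supplement the plot; a minimal sketch with simulated heteroscedastic data and statsmodels:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    # Simulated model whose error variance grows with x
    rng = np.random.default_rng(4)
    x = rng.uniform(1, 10, 200)
    y = 2 * x + rng.normal(scale=x)       # heteroscedastic errors

    X = sm.add_constant(x)
    res = sm.OLS(y, X).fit()
    lm, lm_p, f, f_p = het_breuschpagan(res.resid, X)
    print(lm_p)   # a small p-value flags heteroscedasticity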
I am also facing the same problem of a very small R2, where I have 5 independent variables and one dependent variable. My supervisor tells me that a very low R2 could mean that my data has a problem. Is this true? Kindly advise.
David Muriithi I think instead of applying a linear statistical model (OLS), try deep learning (ML) methods; they fit non-linear relationships better. Look for low AIC/SC values when comparing models. ML models, being self-corrective, often provide a better fit to real-life data, which is mostly non-linear. Hope this helps.
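As a hedged illustration of that comparison (using scikit-learn rather than deep learning proper, and simulated non-linear data), cross-validated R^2 for a linear model versus a random forest:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    # Simulated non-linear data
    rng = np.random.default_rng(5)
    X = rng.uniform(-3, 3, size=(300, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

    for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
        score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        print(type(model).__name__, round(score, 2))   # the forest typically scores higher here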