It is said that a high R-squared value does not always ensure a good model, and that, many a time, a model with a low R-squared value but significant p-values can still be treated as a good model. I want to understand how a low R-squared value can be acceptable.
In general, we look for a high R-squared in regression-based models. However, this is not always the case, and in some situations you might have a low R-squared and your model can still be useful.
Please have a look at the following articles, which explain where and how a low R-squared may be acceptable in some cases.
In a regression model, R-squared is not the best selection criterion. The statistical significance of the parameters is fundamental, but above all so is the diagnosis of the residuals: the residuals must have constant variance, be independent, and follow a normal distribution. Since 0 < R-squared < 1, a value of 0.12 (12%) indicates that the predictor(s) explain only 12% of the variability of the response variable, which is not a reasonable explanation. At least 60% can be considered satisfactory, although R-squared is no longer used in the scientific literature as the decision criterion for choosing models; instead, the significance of the parameters and the diagnosis (residual analysis) of the estimated model are used.
Because R-squared, adjusted or not, can be misleading, and a lone p-value is meaningless (and even comparing p-values does not help much in judging the relative merit of predictors, since collinearity and other things can affect them), I think you should concentrate on graphical residual analysis. If you put the predicted y values on the x-axis and the estimated residuals on the y-axis, you can find information online that will help you interpret the plot. That will be a much better indicator of fit than R-squared, even though R-squared gives you a number (often not a very meaningful one). If you save some data for testing (cross-validation), you can try to avoid overfitting.
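For concreteness, here is a minimal sketch of that kind of plot in Python; the simulated data, statsmodels, and matplotlib are my own illustrative choices, not part of the question.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Hypothetical data; replace with your own predictor(s) and response.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x, 200)  # spread grows with x

X = sm.add_constant(x)           # design matrix with intercept
ols_fit = sm.OLS(y, X).fit()     # ordinary least squares fit

# Graphical residual analysis: predicted y on the x-axis,
# estimated residuals on the y-axis.
plt.scatter(ols_fit.fittedvalues, ols_fit.resid, s=10)
plt.axhline(0, color="gray", linestyle="--")
plt.xlabel("Predicted y")
plt.ylabel("Estimated residual")
plt.title("Residuals vs. predicted values (OLS)")
plt.show()
```

A fan-shaped spread in such a plot would suggest heteroscedasticity; curvature would suggest a missing term in the model.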
More can be done with an extended graphical residual analysis to estimate the coefficient of heteroscedasticity for a regression weight when you have (naturally occurring) heteroscedasticity, which is generally a matter of degree, not a yes-or-no question of whether you have it. For that, see https://www.researchgate.net/project/OLS-Regression-Should-Not-Be-a-Default-for-WLS-Regression, and
Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press, especially page 111, and also pages 87, 130, 137, 142, and 203.
Yes, thanks for the question. That would usually be written as y with a hat over it, but it was easier to do it that way and squeeze the columns in. It holds your preliminary predictions, so it would represent the OLS results you obtained. I then use that as an approximation good enough to estimate the coefficient of heteroscedasticity used to obtain a regression weight expression. In SAS PROC REG, for example, the regression weight, w, then goes in when you run the regression again to obtain WLS (weighted least squares) results.
The best size measure would be the WLS prediction, but that is what you are working toward. The next best size measure that can easily be made available would be y_hat.
So you put your y values and your OLS-predicted y values in the first two columns to estimate the coefficient of heteroscedasticity, which you then use in the regression weight expression to do a WLS regression.
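To make that workflow concrete, here is a minimal Python sketch, not the spreadsheet or SAS setup described above: it fits OLS, estimates a coefficient of heteroscedasticity gamma by assuming the residual spread is roughly proportional to y_hat raised to gamma (estimated here by regressing log|residual| on log(y_hat), one simple illustrative choice), forms the weights w = 1 / y_hat^(2*gamma), and refits by WLS.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data with spread increasing with the mean.
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x, 200)
X = sm.add_constant(x)

# Step 1: preliminary OLS fit; its predictions y_hat stand in for the
# (not yet available) WLS predictions as the size measure.
ols_fit = sm.OLS(y, X).fit()
y_hat = ols_fit.fittedvalues
resid = ols_fit.resid

# Step 2: estimate the coefficient of heteroscedasticity, gamma,
# assuming sd(residual_i) is roughly proportional to y_hat_i**gamma.
# Here gamma is taken as the slope of log|residual| on log(y_hat);
# this is one simple illustrative estimator, not the only option.
mask = (y_hat > 0) & (np.abs(resid) > 0)
log_fit = sm.OLS(np.log(np.abs(resid[mask])),
                 sm.add_constant(np.log(y_hat[mask]))).fit()
gamma = log_fit.params[1]

# Step 3: regression weights w_i = 1 / y_hat_i**(2*gamma),
# then refit by weighted least squares.
w = 1.0 / y_hat[mask] ** (2.0 * gamma)
wls_fit = sm.WLS(y[mask], X[mask], weights=w).fit()
print("estimated gamma:", gamma)
print(wls_fit.params)
```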
Please also read the sheet on defaults.
Thank you.
PS - Attached is the more 'ideal' expression for the regression weight; in practice, however, we use y_hat as an approximation to the WLS prediction y*.
Regarding the original question: buried above, I noted that "I think you should concentrate on graphical residual analysis." That could give you a good indication of the usefulness of your model.
Note that a "graphical residual analysis" looks at model "fit," but you could "overfit" to the data used. If you can hold out some data not used in model selection or in estimating the regression coefficients, you can see how well you predict for those cases. If a graphical residual analysis indicates a good fit but you do not predict well for new cases, that could indicate overfitting to the data used. For a formal study of this kind, you could research "cross-validation." But if you do not have a lot of data, checking just a few cases might be helpful. (I say "checking" to avoid conflict with the terminology used in formal statistical learning procedures.)
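As an informal illustration of such a check, here is a simple holdout sketch in Python; the 20% split and the RMSE comparison are arbitrary illustrative choices, not a formal cross-validation procedure.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data.
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 200)
X = sm.add_constant(x)

# Hold out roughly 20% of cases, not used in fitting.
n = len(y)
test = rng.choice(n, size=n // 5, replace=False)
train = np.setdiff1d(np.arange(n), test)

fit = sm.OLS(y[train], X[train]).fit()

# Compare prediction error on the fitting data vs. the held-out cases;
# a much larger error on the held-out cases suggests overfitting.
err_train = np.sqrt(np.mean((y[train] - fit.predict(X[train])) ** 2))
err_test = np.sqrt(np.mean((y[test] - fit.predict(X[test])) ** 2))
print("RMSE (fitting data):", err_train)
print("RMSE (held-out cases):", err_test)
```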
Here, however, I think the concern will probably just be whether the fit indicated by the graphical residual analysis appears adequate. If you put predicted y on the x-axis and estimated residuals on the y-axis, there is information on the internet about how to interpret the results. If you want to compare candidate models, you can do this for more than one model on the same scatterplot, which gives a direct comparison on the same scale.
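For example, here is a sketch comparing the residuals of two hypothetical candidate models on one scatterplot; the models and data are made up for illustration.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Hypothetical data with a mild quadratic component.
rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 200)
y = 2.0 + 0.5 * x + 0.05 * x**2 + rng.normal(0, 1.0, 200)

# Two candidate models: linear in x, and linear plus a quadratic term.
X1 = sm.add_constant(x)
X2 = sm.add_constant(np.column_stack([x, x**2]))
fit1 = sm.OLS(y, X1).fit()
fit2 = sm.OLS(y, X2).fit()

# Residuals of both models on one scatterplot, same scale,
# with predicted y on the x-axis.
plt.scatter(fit1.fittedvalues, fit1.resid, s=10, label="linear model")
plt.scatter(fit2.fittedvalues, fit2.resid, s=10, label="quadratic model")
plt.axhline(0, color="gray", linestyle="--")
plt.xlabel("Predicted y")
plt.ylabel("Estimated residual")
plt.legend()
plt.show()
```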