Is there any study suggesting that a model with such low values is still usable in social science, or should I disregard the linear relationship? (Outliers were removed from the data, and the sample size is 250.)
When analyzing individual (not aggregated) data, such low values are not unusual - you have to decide whether the result is practically useful and whether the assumptions behind the analysis have been met. Individuals are typically very heterogeneous in their attitudes, actions and behaviours.
I am reminded of a famous clinical trial of the effect of taking aspirin on heart attacks - the effect was so dramatic that the trial was stopped and the placebo group advised to take aspirin. And yet the odds ratio of a heart attack for placebo compared with aspirin was a rather lowly 1.83, and the R2 was a puny 0.0011; yet this was sufficient for action.
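The arithmetic behind those two numbers can be reproduced from the trial's 2x2 table. The counts below are approximate figures for the aspirin arm of the Physicians' Health Study (roughly 104 heart attacks among 11,037 on aspirin and 189 among 11,034 on placebo); treat them as illustrative rather than exact:

```python
import math

# Approximate 2x2 table: rows = placebo/aspirin, cols = heart attack / none
a, b = 189, 11034 - 189   # placebo: heart attack, no heart attack
c, d = 104, 11037 - 104   # aspirin: heart attack, no heart attack

# Odds ratio of a heart attack, placebo vs aspirin
odds_ratio = (a / b) / (c / d)

# R^2 for a 2x2 table is the squared phi coefficient
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
r_squared = phi ** 2

print(round(odds_ratio, 2), round(r_squared, 4))  # → 1.83 0.0011
```

So a relationship that explains about a tenth of one percent of the variance was still strong enough to stop a trial on ethical grounds.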
Your argument is strengthened if you are testing a hypothesized relationship rather than going on a fishing expedition, and if you have tried to take account of theoretically relevant confounders. Epidemiology has moved to some extent from 'what are the causes of this outcome?' to 'does this potential cause have an effect?'.
I would also add that if you are modeling binary (0 and 1) outcomes, it is exceedingly difficult to achieve high R2 values, because the predicted probabilities are very unlikely to be exactly 1 and 0!
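A quick simulation illustrates the point: even when we score the outcomes with the true probabilities that generated them (the best any model could possibly do), the squared correlation with the 0/1 outcome stays far below 1. This is a sketch with made-up parameters, not a claim about any particular dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.standard_normal(n)
p = 1 / (1 + np.exp(-2 * x))   # true probabilities (logistic, slope 2)
y = rng.binomial(1, p)         # observed 0/1 outcomes

# R^2 of the *true* probabilities against the binary outcome
r2 = np.corrcoef(p, y)[0, 1] ** 2
print(round(r2, 2))            # stays well below 1 despite a perfect model
```

The gap is the irreducible Bernoulli noise around each probability, which no predictor can absorb.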
Finally, we have to accept that there are outcomes where chance genuinely plays a large part - we now have evidence, for example, that luck plays a bigger part in some cancers than genes and lifestyle.
Such a model will not capture the essence of the study; I would rather say that this sort of relationship should not be reported as it stands. I would prefer to recheck the variables (especially the dependent one) or the model form (it may be non-linear, if you are sure a relationship exists).
Kelvyn's observations are worth looking into. It may also be worthwhile to account for any hierarchical structure in the data (if there is one), treating the various nesting levels as random effects. Such a regression model, with the X variables treated as fixed-effect terms, may be fitted using REML. While this may or may not affect R-sq, it will provide a regression model that the data "actually" support. Depending on the purpose of the model, R-sq may or may not be the right statistic for judging its suitability. For example, if the purpose is to use the model for "prediction" (not explanation), you could try some kind of cross-validation to measure the predictive accuracy of the model in terms of prediction error.
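A minimal sketch of that cross-validation idea, using plain NumPy least squares on simulated data (the coefficients, noise level and fold count here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 250, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.standard_normal(n)

# 5-fold cross-validated root-mean-squared prediction error
folds = np.array_split(rng.permutation(n), 5)
sq_errors = []
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    resid = y[test_idx] - X[test_idx] @ beta
    sq_errors.append(resid ** 2)

cv_rmse = np.sqrt(np.concatenate(sq_errors).mean())
print(round(cv_rmse, 2))   # should sit near the noise SD of 1
```

Unlike R-sq, the cross-validated error is in the units of the outcome, which makes "is this model useful for prediction?" a much more concrete question.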
I agree with the previous scholars - you might need to re-evaluate the research model based on an additional literature review. The following are guidelines for R-squared (R2) values:
1) Falk and Miller (1992) recommended that R2 values should be equal to or greater than 0.10 for the variance explained of a particular endogenous construct to be deemed adequate.
2) Hair et al. (2011) and Hair et al. (2013) suggested that in scholarly research focusing on marketing issues, R2 values of 0.75, 0.50 and 0.25 for endogenous latent variables can, as a rough rule of thumb, be described as substantial, moderate and weak respectively.
If the basic objective is to examine the effect of one or two variables on another variable, one has to look at the sign and statistical significance of the explanatory variables - a low R square may not matter much; what matters is this significance. For a fully specified model the objective is to predict the behaviour of the dependent variable on the basis of the explanatory variables - here a poor R square means that the explanatory power of the model is very low: many explanatory variables have been left out of the analysis and/or the model is mis-specified.
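The distinction is easy to demonstrate: with a large sample, a tiny effect can be highly significant while R square stays negligible. A sketch with simulated data (the slope, noise level and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.standard_normal(n)
y = 0.1 * x + rng.standard_normal(n)   # true slope 0.1, noise SD 1

# Simple OLS slope, its standard error, t statistic, and R^2
xc, yc = x - x.mean(), y - y.mean()
beta = (xc @ yc) / (xc @ xc)
resid = yc - beta * xc
se = np.sqrt(resid @ resid / (n - 2) / (xc @ xc))
t_stat = beta / se
r2 = 1 - (resid @ resid) / (yc @ yc)

print(round(t_stat, 1), round(r2, 3))  # very large t, R^2 near 0.01
```

The slope is estimated many standard errors away from zero, yet the model explains only about one percent of the variance - both statements are true at once.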
Thank you for your insights, especially Kelvyn Jones.
A statistician advised me that in order to increase the R square I should run a correlation analysis between the items of the two variables, remove the items with weak correlations, and then run the test again. The R square should then increase substantially.
But I prefer to keep the items as they are and will give a justification, thanks to your help.
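For reference, the screening that statistician described boils down to an item-total correlation check. The sketch below uses simulated items - four loading on a common factor plus one deliberately pure-noise item - so every number here is hypothetical; whether dropping items this way is defensible is a separate question:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 250
factor = rng.standard_normal(n)

# Four items load on the common factor; item 5 (index 4) is pure noise
items = np.column_stack(
    [factor + 0.5 * rng.standard_normal(n) for _ in range(4)]
    + [rng.standard_normal(n)]
)

total = items.sum(axis=1)
item_total_r = [np.corrcoef(items[:, j], total)[0, 1] for j in range(5)]
weakest = int(np.argmin(item_total_r))
print(weakest)   # the noise item (index 4) shows the weakest correlation
```

Note that selecting items to maximize R square on the same sample capitalizes on chance; any such screening should ideally be validated on fresh data.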
Gary King has a nice rant about the limits of R2 as a guide to assessing model quality:
King, Gary. "How not to lie with statistics: Avoiding common mistakes in quantitative political science." American Journal of Political Science (1986): 666-687.
If you are using a binary/ordinal model and relying on a pseudo-R2, the measure is even worse. At last check, Stata was discouraging people from reporting the pseudo-R2 statistic. There are better ways to evaluate model fit for binary/ordinal models (e.g. separation plots):
Greenhill, Brian, Michael D. Ward, and Audrey Sacks. "The separation plot: a new visual method for evaluating the fit of binary models." American Journal of Political Science 55.4 (2011): 991-1002.
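The idea behind a separation plot can be sketched numerically: sort the observations by fitted probability and check whether the observed 1s pile up at the high end of the ordering. This toy version uses simulated fitted probabilities and summarizes the separation with the mean rank of the events (all names and numbers are illustrative, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000
p_hat = rng.uniform(size=n)      # stand-in for fitted probabilities
y = rng.binomial(1, p_hat)       # outcomes consistent with the fit

order = np.argsort(p_hat)        # the separation plot's left-to-right ordering
sorted_y = y[order]

# For a well-calibrated fit, events concentrate at high fitted probabilities
mean_event_rank = np.mean(np.nonzero(sorted_y)[0]) / n
print(round(mean_event_rank, 2)) # > 0.5 means events sit to the right
```

The published version plots the sorted outcomes as colored stripes, which makes the same concentration visible at a glance.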
I would say an r-squared of 5% is pretty low, even in a social science context, but your underlying theory and assumptions should drive the model specification and selection. Keep in mind that r-squared is just one of several things you must pay attention to when judging a model's fit to the data.
To some degree, this is an issue of statistical versus substantive significance. A key point about statistical significance for R-sq is that it depends on sample size: even a very low R-sq can be statistically significant if the N is large enough.
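That dependence on N shows up directly in the overall F test for R-sq. Below, the same R-sq of 0.05 with one predictor is tested at two sample sizes; the 5% critical values quoted in the comments are approximate textbook figures, used here only for comparison:

```python
def f_stat(r2, n, k):
    """Overall F statistic for a regression with k predictors and n cases."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Same R-sq = 0.05, one predictor, two sample sizes
f_large = f_stat(0.05, n=250, k=1)   # ≈ 13.1, far above the ~3.9 critical value
f_small = f_stat(0.05, n=30, k=1)    # ≈ 1.5, below the ~4.2 critical value

print(round(f_large, 1), round(f_small, 1))
```

With N = 250 (the questioner's sample size), an R-sq of 0.05 is comfortably significant; with N = 30 the identical R-sq would not be.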
By comparison, substantive significance is always somewhat subjective. Han Ping Fung reports on some typical standards for substantive significance. The basic argument here is that if you fail to explain any variance, then something is wrong with your theoretical model or your measures -- or the concept you are trying to explain really is random and nothing will explain it.
I suspect the statistician you consulted was suggesting that something is wrong with your measures and that you should look at those variables more closely. In particular, if they are scales that you constructed, something might have gone wrong in that process. Given how frequently that does happen, I would "up vote" the suggestion to look more closely at the correlations between the individual items.
Are you looking at continuous data? Scatterplots can give you great insight into your data and be far more helpful than R-square. I'd also suggest looking into the "variance of the prediction error" and, especially for simple linear regression, the standard error of the estimated slope. These would be more informative.
Are you using multiple regression? Then adjusted R-square is better, but it is still not a great measure; various factors can influence it.
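The adjustment matters because plain R-square never decreases when you add predictors, even pure noise ones. A small simulation contrasting the two (sample size, seed and the 20 junk predictors are arbitrary choices for illustration):

```python
import numpy as np

def r2_and_adjusted(X, y):
    """R^2 and adjusted R^2 for an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    n, k = X1.shape[0], X1.shape[1] - 1
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

rng = np.random.default_rng(5)
n = 100
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)

junk = rng.standard_normal((n, 20))   # 20 irrelevant predictors
r2_base, adj_base = r2_and_adjusted(x[:, None], y)
r2_junk, adj_junk = r2_and_adjusted(np.column_stack([x[:, None], junk]), y)

# Plain R^2 rises with the junk predictors; the adjusted version is penalized
print(r2_junk > r2_base, adj_junk < r2_junk)
```

The penalty term (n - 1)/(n - k - 1) is what keeps adjusted R-square honest about model size, though it still inherits the other limitations discussed above.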
Sure, I agree with the above answers. Further, you can consider the significance of the relationship between the variables. It depends on your assumptions and on practical issues. I have also had an experience like this.