I have a dataset to which I want to fit a non-linear model. I've tried hyperbolic and logarithmic models, and both fit with the same R-squared, so I don't know which one is better. Can anyone help me with that?
Good question. Comparing models using R-squared is problematic. One problem is that R-squared has no underlying distribution, so it's difficult to know how to test whether one R-squared is better than another. The second issue, and I think one that is much more intuitive to understand, is that R-squared never decreases as more terms are added to the model. Therefore, using R-squared values to compare models will lead you to pick the more complex model in almost every situation.
Solutions:
- Use adjusted R-squared: it's the same formula as R-squared, except that you divide the residual sum of squares (SS, the numerator) by "n-K," where n = number of data points and K = number of parameters in your model, and divide the total SS (the denominator) by "n-1."
- Use the Akaike Information Criterion (corrected for small sample sizes, usually written AICc) to compare models.
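Here is a minimal sketch of both suggestions in Python. Everything specific in it is an assumption made for illustration: the synthetic data, the model forms (hyperbolic as y = a + b/x, logarithmic as y = a + b*ln(x)), and the least-squares version of AICc, which counts the error variance as one extra parameter.

```python
import numpy as np

def fit_linear_in_params(feature, y):
    """Least-squares fit of y = a + b * feature; returns (coefficients, residuals)."""
    X = np.column_stack([np.ones_like(feature), feature])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def adjusted_r2(y, resid, n_params):
    """1 - (SS_res / (n - K)) / (SS_tot / (n - 1)), with K = n_params."""
    n = len(y)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - (ss_res / (n - n_params)) / (ss_tot / (n - 1))

def aicc(resid, n_params):
    """Small-sample AIC for a least-squares fit with Gaussian errors."""
    n = len(resid)
    k = n_params + 1  # assumption: +1 for the estimated error variance
    aic = n * np.log(np.sum(resid ** 2) / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical data, generated from a logarithmic curve plus noise.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 20.0, 30)
y = 2.0 + 1.5 * np.log(x) + rng.normal(0.0, 0.2, x.size)

results = {}
for name, feature in {"hyperbolic": 1.0 / x, "logarithmic": np.log(x)}.items():
    coef, resid = fit_linear_in_params(feature, y)
    results[name] = (adjusted_r2(y, resid, 2), aicc(resid, 2))
    print(f"{name:12s} adj. R2 = {results[name][0]:.3f}  AICc = {results[name][1]:.1f}")
```

The model with the lower AICc is preferred. Note that when both candidates have the same number of parameters, as in this sketch, the penalties are identical, so adjusted R-squared and AICc rank the models exactly as the residual sum of squares does.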
Depending on what your overall modeling goal is, you may also want to reconsider the kind of "nonlinear" model you've chosen. The hyperbolic and logarithmic models that you mention are good if you think that the underlying data have some natural functional form. But just like quadratic relationships (or other polynomials), these models are approximations of the data. In other words, the user is forcing the relationship on the data. I have no problem with this in most cases; for instance, for modeling the growth of a cat or dog, the logistic growth curve is reasonable, simply because dogs start small, grow fast, and then reach adult size. Depending on your data, you may want to look into options that allow you to "let the data speak for themselves." One modeling framework I enjoy is the Generalized Additive Model, but there are others as well.
Good answer from Patrick. AIC is preferable to R². With R² you can really only compare nested models, which is not the case here. My advice is similar to what Patrick states at the end: think about which functional form of the relation is *reasonable*. This is an argument from outside the data, based on your expert knowledge of the topic (e.g., biology) and the underlying physical, chemical, biological, ... relationships.
If both models perform equally well within the range of the available data, you may ask which of the models gives more reasonable extrapolations.
Finally, if you have two seemingly equally good models, why not present both?
When selecting a model, it is also desirable to investigate the properties of its residuals. For an adequate model, the residuals must be uncorrelated random variables with zero mean. In the simplest case, the absence of correlation among the residuals can be checked with the Durbin-Watson statistic.
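As a rough sketch of the Durbin-Watson check mentioned above: the statistic is the sum of squared successive differences of the residuals divided by their sum of squares. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The residual series below are made up for illustration; real residuals should be ordered by x (or by time) before computing it.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Hypothetical residual series, not taken from any real fit.
rng = np.random.default_rng(1)
white = rng.normal(size=500)      # independent residuals
trending = np.cumsum(white)       # random walk: strongly autocorrelated
trending -= trending.mean()       # centre so the mean is zero

dw_white = durbin_watson(white)
dw_trending = durbin_watson(trending)
print(f"independent residuals:    DW = {dw_white:.2f}")
print(f"autocorrelated residuals: DW = {dw_trending:.2f}")
```

A value far from 2 for one of the candidate models is a hint that its functional form is missing systematic structure in the data.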
Dear Ehsan, first of all you should investigate the relationship between the dependent variable (y) and the independent variable (x) using a scatter plot. If the plot reveals quadratic behavior, you should use a quadratic model, such as a second-degree polynomial.
So, you should explain why you are using a non-linear model.
In the context of model selection, R-squared, adjusted R-squared, or AIC alone does not give enough assurance that you have the best model. I've attached the classic paper by Anscombe (1973), which clearly shows what goes wrong when R-squared alone is used to select a model.
I recommend that you also carefully examine the residuals of each candidate model before deciding which one to select.
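To illustrate the point about R-squared and residuals, here is a small sketch using the first two datasets of Anscombe's quartet. The values are transcribed from the commonly reproduced version of the quartet, so please check them against the paper before relying on them. A straight line gives nearly the same R-squared for both datasets, yet the residuals of the second one follow a clear curve, and a quadratic fits it almost perfectly.

```python
import numpy as np

# Anscombe's quartet, datasets I and II (shared x values).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

def poly_fit_r2(x, y, degree):
    """Fit a polynomial of the given degree; return (R-squared, residuals)."""
    coef = np.polyfit(x, y, degree)
    resid = y - np.polyval(coef, x)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, resid

r2_line_1, resid_1 = poly_fit_r2(x, y1, 1)
r2_line_2, resid_2 = poly_fit_r2(x, y2, 1)
r2_quad_2, _ = poly_fit_r2(x, y2, 2)

print(f"dataset I,  straight line: R2 = {r2_line_1:.3f}")
print(f"dataset II, straight line: R2 = {r2_line_2:.3f}")
print(f"dataset II, quadratic:     R2 = {r2_quad_2:.3f}")

# Residuals of the straight-line fits, sorted by x, to expose any pattern.
order = np.argsort(x)
print("dataset I  residuals:", np.round(resid_1[order], 2))
print("dataset II residuals:", np.round(resid_2[order], 2))
```

Plotting the residuals against x would make the same pattern visible at a glance; printing them sorted by x is just the dependency-free version.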