What are the criteria for non-linear model selection?

Hi Ehsan,

Good question. Comparing models using R-squared is problematic. One problem is that R-squared has no convenient underlying distribution, so it's difficult to test whether one R-squared is meaningfully better than another. The second issue, and I think one that is much more intuitive to understand, is that R-squared never decreases (and usually increases) as more terms are added to the model. Therefore, using R-squared values to compare models will lead you to pick the more complex model in almost every situation.

Solutions:

- Use adjusted R-squared: It's the same formula as R-squared, except that you divide the residual Sum of Squares (SS) (the numerator) by "n-K," where n = number of data points and K = number of parameters in your model, and you divide the total SS (the denominator) by "n-1." In other words, adjusted R² = 1 - [SS_res/(n-K)] / [SS_tot/(n-1)].

- Use the Akaike Information Criterion corrected for small sample sizes (AICc) to compare models; the model with the lowest AICc is preferred.
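
The two solutions above can be sketched in Python as follows. This is a minimal sketch, assuming least-squares fits with normally distributed errors; the function names are hypothetical helpers, not part of any library. The AICc formula drops additive constants, so only differences between models fitted to the same data are meaningful, and, following a common convention, the `k` passed to `aicc` also counts the residual variance as a parameter.

```python
import math

def rss(observed, predicted):
    """Residual sum of squares (SS of the residuals)."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def tss(observed):
    """Total sum of squares around the mean."""
    mean = sum(observed) / len(observed)
    return sum((o - mean) ** 2 for o in observed)

def r_squared(observed, predicted):
    return 1 - rss(observed, predicted) / tss(observed)

def adjusted_r_squared(observed, predicted, k):
    """k = number of parameters in the model's mean function (including the intercept)."""
    n = len(observed)
    return 1 - (rss(observed, predicted) / (n - k)) / (tss(observed) / (n - 1))

def aicc(observed, predicted, k):
    """Small-sample corrected AIC for a least-squares fit, up to an additive constant.

    Assumes Gaussian errors; k counts every estimated parameter, including the residual variance.
    """
    n = len(observed)
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    aic = n * math.log(rss(observed, predicted) / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)
```

When comparing candidate models fitted to the same observed values, prefer the one with the lower `aicc`; unlike `r_squared`, both `adjusted_r_squared` and `aicc` penalize extra parameters.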

Depending on what your overall modeling goal is, you may also want to reconsider the kind of "nonlinear" model you've chosen. The hyperbolic and logarithmic models that you mention are good if you think that the underlying data have some natural functional form. But just like quadratic relationships (or other polynomials), these models are approximations of the data. In other words, the analyst is forcing the relationship onto the data. I have no problem with this in most cases; for instance, for modeling the growth of a cat or dog, the logistic growth curve is reasonable, simply because dogs start small, grow fast, and then level off at adult size. Depending on your data, you may want to look into options that allow you to "let the data speak for themselves." One modeling framework I enjoy is the Generalized Additive Model (GAM), but there are others as well.
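
To compare the hyperbolic and logarithmic candidates directly, here is a minimal sketch. It assumes the "hyperbolic" model has the form y = a + b/x (other forms, such as y = a*x/(b + x), would need an iterative nonlinear fit) and that all x > 0. Under that assumption both models are linear in a and b after transforming x, so ordinary least squares suffices; `compare` and the other names are hypothetical.

```python
import math

def fit_line(u, y):
    """Ordinary least squares for y = a + b*u; returns (a, b)."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    suy = sum((ui - mu) * (yi - my) for ui, yi in zip(u, y))
    b = suy / suu
    return my - b * mu, b

def aicc_ls(y, yhat, k):
    """Least-squares AICc (Gaussian errors, additive constants dropped)."""
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Each candidate maps x to a transformed predictor u, so y = a + b*u is linear in (a, b).
CANDIDATES = {
    "logarithmic": math.log,           # y = a + b*ln(x), requires x > 0
    "hyperbolic": lambda x: 1.0 / x,   # y = a + b/x (assumed form of "hyperbolic")
}

def compare(x, y):
    """Return {model name: AICc}; lower is better."""
    results = {}
    for name, transform in CANDIDATES.items():
        u = [transform(xi) for xi in x]
        a, b = fit_line(u, y)
        yhat = [a + b * ui for ui in u]
        # k = 3: intercept, slope, and residual variance
        results[name] = aicc_ls(y, yhat, k=3)
    return results
```

Because both candidates here have the same number of parameters, the AICc ranking is the same as ranking by residual sum of squares; the penalty term only matters when candidates differ in size.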

I hope this helps.

J. Patrick Kelley

Jochen Wilhelm

Good answer from Patrick. AIC is preferable to R². With R², you can really only compare nested models, which is not the situation here. My advice is similar to what Patrick says at the end: think about which functional form of the relation is *reasonable*. This is an argument made outside of the data, on the basis of your expert knowledge of the topic (e.g., biology) and the underlying physical, chemical, biological, ... relationships.

If both models perform equally well within the range of the available data, you may ask which of the models gives more reasonable extrapolations.
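
The extrapolation question can be illustrated with two curves whose parameters are hypothetical (not fitted to any real data). They give similar values for x between 1 and 5 but diverge sharply beyond that range, because the logarithmic model keeps growing while the hyperbolic model y = a + b/x levels off at a.

```python
import math

# Hypothetical parameter values, chosen only to illustrate extrapolation behavior.
def log_model(x, a=1.0, b=2.0):
    return a + b * math.log(x)   # keeps increasing as x grows

def hyperbolic_model(x, a=5.0, b=-4.0):
    return a + b / x             # approaches the asymptote a = 5 as x grows

for x in (1, 2, 3, 4, 5, 10, 100, 1000):
    print(f"x = {x:>4}: logarithmic = {log_model(x):6.2f}, hyperbolic = {hyperbolic_model(x):6.2f}")
```

Which behavior is more plausible far from the data is exactly the kind of question that has to be answered with subject-matter knowledge rather than with the fit statistics.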

Finally, if you have two seemingly equally good models, why not present both?

Ehsan Khedive

Thank you both so much. I'll apply your advice.

Vladimir Bakhrushin

When selecting a model, it is also desirable to investigate the properties of its residuals. For an adequate model, the residuals must be uncorrelated random variables with zero mean. In the simplest case, the lack of correlation between residuals can be assessed with the Durbin-Watson statistic.
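
A minimal sketch of these residual checks, assuming the residuals are listed in a meaningful order (for example, by time or by increasing x), since the Durbin-Watson statistic depends on that order. It computes only the statistic d; a formal test still needs the tabulated critical values. Values of d near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The function names are hypothetical.

```python
def durbin_watson(residuals):
    """d = sum over t >= 2 of (e_t - e_{t-1})^2, divided by sum of e_t^2 (residuals in observation order)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def residual_checks(observed, predicted):
    """Return (mean residual, Durbin-Watson d) for one fitted model."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean = sum(residuals) / len(residuals)
    return mean, durbin_watson(residuals)
```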

Hemanta K. Baruah

Dear Ehsan,

On ResearchGate, I have posted a question on the misuse of R². I think you may want to follow that question.

Thiago Augusto Da Cunha

Dear Ehsan, first of all, you should investigate the relationship between the dependent variable (y) and the independent variable (x) using a scatter plot. If the plot reveals a quadratic pattern, you should use a quadratic model, such as a second-degree polynomial.
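
After looking at the scatter plot (for example with matplotlib's `pyplot.scatter`), a quadratic model can be fitted by least squares. The sketch below solves the normal equations in pure Python to stay self-contained; it is adequate for low degrees such as 2, but for real work a library routine such as NumPy's `polyfit` is numerically safer. `fit_polynomial` and `predict` are hypothetical helpers, and unlike NumPy's `polyfit` they return coefficients from the lowest power upward.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns [c0, c1, ..., c_degree] (lowest power first)."""
    m = degree + 1
    # Normal equations A c = r with A[i][j] = sum x^(i+j) and r[i] = sum y * x^i
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    r = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= factor * A[col][k]
            r[row] -= factor * r[col]
    # Back substitution
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(A[i][k] * coeffs[k] for k in range(i + 1, m))
        coeffs[i] = (r[i] - s) / A[i][i]
    return coeffs

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))
```

Fitting both degree 1 and degree 2 to the same data and comparing them (for example with AICc and a residual check) is one way to confirm what the scatter plot suggests.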

So, you should explain why you are using a nonlinear model.

In the context of model selection, R-squared, adjusted R-squared, or AIC alone does not give enough assurance about which model is best. Attached is the classic paper by Anscombe (1973), which clearly shows how misleading R-squared can be when used to select a model.
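
Anscombe's point can be reproduced in a few lines. The data below are the four published datasets from Anscombe (1973); fitting a straight line to each gives essentially the same intercept (about 3.0), slope (about 0.5), and R² (about 0.67), even though a scatter plot shows that only the first dataset looks like a straight line with ordinary scatter. The helper name `linear_fit` is hypothetical.

```python
# Anscombe's quartet (Anscombe, 1973): sets I-III share the same x values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I":   (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def linear_fit(x, y):
    """Simple linear regression; returns (intercept, slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # For a straight line with intercept, R^2 equals the squared correlation.
    return my - slope * mx, slope, sxy ** 2 / (sxx * syy)

for name, (x, y) in ANSCOMBE.items():
    intercept, slope, r2 = linear_fit(x, y)
    print(f"{name:>3}: y = {intercept:.2f} + {slope:.3f}x, R^2 = {r2:.2f}")
```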

I recommend that you also carefully investigate the residuals of each candidate model before deciding which model to select.

Best regards.
