How to assess goodness of fit for a non-linear model?

09 September 2013 39 8K Report

Suppose a non-linear smooth function is fitted to some data (e.g. means and standard errors for cell survival after various radiation doses). What are some useful ways to assess goodness of fit for the model, without comparing to other models?

Michael E Young Popular answer

You inserted the qualifier "without comparing to other models" which really limits the possible answers. I think you're asking for something equivalent to the R-squared or other effect metrics that are in standard use in linear modeling. If your error distribution is approximately normal, then the standard metrics can be used although curve fitting like you're describing is prone to overfitting and would necessitate something like a cross-validated assessment of the metric.

You can always just plot the fit against the raw data to judge the goodness of fit - just use your eyes!

Steffen Borchers

Hello Igor

Basically, "fitted to some data" describes a particular optimization problem that has been solved, i guess sum of least squares in your case. So goodness of fit then translates into the distance of your solution to the global optimum solution.

I would suggest to estimate the confidence intervals of your parameters; Tight intervals and identifiable parameters here mean that you have a "good" model suitable e.g. for prediction and control/therapy purposes. To this end, the uncertainty of your data needs to be described and taken into account (e.g. using set-based methods ;)

If you have non-identifiable parameters, then the model is over-parametrized.

A model should always be as simple as possible, but no simpler.

best

Michael E Young

You can always just plot the fit against the raw data to judge the goodness of fit - just use your eyes!

Kuan-Wei Tseng

I think some criteria for semi- and nonparametric models could be applied in your model, such as the average squared error, the mean average squared error, integrated squared error, average predictive square error, (generalized) cross-validation and so on. These criteria do not require comparing with alternative models. For details, see Fahrmeir and Tutz, 2001.

Nikolay Samusik

Compare the variance of the initial data with the variance that is left after substracting the predicted values.

Néill Sweeney

If the experiment includes replications(independent trials with the same predictors e.g. radiation dose), calculating the pure error is usually useful. It's an easy calculation if you are fitting by least squares. The pure error is the minimum any regression function can achieve.

In a sense though, it is a comparison to a model, the so-called saturated model where there is a seperate prediction for each set of predictor variables used in the experiment.

Marco Durante

Pearson's chi-square test for goodness-of-fit and Fisher's F-test for the number of parameters. Norman Albright in Berkeley studied all the robust fitting procedures for survival curves. See Radiation Research 1987 Nov;112(2):331-40.

H.E. Lehtihet

Dear Steffen,

"So goodness of fit then translates into the distance of your solution to the global optimum solution."

1) Are you sure about that ?

2) Does this imply that if I manage somehow to get to the global optimum solution, I could then say that I have obtained the "perfect" fit ?

Steffen Borchers

Dear H.E. Lehtihet, If the model used for fitting is given (and not compared to other models as stated by OP), and your objective function fixed too, then the best you can do is to calculate the global optimal solution. The question discussed here as far as I understood OP correctly is which criterium to use to qualify the fitted model, and we have seen several useful suggestions here. Hope this helps you, best

H.E. Lehtihet

Dear Steffen,

Thank you for your last clarifications. The reason I have asked for these clarifications is due to the ambiguity of your first answer, which seemed to mix up between the best-fit parameters (solutions of the optimization process) and the goodness of fit (GoF) for the model.

When fitting data, the evaluation of the GoF is almost never a trivial task. Even in the linear case, there exist some issues as can be read in the following paper :

http://arxiv.org/pdf/1008.4686v1.pdf

In the non linear case, the problem becomes much more complicated and, of course, is not free of issues either. (see for example the following paper regarding the use of R-squared)

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892436/

In the case of the question discussed in this thread, the problem is even more complicated, given the fact that Igor has imposed (as you have said) quite stringent additional constraints. Here, Michael has pointed out what I believe is the major difficulty, namely: the cross-validation. Unfortunately, such a difficulty cannot be eliminated using a simple metric such as the distance to the optimal solution. Such a technique is equivalent to the "chi-by-eye" technique mentioned by Michael in his side remark.

Igor Shuryak

Thank you everyone for your suggestions! Especially useful are the references from Marco and HE. I will read them and ask more details as soon as possible.

In essence, I asked this question because I am interested in the following: how to formally (no only subjectively) tell whether or not a model fits the data reasonably? This is a different question from "does model A fit better than model B".

Igor Shuryak

I read the provided references with interest!

I wonder about the following issues:

If model fits to the same data are compared, AICc seems like a good method. But if there is only one model, AICc will not help. Is reduced chi-squared a good choice for goodness of fit assessment for one model? For example, the paper (http://arxiv.org/abs/1012.3754) claims that it is not. I would appreciate your suggestions!

H.E. Lehtihet

Dear Igor,

"if there is only one model, AICc will not help".

Indeed, both AIC and BIC are useful when you have several models and thus will not be of much help in your case.

Regarding the use of the reduced chi2, besides the issues you point out, this tool would not be applicable anyway if the errors do not follow a Gaussian distribution, as already underlined in the paper you cite.

The following paper might be of some interest to you, although it does not offer an answer to your original question.

http://www.ub.edu/gdne/amaydeusp_archivos/encyclopedia_of_education10.pdf

Hope this helps.

Igor Shuryak

Dear H.E.,

Thanks again for your comments and reference! So, what would you suggest as a simple way to estimate goodness of fit for a single nonlinear model? Perhaps some Monte Carlo based methods?

H.E. Lehtihet

Dear Igor,

Are you asking me this question because you know from our previous discussions in other threads that I like MC-based methods ? (LOL !!!)

More seriously, I don't feel competent to give a knowledgeable answer to your original question on absolute GoF testing. This is like your other interesting but difficult question about the case of small number of data. I have followed that thread from the beginning but without contributing, except for the side remark about the hitchhiker).

Non-parametric MC-based methods will help you get confidence limits for your parameters but they already assume that your model is good, so I don't think they can be used as a reliable absolute-GoF test.

The only thing I can suggest would be to use several different absolute-GoF indices. If you manage to get an acceptable score for each of these indices, then you could conclude with some confidence that your model is indeed good. The problem is that not all of them might be applicable in your case.

For some Abs-GoF indices, see the excellent paper "Structural Equation modelling : Guidelines for Determining Model fit", by Hooper, Coughlan and Mullen (2008).

If you don't find it, I can send you a pdf version.

Hope this helps.

Alessandro Giuliani

A very quick and efficient solution is simply to compute Y (est) = f(X) being f the non-linear model of interest , X the indpendent variable(s) and Y (est) the estimate given by the model of the variable of interest Y(obs. The goodness of fit of the model will be immediately estimated in terms of the Pearson correlation coefficient between Y(est) and Y(obs). Not only, but given we expect the best model linking Y(est) and Y(obs) is nothing different from Y(est) = Y(obs) and thus a line having intercept = 0 and angular coefficient = 1, if computing the best linear fit between Y(est) and Y(obs) I get an intercept signifcantly different from zero this indicates I have a problem with a systematc effect I did not take into consideration, if I have an angular coefficient different from unity this means I did not take into consideration a regressor or the order of the model is wrong...

Igor Shuryak

Dear H.E. and Alessandro, thank you for the very useful suggestions!

I wonder what you (and anybody else interested) think about the following simplistic (perhaps incorrect) idea:

Assume there are n data points x(i), y(i) where i=1..n. Each point has a Gaussian standard deviation s(i).

The model function makes predictions f(i) using x(i).

The goodness of fit for the model at each data point can be estimated by

g(i)=1-erf( sqrt(2)*abs(f(i)-y(i))/(2*s(i)) )

Then the goodness of fit for the whole model would be something like G=(product( g(i), i=1..n))^(1/n)

Does this make sense?

Marco Durante

With the additional clarification, I still think that the best way is the Weighted Chi-Square Goodness of Fit Test. In robust fitting procedures, when estimates of experimental errors is difficult, experimental uncertainties are multiplied by the normalized chi-square to reduce the weighted chi-square of the fit. Of course this procedure does not apply if you want to test the model (not the quality of the data). Hope this helps, interesting discussion!

H.E. Lehtihet

Dear Igor, Sorry for the delay in responding.

The index you propose is reminiscent to piece-wise GoF indices mentioned in one of the references I gave previously. However, beside the fact that I do not know which assumptions your approach is implicitly making on the error distribution (and perhaps also on the (in)dependencies between these errors at distinct data points), I would suspect that your technique would have a certain tendency to downgrade somewhat an otherwise quite acceptable fit and perhaps more so in the case of a large number of data.

On the other hand, you can still test somewhat the consistency of your approach. For example, you could check the literature for some benchmark data, fitting models and AIC rankings. Then, you could simply apply your own approach on those same models and data to see if you manage to get, at least, the same ranking.

Hope this helps.

Igor Shuryak

Thank you, Marco, for your input!

Igor Shuryak

Dear H.E.,

No problem, and thanks for your interest!

I am not a statistician. The formula I wrote certainly assumes that the errors are independent and Gaussian. Probably it also implicitly assumes lots of other things which I am not aware of.

In general I think that if the value generated by this formula is high (close to 1), this suggests that model predictions are (on average) within the range of the error bars of the data. However, systematic defects in model predictions (for example if the model consistently overpredicts the data by a small amount, so all the residuals are small but positive) will be missed. To check for such things the approach suggested by Alessandro (doing a linear regression of the predicted values vs the data and checking how the intercept and slope differ from 0 and 1, respectively) sound reasonable to me.

However, if the formula generates a small value (e.g.

H.E. Lehtihet

Dear Igor,

"Why do you think sample size should affect the results a lot?"

My statement was simply based on a crude estimation of the behavior of your proposed GoF. It seems to me that a bad fit at a single location, and no matter how good the fit might be elsewhere, would downgrade considerably the overall score given par your GoF. In the case of a large number of points, the probability that such spurious points exist would not decrease.

Usually, GoF indices are the result of some averaging operation that certainly accounts for all the data points but that is not too sensible to how good or bad a fit is at any single point.

Igor Shuryak

Dear H.E.,

Thanks again for your answer! I have the following thoughts:

1. Perhaps effects of "outlier" data points on any GoF index are easiest to test by bootstrapping methods - i.e. to see how sensitive is the GoF from the proposed model to perturbations of the data set?

2. To "discourage" the model from small systematic deviations (e.g. from overestimating all data points by a small amount), perhaps an easy way is to multiply the GoF index by the binomial probability coefficient n!/(k!*(n-k)!), where n is the number of data points an k is the number of positive residuals? For large n this can of course be approximated.

As always, would be grateful for input from anybody interested!

H.E. Lehtihet

Dear Igor,

1) Indeed, bootstrapping may give you some information regarding sensitivity. However, I don't see how this information can be used subsequently to assess the goodness of fit.

2) I don't think so (or perhaps I do not understand exactly what you mean). A fitting procedure includes two phases : (A) obtaining the best-fit parameters for the selected model; then (B) evaluating the GoF index for the resulting fitted model. The first operation is an optimization whereas the second is an evaluation. Therefore, if you should 'discourage' the model or include any bias, you can do so but only in phase (A) to help guide the optimization process. However, if I understood correctly, you intend to include a modifier in phase (B) and not in phase (A). By doing so, you will modify, of course, the evaluation and thus the score of the fitted model but you will not modify how this model was obtained earlier.

Igor Shuryak

Dear H.E.,

Thanks again for your reply! I will try to be more clear about the points above:

1. I was thinking for example about the following situation: Suppose perturbing the data set by bootstrapping shows that good fits of the model are obtained in all cases when a particular data point happened to be excluded. But when that point was included, the fits were much worse. This could be an argument for saying that the model is generally not too bad for this data set, but one point happened to be an outlier - perhaps by chance, or perhaps because it represents some yet unexplained effect. Does this make sense?

2. Perhaps a clearer way to write what I meant is the following: Suppose there are n data points. The goal of the fitting procedure is to minimize some function G(n) = SUM[ g1(i)-g2(i), i=1..n ], where, for example, g1(i)=ln[(f(i)-y(i))^2/s(i)^2] and g2(i)=ln[(i)!/[(k(i))!*(i-k(i))!]], y(i) are measured data, s(i) are standard deviations, f(i) are model predictions, and k(i) is the number of positive residuals. The goal of using this would be to "strongly encourage" the model to go through the "middle" of the data (i.e. to have equal numbers of positive and negative residuals) and discourage systematic deviations.

Once again, would be grateful for input from you and anybody else interested!

H.E. Lehtihet

Dear Igor,

1) Yes it does make sense as you are not rejecting the possibility that the outlier might actually be due to "some yet unexplained effect."

2) My objection concerned the use of a modifier in phase (B) as a way to bias the model. On the other hand, as long as you are working in phase (A), i.e.: optimization, you can introduce modifiers to guide the optimization process and to promote the best-fit solution according to what you think is desirable. This is like modifying the optimization criteria. A simple example can be given in linear fitting when minimizing sum(e²). One drawback is the "square effect". Any outlier that happens to be very off, will attract significantly the best-fit solution, which will end up being offset w.r.t most data points. One way to "discourage" this undesirable effect is to use instead sum(|e|) but at the price of complicating greatly the computation (non-smooth optimization). In any case, this modification is done in phase (A) and not in the next phase (evaluation of the GoF).

Christopher James Davia

Try this:

This team is applying quantities derived from dynamical systems theory to the problem of assessing the health or sickness of a single cell.

The role of mitochondria and mit-dna in oncogenesis. Quantum Biosyst. 2, 250–281. Marchionni M., Caramel S., Stagnaro S. (2013).

Igor Shuryak

Dear H.E.,

I appreciate your useful answer!

My next questions concern the following hypothetical situations:

1. Suppose a customized fitting procedure (e.g. like the one minimizing G(n) which I described above using binomial coefficients to guide favor fit with equal numbers of positive and negative residuals) is used. Can this produce any estimate of "absolute GoF" criteria? For example, use this procedure on several data sets and several models and plot the resulting fits and the values of say G(n)/n. This can in principle produce a situation where values of G(n)/n < some threshold value X represent "reasonable" fits, and if G(n)/n>X the fits are "unreasonable". Of course this is an approximation, but does it make sense in principle?

2. Suppose 2 models (A and B) are fitted to the same data set using the customized procedure minimizing G(n). If it then reasonable to compare the fits of these 2 models by calculating say AICc for each, using the data and predictions from each model, and saying that the one with lower AICc (say model B) fits somewhat better? I ask this because it is in principle possible that if these same models are fitted to the same data using a different procedure (say minimizing AICc instead of the custom function G(n)), it may turn out that model A (istead of model B) has lower AICc.

As always, would be grateful for your input!

Igor Shuryak

Christopher: Thanks for your answer, but perhaps there is something missing (e.g. a link)?

Christopher James Davia

Igor

Whoops - it's there now.

H.E. Lehtihet

Dear Igor,

If my understanding of your post is correct, I would answer no to both question.

1) It seems that you would like to use basically the same function for both phase (A) and phase (B). If you do so, the result will never be credible. It is the same as being the reviewer of your own paper. Usually, the function we use in phase (A) reflects only our fitting criteria (and not the goodness of fit). We use this function in an optimization process to get the best-fit parameters for our model (it is like doing our best when presenting a paper). Once this is done, we must turn to phase (B) to evaluate the GoF. This evaluation must be blind and independent (like any good reviewing process).

2) Here, it seems that you would like to do the opposite. In other words, you would like to use a standard GoF (AICc) as a function to be used in the optimization process. I don't think you can do that for the same reason (independence) I described above.

Hope this helps

Igor Shuryak

Thanks, Christopher, now I got it!

Igor Shuryak

Dear H.E.,

Thanks as usual for an informative answer!

Your point about question 1 seems very reasonable to me. However, I am confused about question 2. There what I intended was to use different functions for different stages: use some custom function (e.g. G(n)) during the optimization to get best-fit model predictions, and then use a standard function (e.g. AICc) to evaluate the GoF of these predictions. Does this make sense?

H.E. Lehtihet

Dear Igor,

Thank you for clarifying question 2.

I was simply misled by your use of the word "instead" in your previous post : "...minimizing AICc instead of the custom function G(n)".

Now, I fully understand what you intend to do and it makes much more sense with perhaps a few words of caution.

Please note that AICc is not an absolute GoF. It provides a ranking between models while including some parsimony criteria so that the best-ranked model is not necessarily the one that fits best the data.

The following paper might be of interest for you.

http://www2.unil.ch/popgen/modsel/biblio/BurnhamBES11.pdf

See, in particular, technical issues N°2, N°11 and N°14 that could be relevant to your case but I am not sure of that.

Hope this helps

Igor Shuryak

Dear H.E.,

Thank you once again for your interest and for a very useful reference!

I am aware that AICs is useful for comparing models, but not for absolute GoF estimation.

I now wonder actually that the following may be a very straightforward (but simplistic) way to estimate absolute GoF: simply report the percentage of residuals which are >1 and >2 standard deviations away from the data points. Along with a plot showing the model fit to the data, this simple summary should provide a reasonable idea of what percentage of the data points are fitted poorly. Does this make sense?

Thanks in advance!

H.E. Lehtihet

Dear Igor,

Absolute GoF evaluation is not an easy task. The technique you propose is simplistic indeed and I would not trust it as a GoF index. On the other hand, this technique might be useful as a 'BoF' index that will help evaluate the 'badness' of a fit and reject inadequate models.

Igor Shuryak

Dear H.E.,

Thank you for your useful comment! Indeed it makes sense that the simplistic method of calculating the percentage of model predictions which are >1 standard deviations away from the data points can identify a "bad" fit (where such a percentage would be large), but cannot tell the difference between a "pretty good" and a "very good" fit (in both cases the percentage may be zero).

Emilio José Chaves

Igor, my vision is that if you make a test to explore your data, that test must contain some link with the model to be used as a fitting curve. For example, you start from a Lorenz curve with 10 data points (Xi; Li) -ordered from tall to low variable- and you test it creating a graph (Xi; Fi) where Fi=ln(Li)/ln(Xi). Then there is behind a link with a model of the Lorenz curve with shape L(X) = X^F(X), and if you get a continuous fitting curve for F(X) then you only need to derive L(X) to obtain the CDF, in mediae of distribution, with the shape: CDF(X) = L(x)*[ F/x + lnX*F´(X) ]

where ... F´(X) = derivate of F(X).

When F(X)=Fc constant you have a Pareto distribution so F´(X)= 0, so its CDF is

CDF(X) = L(X)*Fc/X and given that L(X)=x^(Fc) it becomes CDF(X)=Fc*x^(Fc-1)

This paretian expression comes in mediae vs cummulative fraction of population.

I have worked many data sets in that way, I normally measure the fitting in mean absolute deviation terms, and in most cases the fitting is good, even for extreme values and high dispersions.

There are some logical retrictions because CDF is very sensible to the derivate F´(X) and in some cases the CDF function may increase, which breaks the decreasing order premise. But this is another question for another moment.

The advantage is that you may compare several samples and observe their structural functions F(X), L(X) and CDF(X) graphically to sustain other analysis.

I hope it contributes to your question. emilio

Turtogtokh Erdenebaatar

Dear Mr. Igor Shuryak,

I was wondering if you get some useful information or kinds of literature and references, please share it with me. I'm also working on the same problem. Igor Shuryak

Badges
Science topic

More Igor Shuryak's questions See All

How do I eliminate noise variables when using ensemble prediction methods like randomGLM in R?

The task involves predicting a binary outcome in a small data set (sample sizes of 20-70) using many (>100) variables as potential predictors. The main problem is that the number of predictors...

10 November 2015 9,260 11 View

Is it justified to fit a mathematical model to different types of data simultaneously?

For example, cells were exposed to different concentrations of a toxic chemical. Two types of data were measured: (1) the critical concentration above which no cell growth occurred, (2) for lower...

03 April 2015 5,159 24 View

How to appropriately account for errors when fitting a mathematical model to summary data which represent percentiles (e.g. median) of a histogram?

The data provided are percentiles (median, 75th and 90th) from several histograms. The number of observations comprising each histogram is known. The shape of the data distribution in each...

31 December 2014 7,014 5 View

How to estimate a sample size for a mouse study intended to compare several potential dose response shapes using Akaike information criterion?

Suppose a mouse study is being designed which aims to address the question of which dose response shape (e.g. linear, quadratic, exponential, etc) best describes the response to a particular...

04 May 2014 3,157 11 View

Is there human or animal in vivo data on reprogramming of non-tumorigenic cancer cells to cancer stem cells during cancer radiotherapy?

Some in vitro data using cancer cell lines has been published, which suggests reprogramming of non-tumorigenic cancer cells to cancer stem cells induced by ionizing radiation. Is there some...

03 April 2014 6,428 1 View

What are some software tools (preferably free) to fit mathematical models containing differential equations to experimental data?

Suppose a mathematical model consisting of a system of differential equations is constructed to describe some biological phenomena. Analytic solutions are not available - only numerical ones. Are...

03 April 2014 551 14 View

How to statistically analyze experimental data which turned out to violate the assumptions used for initial power/sample size estimation?

For example, suppose a certain method was chosen for power/sample size calculation for a future experiment, with the intent to use the same method to analyze the experimental data. But the...

31 December 2013 5,465 20 View

What are the most efficient methods for power/sample size estimation for mouse studies with longitudinal data (e.g. tumor volumes measured over time)?

Among the many available methods, which are the most efficient (i.e. require the smallest sample size for defined alpha and beta) and relatively straightforward to implement?

11 December 2013 9,977 10 View

What are some useful techniques to statistically compare exponentially distributed data?

Suppose there are several treatment groups. Within each group there are exponentially distributed data, i.e. the number of counts is proportional to exp[-A*x^B], where A and B are parameters. What...

09 October 2013 476 56 View

How useful (or not) is partial rank correlation coefficient (PRCC) for estimating the sensitivity of non-linear models to parameter values?

I used PRCC in one paper (A model of interactions between radiation-induced oxidative stress, protein and DNA damage in Deinococcus radiodurans, available on my page) because reviewers requested...

07 August 2013 6,189 3 View

How to learn more about SPSS and its Application?

I would like to learn more about SPSS and Its application especially in regards to data analysis. Please suggest me how I can learn more about it. Thank you so much.

11 August 2024 9,101 4 View

Can I base on reverse DNA sequences to perform alignment, convert to amino acids and GenBank submission?

I have reverse sequences (AB1 format), can I base on reverse DNA sequences to perform nucleotide alignment, convert nucleotides to amino acids and deposit the sequence in GenBank database?

11 August 2024 5,138 1 View

Baseline drift in HPLC? What causes this?

Hello, Why do i see this baseline drift when i compare my blank (black) to the sample (blue)? Any suggestions as to why this happened? Thank you!

11 August 2024 3,770 4 View

What is the difference between mathematical R^4 space and physical 4D unit space?

We assume that the difference is huge and that it is not possible to compare the two spaces. The R^4 mathematical space considers time as an external controller and the space itself is immobile in...

10 August 2024 6,678 14 View

Text-Communication from the M1 Hand Area using BCI—and then there is Elon Musk?

Willett, Shenoy et al. (2021) have developed a brain computer interface (BCI) that used neural signal collected from the hand area of the motor cortex (area M1) of a paralyzed patient. The...

10 August 2024 7,180 0 View

Has anyone applied Python in the field of textile engineering for data analysis, automation, or smart textiles?

I'm currently exploring the application of Python in textile engineering, specifically in areas like data analysis, process automation, and the development of smart textiles. I'm interested in...

10 August 2024 7,429 2 View

How can I use the cif data obtained from rietveld refinement extracted via gsas2, for microstructural analysis using ETEX software?

09 August 2024 7,718 0 View

How are iso-frequency contours plotted?

Let's say we have a standard, regular hexagonal honeycomb with a 3-arm primitive unit cell (something like the figure attached; the figure is only representative and not drawn to scale). The...

07 August 2024 1,937 1 View

How to prepare the nanoparticle treated fungal sample for Environmental SEM analysis?

A fungal strain was treated with nanoparticles. We want to do an environmental SEM analysis. So could anyone share your views on preparing the sample? Thank you.

07 August 2024 5,307 1 View

How to normalize and take the significance of the MTT OD values with 3 replicates for the same cell-line?

Hi, I have a question about normalizing the MTT OD values for doing the statistical analysis. So, if we have 3 different plates and we call them 3 different replicates, so, first we would...

07 August 2024 8,106 4 View