I want to fit my data with the Gompertz curve, so I need to find the free parameters of the Gompertz function that lead to low bias and variance. What do you suggest?
I have not used Gompertz, though I would not mind learning something about it. If you have test data available, perhaps you could compare some candidate models using z-scores, cross-validation, and/or bootstrapping, since you are considering bias as well as variance. Estimated variances of prediction errors are very useful and can be obtained in 'real time,' but you test for bias with data, post hoc. Bias can be more important, so it is good that you are considering it.
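As an illustration of the cross-validation idea, here is a minimal Python sketch (not from this thread; the Gompertz growth form y = a*exp(-b*exp(-c*t)), the parameter values, and the synthetic data are all assumptions for demonstration):

```python
# Hypothetical sketch: k-fold cross-validation of a Gompertz curve fit.
# The mean of the out-of-fold residuals is a rough bias check; their RMSE
# estimates prediction error spread.
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, a, b, c):
    # Assumed Gompertz growth form: a * exp(-b * exp(-c * t))
    return a * np.exp(-b * np.exp(-c * t))

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 60)
y = gompertz(t, 100.0, 5.0, 0.8) + rng.normal(0, 2.0, t.size)  # synthetic data

k = 5
idx = rng.permutation(t.size)
folds = np.array_split(idx, k)
errors = []
for fold in folds:
    train = np.setdiff1d(idx, fold)  # fit on everything except this fold
    popt, _ = curve_fit(gompertz, t[train], y[train],
                        p0=[y.max(), 5.0, 1.0], maxfev=10000)
    errors.append(y[fold] - gompertz(t[fold], *popt))  # out-of-fold residuals
errors = np.concatenate(errors)

print("CV mean residual (bias check):", errors.mean())
print("CV RMSE:", np.sqrt((errors ** 2).mean()))
```

With a correctly specified model the mean out-of-fold residual should hover near zero, while a systematic offset would hint at the kind of bias Jim mentions.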
Heteroscedasticity may be important too, so hopefully you can place it properly in your error structure. (Beware of hypothesis testing in that case.)
Cheers - Jim
PS -
Regarding hypothesis tests, you should pay attention to effect size. Perhaps this will help:
Perhaps you do not need hypothesis tests at all. You might look at model selection and assessment/validation in books such as Hastie, Tibshirani, and Friedman (The Elements of Statistical Learning, Springer) or John Fox (Applied Regression Analysis and Generalized Linear Models, Sage).
PPS - You may have an "unbiased" model, in the sense that the estimated residuals in your sample sum to zero (or close to it, if only the expected sum of residuals is zero for your model), but the model can still carry the bias of not exactly capturing the "true" situation. So model validation using test data can be very helpful.
Hope I did not misunderstand your question too badly.
Article Practical Interpretation of Hypothesis Tests - letter to the...
These notes from Emory seem, again at a glance, to somewhat gloss over that; they give Mathematica code for the fit, though it looks like assumptions are made that may not be entirely clear. They do include examples:
The goodness of fit depends on the statistical method used to estimate the parameters.
Next, you have to consider whether the method is linear or nonlinear. On my ResearchGate page I have some suggestions; they are in Portuguese, but you can access the references, most of which are in English.
I think that whenever you use a "training" set of data to estimate coefficients for your model, and then a different set of data to check performance, you are more or less looking at all the kinds of error that could come up, though measurement error makes this less than a straightforward comparison of predicted versus "true" y values. Even leaving out one observation at a time, to see how a model based on the others would have predicted it, can be helpful. There are a number of books and papers on model validation, and model selection for that matter, that you could look into; among the ones that might help are the Hastie et al. and Fox books I mentioned.
I have a paper on ResearchGate where I did a general validation of a model across a number of categories for official statistics, the third in a series of papers ending in 2001, but I doubt it is what you need. I suggest you look at textbook examples of validation.
Best wishes - Jim
PS - This means that you could use a number of methods, but the key is "how" you use them. The "training" set of data has to be used to form your model, which is then applied to other data; often this is referred to as "splitting" the data. Compare what you predicted, pretending you did not have those y data, to what you actually observed. You can then choose methods from those noted above, or perhaps others.
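A sketch of that splitting procedure in Python (hypothetical: the Gompertz growth form, the 70/30 split, and the synthetic data are assumptions for illustration):

```python
# Hypothetical sketch of data "splitting": fit the curve on a random 70% of
# the data, then score predictions on the held-out 30%.
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, a, b, c):
    # Assumed Gompertz growth form: a * exp(-b * exp(-c * t))
    return a * np.exp(-b * np.exp(-c * t))

rng = np.random.default_rng(3)
t = np.linspace(0, 12, 80)
y = gompertz(t, 50.0, 6.0, 0.6) + rng.normal(0, 1.0, t.size)  # synthetic data

idx = rng.permutation(t.size)
train, test = idx[:56], idx[56:]  # 70% training, 30% held out

popt, _ = curve_fit(gompertz, t[train], y[train],
                    p0=[y.max(), 5.0, 0.5], maxfev=10000)

# Compare predictions on the held-out points to the y values we "pretended"
# not to have
pred = gompertz(t[test], *popt)
rmse = np.sqrt(np.mean((y[test] - pred) ** 2))
print("holdout RMSE:", rmse)
```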
I know my data can be fitted with the Gompertz curve, and I just want to demonstrate it. So I think there is no learning involved in my case, and I should find the best free parameters of the Gompertz curve just by trial and error. What do you think? Is there any more systematic method?
I don't know; trial and error might get you close enough, but it looks like there are three parameters, so that could be tricky. From the ScienceDirect link I found and gave above, it looks like numerical solutions (as opposed to a closed-form solution) are used to obtain the parameters, so if you don't have software programmed for Gompertz, trial and error may be your fallback. As I said, I don't know Gompertz, but that ScienceDirect link gave the following reference:
J.E. Dennis and R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, 1996.
They (Dragan Jukić, Gordana Kralik, Rudolf Scitovski) also noted this:
P.E. Gill, W. Murray, and M.H. Wright, Practical Optimization, Academic Press, London, 1981.
Note that they also had the following reference, which sounds promising:
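On the "more systematic method" question: the standard alternative to trial and error is nonlinear least squares, which is what the numerical-optimization references above are about. A hedged Python sketch using scipy.optimize.curve_fit (the growth form, starting values, and data here are assumptions, not anything from the papers cited):

```python
# Hypothetical sketch: estimating the three Gompertz parameters by nonlinear
# least squares instead of trial and error.
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, a, b, c):
    # Assumed Gompertz growth form: a * exp(-b * exp(-c * t))
    return a * np.exp(-b * np.exp(-c * t))

rng = np.random.default_rng(42)
t = np.linspace(0, 10, 50)
y = gompertz(t, 10.0, 4.0, 0.7) + rng.normal(0, 0.2, t.size)  # synthetic data

# p0 gives rough starting guesses; a reasonable guess for the asymptote a is
# the largest observed y
popt, pcov = curve_fit(gompertz, t, y, p0=[y.max(), 1.0, 0.5], maxfev=10000)
perr = np.sqrt(np.diag(pcov))  # approximate standard errors of the estimates

print("a, b, c =", popt)
print("std errors =", perr)
```

PROC NLIN in SAS and nls in R (both mentioned later in this thread) do essentially the same job.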
Now I want to use the chi-square test. I have 50 samples, each containing 100 data points, and I want to see whether the 100 data points follow the Gompertz distribution. The problem is that the chi-square test takes just one sample (in my case, the 100 data points) as input. Do you have any solution?
I'm not clear on what you mean, but if you are trying to apply hypothesis testing, as I noted above, that is not very clearly interpretable (though many use it, often not well).

If you fit a curve to your data, you could plot your estimated residuals (you could research "residual analysis"). The sum of your estimated residuals should be close to zero. If you are considering more than one candidate curve, you could choose the one with the smallest mean squared residuals. (But there is always the possibility of overfitting to your particular data set if you try different models without test data as well as training data; I'm not clear on your goals.) Other than that, as I indicated, I'm not really so expert on curve fitting. Perhaps you should go to the search box at the top of your profile page, change the icon there from a person search to a publication search, and look for a paper that will help you better and that you can reference.
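On the chi-square question specifically: one common approach is simply to run the goodness-of-fit test separately on each of the 50 samples. A hedged sketch using SciPy's built-in Gompertz distribution (the shape parameter is assumed known here; if you estimate parameters from each sample, the degrees of freedom should be reduced accordingly, e.g. via the ddof argument of chisquare):

```python
# Hypothetical sketch: chi-square goodness-of-fit test applied to each of
# 50 samples of 100 points against a Gompertz distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
c = 1.5  # assumed (known) shape parameter of the Gompertz distribution
samples = stats.gompertz.rvs(c, size=(50, 100), random_state=rng)  # synthetic

k = 10  # number of bins
# Bin edges at equal-probability quantiles, so every expected count is equal
edges = stats.gompertz.ppf(np.linspace(0, 1, k + 1), c)

pvalues = []
for sample in samples:
    observed, _ = np.histogram(sample, bins=edges)
    expected = np.full(k, sample.size / k)
    # If parameters were estimated from this sample, pass ddof=<number of
    # estimated parameters> to reduce the degrees of freedom
    _, p = stats.chisquare(observed, expected)
    pvalues.append(p)

print("fraction of samples rejected at alpha=0.05:",
      np.mean(np.array(pvalues) < 0.05))
```

If the data really follow the hypothesized distribution, roughly 5% of the 50 per-sample tests should reject at the 0.05 level just by chance, so a much larger rejection fraction is the signal to look for.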
If I understand the discussion correctly, it sounds like a case where the bias/variance issue won't be a concern. If you simply have a set of data, and you know you want to fit it to the Gompertz curve, it is just a matter of finding the best fit.
You will want to use a procedure like PROC NLIN in SAS or nls in R to find the best fit curve.
To “prove” that it’s a good fit, you can look at three things. 1) a p-value for the overall model. 2) a pseudo r-square value. 3) VERY IMPORTANTLY, the correct distribution of the residuals. If the residuals show a pattern, then your model actually doesn’t fit the data very well. See this link. It may show the “bias” you are concerned with. http://condor.depaul.edu/sjost/it223/documents/resid-plots.gif.
You can see an example of fitting a curve with R on the following link, in the “nonlinear regression” section. It’s not the best data for an example, but it explains the process a little more. http://rcompanion.org/rcompanion/e_03.html.
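The pseudo R-squared and residual-pattern checks described above can also be sketched in Python (the Gompertz growth form and the synthetic data are assumptions for illustration; the lag-1 correlation is just one crude pattern check, with the Durbin-Watson test being a more formal tool):

```python
# Hypothetical sketch: pseudo R-squared and a simple residual-pattern check
# for a fitted Gompertz curve.
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, a, b, c):
    # Assumed Gompertz growth form: a * exp(-b * exp(-c * t))
    return a * np.exp(-b * np.exp(-c * t))

rng = np.random.default_rng(7)
t = np.linspace(0, 10, 40)
y = gompertz(t, 10.0, 4.0, 0.7) + rng.normal(0, 0.3, t.size)  # synthetic data

popt, _ = curve_fit(gompertz, t, y, p0=[y.max(), 1.0, 0.5], maxfev=10000)
resid = y - gompertz(t, *popt)

# Pseudo R-squared: 1 - SS_residual / SS_total
pseudo_r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
print("pseudo R^2:", round(pseudo_r2, 3))

# Crude pattern check: correlation between consecutive residuals.
# A value far from zero suggests structure the model missed.
lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print("lag-1 residual correlation:", round(lag1, 3))
```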