You usually check this when you intend to use the function as a predictor for other "input" values, either within the range of the data set (interpolation) or outside it (extrapolation).
Simple Linear Regression is just one of the methods of linear regression, where the slope of the fitted line is given by the Pearson correlation coefficient (if memory serves me right).***
*** I went to verify this and, quoting Wikipedia: "The slope of the fitted line is equal to the correlation between y and x corrected by the ratio of standard deviations of these variables." See https://en.wikipedia.org/wiki/Simple_linear_regression
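In symbols (my notation, with $r_{xy}$ the Pearson correlation and $s_x$, $s_y$ the sample standard deviations of x and y):

$$\hat{\beta}_1 = r_{xy}\,\frac{s_y}{s_x}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\,\bar{x}$$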
Although this notion of dependence is perhaps the most widely recognized, it should not be used blindly. The figure in the link below shows four different data sets with the exact same correlation of 0.816:
The source is this Wikipedia article: https://en.wikipedia.org/wiki/Correlation_and_dependence
Notice that the same fitted line is sometimes a reasonable predictor and at other times a terrible one. This is the main reason to examine the intrinsic "interpolation" characteristics of this method.
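If you want to check this numerically rather than just from the figure, here is a minimal Python sketch (using numpy and the Anscombe quartet values as tabulated on Wikipedia; variable names are mine) that fits the same simple linear regression to each of the four sets:

```python
import numpy as np

# Anscombe's quartet (values as tabulated on Wikipedia)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]               # Pearson correlation
    slope, intercept = np.polyfit(x, y, 1)    # least-squares line
    print(f"{name}: r = {r:.3f}, y = {slope:.2f}*x + {intercept:.2f}")

# All four sets print r ~ 0.816 and roughly y = 0.50*x + 3.00, yet only the
# first one looks like a sensible linear relationship when plotted.
```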
As for extrapolation, I'll leave an example by Professor Gerard Dallal given on his website (or at least one of his websites):
http://www.jerrydallal.com/lhsp/slr.htm
And I quote:
"Extrapolation is making a prediction outside the range of values of the predictor in the sample used to generate the model. The more removed the prediction is from the range of values used to fit the model, the riskier the prediction becomes because there is no way to check that the relationship continues to be linear. For example, an individual with 9 kg of lean body mass would be expected to have a strength of -4.9 units. This is absurd, but it does not invalidate the model because it was based on lean body masses in the range 27 to 71 kg."
Hope it answers your question.
EDIT: Sorry, I just noticed you wrote "Single" Linear Regression instead of "Simple" Linear Regression (which is what I had in mind when giving my previous answer). I'm not aware of a "single" version (perhaps a misspelling?), but the conclusions I pointed out above are mostly relevant to any function used for prediction.
I'm not sure I understood your questions, but I'll try to answer to the best of my ability.
Generally speaking - YES. A function is better if its interpolation and (if relevant) extrapolation characteristics are better, so analyzing these may help you reach a better equation.
There is no specific limit for how much the function may deviate from the data. Whether a function is better or not is ultimately your own interpretation, since only you can think critically about the data you're working with. Again, generally speaking, it's good practice that the function follows the same behavior (slope, curvature, shape, etc.) as the data.
I would advise against using single-valued residuals to decide on the quality of functions fitted to multivariate data (which I'm assuming the % figure is). There are plenty of phenomena that can change those numbers significantly without any major implication for the overall quality of the function. Moreover, the fact that you've opted for a percentage means you've already processed your numbers somehow, making them opaque to an outside viewer (percentage of what?!).
My advice is to plot the data together with your functions and judge for yourself which one seems best.
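For example, a minimal matplotlib/numpy sketch (with made-up data and two arbitrary candidate fits, just to show the kind of plot I mean):

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data; replace with your own x/y values.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.3, 11.7, 14.2, 15.8, 18.1, 19.7])

# Two candidate fits: a straight line and a quadratic.
line = np.poly1d(np.polyfit(x, y, 1))
quad = np.poly1d(np.polyfit(x, y, 2))

xs = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y, color="black", label="data")
plt.plot(xs, line(xs), label="linear fit")
plt.plot(xs, quad(xs), "--", label="quadratic fit")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

# Judge visually whether each fitted curve follows the slope, curvature and
# overall shape of the data, as discussed above.
```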
Just a few observations:
(1) The third equation does not appear to be linear regression unless the coefficient ".049" in the exponential is already known and not fitted as part of the regression.
(2) Are you just using least squares to fit the coefficients in the first two models, or are you using something else such as maximum likelihood? It is well known that, in the case of a multivariate Gaussian distribution for the error terms, least squares and MLE produce the same results. Using least squares does not require any statistical assumptions, but then you cannot derive any statistics other than R2.
(3) The regression equation is much like estimating a mean value for "y" (as a function of "x"), and it is not an "exact" interpolator/predictor: if you substitute a value of "x" at which you have an observed value of "y", the predicted value will not be the same as the data value (see the sketch below).
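To illustrate point (3), a small numpy sketch with made-up data (several y observations at the same x values): the fitted value at an observed x estimates the mean of y there, not the observation itself:

```python
import numpy as np

# Made-up data: two y observations at each x value.
x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0])
y = np.array([1.2, 0.8, 2.3, 1.7, 2.9, 3.3, 4.1, 3.7])

slope, intercept = np.polyfit(x, y, 1)   # least-squares fit

for xi, yi in zip(x, y):
    print(f"x = {xi}: observed y = {yi}, fitted y = {slope * xi + intercept:.2f}")

# The fitted values do not reproduce the individual observations; they
# estimate the mean of y at each x, which is why the regression line is not
# an "exact" interpolator of the data.
```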
If you look at the literature, e.g. the documentation for even the simplest software packages, you will see they distinguish between obtaining a value of "y" (an estimate of the mean) and a "predicted value" of y, even at the data values. Have you looked at the R2 value? Have you looked at histograms of the residuals? Have you looked for influential points?
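As a concrete illustration of those checks, a sketch using Python's statsmodels and matplotlib on placeholder data (substitute your own x and y; the variable names are mine):

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Placeholder data; substitute your own.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)

model = sm.OLS(y, sm.add_constant(x)).fit()

print("R^2:", model.rsquared)        # overall goodness of fit
print(model.params)                  # intercept and slope

# Histogram of residuals: roughly bell-shaped and centred on zero?
plt.hist(model.resid, bins=15)
plt.xlabel("residual")
plt.show()

# Influential points: a large Cook's distance flags observations that pull
# the fitted line disproportionately.
cooks_d = model.get_influence().cooks_distance[0]
print("Most influential observation:", int(np.argmax(cooks_d)))
```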
You might also want to look at "loess" curve fitting; there are several open-source implementations available.
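For instance, in Python the statsmodels package ships a LOWESS smoother (a minimal sketch with placeholder data):

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

# Placeholder data; substitute your own.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 100)
y = np.sin(x) + rng.normal(0, 0.3, size=x.size)

# frac controls the span of the local fits (fraction of points per window).
smoothed = lowess(y, x, frac=0.3)   # returns sorted (x, fitted y) pairs

plt.scatter(x, y, s=10, label="data")
plt.plot(smoothed[:, 0], smoothed[:, 1], color="red", label="LOWESS")
plt.legend()
plt.show()
```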
Finally, from a purely statistical point of view, you are making assumptions about the model and about the error term(s) (distributions, variances, means, etc.). While you can use the data to do some testing of these assumptions, those tests are not valid when you extrapolate, and they may not be valid for interpolation either. I think your problem is more fundamental than just whether it is necessary to "check" the function.
This checking is necessary to know the accuracy and effectiveness of the functions. If you cannot interpolate or extrapolate, that means you need to approximate the function, and by approximating you lose some information.