Least-squares only for normal distributed errors - can anyone help?

19 March 2014 56 5K Report

A parameter of a statistical model should be estimated from iid data. For simplicity, let's consider a single-parameter model. The likelihood principle sais that the best estimate for the parameter woule be the one for which the data is most likely (=MLE). This was already noted by Gauss, but consequenty further developed by Fisher. The method requires to know (or assume) a probability distribution of the residuals (the "observational error", as Gauss named it).

The MLE is the arithmetic mean (average) of the observations for a variety of parameters in a variety of models (normal, Poisson, Binomial, exponential... actually all models where the parameter determins the expected value) [this holds also for k-parameter models like gamma, beta, normal with unknown variance, ...].

On the other hand, the average is the solution of the least-squares condition. Therefore I thought that the LSE and the MLE of the expected value is numerically identical. Further it follows from the likelihood of normal distributed residuals that the MLE is mathematically identical to the LSE.

Now I read in "GAUSS’S LEAST SQUARES CONJECTURE" (JOAKIM EKSTRÖM) in Chapter 3 on page 7:

"Contrary to the discussed claim of Gauss (1809, §178), that the probability distribution of the observational error is of no importance in practice, the probability distribution is crucial as to whether the method of generalized least squares yields the most probable value under the density criterion or not. Given an independence assumption, the normal distribution is not one of many, but the only probability distribution under which the method of generalized least squares yields the most

probable value."

Why? How? I am lost. For instance, the average is the MLE for the expected count (so a Poisson distributed variable), and the average is the LS-solution. Doesn't this contradict the above paragraph?

Can someone help me getting rid of the knots in my brain?

Alex Gilgur Popular answer

Vladimir, I agree with what you say about the quality of the data. However, I can give you a dozen cases where the dependent variable is not normal by definition. For example, if you are improving the lifetime of a gadget and are studying its dependence on explanatory variables, least-squares regression will mislead you, because time between failures is an exponentially distributed variable: it's a memoryless process. If you are looking at morning commute traffic and trying to predict it, it too is a non-normal variable: it follows Erlang distribution. Normality is a nice assumption, strengthened in its position by the central limit theorem, which made it deceptively prominent, but it is just that - a simplifying assumption.

Cheers!

Krishna Kumar Kottakki

The comment is Astrom is valid. Let me share my practical experience on this aspect.

More often, it will be assumed that the data is coming from a Gaussian density. In estimation, Kalman filter gives the best solution, if all the noisy data is more or less follows the Gaussian. Note that, Kalman filter is a least square solution, for the stochastic processes with known (approximated) process and measurement models. It means that, least squares can give better representation only for the data from Gaussian density. This approximation fails, if the data is not a Gaussian, or non Gaussian. However, it is user's task to know whether data is coming from a Gaussian or non Gaussian density.

Coming back to Astrom's comment, he has given a hidden information that if the data is not well approximated by Gaussian density, then one has to go and check for best fit of the data, like Gamma density function i.e a Sum of Gaussians.

David H Abbott

Jochen, I see what you mean. My guess is that the authors should have said "yields an accurate picture of the distribution of the estimate" - i.e. gets the variance and other momemts correct or equivalently gets value of the log likelihood correctly.

Pascal Letourneau

If I understand your question correctly, here is what I would answer:

- For the sake of estimating the parameters, the "real" distribution of the errors does not mater. You can assume it is Gaussian and move on.

- For the sake of inference (precision of your estimated parameters) the "real" distribution is very important. The standard inference tests on parameters usually assume the errors are Gaussian, but if the true distribution is far from Gaussian, than your inference is wrong.

Thomas H. Jagger

The least squares result is the same as the most probable value if the negative log likelihood is the form of a quadratic sum., i.e. a Gaussian distribution will do here. In particular the expression minimize the square of the residuals, for independent normal variates amounts to a quadratic sum.

If you do NOT know the distribution, but assume it is of the form f(y-mu) where mu may be a function of some covariates, and the expected value of y is mu, and the variance of y is constant, and the observations are statistically independent then the least squares is a Best Linear Unbiased Estimate (BLUE)

http://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem

Also, I would gather it is also the limiting MLE estimator.

Michael Cardiff

Pascal's comment above is not necessarily correct, if I am reading what he is saying correctly.

If you assume Gaussian errors (i.e., use least squares estimation), and the actual errors come from a distribution that tends to produce more outliers, then least squares is not the optimal estimator and will give poor results (even for the estimate, *not just* the precision)

One approach when outliers are more prevalent in your data is to use something like IRLS (iteratively re-weighted least squares). IRLS will give different parameter estimates than LS, and is the optimal estimator if the errors are distributed according to a Laplace distribution

http://en.wikipedia.org/wiki/Laplace_distribution

http://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares

The general idea is that if you take the negative log-likelihood of the error distribution and then optimize the parameters to minimize the negative log-likelihood, you get the best estimator. For a Gaussian distribution, this results in a sum of squares minimization. For Laplace, this results in a sum of absolute value minimization.

Andrzej Latocha

Michael you're right.

Can simply solve this problem by using a more efficient numerical algorithm than

IRLS (iteratively re-weighted least squares) ?

It involves the use of a original distribution signal and the same signal moved by optimum offset.

Which assumptions are described along with a numerical example.

https://www.researchgate.net/post/What_is_the_maximum_likelihood_estimation_of_dynamic_systems

Mohamed A. El-Sayed

I agree with them. I attached here a links in the field which may be an useful for you.

http://en.wikipedia.org/wiki/Laplace_distribution

http://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares

Alex Gilgur

Go nonparametric. Use quantile regression. R and SAS both have libraries for it.

Jaya Prakash

The MLE method (statistical scheme) is equivalent to Tikhnonov regularization scheme (which is more deterministic). In our inverse problems community it is well known that the Tikhonov regularization scheme works well if the noise in the data is of Gaussian type. In case of speckle/possion noise the objective function to be minimized changes from L2-norm (Ax -b) to L1-norm of (Ax - b) subjected to constraint (L2-norm of (x)), it was numerical shown that minimization of this objective function performs well.

Demetris Christopoulos

Dear Jochen, I do not want to get lost in the sea of the relevant gigantic literature. Think about one only aspect: What do we exactly mean when we are reading or writting 'Gaussian error'? I am not so sure that we always mean an error term that follows the traditional normal-like exponential form. Most of the cases the Ordinary Least Squares (which is an orthogonal projection after all) the only requirement that needs in order to produce an unbiased estimator is just that error~iid(0,sigma^2), ie the mean value of the residuals to be zero.

Alessandro Fassò

Some properties of least squares depend only on 2nd moments of the errors.

In particular unbiasedness, consistency and BLUE optimality.

Nonetheless, in presence of outliers or skew distributed errors , LSE perform poorly.

In case of non zero mean errors or errors correlated to the predictors x, LSE is not consistent, which is the major drawback.

Vladimir Kaftan

If we speak about observation error we have to realise that it is man-made result. We have a possibility to avoid confident outliers, gross errors and systematic errors in most cases of our measurement. It make us able to receive a normal distribution of that values called like errors. So we will receive the most probable values of our measurements using the least square method. This particularity allow us to know how good we measured our object. We will see this through a test the distribution of the measurement residuals. If it is quite normal - it is Ok. If not - we have to improve our measurement technique, personal quality and so on.

Andrzej Latocha

Give an example of the real measurement system not deform the dynamics of the object, which returns the results as a Gaussian distribution without perturbation.

Small perturbations in the distribution of the LSE generate unacceptable errors.

Alex Gilgur

Cheers!

Vladimir Kaftan

Yes, Alex! Many examples of dependent but random variables are not normally distributed. But you can have got a covariation matrix describing this dependence and receive most probable values of unknowns using the least square technique expanded to correlated measurements. The normal distribution is a basic assumption for RANDOM errors. People can reach it by improving his own measurement experience and technique. It is possible not always but we must try to do it.

Demetris Christopoulos

@Vladimir, I have a question: Why should "The normal distribution is a basic assumption for RANDOM errors"? Why not the Uniform(-r,r) which has also a zero mean value?

Genelyn Ma. Sarte

I think what you read about the least squares estimators of coefficients from a linear regression model combines two results. Under the Gauss-Markov theorem, normality is not required, and the estimator is "best" only within the class of linear unbiased estimators. "Best" being the one with smallest variance. I don't recall any mention of "most probable value".

Adding the assumption of normality will make the least squares estimators the maximum likelihood estimators as well, which is what he probably meant by "the normal distribution is not one of many, but the only probability distribution under which the method of generalized least squares yields the most

probable value". I take it that "most probable value" is the maximum likelihood estimator.

Jochen Wilhelm

Could it be that the confusion comes from different interpretations of the word "estimator"? I mean, "estimator" can specify a point value (e.g. wheer the likelihood is maximal or where the sum of squared deviations are minimal) or it can specify a whole function that also provides some additional information about (e.g. the variance of the point-estimator [curvature, information matrix], ranges of plausible or implausible point-estimates [confdence or likelihood intervals] and so on).

Vladimir Kaftan

Dear Demetris, thank you for the comment. If we define the random error following Gauss, I think my sentences is more correct. But sums of rather many uniform distributed small values like you have mentioned will normally distributed. I am not sure that people can make uniform distributed errors. Let us remember any shooting results. I think that uniform distributed values should not be measurement errors.

I have to mention that all my comments are referenced to measurements and /or observation.

Vladimir Kaftan

Dear Genelyn, please have a look at the paragraph http://link.springer.com/chapter/10.1007%2F978-1-4612-0545-6_8#page-1 . It seems to me that the "most probable value" would be equal to the "maximum likelihood estimator" in the case of the normal distribution.

Genelyn Ma. Sarte

Hi, Vladimir. That's what I thought - that most probable value actually means the maximum likelihood estimator. And this can be derived when a distribution is assumed for the error component. In the case of linear regression, the least squares estimator will also be a maximum likelihood estimator under the assumption of normal errors.

Thanks for the link.

Demetris Christopoulos

Dear Vladimir, the uniform errors are more common in our life than we believe, just think about the small amounts of cents and the truncation procedures that are done every day. It is not necessary to have a bell type error distribution.

Stefan Born

I think that the quoted paragraph refers to "least square estimates" for regression models with continous real probability distributions for the measurement errors. If in this case, we assume that measurements y_i are independent and distributed according to some probability density p(y_i; f(x_i, \theta)), the log-likelihood for the whole sample would be \sum \log p(y_i; f(x_i,\theta)). For a Gaussian distribution of the measurement errors (in y) the MLE estimator of the parameters \theta of the regression model would be indeed the LS-estimate, however e.g. (- a somewhat strange example-) for exponentially distributed measurement errors (in y) this would lead to a completely different estimate, even though the MLE for the parameter of the exponential distribution at a point would still be the average of the corresponding measurements. -- Was this the source of the confusion?

Vladimir Kaftan

Dear Demetris, you are right again especially if you say about roundoff errors. I am not sure that the rounding operation is a measurement. But this error is a small component of a whole measurement error which equal to a sum of many small elementary parts related to different operations of a measurement procedure.

Jochen Wilhelm

Stefan, I am not sure if I understood what ou said in the last part: "for exponentially distributed measurement errors (in y) this would lead to a completely different estimate, even though the MLE for the parameter of the exponential distribution at a point would still be the average of the corresponding measurements. -- Was this the source of the confusion?"

As I understood:

normally distributed errors -> estimates are the LSE == averages

exponentially distributed -> "completely different estimates" (so not LSE?); but -afaik- here the MLE is used

The MLE is again the average

My confusion is that the average *is* the point that minimizes the (residual) variance, so it *is* the LSE. Thus I think that MLE also leads to (simplifies to/is equivalent to) the LSE. But only w.r.t. the point estimate, and not w.r.t. to the standard error and derived interval estimates! This knot is still not cut...

Stefan Born

Jochen maybe I'm wrong (- and I'm not quite statistician -), but I give cutting the knot another try and clarify the second bit of my last post ( and apologize beforehand ...)

1. If you estimate the parameter of a continous probability distribution whose parameter is equal to the expectation value, the average of a sample is an estimator of the expectation value. It minimizes the residual variance of a given sample and is the MLE-estimator for a normal distribution and for some others, e.g. the exponential distribution, hence in this case the LS and MLE estimators are equal, and in this narrow sense the claim of the book would be wrong (but I understood the claim as referring to regression, anyway, s. below).

The LS estimate is not equivalent to the MLE for probability densities like p(x)=1/c e^{-|x-\mu|}, for which the MLE yields the median, which proves to be an estimator of a lower variance (among different sets of samples) than the average.

In view of this I consider it rather as an accident if the MLE for a given probability distribution (whose parameter be the expectation value) is equivalent to the LS.

2. In a regression setting, I still believe that even for the exponential distribution the MLE will not lead to something that is equivalent to the LS estimate. I may have done the calculations wrong: Suppose you have positive x_i and y_i, as usual the x_i without errors and the y_i distributed according to some probability distribution with expectation value a x_i. We want to estimate a. If the probability distribution in question is the exponential distribution (that's why everything is supposed to be positive), this would mean that the probability density for y_i is 1/(ax_i) e^{-t /(ax_i)}dt, the log-likelihood of the data (given a) is \sum_i (-\log(ax_i) -y_i/(ax_i)). Maximisation of this sum yields

\sum_i( -x_i/(ax_i) +y_i/(a^2 x_i)) =0 which is equivalent to a= 1/n \sum (y_i/x_i). In the same setting the LSE is a= (\sum y_i x_i)/(\sum x_i^2). The two estimates are equal only in the case that x_1=...=x_N.

3. I still doubt whether the claim would be right that only for Gaussian distributions of the errors in y_i the MLE estimator for regression parameters is always the LS estimator. Writing down the eqations I didn't see a possible proof, but didn't see a counter-example either.

Krishna Kumar Kottakki

Here, I would like to add some flavor to the above responses. Most of existing estimation algorithms, ex: MLE, do assume that data can be fitted to a Gaussian distribution. This will give Best Linear Unbiased Estimates (BLUE), provided that the transformation is linear. Note that this never take into the effect of possible nonlinear transformations, where the MLE results in an "uncertainty" in the estimated parameter, when compared with the same parameter estimated with previous set of data. Note that user have to define the limits of uncertainty, so that the new parameter set must belong to another source, may be Gaussian or a true non-Gaussian (parent) parametric set. However, the problem can be easier if the user has apriori knowledge on the distribution of parameter(s) (Ex: Gamma density function).

Vladimir Kaftan

It is rather difficult to argue with some competent persons who "MINIMISIS the TOTAL SUM of RESIDUALS". But what about the least squares?

Alex Gilgur

Fausto, you don't need to shout. We can hear you loud and clear.

Now, the reason why sum of squares of residuals was chosen as the criterion of fit for least squares regression is because the residuals was assumed to be normally distributed. For other regressions (e.g., quantile regression - see the 1985 Koenker's paper), other criteria are used (for quantile regression, it is the sum of the absolute values, and the method of minimization is the simplex). The reason for such choice is the non-normality of the residuals.

Demetris Christopoulos

@Jochen look at the attachment: The MLE is not necessarily connected with least squares, it is a more general method that can work even without calculus.

Andrzej Latocha

@ Fausto Galetto "As a matter of fact you can find the “NORMAL EQUATIONS”[!!!!!!!!!!!!!!!!!] WITHOUT assuming the ------NORMAL DISTRIBUTION OF RESIDUALS"

It is true it is sufficient to assume that the distribution is symmetric. But then the mistake LSE is more susceptible to start and end conditions.

Vladimir Kaftan

Dear colleagues of common competence. There is the interesting brief description of the least square technique : http://www.desy.de/~blobel/blobel_leastsq.pdf. The important sentences about free distribution least square properties have done at the page 12. It sounds as: "Valid under the condition that the data are unbiased!" I think that if we will process measurements having not normally distributed errors, we can receive biased estimations having not minimal variances. We will not receive an accurate estimation after a least square adjustment if measurement errors will have large blunders, systematic biases and/or consists of different entire assembles.

Alex Gilgur

ANOVA is based on the F-test, which like (most) all great things coming from Fisher requires normal distribution...

Andrzej Latocha

@ Vladimir, Cannot multiply and divide only theoretically to infinity, without checking example such a theory becomes a divination.

Martin Treiber

Firstly, I like this thread since it is definitely a good idea to question the assumptions behind commonly used methods instead of just hitting the buttons of statistical software (although publishing false statements in UPPERCASE WITH EXCLAMATION MARKS!!! is not very scientific). In the following, I try to summarize the various aspects of least-squared errors (LSE) estimation, particularly its simplest variant, the ordinary least squares (OLS) estimator, and its connection to MLE (maximum-likelihood estimation).

(1) All the following refers to models with a single real-valued endogenous variable (outcome), y, which is the typical application for LSE. Models with discrete endogenous variables such as discrete-choice models can only be estimated using MLE, not LSE/OLS.

(2) Krishna Kumar Kottakki mentioned the BLUE property of the OLS estimator stated by the Gauss-Markow Theorem. It is crucial to fully understand this theorem since it answers nearly all of the discussed questions. In full technical detail, it says:

"If the model fulfills five properties to be specified below, OLS is the best linear unbiased estimator (BLUE)"

The five properties are:

(i) the model is parameter-linear, i.e., it is linear in the model parameter vector beta (the character ' denotes transposition),

y=beta'x

It need *not* to be linear in the exogenous variables (the influencing variables) since the input vector x may be a nonlinear (but parameter free) function of these.

This excludes, e.g., discrete-choice models since a discrete outcome (how often a certain alternative is chosen), by definition, can never be linear in any sense.

(ia) the data used to estimate the model are correctly specified, i.e., more data sets than parameters, and no multicollinearity in the data matrix X of the input .

(ii) it has additive residuals (random contributions, errors), not, for example, multiplicative ones which would be the case when considering typical growth processes in nature or economics (defining the endogenous variable in logarithmic form would help here)

(iii) the errors are unbiased, i.e. their expectation is zero

(iv) the errors are uncorrelated. They need not necessarily be independent (independent variables have a correlation of zero but not vice versa)

(v) the errors are homoscedastic, i.e., all have the same variance.

What is *not* required is that the errors are Gaussian (in spite of the name "Gauss Markow Theorem), or be i.i.d (independent and identically distributed). For example the errors of one data point may be uniformly distributed, that of the next one shifted-exponentially distributed (yes, unsymmetric distributions are allowed!), the third one triangularly distributed, and the fourth one Gaussian, as long as they have no correlations and the same variance. (I confess, this is a constructed and rather academic example but I hope it makes the point).

Try to understand the derivation of the OLS estimator (consult any standard econometric book) and you will see that all these, but no more, assumptions are needed!

Is BLUE the "best" estimator? Not necessarily, for two reasons:

Firstly, BLUE says, "the best *linear* estimator". Linear means, linear in the endogenous data vector y which is obviously true for OLS when looking at the OLS formula for beta, beta(OLS)=(X'X)^{-1}X'y. There may be nonlinear estimators that are better (the best of which is called the "efficient" estimator).

Secondly, "better" means that the *variance* of the estimation uncertainty *of the parameters* is lower. If, for some reason, you want to minimize their MAD (mean absolute deviation), or another error measure, OLS is no longer the "best" among the linear estimators. Notice that the reasoning to use MAD or another more robust uncertainty measure is not necessarily connected with outliers in the *data* since it refers to uncertainties of the *parameters*, not the data: For these, see the next two points.

(3) Outliers violate, by definition, Requirement (v) of the Gauss-Markow theorem and, probably, also Requirement (iii). Therefore, in the presence of outliers, it cannot be proven that OLS is the "best" linear estimator, whatever the measure.

(4) There are "weighted" LSE estimators generalizing OLS that correct for heteroscedastic errors (weighting data points with a low error more than that with a high error). However, unbiased errors are *absolutely* required regardless of which estimator is applied. Otherwise, you will have the situation "junk in - junk out". Fortunately, OLS (and probably weighted LSE, I have to check) automatically takes care of bias in the errors but this is not true if other estimators are used and/or if the assumptions of the Gauss-Markow theorem are violated.

(5) If, and only if, the model is parameter linear and the errors are i.i.d. Gaussian, the maximum-likelihood estimator (MLE) is identical to the OLS estimator. For heteroscedastic uncorrelated Gaussians, MLE is identical to the weighted LSE estimator. And yes, MLE can be applied for *any* continuous or discrete error distribution with *any* multivariate correlations, not just Gaussians.

Vladimir Kaftan

Dear Andrzej, my assumptions are not theories. The practical experience shows that. It is easy to generate a series with defined normal distribution, compute the mean and RMS values, then compare it with theoretical ones. After that add to the modelled series just one large blunder and compare the results again. You will see the difference.

Andrzej Latocha

Dear Vladimir, you're right. But I would be careful putting a clear conclusion whether it is a rule or an exception?

Jochen Wilhelm

Martin, very instructive and understandable answer! Thank you!

One question again to the very last paragraph "(5)":

The MLE (point estimate) is the same as the LSE (point estimate) also for many other distributions. Doesn't this contradict your statement?

Numerical example in R:

# draw some random sample of size 20 from a gamma-distribution

y = rgamma(20,0.3,2)

# LSE as the average

mean(y)

[1] 0.2673607

# LSE as coefficient in a linear model

lm(y~1)$coef

(Intercept)

0.2673607

# MLE from a GLM (default is inverse link function):

1/glm(y~1,family=Gamma)$coef

(Intercept)

0.2673607

# forced indentity link:

glm(y~1,family=Gamma(link="identity"))$coef

(Intercept)

0.2673607

#Searching the MLE "by hand":

#nLL stands for "negative log likelihood"

nLL.gamma = function(param) -sum(dgamma(y,param[1],param[2],log=TRUE))

# getting the minimum of nLL = maximum of LL

MLE.gamma = optim(c(0.3,2),nLL.gamma)$par

# the expected value is shape/rate:

MLE.gamma[1]/MLE.gamma[2]

[1] 0.2674057

#One can use the normal-likelihood instead:

nLL.normal=function(param) -sum(dnorm(y,param[1],param[2],log=TRUE))

> MLE.normal = optim(c(0.3,1),nLL.normal)$par

> MLE.normal

[1] 0.2673980 0.3492421

# first parameter is the mean,

# second is the (biased) MLE of the standard deviation, what is

sd(y)*sqrt(19/20)

[1] 0.3492427

# here a plot of the log-likelihood functions through the maximum (both normalized to 0):

xp = seq(0.1,0.6,len=50);

yp=sapply(xp, function(M) nLL.gamma(c(M,MLE.gamma[2])));

yp=min(yp)-yp;

plot(xp,yp,type="l")

yp=sapply(xp, function(M) nLL.normal(c(M,MLE.normal[2])));

yp=min(yp)-yp

lines(xp,yp,col=2)

Diana Mabel Kelmansky

If the parameter of interest is a population mean, the law of large numbers guarantees the consistency of the sample mean under very general conditions.

Diana Mabel Kelmansky

However beware of outliers. For small sample sizes, even one outlying observation can give a meaningless sample mean. Similar considerations are valid for skewed distributions.

Vladimir Kaftan

Dear Andrzej, what do you mean saying "it is a rule or an exception"? Please define it more exactly if possible.

Martin Treiber

Hi Jochen,

I must confess that the "if and only if" maybe is a too strong statement. However, it seems that your investigations are not directly related to estimating parameters of linear regression models. There, one is mainly interested at the parameters (parameter vector) beta of the deterministic part of the parameter-linear model

y(x)=beta'x + epsilon

while one is less interested in the parameters of the distribution of the errors, epsilon, themselves (one needs them, of course, for estimating the uncertainties of the estimates for beta). Unfortunately, I forgot epsilon and wrote y=beta'x in my previous post, hence maybe the confusion.

As I understand (I do not know R very well), you generated instances of data for the (somewhat trivial) "zero model"

y(x)=0+epsilon,

where epsilon obeys a gamma distribution (notice that there are no exogenous variables x in this model). Since this model does not satisfy the Gauss Markow assumptions (the gamma function is biased, its expectation is nonzero), the OLS, instead, estimates automatically the model with a shifted gamma distribution such that its expectation is zero, and shifts the bias to the intercept (see my first post). Hence, when estimating the model y(x)=beta0+epsilon, OLS systematically estimates a nonzero value for the intercept beta0 which is identical to the MLE estimation of the mean of the gamma function itself. Thus, the agreement of the OLS and MLE estimators simply means that the MLE estimator for the expectation value of the gamma function is unbiased. I doubt, however, that this carries over even to slightly less trivial models such as the simple regression model

y(x)=beta'x+epsilon=beta0*1+beta1*x1+epsilon,

At least, my analytical derivation of the identity of the OLS and MLE estimators for the beta vector for i.i.d Gaussian errors (see attachment which is extracted from my German lecture notes) does not carry over to any other distribution. So, I think, the strong statement "if and only if" is not yet refuted ;-)

Alex Gilgur

Fausto, you said that "The ANOVA of those data DOES NOT use the NORMAL distribution AND DOES NOT REQUIRE the NORMAL distribution to TEST the difference within the products!!!!!!!!!!!!!!!"

Of course. There are nonparametric methods for ANOVA, and there are nonparametric methods for regression. If I understood you behind all the shouting, you are teaching nonparametric methods to your students. I'm very happy for your students, because they will come out of school prepared for the real world where few things are truly normal.

Vladimir Kaftan

The Jochen's question was: "Least-squares only for normal distributed errors...?" In my opinion the ERROR is a key word. Certainly different data sets can have different statistical distribution. And everybody is allowed to pass any data samples through the least square method. But if we wish to receive the most probable value of a measurement (not any other types of data!) it is better to have normally distributed errors of it. In most cases people is able to correct his measurements before and after a statistical treatment. Sorry for persistence.

Alex Gilgur

Diana, the only that can guarantee normal distribution is the Central Limit Theorem. However, in order to be able to apply it, you will have to ensure that the samples are random, that there are a lot of them, and that you are building the regression based on the means of the samples.

Andrzej Latocha

Dear Vladimir "normal distributed errors" -it is an exception, exactly there is no need to correct. This is only a theoretical assumption which is not in reality and should take into account the correction.

Vladimir Kaftan

Dear Andrzej, I am trying to understand the deep meaning of your declaration. If you like to say that a theory and a practice are different concepts, you are right absolutely. But "theoretical assumptions" are checking by a practice permanently. And good theory closely agrees with a practice. The measurement error theory exists and works properly more than two hundred years. It classified errors for three types: random, systematic and gross.

The first of them has to be normally distributed by definition. Yes it is theoretical assumption but if you have a practise of a measurement you will often see that this assumption is correct. Certainly it is not a strong and rigorous law. But it helps to an observer for receiving the more accurate measurement result. If you are the best shooter you can check a distribution of a shooting results and see that it distributes normally but a bad shooter will not see the same. Your opinion is correct in this sense. There are many information about the measurement error theory. That is not bad: http://earth-info.nga.mil/GandG/publications/tr8400_1.pdf Any type of data samples have not follows by normal distribution if you mean exactly that. The data value consists of unknown characteristic of a process + error.

Andrzej Latocha

Dear Vladimir, You are too generalization and walking away from the topic of discussion.

Least-squares without correction is not useable for measurement data when even resemble a normal distribution.

Vladimir Kaftan

Dear Andrzej, I do not understand again what is the "least squares without corrections" and why "it is not usable". If the link what I have send for you "walking away from the topic of the discussion" I do not know what I have to say. The least square technique is the base of many modern data treatment algorithms. People use it successfully for hundreds years. Please do not worry and do not use it if you do not like it. I see that you know something not known for me. Congratulations.

Andrzej Latocha

Precision measurements of acceptable errors and approximations changed recently in relation to what people have accepted for centuries., ..., many hundreds years.

jean-louis Foulley

What you have to assume depends on what you want to do.

For instance, unbiasedness of LS estimators in linear models requires very few assumptions while hypothesis testing on parameters lies on normality assumptions of residuals. Minimum variance in the class of linear extimators requires to use generalized least squares (GLS). If you want to apply LS, just check for homoscedasticity.

So why you don't look at numerical and graphical diagnostics of residuals? Most software have procedures to do it easily.

Jochen Wilhelm

Thanks, Jean-Luis. I know how to do residual diagnostics. But this does not answer my question.

Jochen Wilhelm

Brian, I can't recap your solution for the regression. If we say that

f(y|x,b) is proportional to e^(p*sum(y-x*b)²),

how do you get the ML estimate for b?

Andrzej Latocha

For a normal distribution LSE=MLE from definition and there is nothing to be consider, for perturbed normal distribution LSE errors add up exponentially as the large number of samples (from definition to use LSE) makes the method of divergent.

For perturbed normal distribution best concept who I met the is a "Parzen density estimator" F(x)=integral(from -infinity to x) f(x')dx' based on kernel density estimation http://en.wikipedia.org/wiki/Kernel_density_estimation.

Existence and limitations of solutions to this approach is shown in the publication:

ELLIOTT ROBERT.J.,KRISHNAMURTHY VIKRAM.:

EXACT FINITE-DIMENSIONAL FILTERS FOR MAXIMUM LIKELIHOOD PA- RAMETER ESTIMATION OF CONTINUOUS-TIME LINEAR GAUS-

SIAN SYSTEMS. SIAM J. CONTROL OPTIM. Society for Industrial and Applied Mathematics 1997. November Vol. 35, No. 6, pp. 1908-1923.

In my opinion Parzen density estimator, kernel density estimation makes sense for the distribution in which samples are a function of noise distribution around the unknown curve.

If we have only random samples distribution F (x), MLE = EX expected value: "http://en.wikipedia.org/wiki/Expected_value", for such a case use the concept of kernel density estimation complicating the calculation is unjustified.

For Parzen density estimator and Kernel density estimation I have my doubts whether density function should be shift back http://en.wikipedia.org/wiki/Kernel_density_estimation ?

Density estimator collects and dissipative potential in studies: "Parzen density estimator" and "EXACT FINITE-DIMENSIONAL FILTERS FOR MAXIMUM LIKELIHOOD PA- RAMETER ESTIMATION OF CONTINUOUS-TIME LINEAR GAUS-SIAN SYSTEMS."

You can see the problems with defining the algorithm for constructing such a function.

In my opinion, better to use ready-functional φ(t) = E[eitX] http://en.wikipedia.org/wiki/Kernel_density_estimation which has as properties collects and dissipative potential whose coefficients can be calculated by the LSE ?

Michael Paul Cohen

Historically, the champion of least squares estimation was Laplace (Gauss delayed publication of his results). Poisson supported Laplace. Then Cauchy came along and argued that least squares is special and only suited for normally distributed data. He noted that for data with a Cauchy distribution, least absolute values is better. Where he erred, as Poisson pointed out, is that many random quantities are approximately normal by central limit type arguments.

By the way, it was Poisson who first developed the properties of the Cauchy distribution.

Badges
Science topic

Similar topics
Mathematics
Statistics

More Jochen Wilhelm's questions See All

A-Flag antibody works on WB and for IP but cannot be used for blotting FLAG after IP ?

Hello everyone, I'm quitte at a loss here. I'm using the DYKDDDDK Tag Monoclonal Antibody (FG4R) (Catalog # MA1-91878) from thermofisher and while it works beautifully when I blot against Flag or...

13 March 2024 6,342 5 View

I would like to be added to the Research mailing list again. My email address has changed. The old address is no longer valid. What do I have to do?

10 May 2023 9,253 0 View

Standard curves and qPCR ?

Hello everybody, I'm confused regarding how frequent a standard curve should be done to assess primers efficiency. My last RT-qPCR were 10years ago so bear with me ! It was my understanding that...

22 June 2022 10,023 3 View

Ligation doesn't work ?

Hi everybody, I tried five time to clone gRNA guides and after trying several conditions nothing works. 2 guides sequences are ordered on IDT and carrying to "sticky ends" as to insert them in a...

27 April 2022 1,347 4 View

Best T4 ligase temperature and incubation time ?

Hi everyone, I read a lot about the incubation time and temperature of T4 DNA ligases. I'm on a very short time schedule and I was wondering wether or not a 20m at 4°C followed by a 2-3h at 16°C...

25 April 2022 7,984 3 View

Question about running a 3-way mixed ANOVA on SPSS?

So I have a data set where I have 5 repeated measures of a variable, but also need to check for interactions with gender (categorical), and a continuous variable (let's say minutes of something)....

31 October 2021 7,954 3 View

Is there a (dis-)advantage in dicotomizing Likert data?

Given there is some k-step (k=3 or 5 or 7 etc) Likert data with "absolutely no" on one side, "absolutely yes" on the other side, and "neutral" in the middle. A correct way to analyze such...

07 August 2020 1,343 5 View

How to internally link (or cross-ref) ioslides?

Using RStudio and rMarkdown, I have an ioslides presentation (an html file). On one of the slides I want to have a link (cross-ref) to another slide in the same html. I did not find any useful...

28 May 2020 3,008 0 View

Experiences with Optimization of discontinuous functions and smoothing before running a optimization algorithm?

For a industrial application (layout-planning) I am currently trying to globally optimize a discontinuous function f. The objective function is defined on some bounded parameter space in R^N where...

26 February 2020 6,679 5 View

Is it correct to call lo(w)ess "non-parametric"?

It's a weighted least-squares polynomial regression, so it's based on assuming normal errors, and the normal probability model is parametric. However, in some statistics book and online statistics...

28 October 2018 3,219 6 View

How to learn more about SPSS and its Application?

I would like to learn more about SPSS and Its application especially in regards to data analysis. Please suggest me how I can learn more about it. Thank you so much.

11 August 2024 9,101 4 View

Can I base on reverse DNA sequences to perform alignment, convert to amino acids and GenBank submission?

I have reverse sequences (AB1 format), can I base on reverse DNA sequences to perform nucleotide alignment, convert nucleotides to amino acids and deposit the sequence in GenBank database?

11 August 2024 5,138 1 View

Baseline drift in HPLC? What causes this?

Hello, Why do i see this baseline drift when i compare my blank (black) to the sample (blue)? Any suggestions as to why this happened? Thank you!

11 August 2024 3,770 4 View

Text-Communication from the M1 Hand Area using BCI—and then there is Elon Musk?

Willett, Shenoy et al. (2021) have developed a brain computer interface (BCI) that used neural signal collected from the hand area of the motor cortex (area M1) of a paralyzed patient. The...

10 August 2024 7,180 0 View

Has anyone applied Python in the field of textile engineering for data analysis, automation, or smart textiles?

I'm currently exploring the application of Python in textile engineering, specifically in areas like data analysis, process automation, and the development of smart textiles. I'm interested in...

10 August 2024 7,429 2 View

How can I use the cif data obtained from rietveld refinement extracted via gsas2, for microstructural analysis using ETEX software?

09 August 2024 7,718 0 View

How are iso-frequency contours plotted?

Let's say we have a standard, regular hexagonal honeycomb with a 3-arm primitive unit cell (something like the figure attached; the figure is only representative and not drawn to scale). The...

07 August 2024 1,937 1 View

How to prepare the nanoparticle treated fungal sample for Environmental SEM analysis?

A fungal strain was treated with nanoparticles. We want to do an environmental SEM analysis. So could anyone share your views on preparing the sample? Thank you.

07 August 2024 5,307 1 View

How to normalize and take the significance of the MTT OD values with 3 replicates for the same cell-line?

Hi, I have a question about normalizing the MTT OD values for doing the statistical analysis. So, if we have 3 different plates and we call them 3 different replicates, so, first we would...

07 August 2024 8,106 4 View

Why does my protein refolded to beta sheet during thermal denaturation analysis?

Hi! So i attempted to understand a novel protein behavior towards heat application by analyzing its secondary structure change. I subjected the protein to a thermal denaturation analysis using...

06 August 2024 1,989 3 View