First, it is important to note that the linear form is optimal if the parameter to be estimated is also Gaussian, in addition to the noise. This result follows directly from the formulation of the general MMSE estimator. In its simplest form, consider the scalar model
y = hx+n
where x is the normally distributed parameter to be estimated, h is a real constant, and n is the noise, also normally distributed. The following probability density functions can then be defined:
p(x) = 1/sqrt(2 pi) exp(-x^2/2)
p(y|x) = 1/sqrt(2 pi) exp(-(y-hx)^2/2)
assuming mean 0 and variance 1 for x and n.
The MMSE estimator is the conditional mean E{x|y}, that is, the mean of the posterior density p(x|y), which by Bayes' rule takes the form
p(x|y) = p(y|x)p(x)/p(y)
so that
E{x|y} = int x p(x|y)dx = int x p(y|x)p(x) dx / p(y).
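Substituting p(x) and p(y|x) in this expression and completing the square in x, the exponent becomes -(y-hx)^2/2 - x^2/2 = -(h^2+1)(x - hy/(h^2+1))^2/2 - y^2/(2(h^2+1)). The factor that depends only on y cancels against p(y), so the integrand takes the form of a Gaussian density in x: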
E{x|y} = 1/sqrt(2 pi sigma^2) int x exp(-(x-mu)^2/(2 sigma^2)) dx = mu
with mu = hy/(h^2+1), which is the form of the LMMSE estimator for this model, and sigma^2 = 1/(h^2+1), which is its MSE.
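As a quick sanity check of the result above, here is a minimal sketch that evaluates E{x|y} = int x p(y|x)p(x) dx / p(y) numerically on a grid and compares it with hy/(h^2+1). The function names, the grid settings and the values h = 0.8, y = 1.5 are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def gaussian_pdf(z):
    """Standard normal density (mean 0, variance 1)."""
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

def posterior_mean_numeric(y, h, half_width=12.0, num=200001):
    """Approximate E{x|y} with a Riemann sum on a uniform grid (assumed settings)."""
    x = np.linspace(-half_width, half_width, num)
    dx = x[1] - x[0]
    joint = gaussian_pdf(y - h * x) * gaussian_pdf(x)  # p(y|x) p(x)
    p_y = np.sum(joint) * dx                           # p(y) = int p(y|x) p(x) dx
    return np.sum(x * joint) * dx / p_y

def posterior_mean_closed_form(y, h):
    """mu = h y / (h^2 + 1) from the derivation."""
    return h * y / (h**2 + 1)

h, y = 0.8, 1.5  # hypothetical example values
numeric = posterior_mean_numeric(y, h)
closed = posterior_mean_closed_form(y, h)
print(numeric, closed)
```

The two printed values should agree to many decimal places, which is consistent with the conditional mean being linear in y for this Gaussian model.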
Then MMSE = E{(y-y')^2} = 0, where y' is the estimate of y.
Thus, when the correlation coefficient of (x, y) has magnitude one, the random variables x and y are linearly related to each other. The closer the magnitude of the correlation coefficient is to one, the smaller the MSE. The correlation coefficient therefore provides a measure of the linear predictability between random variables.
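The relation between the correlation coefficient and the MSE can be illustrated with a small simulation. This is only a sketch under assumed settings: y is generated as 2x plus noise, the noise levels are arbitrary, and the best linear predictor of y from x is obtained by a least-squares fit. For such a fit, the residual MSE equals var(y)(1 - rho^2), where rho is the sample correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def linear_mse_and_rho(noise_std, num_samples=200_000):
    """Return (rho, mse, var_y) for a least-squares linear predictor of y from x."""
    x = rng.standard_normal(num_samples)
    # Assumed toy model: slope 2.0 and additive Gaussian noise.
    y = 2.0 * x + noise_std * rng.standard_normal(num_samples)
    rho = np.corrcoef(x, y)[0, 1]
    slope, intercept = np.polyfit(x, y, 1)       # best linear fit y' = slope*x + intercept
    mse = np.mean((y - (slope * x + intercept)) ** 2)
    return rho, mse, np.var(y)

results = [linear_mse_and_rho(s) for s in (0.0, 0.5, 2.0)]
for rho, mse, var_y in results:
    print(f"rho={rho:.4f}  mse={mse:.4f}  var(y)(1-rho^2)={var_y * (1 - rho**2):.4f}")
```

With zero noise the correlation coefficient is one and the MSE vanishes; as the noise grows, the magnitude of rho drops and the MSE increases accordingly.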
For your MMSE estimation, an FIR filter can be used. FIR filters can be designed to satisfy the linear phase condition h(n) = h(N-1-n). An FIR filter is realized as a linear combination of the current and past inputs: the output is obtained by convolution, y(n) = sum over k of h(k)u(n-k). So the estimate is certainly linear in form. Please refer to the following books