In statistical inference our aim is to study a population characteristic, that is, a parameter. But if it is said that 'T' is an estimator of the function 'Ѱ(ϴ)' (which is the quantity of interest), what does that mean?
You have ϴ, a parameter of the population, and Ѱ(.), a function of that parameter; then T is an estimator of Ѱ(ϴ) computed from a random sample. If Ѱ(.) is the identity function, then Ѱ(ϴ)=ϴ and T is an estimator of ϴ.
For example:
Let Y1, Y2, ..., Yn be n independent Bernoulli(P) random variables, each taking the value 0 or 1.
Let X = Y1 + Y2 + ... + Yn be the total of the random sample; then it can be shown that X has a Binomial distribution with ϴ = (n, P), where n is the number of independent Bernoulli trials and P is the probability of success of each trial (that is, Pr(Yi = 1) = P).
In this example ϴ is a vector of parameters. Let µ be the mean of X, so that E(X) = µ. Then µ is a parametric function of ϴ = (n, P), because in the Binomial distribution E(X) = µ = nP.
If we want to estimate µ, we can use maximum likelihood: the sample mean of Y1, Y2, ..., Yn is the MLE of P, and by the invariance property T = n × (sample mean) is the MLE of Ѱ(ϴ) = µ = nP.
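To make this concrete, here is a small Python sketch of the example above; the particular values of n and P below are illustrative choices, not part of the original example:

    import numpy as np

    rng = np.random.default_rng(0)
    n, P = 50, 0.3                    # illustrative true values of the parameter vector (n, P)
    Y = rng.binomial(1, P, size=n)    # Y1, ..., Yn ~ Bernoulli(P)

    P_hat = Y.mean()                  # sample mean = MLE of P
    T = n * P_hat                     # by MLE invariance, estimator of mu = n*P (equals X = sum of the Yi)

    print("estimate of P :", P_hat)
    print("estimate of mu:", T, "  true mu:", n * P)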
To answer roughly, T is a random quantity that is based on the data and estimates the parameter of interest. It can be a mathematical expression available in closed form, or it can be the solution of an estimating equation. The sampling distribution of T is of major interest in statistics. It is desirable to have the parameter as the "center" of the sampling distribution of T; the estimator T is then said to be unbiased [the expected value of T equals the parameter]. Many estimators for the same parameter are possible. If all of them are unbiased, you want to choose the one with the smallest variance; more generally, you may prefer the one with the smallest mean squared error, which trades off a small bias against a large variance. If you are a statistics major, you may want to read one or more of the following books written by
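As a rough illustration of these ideas (not tied to any particular textbook), the following Python sketch simulates the sampling distributions of two estimators of the same normal mean, the sample mean and the sample median, and compares their bias and variance; all numbers are invented for illustration:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n, reps = 5.0, 2.0, 25, 10_000

    # Simulate the sampling distributions of two estimators of mu
    means   = np.empty(reps)
    medians = np.empty(reps)
    for r in range(reps):
        x = rng.normal(mu, sigma, size=n)
        means[r]   = x.mean()       # sample mean
        medians[r] = np.median(x)   # sample median

    # Both are (essentially) unbiased for a symmetric population, but the
    # sample mean has the smaller variance here, so it would be preferred.
    print("bias (mean, median):    ", means.mean() - mu, medians.mean() - mu)
    print("variance (mean, median):", means.var(), medians.var())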
The definitions of parameter, parametric, and parameterize vary among disciplines. It is easy to confuse the meaning if one has learned the definition in one sense and the current discussion uses it in a different sense.
In data analysis, we have two entities, the data and the distribution chosen to represent the data. The data are real, actual measurements (or simulated measurements). The distribution is the best fit or best representation of the data. We use a distribution for ease of calculation. A distribution is said to parameterize the data. The descriptors of the distribution are the parameters of the distribution. The mean, standard deviation, etc. are parameters of the distribution. The data are real, the parameters are not.
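A minimal sketch of this distinction, assuming normally distributed measurements (the data below are simulated purely for illustration):

    import numpy as np

    rng = np.random.default_rng(2)
    data = rng.normal(loc=170.0, scale=8.0, size=200)   # the "real" measurements

    # Choose a normal distribution to represent (parameterize) the data.
    # Its descriptors -- the parameters -- are estimated from the data:
    mu_hat    = data.mean()        # parameter: mean of the fitted distribution
    sigma_hat = data.std(ddof=1)   # parameter: standard deviation of the fitted distribution

    print("fitted normal distribution: mean =", mu_hat, " sd =", sigma_hat)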
A parameter is a computed value (such as an arithmetic mean) taken from the population and descriptive of the population. A statistic is a computed value (such as an arithmetic mean) taken from the sample and descriptive of the sample. When cost, danger, infeasibility, or other reasons make it impossible for the researcher to get data from the whole population, he can resort to sample data to estimate the parameter and thereby describe the whole population. This procedure of using sample data to draw conclusions about the whole population is called statistical inference.
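A small sketch of the parameter/statistic distinction, using a simulated finite population (the population and sample sizes are artificial, chosen only for illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    population = rng.exponential(scale=40.0, size=1_000_000)   # whole population (usually unobservable)
    parameter  = population.mean()                             # population arithmetic mean

    sample    = rng.choice(population, size=200, replace=False)   # affordable random sample
    statistic = sample.mean()                                     # sample arithmetic mean

    print("parameter (population mean):", parameter)
    print("statistic (sample mean):   ", statistic)   # used to infer the parameter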
In a problem where you have a batch of data Xn and you assume that they follow some probability law, you have to estimate this probability law. This raises two issues: what kind of probability law (Gaussian, binomial, Laplace, ...)? And which specific law inside that family? Each law in these known families is defined by parameters, so once a family has been chosen (that is, a specification for the probability law), the problem comes down to finding a relation that gives these parameters as a function of the available data; such a function is a statistic.
By definition the parameters are linked to the main characteristics of the distribution of random variables following the probability law, such as the mean, the variance, and so on. So the methods used to estimate the parameters are closely linked with the computation of estimates of these quantities (which are themselves random variables).
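For instance, if an exponential family is chosen as the specification, its single rate parameter is tied to the mean of the distribution (mean = 1/rate), so estimating the mean directly gives an estimate of the parameter; a small sketch with invented data:

    import numpy as np

    rng = np.random.default_rng(4)
    Xn = rng.exponential(scale=2.0, size=500)   # the batch of data; chosen family: exponential

    # The rate parameter is linked to the mean of the distribution (mean = 1/rate),
    # so an estimate of the mean yields an estimate of the parameter.
    mean_hat = Xn.mean()
    rate_hat = 1.0 / mean_hat

    print("estimated mean:", mean_hat, " estimated rate:", rate_hat)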
An important part of the job is thus to choose a specification. There are many theorems which suggest a specification under certain conditions (e.g., for the sum of independent random variables). In some cases one can estimate the probability law without an a priori specification (this is a non-parametric analysis); many such methods are based on the range of the data. They provide elegant and simple solutions.
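As one deliberately simple example of such a non-parametric route (not the only one), the empirical distribution function estimates the probability law directly from the data, without choosing a family first; the data below are simulated only for illustration:

    import numpy as np

    rng = np.random.default_rng(5)
    Xn = rng.gamma(shape=2.0, scale=1.0, size=300)   # data whose family we pretend not to know

    # Empirical distribution function: F_hat(t) = proportion of observations <= t
    def ecdf(data, t):
        return np.mean(np.asarray(data) <= t)

    for t in (0.5, 1.0, 2.0, 4.0):
        print("F_hat(", t, ") =", round(ecdf(Xn, t), 3))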