To estimate the parameters of a regression line (B0, B1), for example, do we minimize the sum of squared residuals or the sum of squared errors, and why?

Delshad -

All we have to work with is a sample. See bottom of page 1 and mostly bottom of page 2 in the paper at the following URL for the method of derivation:

https://www.researchgate.net/publication/263036348_Properties_of_Weighted_Least_Squares_Regression_for_Cutoff_Sampling_in_Establishment_Surveys

and similarly in

https://www.researchgate.net/publication/261534907_WEIGHTED_MULTIPLE_REGRESSION_ESTIMATION_FOR_SURVEY_MODEL_SAMPLING

By the way, e is the "estimated residual."

To assess accuracy, the estimated variance of the prediction error is helpful. But to account for bias that arises because the model was fit to just a sample, not the population, you need something like cross validation. That helps to indicate the difference between a model for the sample and the unknown model for the population.
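As a rough illustration of that point, here is a minimal sketch comparing the in-sample mean squared residual with a cross-validated mean squared prediction error for a simple linear regression. The data are simulated, and the function names (`fit_line`, `in_sample_mse`, `kfold_mse`) and all numeric settings are assumptions made for this example only; they do not come from the papers linked above.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1*x + e."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def in_sample_mse(x, y):
    """Mean squared estimated residual, using the same data for fit and check."""
    b0, b1 = fit_line(x, y)
    e = y - (b0 + b1 * x)
    return float(np.mean(e ** 2))

def kfold_mse(x, y, k, seed=0):
    """Mean squared prediction error over k held-out folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    sq_errors = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        b0, b1 = fit_line(x[train_idx], y[train_idx])
        pred = b0 + b1 * x[test_idx]
        sq_errors.extend((y[test_idx] - pred) ** 2)
    return float(np.mean(sq_errors))

# Hypothetical simulated sample (assumed values, for illustration only).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=40)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=40)

mse_fit = in_sample_mse(x, y)
mse_cv = kfold_mse(x, y, k=5)
mse_loo = kfold_mse(x, y, k=len(x))  # leave-one-out is k-fold with k = n
print("in-sample:", mse_fit, "5-fold:", mse_cv, "leave-one-out:", mse_loo)
```

For least squares, the leave-one-out figure can never be smaller than the in-sample one, which is one way to see that the sample fit tends to look better than it would on data it has not seen.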

Galit Shmueli has an interesting reference to Hastie et al., at the top of page 6 in the following, which might interest you:

https://www.researchgate.net/publication/48178170_To_Explain_or_to_Predict

Cheers - Jim

James R Knaub

Delshad -

I made some notes to sort out your question, and review this myself.

Consider the following:

1) There is a "true" but unknown model for a given situation.

2) There is then a model format, or possible models, postulated, based on a sample and on knowledge of the subject matter. Coefficients are labeled with subscripted "beta" characters. (The famous quote by George Box, that models are 'wrong' though some are 'useful,' rests on the fact that it matters how closely the model format you pick approaches reality.) I think it is customary to use epsilon here instead of e.

3) There are estimates for these coefficients, and estimated residuals, e, for a given model, based on the sample, using least squares or some other technique. The estimated coefficients are written as the betas, each with a hat or a star (Maddala's notation for WLS regression, of which OLS is a special and often overused case), or with the letter "b," each with a subscript.

4) It sounds like you would use "u" in place of "epsilon" if you had an entire finite population from which to calculate betas. However, those betas are still for a postulated model, for a given finite population. Also, a superpopulation might be regarded as generating the finite population as one possible result of a mechanism, so in that frame of thinking you still don't have "true" betas for your postulated model, which is not "true" either.

5) Amazingly, many models are very useful, and many of those are virtually "true."

6) Note that you will often see "u" or "e" used interchangeably from one model to another, or in the same model representing different levels, so notation can be confusing.
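To make the notation in these notes concrete, here is a minimal sketch of the estimates b and the estimated residuals e from a sample, with OLS obtained as the equal-weights special case of WLS. The simulated data, the function name `wls`, and the choice of weights 1/x are assumptions for illustration, not anything prescribed above.

```python
import numpy as np

def wls(x, y, w):
    """Weighted least squares for y = b0 + b1*x + e.

    Solves (X'WX) b = X'Wy with W = diag(w); returns the estimates b
    and the estimated residuals e = y - Xb.
    """
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w  # scales each observation (column of X.T) by its weight
    b = np.linalg.solve(XtW @ X, XtW @ y)
    e = y - X @ b
    return b, e

# Hypothetical sample whose error variance grows with x (assumed for illustration).
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=50) * np.sqrt(x)

b_ols, e_ols = wls(x, y, np.ones_like(x))  # OLS: all weights equal to 1
b_wls, e_wls = wls(x, y, 1.0 / x)          # WLS: weights assumed to be 1/x

print("OLS estimates:", b_ols)
print("WLS estimates:", b_wls)
```

Everything computed here is an estimate from the sample: the b values stand in for the unknown betas, and the e values for the unobservable epsilons.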

Cheers - Jim

Delshad Shaker Ismael Botani

Dear Jim

Of course, in real life we can't collect data for the whole population, so we have to work with samples. Concerning epsilon, yes, you are right, and I have attached what I meant. So please have a look at my attachment and let me know your thoughts. Thanks a lot.

James R Knaub

You can't know betas, only estimates of betas, so the sample approach is the way to go. - Cheers.

PS - At least, that's the way I see it. And in survey sampling, dependence upon a regression model is the basis of the conditionality principle. That is, we are dependent upon the sample we have. That makes cross validation a good idea, to check for bias from a misspecified model, as well as for sampling variance and nonsampling errors of all kinds. - So, I'd say "linear regression for samples," which is your second Q derivative idea.

Delshad Shaker Ismael Botani

Dear Professor James,

I totally agree with you, and of course we can't know the betas or have formulas for the betas. But some statisticians say that when we take the partial derivatives of Q with respect to the betas and set them equal to zero, for example $\partial Q/\partial\beta_0 = -2\sum (y_i - \beta_0 - \beta_1 x_i) = 0$, we obtain beta hats that are estimates of the betas. Please look at my previous attachment (first choice). Both choices lead to the same formulas for beta hat, but the starting points are different. Cheers.
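For reference, the sample-based version of the partial-derivative approach described above can be sketched as follows. This is the standard textbook form, not a copy of the attachment: the sums run over the n sampled observations, the hats mark the values that solve the two equations, and \bar{x} and \bar{y} (the sample means) are notation assumed here.

```latex
\begin{align*}
Q(\beta_0, \beta_1) &= \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \\
\frac{\partial Q}{\partial \beta_0} &= -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 \\
\frac{\partial Q}{\partial \beta_1} &= -2 \sum_{i=1}^{n} x_i \, (y_i - \beta_0 - \beta_1 x_i) = 0 \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\end{align*}
```

Replacing n with N gives the same algebra for a full finite population, which is why both starting points lead to formulas of the same form.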

James R Knaub

I did look at your attachment before answering, and that was the best I could make of it.

Also, note that n came up in both derivations, not N the first time and n the next. That seems to indicate the second derivation is the way to go. I have to say that I am not very good with notation, but I still think you want the "sample" approach notation here.

Delshad Shaker Ismael Botani

This is my mistake; it was a copy-and-paste error. Yes, the first choice must be N.

I am talking about the theoretical proof of how to find B-hat.

Could you please look at the following book (appendix 3 page 273):

http://www.docente.unicas.it/useruploads/001223/files/wesiberg_-_applied.linear.regression.(2005),.3ed.lotb.pdf

The author of the book found the value of beta hat starting from the first choice in my attachment (the population formula).

Cheers

Subrata Chakraborty

Error Sums of Squares

James R Knaub

Isn't the whole book on the internet a copyright violation?

Anyway, one cannot sum over all N cases if you do not have all N observations. You can say that this is what would happen, but we approximate using a sample. Perhaps the author meant just that: "This is what would happen. Now we approximate with a sample."
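A small simulation may help illustrate the distinction. In this sketch a hypothetical finite population of N units is generated, so that the "sum over all N" fit can be computed and compared with the fit from a sample of n units. All names and numbers (`fit_line`, N = 5000, n = 50, the coefficients 3.0 and 1.5) are assumptions for illustration; in practice only the sample fit is available.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1*x + e."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# Hypothetical finite population (assumed values, for illustration only).
rng = np.random.default_rng(7)
N, n = 5000, 50
x_pop = rng.uniform(0, 10, size=N)
y_pop = 3.0 + 1.5 * x_pop + rng.normal(0, 2, size=N)

# "What would happen": least squares summed over all N cases.
B_census = fit_line(x_pop, y_pop)

# What we can actually do: least squares summed over the n sampled cases.
sample = rng.choice(N, size=n, replace=False)
b_sample = fit_line(x_pop[sample], y_pop[sample])

print("fit over all N:", B_census)
print("fit over sample n:", b_sample)
```

Even the fit over all N is only the fit of a postulated straight-line model to that particular finite population, and the sample fit is an approximation of it.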

(But as I noted earlier, by the way, even if you could use the entire population to determine betas, you still have a model, and a model won't be 100 percent correct.)

Sorry, but I'm not going to look at that book unless I know it was posted with the author's and publisher's permissions.
