If there is evidence of heteroskedasticity from plots, should you plot the squared residuals against the explanatory variables to understand the functional form of the relationship and choose the most appropriate test?
Yes, that is possible. Based on the fitted linear regression model, plotting the squared residuals against the explanatory variables is a reasonable way to verify the regression conditions and to see the functional form of any variance pattern.
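For instance, a minimal R sketch (the data frame mydata and the variables x and y are hypothetical placeholders for your own data):

    # Fit a simple linear model; y and x stand in for your own variables.
    fit <- lm(y ~ x, data = mydata)

    # Plot squared residuals against the explanatory variable to see
    # the functional form of any variance pattern.
    plot(mydata$x, resid(fit)^2, xlab = "x", ylab = "Squared residuals")

    # A lowess smooth helps reveal the trend.
    lines(lowess(mydata$x, resid(fit)^2), col = "red")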
How to detect heteroscedasticity and rectify it?
One of the important assumptions of linear regression is that there should be no heteroscedasticity of residuals. In simpler terms, this means that the variance of the residuals should not increase with the fitted values of the response variable. In this post, I am going to explain why it is important to check for heteroscedasticity, how to detect it in your model, and, if it is present, how to rectify the problem, with example R code. This process is sometimes referred to as residual analysis.
Why is it important to check for heteroscedasticity?
It is customary to check for heteroscedasticity of residuals once you build the linear regression model. The reason is that we want to check whether the model thus built is failing to explain some pattern in the response variable Y that eventually shows up in the residuals. If it is, the result is an inefficient and unstable regression model that could yield bizarre predictions later on.
How to detect heteroscedasticity?
I am going to illustrate this with an actual regression model based on the cars dataset, which comes built-in with R. Let's first build the model using the lm() function.
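A minimal sketch of such a model, assuming we regress stopping distance (dist) on speed as is conventional for this dataset; the formal test at the end assumes the lmtest package is installed:

    # Build the linear model: stopping distance as a function of speed
    lmMod <- lm(dist ~ speed, data = cars)

    # Base-R diagnostic plots; the scale-location plot is the usual
    # visual check for non-constant residual variance.
    par(mfrow = c(2, 2))
    plot(lmMod)

    # A formal check: the Breusch-Pagan test from the lmtest package.
    # A small p-value suggests heteroscedasticity is present.
    library(lmtest)
    bptest(lmMod)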
I think you have the right idea to consider residual analysis graphics. However, I would normally keep the error structure intact.
With regard to the various regressors you mention, based on my experience and that of others, as noted in Särndal, C.E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, you can use any regressor or combination of regressors as a size measure in the regression weights, but the best one may be the same combination and format of regressors that predicts y. (This is for prediction more than explanation.) In practice, at the US Energy Information Administration, where I led a group of statisticians developing electric power survey data applications, we used a preliminary prediction of y for the size measure, as part of the official data estimation procedure.
Econometrics texts and others often treat heteroscedasticity as an anomaly that must be removed in order to do hypothesis tests. However, it is often a natural part of the error structure, and hypothesis testing in practice is at best not very clearly interpretable and at worst may be misleading. Standard errors are often all that is needed. The estimated variance of the prediction error is designed to estimate variance, but it is also impacted by bias due to the way sigma is estimated, and it can be very useful. Also, test data can be useful. You could research the terms "model selection" and "model validation."
There are times that heteroscedasticity can be a symptom of a problem. If data that should be modeled by two separate models are mixed, then the (basically compromise) regression will show heteroscedasticity.
[By the way, caution: not knowing the nature of your data, etc., some things here - above and below - may or may not be of various levels of appropriateness for your application.]
Heteroscedasticity may show up in time series applications, but especially in finite population sampling with regression through the origin, you should expect substantial, naturally occurring heteroscedasticity. After all, for example, does it make sense to expect 1,000,000 +/- 100, 1,000 +/- 100, and 5 +/- 100 in many applications?
There are various ways to consider heteroscedasticity, but I think it most straightforward, especially for prediction, to consider a regression weight based on a size measure and a coefficient of heteroscedasticity.
Attached are some files on understanding heteroscedasticity and how to consider it in applications, mostly from the point of view of establishment survey applications, where there is one regressor for a finite population with regression through the origin. However, consider this: heteroscedasticity is on the predicted (i.e., dependent, y) variable, which (using G.S. Maddala's notation of * for a WLS estimate/prediction) is y*. So multiple regression can be written as y = y* + e = y* + (e_0)y*^gamma, where e_0i is the estimated random factor of the ith estimated residual, and gamma is the coefficient of heteroscedasticity. Using that, you can apply analyses that apply to y = bx + (e_0)x^gamma when assessing heteroscedasticity and its impact on predictions.
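As a rough R illustration of that idea (the data frame mydata, the regressor x, and the value of gamma are hypothetical assumptions here, not a prescription):

    # WLS through the origin with a size measure z and a coefficient of
    # heteroscedasticity gamma: the residual standard deviation is taken
    # to grow as z^gamma, so the regression weight is 1/z^(2*gamma).
    gamma <- 0.5                  # assumed working value for gamma
    z     <- mydata$x             # here the regressor itself is the size measure
    wlsMod <- lm(y ~ x - 1, data = mydata, weights = 1 / z^(2 * gamma))

    # The preliminary-prediction variant: use fitted values from an
    # unweighted fit through the origin as the size measure instead.
    olsMod  <- lm(y ~ x - 1, data = mydata)
    wlsMod2 <- lm(y ~ x - 1, data = mydata,
                  weights = 1 / fitted(olsMod)^(2 * gamma))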
Cheers - Jim
PS - Note that sometimes you may be looking for an outlier, and David's document is a way of addressing that, though in a simple example you may have found that quickly from graphical residual analyses.
Regarding the answers above, are they suitable to apply to a dataset that has more than one Gaussian component? If so, how do you check heteroscedasticity and homoscedasticity for each component? For instance, the Old Faithful Geyser data has two Gaussian components.
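One crude way to look at this in R, using the built-in faithful data and a simple threshold on eruption duration to assign observations to the two components (a proper mixture model would estimate this assignment instead):

    # Crude two-component split of the Old Faithful data: short vs. long eruptions.
    data(faithful)
    grp <- ifelse(faithful$eruptions < 3, "short", "long")

    # Fit a separate regression within each component and inspect its residuals.
    par(mfrow = c(1, 2))
    for (g in c("short", "long")) {
      sub <- faithful[grp == g, ]
      fit <- lm(waiting ~ eruptions, data = sub)
      plot(fitted(fit), resid(fit)^2, main = paste("Component:", g),
           xlab = "Fitted values", ylab = "Squared residuals")
    }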
Thank you for your answers. I invite you all to read the article "An Empirical Model for River Ecological Management with Uncertainty Evaluation". I used an econometric analysis to develop the model described in the article.