Hello everyone, I am trying to regress a numeric variable on predictors that are all categorical. Which method would be most suitable: ANOVA or linear regression?

04 May 2020

I tried running a linear regression, but I suspect the model has heteroskedasticity.

Rubal -

I have not done that kind of regression before. However, it sounds like it would fit the general format y = y* + e, where y* is predicted y. I have a way to find regression weights for that case. First, obtain preliminary predicted y values, say from OLS, and use them in place of the weighted least squares (WLS) predicted y values as the size variable; the two will be close enough for the purpose at hand. Then use that size variable to estimate the regression weights, and run a WLS regression with them. It isn't as complicated as that might sound: you just need one additional model run with the weights, as long as your software can do that. To know what to do, see https://www.researchgate.net/publication/333642828_Estimating_the_Coefficient_of_Heteroscedasticity. You can use https://www.researchgate.net/publication/333659087_Tool_for_estimating_coefficient_of_heteroscedasticityxlsx to do this with your data.
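The two-step idea above (OLS first, then WLS with weights built from the OLS predicted y) can be sketched in Python with NumPy. This is only an illustration under assumptions that are not in the post: the weights are taken as w = (predicted y)^(-2*gamma), gamma = 0.5 is an arbitrary placeholder for the coefficient of heteroscedasticity (the linked paper and spreadsheet are about how to estimate it), and the data and the helper names `dummy_design`, `wls_fit`, and `two_step_wls` are made up for the example.

```python
import numpy as np

def dummy_design(factors):
    """Intercept plus treatment-coded dummies (first level is the reference)."""
    n = len(factors[0])
    cols = [np.ones(n)]
    for f in factors:
        f = np.asarray(f)
        for level in np.unique(f)[1:]:
            cols.append((f == level).astype(float))
    return np.column_stack(cols)

def wls_fit(X, y, w):
    """Weighted least squares via rescaling rows by sqrt(weight)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def two_step_wls(X, y, gamma=0.5):
    """OLS first; then WLS with weights yhat**(-2*gamma) (assumed form)."""
    beta_ols = wls_fit(X, y, np.ones(len(y)))
    yhat = X @ beta_ols  # preliminary predicted y, used as the size measure
    if np.any(yhat <= 0):
        raise ValueError("size measure (predicted y) must be positive")
    w = yhat ** (-2.0 * gamma)
    beta_wls = wls_fit(X, y, w)
    return beta_ols, beta_wls, w

# Made-up data: two categorical predictors, error spread growing with the mean.
rng = np.random.default_rng(0)
x1 = rng.integers(0, 3, size=200)
x2 = rng.integers(0, 2, size=200)
mean = 10 + 5 * x1 + 3 * x2
y = mean + rng.normal(0.0, 0.2 * mean)

X = dummy_design([x1, x2])
beta_ols, beta_wls, w = two_step_wls(X, y, gamma=0.5)
print("OLS coefficients:", beta_ols)
print("WLS coefficients:", beta_wls)
```

With gamma = 0 the weights are all 1 and the second step reproduces OLS, which is a quick sanity check on the setup.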

..............................

Below is something I have sent to people on this topic:

If you might be interested, the following is on the fundamental nature and magnitude of heteroscedasticity, for regressions of form y = y* + e, most useful in predictions for finite populations:

https://www.researchgate.net/project/OLS-Regression-Should-Not-Be-a-Default-for-WLS-Regression

Please see the project updates, and also particularly note

https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity

https://www.researchgate.net/publication/324706010_Nonessential_Heteroscedasticity

and

https://www.researchgate.net/publication/333642828_Estimating_the_Coefficient_of_Heteroscedasticity

with

https://www.researchgate.net/publication/333659087_Tool_for_estimating_coefficient_of_heteroscedasticityxlsx

["Heteroscedasticity for estimated residuals in regression is not a bug, it's a feature."]

..............................

As I noted, I have not worked with categorical data, but I can see no reason why this would not work here. The paper "Essential Heteroscedasticity," which explains the reasoning, is easiest to understand when thinking about continuous data, but I don't see why this should not function for your application.

Count data can be looked at differently, though, and I'm not very familiar with that either.

I just know that I've worked with continuous data, but that my spreadsheet above would seem to work for any case where y = y* + e. If that fits your situation, you might want to use this. The paper "Estimating the Coefficient of Heteroscedasticity" gives some examples.

Best wishes - Jim

Daniel Wright

One way is simply adding terms to allow the variances to differ for the groups.
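One possible reading of "allowing the variances to differ for the groups" is a feasible weighted fit: estimate a separate residual variance per group from an OLS fit, then refit with weights 1/variance. The sketch below is an assumption about what is meant, not the poster's stated method; the data, the choice of `x1` as the grouping variable, and the function name `groupwise_wls` are hypothetical.

```python
import numpy as np

def groupwise_wls(X, y, groups):
    """Refit with weights 1 / (residual variance of each observation's group)."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    groups = np.asarray(groups)
    # Per-group residual variance from the preliminary OLS fit.
    var_by_group = {int(g): resid[groups == g].var(ddof=1)
                    for g in np.unique(groups)}
    w = np.array([1.0 / var_by_group[int(g)] for g in groups])
    sw = np.sqrt(w)
    beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta_wls, var_by_group

# Made-up data: the error standard deviation depends on the level of x1.
rng = np.random.default_rng(1)
n = 300
x1 = rng.integers(0, 3, size=n)
x2 = rng.integers(0, 2, size=n)
sd = np.array([1.0, 2.0, 4.0])[x1]
y = 10 + 5 * x1 + 3 * x2 + rng.normal(0.0, sd)

# Intercept plus treatment-coded dummies for x1 and x2.
X = np.column_stack([np.ones(n), x1 == 1, x1 == 2, x2 == 1]).astype(float)

beta_wls, var_by_group = groupwise_wls(X, y, x1)
print("estimated variance by x1 level:", var_by_group)
print("WLS coefficients:", beta_wls)
```

Mixed-model or generalized least squares routines in most statistics packages offer a more principled version of the same idea, with the group variances estimated jointly with the coefficients.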

Chapter 7 of Wilcox (https://www.sciencedirect.com/book/9780123869838/introduction-to-robust-estimation-and-hypothesis-testing) includes some functions that use a different approach for higher-order ANOVA (which you can frame as linear regressions; with categorical variables these are all linear), and he has some articles that also address this. First, though, can you expand on why you feel the model should include heteroskedasticity? Often transformations can be useful to address this.

Bruce Weaver

Hello Rubal Mistry. I was going to suggest something more straightforward: estimate an OLS model that treats all of the explanatory variables as factor variables, and include an option to get a robust estimate of the variance-covariance matrix. How to go about that depends on what software you have. In Stata, for example, it would be the -regress- command with i. prefixes on all of the explanatory variables (to treat them as factor variables), and with the vce() option to specify which of the robust covariance options you want (robust, hc2, hc3, etc.).

regress y i.x1 i.x2 i.x3 {etc.} , vce(robust)
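For readers without Stata, the same idea (ordinary OLS coefficients, with a heteroskedasticity-robust sandwich estimate of their covariance) can be sketched in Python with NumPy. The data and the function name `robust_cov` are made up for illustration. The HC1 branch uses the n/(n-k) small-sample factor, which as far as I know is what Stata's vce(robust) applies, and the HC3 branch divides each squared residual by (1 - leverage)^2.

```python
import numpy as np

def robust_cov(X, y, kind="HC3"):
    """OLS coefficients with an HC1 or HC3 sandwich covariance estimate."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_i = x_i' (X'X)^-1 x_i for each observation.
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    if kind == "HC1":
        omega = resid**2 * n / (n - k)
    elif kind == "HC3":
        omega = resid**2 / (1.0 - h) ** 2
    else:
        raise ValueError("kind must be 'HC1' or 'HC3'")
    meat = X.T @ (X * omega[:, None])
    cov = XtX_inv @ meat @ XtX_inv
    return beta, cov

# Made-up data: two categorical predictors, error spread growing with the mean.
rng = np.random.default_rng(2)
n = 200
x1 = rng.integers(0, 3, size=n)
x2 = rng.integers(0, 2, size=n)
mean = 10 + 5 * x1 + 3 * x2
y = mean + rng.normal(0.0, 0.2 * mean)

# Intercept plus treatment-coded dummies, analogous to i.x1 i.x2 in Stata.
X = np.column_stack([np.ones(n), x1 == 1, x1 == 2, x2 == 1]).astype(float)

beta, cov = robust_cov(X, y, kind="HC3")
print("coefficients:", beta)
print("robust standard errors:", np.sqrt(np.diag(cov)))
```

The coefficients are unchanged from plain OLS; only the standard errors differ. In practice a library routine is preferable to hand-rolled code, for example the `cov_type` argument of `fit` in statsmodels.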

HTH.

Harold Chike

Rubal Mistry, if your ultimate objective is to develop a model, ANOVA is inapplicable because it is for making comparisons.

Regression analysis is the appropriate technique for determining relationships. However, you do not do regression analysis with categorical variables in SPSS. You may wish to do log-linear regression using the R statistical package.

Daniel Wright

Following on from Bruce Weaver's point: are you interested in differences among the variances for their own sake, or is the lack of homoscedasticity a concern only because of the assumptions of ANOVA/OLS regression?

James R Knaub

My spreadsheet is pretty easy to use if you've already done OLS regression, and your software accepts the regression weights you find from the coefficient of heteroscedasticity that you decide upon. Weighted least squares should be the norm as the heteroscedasticity is naturally in the error structure. The size measure is predicted y, and sigma for the estimated residuals should increase with larger predicted y. Please see https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity.
