Is this pattern heteroscedasticity?

Agreed, Manfred. I cannot really make out the labeling on the graphs, Ann. If you could attach the scatterplots so they could be seen better, that would be helpful.

Heteroscedasticity, like other characteristics of the data, can be studied well graphically. (See attached for more extended graphics.)

Conference Paper Alternative to the Iterated Reweighted Least Squares Method ...

James R Knaub

Ann -

Heteroscedasticity is often modeled with a "coefficient of heteroscedasticity," usually written as gamma, which appears in the exponent of a size measure, x. In the multiple linear regression you mentioned, the size measure could be the dominant regressor or, I think better, a linear combination of regressors. Best of all, I think, is a preliminary estimate of y, i.e., a predicted y, but not y itself. That size measure raised to the power negative 2 gamma (or negative gamma, depending upon format) is the regression weight. For example, I know that in SAS PROC REG, if you set that weight as w=1/x (gamma = 0.5 in the first format), you get the classical ratio estimator for one regressor and a zero intercept.
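To make the weight concrete, here is a minimal sketch in Python rather than SAS, on simulated data. The function name, the data, and the true slope of 2 are all hypothetical choices for illustration. It fits y = b*x with no intercept using weights x**(-2*gamma), and shows that gamma = 0.5 (weight 1/x) reproduces the classical ratio estimator sum(y)/sum(x).

```python
import numpy as np

def wls_through_origin(x, y, gamma):
    """Slope of y = b*x + e fitted by weighted least squares
    with regression weights w = x**(-2*gamma) and no intercept."""
    w = x ** (-2.0 * gamma)
    return np.sum(w * x * y) / np.sum(w * x * x)

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 100.0, size=200)        # hypothetical size measure
y = 2.0 * x + rng.normal(0.0, np.sqrt(x))    # error sd grows like x**0.5

b_ratio = wls_through_origin(x, y, gamma=0.5)  # weight = 1/x
b_ols = wls_through_origin(x, y, gamma=0.0)    # weight = 1, ordinary least squares
ratio_estimate = y.sum() / x.sum()             # classical ratio estimator

print(b_ratio, ratio_estimate, b_ols)
```

With w = 1/x the numerator collapses to sum(y) and the denominator to sum(x), which is why the two numbers agree up to rounding. The unweighted slope will generally differ a little.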

A good reference:

Särndal, C.-E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag.

So the question is, for SPSS, don't you have to set up a size measure for your multiple regression as well, or is it automated somehow? What did you do?

jim

PS - I suppose you could have several error terms (estimated residuals), each with their own factor for accounting for heteroscedasticity with regard to each regressor.

Ann Van den Broeck

I haven't learned anything about gammas at uni, so all I know is how to do this in SPSS, which is automated. You simply plot the residuals and it gives you a chart. Normally the spread would indeed have to get wider or narrower for heteroscedasticity. What I did was take the standardized and studentized residuals and plot them against each other (X and Y axis).

James R Knaub

I like to look at estimated residuals without first doing anything to them. Anyway, it occurs to me that SPSS may ignore the natural heteroscedasticity in data and that is why many transform. (A good book of interest is Carroll and Ruppert's Transformation and Weighting in Regression (1988).)

I think it best to attach a more legible set of graphs for all to view.

Ann Van den Broeck

Do you mean the partial regression plots, with each of the independent variables plotted separately against the dependent variable?

James R Knaub

That sounds like it might be interesting.

Ann Van den Broeck

OK, I'll send it when I'm home. I'm at uni right now and somehow I can't upload any images.

Ann Van den Broeck

I haven't seen all forms of regression in class, only multiple regression and binary and multinomial logistic regression.

My dependent variable is political participation (scale from 0-12). I also measure the separate forms with binary logistic regression.

Matthew Huxter

OK... I am going to have to look that word up! Could get me a lot of points in Scrabble.

Timothy A Ebert

It looks similar to plots I have seen where bands are produced as a result of including categorical variables in the model. There are 13 bands, so there are 13 categories (maybe 12 treatments and a control).

My uncertainty in this answer is focused on the observation that there is no variability in each of the 13 bands in the figure. There are not that many things that I know of that always result in a perfect model fit.

Check for errors. Start from the raw data and redo the analysis from the beginning. I once was exploring my data and started using ratios of different variables. I was excited to find a model with an R2 of 1. I was less excited when I discovered that after simplifying the equations I had proven that 1=1.

I would say that this is an example of heteroscedasticity because the residuals change systematically with a change in "treatment." Manfred gives one example of heteroscedasticity, but it is a term for a more general condition where error variance changes for select groups of individuals.

Wei-Sheng Zeng

Here is an example of heteroscedasticity. The residual error, or variance, increases as the diameter grows.
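A small simulated sketch of that kind of pattern follows. The variable name `diameter`, the coefficients, and the error model are assumptions for illustration, not the data behind the figure. It fits a straight line and compares the residual spread for the smallest and largest thirds of the size variable.

```python
import numpy as np

rng = np.random.default_rng(1)
diameter = rng.uniform(5.0, 50.0, size=600)                   # hypothetical size variable
y = 3.0 + 0.8 * diameter + rng.normal(0.0, 0.1 * diameter)    # error sd proportional to diameter

# Ordinary least squares fit and raw (unstandardized) residuals
slope, intercept = np.polyfit(diameter, y, 1)
resid = y - (intercept + slope * diameter)

# Compare residual spread in the smallest and largest thirds of diameter
order = np.argsort(diameter)
third = len(order) // 3
sd_small = resid[order[:third]].std(ddof=1)
sd_large = resid[order[-third:]].std(ddof=1)

print(sd_small, sd_large)
```

A clearly larger standard deviation in the upper third is the numerical counterpart of the fan shape in a residual plot.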

Ann Van den Broeck

Yes, I have lots of categorical variables, Timothy. But how do I check for errors? I have redone the analysis a couple of times now...

I'm only familiar with SPSS though; I don't do these things manually.

Timothy A Ebert

Probably the only way is to start over. It always feels like a stupid, busywork sort of task. Start with your raw data. Go through the SPSS user manual. Reconstruct your model. In theory, you should get the same answer the second time round. However, as an error-checking activity, you can't assume that it will. Every time you act on the impulse that "I know how to do this," you compromise the error-checking part.

The sort of thing I am thinking of here is something like what happened last month. A person asked a regression question because they had an unusual result. They had a simple model: one dependent variable and one independent variable. However, their estimated slope and intercept differed greatly from the published values. It took a while to convince them that they had switched the independent and dependent variables. Once they got that straightened out, everything was fine. Your problem, if caused by an error, will not be that easy.
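For anyone wondering why switching the two variables matters so much, here is a short sketch on made-up data (the numbers are hypothetical). Regressing x on y and inverting the slope does not recover the slope of y on x. The two slopes multiply to the squared correlation, so they only agree when the fit is perfect.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10.0, 2.0, size=300)                   # hypothetical independent variable
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=300)    # hypothetical dependent variable

slope_y_on_x = np.polyfit(x, y, 1)[0]   # intended model: y regressed on x
slope_x_on_y = np.polyfit(y, x, 1)[0]   # variables switched: x regressed on y
r = np.corrcoef(x, y)[0, 1]

# Inverting the switched slope does not give back the intended slope
print(slope_y_on_x, 1.0 / slope_x_on_y)
# The product of the two slopes equals r squared
print(slope_y_on_x * slope_x_on_y, r ** 2)
```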

Here is another (maybe simpler) option. Can you recode your categorical variables as continuous variables? If you do that, does the pattern in the residuals go away or change?

I am sure that you have already noted the distinct bands in Zeng's figure. I would bet that there are categorical variables in that analysis, probably with 7 categories. Just because there is a banding pattern does not mean there are problems, but it is important to know why the patterns exist. This sort of banding is more what I expect to see. The bands are broad, indicating that there is some variability. In your case there is no apparent variability, and I am bothered by that.

Another option is to identify the data that results in all the residuals from one band. Can you identify the conditions that resulted in that band?

If you change the scale at which the graph is plotted, do you see variability within a line, or would a regression of all the data for one line have an R2=1.0? It is the perception that R2=1.0 for each line that really bothers me.

So there are lots of categorical variables. There might be a problem there. Are some of them highly correlated? Can you reduce the complexity of your model, and does that make the pattern go away? If there are two or three highly correlated independent variables, you could try stepwise regression or a multivariate method.
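One way to run that check outside SPSS is sketched below on simulated data (not Ann's). It assumes, purely for illustration, a dependent variable that only takes integer values from 0 to 12 and raw residuals plotted against fitted values. All names and coefficients are hypothetical. In that setting each band corresponds to one observed value of the dependent variable, and a line fitted within a band has a slope of exactly -1 with no scatter. Whether that is what is happening in the plots posted here would still need to be confirmed from the real data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
# Hypothetical design: intercept, one continuous predictor, one 0/1 categorical predictor
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.integers(0, 2, size=n)])
latent = 6.0 + 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(0.0, 2.0, size=n)
y = np.clip(np.round(latent), 0, 12)      # hypothetical 0-12 integer score

# Ordinary least squares fit, fitted values, and raw residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# For each band (one observed value of y), regress residual on fitted value
band_slopes = {}
for value in np.unique(y):
    band = y == value
    if band.sum() < 5:                    # skip bands with too few points
        continue
    band_slopes[int(value)] = np.polyfit(fitted[band], resid[band], 1)[0]

for value, slope in band_slopes.items():
    print(value, round(slope, 6))
```

If the real residual plot behaves this way, grouping the cases by the observed value of the dependent variable should reproduce the bands one for one.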
