I want to run multiple ANOVAs with a factorial design, but my groups do not have equal variances and their sizes vary dramatically (one has 16 and one has 2000!). What is the best way to do this?
Hi, Hannah. One way to prepare your data for an ANOVA is to transform them to make them more normally distributed and to stabilize their variances. Depending on the type of data, different transformations are recommended: for count data, for instance, a square-root transform may be appropriate.
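A minimal sketch of what such a transform does, using simulated count data (the groups and parameters here are purely illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative count data: Poisson counts, whose variance grows with the mean,
# so groups with larger means automatically have larger variances.
groups = [rng.poisson(lam, size=200) for lam in (4, 20, 60)]
print("raw variances:        ", [round(np.var(g, ddof=1), 1) for g in groups])

# The square-root transform is the classical variance stabilizer for counts:
# after it, the group variances should be much closer to each other.
transformed = [np.sqrt(g) for g in groups]
print("transformed variances:", [round(np.var(t, ddof=1), 2) for t in transformed])
```

After the transform the ratio of the largest to the smallest group variance should drop markedly, which is exactly the property an ANOVA benefits from.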
If the differences in the variances are not drastic, you may safely ignore this.
If the variances are really very different, then the most important question is: why is that so? It might indicate some deeper problem, for instance that the model is misspecified or that you are missing a relevant predictor.
If it happens that the variances are drastically different and you cannot (find the reason and) resolve the problem, you can only use what you have and what you know. But your interpretations should be careful, keeping in mind that something else might be wrong (in the model) or missing (like a relevant predictor). When the large variance is in the large groups, the model will over-estimate the variance in the small groups. This leads to more conservative answers (too-high p-values, too-wide confidence intervals, which may not be so bad). The other way around will give you too-liberal answers (too-low p-values, too-tight CIs), and this should be treated with care. A possible remedy is to use a weighted model with weights proportional to the inverse of the variance ("high-variance groups" get lower weights).
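A sketch of such an inverse-variance weighted fit, using plain weighted least squares on simulated data mimicking the sizes from the question (the group means and SDs are made-up assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-group setting with very unequal sizes and variances.
n1, n2 = 16, 2000
y1 = rng.normal(5.0, 3.0, n1)   # small group, high variance
y2 = rng.normal(6.0, 1.0, n2)   # large group, low variance
y = np.concatenate([y1, y2])
g = np.concatenate([np.zeros(n1, dtype=int), np.ones(n2, dtype=int)])

# Design matrix: intercept + group indicator.
X = np.column_stack([np.ones_like(y), g.astype(float)])

# Weights proportional to the inverse of each group's sample variance,
# so the high-variance group gets lower weight.
s2 = np.array([y1.var(ddof=1), y2.var(ddof=1)])
w = 1.0 / s2[g]

# Weighted least squares: solve (X' W X) beta = X' W y,
# implemented by scaling rows instead of building a diagonal W.
Xw = X * w[:, None]
beta = np.linalg.solve(X.T @ Xw, X.T @ (w * y))
print("intercept, group effect:", beta)
```

For this saturated two-group model the weighted fit still recovers the group means; the weights matter for the standard errors and for models with continuous predictors, where down-weighting the noisy group changes the estimates themselves.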
Provided that the other ANOVA assumptions (or their robust alternatives) are satisfied, the recommended course of action is to use the Brown-Forsythe or Welch test. The Welch test is generally slightly better, unless there is one group with an extreme mean and a large variance (see Field, 2013: 443).
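For reference, Welch's heteroscedasticity-robust one-way ANOVA is simple enough to compute directly from the textbook formulas; here is a sketch on simulated groups with the unequal sizes from the question (the means and SDs are illustrative assumptions):

```python
import numpy as np
from scipy.stats import f as f_dist

def welch_anova(*groups):
    """Welch's one-way ANOVA for unequal variances.

    Returns (F, df1, df2, p), computed from the standard Welch (1951)
    formulas rather than any packaged ANOVA routine.
    """
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])

    w = n / v                        # inverse-variance weights
    mw = np.sum(w * m) / np.sum(w)   # weighted grand mean

    # Numerator: weighted between-group variability.
    A = np.sum(w * (m - mw) ** 2) / (k - 1)
    # Shared correction term for the denominator and for df2.
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    B = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp

    F = A / B
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * tmp)
    return F, df1, df2, f_dist.sf(F, df1, df2)

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 16)     # small group
b = rng.normal(0.0, 3.0, 200)
c = rng.normal(1.0, 2.0, 2000)   # large group with a shifted mean
F, df1, df2, p = welch_anova(a, b, c)
print(f"F = {F:.2f}, df = ({df1}, {df2:.1f}), p = {p:.3g}")
```

Note the error degrees of freedom (df2) are no longer N - k but are estimated from the data, which is how the test accommodates the unequal variances.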
Reference
Field, A. (2013) Discovering Statistics using SPSS: (And sex and drugs and rock 'n' roll). 4th edn. London: SAGE.
Thank you everyone. The groups were derived from survey data: those scoring within a certain range on a test were allocated to one of three groups. This meant we couldn't control for there being equal sample sizes in each of these groups. Levene's test is significant, unfortunately, so I think we have to assume there isn't equal variance between the groups. I read somewhere that I would need at least 20 per group to do a Welch test; is this correct?
You say this test was "significant". How did you come to this conclusion? Because the p-value was below 0.05? This would ignore the sample size and can be very misleading. It might NOT be a "significant" result in your case! Judging if something is "significant" requires (much) more than just looking at the p-value.
What you need is a judgment of whether the differences in the variances are large enough to be relevant for your subsequent analysis. In real life, variances (like everything else) will always differ to some extent. The assumption of "equal variances" is an abstraction required by an abstract mathematical model, under which the procedure is mathematically/logically correct (or exact). Real data never exactly fulfills such assumptions; it should only be reasonably close to them. Having large samples (like 2000+) will enable a test to "see" any tiny deviation from the tested hypothesis, which can even be due to a tiny misspecification of the model the test is based on. Therefore you will likely get very small p-values, but they are difficult to interpret: it could be that the test is not ideal (or the data are not ideal for the test), or it could be that there is a tiny and irrelevant deviation from the null hypothesis. Having a small p-value alone is not enough; it is not even helpful for answering the relevant question.
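This large-sample sensitivity is easy to demonstrate by simulation (the 10% SD difference and group sizes below are made up for illustration):

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(7)

# Two large groups whose true SDs differ by only 10% --
# a difference that is practically irrelevant for an ANOVA.
a = rng.normal(0, 1.00, 5000)
b = rng.normal(0, 1.10, 5000)

# Brown-Forsythe variant of Levene's test (deviations from the median).
stat, p = levene(a, b, center='median')
ratio = np.var(b, ddof=1) / np.var(a, ddof=1)
print(f"variance ratio = {ratio:.2f}, Levene p = {p:.2g}")
```

With n = 5000 per group the test flags even this trivial difference as "significant", which says more about the sample size than about any problem with the ANOVA.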
You are better off looking at diagnostic residual plots and letting your eye decide. If you see that the residuals in one group scatter over a range 5 to 10 times as wide as in another group, then you will have problems in the ANOVA. If the difference is less pronounced, I would not worry much and would assume homogeneous variances.
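If a plot is inconvenient, the same eyeball check can be done numerically by comparing the residual spread per group; the group names and parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical one-way data; within each group the residual is simply
# observation minus group mean.
groups = {
    "low":  rng.normal(10, 1.0, 400),
    "mid":  rng.normal(12, 1.5, 400),
    "high": rng.normal(15, 6.0, 400),   # clearly wider scatter
}

sds = {name: np.std(y - y.mean(), ddof=1) for name, y in groups.items()}
for name, sd in sds.items():
    print(f"{name:>4}: residual SD = {sd:.2f}")

# The rule of thumb above: a widest/narrowest spread ratio approaching
# 5-10 signals trouble for the ANOVA; smaller ratios are tolerable.
ratio = max(sds.values()) / min(sds.values())
print(f"widest/narrowest spread ratio = {ratio:.1f}")
```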