I want to run multiple ANOVAs with a factorial design, but my groups do not have equal variances and their sizes vary dramatically (one has 16 and one has 2000!). What is the best way to do this?
Hi, Hannah. One way to prepare your data for an ANOVA is to transform them to make them more normally distributed and to stabilize their variances. Depending on the type of data, different transformations are recommended: for count data, for instance, a square-root transform may be appropriate.
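A minimal sketch of what such a transform does, using simulated count data (the groups and parameters here are purely illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative count data: Poisson counts, whose variance grows with the mean,
# so groups with larger means automatically have larger variances.
groups = [rng.poisson(lam, size=200) for lam in (4, 20, 60)]
print("raw variances:        ", [round(np.var(g, ddof=1), 1) for g in groups])

# The square-root transform is the classical variance stabilizer for counts:
# after it, the group variances should be much closer to each other.
transformed = [np.sqrt(g) for g in groups]
print("transformed variances:", [round(np.var(t, ddof=1), 2) for t in transformed])
```

After the transform the ratio of the largest to the smallest group variance should drop markedly, which is exactly the property an ANOVA benefits from.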
If the differences in the variances are not drastic, you may safely ignore this.
If the variances are really very different, then the most important question is: why is that so? It might indicate some deeper problem, for instance that the model is misspecified or that you are missing a relevant predictor.
If it happens that the variances are drastically different and you cannot (find the reason and) resolve the problem, you can only use what you have and what you know. But your interpretations should be careful, keeping in mind that something else might be wrong (in the model) or missing (like a relevant predictor). When the large variance is in the large groups, the model will over-estimate the variance in the small groups. This leads to more conservative answers (too-high p-values, too-wide confidence intervals, which may not be so bad). The other way around will give you too-liberal answers (too-low p-values, too-tight CIs), and this should be treated with care. A possible remedy is to use a weighted model with weights proportional to the inverse of the variance ("high-variance groups" get lower weights).
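A sketch of such an inverse-variance weighted fit, using plain weighted least squares on simulated data mimicking the sizes from the question (the group means and SDs are made-up assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-group setting with very unequal sizes and variances.
n1, n2 = 16, 2000
y1 = rng.normal(5.0, 3.0, n1)   # small group, high variance
y2 = rng.normal(6.0, 1.0, n2)   # large group, low variance
y = np.concatenate([y1, y2])
g = np.concatenate([np.zeros(n1, dtype=int), np.ones(n2, dtype=int)])

# Design matrix: intercept + group indicator.
X = np.column_stack([np.ones_like(y), g.astype(float)])

# Weights proportional to the inverse of each group's sample variance,
# so the high-variance group gets lower weight.
s2 = np.array([y1.var(ddof=1), y2.var(ddof=1)])
w = 1.0 / s2[g]

# Weighted least squares: solve (X' W X) beta = X' W y,
# implemented by scaling rows instead of building a diagonal W.
Xw = X * w[:, None]
beta = np.linalg.solve(X.T @ Xw, X.T @ (w * y))
print("intercept, group effect:", beta)
```

For this saturated two-group model the weighted fit still recovers the group means; the weights matter for the standard errors and for models with continuous predictors, where down-weighting the noisy group changes the estimates themselves.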
Provided that the other ANOVA assumptions (or their robust alternatives) are satisfied, the recommended course of action is to use the Brown-Forsythe or Welch test. The Welch test is generally slightly better, unless there is one group with an extreme mean and a large variance (see Field, 2013: 443).
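For reference, Welch's heteroscedasticity-robust one-way ANOVA is simple enough to compute directly from the textbook formulas; here is a sketch on simulated groups with the unequal sizes from the question (the means and SDs are illustrative assumptions):

```python
import numpy as np
from scipy.stats import f as f_dist

def welch_anova(*groups):
    """Welch's one-way ANOVA for unequal variances.

    Returns (F, df1, df2, p), computed from the standard Welch (1951)
    formulas rather than any packaged ANOVA routine.
    """
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])

    w = n / v                        # inverse-variance weights
    mw = np.sum(w * m) / np.sum(w)   # weighted grand mean

    # Numerator: weighted between-group variability.
    A = np.sum(w * (m - mw) ** 2) / (k - 1)
    # Shared correction term for the denominator and for df2.
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    B = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp

    F = A / B
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * tmp)
    return F, df1, df2, f_dist.sf(F, df1, df2)

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 16)     # small group
b = rng.normal(0.0, 3.0, 200)
c = rng.normal(1.0, 2.0, 2000)   # large group with a shifted mean
F, df1, df2, p = welch_anova(a, b, c)
print(f"F = {F:.2f}, df = ({df1}, {df2:.1f}), p = {p:.3g}")
```

Note the error degrees of freedom (df2) are no longer N - k but are estimated from the data, which is how the test accommodates the unequal variances.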
Reference
Field, A. (2013) Discovering Statistics using SPSS: (And sex and drugs and rock 'n' roll). 4th edn. London: SAGE.
Thank you everyone. The groups were derived from survey data: those scoring within a certain range on a test were allocated to one of three groups. This meant we couldn't control for there being equal sample sizes in each of these groups. Levene's test is significant, unfortunately, so I think we have to assume there isn't equal variance between the groups. I read somewhere that I would need at least 20 per group to do a Welch test; is this correct?
You say this test was "significant". How did you come to this conclusion? Because the p-value was below 0.05? This would ignore the sample size and can be very misleading. It might NOT be a "significant" result in your case! Judging if something is "significant" requires (much) more than just looking at the p-value.
What you need is a judgment of whether the differences in the variances are large enough to be relevant for your subsequent analysis. In real life, variances (like everything else) will always differ to some extent. The assumption of "equal variances" is an abstraction required by an abstract mathematical model, under which the procedure is mathematically/logically correct (or exact). Real data never exactly fulfills such assumptions; it should only be reasonably close to them. Having large samples (like 2000+) will enable a test to "see" any tiny deviation from the tested hypothesis, which can even be due to a tiny misspecification of the model the test is based on. Therefore you will likely get very small p-values, but they are difficult to interpret: it could be that the test is not ideal (or the data are not ideal for the test), or it could be that there is a tiny and irrelevant deviation from the null hypothesis. Having a small p-value alone is not enough; it is not even helpful for answering the relevant question.
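This large-sample sensitivity is easy to demonstrate by simulation (the 10% SD difference and group sizes below are made up for illustration):

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(7)

# Two large groups whose true SDs differ by only 10% --
# a difference that is practically irrelevant for an ANOVA.
a = rng.normal(0, 1.00, 5000)
b = rng.normal(0, 1.10, 5000)

# Brown-Forsythe variant of Levene's test (deviations from the median).
stat, p = levene(a, b, center='median')
ratio = np.var(b, ddof=1) / np.var(a, ddof=1)
print(f"variance ratio = {ratio:.2f}, Levene p = {p:.2g}")
```

With n = 5000 per group the test flags even this trivial difference as "significant", which says more about the sample size than about any problem with the ANOVA.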
You are better off looking at diagnostic residual plots and letting your eye decide. If you see that the residuals in one group scatter over a range 5 to 10 times as wide as in another group, then you will have problems in the ANOVA. If the difference is less pronounced, I would not worry much and would assume homogeneous variances.
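If a plot is inconvenient, the same eyeball check can be done numerically by comparing the residual spread per group; the group names and parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical one-way data; within each group the residual is simply
# observation minus group mean.
groups = {
    "low":  rng.normal(10, 1.0, 400),
    "mid":  rng.normal(12, 1.5, 400),
    "high": rng.normal(15, 6.0, 400),   # clearly wider scatter
}

sds = {name: np.std(y - y.mean(), ddof=1) for name, y in groups.items()}
for name, sd in sds.items():
    print(f"{name:>4}: residual SD = {sd:.2f}")

# The rule of thumb above: a widest/narrowest spread ratio approaching
# 5-10 signals trouble for the ANOVA; smaller ratios are tolerable.
ratio = max(sds.values()) / min(sds.values())
print(f"widest/narrowest spread ratio = {ratio:.1f}")
```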