I am using Prism to carry out some two-way ANOVA tests. I have a series of fairly small datasets that I want to analyse the same way, but a few of them do not pass the Shapiro-Wilk normality test. Reading the Prism Guide, it seems I would be justified in still carrying out an ANOVA, as the Q-Q plots do not deviate too far from a normal distribution.
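For context, here is roughly the check I am running, reproduced outside Prism as a minimal Python sketch (the dataset here is a randomly generated placeholder, not my actual measurements):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
dataset = rng.normal(loc=10.0, scale=2.0, size=12)  # placeholder for one small dataset

# Shapiro-Wilk test: H0 is that the data are sampled from a Gaussian distribution
w_stat, p_value = stats.shapiro(dataset)
print(f"Shapiro-Wilk W = {w_stat:.3f}, P = {p_value:.3f}")

# Q-Q plot with a standardized reference line for visual assessment
sm.qqplot(dataset, line="s")
plt.show()
```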

However, I am struggling to find examples in the literature of this being done. How would I justify this in a thesis/paper, and would the following be acceptable:

"Before undergoing an ANOVA, the each dataset was tested for Gaussian distribution using the Shapiro-Wilk normality test (ɑ = 0.05). Although not all datasets passed, majority of datasets in the series were normally distributed. Of those that failed, Q-Q was assessed and no major violations were detected. As statistical tests are generally robust to mild violations, and to maintain consistency across datasets, two-way ANOVA was carried out."

Although many manuals and websites state that ANOVA is robust, there do not seem to be any peer-reviewed references for two- or three-way ANOVA; I can only find a couple of references for one-way and RM ANOVA (PMID 36695847 & 29048317). If someone could supply a reference to justify my use, that would be great.

Excerpt from the Prism Guide (https://www.graphpad.com/guides/prism/latest/statistics/stat_interpreting_results_normality.htm):

What should I conclude if the P value from the normality test is low?

The null hypothesis is that the data are sampled from a Gaussian distribution. If the P value is small enough, you reject that null hypothesis and so accept the alternative hypothesis that the data are not sampled from a Gaussian population. The distribution could be close to Gaussian (with large data sets) or very far from it. The normality test tells you nothing about the alternative distributions.

If your P value is small enough to declare the deviations from the Gaussian ideal to be "statistically significant", you then have four choices:

  • The data may come from another identifiable distribution. If so, you may be able to transform your values to create a Gaussian distribution. For example, if the data come from a lognormal distribution, transform all values to their logarithms (a minimal sketch of this follows the list).
  • The presence of one or a few outliers might be causing the normality test to fail. Run an outlier test. Consider excluding the outlier(s).
  • If the departure from normality is small, you may choose to do nothing. Statistical tests tend to be quite robust to mild violations of the Gaussian assumption.
  • Switch to nonparametric tests that don’t assume a Gaussian distribution. But the decision to use (or not use) nonparametric tests is a big decision. It should not be based on a single normality test and should not be automated.
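To make the guide's first option concrete, here is a minimal Python sketch of the log-transform route (my own illustration, not from the guide; the lognormal sample is invented, and the values must be positive before taking logs):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
values = rng.lognormal(mean=2.0, sigma=0.5, size=15)  # illustrative lognormal sample

print("raw:", stats.shapiro(values).pvalue)           # often small: normality rejected
print("log:", stats.shapiro(np.log(values)).pvalue)   # logs are Gaussian here, so P is usually large
```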

Don't use this approach: First perform a normality test. If the P value is low, demonstrating that the data do not follow a Gaussian distribution, choose a nonparametric test. Otherwise choose a conventional test.

Prism does not use this approach, because the choice of parametric vs. nonparametric is more complicated than that.

  • Often, the analysis will be one of a series of experiments. Since you want to analyze all the experiments the same way, you cannot rely on the results from a single normality test.
  • Many biological variables follow lognormal distributions. If your data are sampled from a lognormal distribution, the best way to analyze the data is to first transform to logarithms and then analyze the logs. It would be a mistake to jump right to nonparametric tests, without considering transforming.
  • Other transforms can also be useful (reciprocal) depending on the distribution of the data.
  • Data can fail a normality test because of the presence of an outlier. Removing that outlier can restore normality.
  • The decision of whether to use a parametric or nonparametric test is most important with small data sets (since the power of nonparametric tests is so low). But with small data sets, normality tests have little power to detect non-Gaussian distributions, so an automatic approach would give you false confidence (the simulation sketched after this list illustrates the point).
  • With large data sets, normality tests can be too sensitive. A low P value from a normality test tells you that there is strong evidence that the data are not sampled from an ideal Gaussian distribution. But you already know that, as almost no scientifically relevant variables form an ideal Gaussian distribution. What you want to know is whether the distribution deviates enough from the Gaussian ideal to invalidate conventional statistical tests (that assume a Gaussian distribution). A normality test does not answer this question. With large data sets, trivial deviations from the ideal can lead to a small P value.
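The small-data-set point is easy to demonstrate with a quick simulation (again my own illustrative sketch, not from the guide): samples drawn from a clearly non-Gaussian exponential distribution frequently pass Shapiro-Wilk when n is small:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def rejection_rate(n, n_sim=2000, alpha=0.05):
    """Fraction of simulations in which Shapiro-Wilk rejects normality
    for samples drawn from an exponential (non-Gaussian) distribution."""
    rejections = 0
    for _ in range(n_sim):
        sample = rng.exponential(scale=1.0, size=n)
        if stats.shapiro(sample).pvalue < alpha:
            rejections += 1
    return rejections / n_sim

for n in (5, 10, 20, 50, 100):
    print(f"n = {n:3d}: power ~ {rejection_rate(n):.2f}")
```

At n = 5 the test rejects only a small fraction of the time, even though the underlying distribution is far from Gaussian.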

The decision of when to use a parametric test and when to use a nonparametric test is a difficult one, requiring thinking and perspective. This decision should not be automated.
