I am using Prism to carry out some two-way ANOVA tests. I have a series of fairly small datasets that I want to analyse the same way, but a few of them do not pass the Shapiro-Wilk normality test. Reading the Prism Guide, it seems I would be justified in still carrying out an ANOVA, as the Q-Q plots do not deviate too far from a normal distribution.
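For context, here is roughly the check I am running, reproduced outside Prism as a minimal Python sketch (the dataset here is a randomly generated placeholder, not my actual measurements):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
dataset = rng.normal(loc=10.0, scale=2.0, size=12)  # placeholder for one small dataset

# Shapiro-Wilk test: H0 is that the data are sampled from a Gaussian distribution
w_stat, p_value = stats.shapiro(dataset)
print(f"Shapiro-Wilk W = {w_stat:.3f}, P = {p_value:.3f}")

# Q-Q plot with a standardized reference line for visual assessment
sm.qqplot(dataset, line="s")
plt.show()
```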

However, I am struggling to find examples in the literature of this being done. How would I justify this in a thesis/paper, and would the following be acceptable:

"Before undergoing an ANOVA, the each dataset was tested for Gaussian distribution using the Shapiro-Wilk normality test (ɑ = 0.05). Although not all datasets passed, majority of datasets in the series were normally distributed. Of those that failed, Q-Q was assessed and no major violations were detected. As statistical tests are generally robust to mild violations, and to maintain consistency across datasets, two-way ANOVA was carried out."

Although many manuals and websites state that ANOVA is robust, there do not seem to be any peer-reviewed references for two- or three-way ANOVA; I can only find a couple of references for one-way and RM ANOVA (PMID 36695847 & 29048317). If someone could supply a reference to justify my use, that would be great.

Excerpt from the Prism Guide (https://www.graphpad.com/guides/prism/latest/statistics/stat_interpreting_results_normality.htm):

What should I conclude if the P value from the normality test is low?

The null hypothesis is that the data are sampled from a Gaussian distribution. If the P value is small enough, you reject that null hypothesis and so accept the alternative hypothesis that the data are not sampled from a Gaussian population. The distribution could be close to Gaussian (with large data sets) or very far from it. The normality test tells you nothing about the alternative distributions.

If your P value is small enough to declare the deviations from the Gaussian ideal to be "statistically significant", you then have four choices:

  • The data may come from another identifiable distribution. If so, you may be able to transform your values to create a Gaussian distribution. For example, if the data come from a lognormal distribution, transform all values to their logarithms (a minimal sketch of this follows the list).
  • The presence of one or a few outliers might be causing the normality test to fail. Run an outlier test. Consider excluding the outlier(s).
  • If the departure from normality is small, you may choose to do nothing. Statistical tests tend to be quite robust to mild violations of the Gaussian assumption.
  • Switch to nonparametric tests that don’t assume a Gaussian distribution. But the decision to use (or not use) nonparametric tests is a big decision. It should not be based on a single normality test and should not be automated.
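To make the guide's first option concrete, here is a minimal Python sketch of the log-transform route (my own illustration, not from the guide; the lognormal sample is invented, and the values must be positive before taking logs):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
values = rng.lognormal(mean=2.0, sigma=0.5, size=15)  # illustrative lognormal sample

print("raw:", stats.shapiro(values).pvalue)           # often small: normality rejected
print("log:", stats.shapiro(np.log(values)).pvalue)   # logs are Gaussian here, so P is usually large
```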

Don't use this approach: First perform a normality test. If the P value is low, demonstrating that the data do not follow a Gaussian distribution, choose a nonparametric test. Otherwise choose a conventional test.

Prism does not use this approach, because the choice of parametric vs. nonparametric is more complicated than that.

  • Often, the analysis will be one of a series of experiments. Since you want to analyze all the experiments the same way, you cannot rely on the results from a single normality test.
  • Many biological variables follow lognormal distributions. If your data are sampled from a lognormal distribution, the best way to analyze the data is to first transform to logarithms and then analyze the logs. It would be a mistake to jump right to nonparametric tests, without considering transforming.
  • Other transforms can also be useful (reciprocal) depending on the distribution of the data.
  • Data can fail a normality test because of the presence of an outlier. Removing that outlier can restore normality.
  • The decision of whether to use a parametric or nonparametric test is most important with small data sets (since the power of nonparametric tests is so low). But with small data sets, normality tests have little power to detect non-Gaussian distributions, so an automatic approach would give you false confidence (the simulation sketched after this list illustrates the point).
  • With large data sets, normality tests can be too sensitive. A low P value from a normality test tells you that there is strong evidence that the data are not sampled from an ideal Gaussian distribution. But you already know that, as almost no scientifically relevant variables form an ideal Gaussian distribution. What you want to know is whether the distribution deviates enough from the Gaussian ideal to invalidate conventional statistical tests (that assume a Gaussian distribution). A normality test does not answer this question. With large data sets, trivial deviations from the ideal can lead to a small P value.
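The small-data-set point is easy to demonstrate with a quick simulation (again my own illustrative sketch, not from the guide): samples drawn from a clearly non-Gaussian exponential distribution frequently pass Shapiro-Wilk when n is small:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def rejection_rate(n, n_sim=2000, alpha=0.05):
    """Fraction of simulations in which Shapiro-Wilk rejects normality
    for samples drawn from an exponential (non-Gaussian) distribution."""
    rejections = 0
    for _ in range(n_sim):
        sample = rng.exponential(scale=1.0, size=n)
        if stats.shapiro(sample).pvalue < alpha:
            rejections += 1
    return rejections / n_sim

for n in (5, 10, 20, 50, 100):
    print(f"n = {n:3d}: power ~ {rejection_rate(n):.2f}")
```

At n = 5 the test rejects only a small fraction of the time, even though the underlying distribution is far from Gaussian.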

The decision of when to use a parametric test and when to use a nonparametric test is a difficult one, requiring thinking and perspective. This decision should not be automated.
