I am in the process of trying to conduct an outlier test for data that will be submitted to a 2 x 2 ANOVA but I don't know what would be the best possible way to identify potential outliers.
Regression models (ANOVA included) rely heavily on the assumption of normally distributed errors, so the presence of outliers can severely distort your analysis.
Maybe you can start by checking for measurement errors. If that is the case, it is safe to drop the outliers.
If not, you can run the analysis both with and without the outliers. If the results do not change, you can drop the outliers, since they only affect the assumptions. If the results do change, keep the outliers in your model and describe how the results changed.
Alternatively, if there are too many of them, you can try transforming the variable.
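The with/without comparison above can be sketched in Python. This is a minimal illustration on simulated data, not your actual dataset; `omnibus_f`, the planted outlier, and all parameters are my own illustrative choices. Writing the 2 x 2 ANOVA as a regression with an interaction term keeps the comparison valid even when removing a point unbalances the design:

```python
import numpy as np

def omnibus_f(y, a, b):
    """Omnibus F statistic of the 2x2 ANOVA written as a regression:
    y = b0 + b1*A + b2*B + b3*A*B + error. Valid for unbalanced data."""
    X = np.column_stack([np.ones_like(y), a, b, a * b])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = ((y - X @ beta) ** 2).sum()    # residual sum of squares
    sst = ((y - y.mean()) ** 2).sum()    # total sum of squares
    df1, df2 = 3, len(y) - 4             # 3 model terms, 4 parameters
    return ((sst - sse) / df1) / (sse / df2)

# Simulated balanced 2x2 design with real A and B effects, plus one
# planted gross outlier in the first cell.
rng = np.random.default_rng(1)
a = np.repeat([0.0, 0.0, 1.0, 1.0], 10)
b = np.tile(np.repeat([0.0, 1.0], 10), 2)
y = 1.0 * a + 0.5 * b + rng.normal(0.0, 1.0, 40)
y[0] += 10.0

keep = np.ones(40, dtype=bool)
keep[0] = False                          # drop the suspected outlier
f_with = omnibus_f(y, a, b)
f_without = omnibus_f(y[keep], a[keep], b[keep])
print(f_with, f_without)                 # compare and report both
```

If both F statistics lead to the same conclusion, reporting the full-data analysis (and mentioning the sensitivity check) is usually the safer choice.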
ANOVA is nothing but a regression, so you can use the outlier-detection methods available for regression. Beyond those, you can inspect the ANOVA residuals.
You may identify outliers in diagnostic residual plots.
If you want some control of the type-I error in that identification, you can use Grubbs' test on the residuals.
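A minimal sketch of that idea, assuming roughly normal residuals; the function name and the simulated residuals are illustrative, and for repeated removal you would re-run the test after each deletion:

```python
import numpy as np
from scipy import stats

def grubbs_flag(x, alpha=0.05):
    """One round of the two-sided Grubbs' test: return the index of the
    most extreme value if it is significant at level alpha, else None."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    i = int(np.argmax(z))
    # critical value from the t distribution with n - 2 df
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t * t / (n - 2 + t * t))
    return i if z[i] > g_crit else None

rng = np.random.default_rng(0)
resid = rng.normal(0.0, 1.0, 30)   # stand-in for ANOVA residuals
resid[5] = 6.0                     # one grossly deviant residual
print(grubbs_flag(resid))
```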
But be aware: outlier tests identify values that do not behave as expected (to select just those values); they are not meant to make sure that the rest (the values that failed to reach significance) conforms to the assumptions of an analysis. This is often mistaken. If the latter is your aim, you should not remove data based on the results of outlier tests. Instead, for the identified outlying values you should do the following:
1) check whether the response value is physically impossible or highly implausible (that is presumably what Carmen called "checking for measurement errors"). If so, you should (must) remove the value, because it is obviously wrong. If not, proceed with step 2.
2) go back to the lab book and see if there is any plausible experimental reason explaining the "outlyingness" (following Carmen, I would call this "checking for experimental errors"). If you don't find a plausible reason, then the value must be taken as seriously as any other value.
The remaining data may still contain outlying values, possibly even values for which an outlier test would be "significant". For such data it is not, as Carmen said, that these values would distort the analysis - it is that removing them would distort the analysis, because removing outliers just because of their values will bias both the estimates and the variance.
Mehmet's answer "to use robust methods" implies (to me) that he also thinks you want to remove outliers to make your ANOVA "more correct". Note that many "robust methods" test different hypotheses, so you should be clear about which hypotheses you actually want to test. Robust methods based on resampling/bootstrapping would allow you to test the hypotheses that interest you, but they will have very low to no power if the sample size is small. If the sample size is large, there is no problem in using ANOVA, because a few rare outliers won't have any considerable impact on the result (if outliers are not rare, you have a different problem! - either your experiment went wrong or your assumptions are severely unreasonable).
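As one concrete example of such a resampling-based alternative, here is a permutation test on the difference in means between two groups (think of two cells of the design). The data are simulated and the helper is my own sketch, not a method proposed in this thread:

```python
import numpy as np

def perm_test_meandiff(x, y, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means.
    Shuffles group labels, so it relies on exchangeability rather than
    normality; with small samples its power is limited."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    obs = abs(x.mean() - y.mean())
    nx, hits = len(x), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(pooled[:nx].mean() - pooled[nx:].mean()) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one-corrected p-value

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 25)           # e.g. one cell of the design
y = rng.normal(1.2, 1.0, 25)           # another cell, with a true shift
p = perm_test_meandiff(x, y)
print(p)
```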
Use leave-one-out cross-validation on your ANOVA model and calculate deletion residuals. Observations with very large deletion residuals (not the normal model residuals!) are suspect and deserve closer inspection. This heuristic can work even with only a limited number of observations. However, without a proper excuse to remove these points from your data set you are on a slippery slope...
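A sketch of that computation; the closed-form identity below yields exactly the leave-one-out (externally studentized) residuals without refitting n times. The simulated one-factor design and the planted outlier are illustrative assumptions:

```python
import numpy as np

def deletion_residuals(X, y):
    """Externally studentized (deletion) residuals of y = X b + e.
    Equivalent to refitting with each observation left out, computed in
    closed form from the hat matrix."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix
    h = np.diag(H)                             # leverages
    e = y - H @ y                              # ordinary residuals
    s2 = (e @ e) / (n - p)                     # full-data error variance
    # leave-one-out variance estimate s_(i)^2
    s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    return e / np.sqrt(s2_i * (1 - h))

rng = np.random.default_rng(3)
a = np.repeat([0.0, 1.0], 20)                  # a single two-level factor
X = np.column_stack([np.ones(40), a])          # ANOVA as regression
y = 2.0 + 1.0 * a + rng.normal(0.0, 1.0, 40)
y[7] += 7.0                                    # planted gross outlier
t = deletion_residuals(X, y)
print(int(np.argmax(np.abs(t))))               # index of the top suspect
```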
First of all: do not confuse outliers with the assumption of normality of residuals. Many response variables, such as durations or counts are naturally left skewed, resulting in a long right tail. Any test and also boxplots would then identify those remote right observations as outliers, but they aren't. In such cases, stepping over to Generalized Linear Models resolves the problem gracefully.
"I'm not an outlier! - I just haven't found my distribution yet."
@Martin, long right tail = positive skew = right skew (not left, as you wrote). https://en.wikipedia.org/wiki/Skewness. "Right-skewed" is sometimes called "left-steep" (steep increase left of the mode, slow decrease right of the mode).
If anyone is still interested, the attached is my approach to outliers. I have found that they carry useful information and should be studied, not discarded, unless they are obvious blunders. Best wishes, David Booth