When you compare the means of quantitative data (for example, blood pressure or heart rate) in more than two groups by an ANOVA test and get a significant difference (p < 0.05), which post-hoc test should be used to find the groups that differ?
Given your description, I understand you're interested in determining which specific groups are statistically different from each other. I therefore recommend the Tukey HSD test as a post-hoc.
First, I always use a box plot to make a visual comparison of the groups (at least as important as the numerical statistics). Then I would recommend Fisher's least significant difference (LSD), which, I believe, is the easiest of all multiple comparison methods.
There is much debate about which "post hoc" test to use, how to use them, and about the usual misinterpretation of p-value arguments in general.
If you are going to use one of the common ones, most people say NOT to use LSD (the statistical test) because it does not adjust for the number of tests. This is why HSD came out, but there are several options, and people argue about when each is more or less appropriate. An older review article is Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology 46, 561–576. The reference I now use most for the topic is Bretz et al.'s book, Multiple Comparisons Using R. This also covers some of the more recent methods used when there are hundreds of comparisons (like in neuroimaging). The function p.adjust in R lets you enter the unadjusted p values and outputs the adjusted ones using the requested method.
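A minimal sketch of how p.adjust is used (the p values here are invented purely for illustration):

p <- c(0.001, 0.008, 0.020, 0.041, 0.300)  # unadjusted p values (made up)
p.adjust(p, method = "holm")               # Bonferroni-Holm adjustment (controls the fwer)
p.adjust(p, method = "BH")                 # Benjamini-Hochberg adjustment (controls the fdr)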
I recommend Tukey's post hoc test; it compares the means of all groups. It is the most popular ANOVA post hoc test and the most common in commercial software. It also depends on your motives. If you just want to get published and pass through review, then use Tukey. If you want something more, then you have to study the topic yourself.
These are all good answers. It's important to realise that your choice of test is a decision about how conservative you want to be in your search for differences. R is actually a great program for analysis because it forces you to learn exactly what every test is doing.
Tukey is a conservative test. It adjusts your alpha (which relates closely to your p-value) based on the number of groups you have. However, Tukey does this based on the maximum number of possible comparisons for a given number of groups. This might be more comparisons than it makes sense to make given your questions. If you do the same with Bonferroni, it is even more conservative; but with Bonferroni you can manually adjust alpha to correct only for the number of comparisons you actually want to make.
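For example, if you only plan m = 3 of the possible comparisons, the manually Bonferroni-adjusted level is alpha/m = 0.05/3 ≈ 0.017, and each of the three tests is judged against that level instead of 0.05.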
Either of these is acceptable to journals. Stay away from Fisher's LSD unless you manually adjust your p values to account for the multiple comparisons.
Are you conservative or liberal in your statistical testing philosophy? Use Bonferroni if you want to be very stringent; it is the standard correction under which it is hardest to reject null hypotheses, even false ones. Tukey is a good one too, as others have pointed out.
A homogeneity-of-variance test is applied to assess whether the data are homogeneous across groups. ANOVA is applied to assess whether there is a significant difference between the groups.
If homogeneity is achieved and the ANOVA shows significance, then the Bonferroni or Tukey test is applied as the post hoc test for multiple comparisons of groups.
If homogeneity is not achieved but the ANOVA shows significance, then the group values are log-transformed (not more than twice) and homogeneity and ANOVA are checked again; if homogeneity is achieved and the ANOVA shows significance for the transformed values, then the Bonferroni or Tukey test is used. If homogeneity is not achieved even after two log transformations, the Dunnett T3 test on the original values is performed as the post hoc test for multiple comparisons of groups.
The same methodology can be applied when multiple groups are compared against one particular group alone; there, Dunnett's test for the first or last group can be selected in place of Bonferroni/Tukey for homogeneous data, and Dunnett T3 for heterogeneous data.
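A rough sketch of this workflow in R (y and g are hypothetical response and grouping variables, g coded as a factor; Bartlett's test stands in for whichever homogeneity test you prefer):

bartlett.test(y ~ g)   # homogeneity of variance across groups
summary(aov(y ~ g))    # ANOVA on the original values
y.log <- log(y)        # if homogeneity fails: log-transform and re-check
bartlett.test(y.log ~ g)
summary(aov(y.log ~ g))
# Dunnett's T3 for heterogeneous variances is not in base R; it is
# available in contributed packages (e.g. PMCMRplus)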
As many people have said before me, Bonferroni is the strongest correction, and it is possible to lose significance even when the effect is true. I think the Tukey correction is very useful. But with ANOVA you only know whether there is any significant group; if you instead fit a regression model using as covariate a variable that defines the different groups, you know which group is significant and you will also be able to quantify that effect.
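A minimal sketch of this regression approach in R (y and group are hypothetical, group coded as a factor):

fit <- lm(y ~ group)  # equivalent to the one-way ANOVA model
summary(fit)          # each coefficient estimates a group's difference from the
                      # reference level, with its own p value

Note that the coefficients compare each group against the reference level only, and these p values are not adjusted for multiplicity.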
The question of which post-hoc test is appropriate seems to depend on the correlations between the outcomes in your data (at least in this study by Blakesley et al. 2009: http://www.ncbi.nlm.nih.gov/pubmed/19254098). They recommend "the Hochberg and Hommel methods... ...for mildly correlated outcomes and the step-down minP method... ...for highly correlated outcomes."
One last point worth thinking about: if you are looking at the effect of a treatment or some other intervention on a change induced by a disease model (e.g. blood pressure in control and spontaneously hypertensive rats [we'll call these 'group'] under baseline conditions and after a drug intervention [we'll call these 'treatment']), then the two-way ANOVA (or other linear model) itself can be used to look for evidence of an effect of your treatment. The way to do this is to look for a significant interaction effect between 'group' and 'treatment'.
This is quite a powerful test when you are looking at more than one thing that you expect to change some or other parameter (e.g. blood pressure) because it takes into account your initial conditions (i.e. we'd expect that a spontaneously hypertensive rat would have elevated blood pressure when compared to a control rat). Therefore, taking this into account gives you statistical power. That is to say, it tests for 'a difference in the difference'.
@Hossein: ANOVA is not a means to "compare means of more than two groups". ANOVA is a method to quantify the predictive value of a predictor (or a whole set of predictors) in a model. The predictors can be continuous and/or categorical, and the categorical predictors can be dichotomous or multinomial (i.e. they can have more than 2 different categories).
The term "post hoc test" has nothing to do with ANOVA (logically). It comes from the fact that a pooled variance estimate of all data is used for the particular test. Thus, the data of all (other) groups must be known before these tests can be conducted, the tests "comparing" two groups can only be performed after (= post hoc) the data of all the other groups is known. (Now there is a stupid practical relation between ANOVA and post-hoc tests: the pooled variance estimate is an intermeate result in the ANOVA-calculations, so programmes [lazy people] used it subsequently for post-hoc tests; this may have caused the impression or misconception that "post hoc" means "after ANOVA", and that an ANOVA must be performed and be "significant" before post-hoc tests may be performed. This is wrong.*)
What post-hoc test you should use depends on two things:
1) what (and how many) tests do you want to make? (e.g. all-pairwise? multiple-to-one? some selected tests? ...)
2) what error-rate do you want to control? (e.g. the test-wise error-rate (twer) or the family-wise error-rate (fwer) or the false-discovery-rate (fdr)?)
A test-wise control needs no further adjustment. You can perform the desired t-tests, just using the pooled standard error.
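In R, such unadjusted pooled-variance t-tests can be sketched like this (y and g are hypothetical data and grouping vectors):

pairwise.t.test(y, g, p.adjust.method = "none", pool.sd = TRUE)

Here pool.sd = TRUE uses the pooled standard deviation of all groups, and p.adjust.method = "none" leaves each test at its test-wise level.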
Most other typical post-hoc tests control the fwer (independent of an ANOVA!). For all-pairwise comparisons, Tukey's HSD is typically a good choice. When many treatments are all tested against the same control group, Dunnett's test is appropriate. For a small selection of tests from a large number of possible tests, the Bonferroni-Holm correction can be used to control the fwer.
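Minimal sketches in R (y and g are hypothetical, g a factor; the Dunnett example assumes the multcomp package is installed):

fit <- aov(y ~ g)
TukeyHSD(fit)                                    # all-pairwise comparisons, controls the fwer
library(multcomp)                                # assumed available
summary(glht(fit, linfct = mcp(g = "Dunnett")))  # many-to-one comparisons against the control
                                                 # (the first factor level is taken as control)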
The fdr is controlled with the Benjamini-Hochberg adjustment (method "BH" in R's p.adjust, mentioned above).
As Olli noted, correlated outcomes cause additional problems and difficulties. Actually, we have no tests that properly handle correlated outcomes. The correlations should instead be considered by the model in the first place (so that the residuals will be uncorrelated). Hierarchical or mixed-effects models can be a way to tackle these problems.
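A minimal sketch of such a mixed-effects model in R, assuming the lme4 package and hypothetical variable names (d is a data frame with one row per measurement):

library(lme4)
fit <- lmer(y ~ treatment + (1 | subject), data = d)  # random intercept per subject
summary(fit)                                          # absorbs the within-subject correlation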
@David: I suppose you are talking about a repeated-measures ANOVA? Considering only fixed effects, the same results can be obtained using only the differences (treated minus the respective control value); the "control group" is then completely omitted from the analysis, since its information is incorporated in the differences. The analysis of an interaction is something different. One usually talks about interactions when different treatment or predictive factors are concerned (the rat, in your example, would not be considered a treatment factor). An example is the analysis of the effect of a treatment under different conditions. Consider the treatment being the administration of a drug (yes/no), and the condition a genotype (wt/mut). So there are 4 combinations of treatment and genotype. A typical (and quite bad) approach is to test all kinds of differences between these 4 "groups" (drug:no vs. drug:yes in wt and also in mut; wt vs. mut in treated and also in untreated). If one wants to infer how much the mutation affects the reaction to the treatment, then the interaction of treatment and genotype should be analyzed.
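As a sketch of this in R (hypothetical data, with drug [yes/no] and genotype [wt/mut] coded as factors):

fit <- aov(y ~ drug * genotype)  # main effects plus their interaction
summary(fit)                     # the drug:genotype row tests whether the genotype
                                 # modifies the reaction to the drug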
*There is unfortunately a famous exception: Fisher's LSD test. Fisher developed the ANOVA, and he used a post-hoc test that did not control the family-wise error-rate [fwer]. Although Fisher was pretty much upset by the Neymanian concept of error rates, he (I think) proposed that if the tests are performed only after the ANOVA on the same values was "significant" at the same level, then the fwer was correctly controlled - and this only for the case that there are exactly 3 groups. There are some derivatives of this method, also for >3 groups, and they are called "ANOVA-protected post-hoc tests".
[Edit: typos; there will surely still be some left...]
@Jochen In fact, I wasn't talking about a mixed-effects model, though I agree that sticking a random effect in to account for variation within single animals is a good idea where appropriate. I also agree that comparing all four groups in my example is not the best approach. In fact, I think the situation you've described with genotype is analogous to what I was driving at with my example of the spontaneously hypertensive rat (SHR). You'll have to forgive my ad hoc statistical descriptions. I remain a statistically enthusiastic physiologist and not the opposite.
"If one wants to infer how much the mutation affects the reaction to the treatment, then the interaction of treatment and genotype should be analyzed."
Exactly, or indeed the inverse. We'd expect that the SHRs would have an elevated BP compared to control rats. If we administer a treatment that reverses that elevation in SHRs and has no effect on the BP of control rats then, unless I'm mistaken, that would manifest as a statistically significant interaction: the key thing being that the treatment produces changes in different directions in our SHRs and control rats.
What important issues should be considered to choose a proper and reliable post hoc test when the ANOVA shows a significant difference between groups? Is there any stepwise scientific algorithm to choose or reject a specific post hoc test in special design situations, or is it up to the researcher to select one?
I suggest the Duncan multiple comparison test because of its sensitivity to small differences among groups. If small differences are not important for you, you can use Tukey...
Easy to answer: Bonferroni or Tamhane, depending on the distribution of variance (homogeneous or not). You can use any other test, but never use more than ONE!
"What important issues should be considered to choose a proper and reliable post hoc test when the ANOVA shows a significant difference between groups?
Is there any stepwise scientific algorithm to choose or reject a specific post hoc test in special design situations, or is it up to the researcher to select one?"
The problem is not "one-dimensional", so there cannot be a unique or generally best solution. Each test has specific advantages and disadvantages under specific circumstances (and those are defined by the data and the scientific problem). In my opinion, the differences are small, if not to say mostly negligible. A more powerful test (good) may perform particularly badly when only a few groups are different (just to give an example). So a rationally good (or best?) choice of a test depends on many specifics of your data, the experimental design, the expected effects, the research questions, the relative importance of the kinds of errors, and surely much more. There is no simple answer.
However, I think that there are pragmatic answers that work reasonably well under most circumstances, and these are given above.
In addition to that, the new Duncan multiple range test is maybe another option, or using orthogonal contrasts in case you have to compare groups of treatments.
Dear Hossein. In line with many suggestions that have been made, I also prefer the Bonferroni test - at least as a "trial and error" test. For an intuitive description of this and further post-hoc tests, I recommend consulting the book by Andy Field, "Discovering Statistics Using SPSS". Best wishes, Dominik
The discussion above misses some major points, like whether post-hocs should be used as a routine. (The answer is "No".) This is not a new question; please see a very similar discussion at https://www.researchgate.net/post/Which_post_hoc_test_is_best !
If you get a significant difference in the ANOVA, you can adjust it with Tukey if you are making multiple comparisons. I use Tukey as it is conservative and the most widely used. You can use Dunnett if you are comparing a particular group with just the control. Depending upon the need, you can also perform contrasts.
There are many good points above and some less so (like the OK for Duncan and the double log transformations to achieve homogeneity).
In short, total reliance on post-hocs is a thing of the past, although often asked for. Effect size estimates and GLzM (General LineariZed Models) are the way forward. I have detailed this in an earlier thread, but it is now midnight and there is no time to hunt down that earlier comment.