When you compare the means of quantitative data (for example, blood pressure or heart rate) in more than two groups by an ANOVA test and get a significant difference (p < 0.05), which post-hoc test should be used to find the groups that differ?
Given your description, I understand you're interested in determining which specific groups are statistically different from each other. I therefore recommend the Tukey HSD test as a post-hoc.
First, I always use a box plot to make a visual comparison of the groups (at least as important as the numerical statistics). Then I would recommend Fisher's least significant difference (LSD), which, I believe, is the easiest of all multiple comparison methods.
There is much debate about which "post hoc" test to use, how to use them, and about the usual misinterpretation of p-value arguments in general.
If you are going to use one of the common ones, most people say NOT to use LSD (the statistical test) because it does not adjust for the number of tests. This is why HSD came out, but there are several options, and people argue about when each is more or less appropriate. An older review article is Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology 46, 561–576. The reference I now use most for the topic is Bretz et al.'s book, Multiple Comparisons Using R. This also covers some of the more recent methods used when there are hundreds of comparisons (like in neuroimaging). The function p.adjust in R lets you enter the unadjusted p values and outputs the adjusted ones using the requested method.
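A minimal sketch of how p.adjust is used (the p values here are invented purely for illustration):

p <- c(0.001, 0.008, 0.020, 0.041, 0.300)  # unadjusted p values (made up)
p.adjust(p, method = "holm")               # Bonferroni-Holm adjustment (controls the fwer)
p.adjust(p, method = "BH")                 # Benjamini-Hochberg adjustment (controls the fdr)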
I recommend Tukey's post hoc test; it compares the means of all groups. It is the most popular ANOVA post hoc test and the most common in commercial software. It also depends on your motives. If you just want to get published and pass through review, then use Tukey. If you want something more, then you have to study the topic yourself.
These are all good answers. It's important to realise that your choice of test is a decision about how conservative you want to be in your search for differences. R is actually a great program for analysis because it forces you to learn exactly what every test is doing.
Tukey is a conservative test. It adjusts your alpha (which relates closely to your p-value) based on the number of groups you have. However, Tukey does this based on the maximum number of possible comparisons for a given number of groups. This might be more comparisons than it makes sense to make given your questions. If you do the same with Bonferroni, it is even more conservative; but with Bonferroni you can manually adjust alpha to correct only for the number of comparisons you actually want to make.
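For example, if you only plan m = 3 of the possible comparisons, the manually Bonferroni-adjusted level is alpha/m = 0.05/3 ≈ 0.017, and each of the three tests is judged against that level instead of 0.05.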
Either of these is acceptable to journals. Stay away from Fisher's LSD unless you manually adjust your p values to account for the multiple comparisons.
Are you conservative or liberal in your statistical testing philosophy? Use Bonferroni if you want to be very stringent; it is the standard correction under which it is hardest to reject null hypotheses, even false ones. Tukey is a good one too, as others have pointed out.
A homogeneity-of-variance test is applied to assess whether the data are homogeneous across groups. ANOVA is applied to assess whether there is a significant difference between the groups.
If homogeneity is achieved and the ANOVA shows significance, then the Bonferroni or Tukey test is applied as the post hoc test for multiple comparisons of groups.
If homogeneity is not achieved but the ANOVA shows significance, then the group values are log-transformed (not more than twice) and homogeneity and ANOVA are checked again; if homogeneity is achieved and the ANOVA shows significance for the transformed values, then the Bonferroni or Tukey test is used. If homogeneity is not achieved even after two log transformations, the Dunnett T3 test on the original values is performed as the post hoc test for multiple comparisons of groups.
The same methodology can be applied when multiple groups are compared against one particular group alone; there, Dunnett's test for the first or last group can be selected in place of Bonferroni/Tukey for homogeneous data, and Dunnett T3 for heterogeneous data.
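A rough sketch of this workflow in R (y and g are hypothetical response and grouping variables, g coded as a factor; Bartlett's test stands in for whichever homogeneity test you prefer):

bartlett.test(y ~ g)   # homogeneity of variance across groups
summary(aov(y ~ g))    # ANOVA on the original values
y.log <- log(y)        # if homogeneity fails: log-transform and re-check
bartlett.test(y.log ~ g)
summary(aov(y.log ~ g))
# Dunnett's T3 for heterogeneous variances is not in base R; it is
# available in contributed packages (e.g. PMCMRplus)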
As many people have said before me, Bonferroni is the strongest correction, and it is possible to lose significance even when the effect is true. I think the Tukey correction is very useful. But with ANOVA you only know whether there is any significant group; if you instead fit a regression model using as covariate a variable that defines the different groups, you know which group is significant and you will also be able to quantify that effect.
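A minimal sketch of this regression approach in R (y and group are hypothetical, group coded as a factor):

fit <- lm(y ~ group)  # equivalent to the one-way ANOVA model
summary(fit)          # each coefficient estimates a group's difference from the
                      # reference level, with its own p value

Note that the coefficients compare each group against the reference level only, and these p values are not adjusted for multiplicity.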
The question of which post-hoc test is appropriate seems to depend on the correlations between the outcomes in your data (at least in this study by Blakesley et al. 2009: http://www.ncbi.nlm.nih.gov/pubmed/19254098). They recommend "the Hochberg and Hommel methods... ...for mildly correlated outcomes and the step-down minP method... ...for highly correlated outcomes."
One last point worth thinking about: if you are looking at the effect of a treatment or some other intervention on a change induced by a disease model (e.g. blood pressure in control and spontaneously hypertensive rats [we'll call these 'group'] under baseline conditions and after a drug intervention [we'll call these 'treatment']), then the two-way ANOVA (or other linear model) itself can be used to look for evidence of an effect of your treatment. The way to do this is to look for a significant interaction effect between 'group' and 'treatment'.
This is quite a powerful test when you are looking at more than one thing that you expect to change some or other parameter (e.g. blood pressure) because it takes into account your initial conditions (i.e. we'd expect that a spontaneously hypertensive rat would have elevated blood pressure when compared to a control rat). Therefore, taking this into account gives you statistical power. That is to say, it tests for 'a difference in the difference'.
@Hossein: ANOVA is not a means to "compare means of more than two groups". ANOVA is a method to quantify the predictive value of a predictor (or a whole set of predictors) in a model. The predictors can be continuous and/or categorical, and the categorical predictors can be dichotomous or multinomial (i.e. they can have more than 2 different categories).
The term "post hoc test" has nothing to do with ANOVA (logically). It comes from the fact that a pooled variance estimate of all data is used for the particular test. Thus, the data of all (other) groups must be known before these tests can be conducted, the tests "comparing" two groups can only be performed after (= post hoc) the data of all the other groups is known. (Now there is a stupid practical relation between ANOVA and post-hoc tests: the pooled variance estimate is an intermeate result in the ANOVA-calculations, so programmes [lazy people] used it subsequently for post-hoc tests; this may have caused the impression or misconception that "post hoc" means "after ANOVA", and that an ANOVA must be performed and be "significant" before post-hoc tests may be performed. This is wrong.*)
What post-hoc test you should use depends on two things:
1) what (and how many) tests do you want to make? (e.g. all-pairwise? multiple-to-one? some selected tests? ...)
2) what error-rate do you want to control? (e.g. the test-wise error-rate (twer) or the family-wise error-rate (fwer) or the false-discovery-rate (fdr)?)
A test-wise control needs no further adjustment. You can perform the desired t-tests, just using the pooled standard error.
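In R, such unadjusted pooled-variance t-tests can be sketched like this (y and g are hypothetical data and grouping vectors):

pairwise.t.test(y, g, p.adjust.method = "none", pool.sd = TRUE)

Here pool.sd = TRUE uses the pooled standard deviation of all groups, and p.adjust.method = "none" leaves each test at its test-wise level.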
Most other typical post-hoc tests control the fwer (independent of an ANOVA!). For all-pairwise comparisons, Tukey's HSD is typically a good choice. When many treatments are all tested against the same control group, Dunnett's test is appropriate. For a small selection of tests from a large number of possible tests, the Bonferroni-Holm correction can be used to control the fwer.
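Minimal sketches in R (y and g are hypothetical, g a factor; the Dunnett example assumes the multcomp package is installed):

fit <- aov(y ~ g)
TukeyHSD(fit)                                    # all-pairwise comparisons, controls the fwer
library(multcomp)                                # assumed available
summary(glht(fit, linfct = mcp(g = "Dunnett")))  # many-to-one comparisons against the control
                                                 # (the first factor level is taken as control)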
The fdr is controlled with the Benjamini-Hochberg adjustment (method "BH" in R's p.adjust, mentioned above).
As Olli noted, correlated outcomes cause additional problems and difficulties. Actually, we have no tests that properly handle correlated outcomes. The correlations should instead be considered by the model in the first place (so that the residuals will be uncorrelated). Hierarchical or mixed-effects models can be a way to tackle these problems.
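A minimal sketch of such a mixed-effects model in R, assuming the lme4 package and hypothetical variable names (d is a data frame with one row per measurement):

library(lme4)
fit <- lmer(y ~ treatment + (1 | subject), data = d)  # random intercept per subject
summary(fit)                                          # absorbs the within-subject correlation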
@David: I suppose you are talking about a repeated-measures ANOVA? Considering only fixed effects, the same results can be obtained using only the differences (treated minus the respective control value); the "control group" is then completely omitted from the analysis, since its information is incorporated in the differences. The analysis of an interaction is something different. One usually talks about interactions when different treatment or predictive factors are concerned (the rat, in your example, would not be considered a treatment factor). An example is the analysis of the effect of a treatment under different conditions. Consider the treatment being the administration of a drug (yes/no), and the condition a genotype (wt/mut). So there are 4 combinations of treatment and genotype. A typical (and quite bad) approach is to test all kinds of differences between these 4 "groups" (drug:no vs. drug:yes in wt and also in mut; wt vs. mut in treated and also in untreated). If one wants to infer how much the mutation affects the reaction to the treatment, then the interaction of treatment and genotype should be analyzed.
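As a sketch of this in R (hypothetical data, with drug [yes/no] and genotype [wt/mut] coded as factors):

fit <- aov(y ~ drug * genotype)  # main effects plus their interaction
summary(fit)                     # the drug:genotype row tests whether the genotype
                                 # modifies the reaction to the drug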
*There is unfortunately a famous exception: Fisher's LSD test. Fisher developed the ANOVA, and he used a post-hoc test that did not control the family-wise error-rate [fwer]. Although Fisher was pretty much upset by the Neymanian concept of error rates, he (I think) proposed that if the tests are performed only after the ANOVA on the same values was "significant" at the same level, then the fwer was correctly controlled - and this only for the case that there are exactly 3 groups. There are some derivatives of this method, also for >3 groups, and they are called "ANOVA-protected post-hoc tests".
[Edit: typos; there will surely still be some left...]
@Jochen In fact, I wasn't talking about a mixed-effects model, though I agree that sticking a random effect in to account for variation within single animals is a good idea where appropriate. I also agree that comparing all four groups in my example is not the best approach. In fact, I think the situation you've described with genotype is analogous to what I was driving at with my example of the spontaneously hypertensive rat (SHR). You'll have to forgive my ad hoc statistical descriptions. I remain a statistically enthusiastic physiologist and not the opposite.
"If one wants to infer how much the mutation affects the reaction to the treatment, then the interaction of treatment and genotype should be analyzed."
Exactly, or indeed the inverse. We'd expect that the SHRs would have an elevated BP compared to control rats. If we administer a treatment that reverses that elevation in SHRs and has no effect on the BP of control rats then, unless I'm mistaken, that would manifest as a statistically significant interaction: the key thing being that the treatment produces changes in different directions in our SHRs and control rats.
What important issues should be considered to choose a proper and reliable post hoc test when the ANOVA shows a significant difference between groups? Is there any stepwise scientific algorithm to choose or reject a specific post hoc test in special design situations, or is it up to the researcher to select one?
I suggest the Duncan multiple comparison test because of its sensitivity to small differences among groups. If small differences are not important for you, you can use Tukey...
Easy to answer: Bonferroni or Tamhane, depending on the distribution of variance (homogeneous or not). You can use any other test, but never use more than ONE!
"What important issues should be considered to choose a proper and reliable post hoc test when the ANOVA shows a significant difference between groups?
Is there any stepwise scientific algorithm to choose or reject a specific post hoc test in special design situations, or is it up to the researcher to select one?"
The problem is not "one-dimensional", so there cannot be a unique or generally best solution. Each test has specific advantages and disadvantages under specific circumstances (and those are defined by the data and the scientific problem). In my opinion, the differences are small, if not to say mostly negligible. A more powerful test (good) may perform particularly badly when only a few groups are different (just to give an example). So a rationally good (or best?) choice of a test depends on many specifics of your data, the experimental design, the expected effects, the research questions, the relative importance of the kinds of errors, and surely much more. There is no simple answer.
However, I think that there are pragmatic answers that work reasonably well under most circumstances, and these are given above.
In addition to that, the new Duncan multiple range test is maybe another option, or using orthogonal contrasts in case you have to compare groups of treatments.
Dear Hossein. In line with many suggestions that have been made, I also prefer the Bonferroni test - at least as a "trial and error" test. For an intuitive description of this and further post-hoc tests, I recommend consulting the book by Andy Field, "Discovering Statistics Using SPSS". Best wishes, Dominik
The discussion above misses some major points, like whether post-hocs should be used as a routine. (The answer is "No".) This is not a new question; please see a very similar discussion at https://www.researchgate.net/post/Which_post_hoc_test_is_best !
If you get a significant difference in the ANOVA, you can adjust it with Tukey if you are making multiple comparisons. I use Tukey as it is conservative and the most widely used. You can use Dunnett if you are comparing a particular group with just the control. Depending upon the need, you can also perform contrasts.
There are many good points above and some less so (like the OK for Duncan and the double log transformations to achieve homogeneity).
In short, total reliance on post-hocs is a thing of the past, although often asked for. Effect size estimates and GLzM (General LineariZed Models) are the way forward. I have detailed this in an earlier thread, but it is now midnight and there is no time to hunt down that earlier comment.