In statistical analysis, to find significant differences, you must use post hoc tests such as Tukey, LSD, Bonferroni, etc. Which of these tests is better?
While many relevant comments have already been made above on the problems and techniques of post hoc tests (and indeed there are many!), the real problem is that they are often applied to designs that are not suitable for this kind of analysis.
This is very often done purely by routine or tradition, as correctly captured by the opening question: "In statistical analysis, to find significant differences, you must use post hoc tests".
Post hoc tests are suitable for data collected in "fishing expeditions": experiments or observations where there is no clear hypothesis beforehand about which treatments should be compared, so all treatments must be compared with each other.
"Run the flag and see who salutes!"
When, in contrast (as is often the case), a clear design or hypothesis was built into the experiment before the data were collected, a post hoc test is illogical. This is best illustrated by a dose-response study, where a factorial ANOVA or regression is both appropriate and sufficient for the analysis (see the sketch after this note). Adding a post hoc test could be seen as an indication that the authors did not in fact appreciate that a dose-response design was used, but were just testing arbitrary doses without any logical order.
This is a quick note; please note that references could be given, as could alternatives for control-versus-treatment comparisons, effect sizes, etc. I hope the logic itself will be sufficient for now.
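To make the dose-response point concrete, here is a minimal sketch (assuming SciPy is available; the doses and responses are made up) of how a single regression on the ordered doses replaces an all-pairs post hoc comparison:

```python
# Minimal sketch: testing a dose-response trend with regression instead of
# all-pairs post hoc comparisons. The doses and responses are made up.
import numpy as np
from scipy import stats

dose = np.repeat([0, 1, 2, 4, 8], 5)                        # five dose levels, n = 5 each
rng = np.random.default_rng(0)
response = 10 + 0.8 * dose + rng.normal(0, 2, dose.size)    # simulated outcome

# A single test of the linear trend uses the ordering of the doses,
# which an all-pairs post hoc procedure ignores.
result = stats.linregress(dose, response)
print(f"slope = {result.slope:.2f}, p = {result.pvalue:.4f}")
```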
It is a matter of opinion which may be better or worse; but do keep in mind that any post hoc test is suspect, for it attempts to test each pair of groups individually, t-test style, and that flies in the face of the very reason you applied an ANOVA in the first place.
I always use a one-way ANOVA coupled with Tukey's post hoc test. It works well depending on the number of groups you are handling. Most importantly, it is key to understand that the power advantage of the Tukey test depends on the assumption that all possible pairwise comparisons are being made.
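For anyone who wants to try that workflow, here is a minimal sketch (assuming SciPy and statsmodels are available; the group labels and measurements are invented) of a one-way ANOVA followed by Tukey's HSD:

```python
# Minimal sketch of the one-way ANOVA + Tukey HSD workflow described above.
# The group labels and measurements are made up for illustration.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
groups = np.repeat(["A", "B", "C"], 10)
values = np.concatenate([rng.normal(m, 1.0, 10) for m in (5.0, 5.5, 7.0)])

# Omnibus one-way ANOVA across the three groups
f_stat, p_omnibus = stats.f_oneway(*(values[groups == g] for g in "ABC"))
print(f"ANOVA: F = {f_stat:.2f}, p = {p_omnibus:.4f}")

# Tukey HSD for all pairwise comparisons, controlling the family-wise error rate
print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```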
Luisiana, in the first paper you linked, the authors state in the section "Problems", about an example where a set of 20 variables was tested: "We can say that the two groups are not equal for all 20 variables, but we cannot say which, or even how many, variables differ."
I think this is wrong. In contrast to rejecting the omnibus H0 of an ANOVA, the rejection of *individual* H0s (with Bonferroni adjustment) does allow you to make decisions about these H0s directly. You use Bonferroni adjustments to ensure a family-wise type-I error rate, and this rate *is* kept (being a little too conservative, but that is better than being unexpectedly liberal). You *do* decide which individual H0s you reject while controlling P(at least one false rejection) < alpha. Can you comment on this?
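To illustrate the point, here is a minimal sketch (assuming statsmodels is available; the raw p-values are made up): after a Bonferroni adjustment, each individual hypothesis still gets its own accept/reject decision while the family-wise error rate is controlled.

```python
# Minimal sketch of the point above: after a Bonferroni adjustment you can still
# say *which* individual hypotheses are rejected while keeping the family-wise
# type-I error rate at alpha. The raw p-values below are made up.
from statsmodels.stats.multitest import multipletests

raw_pvalues = [0.001, 0.004, 0.030, 0.120, 0.450]   # one per hypothesis
reject, p_adjusted, _, _ = multipletests(raw_pvalues, alpha=0.05,
                                         method="bonferroni")

for i, (p, r) in enumerate(zip(p_adjusted, reject), start=1):
    print(f"H0 #{i}: adjusted p = {p:.3f}, rejected = {r}")
```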
Two paragraphs further, the authors state, regarding type-II errors: "In clinical practice, if a high concentration of creatine kinase were considered compatible with “no myocardial infarction” by virtue of a Bonferroni adjustment, ..."
Again, I think this is a bad mistake. They interpret a non-significant result as indicating the absence of an effect or difference. This is nonsense as long as the experiment was not designed to have a prespecified power (but then, obviously, their "lack-of-power" argument no longer applies!). I know that such mistakes are frequently made in medical papers, but a paper about the use of statistics should not repeat this mistake, nor suggest that such interpretations are acceptable simply because many others commit this error. What do you think about this?
I haven't read further because I think it would be a waste of time. But maybe you will ease my doubts.
LSD controls the FWER only if the ANOVA is significant and there are exactly three groups. For more groups, LSD does not control the FWER, or at best only in a "weak sense" (too many false positives accumulate when the omnibus H0 is false).
Also, I noted that Tukey's honestly significant difference test uses the Studentized range statistic to make all pairwise comparisons between groups. The LSD test, on the other hand, is equivalent to multiple t-tests between all pairs of groups; it does not control the overall probability of declaring some pairs of means different when in fact they are equal.
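To make the difference concrete, here is a minimal sketch (assuming a recent SciPy with scipy.stats.studentized_range; the MSE, group size, and group count are made up) comparing the two critical differences for equal group sizes:

```python
# Minimal sketch of the contrast drawn above: the Tukey HSD critical difference
# is based on the Studentized range, while the LSD critical difference is just
# a two-sample t threshold. MSE, group size, and group count are made up.
from math import sqrt
from scipy.stats import studentized_range, t

k, n, mse = 5, 10, 4.0          # number of groups, per-group n, ANOVA mean square error
df_error = k * (n - 1)          # error degrees of freedom
alpha = 0.05

q_crit = studentized_range.ppf(1 - alpha, k, df_error)   # Studentized range quantile
t_crit = t.ppf(1 - alpha / 2, df_error)                   # ordinary t quantile

hsd = q_crit * sqrt(mse / n)          # Tukey's honestly significant difference
lsd = t_crit * sqrt(2 * mse / n)      # Fisher's least significant difference

print(f"HSD = {hsd:.2f}  vs  LSD = {lsd:.2f}")   # HSD is larger, i.e. more conservative
```

With these numbers the HSD threshold comes out noticeably larger than the LSD threshold, which is exactly how Tukey's procedure keeps the family-wise error rate down.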
I think all the varying comments made so far exemplify the original assertion: it is a matter of opinion, including the degree of belief one places in any such post hoc test.
In my experience (which includes small-n tests involving animals), I always tend toward the more conservative option, for the inherent variability in biological specimens and testing is enormous; hence, the data must show a very strong tendency before I would believe any post hoc test.
No test is "best". Each has its own advantages and disadvantages, and we do statistics precisely because we do not know the truth. If we knew the true situation, we would know the best test, but we wouldn't need it. If you want higher power against a certain kind of alternative, you have to pay for it with less power against others (at a given significance level). There are no free lunches.
For sure, post hoc tests are a matter of opinion, but in the same way I think there is no simple answer to the question if we have no information on the dataset and the study design, for instance. Each of the post hoc tests mentioned above has its own advantages, disadvantages, assumptions, and limitations. If we know nothing about the data, how they were collected, and which comparisons are to be performed, I think no answer to this question is possible.
I feel the best post hoc test is Tukey's HSD (honestly significant difference), but only for parametric analyses in which you are confident that your sample comes from a normally distributed population. However, it won't be the best test for non-parametric analyses; in that case, you first have to run a test of homogeneity of variance (a minimal sketch of such checks follows below).
In this context people also normalize the data, etc. Therefore, as to which test is best, I would say it depends on your choice: what you want to test, and thus the kind of aim/objective you have.
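As a minimal sketch of those assumption checks (assuming SciPy is available; the data are made up), one might run Shapiro-Wilk within each group and Levene's test across groups:

```python
# Minimal sketch of the assumption checks mentioned above: Shapiro-Wilk for
# normality and Levene's test for homogeneity of variance. The data are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(5.0, 1.0, 12)
group_b = rng.normal(6.0, 1.0, 12)
group_c = rng.normal(6.5, 2.5, 12)   # deliberately more variable

# Normality check within each group
for name, g in zip("ABC", (group_a, group_b, group_c)):
    w, p = stats.shapiro(g)
    print(f"Shapiro-Wilk group {name}: p = {p:.3f}")

# Homogeneity of variance across groups
stat, p = stats.levene(group_a, group_b, group_c)
print(f"Levene: p = {p:.3f}  (small p suggests unequal variances)")
```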
I think it is a matter of preference, because many tests have the same characteristics and requirements and their results are generally similar. I prefer Tukey's HSD (honestly significant difference) and Scheffé tests because they are more conservative (smaller type-I error). I use Tukey when all treatments have the same number of observations and Scheffé when they don't.
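For the unequal-n case, here is a minimal sketch of the standard Scheffé criterion for a single pairwise comparison (assuming SciPy is available; all the numbers are made up):

```python
# Minimal sketch of the Scheffé criterion for a pairwise comparison with unequal
# group sizes, which is the situation described above. All numbers are made up.
from math import sqrt
from scipy.stats import f

k = 4                      # number of treatment groups
n_i, n_j = 8, 12           # unequal sample sizes for the two groups compared
n_total = 38               # total observations across all groups
mse = 3.5                  # ANOVA mean square error
alpha = 0.05

f_crit = f.ppf(1 - alpha, k - 1, n_total - k)
se_diff = sqrt(mse * (1 / n_i + 1 / n_j))

# A pair of means is declared different if |mean_i - mean_j| exceeds this value
scheffe_critical_difference = sqrt((k - 1) * f_crit) * se_diff
print(f"Scheffé critical difference = {scheffe_critical_difference:.2f}")
```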