Post hoc tests seek to control the Type I error rate, so you wouldn't expect significant results if the ANOVA was not significant. However, it depends on which post hoc test you are using. The more liberal tests might allow this to happen sometimes because they are designed to be more sensitive in detecting differences between pairs.
You are not supposed to run post hoc tests if the original ANOVA was negative.
I've seen contrasts on data where you have, say, 5-6 groups in an ANOVA and you test each single group against the remaining groups. By doing that, you can get "significant" differences even when the ANOVA says there are none. I would also expect that if one of the groups had a really large variance and the others had really small variances, the ANOVA could say there is no difference while the post hoc tests say there is.
I agree with Peter. It seems hard to believe that you'd get a non-significant P value after ANOVA and then find a significant P value between any of the groups.
Thanks to all the scholars who answered. So, what about when the opposite occurs: the ANOVA is significant, but the post hoc tests do not show significant differences among the groups?
Huh, that would be equally strange; however, it could be a function of the post hoc test you are using. Try a few different tests and see whether your finding is consistent.
I was curious how often ANOVA and post hocs give different results. The attached file gives some results. In case anyone wants to play with these, I will add a second comment with the LaTeX source, but you need to have knitr set up (and working) for your LaTeX editor.
\section*{How Often do ANOVA and \emph{Post Hocs} produce different results?}
This was a good question and it got me thinking about how often they would produce different results. This will depend on a lot of things, but I decided to look at just one situation ($n=100$, 5 groups, all normally distributed with $\mu=0$ and $\sigma=1$). I'll use Bonferroni's adjustment and no adjustment, since those are at the extremes of \emph{post hoc} tests. I'll use \textbf{R} and \textbf{knitr} so all my code can be seen in an efficient manner (and so anyone can alter it how they like \dots if you re-type it, note that ResearchGate only wants \texttt{pdf}s as attachments).
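For readers who do not open the attachment, a minimal sketch of this kind of simulation could look like the chunk below (the seed and the number of replicates are arbitrary, and the attached code may differ in its details):

<<anova-vs-posthoc, cache=TRUE>>=
set.seed(101)                        # arbitrary seed for reproducibility
nsim <- 1000
onerun <- function() {
  g <- factor(rep(1:5, each = 20))   # 5 groups, n = 100 in total
  y <- rnorm(100)                    # H0 true: mu = 0, sigma = 1 in every group
  pF    <- summary(aov(y ~ g))[[1]][1, "Pr(>F)"]
  pNone <- min(pairwise.t.test(y, g, p.adjust.method = "none")$p.value, na.rm = TRUE)
  pBonf <- min(pairwise.t.test(y, g, p.adjust.method = "bonferroni")$p.value, na.rm = TRUE)
  c(F = pF < 0.05, none = pNone < 0.05, bonf = pBonf < 0.05)
}
res <- replicate(nsim, onerun())
rowMeans(res)                        # overall rejection rate of each procedure
mean(res["none", ] & !res["F", ])    # some unadjusted pair significant while the omnibus F is not
mean(res["bonf", ] & !res["F", ])    # some Bonferroni pair significant while the omnibus F is not
@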
To keep things simple, let's talk about one-way (between-Ss) ANOVA only. In that case, the null hypothesis for the omnibus F-test states that mu1 = mu2 = ... = muk, where k = the number of groups. There are a host of other contrasts you can carry out that test very different null hypotheses--and these are not limited to pair-wise contrasts. For example, they could be contrasts looking for polynomial trends, some other set of orthogonal contrasts, each treatment versus a control, etc. So it is entirely possible to have situations where you reject H0 for the omnibus test, but not for some other contrast, and vice versa. See the links below for further discussion & examples. And note that with the exception of Fisher's LSD, none of the most commonly used multiple comparison methods require a significant omnibus F-test before proceeding. (The second link below includes lots of discussion of this point.)
Finally, bear in mind that the omnibus F-test has numerator df = k-1, whereas any single contrast you carry out will have numerator df = 1, and that this difference can be important when the p-values are close to the predetermined alpha level.
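As a small R illustration of both points (toy data; the seed and the particular planned contrast are my own choices, not something from the thread): the omnibus test below has k-1 = 4 numerator df, while the planned contrast of group 1 against the remaining groups has 1 numerator df, and the two can lead to different decisions.

set.seed(42)                                               # arbitrary toy example
g <- factor(rep(1:5, each = 20))
y <- rnorm(100) + (g == "1") * 0.6                         # only group 1 is shifted
contrasts(g) <- cbind(g1.vs.rest = c(4, -1, -1, -1, -1))   # planned contrast; R completes the remaining columns

fit <- aov(y ~ g)
summary(fit)                                               # omnibus F test, numerator df = 4
summary(fit, split = list(g = list("g1 vs rest" = 1)))     # the single 1-df planned contrast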
I just found a link for an "old version" of Dave Howell's nice chapter on multiple comparison procedures in his book Statistical Methods for Psychology (see below). Here's what Howell says about it:
"In the text I said that I would provide a copy of an older chapter that covers a wider variety of multiple comparison techniques in greater detail. If you really want that stuff, here it is. Much of the chapter is similar to the one in the 8th edition, but there is a lot that has been left out of the new version."
Post hoc tests are designed for situations in which the researcher has already obtained a significant omnibus F-test with a factor that consists of three or more means and additional exploration of the differences among means is needed to provide accurate information on which means are significantly different from each other.
A significant F test does not, however, tell us which pairs of means are significantly different from one another. To determine the exact nature of the relationship between the independent and dependent variables, we need post hoc tests.
There is some controversy about the accuracy of post hoc tests; a frequent choice among nurse researchers is Tukey's test.
Post hoc tests should be done after the full ANOVA.
The advantage of these comparisons is that they increase the power and precision of the data analysis.
This is from: Statistical and Data Analysis for Nursing Research, Denise F. Polit, 2nd ed.
In my experience, I never run the omnibus test; it really seems useless to me. I prefer to start directly with a post hoc test with Tukey lettering, carefully setting the familywise error rate. Furthermore, whenever possible, I comment on the results using the adjusted p-values.
In response to the question "In what cases could ANOVA and post hoc results differ?", I say it happens in relation to the choice of the familywise error rate. In my experimental work I must assume that it never happens, even though it could happen with some unknown probability.
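A minimal sketch of that workflow in R, assuming the multcomp add-on package (the data, variable names, and alpha level below are invented for illustration):

library(multcomp)                                 # assumed add-on package
set.seed(1)                                       # toy data, made up for illustration
d <- data.frame(g = factor(rep(1:3, each = 12)),
                y = rnorm(36, mean = rep(c(0, 0.8, 1.2), each = 12)))
fit <- aov(y ~ g, data = d)
mc  <- glht(fit, linfct = mcp(g = "Tukey"))       # all pairwise comparisons
summary(mc)                                       # familywise-adjusted p-values
cld(mc, level = 0.05)                             # compact letter display ("Tukey lettering")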
Post-hoc tests that control the experimentwise error rate do not require an omnibus F test to be done beforehand. I do not use a post-hoc test after a non-significant ANOVA F test. It is important to do some exploratory data analysis first so that the user better understands the underlying data in the experiment. Consider also running a rank-based ANOVA F test or similar, and compare the results for the normal-based and rank-based F tests. Are they similar or not?
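A minimal R sketch of that normal-based versus rank-based comparison (the data below are invented; skewed responses are used so that the two tests have a chance to disagree):

set.seed(1)                                 # arbitrary toy example
g <- factor(rep(1:4, each = 15))            # four groups of 15
y <- rexp(60)                               # skewed responses
oneway.test(y ~ g, var.equal = TRUE)        # classical (normal-theory) one-way F test
kruskal.test(y ~ g)                         # rank-based Kruskal-Wallis test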
I typically cover several post-hoc tests in an intro biostatistics course.
1. Tests that students should learn about and then avoid using! Here, the error rate is comparisonwise (multiple t-tests, Fisher's LSD).
2. Tests that control the stagewise error rate (Duncan's MRT, the Student-Newman-Keuls test). They are compromises, and I advise students not to use them either. Duncan's test is quite liberal. It is popular in soil science research; maybe they need a very powerful test at the expense of a high Type I error rate? The SNK test lacks power.
3. Tests that control the experimentwise error rate: Tukey's test is a good one. The Bonferroni method is a great idea for easily explaining the adjustment to consulting clients, but in the end it is a conservative test with lower power than Tukey's test. Dunnett's test is for comparing the mean of a control group with the experimental groups. Scheffé's multiple contrasts are useful when you have many hypotheses to test, but the procedure is very conservative; I advise instead using many contrasts and then modifying the significance level with the Bonferroni method or similar. (A short R sketch of some of these follows this list.)
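A short R sketch of a few of these, using base R only (the toy data are invented; Dunnett's and Scheffé's procedures would need add-on packages such as multcomp or DescTools):

set.seed(1)                                            # invented toy data
g <- factor(rep(LETTERS[1:4], each = 10))
y <- rnorm(40, mean = rep(c(0, 0, 0.5, 1), each = 10))
fit <- aov(y ~ g)

TukeyHSD(fit)                                          # Tukey's HSD: experimentwise control
pairwise.t.test(y, g, p.adjust.method = "bonferroni")  # Bonferroni: experimentwise, more conservative
pairwise.t.test(y, g, p.adjust.method = "none")        # unadjusted t tests: comparisonwise only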
There are some points to think about. I hope that the above material is helpful to you, Professor Muayyad.
Raid, re your first bullet point, note that Fisher's LSD does maintain perfect control over the family-wise error rate (FWER) when there are 3 groups. Dave Howell discusses this in the chapter I attached to another post in the thread. Thom Baguley also discusses it in his Serious Stats book, IIRC. Cheers!
Yes, Bruce, this is true. You may have pointed this out to me in some other thread. When teaching students, I prefer to stay away from such special cases as students tend to forget. Of course, I said what I said above as being "what I teach in my classes". It is not meant for this forum of scholars.
In descriptive statistics, any set of data must be described with a parameter of central tendency (the mean) and a parameter of dispersion (the standard deviation or standard error), because one of them alone is not enough to describe the data. For example, the two data sets (1, 5, 54) and (18, 20, 22) have the same mean of 20 but different variation and range among their values.
In inferential statistics, the test of significant differences among a set of means using Duncan's multiple range test or the revised LSD includes the values of the means, the variation between means, and the variation within groups (MSE). So I confirm the comments of the other colleagues that we can perform a post hoc test without the ANOVA, and we can use a pooled variance instead of the MSE.
The ANOVA and the multiple comparison procedures are different tests, and in some cases they give us different decisions, such as a non-significant ANOVA F test (on the variance ratio) alongside significant differences between means from Duncan's test.
The ANOVA F test tests for the equality of all population means. The test uses all of the information on the variability across the samples to arrive at a test that can detect differences between the means. Using a post-hoc test may result in a different level of control of the Type I error rate, and this can affect the power of the test. Both types of tests compare population means.
Duncan's test is known for having a very large Type I error rate, Khalid. I would not use it. Instead, use Tukey's test.
Duncan's Test used to be a very popular post-hoc test due to its high statistical power at the expense of a very high significance level. It controls the stagewise error rate.
We are discussing the validity of the process when we face a situation with a non-significant F test in the ANOVA and a significant post hoc test. Some researchers call it a problem or a mistake in the analysis, and others say we must stop when we find a non-significant ANOVA and not perform post hoc tests.
The two tests can give different significance results even when using good statistical software such as SAS or SPSS.
When different levels of significance are used, we cannot compare the outcomes of the ANOVA F test with Duncan's test. If the F test uses alpha = 0.05 while Duncan's test is effectively using an alpha much greater than 0.05, no comparison can be made here.
I have attached an example (SAS). While the F test has an experimentwise error rate of 0.05 and has a p-value = 0.08 > 0.05, Duncan's test does NOT control an experimentwise error rate. It has a comparisonwise error rate that depends on how many comparisons are made and on the ordering of the sample means. SAS shows significant differences. This does not mean that the two tests have conflicting outcomes to me.
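For anyone who wants to try the same kind of comparison without SAS, here is a rough R sketch; the data are invented (not the attached SAS example), and Duncan's test is assumed to come from the agricolae add-on package:

library(agricolae)                          # assumed add-on package providing duncan.test()
set.seed(7)                                 # invented toy data
d <- data.frame(trt = factor(rep(1:5, each = 6)),
                y   = rnorm(30, mean = rep(c(0, 0, 0, 0.6, 0.9), each = 6)))
fit <- aov(y ~ trt, data = d)
summary(fit)                                # omnibus F test, experimentwise alpha = 0.05
duncan.test(fit, "trt", console = TRUE)     # Duncan's MRT: comparisonwise error rate
HSD.test(fit, "trt", console = TRUE)        # Tukey's HSD, for comparison: experimentwise control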