I got significant results from an ANOVA test, but when I applied the post hoc Tukey test, I obtained non-significant results. I cannot work out why.
ANOVA and post hoc tests have nothing to do with each other!
ANOVA tests the "explanatory value" of a predictor in a model (typically a factor with more than two levels). It answers the question of how likely the observed reduction in the residual variance would be if the presumed predictor had no association with the response.
Post hoc tests are a priori unspecified tests, so to speak tests chosen *after seeing* the data, deciding only then which comparisons might be interesting. They use pooled variance estimates and control the FWER or the FDR for the family of tests. There are a few special cases of post hoc tests that do not efficiently control the FWER on their own; they need a kind of "protection" by a "significant" ANOVA. The most famous example is Fisher's LSD, but this is restricted to 3 groups. Tukey's test does not require any such protection, and it keeps the FWER.
In general, ANOVA and post hoc tests answer considerably different questions. Your observation is thus not puzzling at all.
I agree with Jochen; these are different tests with different aims. I have even seen studies where the researchers used Duncan's test or LSD after the ANOVA showed no significant result, yet the pairwise comparisons were significant for LSD and Duncan.
ANOVA is, in words, "the variance of the means divided by the mean of the treatment variances, multiplied by the number of replications" (why multiplication? work it out for yourself). But different comparison methods use different thresholds for declaring significance (I bet that if you use LSD or Duncan there will be significant differences). The logic behind these tests is error control (Type I and Type II), and the choice depends on your aim and partly on your data.
I also agree with Jochen. This is just the idea of a post hoc test. In my opinion, a really strange situation arises when the ANOVA or Kruskal-Wallis result is non-significant but the post hoc tests are significant.
Dear Malgorzata, you can try that: perform an ANOVA on a dataset that results in a non-significant ANOVA with a p-value of around 0.1 or 0.07, and you will see that there may be some significant differences with the two methods I mentioned above (LSD, Duncan). Keeping the p-values constant and adding more groups, the number of significant post hoc tests will increase.
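For anyone who wants to try this concretely, here is a minimal simulation sketch in Python (scipy). Duncan's test is not readily available in scipy/statsmodels, so unadjusted pairwise t-tests (the "unprotected" LSD idea) are used as a stand-in; the group count, sample size and number of simulations are arbitrary choices, not anything from this thread.

```python
# Minimal simulation sketch: all groups come from the SAME population,
# so every "significant" pairwise difference is a false positive.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_run(k=6, n=10):
    groups = [rng.normal(0, 1, n) for _ in range(k)]
    p_anova = stats.f_oneway(*groups).pvalue
    # unadjusted pairwise t-tests = "unprotected" LSD-style comparisons
    p_pairs = [stats.ttest_ind(a, b).pvalue for a, b in combinations(groups, 2)]
    return p_anova, min(p_pairs)

n_sim = 2000
results = [one_run() for _ in range(n_sim)]
any_pair_sig = np.mean([pm < 0.05 for _, pm in results])
anova_ns_but_pair_sig = np.mean([(pa > 0.05) and (pm < 0.05) for pa, pm in results])

print(f"P(at least one 'significant' unadjusted pair)        ~ {any_pair_sig:.2f}")
print(f"P(ANOVA n.s. but some unadjusted pair 'significant')  ~ {anova_ns_but_pair_sig:.2f}")
# With k = 6 groups (15 pairwise tests) the first rate is typically far above 0.05,
# which is exactly the inflated family-wise error rate discussed above.
```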
Dear Ehsan,
Thanks! I have had such observations and thought there was something wrong with my script for computing the ANOVA. Good to know you have seen the same thing.
This is called a "weak control" (of the FWER): once the Anova is significant, one collects too many false-positives within the family. So either you get nothing (Anova=n.s.) or a "clustered" set of false-positives (among true positives, hopefully). There is only one special case where this "clustering" won't distroy the control of the FWER, namely when you have exactely 3 groups (or 3 post-hoc tests) (so this is the case where Fisher's LSD really controls the FWER).
Thank you everyone for all the convincing answers.
Could you suggest some literature, so that I would be in a better position to justify my results?
Thank you once again.
Umesh, justification of the results depends mostly on your aim and your constraints (especially financial ones). For example, if you find a significant ANOVA, you choose the post hoc method based on how you want to control the Type I and Type II errors. A significant post hoc result may lead, for instance, to changing some methods in your work. In health science, you may find better healing results using some antibiotics. If the two new antibiotics are more expensive than the old ones and they heal your patients better, you should control the Type I error as much as you can, because any significant result obtained by chance has negative financial consequences but no important effect on healing.
I hope you have a good grasp of these two types of errors, because they are very important in the health and social sciences.
Why not try the idea of model selection (GLM: BIC, CAIC, ...) just using dummies for all possible contrasts? Maybe with a non-identity link?
Dear Ivan, you've mentioned a good point, but in regression you cannot control the error types, as far as I know. For example, if we have 12 groups to compare, we can simply control the FWER using Tukey, Sidak, Bonferroni, etc. at different levels, and if the groups come from the same population, you have a very low Type I error. But using regression and the criteria you mentioned, there is still a "chance" for any of the dummy variables to enter the model even if the groups come from the same population, unless you lower the entry p-value criterion (for example, using 0.03 instead of 0.05 for variable entry). So this can be somewhat misleading.
Decades ago, when SPSS, Minitab, etc. were not yet around, ANOVA was a prerequisite to Duncan's, Tukey's and other multiple comparison tests. If the ANOVA showed no significant difference, that meant there was no significant difference between the variables in any of the pairs (an ANOVA involves at least 3 pairs to compare, unlike a t-test, which has only one pair), and we stopped there. If the ANOVA said there was a significant difference, we proceeded to a multiple comparison test like Duncan's. The multiple comparison test finds out (like a t-test) whether there is a significant difference between the two values in each of the pairs (an ANOVA has at least 3 pairs).
These days, the computer can compute the F-test for the ANOVA and the multiple comparison tests at the same time, making the ANOVA irrelevant for the purposes of the multiple comparison test. Take note that the ANOVA does not tell you whether all pairs (at least 3 pairs) show a significant difference; it only tells you that at least one pair does. To find out, based on the ANOVA result, which of the pairs show a significant difference (pair number 1, or all the pairs), we proceed to multiple comparison tests, and the t-test is one of them.
Dear Eddie, don't forget that ANOVA is a prerequisite test before multiple comparisons. The more treatments, the more important the ANOVA becomes.
I agree, Ehsan. The purpose of that is not to waste time testing differences in the many pairs (at least 3 pairs) that would not be significant after all. But SPSS et al. compute the ANOVA and the multiple tests together in just two seconds; even if the multiple comparison tests show no significance, we waste only two seconds.
As to the observation of Umesh, there is nothing surprising there. It is a normal occurrence. In an ANOVA over ten pairs, a significant difference could mean a significant difference in one pair but no significant difference in as many as 9 of the pairs. Many researchers (and even statisticians) overlook this point. Ed
I would just like to comment on those who downvoted my first answer. If this forum were an opinion section, I would not mind; that would be your own opinion. But it is not. This is a forum to clarify issues and to correct wrong ideas. Downvoting a correct, authority-based answer simply because it does not jibe with your long-held but incorrect concept will further strengthen and multiply mistakes. I hope you review your statistics very well so that mistakes will not be passed on to researchers who seek help on ResearchGate.
@Eddie, time saving is not the aim of ANOVA. When the ANOVA shows no significant result but you have significant comparisons, especially with a high number of treatments, your Type I error is very high for LSD and Duncan. But be aware that all the numbers in the ANOVA table talk to you; if you don't understand their language, you lose some key points in the analysis.
Thank you, Fausto. As a matter of opinion, I am upvoting your answer. To be honest, throughout my stay on RG I have downvoted only once, as a matter of opinion. Even when I don't know the answer (since it is new) or whether it is clarificatory, my tendency is to upvote, because it is an effort to help. Gestures of helping should be upvoted, not downvoted.
Late to the show here, but when I've actually seen this happen, it's been in situations where a significant post hoc test is not a simple pairwise test but rather requires the pooling of some groups. I.e., if there were 3 groups and no significant pairwise comparisons, it could be that A+B combined is different from C. A digression from the conversation that's gone on here, but a possibility in your data.
The main-effect test in ANOVA tests the null hypothesis that all the means are equal versus at least one being different from at least one other. The Type I error will be what you set it at, if all assumptions are met. A big assumption is that the variability in each level is the same.
Multiple comparisons (all pairwise comparisons in this discussion) will result in the overall Type I error being no more than the level you indicate, again if all assumptions are met. Again, a big assumption is that the standard error is the same for each of the sample means. Many of the simpler multiple comparison procedures assume equal sample sizes. Thus, if you have different variability in the populations or considerably different sample sizes, you can get the results you observed.
There are variations on the basic Tukey method that handle unequal variability and unequal sample sizes. The overall error-rate protection becomes approximate in these cases. There is a wide range of different methods for multiple comparisons, each with different overall error-rate approximations and each handling unequal variability and unequal sample sizes differently. The resampling method in the MULTTEST procedure of SAS has the best overall set of properties.
You can get misleading results from ANOVA in various situations. One case, for example, is the "slippage configuration", where one mean is much different from the others, which are all close. The ANOVA may result in a statistically significant difference because of the different mean, but if that mean were removed, the p-value would be large, i.e., the result would not be statistically significant.
Well, it could happen, because, as you can see, the tests answer slightly different questions and have different power. Because the post hoc tests focus on differences between specific groups, they can have more power to detect such differences even though the overall ANOVA indicates that the differences among the means are not statistically significant.
I suggest you have a look at Huck SW. Statistical Misconceptions: Taylor & Francis; 2008.
Regards
Dear Umesh,
I've had the same results as you. I found Ronald Bremer's answer above helpful, and this link below shows some more responses to the same question:
http://stats.stackexchange.com/questions/49093/anova-results-do-not-match-post-hoc-tukey-test-how-to-proceed
Best wishes.
I used SPSS and I used to have the same problem. I have tried different tests under Post Hoc. For equal variances assumed, I suggest you use the Dunnett test, in which you can get different results if you change the selection in Control Category (First or Last) and sometimes in the Test (2-sided, < Control, > Control). For unequal variances assumed, you should use Tamhane's T2. That's my personal experience.
Not achieving a statistically significant result does not mean you should not report the group means ± standard deviations as well. However, running post hoc tests is then not warranted and should not be carried out (when the ANOVA p-value is greater than 0.05).
Recall from earlier that the ANOVA test tells you whether you have an overall difference between your groups, but it does not tell you which specific groups differed - post hoc tests do. Because post hoc tests are run to confirm where the differences occurred between groups, they should only be run when you have shown an overall significant difference in group means (i.e., a significant one-way ANOVA result). Post hoc tests attempt to control the experiment-wise error rate (usually alpha = 0.05) in the same way that the one-way ANOVA is used instead of multiple t-tests. Post hoc tests are termed a posteriori tests; that is, performed after the event (the event in this case being a study).
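For readers who want to see that workflow in code, here is a minimal hypothetical sketch in Python (scipy + statsmodels): omnibus one-way ANOVA first, Tukey HSD for the pairwise comparisons after. The data and group labels are invented for illustration only.

```python
# Sketch of the usual workflow: one-way ANOVA first, Tukey HSD after.
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "value": np.concatenate([rng.normal(10, 2, 15),
                             rng.normal(11, 2, 15),
                             rng.normal(13, 2, 15)]),
    "group": np.repeat(["A", "B", "C"], 15),
})

# Omnibus one-way ANOVA
f, p = stats.f_oneway(*[g["value"].values for _, g in df.groupby("group")])
print(f"ANOVA: F = {f:.2f}, p = {p:.4f}")

# Tukey HSD: all pairwise comparisons with family-wise error control
print(pairwise_tukeyhsd(df["value"], df["group"], alpha=0.05))
```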
This is also due to the sensitivity of the ANOVA: it can pick up smaller overall variation among the means than the pairwise post hoc tests can.
If "post-hoc" means pair-wise, it is a matter of the sample sizes, the number of groups, and the difference between the sample means. Consider the case where there are 4 groups (A, B, C, D), where the mean for D is larger than the others, which are close (called the slippage configuration). The overall ANOVA may be significant due to the slippage of D. However, the pairwise comparisons of A, B, and C may not be significant. On the other hand, if there are many more groups and one has slipped, the overall ANOVA may be not significant but the pairwise comparisons for the extreme group will be significant.
Yes, but it depends on what the analyses are for.
ANOVA just tells you whether there is at least one significant difference between at least two samples (experimental conditions, treatments, etc.), but it cannot tell you precisely where the differences are. For that you need the post hoc tests: post hoc testing means pairwise comparisons between all the experimental conditions. Thus you can find a significant difference between two groups, but this difference may be meaningless, resulting in no real significant difference. For example, say you have three experimental groups (A, B and C), including a control group (A, untreated), whose differences you want to test. With ANOVA you get significance, and with the post hoc test you get a significant difference only between B and C, which are two different treatments, but there is no significant difference between A vs. B or A vs. C. That means there is no real significant difference in your testing, although unfortunately you got one with ANOVA.
Imprecise formulations increase the confusion about that topic. That's not very helpful.
Specifically:
"ANOVA just tells you if there is only one significant difference between at least two samples (experimental conditions, treatments...etc) but could not preciesely tells where is the differences were."
No. ANOVA compares entire (nested) models, not groups/samples. ANOVA doesn't care about differences between groups/samples. It is about the increase in residual variance that can be attributed to the restriction of some set of coefficients in the model. These coefficients usually comprise a complete explanatory factor, or even several such factors. It may be a source of confusion that a two-level factor is usually coded with a single coefficient in the model (estimating the expected difference in the response between the two levels), and that the restriction on this single coefficient actually represents the hypothesis of "no difference between the (two!) groups".
An ANOVA is useful to see how a set of restrictions in a model will impact the performance of the model (measured as the increase of the residual variance). The ANOVA does not analyze differences between group means at all.
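To make that nested-models view concrete, here is a small hypothetical sketch in Python using statsmodels (data, formula and variable names invented for illustration): the one-way ANOVA F-test is exactly the comparison of the full model containing the factor against the restricted, intercept-only model, judged by the increase in residual variance.

```python
# Sketch of the "nested models" view of ANOVA.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "y": np.concatenate([rng.normal(m, 1, 20) for m in (5.0, 5.5, 6.5)]),
    "grp": np.repeat(["a", "b", "c"], 20),
})

full = smf.ols("y ~ C(grp)", data=df).fit()    # factor included
restricted = smf.ols("y ~ 1", data=df).fit()   # all factor coefficients restricted to zero

# F-test comparing the two nested models; its p-value equals the usual ANOVA p-value
print(anova_lm(restricted, full))
print(anova_lm(full))  # the familiar one-way ANOVA table for the same comparison
```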
Further:
"you can find a significant differences between two groups but this differences were meaningless which resulting in no real significant difference."
No. "A significant difference" is nothing one finds. You can use the p-value as an empirical measure of "significance", then significance is a "statistical signal-to-noise ratio" that can have lower or higher values. It is not an absent or present thing, it is a value estimated from the observed data, it is a number that varies in the interval (0,1). To make use of such a value you need to interpret it. If the p-value is quite small you may decide to reject the tested hypothesis and consider that you will have to work with the unrestricted model to account for all relevant structures in your data. But this is an interpretation and a personal decision. This is not in the data, it is not god-given and there is not a natural property or phenomenon that is discovered. There exists nothing like a "real significant difference".
Finally:
"but there were no significant difference between A Vs B nor A Vs C, that means there is no real significant difference in your testing"
No. Again, as you use it, the term "significant" refers to the interpretation (not to the numeric value of p). As you used the word "significant" here, the two "non-significant" findings can always be attributed to the failure to obtain a "low-enough" p-value. Since the p-value is a function of the sample size and decreases with sample size for any non-zero effect size, the result only tells you* that you did not have enough data to convince you to better use a model that includes a coefficient for the A-B difference or for the A-C difference, given your experimental setup and sample size. There is nothing more a p-value can tell you.
--
* The effect size in the ANOVA model is a continuous variable. The probability of such a variable having a particular constant value (like zero) is zero. Thus, it is certain that the effect is non-zero, and therefore it is only a matter of the amount of data to get a "sufficiently small p-value". So, actually, getting small p-values is not the crucial point of the analysis! The crucial point is the likely size of the effect, which may be negligible. But this can only be addressed with subject knowledge, not with probability theory. Combining effect sizes and testing in a more formal way leads to Neymanian hypothesis tests, where a balance is set between the size and the power of a test according to some given relevant effect size (for which one actually must formulate a loss function, which is usually not possible in research).
@Jochen Wilhelm
Dear Jochen
I could not see why you agree or disagree. The only thing I concluded from your contribution is that you did not understand my meaning; otherwise, I absolutely agree with what you said. I tried to explain it more simply for someone who may not be familiar with these statistical analyses, since most of the ambiguity comes from the question of what each test is for and which test is most suitable for such analyses.
Dear Younes,
I appreciate that you are helping by giving simple explanations. That's a good thing. The point I see critically is that some of these simple explanations are not only wrong but also misleading. Every day I am confronted with the downside of similar "simple but wrong" explanations, leading to bad publications, bad reviews, a waste of money, resources and animal lives, and to a large amount of confusion and anxiety about statistics among students (and postdocs).
My points in short:
(i) You say that ANOVA tells you something about differences in means. That's wrong. In very simple cases this may be equivalent, but generally it is not. ANOVA should not be seen as a tool to analyse differences in means but rather as a way to compare different (nested) models.
(ii) You say that there "is" something like significance and that it is our aim to find it. That's wrong. Significance is a matter of interpretation; it is something that we attribute to observations. It is not in the observations.
(iii) You say that a "non-significant result" means that there is no difference. That's wrong. It is as wrong as concluding a difference when the result is "significant". To make any inference about the difference you would need a Bayesian approach. Significance does not answer the questions you think it does.
Dear Jochen,
Again, I did not say that.
Look, please don't try to hear from others only what you want to understand.
And again, I agree with what you're trying to explain (with some disagreement), and I don't know why you're trying to give me lessons in statistics.
Never mind what you're trying to prove, which might be acknowledged by somebody. Please just bear in mind, as Einstein said, "If you cannot explain it to someone who is 6 years old, it means that you do not understand it yourself."
Dear Younes,
Interesting. When I ask students what they understand from your answer, they tell me just the things I claim you said. I will try to find out how differently the sentences can be understood. Maybe you can be a bit more specific and tell me where I am wrong?
Regarding your Einstein citation: we should then all stop teaching stats. The statisticians have failed, as they have not even managed to explain it to scientists from other fields, and those who give seemingly simple explanations (unfortunately this also includes many statisticians) that are understood by 6-year-olds are wrong. I see the problem here as being that these "6-year-olds" simply have to invest time to understand the topic; the topic can be understood, but not on the fly.
PS: I am surely not giving you lessons in statistics.
Dear Jochen,
Thanks for your time. I'll stop discussing this topic, as I cannot find any interest in it, at least from my side.
Again, many thanks for your time.
The underlying trouble with the question is that you have entrusted your analysis to ANOVA, which gives a precise answer to a vague question. Was your study hypothesis really that there's some kinda difference between the means? Because after you get a significant ANOVA, you are faced with the problem that it tests a hypothesis that rarely has any scientific value. You might be concerned that there were differences between the means of academics marking exam papers, so a significant ANOVA would lead you to conduct some marker training, but most scientific hypotheses can be expressed as a one-df hypothesis.
Formulating your hypothesis after the fact based on post-hoc tests is completely reversing the logic of science. You can noodle around post-hoc to see if there's anything interesting looking, but it's not hypothesis testing, since you personally don't have a hypothesis. If you had, you would have written a model to test it.
Hi Umesh
I am not entirely clear whether your question arises from an apparent anomaly you have encountered with your own data or simply from curiosity. If the former, did you use a standard F test and if so, did you do some assumptions testing before using it?
Very best wishes
I had your same question!
Check here!
http://graphpad.com/support/faqid/1081/
To Maria's post:
I am in fact pleasantly surprised that GraphPad clearly writes that ANOVA and pairwise comparisons of group means are not logically connected.
However, there is still a severe (to my opinion) flaw in the text: the authors repeatedly compare a "significant ANOVA result" with "significant results of the 'post test'". These two significances refer to different concepts and they are not comparable in principle; the significance of the data under the F-test is to be judged differently than the significance of the data under the t-test. These tests have a different "frame of interpretation" of their results.
Connected to this point is that Fisher's LSD is reported as a case where the significance of the ANOVA really determines the "validity" of the 'post tests'. This is again not correct. Fisher's LSD controls the family-wise error rate (FWER) over 3 tests (mean comparisons). This is a different kind of error rate than the test-wise error rate (TWER); it has nothing to do with the validity of the TWER. These are still different things. It is only the case that the FWER is controlled at alpha when both the ANOVA and the 'post tests' are conducted at alpha. Apart from this: if one wants to control the FWER, then Fisher's LSD works (with k=3). If the FWER is not a concern, the ANOVA is completely nonsensical, even for k=3 groups.
However, there is one example where ANOVA and 'post test' really do the very same thing: for k=2. Only then is there a simple monotonic relation between F and t (F = t²), and the p-values are identical (thus, doing an ANOVA with k=2 is also an application of Fisher's LSD, which is therefore strictly valid for 2 ≤ k ≤ 3). The point here is surely that the entire "family of tests" is a single test, so controlling the FWER is identical to controlling the TWER.
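The k = 2 identity is easy to verify numerically. A quick check in Python/scipy, with arbitrary invented samples:

```python
# Numerical check of the k = 2 case: the one-way ANOVA F statistic equals the
# squared pooled-variance t statistic, and the two p-values coincide.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, 14)
b = rng.normal(0.6, 1.0, 14)

F, p_F = stats.f_oneway(a, b)
t, p_t = stats.ttest_ind(a, b)           # pooled-variance t-test (equal_var=True)

print(f"F = {F:.4f}, t^2 = {t**2:.4f}")                  # identical up to rounding
print(f"p(ANOVA) = {p_F:.6f}, p(t-test) = {p_t:.6f}")    # identical
```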
Thank you for your answer, Jochen. So, if I have 7 means to compare (I am interested in understanding if 7 treatments had different effects on the intensity of 10 wine sensory attributes) may I directly apply a post-hoc test (for example Duncan)? Should I also report the results of ANOVA even if "in contrast" with Duncan's test results? Is it correct to report in my paper only the results of the post-hoc test? And in materials and methods?
Thank you in advance for your help!
It finally depends on the reviewers. Technically, the ANOVA is not at all interesting in your case, so I would say it's not required to report it. What should be reported are the pooled variance estimates, the residual standard error, and the degrees of freedom used for the tests. But I know (unfortunately) that there are reviewers who will call your analysis wrong if you don't mention that you really did do an ANOVA.
As a "significant test" is only the start of the interpretation (not the result or the end of an analysis!), I would also like to see the actual estimates (how big are the differences between the treatments?) together with some measure of the uncertainty associated with these estimates (most people like confidence intervals). Just to make sure: I am not talking about the 7 means of the 7 groups; I am talking about the mean differences between the groups.
Karl L. Wuensch. Pairwise Comparisons. Excerpt: "Members of the STAT-L were recently asked: I am running a one-way ANOVA, and testing significance between groups using the Tukey HSD test. The ANOVA shows a statistically significant between-group difference. However, the Tukey HSD shows no pair of groups that are different from each other." [Accessed November 23, 2009]. Available at: http://core.ecu.edu/psyc/wuenschk/StatHelp/Pairwise.htm
Steve Simon, working at Children's Mercy Hospital.
Yes, this is possible: for post hoc tests you have to apply a more conservative p-value, that is to say, the level of significance applied to each comparison is effectively stricter than the one used for the overall ANOVA.
The discussion so far is very insightful.
I have a small digression from the theme so far.
I have experienced a case where my P value was 0.056 - This traditionally is not statistically significant because the value is greater than the margin of 0.05. However, the post hoc test showed significant groupings. Can I go ahead and report the significant groupings and the P value (which is greater than 0.05)?
Thank you.
Nice discussion, I have also got this result for water quality analysis. In this case, could anyone tell how to interpret the findings where ANOVA shows significant result but the post-hoc test does not?
Conclude that your data is not sufficient to make statements about pair-wise differences.
By default, statistical software packages typically ensure that p-values in traditional post-hoc tests are adjusted to allow for significance arising from multiple comparisons. This gives rise to a tougher test. For smaller samples especially, this can lead to loss of prior statistical significance from comparing two groups.
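To see what such an adjustment does to the numbers, here is a tiny hypothetical illustration in Python (statsmodels); the raw p-values are made up to sit just under 0.05.

```python
# Sketch: multiplicity adjustment raises the pairwise p-values.
from statsmodels.stats.multitest import multipletests

raw_p = [0.030, 0.041, 0.048]          # unadjusted pairwise p-values (hypothetical)
for method in ("bonferroni", "holm"):
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(method, [round(p, 3) for p in adj_p], "reject:", list(reject))
# After adjustment all p-values exceed 0.05, so comparisons that looked
# "significant" on their own are no longer significant as a family.
```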
Please, I have a related situation. I used one-way ANOVA to test the effects of dietary treatments on growth parameters and two-way ANOVA to test the effects of my two additives and their interaction. The individual dietary treatments showed no significant difference for a parameter (one-way), but one of the additives had a p-value less than 0.05 according to the two-way ANOVA, with the interaction also not significant. Does it mean the significant effect of that additive is not large enough to show up in the individual treatments, or is there a problem with my data? I have cross-checked several times. I would appreciate some insights and any resources I can use for clarification.
Gladstone, to enable people to respond to the right issue, could you please clearly and fully explain each of the one-way and two-way ANOVAs you performed.
Margaret MacDougall Thanks. My diets contained two additives, a and b. I used one-way ANOVA to test the effect of the individual diets (D1 - D6), and two-way ANOVA to test the effects of additive a, additive b, and their interaction. In my earlier work and reviews, a significant effect of one of my additives (a or b in this case) in the two-way ANOVA corresponded with a significant difference between the individual dietary treatments (one-way ANOVA), but not this time. My current individual dietary treatments showed no significant difference for two parameters (one-way ANOVA), but additive a had a p-value less than 0.05 according to the two-way ANOVA.
@ Tange Denis Achiri, on your question I would say no, it is not OK, as your original ANOVA examined whether any of the means significantly differed from the others, and the answer is no (if potentially borderline for an alpha of .05). If you then go on and run the post hoc tests, then really you are double-dipping after getting an answer you don't like.
I would say that the only time this is acceptable is if you had a good a priori hypothesis why the two means that significantly differed would be those that mattered. However, were that the case it's unclear why you would have been running the ANOVA in the first place.
@ Gladstone Sagada, I see no mention here of the nature of the dependent variable, including the data type, or of the assumptions testing that you applied before opting for specific ANOVA models. This is a good place to start before discussing anomalies in your findings. Also of relevance are your overall sample size and the group sizes, both for each independent variable separately and for the two-way groupings represented by your two-way ANOVA. Also, how many categories are represented by each independent variable? These are some of the things you should be considering in approaching your analyses in a scientific manner. If you take time out over these, you may find that some of the insights you are requesting are forthcoming.
We are studying consumer behavior with special reference to environmental responsibility. We first tested the homogeneity of variances of the various demographic groups. Then we selected one of the two tests (Welch or ANOVA) to compare the means. In one particular case, the Welch test gives a significant difference between the response patterns of the different groups, but the Games-Howell multiple comparison test gives no significant difference between any of the pairs at alpha = 0.05. Isn't that a contradiction?
However, when the alpha value is adjusted to alpha = 0.04, some of the groups start showing significant differences between them. Can we remove the above-mentioned contradiction by adjusting the significance level of the testing, i.e., alpha?
No, that's not a contradiction. You can only have a contradiction between two statements about the same thing. Here, you are contrasting two different things (an ANOVA model considering a predictor with k>2 levels on all the data, and a simpler model considering a predictor with k=2 levels on a subset of the whole data; additionally, the ANOVA assumed equal variances and the Welch test does not).
If you don't want such things to happen, just use alpha = 0 (nothing will be significant) or alpha = 1 (everything will be significant).
@ Lakhbir Singh Rishi, I am going to restate your query in simple terms.
*You have performed Welch's ANOVA and arrived at statistical significance.
*You have then progressed to pairwise comparisons using the Games-Howell post-hoc test.
*You are surprised to have found that there are no significant results based on the post-hoc tests.
*Your expectation was that if you found statistical significance using Welch's ANOVA, you should have obtained statistical significance for at least one of the applications of the post-hoc test.
*Your expectation has been contradicted by your results and you would like an explanation.
Here is my explanation. Please refer back to my answer of 7 February, above. This answer applies to your query, as the Games-Howell post-hoc test involves a correction for multiple comparisons (which makes the test more conservative). You should not be adjusting your level of significance to accommodate the contradiction of your expectation. You should recognize that if there is a true difference between any pair of groups, you will need a larger study to provide statistical evidence for this. Of course, the reliability of your ANOVA result will depend on how well your data represent the parent population.
For kind consideration, I am attaching the descriptive statistics, Levene's, Welch and Games-Howell multiple comparison tables. Please see, Ma'am Margaret MacDougall and Sir Jochen Wilhelm.
With Regards
Also read this: https://www.graphpad.com/support/faqid/1081/
When comparing normal healthy, benign and malignant groups, significance was observed when comparing any two groups using Mann-Whitney. With ANOVA, significance was seen, but with post hoc multiple comparisons (Dunn's or Tukey's), significance was not observed between one of the pairs. In this context, are multiple comparisons or post hoc tests essential for these types of studies? If so, how should the results be interpreted or reported?
I also came across the same issue, i.e., the p-value for the ANOVA test was 0.02, but p > 0.05 values were obtained for the post hoc (Tukey) tests. I think this could happen due to the rounding of numbers during calculation, which may cause variation if the values are around the cut-off point, 0.05 for example. As a solution, it is better to look at the individual p-values of the pairs, and if a value is only slightly above 0.05, we may consider it as significant.
I got significant results after the ANOVA test, but when I applied the post hoc Tukey test, I obtained non-significant results. Same question from my side. What should I do?
Yes, it can happen. In this situation, we could consider another post hoc test.
The most common is the LSD test, or you may use the Bonferroni test for pairwise comparisons.
Waqas Latif Sir, I have the same question. I have a large data set; I want to check the activity in different hours during different months at different sites. I applied a GLMM, and the ANOVA summary shows a significant difference, so I applied a Tukey post hoc test. It shows a significant difference for some months, but for some that I know are lower than others, it shows a significant difference with the lower mean number yet is non-significant with the higher mean numbers. Is this normal in such calculations? Please help me understand. For example, in general March has low numbers, and April and May have high numbers compared to March. June has the highest of all, but June is non-significant versus April and May, yet significant versus March. So I'm confused: is it normal to get such results with Tukey's post hoc?
Waqas Latif
Hello, everybody. I have a question: what do you do if the opposite happens? In the ANOVA you get a p-value > 0.05 (for example, a p-value equal to 0.1390), but when performing the Tukey test there are significant differences between the treatment levels. Does anyone have information about this?
It is kind of strange that these two tests are always considered together or in parallel. They are (in almost all cases) about different hypotheses. If you test different hypotheses, you should not have any problem with getting different results. I think this is the key problem: many people seem to think that these tests somehow belong together and test something similar. They (very usually) do not.
If the aim is to select comparisons while keeping the FWER, then Tukey's procedure does this job.
If the aim is to compare the full model to a model restricted to have all zero coefficients, then ANOVA does the job (as this is only a single test, there is no FWER involved or the FWER is the same as the TWER).
Thank you for the quick reply, Jochen Wilhelm . I want to explain a little why I am asking this question. I am working with the use of drones in forest plantation monitoring. And one aim of the project is to evaluate two types of software (Pix4D and WebODM) and four different processing options. To evaluate these factors, I selected processing time, plantation area in the orthomosaic, and processing failure as response variables. I am then interested to see if there are differences between the two types of software and the four processing options.
For this, I performed a block design with a factorial arrangement, where the blocks are the plantations where the drone flights were performed, and the factors are the two types of software and the four processing options. It is important to mention that I consider this an unbalanced factorial design because not all combinations of treatments are present in each block.
So, first, normality was checked, and the data were non-normal. Then I wanted to do a Friedman test because we are working with blocks, but it was not possible because it throws an error, as the design is an unbalanced factorial. Therefore, a Kruskal-Wallis test was performed without considering the blocks, but this gave the same results as the parametric test (ANOVA + Tukey test), so we decided to keep the results of the parametric test because they are richer in information.
Is it possible that getting a p-value greater than 0.05 in the ANOVA (no significant differences) and significant differences in the Tukey test is because the data are not normal (assumptions of normality and homogeneity of variances) or that it is an unbalanced factorial?
PS: Sorry if I made a mistake in the previous paragraphs; English is not my native language.
It is not the data that should be normal but the residuals (you assume that the response has a normal distribution with a mean conditional on the predictors).
Is it possible that we set the p-value for ANOVA to 10% and post hoc to 5%? What are the implications?
Yes, it is possible. I just encountered some points on this issue.
[Pairwise Comparisons - Karl L. Wuensch](https://core.ecu.edu/wuenschk/StatHelp/Pairwise.htm) [When the F test is significant, but Tukey is not](http://www.pmean.com/05/TukeyTest.html).
Yes, it is possible to get non-significant results in post hoc tests when we got significant results in the ANOVA. According to , post hoc tests in ANOVA are performed when there is no significant interaction effect but there are significant main effects; in this case, the post hoc tests are performed on the significant main effects. If the overall p-value of the ANOVA is not statistically significant, then you will not conduct post hoc multiple comparisons between groups. This means you obviously don't have to report any post hoc results in the final report.