It is impossible to know that without knowing what the error bars represent (what does "e.s.e." stand for? some kind of standard error?) and without knowing the research design (i.e., whether the things being compared are repeated-measures factors or not).
These error bars almost certainly do not have the interpretation you suggested, though. To have that interpretation, they would have to be difference-adjusted confidence intervals (which would not be plus or minus one standard error; they would be plus or minus something more like 2 standard errors times the square root of 2), and the comparison would have to not be on a repeated-measures factor.
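A rough sketch of that arithmetic (a normal approximation, independent groups with equal variances assumed):

```latex
\[
  \text{half-width of a difference-adjusted 95\% CI}
  \;\approx\; \sqrt{2}\,\times\, z_{0.975} \,\times\, \mathrm{SE}
  \;\approx\; 2.77 \times \mathrm{SE}
\]
```

which is much wider than bars of plus or minus one standard error.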
To learn about error bars, see (among others) papers such as the following:
The graph looks odd. You have clones 117, 667, and 777. However, these appear to be treated as numeric values rather than clone being entered as a fixed effect (a factor). One solution is to put a "C" in the clone name (e.g., C117) so the software treats clone as categorical.
With the right model, use least squares means and their standard errors. An LSD or Tukey test should work to identify significant differences among the means.
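As an illustration (not the poster's actual data or software), here is a minimal Python sketch assuming a simple one-row-per-observation layout; the column names, clone labels, and yield values are all placeholders:

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data: prefixing the clone number with "C" keeps it categorical
# rather than numeric. All values below are made up for illustration.
df = pd.DataFrame({
    "clone": ["C117"] * 4 + ["C667"] * 4 + ["C777"] * 4,
    "yield_": [12.1, 13.4, 12.8, 13.0,
               15.2, 14.8, 15.9, 15.1,
               13.2, 12.9, 13.6, 13.1],
})

# Tukey's HSD on the clone means; pairs whose adjusted intervals exclude zero
# are flagged as significantly different.
print(pairwise_tukeyhsd(df["yield_"], df["clone"], alpha=0.05))
```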
I too do not know what e.s.e. stands for. This seems like a simple experimental design.
Alberto Maydeu-Olivares, Ruby Metzner: error bars can overlap substantially even when the difference between conditions is highly significant, for many reasons. One way this can happen is if there are repeated measures across the conditions you are comparing. See examples in the following papers:
Article On visualizing phonetic data from repeated measures experime...
Another way is that, even for independent-samples comparisons (without repeated measures), if the difference between groups is just at about p = .05 then the two groups' confidence intervals (if the groups have similar N and SD) will overlap by about 50% of the average margin of error. It follows, then, that error bars can overlap by smaller amounts than that even when a difference is significant. See the Cumming & Finch paper I posted in a previous reply.
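For what it's worth, the arithmetic behind that rule of thumb can be sketched in a few lines (large-sample normal approximation, equal n and SD assumed; the numbers are illustrative only):

```python
import numpy as np
from scipy import stats

# At the p = .05 boundary for two independent means, the difference equals
# z * SE_diff, where SE_diff = sqrt(2) * SE of each mean (equal n and SD assumed).
se = 1.0                                   # standard error of each group mean (arbitrary units)
z = stats.norm.ppf(0.975)                  # ~1.96 for a 95% interval
moe = z * se                               # margin of error of each group's 95% CI
just_significant_diff = z * np.sqrt(2) * se

overlap = 2 * moe - just_significant_diff  # amount by which the two CIs overlap
print(overlap / moe)                       # ~0.59, i.e. roughly half the average margin of error
```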
A third way is that error bars may be showing something other than a confidence interval. For example, if the error bar is showing standard deviation (instead of standard error), the bars will be much larger and thus are likely to overlap even if the difference between groups is significant.
Here, for instance, is a key figure from one of the papers I recommended above. The error bars (traditional 95% confidence intervals) not only substantially overlap, but also contain each other condition's mean. Still, the difference between conditions is very significant at the traditional .05 alpha level. And in some fields, this kind of data is the norm.
The results may be unduly influenced by the bottommost and topmost pairs of observations. If these are removed, is the p-value still significant? I am not saying that they are outliers, but they do look very influential. I suspect the data depart from normality in terms of kurtosis and skew.
That won't change anything. The issue is that with paired data, confidence intervals of each individual condition are completely irrelevant to the difference between conditions (what's relevant there is the confidence interval of the paired differences). Again, see the papers I've linked above. Or see this classic paper:
https://web.uvic.ca/psyc/masson/LM94.pdf
(And for what it's worth, the data in my post above don't depart from normality and don't have outliers; they're not real data, they're fake data sampled from a normal distribution. That's not the point. The point is that for certain research designs, confidence intervals around the individual conditions tell you nothing about the differences between conditions.)
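For anyone who wants to reproduce that kind of situation, here is a minimal simulation in the same spirit (made-up numbers, not anyone's real data): a large subject effect makes the condition-level confidence intervals overlap heavily, yet the paired comparison is clearly significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20

# Repeated-measures data: a big between-subject component plus a small,
# consistent condition effect of 0.4 (all values are hypothetical).
subject = rng.normal(0, 2.0, n)
cond_a = subject + rng.normal(0, 0.3, n)
cond_b = subject + 0.4 + rng.normal(0, 0.3, n)

# The 95% CIs of the individual conditions overlap heavily...
for label, x in (("A", cond_a), ("B", cond_b)):
    print(label, stats.t.interval(0.95, n - 1, loc=x.mean(), scale=stats.sem(x)))

# ...but the paired test, which works from the differences, is clearly significant.
print(stats.ttest_rel(cond_b, cond_a))
```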
Thanks, Stephen Politzer-Ahles, for those interesting references. While many people simply dismiss error bars, the real problem, which you tackle in your paper, is how to plot estimates of uncertainty that can be interpreted correctly by eye more or less all the time.
Quite right, Stephen. Looking at your graph, the proper graph would be a single column of the differences between the pairs. It is those differences that a paired t-test uses. The individual points are irrelevant.
So the fact that Gemma's plot has very similar values for two of the points is inconsequential. It is the differences among the three points that matter.
Excellent posts by Stephen. It is exactly that point: you have to be very clear and sure about what the error bars or confidence intervals represent, what you need to test your hypotheses, and whether these two things match. Confidence intervals for individual conditions are somewhat pointless for the comparison of conditions in a repeated-measures design, since they do not answer the hypothesis and can be very misleading in that context. Only looking for overlapping bars without knowing what they represent is as pointless and shallow as looking for the "magic" p < .05.
I have a similar issue. I used a one-way repeated-measures ANOVA (using Statistica). While the p-value showed no statistical difference over the experimental period, and the standard error bars are overlapping, a further post-hoc test revealed statistically significant differences. A comment from a journal's editor was that, based on the overlapping SEs, there shouldn't be significant differences between treatments (biocontrol vs. exclusion). But as I show in the attached graph, the LSD test reveals significant differences from day 54 to 58. How best should I deal with this comment from the editor? Thanks
I don't know why you're doing post-hoc tests on individual days if you didn't get a main effect or interaction in the omnibus ANOVA. In any case, if you're running this many tests, you need to correct for multiple comparisons. Also, if the error bars in your graph are showing 95% confidence intervals and these data are coming from independent groups with similar sample sizes and variance, then the way the CIs include the other condition's means does suggest the groups shouldn't be significantly different; therefore, these results suggest that there's something wrong with either your error bars or your p-values, so you should investigate where the mistake might be.
I have several issues with the ANOVA approach: 1) you seem to have count data with a zero bound, so a generalized linear model for count data seems more appropriate (a Poisson or negative binomial model); 2) you seem to be interested in the different trajectories and in whether there are general group differences. Again, you should consider a generalized linear model.
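A minimal sketch of that suggestion in Python/statsmodels, assuming a long-format table with columns like "pests", "treatment", and "day" (these names and numbers are placeholders, not the actual data):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical long-format data: one count per plot and day.
df = pd.DataFrame({
    "pests":     [0, 2, 5, 1, 0, 7, 3, 4, 9, 2, 6, 11],
    "treatment": ["biocontrol"] * 6 + ["exclusion"] * 6,
    "day":       [54, 55, 56, 57, 58, 59] * 2,
})

# Poisson GLM with a treatment-by-day interaction to capture different trajectories.
poisson_fit = smf.glm("pests ~ treatment * day", data=df,
                      family=sm.families.Poisson()).fit()
print(poisson_fit.summary())

# If the counts are overdispersed, a negative binomial family is the usual fallback.
nb_fit = smf.glm("pests ~ treatment * day", data=df,
                 family=sm.families.NegativeBinomial()).fit()
print(nb_fit.summary())
```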
As Stephen Politzer-Ahles already pointed out: even if ANOVA were appropriate, why test pairwise if the omnibus test did not show an effect? On the other hand, LSD means no correction for multiple comparisons at all, and therefore I would expect to find at least one significant difference with probability 1 - .95^c, where c is the number of comparisons, if you used the 95% level.
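To make that concrete, a short Python sketch (the post-hoc p-values below are invented for illustration):

```python
from statsmodels.stats.multitest import multipletests

# Familywise error rate of c uncorrected comparisons at alpha = .05: 1 - 0.95**c.
for c in (3, 5, 10):
    print(f"{c} comparisons -> P(at least one false positive) = {1 - 0.95 ** c:.3f}")

# One remedy: run the raw post-hoc p-values through a correction such as Holm.
raw_p = [0.012, 0.034, 0.049, 0.21, 0.62]          # hypothetical LSD p-values
reject, p_adjusted, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
print(reject, p_adjusted)
```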
Rainer Duesing, I would like to add something on multiple comparisons.
Unfortunately, there are so many ways to correct p-values that one method can be cherry-picked so that it shows what we want (i.e., try them all, then report one specific correction).
I think it would be useful for people to start interpreting CIs as summaries of (confidence) distributions, slowly detaching ourselves from the idea of Type I/II errors, because those concepts are not very useful in the analysis of real data.
CIs carry all the problems p-values do, with the additional burden that you can technically accept the null (in a Neyman sense).
The question "Does the overlap of error bars on this 2-way ANOVA profile plot mean that these means are not significantly different?" had no sound answer from any of the attendant people or from any recommender.
The overlap of “error bars” does not provide very much information…
1. Only IF your intervals were Confidence Intervals (CIs) with a stated Confidence Level (CL) could you decide about the "Statistical Significance" of the differences between the means in the various (9) "design states".
2. The way to get that is ANOVA, for both factors and their interaction (see the sketch after this list).
3. A mean in one "design state" is "Statistically Different" from the mean in another if it DOES NOT fall within the CI of that other mean, EVEN THOUGH the two CIs overlap.
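A minimal sketch of that analysis in Python/statsmodels, with a made-up 3 x 3 factorial (the factor names, levels, and yields are placeholders, not the asker's data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)

# Hypothetical 3 x 3 complete factorial with 2 replicates per cell; the i * j
# term in the cell means builds in an interaction between the two factors.
rows = []
for i, a in enumerate(["A1", "A2", "A3"]):
    for j, b in enumerate(["B1", "B2", "B3"]):
        cell_mean = 10 + i + j + 2 * i * j
        for y in rng.normal(cell_mean, 1.0, size=2):
            rows.append({"A": a, "B": b, "y": y})
df = pd.DataFrame(rows)

# Two-way ANOVA with the interaction term; a significant A:B row means the
# effect of one factor depends on the level of the other, which is exactly
# what the profile (interaction) plot displays.
fit = smf.ols("y ~ C(A) * C(B)", data=df).fit()
print(anova_lm(fit, typ=2))
```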
It is rather odd that NOBODY realized that your data came from a 2-way complete factorial design with both factors at 3 levels and that your PLOT was the interaction plot.
There is a SIGNIFICANT Interaction…
Therefore IF you want to "maximize" your yield you have to choose the suitable levels of the factors; you have to do the same IF you want to "minimize" your yield; and IF you want to "get a chosen stated value" of your yield, you have to act accordingly....
If you send me your data I can show you how many things you can find with ANOVA…