If an effect is judged to be significant, it makes sense to determine its size. But if it is judged non-significant, it doesn't make sense to determine its size, because its size is essentially within the zero range.
I think the problem lies in the confusion of testing and estimation.
Testing only demonstrates that the data provides some recognizable amount of information on some aspect of a statistical model (usually the parameter(s) of a functional model). Failing to reach significance means that an interpretation of the parameter(s) is not justified (because there is not enough information in the data), and significance only means that the data allows at least a minimalistic interpretation (e.g. about the sign/direction of an effect, but not about its size).
Estimation is a different beast. There is no frequentist justification of estimation, because an estimate expresses what we believe about the value of the parameter. In principle, we'd need a fully fledged Bayesian analysis to give an estimate that really is about the parameter value (and not about the data, given some parameter value). For large samples there won't be any relevant difference between the maximum likelihood estimates (MLEs) and the Bayesian posterior estimates (BPEs), because the impact of the prior belief is negligible. For smaller samples, the MLEs and BPEs are still very similar under "flat" priors, which usually express a large degree of (prior) ignorance (more-or-less "uninformed" priors). Thus, simple MLEs and frequentist confidence intervals are often taken as an easy-to-calculate and reasonable approximation for estimates (circumventing the Bayesian detour)*.
An estimate must be reasonably precise to be helpful. Ideally, the complete posterior distribution is interpreted, but this may be simplified by looking at highest posterior density intervals. As an approximation, confidence intervals can be used. However, one should interpret the entire interval: Does it allow one to discriminate between relevant and irrelevant values? Are the values reasonable? Does it contain values (at either end) that would lead to substantially different (possibly contradictory) interpretations?
---
*two notes on that:
(i) MLEs are identical to least-squares estimates in the normal probability model, and
(ii) MLEs and confidence intervals are about the data, not about the parameter! This is often confused or just ignored.
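To make the footnote's point concrete, here is a minimal Python sketch (entirely illustrative, with made-up data, not anything from the thread): under a normal model with a flat prior on the mean and a Jeffreys prior on the variance, the credible interval for the mean coincides numerically with the frequentist t-interval, and the MLE of the mean is the least-squares estimate (the sample mean).

```python
# Illustrative sketch (made-up data): with a flat prior on the mean and a
# Jeffreys prior on the variance of a normal model, the marginal posterior of
# the mean is a t distribution centred at the sample mean with scale s/sqrt(n),
# so the 95% credible interval equals the 95% t confidence interval, and the
# MLE of the mean is the least-squares estimate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.4, scale=1.0, size=20)   # hypothetical small sample

n = len(x)
mle = x.mean()                      # MLE = least-squares estimate of the mean
se = x.std(ddof=1) / np.sqrt(n)

# 95% frequentist confidence interval (t-based)
ci = stats.t.interval(0.95, df=n - 1, loc=mle, scale=se)

# 95% credible interval under the flat/Jeffreys priors described above
cred = stats.t.interval(0.95, df=n - 1, loc=mle, scale=se)

print(f"MLE / posterior centre:  {mle:.3f}")
print(f"95% confidence interval: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% credible interval:   ({cred[0]:.3f}, {cred[1]:.3f})")
```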
Not only is it reasonable, I'd also want to see the confidence interval, credible interval, or some other estimate of uncertainty.
That's very important for judging whether the finding was "not significant" because there was truly nothing there or because the study was underpowered.
Even for a "significant" finding, I would still want to see confidence intervals so I could determine how likely it was that the finding was really large enough to be relevant.
I have the same problem in my research, too. I do not know how to interpret these values. I think it is related to the sample size and the data; this problem often occurs with small sample sizes.
Effect sizes should always be reported, as they allow a greater understanding of the data regardless of the sample size and also allow the results to be used in any future meta-analyses.
The effect size is completely separate from the p-value and should be reported and interpreted as such.
Effect size = clinical significance = much more important than statistical significance.
So yes, it should always be reported, even when p > 0.05, because a high p-value may simply be due to a small sample size. And anyone who has ever conducted a non-industry-sponsored clinical study knows how hard it is to recruit enough subjects.
But this is exactly the point: a large p-value tells us that the data is insufficient to even interpret the sign of the estimated effect. If we really give the interval estimate (e.g. a credible interval, but who does this?), there is considerable credibility for positive as well as negative effect sizes... how would one work with that (other than saying that one might need more data to get a definite interpretation)? A valuable opportunity is certainly that looking at the interval shows how credible relevant effects (in the desired and/or undesired direction) are in light of our current state of knowledge. I think the "usual" way of taking the point estimate (usually the MLE/LSE rather than a posterior mode or mean) as a "typical" or "most probable" effect is extremely dangerous, particularly in small samples. Having a small p-value is still not (!) sufficient for such a conclusion (it is sufficient only to conclude the general direction, that is, the sign, but not the size of the effect). This seems to be misunderstood or even ignored quite often.
To summarize: if the effect size is important, the entire frequentist analysis is beside the point anyway, and knowing the p-value is quite irrelevant. We would need to know how credible the possible effects are, and we would need to interpret the entire range of possible effects with reasonably high credibility. A large p-value may only indicate that interpreting the estimate any further is not very helpful, as the data we have is still too likely under effects of opposite sign.
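As a rough illustration of "interpreting the entire range of possible effects", here is a Python sketch using a deliberately simple flat-prior t approximation of the posterior (all numbers hypothetical, including the relevance threshold): instead of a single p-value, one reads off how credible positive, negative, and practically relevant effects are.

```python
# Rough sketch (my own, under a convenient flat-prior approximation, not a
# full Bayesian analysis): approximate the posterior of the effect by a t
# distribution centred at the estimate with scale equal to its standard error,
# then read off the credibility of effects beyond a chosen relevance threshold.
from scipy import stats

estimate, se, df = 0.8, 0.6, 18      # hypothetical effect estimate, SE, df
relevance_threshold = 0.5            # hypothetical smallest relevant effect

posterior = stats.t(df, loc=estimate, scale=se)

lo, hi = posterior.interval(0.95)
p_positive = posterior.sf(0.0)                  # credibility of a positive effect
p_relevant = posterior.sf(relevance_threshold)  # credibility of a relevant positive effect
p_negative = posterior.cdf(0.0)                 # credibility of a negative effect

print(f"95% credible interval: ({lo:+.2f}, {hi:+.2f})")
print(f"P(effect > 0)    ~ {p_positive:.2f}")
print(f"P(effect > {relevance_threshold}) ~ {p_relevant:.2f}")
print(f"P(effect < 0)    ~ {p_negative:.2f}")
```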
How the effect size is calculated is important to the question. The usual effect size (Cohen's d) is obtained by subtracting the means and dividing by the uncertainty. This is informative only if the quality of the difference in means is recognized. The quality here is not the usual ranking from weak to strong; it is the relative error in the difference, the inverse of the effect size. An effect size of 1 means the difference equals the uncertainty, i.e. it is 100% uncertain. A difference with a p-value of exactly 0.05 is 60% uncertain, and a value that is 60% uncertain cannot be considered quantified. As Jochen Wilhelm said, you might know the direction of the effect, but you do not know its value. Any p-value > 0.05 is worse. A value that is only just significant therefore has little quantitative value, and values that do not reach significance are worthless and should not be reported.
The reporting of effect sizes is likely even worse in many cases: significance is obtained using the standard error instead of the standard deviation, and the uncertainty of the difference based on the standard error is much smaller than the fully propagated uncertainty based on the standard deviation.
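For concreteness, a small Python sketch with made-up summary statistics, contrasting the two scalings discussed above: Cohen's d divides the mean difference by the pooled standard deviation, while the t statistic divides it by the standard error, so the SE-based relative uncertainty of the difference comes out far smaller than the SD-based one.

```python
# Small sketch (made-up summary statistics, not anyone's data) of the
# quantities contrasted above: Cohen's d scales the mean difference by the
# pooled standard deviation, whereas the t statistic scales it by the
# standard error of the difference.
import math

m1, m2 = 10.0, 11.2          # hypothetical group means
sd1, sd2 = 2.0, 2.2          # hypothetical group standard deviations
n1, n2 = 25, 25              # hypothetical group sizes

diff = m2 - m1
pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
se = pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)

cohens_d = diff / pooled_sd          # effect size
t_stat = diff / se                   # test statistic

print(f"Cohen's d                : {cohens_d:.2f}")
print(f"SD-based relative spread : {pooled_sd / diff:.0%}  (= 1/d)")
print(f"SE-based relative error  : {se / diff:.0%}  (= 1/t)")
```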
The standard error describes something completely different from the standard deviation; one cannot replace the other. People who show the SE in their graphs usually do this for cosmetic reasons, as the smaller error bars make their data look better.
The almost holy "0.05" is completely arbitrary. There is absolutely no rational reason why it should not be 0.1 or 0.01. As to the alleged worthlessness of insignificant results, I refer to my favorite paper, "Scientists rise up against statistical significance":
It means that it is not statistically significant due to the small sample size. Then I report an effect size to compare which variable has a stronger effect on the result. Is this correct?