This isn't surprising. The variance is harder to estimate than the mean; it is less stable from study to study. In addition, the distribution of variances is skewed - particularly in small samples. It is very common for a small sample to underestimate the variance (a common problem in using a pilot study to estimate the variance for a larger study).
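As a quick simulation sketch of that skewness point (illustrative numbers only): the usual unbiased variance estimator is right on average, yet in any single small pilot study it falls below the true variance more often than not.

```python
# Sketch: with a small sample, the variance estimate is right-skewed, so a
# single small pilot underestimates the true variance more often than not.
# The true variance and pilot size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0
n_pilot = 5
est = rng.normal(0, np.sqrt(true_var), size=(100_000, n_pilot)).var(axis=1, ddof=1)
print(round(est.mean(), 2))               # ~4.0 -- unbiased on average
print(round((est < true_var).mean(), 2))  # ~0.6 -- but below the truth about 60% of the time
```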
Hello Mauro... You may need to clarify your question a little.
Sample size calculations set an alpha and a beta value and perform a calculation to determine the sample size. This calculation will depend on the type of analysis you are doing. The alpha is not necessarily 0.05.
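For instance, a minimal sketch of such a calculation, assuming a two-sample t-test and the statsmodels library; the effect size, alpha and power below are placeholders, not recommendations:

```python
# Sketch: sample size per group for a two-sample t-test, given alpha and power (1 - beta).
# The effect size, alpha and power are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,          # assumed standardized difference (Cohen's d)
    alpha=0.05,               # type I error rate -- not necessarily 0.05
    power=0.80,               # 1 - beta: chance of detecting the assumed effect
    alternative='two-sided',
)
print(round(n_per_group))     # roughly 64 per group under these assumptions
```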
If you are getting a p-value greater than alpha, then you should interpret the results as not enabling you to reject the null hypothesis (i.e., any p-value > alpha means you are unable to reject the null hypothesis).
Note - there is nothing magic about the term "statistical significance" or the value 0.05. They are historical anomalies. I do not allow my students to use the term "statistical significance" because I believe it is misleading. There is plenty of literature about this which you may like to explore.
Strictly speaking, there is no such thing as "statistical significance." The statistical methods result in a statement of probability. There is a 0.0000005 chance that you will observe a result as extreme or more extreme than the observed outcome given that the null hypothesis is true. That is the sort of thing that you get out of statistics. "Statistical significance" is shorthand for taking this statistical statement and then concluding that, with this result, you will choose to reject the null hypothesis. This last part is purely human, and entirely your choice (with consideration of whatever you can get past reviewers and editors). Rightfully, this will be a difficult task if you choose to argue that we should relax the standard alpha = 0.05 criterion for deciding significance.
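To make that probability statement concrete, here is a small sketch assuming a test statistic that is roughly standard normal under the null; the z value is chosen purely so the tail probability lands near the figure quoted above:

```python
# Sketch: the p-value as P(result as extreme or more extreme | H0 true),
# assuming a test statistic that is standard normal under the null.
from scipy.stats import norm

z_observed = 5.0                            # illustrative value only
p_two_sided = 2 * norm.sf(abs(z_observed))  # sf = upper-tail probability (1 - cdf)
print(p_two_sided)                          # ~5.7e-07, i.e. roughly 0.0000005

# Whether this makes you "reject H0" is the separate, human decision of
# comparing p to whatever alpha you chose in advance.
```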
As noted above, "significance" is a misnomer. Sample size needs are determined by the "effect size" of interest, which is often better interpreted through a confidence interval.
The confusion, which many people have, resulted in the following letter I published in The American Statistician:
Sample size needs, whether determined by a type II error/power analysis for a hypothesis test, or - often much better I'd say - from the need to attain a given estimated standard error or lower, perhaps for a mean, are largely dependent upon the population standard deviation. Hypothesis tests and confidence intervals are impacted by sample size. Standard errors are lowered by larger sample size. But the standard deviation of a population (or subpopulation/stratum) is fixed, though estimating it may be problematic. (Sampling textbooks may help you. In Cochran, W. G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons, for example, he mentions a few ways to estimate/guess the population standard deviation, including a pilot study.)
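As a sketch of the "target a standard error" approach (assumed numbers only; simple random sampling and a guessed population standard deviation):

```python
# Sketch: choose n so the standard error of a sample mean is at or below a target,
# assuming simple random sampling. Sigma, the target SE and N are illustrative.
import math

sigma_guess = 12.0   # guessed/pilot-estimated population standard deviation
target_se = 1.5      # largest acceptable standard error of the mean
N = 2000             # population size, for the finite population correction

n0 = (sigma_guess / target_se) ** 2   # SE = sigma / sqrt(n)  ->  n = (sigma / SE)^2
n = n0 / (1 + (n0 - 1) / N)           # Cochran-style finite population correction
print(math.ceil(n0), math.ceil(n))    # 64 without the correction, ~63 with it
```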
So, regarding your question, in general, if you have a "small" p-value with a "small" sample size, then the effect you are examining would presumably be rather impressive. - Is that what you are saying? Did you obtain a small p-value without using as large a sample as you thought you would need? If so, how had you decided your sample size requirement? (Beware of online sample size "calculators," which are often only for proportions with "yes/no" data, in a simple random sample, for the worst case of p = q = 0.5, without a finite population correction factor, and can even sometimes suggest a sample size larger than the population size. They are often irrelevant.)
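For reference, roughly what those worst-case calculators compute - a sketch with made-up numbers, assuming a simple random sample and 95% confidence - and how the finite population correction changes the answer:

```python
# Sketch of the typical online "calculator": worst case p = q = 0.5,
# 95% confidence, margin of error e. All numbers are illustrative.
import math

z = 1.96    # 95% confidence
e = 0.03    # desired margin of error (3 percentage points)
N = 800     # a small population, chosen to make the point

n0 = (z ** 2) * 0.25 / e ** 2           # no finite population correction
n_fpc = n0 / (1 + (n0 - 1) / N)         # with the finite population correction
print(math.ceil(n0), math.ceil(n_fpc))  # ~1068 vs ~458: the uncorrected n exceeds N
```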
Cheers - Jim
PS - In general, 0.05 should not be used as a threshold. (If you look at this from the point-of-view of a confidence interval, this may be more clear to you.) -- For example, in the relatively new area of "big data," using 0.05 would generally not work out well at all.
Article: Practical Interpretation of Hypothesis Tests - letter to the...
There is of course an issue with the point of using the concept of statistical significance, but that is, I guess, not the issue here. How come the statistical test turned out significant in spite of the estimated effect being smaller than the assumption used in the power analysis when deciding the sample size? Well, the only explanation I can see is that the variance assumption for the power analysis was pessimistic (as noted by Stephen). Hence, in terms of the statistical precision obtained from the study, the analyst had the unfortunately weak effect counteracted by the luck of a smaller variance. In designing experiments/studies one is sometimes lucky and sometimes unlucky - it's part of research.
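A small numeric sketch of that explanation, with made-up numbers: the study is sized for a difference of 5 assuming an SD of 10, the observed difference is smaller (3.5), but the observed SD (6) is much smaller than assumed, and the test still comes out well below 0.05:

```python
# Sketch: a pessimistic variance assumption at the design stage; numbers are assumed.
from statsmodels.stats.power import TTestIndPower
from scipy.stats import ttest_ind_from_stats

# Planning: detect a difference of 5 assuming sigma = 10, i.e. effect size d = 0.5.
n = round(TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80))  # ~64/group

# "Observed": a smaller difference (3.5) but also a much smaller SD (6) than assumed.
t_stat, p_value = ttest_ind_from_stats(mean1=3.5, std1=6.0, nobs1=n,
                                       mean2=0.0, std2=6.0, nobs2=n)
print(n, round(t_stat, 2), round(p_value, 4))  # p well below 0.05 despite the weaker effect
```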
Rama: I don't think I agree. Collecting a larger sample is not intrinsically likely to increase the magnitude of the effect. In fact, there is lots of evidence that small samples tend to overestimate effect size, which means that a larger sample is likely to get a smaller effect than what would have been predicted based on previous studies.
The magnitude of the effect is intrinsic to your experiment. Increasing sample size increases the probability that you will be able to detect the effect. Small sample sizes increase the chance that if someone were to repeat your experiment they would get a contrary answer.
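As a quick illustration (a two-sample t-test with an assumed effect size of 0.5): the probability of detecting the same fixed effect climbs steadily with n.

```python
# Sketch: for a fixed (assumed) effect size, power rises with sample size.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (10, 20, 40, 80, 160):
    power = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(n, round(power, 2))   # detection probability climbs toward 1 as n grows
```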
Again, two things are mixed that don't go together.
If the sample size was calculated, then it was calculated with respect to two distinct alternatives and a loss function (-> "hypothesis tests" [Neyman/Pearson]). The experiment is done to decide between the two alternatives. It is not very helpful to talk about a null hypothesis and significance here, because those terms belong to significance tests [Fisher].
Here it is NOT the question whether or not a null hypothesis is rejected. Here the question is whether hypothesis A is accepted or hypothesis B is accepted. The math behind it allows us to do the same calculations as in significance testing and to make the decision by comparing the calculated (empirical, observed) p-value to the given value of alpha. If p <= alpha, hypothesis B is accepted; otherwise hypothesis A is accepted.
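A bare-bones sketch of that decision rule, with placeholder values:

```python
# Sketch of the Neyman/Pearson-style decision: one of the two hypotheses is
# accepted either way. The alpha and p-values below are placeholders.
def decide(p_value: float, alpha: float) -> str:
    """Accept hypothesis B if p <= alpha, otherwise accept hypothesis A."""
    return "accept hypothesis B" if p_value <= alpha else "accept hypothesis A"

print(decide(p_value=0.03, alpha=0.05))   # accept hypothesis B
print(decide(p_value=0.20, alpha=0.05))   # accept hypothesis A
```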