If we conducted a control study without determining the sample size and power of study, is it possible to calculate the power of study at the end (after data collection is completed)?
No! If you want to look post-hoc, look at the confidence interval instead.
Why would you look at power for a study you have completed? Arguably you would do it because you wanted to know whether or not you could trust a negative result.
The argument would go something like this "I didn't get a statistically significant result, but then for an effect size of x my power was only 50% so this doesn't really tell me very much."
But if you look at the confidence interval you will see the range of values that are consistent with your data, and if this includes an important effect size, then you know that your study was uninformative. Confidence intervals are almost always more informative than significance tests.
Of course, for a non-significant result, if you calculate power using the effect size seen in your study you are bound to get low power. You then have a beautifully circular argument for resurrecting your hypothesis and concluding that your experiment just wasn't big enough. So never do that.
If you are doing a genuinely post-hoc analysis - that is, trying to use power analysis to make sense of the results of a study you have completed, not to plan the next study - then the basic rules are:
1. Don't do post-hoc power analysis;
2. If you really must do post-hoc power analysis, don't do it yet;
3. If you are forced to do it now and can no longer delay, make sure that you never use the effect size observed in your results.
Even a priori, power analyses are based on a whole load of assumptions about the nature of the response, the variances and the effect size. Always remember to look at power under a range of scenarios, and remember that we tend to be over-optimistic about both effect sizes and variances!
Yes, you can, but it is not very meaningful and is not good practice. If the result is significant, power is not of interest. If the result is not significant, reviewers sometimes ask for power, but it can only tell you that, given that sample size, the power was not enough; it becomes an explanation of why the result is not significant.
I must disagree. Calculating the power of a study retrospectively is a useful tool when evaluating published findings. Further, it can be informative when planning a new study. The advantage here is that you have actual variance estimates from the study population of interest. Given that, for example, you can set the significance levels, power, and Ns to various levels, and then solve for effect size. This is an invaluable tool when planning a new study.
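To make that concrete, here is a minimal sketch of fixing the significance level, power, and N and solving for the detectable effect size. The tool (statsmodels in Python) and the sample sizes are my illustrative choices, not the commenter's.

```python
# Sketch: fix alpha, power, and n, then solve for the detectable effect size
# (standardized difference, Cohen's d) for a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n_per_group in (20, 50, 100):
    d = analysis.solve_power(effect_size=None, nobs1=n_per_group,
                             alpha=0.05, power=0.80)
    print(f"n={n_per_group} per group -> detectable d = {d:.2f}")
```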
Confusing retrospective power and prospective power.
Power as defined above for a hypothesis test is also called prospective or a priori power. It is a conditional probability, P(reject H0 | Ha), calculated without using the data to be analyzed. (In fact, it is best calculated before even gathering the data, and taken into account in the data-gathering plan.)
Retrospective power is calculated after the data have been collected, using the data.
Depending on how retrospective power is calculated, it might be legitimate to use it to estimate the power and sample size for a future study, but it cannot legitimately be used to describe the power of the study from which it was calculated.
However, some methods of calculating retrospective power calculate the power to detect the effect observed in the data -- which misses the whole point of considering practical significance. These methods typically yield simply a transformation of the p-value. See Lenth, Russell V. (2000), "Two Sample-Size Practices that I Don't Recommend," for more detail.
See J. M. Hoenig and D. M. Heisey (2001) "The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis," The American Statistician 55(1), 19-24 and the Stat Help Page "Retrospective (Observed) Power Analysis" for more discussion and further references.
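For what the prospective calculation looks like in practice, here is a hedged sketch assuming a two-sample t-test, with a hypothesized effect size chosen before seeing the data; statsmodels is just one convenient tool and the numbers are illustrative.

```python
# A priori (prospective) power: P(reject H0 | Ha), computed before data
# collection from a hypothesized effect size -- never from the observed one.
from statsmodels.stats.power import TTestIndPower

hypothesized_d = 0.5   # effect size judged practically important in advance
power = TTestIndPower().power(effect_size=hypothesized_d, nobs1=30, alpha=0.05)
print(f"Power to detect d={hypothesized_d} with n=30 per group: {power:.2f}")
```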
I always recommend calculating the effect size (ES) when reporting study results. ES is directly related to the power of the study, and something can be statistically significant but practically unimportant. An ES based on the square of the standardized regression coefficient is extremely helpful. If you are using SAS, request the STB option.
Since we have only n=6 so far, here's my contribution to the sample size.
As Fisher observed, calling in the statistician after the study is over may be no better than asking for a post-mortem to understand what the study died of.
Post-hoc power analysis on an under-powered study can let you plan for a new, better-designed study.
To determine whether the study were underpowered, I would not do a post-hoc power analysis. I would look at the confidence interval of the non-significant result. A broad confidence interval which contains values that could be meaningful indicates an underpowered study. A narrow confidence interval where the extremes are so close to zero as to have little visible impact tells me that the study was not underpowered, it just looked for something that was not there.
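As an illustration of that check, here is a rough sketch with invented numbers: compute the confidence interval for the group difference and judge it against effect sizes you would actually care about.

```python
# Sketch: judge "underpowered vs. genuinely null" from the confidence interval
# of the group difference rather than from post-hoc power. Data are invented.
import numpy as np
from scipy import stats

group_a = np.array([5.1, 4.8, 5.5, 5.0, 4.9, 5.2])
group_b = np.array([5.0, 5.3, 4.7, 5.1, 5.4, 4.8])

diff = group_a.mean() - group_b.mean()
se = np.sqrt(group_a.var(ddof=1)/len(group_a) + group_b.var(ddof=1)/len(group_b))
df = len(group_a) + len(group_b) - 2          # simple approximation to the df
t_crit = stats.t.ppf(0.975, df)
ci = (diff - t_crit*se, diff + t_crit*se)

print(f"Difference = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
# Wide CI containing effects you would care about -> the study was uninformative.
# Narrow CI hugging zero -> the effect, if any, is too small to matter.
```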
If you're looking at power post hoc, you may want to look at the entire body of evidence for a given comparison of interventions. That would typically be in the context of a meta-analysis. You can estimate the optimal information size (computationally, or by nomogram; by number of events or by relative effect measure e.g. RR) when examining pooled effect estimates. This will tell you if your meta-analysis is adequately powered to exclude a significant treatment effect when none was observed. Otherwise, I wouldn't make too much of a single under-powered study.
Post hoc power analysis assumes the observed effect to be the true one, which is probably not the case. You might want to see (if you haven't, already): Hoenig & Heisey (2001). The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis. The American Statistician, 55 (1).
"Like trying to convince someone that buying a lottery ticket was foolish (the before-experiment perspective) after they hit a lottery jackpot (the after-experiment perspective)."
Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med 1994;121:200-206.
Hi, I was taught that you decide what effect size is reasonable (i.e. what effect size is deemed significant), estimate the variance, decide alpha and beta, then see what N is required. Then you know how to construct your study, and also not gather more samples than are necessary. Thus post-hoc power analysis is pointless for that study, but may assist in designing a follow-up study, or for conducting meta-analysis of related studies.
There is a piece of software, G*Power, which I think is free to download. You can enter the details of your study (N, alpha, beta, type of test, etc.) to calculate your power (retrospectively).
You can also calculate a reasonable sample size to achieve your desired power before starting a study (prospectively).
Take a look at its graphs too; they are full of information.
It is always useful to know whether the power was adequate, especially if we can apply that learning to our future analyses. Yes, we would have preferred to know the power earlier to contain our Type II error (risk to the customer); however, at least as a post-mortem analysis, it could help. Good luck. If you have data, please send it over and I will help with how it can be done.
It is not just for researchers but also for critical decision makers in positions where risk to the producer and/or customer is involved. The more serious the product, the higher the level of knowledge needed across the organization.
I have to say, first, that calculating power is also a tricky thing when you do it a priori. In that case you use an expected prevalence or effect size, but what happens if the prevalence turns out to be different in the end? For instance, you design a study assuming that 10% of people will die of X in the first year on a specific drug and 20% on placebo, but in the end only 3% of people died on that drug and 5% on placebo. Your power calculation was not useful, because you used "expected" values that may or may not occur. Or imagine that you want to know whether diabetes prevalence is lower in one city than in another, and you use an estimated prevalence from a small survey to calculate power; again, you may miss the real prevalence. Of course, if you knew the real prevalence you wouldn't need to do the study, but I just want to point out that a priori power calculation is often just a "theoretical design": you have to do it because you need a starting point, but it is not a very solid base and it may fail. In this situation it could be useful to calculate "retrospective power", both as a way to explain why you found a non-significant result and as a useful tool for future studies.
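To illustrate the point about assumed event rates, here is a hedged sketch of the 10%/20% versus 3%/5% example above. The tool (statsmodels) and the normal-approximation two-proportion test are my assumptions.

```python
# Sketch: sample size planned under the assumed event rates, and the power that
# plan actually delivers if the true rates turn out to be much lower.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

power_calc = NormalIndPower()

h_planned = proportion_effectsize(0.20, 0.10)   # assumed placebo vs drug rates
n_planned = power_calc.solve_power(effect_size=h_planned, alpha=0.05, power=0.80)
print(f"Planned n per arm: {n_planned:.0f}")

h_actual = proportion_effectsize(0.05, 0.03)    # rates actually observed
power_actual = power_calc.power(effect_size=h_actual, nobs1=n_planned, alpha=0.05)
print(f"Power of that plan if the true rates are 5% vs 3%: {power_actual:.2f}")
```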
One approach I've used with some success when attempting to delicately assert the null in the case of a continuous measure is to use the standard deviation(s) observed and the sample size used to calculate the function describing power across a range of theoretical population mean differences. The smallest mean difference for which this function exceeds, say, 95% could be defended as having been effectively ruled out. In essence the standard deviation from your data is a superior or equivalent substitute for what you might have used in an a priori power analysis, and you're appropriately ignoring the observed mean difference.
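A minimal sketch of that approach, with an invented SD and sample size and statsmodels as the tool of choice, might look like this:

```python
# Sketch: use the observed SD and n (ignoring the observed mean difference) to
# map power across a range of hypothetical true differences, then report the
# smallest difference that is effectively ruled out.
import numpy as np
from statsmodels.stats.power import TTestIndPower

observed_sd = 2.0        # pooled SD from the completed study (illustrative)
n_per_group = 40         # sample size actually used (illustrative)
analysis = TTestIndPower()

for true_diff in np.arange(0.5, 3.01, 0.25):
    pw = analysis.power(effect_size=true_diff / observed_sd,
                        nobs1=n_per_group, alpha=0.05)
    flag = "ruled out" if pw >= 0.95 else ""
    print(f"true difference {true_diff:4.2f}: power = {pw:.2f} {flag}")
```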
A point that's important to highlight in general: in the case of an a priori power analysis, one should anticipate that the observed data are unlikely to match the values used in the power analysis. This isn't pointing to an error or a mistake. The principle is based on the assumption that any given experiment/study is merely a sample from a population of studies whose density functions can be estimated, most of which are not expected to have the actual population means. These density functions are then used to estimate the proportion of samples drawn from, say, the noncentral t distribution that might be anticipated to fall below the critical t value from a central t distribution with the same degrees of freedom.
No single study/experiment can hope to know whether, by virtue of being different from the estimates used in the power analysis, an error in those estimates has been uncovered, since knowing that would remove the rationale for doing the study in the first place; that is, if we knew the population parameters we wouldn't be sampling with our one little experiment in the first place.
Power analysis is essential and certainly required in grant applications and other proposals. The problem is that it's very speculative. If a "non-significant" finding is the result, it's good to be able to say that a moderate effect size would have been detectable given the sample size, but wasn't detected. One should not, however, as many here suggest, try to determine how many cases would have been required to find an effect of the size observed, since there will always be an answer: with enough cases, any effect can be significant.
Anyone looking to understand why this might be done should read Design Sensitivity by Lipsey. Also, many of us do not have the luxury of designing an experiment with the N needed for the best statistical analysis; does this mean the experiment shouldn't be done?
My answer also agrees with Mr. Fubing Tang's. If you have calculated the sample size before data collection, the power of the study is 1 minus the Type II error. Sometimes we do not calculate the sample size at all before data collection. It is then possible to calculate the power by substituting the n (sample size) and Type I error (5%) values into the formula and calculating the Type II error. In this way the power of the test may also be calculated retrospectively.
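For reference, the "formula" here is presumably the usual normal-approximation one for a two-sample comparison of means; a hedged sketch, with illustrative values for the difference, SD, and n:

```python
# Sketch of the normal-approximation power formula for a two-sample comparison
# of means: power = Phi(|delta| / (sigma*sqrt(2/n)) - z_{1-alpha/2}), dropping
# the negligible opposite-tail term. delta must be a difference you specify in
# advance, not the one observed in the data.
from scipy.stats import norm

def two_sample_power(delta, sigma, n_per_group, alpha=0.05):
    z_crit = norm.ppf(1 - alpha / 2)
    ncp = abs(delta) / (sigma * (2 / n_per_group) ** 0.5)
    return norm.cdf(ncp - z_crit)

print(f"{two_sample_power(delta=1.0, sigma=2.0, n_per_group=50):.2f}")
```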
I believe that you SHOULD calculate statistical power retrospectively (as Miguel Marcos correctly stated) in the following cases:
1. There are no prevalence data to calculate the power a priori
2. You can't really estimate the sample size a priori due to no prevalence data
3. If you want to show that the statistical significance seen in your study is in fact a valid finding
4. Since there is no way of calculating the effect size a priori (you can merely key in an assumed value), it would be better to calculate the power/effect size at the end of the study and document it as such.
Mervyn Thomas is providing the best advice in this thread: do not perform post-hoc power analyses. The calculation that many of the other commenters suggest is easy to compute but highly unreliable.
For reasons nicely articulated by Gelman & Carlin (2014), not only are effect size estimates from small studies highly volatile, but statistically significant studies with small samples tend to dramatically inflate apparent effect size. If your study was underpowered to begin with, then the effect size estimate following a significant result will necessarily be inflated, sometimes dramatically so (a "Type M" error, or error of magnitude). Furthermore, if the true effect size is especially small (e.g. if it's effectively zero) then an alarming share of significant results will report the wrong direction of the effect, in addition to exaggerating its size (a "Type S" error, or error of sign). These phenomena are further exacerbated by the unavoidable incentive to conceal null results and elevate significant ones: The stronger the publishing bias, the more misleading a post-hoc power analysis will be.
Do not confuse a statistical procedure that is easy with one that is useful or informative. Trust the statisticians on this one.
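A small simulation in the spirit of Gelman & Carlin, with invented numbers, shows both phenomena:

```python
# With a small true effect and a small sample, the estimates that happen to
# reach p < .05 exaggerate the effect (Type M) and sometimes get its sign
# wrong (Type S). All values here are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, sd, n, sims = 0.1, 1.0, 20, 20000

estimates, pvals = [], []
for _ in range(sims):
    a = rng.normal(true_effect, sd, n)
    b = rng.normal(0.0, sd, n)
    t, p = stats.ttest_ind(a, b)
    estimates.append(a.mean() - b.mean())
    pvals.append(p)

estimates, pvals = np.array(estimates), np.array(pvals)
sig = estimates[pvals < 0.05]
print(f"Exaggeration ratio among significant results: "
      f"{np.mean(np.abs(sig)) / true_effect:.1f}x")
print(f"Share of significant results with the wrong sign: {np.mean(sig < 0):.2%}")
```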
Although it is not ideal, you certainly can. Also, whether a priori or post-hoc, it's not always easy to do them right so I would suggest getting someone on board who knows what they are doing. All that said, a post-hoc power analysis can indicate whether you had the power to find your observed effect size(s), which is especially useful if you are running a pilot study.
Carl, Hoenig and Heisey (J. M. Hoenig and D. M. Heisey. The abuse of power. The American Statistician, 55(1):19–24, 2001) term retrospective power analysis for data analysis "an abuse of power". The problem is that whenever a test is not significant, retrospective power at the observed effect size must always be low, and whenever a test is significant, retrospective power must always be high. The Hoenig and Heisey paper has over 450 current citations: not because it is original but because it provides a clear and well-written account of a problem which applied statisticians encounter very frequently. It very neatly expresses the dominant understanding of power analysis in the statistics community.
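That one-to-one relationship is easy to see for a two-sided z-test, where "observed power" can be written directly as a function of the p-value; a short sketch:

```python
# 'Observed power' (power at the observed effect) is just a transformation of
# the p-value: for a two-sided z-test, p = 0.05 always maps to power near 0.50,
# and larger p always maps to lower power, regardless of the science involved.
from scipy.stats import norm

def observed_power(p, alpha=0.05):
    z_obs = norm.ppf(1 - p / 2)          # |z| implied by the two-sided p-value
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_crit - z_obs) + norm.cdf(-z_crit - z_obs)

for p in (0.01, 0.05, 0.20, 0.50, 0.90):
    print(f"p = {p:.2f} -> observed power = {observed_power(p):.2f}")
```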
A much more rigorous account of the folly of retrospective power analysis can be found in Hacking’s magisterial work on the logic of Statistical Inference [I. Hacking. Logic of Statistical Inference. Paperback re-issue. Cambridge University Press, 1965., pages 95-102] in which he demonstrates that Neyman Pearson inference provides a “before trials” rather than an “after trials” decision rule. In that context, power is a property of the decision rule which is set up at design, before examining the data and has no role in data interpretation. Indeed from a strict Neyman Pearson perspective, data interpretation is an entirely deterministic matter of applying a decision rule which is determined a priori. Neyman and Pearson themselves write [J. Neyman and E. S. Pearson. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 231(694-706):289–337, Jan 1933.]:
We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.
The widely accepted alternative to retrospective power analysis is the consideration of confidence intervals. Unfortunately confidence intervals must also be recognised as before-trials intervals (as Hacking [pages 159-160] points out). But abusing the concept of a confidence interval in this way has fewer consequences than the abuse of retrospective power analysis. From my somewhat partisan perspective, that is because a confidence interval is usually pretty similar to a Bayesian credible interval - which has exactly the sort of a posteriori meaning you are looking for.
By all means calculate power for your next study based on the effect sizes and variances seen in your current study (but do so with caution). But never attempt to use power to interpret the results of a study you have already undertaken.
No reputable statistician will tell you that you should calculate retrospective power.
As Greg points out, you can certainly complete the mechanics of the calculation; it is only when you try to use the computed power that you come to grief.
This is an excellent discussion on power analysis. In my previous work, where prevalence information was not available, I divided my studies into two parts and then used the first part to inform the subsequent study, which I think is the best thing one could do. Thanks a lot, guys.
You may need to calculate how much you need in the sample. If the power calculation shows that you have the minimum number, then it is OK to go ahead with the retrospective data.
To do a power analysis to estimate your sample size, you have to write your hypothesis and, based on that, decide which statistical test you will use; it should be one of the inferential statistics. You then need to determine the following: alpha (the standard is .05), power (the standard is .80), and effect size (small, moderate, or large; each test has its own values, which you can find online). Then download a free program such as G*Power to calculate the sample size.
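If you prefer to script it rather than use G*Power, a rough equivalent (my substitution, assuming an independent-samples t-test and Cohen's conventional d values) is:

```python
# Required n per group for an independent-samples t-test at alpha = .05 and
# power = .80, using Cohen's conventional small/medium/large d values.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"{label} effect (d={d}): n = {n:.0f} per group")
```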
At the moment I have this problem: I submitted an article with no sample size calculation, because the data come from a larger survey that had a sample size calculation, but not for this topic.
Many of you advised looking at the confidence interval, but how do I make this assessment for linear regression results? For studies with association measures such as relative risk or OR I can see whether a confidence interval is wide or narrow, but what about linear regression results?
Statistical minimum sample size (SMSS) = fn(alpha risk, beta risk, minimum difference in central tendency I wish to detect, existing or expected standard deviation, and power), so we have SMSS = fn(5 variables). When you input these parameters into the function (say, using Minitab), the output gives you the SMSS needed as well as the power value. One can also input the power value together with a known SMSS value.
Is this what you were looking for? Or is there something else?
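A rough statsmodels analogue of that Minitab dialogue (my substitution; the difference, SD, and sample sizes are illustrative) might be:

```python
# Supply alpha, power, the minimum difference worth detecting, and the expected
# SD, and solve for the minimum sample size -- or supply a sample size and
# solve for power instead.
from statsmodels.stats.power import TTestIndPower

alpha, power, min_diff, sd = 0.05, 0.80, 1.5, 2.0   # illustrative values
analysis = TTestIndPower()

n = analysis.solve_power(effect_size=min_diff / sd, alpha=alpha, power=power)
print(f"Minimum sample size: {n:.0f} per group")

achieved = analysis.power(effect_size=min_diff / sd, nobs1=30, alpha=alpha)
print(f"Power with n = 30 per group: {achieved:.2f}")
```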
If post-hoc power analysis is based on the observed effect size, it will not be comparable to the a priori power analysis (done before conducting the experiment), in which the effect size is the expected one. In this case post-hoc power could, in fact, be calculated from the P value, and thus it would be not only useless but misleading.
After you have the results and you find non-significant differences between groups, it might be more relevant to ask which would be, at the same P-value threshold, the minimum detectable difference based on the available sample and observed variance. If the minimum detectable difference is way too high for the aims of your study (e.g., an environmental risk assessment), then the only conclusion is that more research is needed, and next time you should have a better study design (e.g., larger number of sampling or experimental units, and/or more precise measurements).
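A hedged sketch of that minimum-detectable-difference calculation, assuming a two-sample t-test and invented values for the realised sample size and observed SD:

```python
# With the realised sample size and observed SD, solve for the standardised
# effect detectable at 80% power, then convert it back to the original units.
from statsmodels.stats.power import TTestIndPower

n_per_group = 25       # sample size actually achieved (illustrative)
observed_sd = 3.2      # SD observed in the data (illustrative)

d_min = TTestIndPower().solve_power(effect_size=None, nobs1=n_per_group,
                                    alpha=0.05, power=0.80)
print(f"Minimum detectable difference: {d_min * observed_sd:.2f} units")
```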
As others have stated, no you should not calculate power retrospectively. It is meaningless.
See: Gilbert, G. E., & Prion, S. K. (2016). Making sense of methods and measurement: The danger of the retrospective power analysis. Clinical Simulation in Nursing, 12(8), 303–304. https://doi.org/10.1016/j.ecns.2016.03.001