A priori sample size calculation is often thought to be the best approach. How would the results change if the sample size were calculated afterwards, or if the analysis were simply done on whatever sample size the study happened to use?
The purpose of sample size calculation is risk management: ensuring ethical adherence, minimising resource usage, minimising the exposure of subjects to unvalidated procedures, and so on. You do it a priori to determine how to design your experiment so that it achieves an anticipated result at a satisfactory risk of failing to detect a true effect. Once you have run your experiment, any sample size calculation is redundant for interpreting that dataset. It can be useful for designing follow-on studies. In the case of failed studies it can't rescue them, but it can help design better future studies with realistic risk management built in. If you have specific reasons for raising the question, perhaps share them so a more specific response can be given.
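To make the a priori step concrete, here is a minimal sketch of the kind of calculation involved, using statsmodels' power routines. The effect size, alpha, and power values are illustrative assumptions, not figures from this discussion.

```python
# Sketch of an a priori sample size calculation for a two-sample t-test.
# The inputs (standardized effect size, alpha, power) are illustrative only.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

n_per_group = analysis.solve_power(
    effect_size=0.5,   # assumed smallest effect worth detecting (Cohen's d)
    alpha=0.05,        # accepted type I error rate
    power=0.80,        # accepted probability of detecting the assumed effect
    ratio=1.0,         # equal group sizes
    alternative="two-sided",
)
print(f"Required sample size per group: {n_per_group:.1f}")  # ~64 per group
```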
There are a couple of reasons. James hits on the power/ethics reason. You do a power analysis beforehand to make sure you have a sample that is large enough (and not too large) for your purposes, and you make this public so readers know you are not just collecting data until you find what you want. A second reason is that, depending on the statistics you use, many require (or assume) a fixed stopping point for your data collection. The planned sample size can serve as that stopping point.
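To illustrate the second point, here is a small simulation sketch showing how "collecting data until p < .05" inflates the false positive rate compared with testing once at a fixed sample size. All settings (maximum n, batch size, number of simulations) are arbitrary choices for the illustration.

```python
# Simulation sketch: peeking at the p-value as the sample grows and stopping
# as soon as p < .05 inflates the type I error rate, even though the null
# hypothesis is true in every simulated experiment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sim, n_max, batch = 2000, 100, 10

def false_positive_rate(optional_stopping: bool) -> float:
    hits = 0
    for _ in range(n_sim):
        x = rng.normal(0.0, 1.0, n_max)   # data generated under H0 (mean 0)
        if optional_stopping:
            # test repeatedly as data accumulate, stop at the first p < .05
            for n in range(batch, n_max + 1, batch):
                if stats.ttest_1samp(x[:n], 0.0).pvalue < 0.05:
                    hits += 1
                    break
        else:
            # single test at the pre-planned sample size
            if stats.ttest_1samp(x, 0.0).pvalue < 0.05:
                hits += 1
    return hits / n_sim

print("fixed n:          ", false_positive_rate(False))  # close to the nominal 0.05
print("optional stopping:", false_positive_rate(True))   # noticeably higher
```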
You've had some good answers from James & Daniel. Another way to tackle your question is to talk about what's wrong with post hoc power analysis (as it is typically done). Russ Lenth has some nice discussion of that in Part 3 of this report:
The ethical component, noted by James, is very important. People who take part in research (and animals too) work for nothing. It is an abuse of their participation to run a study that doesn't have a reasonable chance of detecting what it sets out to detect, assuming that such a thing exists.
The old 80% power reflex is another thing I dislike. There's no real thought behind it. Are we really going to ask funders to fund, researchers to carry out and participants to participate in a piece of research that has a guaranteed 20% chance of failing? I want to see 90% power at a minimum, and when I'm doing calculations I always give the minimum detectable effect size at 90% and 95% power.
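To show what "minimum detectable effect size at 90% and 95% power" looks like in practice, here is a sketch that solves for the detectable standardized effect at several power levels; the fixed sample size of 30 per group is an arbitrary assumption for the example.

```python
# Sketch: for a fixed, assumed sample size, what standardized effect
# (Cohen's d) can a two-sample t-test detect at 80%, 90% and 95% power?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = 30  # assumed design, not a figure from this thread

for power in (0.80, 0.90, 0.95):
    d = analysis.solve_power(
        effect_size=None,   # solve for the minimum detectable effect
        nobs1=n_per_group,
        alpha=0.05,
        power=power,
        ratio=1.0,
        alternative="two-sided",
    )
    print(f"power {power:.0%}: minimum detectable d = {d:.2f}")
```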
An animal research project was declined by our committee just because the researcher wanted 90% power. They said that 80% was usual and that applying for more animals was not justified... (the researcher argued that the cost of a few more animals would be outweighed by the reduced risk of the experiment being a complete failure)
And a note: (1 − power) = beta is not a guaranteed risk of failure. It refers to hypothesis B (assuming B is true, the probability of deciding for B equals the power, and the risk of falsely deciding for A is beta). If the true effect is larger than B, the probability of deciding for B is higher, and the risk of a false decision for A is lower. If B was chosen conservatively, beta can even be seen as an upper bound on that risk.
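A small sketch of this point: the design below is chosen for 80% power at an assumed effect B (d = 0.5, an illustrative value), and the power is then recomputed for hypothetical true effects larger than B, showing how beta shrinks.

```python
# Sketch: beta is tied to the assumed effect B. If the true effect is larger
# than B, power is higher and the risk of wrongly deciding for A is smaller.
# All design values are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

d_planned = 0.5                       # effect B assumed in the power analysis
n_per_group = analysis.solve_power(   # design chosen for 80% power at B
    effect_size=d_planned, alpha=0.05, power=0.80)

for d_true in (0.5, 0.6, 0.8, 1.0):   # hypothetical true effects
    power = analysis.power(effect_size=d_true, nobs1=n_per_group,
                           alpha=0.05, ratio=1.0, alternative="two-sided")
    print(f"true d = {d_true:.1f}: power = {power:.2f}, "
          f"risk of falsely deciding for A (beta) = {1 - power:.2f}")
```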
Thank you very much all. I think I have learnt a lot from your answers.
Taking it a bit further, my next question is as follows.
Researcher-1 conducted an experiment on X animals with an a priori sample size calculation. Researcher-2 did the same experiment with the same number of animals but had not done any prior sample size calculation. Both Researcher-1 and Researcher-2 had their data analyzed by the same data analyst, independently and without knowledge of each other. Will both researchers get the same results? What difference will their respective data make?
As Daniel pointed out, power analysis is required in classic null hypothesis testing to fix an a priori stopping rule (i.e., the sample size). Stopping the study early because p < .05 has been reached is a mistake, and one may not simply continue testing either. As an alternative, Bayesian parameter estimation does not have these restrictions. Data are seen as cumulative evidence (today's posterior is tomorrow's prior). One may stop whenever the estimates reach a desired level of certainty, and one may extend the study when stronger certainty is needed. Moreover, a researcher doing a power analysis must have a good idea about the effect size. Focusing the analysis on estimating magnitudes connects to the idea of effect size more naturally than the NHST black-or-white decision based on an arbitrary threshold.
As convenient (and intuitive) as this may sound, it comes with the difficulty of convincing funding and ethics committees of such eccentric ideas ;)
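One purely illustrative sketch of this Bayesian stopping idea: with a Beta prior on a proportion, the posterior can be updated as each observation arrives and data collection stopped once the credible interval is narrow enough. All numbers below (true proportion, prior, target width) are assumptions, not a recommendation.

```python
# Sketch of Bayesian sequential estimation with a precision-based stopping
# rule: Beta-Binomial updating of a proportion, stopping once the 95%
# credible interval is narrower than a chosen width.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_p = 0.65                 # unknown "true" proportion used to simulate data
a, b = 1.0, 1.0               # flat Beta(1, 1) prior
target_width = 0.15           # stop when the 95% credible interval is this narrow
max_n = 500

for n in range(1, max_n + 1):
    y = rng.random() < true_p          # one new observation (success/failure)
    a, b = a + y, b + (1 - y)          # today's posterior is tomorrow's prior
    lo, hi = stats.beta.ppf([0.025, 0.975], a, b)
    if hi - lo < target_width:
        break

print(f"stopped after n = {n} observations")
print(f"posterior mean = {a / (a + b):.3f}, 95% CrI = ({lo:.3f}, {hi:.3f})")
```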
Actually, a power analysis in its strict meaning makes sense if you want to decide between two substantially different alternatives. Let's call the corresponding hypotheses A and B. In most research projects, A typically refers to the hypothesis of a "zero effect", and B refers to some least interesting/relevant effect. The aim of the study is to decide whether we should act as if A were the case or as if B were the case, and we want a test that balances the benefits and risks of the possible correct and wrong decisions. This is done by defining appropriate confidences in the possible decisions, given by alpha and beta (1 − alpha is the confidence in deciding for B, and 1 − beta is the confidence in deciding for A).
If researcher #1 did a power analysis, I would assume that he set up the hypotheses A and B and specified the desired confidence levels of the possible decisions. The test result will either be "decide to act as if A were the case" or "decide to act as if B were the case". In each case, the decision has the known confidence.
Researcher #2 obviously did not specify A and B. I assume that the test here is not an A/B test but a significance test: testing the statistical significance of the observed data under a 'null hypothesis' (H0). Formally, H0 is equivalent to #1's A, and the same math is involved. But the interpretation is different. The test does not give a decision between A and B. It just tells the researcher how unlikely his data (or more "extreme" data) would be under H0 (= the p-value). If this is very unlikely, the researcher may be sufficiently confident to interpret the sign of the effect estimate. Otherwise, the data are considered insufficient or inconclusive. That's it.
So both researchers will end up with different interpretations. #1 can act as if the effect were A or B (at least), with confidence 1 − beta and with confidence 1 − alpha, respectively. Doing so, the researcher behaves optimally w.r.t. the loss function he imposed on his actions. #2, in contrast, can either make a claim about the direction (sign) of the effect (and he can do this with confidence 1 − p, although this concept of confidence doesn't really apply here, because #2 didn't aim to make a decision between A and B and thus cannot control any error rate, so he cannot define a procedure to ascertain some level of confidence [confidence is a property of the procedure, but the p-value is a statistic of the data!]), or he has to admit that the data are not sufficient to reach that confidence in interpreting the sign.
Even if #2 knew about the sample size calculation of #1, it remains open whether #2 would consider B the least interesting/relevant effect and whether he would also adopt the same loss function as #1 (an aspect related to Rónan's post above).
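To make the contrast between #1 and #2 concrete, here is a small sketch in which the same simulated dataset is reported once as a pre-specified A/B decision and once as a plain significance test. All numbers (planned d, alpha, power, simulated group means) are invented for illustration.

```python
# Sketch: the same data, interpreted two ways. Researcher #1 pre-specified
# hypotheses A (no effect) and B (d = 0.5) with alpha = 0.05 and power = 0.80,
# and reads the test as a decision between acting as if A or as if B.
# Researcher #2 runs the same t-test but only reports a p-value under H0.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(3)
alpha, d_planned = 0.05, 0.5
n = int(np.ceil(TTestIndPower().solve_power(effect_size=d_planned,
                                            alpha=alpha, power=0.80)))

# one simulated experiment shared by both researchers
control = rng.normal(0.0, 1.0, n)
treated = rng.normal(0.4, 1.0, n)
p = stats.ttest_ind(treated, control).pvalue

# Researcher #1: a decision with pre-specified error rates
decision = ("act as if B (at least the planned effect)" if p < alpha
            else "act as if A (no relevant effect)")
print(f"#1: {decision} (alpha = {alpha}, beta = 0.20 at d = {d_planned})")

# Researcher #2: a statement about the data under H0 only
print(f"#2: p = {p:.3f} -> " + ("interpret the sign of the estimate"
                                if p < alpha else "data inconclusive"))
```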