Are there certain factors that could make a statistically significant result (p-value less than 0.05) clinically important, such as a larger sample size or a randomized controlled clinical study?
You're right that if you increase the sample size in each treatment group, significant differences will eventually be found. It is therefore advisable that, in addition to the significance test, you calculate the effect size, a statistic that estimates the magnitude of an effect (e.g. Cohen's d, Hedges' g, or a correlation coefficient).
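A minimal sketch in Python (with simulated data; numpy and scipy are my additions, not something from the thread) of reporting an effect size next to the p-value:

```python
import numpy as np
from scipy import stats

# Simulated outcome scores for two groups -- illustrative only
rng = np.random.default_rng(0)
treatment = rng.normal(10.5, 2.0, size=50)
control = rng.normal(10.0, 2.0, size=50)

# The t-test gives the p-value ...
t_stat, p_value = stats.ttest_ind(treatment, control)

# ... while Cohen's d estimates the magnitude of the effect
pooled_sd = np.sqrt((np.var(treatment, ddof=1) + np.var(control, ddof=1)) / 2)
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

print(f"p = {p_value:.3f}, Cohen's d = {cohens_d:.2f}")
```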
Clinical importance has nothing to do with a p-value. Nothing. Knowing a p-value does NOT in any way help you to judge clinical relevance and there is NOTHING one can do to get this information from a p-value.
To judge the effect of a treatment (or drug) you need to weigh the consequences of this effect, the consequences of not treating patients who might benefit, and the consequences of side-effects(!). There is no common rule or ethic for how to do this. There are very difficult questions about quality of life, the severity of symptoms, and, last but not least, economic aspects, on none of which we commonly agree.
The judgement of clinical importance or relevance is thus perhaps 10% a statistical problem, 30% a medical problem, and 60% a social/ethical problem. And even for that 10% statistical part, the p-value is not a useful measure at all. The statistics give you estimates of the effect (and the side-effects) one should expect, including a statement about the precision or uncertainty of those estimates.
You mentioned the sample size: the p-value is itself a function of the sample size. Larger samples give smaller p-values (all other conditions being equal). So the interpretation of a p-value should actually take the sample size into account, and small p-values, especially from large studies, are thus not as "significant" as one might think.
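A quick simulation sketch of this point (all numbers made up): the same trivially small true difference produces ever smaller p-values as the groups grow.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_diff = 0.1  # a clinically trivial difference, chosen for illustration
for n in (20, 200, 2000, 20000):
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(true_diff, 1.0, size=n)
    _, p = stats.ttest_ind(a, b)
    print(f"n per group = {n:>6}  p = {p:.4f}")
```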
You further mentioned the study design: this is important because it defines how we can interpret the data and the results. The aim of a good design is to make sure that the relevant population is sampled, that relevant confounders are considered, and that the estimates we get are not (severely) biased. Randomization is a safeguard to reduce the impact of confounders we do not even know about, blinding is a tool to reduce confounding by systematic differences in how the treatment is given or how patients are handled, and double-blinding extends this to the recording and interpretation of the data. Some statisticians say that yet another step of blinding should be introduced: the analyst should not know the aims of the study or the desired results.
The established statistical procedure for clinical studies is to use Neyman/Pearson testing to make a rational decision about the "usefulness" of a treatment or drug. This requires good a priori knowledge about the effects and the side-effects. A rational decision can be made based on the expected utility of the treatment, but this requires that one name and value the benefits and harms of correct and wrong decisions. Based on this knowledge and on these considerations, one can plan a study from which a hypothesis test will provide a rational decision. Without knowledge of the effect sizes and a specification of the utility function (and instead just plugging in some "usual" values for alpha and beta), the maths will work, but the resulting decision need not be rational!
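As a rough illustration of that planning step (a simplified normal-approximation sketch; the alpha, beta, and effect-size values below are hypothetical and are exactly the quantities that need the a-priori knowledge and utility judgement described above):

```python
from scipy import stats

# Hypothetical planning inputs -- these should come from prior knowledge
# and an explicit weighing of benefits and harms, not from convention alone.
alpha = 0.05        # tolerated rate of falsely declaring an effect
beta = 0.20         # tolerated rate of missing a relevant effect
d_relevant = 0.40   # minimum effect size (Cohen's d) judged worth detecting

# Normal-approximation sample size per group for a two-sample comparison
z_alpha = stats.norm.ppf(1 - alpha / 2)
z_beta = stats.norm.ppf(1 - beta)
n_per_group = 2 * ((z_alpha + z_beta) / d_relevant) ** 2
print(f"about {n_per_group:.0f} patients per group")
```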
To add to Jochen's very thoughtful reply – a p-value is built from a ratio: the effect size divided by the precision with which that effect was measured. So it combines two pieces of information in a way that cannot be uncombined. (A half is a half, no matter whether it's 2÷4 or 9÷18.)
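A small numerical illustration of why the two cannot be uncombined (hypothetical numbers): a large effect measured imprecisely and a tiny effect measured very precisely can give essentially the same p-value.

```python
from scipy import stats

# Study A: large effect, small sample; Study B: small effect, huge sample.
# Both give roughly the same t-statistic, hence roughly the same p-value.
for label, diff, sd, n in [("A", 1.0, 2.0, 32), ("B", 0.1, 2.0, 3200)]:
    se = sd * (2 / n) ** 0.5                 # SE of the difference in means
    t = diff / se
    p = 2 * stats.t.sf(abs(t), df=2 * n - 2)
    print(f"Study {label}: effect = {diff}, n per group = {n}, p = {p:.3f}")
```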
The first important figure for clinical significance is the effect size, which, as Jochen says, cannot be read from the p-value. And even here, the size is not the only factor. We need to understand the context.
For example, smoking increases the risk of cancers of the head and neck by a factor of about 20. But these cancers are really rare, so the effect of smoking is to produce just a few extra cases. If you are a smoker, your absolute risk of cancers of the head and neck is still tiny.
On the other hand, smoking increases the risk of heart disease by a factor of just 2.5. However, because the risk of heart disease is already substantial, a 2.5-fold increase represents a very serious increase in risk both at the level of the population and the level of the individual.
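Putting rough, made-up numbers on that contrast (these are illustrative, not real epidemiological figures):

```python
# Illustrative baseline risks only -- not real epidemiological figures
rare_baseline = 0.0005     # baseline risk of a rare cancer
common_baseline = 0.10     # baseline risk of heart disease

extra_rare = rare_baseline * 20 - rare_baseline          # 20-fold relative increase
extra_common = common_baseline * 2.5 - common_baseline   # 2.5-fold relative increase

print(f"rare disease:   +{extra_rare:.4f} absolute risk")    # about 1 in 100
print(f"common disease: +{extra_common:.4f} absolute risk")  # about 15 in 100
```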
So without knowing the context, you cannot deduce clinical significance from the effect size alone, even if you know it.
The denominator of that ratio is the precision of the measurement. This is also useful information in judging clinical significance, but it is better expressed as a confidence interval around the point estimate of the effect size. The width of the confidence interval lets us judge which findings are already measured precisely enough to put into practice, and which will require more research before we decide to act.
As I hope you can see, a p-value is a remarkably effective way of losing two really important pieces of information - effect size and precision. And they are lost irretrievably - you cannot reverse-engineer a division without knowing at least one of its terms.
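A minimal sketch (simulated data again) of reporting the difference in means with its confidence interval rather than only a p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
treatment = rng.normal(10.8, 2.0, size=60)   # simulated outcomes
control = rng.normal(10.0, 2.0, size=60)

diff = treatment.mean() - control.mean()
pooled_var = (np.var(treatment, ddof=1) + np.var(control, ddof=1)) / 2
se = np.sqrt(pooled_var * (1 / 60 + 1 / 60))
t_crit = stats.t.ppf(0.975, df=2 * 60 - 2)   # two-sided 95%
print(f"difference = {diff:.2f}, "
      f"95% CI [{diff - t_crit * se:.2f}, {diff + t_crit * se:.2f}]")
```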
Good question, and one which more people should ask themselves!
Excellent answers by Drs. Wilhelm and Conroy. Just to complete Dr. Conroy's point about confidence intervals: the best way to proceed is to determine, at the outset (prior to conducting the research), the effect size that is clinically relevant. Then determine your sample size using the AIPE (Accuracy In Parameter Estimation) method (Maxwell SE, Kelley K, Rausch JR. Sample size planning for statistical power and accuracy in parameter estimation. Annu Rev Psychol. 2008;59:537-563.), so that the lower boundary of the 95% or 99% confidence interval corresponds to the minimum clinically meaningful effect size. Interpreting the confidence interval around an observed effect size in relation to a predetermined clinically meaningful effect will then tell you whether the observed effect is clinically relevant. I teach a PhD-level course in applied statistics and I've included my lecture on sample size, which covers the AIPE method.
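For readers without access to that lecture, here is a simplified precision-planning sketch in the same spirit (it is not Maxwell, Kelley and Rausch's exact procedure, and the SD and precision target below are hypothetical): increase n until the expected 95% CI is narrow enough for the planned comparison against the minimum clinically meaningful effect.

```python
import math
from scipy import stats

# Hypothetical planning inputs
sigma = 2.0              # assumed SD of the outcome in each group
target_halfwidth = 0.5   # required precision of the estimated mean difference

n = 2
while True:
    se = sigma * math.sqrt(2.0 / n)             # SE of a difference in means
    t_crit = stats.t.ppf(0.975, df=2 * n - 2)   # two-sided 95%
    if t_crit * se <= target_halfwidth:
        break
    n += 1
print(f"about {n} patients per group for a 95% CI half-width of {target_halfwidth}")
```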
I fully agree with all previous writers and want to add one point: when you are talking about clinical importance, you may also want to look into the reliable change index and the concept of clinically significant change introduced by Jacobson and Truax (1991).
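For anyone unfamiliar with it, a small sketch of the reliable change index as defined by Jacobson & Truax (the patient scores, SD, and reliability below are hypothetical):

```python
import math

def reliable_change_index(pre, post, sd_pre, reliability):
    """Reliable change index (Jacobson & Truax, 1991) for one patient."""
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se_measurement ** 2)
    return (post - pre) / s_diff

# |RCI| > 1.96 suggests change beyond what measurement error alone would produce
rci = reliable_change_index(pre=30.0, post=21.0, sd_pre=7.5, reliability=0.85)
print(f"RCI = {rci:.2f}")
```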