So, consider two different examples. A study reports no difference between groups as "NS, p > 0.05" with a sample size of 10 in each group. If I knew that p = 0.08, then I might consider replicating that study myself with a larger sample size if I had strong reason to anticipate a difference, but if p = 0.40, then I would abandon that hypothesis. Now consider a study that reports a difference between groups as significant at p < 0.05.
By giving the exact P value you indicate the level of probability for the difference between or among treatments. Just saying 'significant' or 'not significant' refers to a fixed level of probability (e.g., 0.05, 0.01, ...) and is therefore less informative. Nowadays, almost all statistical packages give the exact P value.
Theoretically, the Neyman-Pearson approach is the right one; if you are testing a hypothesis, you are maximizing the power for the chosen significance level. Using hypothesis testing as the method and then declaring a P value at the end does not make sense. Having said that, it has become common practice to report P values, and this has meaning only if you were not sure a priori what your risk tolerance was; then it becomes a descriptive analysis (leaving the conclusion to the reader). From a practical point of view, does it really matter which approach one uses?
Confidence intervals (CI) are more informative than p values. They give the same information as p values, plus an idea about sample size.
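To make this concrete, here is a minimal R sketch with simulated data (all numbers invented, purely for illustration): a two-sample t-test reports the p-value and the 95% CI of the difference side by side, and the CI additionally conveys the magnitude and precision of the effect.
set.seed(1)
a <- rnorm(10, mean = 5, sd = 2)   # hypothetical group A
b <- rnorm(10, mean = 6, sd = 2)   # hypothetical group B
res <- t.test(a, b)
res$p.value   # the p-value alone
res$conf.int  # the 95% CI of the mean difference: size and precision of the effect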
Suggest looking at Tukey's three-decision rule. See
Jones, L. V. & Tukey, J. W. (2000). A sensible formulation of the significance test. Psychological Methods, 5, 411--414.
or
Wilcox, R. R. (2012). Modern Statistics for the Social and Behavioral Sciences: A Practical Introduction. New York: Chapman & Hall/CRC press
A p-value reflects, for example, how certain we can be about which group has the larger value of some parameter, or whether the value is greater than a specified constant. If the p-value is not sufficiently small (say, not less than .05), make no decision.
Respected Sir/Madam, thank you so much for the answers. This information is very important not only for me but also for other researchers who read these answers. I hope other knowledgeable scientists will also share their views here.
The P value is of paramount importance. In fact, it is the p value that tells whether the null hypothesis is rejected beyond any doubt. I am not a statistician but I have managed to learn a bit of it. When we say a p value less than 0.05, it means the probability of a test rejecting null hypothesis incorrectly is less than 5% and hence the statistical output is 95% or more reliable. Please correct me if I am wrong.
I would never discuss my results until all my data have been put to statistics and the p value determined. Whether to give the exact p value or categorize it as less than or more than a particular value is up to the authors and journal formats.
Apurva, here I can give a correction: "it means the probability of a test rejecting null hypothesis incorrectly is less than 5% and hence the statistical output is 95% or more reliable." The first part is correct, but not the second. The second part depends on the probability of a false null itself. Consider the case where you are really testing only noise all the time (no "real" effects). The procedure guarantees that 5% of your test results will be "significant". When I know that most of your tests will test a non-existing effect anyway, then even after a "significant" result I will still be convinced that this is wrong (a false positive). At the other extreme, when you are a brilliant researcher and almost all of your thoughts (hypotheses) fit the observations (I want to avoid the phrase "are true"), then a non-significant result would not discourage me from thinking that there will be an effect anyway, only that the power was not sufficient to reach significance.
So, the confidence in the reliability of "significant results" is not at all determined by the p-value alone but also by a-priori beliefs about the likelihood of the tested hypotheses (which may depend on a lot of knowledge not considered by the actual experiment alone). Further, one may take the philosophical standpoint that any intervention, any kind of systematic difference in whatever aspect, will surely lead to some effect - probably of irrelevant and negligible size - but this effect will result in rejecting the null given the power of the test is high enough.
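To put some numbers on this point (all of them assumed, purely for illustration), a quick R sketch of the "post-study" probability that a significant result reflects a real effect:
prior <- 0.10  # assumed fraction of tested hypotheses that are actually true
alpha <- 0.05  # significance level
power <- 0.80  # assumed power
ppv <- (power * prior) / (power * prior + alpha * (1 - prior))
ppv  # ~0.64, so roughly a third of the "significant" findings would still be false positives
The same calculation with prior = 0.5 gives ppv of about 0.94, which is exactly the point: the p-value alone does not fix the reliability of a "significant" result.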
Apurva,
To put it in non-statistical terminology, the p value is analogous to the false positivity of a diagnostic test. After analysing your data, if you say that you found a statistically significant finding with a p value of 0.001, it implies that you concluded there is a significant association/difference, but you could be wrong in saying so 1 in 1000 times, as it could be a false positive result. So quoting the exact p value helps the reader to understand this likelihood of your result being a false positive, or a type I error in statistical parlance.
Yes, confidence interval is more informative, but exact p is also informative.
Sreenivas
Reporting the exact value is an indication of not fully understanding the logic of significance testing. A priori one selects the probability of the test statistic that they would consider rare, often .05 or .01. If the probability of the statistic is less than that value then you reject the null. It is either fail to reject or reject - there is no such thing as more significant.
I would like to take Brijesh's question a bit further. If P values are meant to reject or accept a null hypothesis, then why would some people use P = 0.1 while others use P = 0.01?
Thanks a lot, Dr. Wilhelm and Dr. Vishnubhatla.
This is not in the context of the present question, but I was, out of curiosity, looking for scientists with a really top RG score. Dr. Wilhelm, I have found my answer: you are in the top 3%. That really is great.
Lawrence, I think stating that "Reporting the exact value is an indication of not fully understanding the logic of significance testing" is only one side of the coin. Based on Neyman-Pearson's (NP) approach, this is right. But this approach *needs* the power to be set a priori, which, at least in the basic research known to me, is almost never done. NP clearly state that the only purpose of a test is to guide *actions* taken, and that in the long run the proportion of wrongly taken actions should be controlled. They do not state anything about the hypotheses; it is only about actions. In their setting one has to take an appropriate action for the case that there is an effect or that there is no effect. Such "actions" may be to decide to further investigate something or to approve a drug, or to decide not to further investigate or to disapprove a drug. In this sense, a "non-significant" result also inevitably leads to a decision, and for low-power experiments such decisions are too likely to be wrong. Even worse: if the power is not specified a priori, then the type II error rate is not controlled. However, this is common practice in basic research.
The other side of the coin is represented by Fisher's approach. Here, it is not the control of long-run error rates for actions taken that is central, but rather the inference from a particular experiment to a more general situation/model. This certainly cannot be based on a p value alone, but a p value is yet another piece of information that adds to the complex process of inference. Here, it is perfectly fine to distinguish p values, i.e. to report their actual values. There is no (arbitrary) cut-off for a yes/no decision. However, the information is still pretty much condensed (I mean much information is not used in the presentation of a p value). Giving the entire likelihood profile and/or a posterior probability distribution would be most informative, but most people are not familiar with this. A good compromise is confidence intervals (which are identical to credible intervals for flat priors, if you like to go that way).
Dear scientists, thanks a lot for your valuable answers. Yes, they are well worth reading. I am getting closer to the answer I need. Please also post some supporting references. Mohammad Sir, yes, some researchers keep the cutoff value up to 0.2 instead of 0.01 or 0.05; it depends on the study design. You can fix the alpha and the power, then calculate the sample size and proceed with the study. But my question was: since there is no "highly significant" or "highly non-significant" according to the exact p value, why should we report the exact p value instead of reporting NS (non-significant) or S (significant)?
Respected Dr. Jochen Wilhelm Sir, thank you so much for your very valuable answer. Please let me know some references too - any editorials or short communications, anything will be very helpful to me.
If you follow NP, you set the "level of significance" and the "power". Then you collect the data and then you do the test. As a result of the test, you take either action "A" or action "B", depending on the test result, which you may indicate by "significant" (S) or "non-significant" (NS). This is perfectly fine.
If you follow F, you have some data and you analyse its "significance" under the null hypothesis. Take this as just another piece of information, together with everything else you know (from other data, from the literature, from the reasonability of the competing models and so on), to argue either more in favour or more in disfavour of a particular model.
For a starting point you may have a look at: http://biostat.mc.vanderbilt.edu/twiki/pub/Main/ClinStat/EndOfSignificance.pdf where other references are given.
The real question is, do we really need p values ??
The frequentist approach and the systematic reporting of p-values do more harm than good, because they are very difficult to interpret.
As an introduction to the issue, the classical work of Cohen, "The Earth is round (p < .05)", is well worth reading.
Part of this question should be answered by the referees and editorial board of the journal. Some journals ask for such values. If the referees and the editorial board, together with some statisticians, agree to have such data in the journal, that will be OK. In my opinion, it can sometimes make a difference if you would like to run the experiment again with different samples and measurement scales. Anyway, it does no harm.
I am assuming that all of us want to publish the research for which we are discussing the use of P values? If so, we may have to rely on our reviewers and journal editors. If they are finicky about the exact P value, we may have to provide it for our research submission, irrespective of the utility.
As it relates to reporting or not, in agreement with many on this discussion board, I think we should report exact p values!
I believe that the "exactness" of a probability (p-value included) is a contradiction in terms. The traditional ranks (p < 0.05, p < 0.01, p < 0.001) seem quite sufficient to me.
Boris, the p values are as "exact" as the data are. The p value is a random variable. A particular, given set of data translates into a particular, given p value (depending on the chosen error model and null hypothesis). Reducing the information in the "exact" value to a rank (like "p < 0.01") simply throws information away.
Thank you, Wilhelm! I think, however, that "tiny", "not so big", "quite large" are not so bad definitions and, in some cases, quite sufficient ones. I feel that presenting the p-value as such an estimate is enough when, for instance, comparing average RBC counts in two samples of a population or of laboratory animals. If I wish to demonstrate that, let us say, a Ca dietary supplement has reduced the lead-induced anemia and to check the probability that this beneficial effect is not a mere sampling error, p < 0.05 is all I need.
"If I wish to demonstrate that, let us say, a Ca dietary supplement has reduced the lead-induced anemia and to check the probability that this beneficial effect is not a mere sample error, p
Clearly, stating that a difference is "statistically significant at P=0.048" is nonsense.
Following Neyman/Pearson, you reject H0 at a fixed level of significance. This will allow you to control the type-I error rate. But note that this does not tell you anything about one particular result. This philosophy is based on long-run error rates. And surely there is no control of the error rate when you start rejecting H0 at different levels. One can state the level of significance ("alpha") in the Methods section, and then just say whether or not any result is "significant". However, this does not carry any information about the likelihood of, or belief in, the "truth" of the result: a "significant" result is not made any more "true" by a smaller p.
Dear Philippe! "You need effect size estimation and a discussion of what difference is practically / clinically relevant." Who would disagree with you? Not I!!! You seem to have forgotten, however, that the question under discussion was quite specific: is reporting the "exact" p-values (instead of rank estimates) really useful for anything but satisfying a reviewer? I am sorry if my too laconic "all I need" led to a misunderstanding.
Dear Wilhelm, I am overpowered by your statistical erudition (far above anything I might propose myself), but I'd say that you have also gone far outside the boundaries of that question. The only statement of yours that pertains to it is "Clearly, stating that a difference is 'statistically significant at P=0.048' is nonsense." That is just what I wished to say.
It was a very interesting discussion indeed. Thanks a lot!
Science, and hence scientific experimentation or research, is all about approaching exactness in methodology, writing and testing. It is correct that the p-value provides certain information relative to hypothesis testing (if the calculated p-value is smaller than the probability of the alpha rejection region, the null hypothesis is rejected; if the p-value is greater than or equal to alpha, the null hypothesis is not rejected).
However, the p-value can provide a great deal more information than one would think, particularly if exactness is what counts in science. The degree of significance indicated by the p-value allows one to evaluate the extent to which the data are in agreement or disagreement with the hypothesis, and not merely that they disagree with it.
John Billingsley
Dear John, I believe you are right in your last statement, but I don't see why expressing this "extent to which the data are in agreement or disagreement with the hypothesis" in ranks instead of in "exact values" would not be quite satisfactory. Moreover, such "exactness" is confusing in just this respect, as one would have to decide what minimal difference between these exact values has to be taken seriously.
One issue not touched so far, is that often papers do not report just one p-value, but 10 or more in every table. In that case, if the authors did not, readers may formally or informally wish to apply some form of correction for multiple testing (e.g., Bonferroni, false discovery rate) to the critical sets of results, to estimate how many of the "significant" findings may have a chance to be true positives.
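For readers who want to apply such corrections themselves, R's built-in p.adjust covers both approaches mentioned above (the p-values below are made up, purely for illustration):
p <- c(0.001, 0.008, 0.020, 0.041, 0.049, 0.12, 0.35, 0.60, 0.74, 0.92)
p.adjust(p, method = "bonferroni")  # strict family-wise error control
p.adjust(p, method = "BH")          # Benjamini-Hochberg false discovery rate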
The P value cannot stand alone by itself in most instances. It depends on the test used, besides various other factors. Most researchers rely on significance, but in reality, for an effective translation of findings, one has to take the whole set of measures into account rather than relying on the P value alone.
Yes, there exists in minds of researchers a bias which I call "statistical fetishism": for them, if P>0.05, a fact is no fact at all. Natural significance of data, their interdependence, their reproducibility, mechanistic considerations and so on - are no less important.
The point of reporting a P value is that, in whatever research study or project you have done, there is a chance or probability of error of up to 5% or 1%.
A p value at a level of significance (LOS) of 5% or 1% means that there is a chance or probability of error in the given data.
Dear Ashish, an "error in the given data" and what is called the standard error of an estimate based on a sample (be it an average value, a proportion, or what not) do not have the same meaning, as I am sure you understand. Your data themselves should rather be obtained and given with no more error than is unavoidable with your experimental or measuring technique - and to estimate this (systematic) error no P will be useful.
Basically, the p-value tells us the probability of obtaining a result as extreme as or more extreme than the actual sample value obtained, given that the null hypothesis is true. It helps us judge whether the difference between the groups is real (systematic) or just due to chance.
I am sorry, but many of us went in our answers far outside the original question of Brijesh - is reporting the exact p value in a research paper necessary, or are the level estimates (0.05; 0.01; 0.001) quite enough? - and began to explain trivialities to one another.
I see a variety of responses to the question as to the benefit of reporting an exact p value. It is my impression that you are cautioning a reader when you say that you find a significant association between an exposure and an outcome. For example, if one says that treatment A works significantly better than treatment B, say 60% vs 40% with a p value of 0.01, it should be taken that the investigator could be wrong in saying so one in 100 times. So that's how the exact p helps.
Agree with Boris Katsnelson, we drifted away from the original question: is there value in reporting the exact p value, or just reporting the level estimate of significance? In other words, report p = 0.003 or simply report p < 0.01?
James Bowman maintains that "If p=0.08, then that seems to suggest a different meaning than if p=0.3" - I think it is above any reasonable doubt. However, even in the wide range of p values exceeding 0.05 the "exactness" is not absolutely necessary - we can simply introduce additional categorical estimates. For instance, in epidemiological studies some researchers (I included) report associations significant at p < 0.1 as an additional category.
So, to answer the question:
If you follow the philosophy of Neyman/Pearson, then an individual p-value has no meaning at all and there is no point in reporting it. By claiming effects only where p is below the pre-chosen alpha, the long-run error rate is controlled.
I find p values very useful when presenting results that are potentially type II errors. If you have a two-tailed p value, halving it gives you the probability that the true population effect could lie in the direction opposite to that in the point estimate. This may be useful if, for example, you have a point estimate in favour of the intervention, but a p value of, say, 0.18 in a clearly underpowered study. You certainly wouldn't want to state that there was no effect, as it is very possible that we haven't detected a significant effect that may be there. Equally, you certainly would not want to state there was a benefit, as the magnitude of the point estimate is meaningless unless you are reasonably sure that the results are not just due to sampling error. All we can say in this ambiguous case is that there is a 9% probability (half the p value) that the effect direction of the true population value might actually represent a harm. This allows the reader to have a full understanding of the risks of a type I error if they were to accept the point estimate as representative of the population effect. Hence the exact p value can be of use.
Mark, I think this is a wrong conclusion. The p-value gives you the probability of the data (or a test statistic) for a given hypothesis (usually H0). Given your test statistic is t, your data give you some observed value t.obs, and the probability distribution of t under H0 can be used to get
p = P(|t|>|t.obs| | H0)
what surely can be written as
P(t < -|t.obs| | H0) + P(t > +|t.obs| | H0)
and for a symmetric probability distribution it is
P(t < -|t.obs| | H0) = P(t > +|t.obs| | H0) = p/2
So far I can follow. But then you state, as I understood, that
p/2 = P(effect>0) if t.obs<0 (or P(effect<0) if t.obs>0)? Is it an unconditional probability? I suspect that your line of argument suffers from the common mistake that P(data|hypothesis) is *thought* to be the same as P(hypothesis|data), but this is not the case.
What I would see is the following: If you set H0 at the absolute empirical effect, then
P(t < 0 | H0=|effect|) = p/2.
But I don't see how this should give me P(effect<0) or P(effect>0).
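A quick numerical check of the decomposition above (t.obs and df are arbitrary values chosen only for illustration):
t.obs <- 1.4
df <- 18
p  <- 2 * pt(-abs(t.obs), df)                 # two-sided p = P(|t| > |t.obs| | H0)
lo <- pt(-abs(t.obs), df)                     # P(t < -|t.obs| | H0)
hi <- pt(abs(t.obs), df, lower.tail = FALSE)  # P(t > +|t.obs| | H0)
all.equal(p, lo + hi)  # TRUE: each tail is p/2, but both are probabilities of the data given H0, not of the effect direction given the data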
As one who learned it from Fisher himself: Fisher merely suggested that if there is only a one in 20 chance of an outcome, or any outcome more extreme, that alerts you to the possibility that the null hypothesis is false, then you should probably seriously consider the possibility of it being false. In the absence of type 3 error (which is no longer taught), actual P-values (and even more so confidence intervals) are more informative, and Fisher would have recommended them (though, for good reason, he wanted fiducial intervals rather than confidence intervals). A careful reading of Fisher's last book makes it clear he believed Bayesian methods are required and, provided there is agreement on the appropriate prior information that should be included, he would now recommend q-values and credible intervals. If you are really serious about the philosophy of making rational inferences when there is uncertainty, you need to study these topics. Remember, all statistics (and a P-value is a statistic) depend on assumptions; any result that occurs because your assumptions are wrong is a type 3 error.
So a type 3 error occurs when nature (real life) does not obey a bell-shaped curve?
Regardless of what Dr. K. says, I think the exact value of a P value is informative. It should not be used for decision-making, but I would like to see the difference between p = 0.045 and p = 1×10^-5. It makes a difference.
But, Jay, such a big difference would be easily seen in the effects. I don't see what this information (the p value) would add to what is already straightforwardly visible in the data itself.
Exact p values can be handy if you do not report statistics from which effect sizes can be calculated: from the p value (together with the sample sizes) you can obtain an effect size. But of course if you do (the preferred way indeed!) report, for example, a mean difference with the associated s.e., or a t value with df, exact p values do not add much. Except that you can fool the reader a bit by saying that p = 0.04999 is p < 0.05, whereas p = 0.05001 is not.
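As a sketch of this point (assuming the reported p came from a two-sample t-test with equal group sizes, so df = n1 + n2 - 2; all numbers are invented), one can back-calculate an unsigned effect size in R:
p  <- 0.03  # assumed reported exact two-sided p-value
df <- 38    # assumed degrees of freedom (e.g., 20 per group)
t  <- qt(p / 2, df, lower.tail = FALSE)  # |t| implied by the p-value
d  <- 2 * t / sqrt(df)                   # approximate Cohen's d for equal-sized groups
r  <- sqrt(t^2 / (t^2 + df))             # correlation-type effect size
c(t = t, d = d, r = r)                   # note: the sign of the effect cannot be recovered from p alone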
I found a perfect quote for this discussion:
"We demand guaranteed rigidly defined areas of doubt and uncertainty."
Douglas Adams in: The Hitchhiker's Guide to the Galaxy
P-values are like congress; nobody really likes them in general, but they do love their own, especially if they are less than 0.05.
Simply said:
An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question.
--John Tukey
A pinch of probably is worth a pound of perhaps.
--James Thurber
I hate to see authors just report NS for not significant, especially for secondary results and post-hoc analyses. I would like to judge for myself rather than being held back by the authors' perception of what is relevant or "significant".
For me personally, it makes a difference if the p=0.051 or p=0.99 for post-hoc analysis. But some authors would pool both into NS.
Anyway - the even bigger problem is that authors forget to mention the descriptive statistics (e.g. odds ratios) and just present the p-values, which makes no sense at all.
In the medical field the exact p value is not of much concern: it should only be used to describe significance. Finally, the output depends on how well the work translates to reality / public benefit.
While I can see that the responses of Bowman and Wilhelm are statistically good answers, I believe that clinical significance is even more important. For example, suppose somebody were to test the effectiveness of a drug to suppress breathing and the results suggested a p-value of < 0.05.
I've always preferred reports of the most exact probability of a type 1 error, alpha. It seems a good deal more professional to me. You are clearly giving your readers MORE INFORMATION with this, and after all, isn't that what it's all about?
Hi Edward,
the 'exact' type I error rate is pre-specified a priori (usually at 5%), not calculated afterwards. Don't confuse p-values and alpha, they do not match, as Jochen thoroughly pointed out earlier. You may also have a look at
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
Joanna and James, for a (clinical) practitioner this is absolutely right. Here it might even be much more useful to look (only) at the expected effect sizes rather than at the p-values. Example: a medical doctor has two still-experimental drugs to treat your disease (if you agree to take part in a study). Drug A might be able to heal, or at least to *considerably* improve your state of health, but it cannot be ruled out that the drug may be useless or even have slight, almost negligible adverse effects. Drug B, in contrast, may be well known to have a certain but very, very small positive effect. Which drug would you take? Which drug would the doc prescribe? Would it depend on the disease (e.g. cough versus cancer)? In my opinion, here the "clinical relevance" should(!) even override "statistical (in)significance". But in the end, after all studies, there should only be drugs available that have thoroughly proven to be effective ("significant"), and the choice of which drug to take should not (cannot!) be made on the significance but rather on the effect size, possible side effects, adverse effects, costs, compliance, ...
The expected rate of completely useless drugs or (more importantly!) drugs with adverse effects is controlled by alpha, which here should clearly be 0.001 or lower; however, this still will not ensure that the fraction of available drugs with relevant effect sizes is high. So even after approval of drugs, the decision which drug to take (or whether to take a drug at all) is not dictated by the effect size alone but also by many factors whose impact on the decision depends on the individual situation and circumstances. Further, also in applied medicine it is often not clear which effect sizes are "relevant". In my opinion, this will be impossible to judge unless we can measure the cost of the treatment and the cost of the "non-treatment" in the same unit (which unfortunately might turn out to be a monetary unit, but this will give rise to very unpleasant ethical discussions; we would need a "quality-of-life" currency, and money is not at all a good projection for this [although it is not completely independent]... but this runs off-topic...).
But what about basic research? Here, "relevant effect sizes" are often completely unknown - or: *any* effect (at least in controlled lab experiments) can be scientifically interesting and therefore "relevant" (for the progress of research). I don't see how in basic research something like a "minimum relevant effect" can be defined in a reasonable way. But this makes all the sample size calculations (power!) that are required for the approval of animal experiments by ethics committees quite a farce. (Note: I am talking about *basic research*, not about phase II/III clinical studies!) So in the end the significance is all we can get from the data to serve as a guard against being too over-optimistic with the findings. Unfortunately, this guard is not taken seriously as long as "failed" experiments are repeated until a small enough p-value is obtained (which is then published)...
In the end we need two assumptions/expectations to get an idea of how good our science (i.e. the published results) is: (i) how brilliant are we (the scientific community)? and (ii) how many experiments are done (independent of getting "significant results")? If we were perfectly brilliant, we would not need any data at all. The less brilliant we are, the more data we will need to keep on the right track. We should be critical and consider ourselves not to be very brilliant. Now if we think we are just wildly guessing, we can expect 5% of our results to be significant anyway (false positives), plus some proportion of significant results from "true effects" (given a correct experiment and analysis). By not publishing the "failed" trials, there is a high risk that almost all of the published results are just false positives. Actually, we have no means of judging the quality based on the data (and its "significance") in this case! Estimating e.g. the false discovery rate in research requires knowing all the p-values from all attempts/trials/experiments. When only experiments with significant results are reported, we lose the possibility of judging the quality based on the p-values/significance. In this case we can only judge the elegance of the model, the coherence, the plausibility, ... all that actually stands outside of the data, and what would be used in an objective, formalized way by Bayesian analyses.
Why not use a confidence interval at 95% or 99%? In my view it shows the significance of the results better, since we can see the whole extent of the variation in the results.
Perhaps it is not nonsense to show exact p values when these are in the range 0.051-0.20; beyond that, it is enough to say NS.
Having the exact p, one can judge whether NS is the result of a small sample size or the differences are simply not significant.
When the nature of things does not allow a bigger sample (replications not possible; events are rare), it is advisable to show p, not NS.
Not true. See http://darwin.cwru.edu/ref/view.php?id=316&article=Elston+Reprints
But I strongly agree that, to the extent that P values mean anything at all of interest, exact P values should always be reported. (This is not the same as reporting P values that assume asymptotic results when the sample size is not large enough for those asymptotic results to be reasonably accurate).
@Jochen,
I think your reasoning is not correct here. If you want to combine the results of five independent trials, you can apply Fisher's combination test http://en.wikipedia.org/wiki/Fisher%27s_method, which results in
pchisq(-2*5*log(.5),df=10,lower.tail=F)
#p=0.73
and therefore less support against H0.
For completeness, the same computation can be done for the example with 2 studies (using df = 4).
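For convenience, the combination rule used above can be wrapped in a small helper (just a sketch of Fisher's method from the Wikipedia link, not anyone's official code):
fisher_combine <- function(p) {
  # combine k independent p-values into one chi-squared test with 2k df
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}
fisher_combine(rep(0.5, 5))  # 0.73, reproducing the value computed above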
Thanks for the link, Robert. I do not understand the arguments given there, maybe you could help me? I think the following sentence is critical: "The analogous combined p-value would then be (0.9)^n, and we would arrive at the absurd conclusion that any null hypothesis could be made as significant as we please merely by testing it on the basis of a large enough number of experiments."
I do not agree with the presumption. Given H0, the p-values are uniformly distributed. Therefore, given H0 is really true, it MUST sooner or later happen to give a p > 0.9. Only a series of at least 29 studies all with p < 0.9 would have a joint probability below 0.05 under H0, and one would already need a series of 53 such studies before it drops below 0.004, whereas there is nothing unusual at all about seeing p > 0.9 in any single study...
Further, combining different studies will increase the "cumulative power" to reject H0. Thus, enough studies (as well as large enough sample sizes!) should allow us to reject almost all H0's, since it is quite unreasonable that any systematic difference in some respect will have *exactly* zero effect on any outcome. This is a general philosophical problem that arises from basing decisions on expected error rates instead of (additionally) judging the relevance of the estimated effect sizes.
EDIT:
I have one more problem here: "p-values below 0.37 suggest that the null hypothesis is more likely to be false" This is concluded from the fact that a large series of Fisher-combined such p-values has a limit below 0.5. I do not understand how this will give information about P(H0|data) from the p-values that are P(data|H0).
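As a side note, the uniformity of p-values under H0 that this argument rests on is easy to check by simulation (assumed setup, for illustration only: a two-sample t-test with both groups drawn from the same normal distribution):
set.seed(42)
p <- replicate(10000, t.test(rnorm(10), rnorm(10))$p.value)
mean(p > 0.9)   # close to 0.10
mean(p < 0.05)  # close to 0.05
hist(p)         # approximately flat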
I think effect sizes and 95%CI are definitely more informative than p-values, but actual p-values should be reported as well.
As others have said, a marginally non-significant p-value in an under-powered study may still be of interest, so information is lost when reporting NS instead of an actual p-value. If someone would like to include your results in a meta-analysis, knowing whether p=0.10 or p=0.80 would make a difference.
Dear Jochen,
There is something wrong with your calculus.
Using your argument it would even be less likely to observe consecutive p-values above .9, which would occur under H0 with a probability of only 0.1^n; so a couple of p-values above .9 would also lead to a rejection of H0, which is indeed counter-intuitive. Even more ridiculously, you would reject H0 with a single p > 0.95 (or a p falling within any other interval of length 0.05).
No, Eik, I think you did not get me (or did I make a typo using the wrong sign somewhere?). P(p>0.9|H0) = 0.1, that's right, but this would be a test against H1, not against H0. But I was referring to P(p < 0.9 | H0) = 0.9 and to the joint probability of a whole series of such results.
It is possible for a set of data to be too close to H1 to be believable. Fisher showed that Mendel's data were so close to what he wanted to show that the data were probably fudged (but, contrary to what is sometimes stated, he did not think that Mendel himself fudged his data; he thought that Mendel had an assistant who knew the result Mendel wanted and fudged the data!). Also, see:
http://darwin.cwru.edu/ref/view.php?id=316&article=Elston+Reprints
Sorry Jochen, I don't get your point, can you clarify please? Why should P(p>0.9|H0) = 0.1 be a test against H1? Your argument is that under H0 p has a uniform distribution, and you want to construct sequences of events which contradict this distribution assumption - but so do I.
Agreed on that, but what does that prove? I still doubt that your calculations are correct, and using your arguments I deduced a scenario with an unintuitive result as an admittedly heuristic 'proof'.
Going back to your original statement, you proposed a rejection of H0 if you observe enough p values below some fixed bound.
No, Eik :) If I observed (0,2,2) in three such dice experiments, the p-values would be 0.581, 0.263, 0.263. The smallest alpha to reject H0 in *all* experiments is 0.581, so the probability to falsely reject H0 all three times is 0.581^3=0.196.
However, I see that the combination of the *data* (rather than the p-values), which is possible here, gives p=0.525 (0+2+2 "successes" in 6+6+6 "trials", binomial test). Obviously, my naive calculation overestimates the collective significance (I think the combination of p-values ideally should come to a similar conclusion as the analysis of all available data together - I can't calculate it, but I have a gut feeling that a Bayesian procedure would do just this).
OK. Let me ask again. If in one year two habitats differed by, say, small mammal numbers with p slightly above 0.05 (not significant), and the same happened again in the following years, could one then conclude that the habitats really do differ?
Linas, first of all, my statement above was just a push to think about the possibly valuable information (for instance, in favour of an effect) in a series of studies of which each one would not be considered significant if seen in isolation. I never implied that significances (in medical examinations or any other field) are or should be calculated this way. So this discussion, for my taste, is already going much too far; my statement is getting overinterpreted. But at least it provoked some thinking about the topic, which in fact was intended.
Then you claim a "significance". But I say: the chance of getting p values that small three times in a row under H0 is the product of the individual probabilities, so the whole series carries more information than any single year alone - even though a proper combined analysis (as discussed above) should be preferred over my naive product.
There are many ways to combine the data from several studies in order to get an overall P value. Ways to do this when all we have are the P values of the individual studies are discussed in:
http://darwin.cwru.edu/ref/view.php?id=609&article=Elston+Reprints
http://darwin.cwru.edu/ref/view.php?id=701&article=Elston+Reprints
The P value indicates whether the correlations in the data are significant or not, either positive or negative, with respect to the number of values (n).
RE: The original question... Is not the difference between an enumerative study and an analytic study related to a possible partial answer? "Tests of significance, t-test, chi-square, are useless as inference - i.e., useless for aid in prediction." "Use of data requires also understanding of the distinction between enumerative studies and analytic problems. An enumerative study produces information about a frame."
Pages 100-101, The New Economics, W. Edwards Deming, second edition.
"Analysis of variance, t-test, confidence intervals, and other statistical techniques taught in the books, however interesting, are inappropriate because they provide no basis for prediction and because they bury the information contained in the order of production." Page 132, Out of the Crisis, W. Edwards Deming, MIT Press.
"A process control chart is an example of an analytic study." Page 191, Four Days with Deming, Latzko & Saunders, 1995.
"Test of hypothesis has been for half a century a bristling obstruction to understanding statistical inference." p. 100, The New Economics.
Please bear with my low-quality question.
John
All this tournament of learned statisticians is very impressive indeed. Still, I wonder again if anybody remembers the question to which they believe they are giving answers. Let me remind you of it: if it is mentioned in the Materials and Methods that p < 0.05 is considered significant, is reporting the exact p value really necessary?
Boris, don't you think that the original question has already been answered? Apart from this, I personally find it good that questions sometimes initiate discussions that might well go far beyond the original topic. This way, all discussion partners and readers can learn a lot.
Dear Jochen, no, I do NOT think that the original question has already been answered - at least, not to common satisfaction. Some people voted pro, others contra, and it was interesting to understand why they voted this or that way, but no consensus has been achieved, and such problems can hardly be solved by majority. I think that nobody has changed his (or her) opinion, nor have I.
As to the discussion going far beyond the original topic... Well, I have to confess that you, highly sophisticated statisticians, lost me, a simple researcher, rather long ago. However, in principle, I believe that in a discussion one should not forget what it is all about. Call it the discipline of discussion, if you like.
It is how I feel, but I will not insist that I am right.
I think that it is not uncommon in statistics that there is no unique best procedure for many problems. There are often many different aspects playing a role, and there is no natural law dictating how important each aspect is. Thus I think it is good and valuable to see the pros and cons discussed and to see that there might not be a (known) uniquely best solution. This reminds people to use statistics NOT like a cookbook but rather as a tool - the user must take care to handle it in a way that serves his aims. This requires being clear about the aims (which is very often just not the case!). And recall Jakob Bernoulli, who called his masterpiece "Ars Conjectandi".
But anyway, thank you for your comment: "I believe that in discussing one should not forget what it is all about." I appreciate it.
Indeed, I agree with your last statement completely - and that is why I've voted it up.
By the way, my own answers (within all the ResearchGate discussions I am taking part in) are now posted without those little triangles, so I do not know whether anybody agrees or disagrees with me. Can somebody give me a link to an RG administrator to file a complaint?
Boris, you don't see the vote buttons under your own posts - it would be strange to vote for one's own posts... ;) I see them under your posts, and I gave you a vote.
Thanks a lot! However, NOW I see those buttons - just as they were seen until the last 2 or 3 weeks. It seems that somebody saw my appeal.
3 months ago I posted here the following comment:
"Yes, there exists in minds of researchers a bias which I call "statistical fetishism": for them, if P>0.05, a fact is no fact at all. Natural significance of data, their interdependence, their reproducibility, mechanistic considerations and so on - are no less important."
Now I'd like to add to it some citations from a very interesting commentary by David A. Savitz just on "What is to be done with P values?" (Epidemiology, 2013, Vol. 24, No. 2, 212-214):
"...we invoke such frequentist techniques with little thought , having lost sight of the original questions and SUBSTITUTING STATISTICAL ANSWERS FOR SUBSTANTIVE ONES."
"... we are susceptible to the "tyranny of statistics", bowing before the God os cientific objectivity and conservative interpreetation, despite the fact that conventional statistical tools are neither objective nor consevative".
Bravo, Dr. Savitz!
In zoology, we also use a threshold of p < 0.10. To my mind, p = 0.10-0.15 is absolutely worth mentioning, particularly when the sample size cannot be bigger due to objective restrictions (geographic samples, rare species, rare events, new species, etc.).
Dear Linas, I wonder if this comment of yours is somewhat connected with mine of Apr 19, 2013: "James Bowman maintains that 'If p=0.08, then that seems to suggest a different meaning than if p=0.3' - I think it is above any reasonable doubt. However, even in the wide range of p values exceeding 0.05 the 'exactness' is not absolutely necessary - we can simply introduce additional categorical estimates. For instance, in epidemiological studies some researchers (I included) report associations significant at p < 0.1 as an additional category."
Dear Boris, sorry, I overlooked your comment from the 19th of April. My intention was to say that in zoology we quite often have p values between 0.05 and 0.10, simply because the samples cannot be bigger.
Coming back to the original question: multiplying the number of "categories" for rejecting null hypotheses does not solve any problem. Having some *fixed* limit (whatever it actually is; be it 0.05 or 0.1 or 0.153...) assures a long-run maximum false-positive rate. This (long-run maximum) rate is statistically ensured by *not* using any common sense or expert judgement for the decision. The decision is made, like by a stupid machine, on the sole criterion that the p-value is smaller than the accepted (long-run maximum) rate of false rejections (given the appropriate test was used). Here, reporting the actual p-value is not at all interesting, because knowing its exact size would not change the decision. All that matters is whether p is below the limit or not.
Dear Jochen,
If "providing 'exact' p values is pointless" but "comparing them to some fixed level(s) is usually not sensible (in research) either" - what is to be done in practice? For me, it is difficult to see why "presenting p-values as such by giving one or 2 significant digits" is the solution.
Your idea of the pp value is very attractive indeed. However, when you maintain that
"pp < 1 should indicate to be very cautious in interpreting effects" but qualify that "this should not be read as a 'threshold'", it is somewhat like playing with words.
Would you agree that the problem is just in the widespread absolutization of the "threshold idea"? Until our minds, rather than our statistical criteria, "move away from the black-and-white picture of rejecting/not-rejecting null hypotheses", no way of presenting p values will help.