A p-value represents incomplete information by itself. It is also a very unfortunate historical mistake that so many sources indicate that 0.05 is an acceptable universal level for "significance." (The word "significance" itself is a misnomer.) The p-value is very often, and very badly, misunderstood, and has been for some time, even before I published the following almost 30 years ago:
A p-value is sample-size dependent. Standard errors, and estimated standard errors, for means, totals, and proportions shrink as sample size increases (unlike population standard deviations, which are estimated by sample standard deviations), and a p-value is similarly impacted. However, standard errors and confidence intervals are more practically interpretable than a p-value.
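To make the sample-size dependence concrete, here is a minimal sketch in R (illustrative numbers only, not from any real data): the same estimated mean difference and sample standard deviation give a smaller estimated standard error, and hence a smaller two-sided one-sample t-test p-value, as n grows.

## Same estimated effect and SD; larger n -> smaller SE -> smaller p-value.
effect <- 0.3   # hypothetical observed mean difference
s      <- 1.0   # hypothetical sample standard deviation
for (n in c(10, 50, 200, 1000)) {
  se <- s / sqrt(n)                  # estimated standard error of the mean
  t  <- effect / se                  # one-sample t statistic
  p  <- 2 * pt(-abs(t), df = n - 1)  # two-sided p-value
  cat(sprintf("n = %4d   SE = %.3f   p = %.4f\n", n, se, p))
}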
If "Big Data" were easier to have handled earlier, then many more researchers may have noticed sooner that setting 0.05 as a universal "significance" level is nonsensical.
The 'right' level for a given application depends upon standard deviations and sample sizes, and thus standard errors, which relate to "effect size." It also depends upon your tolerance for the result not being very nearly as hypothesized. That is, no hypothesis is exactly true. It isn't a question of yes or no, but more like: how close, above or below some value, might you be?
One might say that it takes more data to 'notice' a smaller real difference between two specific hypotheses.
If a confidence interval, or just an estimated standard error, or estimated relative standard error, will not suffice for your situation, then you need some kind of type II error probability analysis.
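If it helps, here is what a minimal type II error (power) analysis can look like in R, using the base function power.t.test; the assumed difference of 0.5, SD of 1, and 80% power target are made-up illustrations, not recommendations.

## Sample size per group needed to detect an assumed difference of 0.5
## (SD = 1) with 80% power at a two-sided 5% level.
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)

## Or the other way around: the power you actually have with 20 per group.
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)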
Cheers - Jim
Article Practical Interpretation of Hypothesis Tests - letter to the...
Say that I choose 0.2 as the answer to the question, and I let everyone else use the same value. I have the simplest experiment possible: one dependent variable and one independent variable, where the independent variable has two states that I want to compare. A t-test is sufficient and I am done, after noting that all the assumptions of the model were satisfied. If the null hypothesis is true, I will "identify" a significant difference 20% of the time when no such difference exists. I go to the library and find 10 articles (it is a small library). They also follow this protocol. The chance that one or more of the articles are reporting a false significant difference is 1-(1-0.2)^10 = 0.89, so I have roughly a 90% chance of seeing at least one error in reading these 10 articles. Ok, I'll try 0.05, and I find that my chance of seeing at least one error has fallen to about 40%. My library got a new shipment of books and I now have 100 articles. After reading all of these I have a 99% chance of seeing at least one report of significant results that is in error.
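The arithmetic behind those numbers is just the family-wise false positive probability for k independent tests, each run at level alpha when every null hypothesis is true; a quick R check of the figures quoted above:

## Probability of at least one false positive among k independent tests,
## each at level alpha, when all null hypotheses are true.
fwer <- function(alpha, k) 1 - (1 - alpha)^k
fwer(0.20, 10)    # ~0.89
fwer(0.05, 10)    # ~0.40
fwer(0.05, 100)   # ~0.994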
Now I look at real papers that have multiple statistical tests in one paper, and a library with 15 million research papers. As I increase the p-value threshold used to claim significance, I will see more and more false positives in the literature.
So here are some questions:
1) What is the cost of making a claim of significance when no real difference exists?
2) What is the cost of failing to find a significant difference when the treatments truly differ?
3) Relative validity. So the literature in your field tends to use 10 replicates. You have used 4 replicates. To make your results consistent with the published literature you need to increase your threshold for significance to a p-value of 0.2. Alternatively, we can picture the same scenario, but you used 3000 replicates rather than 4. Now what do you do?
If you set your significance threshold at 0.05 you will have two possible outcomes.
1) The calculated p-value for your data is greater than 0.05. I can now claim that my data are insufficient to determine whether the null hypothesis is true or false. The result is inconclusive. It is not a negative result. It does not show that the null hypothesis is true. It shows nothing. It is neither consistent nor inconsistent with other published results. If you did a great job in executing the experiment, a great job in the data analysis, and had average or better replication, then this can still be a useful and publishable result.
2) The calculated p-value is 0.05 or less. I reject the null hypothesis and claim a significant difference (which, by itself, says nothing about how large or how important that difference is).
Finally, you should read the following article, and one or two more like it: either articles that are cited by this article, or articles that cite this one. This discussion applies to all science.
I agree with James R Knaub and Timothy A Ebert, but I'll give you another take.
You can use whatever alpha value you want to determine if a p-value is "significant". There is nothing magical about 0.05.
For example, if you were doing a preliminary trial with many treatments and not many replications, you could simply say, "We retained for future study all treatments with p < 0.15."
I would avoid using terms like "tending toward significance" or "marginally significant". It is better to report the actual p-value, and some measure of the size of the effect, and say "This suggests this might be a variable of interest".
If necessary, use additional symbols to indicate p-value ranges: * for p < 0.05 and ** for p < 0.01. R, for example, also adds . for p < 0.1.
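As an aside, those codes can be reproduced with R's symnum() function (the same mechanism behind the "Signif. codes" legend in regression output); the p-values below are arbitrary examples:

p <- c(0.002, 0.03, 0.07, 0.4)
symnum(p, corr = FALSE, na = FALSE,
       cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
       symbols   = c("***", "**", "*", ".", " "))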
I agree with Salvatore that there is nothing magical about 0.05. I disagree that you can use anything you like. I think we should all use 0.99 as the cut-off for determining significance. That way we are all happy because all our research will be significant and therefore publishable. We will only need 2 or 3 replicates and budgeting will be easier. Everyone just has "alternative facts."
It is not a good idea to do research by running an experiment and then adjusting the p-value up until you get the story that you want. That generates a whole new genre of science fiction.
That said, within the context of Salvatore's example, he is correct that you can use whatever you like. The preliminary study is part of the methods section, while the primary study would be the results. Just do not try adding a bit more data to the preliminary study. With 3 replicates my lowest p-value is 0.22. OK, I will add two more replicates, and I then find a p-value of 0.15. That is closer, so I will add 5 more and hope for the best. Victory, I get a p-value of 0.043. Can I now stop and publish? You can find a name for this tactic in the paper that I suggested earlier.
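To see why that tactic misleads, here is a small R simulation sketch (made-up settings: start with 3 observations per group, add one pair at a time up to 30, testing after every addition) of how often "keep adding data until p < 0.05" declares significance even though both groups come from the same distribution:

## Optional stopping under a true null: both groups are N(0, 1).
set.seed(1)
one_run <- function(n_start = 3, n_max = 30, alpha = 0.05) {
  x <- rnorm(n_start); y <- rnorm(n_start)
  for (n in n_start:n_max) {
    if (t.test(x[1:n], y[1:n])$p.value < alpha) return(TRUE)  # "significant": stop and publish
    x <- c(x, rnorm(1)); y <- c(y, rnorm(1))                  # add one more replicate per group
  }
  FALSE
}
mean(replicate(5000, one_run()))  # far above the nominal 0.05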
The significance level (alpha) refers to the probability of a type I error for the statistical decision. Many factors affect significance, including sample size and the magnitude of the error.
The significance levels 0.05 and 0.01 are widely used in biological research, but it is legitimate to make statistical decisions at other levels according to the research field (e.g., medicine, pharmacology, ...), where highly accurate results and decisions are needed.
"A p-value, or statistical significance, does not measure the size of an effect or the importance of a result."
This is #5 of the ASA's statements on p-values. It depends on the context and on the relative impact of wrong decisions (see the enclosed file).
E.g., in medical drug testing: in early phases of working with substances, it would be better to allow a larger p-value; when it comes to the stage where the substance should be approved as a medical drug, it would be wise to require a small p-value.
When, in the same context, you test for side effects, the worse situation is missing potential side effects. Failing to reject a null hypothesis of no side effects may lead to a bad situation later on.
In every case, the type II error, or the power of the test, is vital too. This power is linked to the effect size.
In modelling, when you try out several models and compare them, e.g., by p-values, you should not demand a very small p-value, as this would require large data sets to detect that your tentative model approach does not serve its purpose.
What I wanted to say is that p-values are difficult to handle, and simply making them small is not a strategy for all problems.
There will be a special conference of the ASA on statistical inference devoted to p-values in the fall of this year; see the link:
Scientific Method for the 21st Century: A World Beyond p < 0.05
There is no such thing as a tendency toward significance. I have never seen this in any reasonable statistical reference, paper, book, or other source. It is a concept that non-statisticians use when significance at the 0.05 level isn't reached. One may say that a result was significant at the 10% level, but not the 5% level. There is something special about alpha = 5% (two-sided), at least in human clinical trials. From Guideline ICH E-9: "Conventionally the probability of type I error is set at 5% or less..."; any change from this will require justification. 5% is special because that's what the regulatory authorities expect to see.
If you mean "trend toward significance", it is meaningless.
A p-value can be presented and the reader left to decide what it means; this is the classic difference between significance testing and hypothesis testing. Traditionally, in the early days of the development of these topics, there was a distinction; it seems that we have lost it.
Excellent. Please provide citations so that we can see what you are looking at. Please include page numbers and paragraph numbers to make sure that we can find the part of the manuscript that you are focusing on. While many of us have access to a university library, it would be nice/polite if you provide links to pdf files with the research papers in question so that we can find the relevant parts quickly.
As it stands, we have answered your question. The short answer was given by David: "If you mean 'trend toward significance', it is meaningless." James' answer tried to broaden the context to make some sense of the question when he pointed out the connection between p-values and sample size. The others have added important points to consider. However, the bottom line is that the question as asked is meaningless. To make progress we need more context than what has been provided to date.
An alternative: post a program in R code that simulates what you are talking about. My guess is that if you can write the program and look at the output that you will have more effectively answered your own question than anything we could post.
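In that spirit, here is a minimal simulation sketch in R (as suggested above; all settings are arbitrary) that runs many two-group t-tests and tabulates how often the p-value lands below 0.05, in the 0.05-0.10 "tendency" zone, or above 0.10, both when the null is true and when there is a modest real difference:

## Where do p-values land under the null and under a modest true effect?
set.seed(42)
sim_p <- function(delta, n = 10, reps = 10000)
  replicate(reps, t.test(rnorm(n, 0), rnorm(n, delta))$p.value)

bin_p <- function(p)
  table(cut(p, breaks = c(0, 0.05, 0.10, 1),
            labels = c("p<0.05", "0.05-0.10", "p>0.10"))) / length(p)

bin_p(sim_p(delta = 0))    # null true: ~5% below 0.05 and ~5% in the "tendency" zone by definition
bin_p(sim_p(delta = 0.8))  # real effect: more results in both bins, but p still bounces around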
When you perform a hypothesis test in statistics, a p-value helps you determine the significance of your results. Hypothesis tests are used to test the validity of a claim that is made about a population. This claim that’s on trial, in essence, is called the null hypothesis.
The alternative hypothesis is the one you would believe if the null hypothesis is concluded to be untrue. The evidence in the trial is your data and the statistics that go along with it. All hypothesis tests ultimately use a p-value to weigh the strength of the evidence (what the data are telling you about the population). The p-value is a number between 0 and 1 and interpreted in the following way:
A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
p-values very close to the cutoff (0.05) are considered marginal (they could go either way). Always report the exact p-value so your readers can draw their own conclusions.
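For completeness, a toy R example of the mechanics described above (simulated data, so the numbers themselves mean nothing): run the test, then report the p-value together with an effect estimate and interval, and let the reader judge.

## Toy two-group comparison on simulated data.
set.seed(7)
control   <- rnorm(12, mean = 10,   sd = 2)
treatment <- rnorm(12, mean = 11.5, sd = 2)
res <- t.test(treatment, control)
res$p.value    # the p-value to report
res$estimate   # the group means (size of the effect)
res$conf.int   # 95% CI for the difference in means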
A trend toward significance is meaningless; just talk about the 95% confidence intervals. For a better understanding of p-values, watch Geoff Cumming's "dance of the p-values".
Obviously no one understood what you meant... Everyone focused on "trend toward significance"; it is meaningless...
Regardless of whether it is meaningless or not, the question is: if I get a p-value of 0.055 or 0.06 or 0.07, is it correct to interpret this as "there was a tendency (P = 0.055, 0.06, 0.07, or whatever value less than 0.1) for an increase, or a decrease," etc.?
At what maximum p-value (less than 0.1 and larger than 0.05) can one say "there was a tendency for..." or "X tended to..."?
Rabie, again, no one will or can give you a straight answer. At the very least, it would be great if some of the answers were backed up with a clear example or explanation from a statistics book or the like...
To make the story short, go to scholar.google and type the name of the journal of your interest followed by the words "tendency" or "tended to" (in quotes, as shown) in the search box. Narrow your search by selecting papers from the last 10 years or so.
When you find the papers mentioning "tended" or "tendency", you will find your p-value in addition to other information (sample size, replication, etc...).