I have one data set whose result is extremely significant (p < 0.001) and another whose result is significant (p < 0.05). Is the difference between these levels of significance itself significant? Knowing this would help me draw the right conclusions.
Yes, I have drawn conclusions based on the global and local trends of the problem at hand, but I was wondering whether more could be interpreted from the difference in significance. Thanks for your input.
I would suggest reading the statement on p-values recently drafted by the American Statistical Association. It will certainly be helpful in understanding the use of p-values in your studies. https://www.amstat.org/newsroom/pressreleases/P-ValueStatement.pdf
There are several explanations for a difference between two levels of significance. The first is simply chance. The second is that your samples have different sizes. The third is that your samples have different amounts of noise. What I suggest is to perform a post-hoc power analysis to calculate the power of your tests. You may use G*Power (http://www.gpower.hhu.de) or R if you are proficient with that language. The power will show you whether you had a high probability of rejecting the null hypothesis given the sample size, the variance (including noise), and the level of significance. If this probability is very small for p < 0.001 but large for p < 0.05, you may have obtained an extremely odd result in the first experiment. However, I must note that you should set the level of significance a priori, before the experiment, rather than a posteriori, once you have obtained the p-values. Thus, if you set the level of significance to 0.05, both tests agree. Other approaches are to pool both experiments (and check for bimodal or multimodal data) or to perform a Bonferroni correction for multiple testing.
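For what it's worth, here is a minimal sketch of such a post-hoc power calculation in Python (statsmodels) rather than G*Power. It assumes a two-sample t-test, and the effect size and sample size are placeholders; substitute your own numbers.

```python
# Post-hoc power for a two-sample t-test; effect_size and nobs1 are made-up
# illustrative values, not numbers from the original question.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# effect_size is Cohen's d = (mean1 - mean2) / pooled SD, estimated from your data.
for alpha in (0.05, 0.001):
    power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=alpha,
                                 ratio=1.0, alternative='two-sided')
    print(f"alpha = {alpha}: power = {power:.2f}")
```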
Not much should be read into a p-value beyond the rejection of (or failure to reject) the null hypothesis. What counts after that is the substance of the science and anything the results might tell you about that substance.
Bear in mind that a p-value tells you something about whether you just made a mistake in rejecting the null hypothesis. If p is smaller, then you are less likely to have incorrectly rejected the null.
Having said these standard things, it might be worth examining the underlying statistical information of the tests. For a two-sample t-test this would be the sample sizes, the observed means and their difference, and the observed standard errors, both separately and pooled. If some of these differ between experiments, it might be a sign of stronger experimental control in one of the experiments. (It is worth examining these numbers anyway, regardless of the p-values.) The obvious place to look is for a difference in the sample sizes, of course.
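As an illustration of what to look at, here is a short sketch (assuming the raw data for both groups are available as arrays) that pulls out exactly those quantities for a two-sample t-test:

```python
import numpy as np
from scipy import stats

def t_test_ingredients(x, y):
    """Sample sizes, means, their difference, and standard errors (separate and pooled)."""
    nx, ny = len(x), len(y)
    mx, my = np.mean(x), np.mean(y)
    sx, sy = np.std(x, ddof=1), np.std(y, ddof=1)
    pooled_sd = np.sqrt(((nx - 1) * sx**2 + (ny - 1) * sy**2) / (nx + ny - 2))
    t, p = stats.ttest_ind(x, y)  # equal-variance (pooled) t-test
    return {
        "n": (nx, ny),
        "means": (mx, my),
        "mean_diff": mx - my,
        "se_separate": (sx / np.sqrt(nx), sy / np.sqrt(ny)),
        "se_pooled": pooled_sd * np.sqrt(1 / nx + 1 / ny),
        "t": t,
        "p": p,
    }
```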
Keep in mind that p-values are nonlinear in the test statistics; that is, relatively small differences in the numerator (the difference in means for a t-test) or the denominator (the standard error) can yield apparently large shifts in the p-values. Also keep in mind that the nature of statistical variation is to cause variation in summary statistics and in p-values.
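To see the nonlinearity concretely, here is a tiny sketch (the degrees of freedom are chosen arbitrarily, e.g. two groups of 30) showing how modest changes in the t statistic translate into large relative changes in the two-sided p-value:

```python
from scipy import stats

# Two-sided p-values for a t statistic with 58 degrees of freedom.
for t in (2.0, 2.5, 3.0, 3.5):
    p = 2 * stats.t.sf(t, df=58)
    print(f"t = {t}: p = {p:.4f}")
```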
Is the difference of .05 and .001 in p-values interpretable, or even meaningful, on its own? No.
Is it worth asking how this happened? Yes.
Could this difference be due to chance alone? Yes.
Could this difference be due to small differences in the numerator or denominator or both? Yes.
Can such examination be useful in interpreting the results for scientific inference? Possibly.
Can such examination be useful in future research? Probably.
"Bear in mind that a p-value tells you something about whether you just made a mistake in rejecting the null hypothesis. If the p is smaller then you are less likely to have incorrectly rejected the null."
No, that's just plain wrong. Please don't feed a common misconception.
Interestingly, you seem to contradict this a bit later when you say:
"Is the difference of .05 and .001 in p-values interpretable, or even meaningful, on its own? No."
Well, if your first statement were right, then p=0.001 would indicate that in this case the probability of a wrong rejection was about 1/50 of the probability of a wrong rejection when p=0.05. That would be a meaningful interpretation. But here you are right: that's complete nonsense, and it is nonsense because p-values do not tell you anything about the probability of making a wrong rejection!
p=0.001 tells you that data (or a test statistic calculated from the data) more extreme than what you observed are about 50 times less probable under the null hypothesis than they are for p=0.05. These are statements about the probability of data; nothing is said about rejections, mistakes, the truth of hypotheses, or anything like that.
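This interpretation can be checked by simulation: under the null hypothesis, the p-value is simply the probability of a test statistic at least as extreme as the one observed. A rough sketch (two equal-sized groups drawn from the same normal distribution; all numbers are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 30, 20_000

# Null world: both groups come from the same distribution.
t_null = np.array([stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).statistic
                   for _ in range(reps)])

# The fraction of null t statistics at least as extreme as an "observed" one
# approximates the two-sided p-value for that observation.
for t_obs in (2.0, 3.5):
    print(f"|t| >= {t_obs}: {np.mean(np.abs(t_null) >= t_obs):.4f}")
```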
The only way to test whether two things are significantly different is to compare them directly. As Gelman and Stern (2006) note, the difference between significant and not significant is not in and of itself significant. The same applies to any other pair (e.g., the difference between "significant at α=.05" and "significant at α=.01" is not in and of itself significant).
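A minimal sketch of such a direct comparison (made-up effect estimates and standard errors, using a normal approximation): two effects that are individually "highly significant" and "just significant" against zero, yet whose difference is nowhere near significant.

```python
import numpy as np
from scipy import stats

# Hypothetical effect estimates and standard errors from two experiments.
est1, se1 = 2.0, 0.5   # z = 4.0, p ≈ 0.00006
est2, se2 = 1.0, 0.5   # z = 2.0, p ≈ 0.046

# Compare the two effects directly rather than comparing their p-values.
diff = est1 - est2
se_diff = np.sqrt(se1**2 + se2**2)
z_diff = diff / se_diff
p_diff = 2 * stats.norm.sf(abs(z_diff))
print(f"z = {z_diff:.2f}, p = {p_diff:.3f}")   # ≈ z = 1.41, p = 0.157: not significant
```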