Cases where the effect (including an interaction effect) is exactly zero occur very rarely, so the question is whether we can show that the effect size is practically negligible. In other words, instead of testing for a difference (a non-zero effect), a test for equivalence is needed. Define positive and negative thresholds around zero and perform an equivalence power analysis to show that the interaction effect lies between these thresholds, i.e. is practically zero. This is the same as having a confidence interval that lies fully between the thresholds (provided the methods of the significance test and the CI match).
A remark on "never getting the effect": whether or not there is a statistical interaction depends on the measurement scales of the (two or more) predictors whose interaction is studied. For example, if two continuous variables are NOT in interaction on their original scales, then their log-transformed variants will be in interaction. So in my opinion, showing no interaction also means finding the appropriate measurement scales.
Once you have the appropriate measurement scales, you can declare that there is no interaction if you powered your study for equivalence and the study results confirm it. Alternatively, you can declare that "you will never get the effect" if your power analysis demonstrates that showing the interaction with a test for difference would require a practically infeasible sample size.
Power analyses are performed before data collection. The wording "is not significant" suggests that you may already have the data. In that case, look at the confidence interval (after appropriate transformations) and compare it to the relevance thresholds.
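As a rough illustration of this CI-based equivalence logic, here is a minimal Python sketch; the simulated data, the variable names, and the equivalence bound delta are placeholder assumptions, not part of the original question.

```python
# Minimal sketch: check whether a (1 - 2*alpha) CI for an interaction term lies
# entirely inside (-delta, +delta), which is the TOST-style equivalence criterion.
# The data are simulated with no true interaction; delta = 0.10 is an arbitrary bound.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 0.5 * df["x1"] + 0.3 * df["x2"] + rng.normal(size=n)

fit = smf.ols("y ~ x1 * x2", data=df).fit()

alpha, delta = 0.05, 0.10
lo, hi = fit.conf_int(alpha=2 * alpha).loc["x1:x2"]   # 90% CI for the interaction
print(f"90% CI for x1:x2: [{lo:.3f}, {hi:.3f}]")
print("practically zero" if -delta < lo and hi < delta else "equivalence not shown")
```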
As Gabor already indicated: there is no such thing as an exact null effect. As a consequence, you will always find a statistically significant effect if your sample size is large enough; see for example [1].
What you actually have to do is define for yourself what a practically or theoretically relevant effect size would be. Ziliak and McCloskey call this the "oomph" [2]. Then perhaps you are already there with your current data, if the confidence limits of your current estimate fall inside the "irrelevant" range.
[1] White, John Myles. Criticism 5 of NHST: p-Values Measure Effort, Not Truth. http://www.johnmyleswhite.com/notebook/2012/07/17/criticism-5-of-nhst-p-values-measure-effort-not-truth/
[2] Ziliak, S. T., & McCloskey, D. N. (2009). The cult of statistical significance: How the standard error costs us jobs, justice, and lives.
Have you already determined/calculated the power of your experiment?
Some formulae allow you to increase the N while keeping the other parameters constant.
If the power does not change, that demonstrates that continuing won't change anything in principle.
I would also examine the possibility of *bootstrapping*, that is, performing a statistical simulation by resampling from the data you have already obtained (you can also add noise) to see what happens with the parameters of the two populations being compared.
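If it helps, here is a rough sketch of that bootstrapping idea in Python; the data frame, column names, and model formula are placeholder assumptions (the data are simulated here only so the sketch runs).

```python
# Resample the observed data with replacement and refit the model each time to see
# how the interaction estimate varies across bootstrap samples.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 0.4 * df["x1"] + 0.4 * df["x2"] + rng.normal(size=n)   # no true interaction

boot = np.array([
    smf.ols("y ~ x1 * x2", data=df.sample(n=n, replace=True, random_state=rng))
       .fit().params["x1:x2"]
    for _ in range(1000)
])
print("bootstrap 95% interval for the interaction:", np.percentile(boot, [2.5, 97.5]))
```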
Gabor and Martin are giving you very good advice. Given enough power, you can always show a very small effect as significant. What you want to do, in case you fail to reject, is to show your readership the precision of your interaction estimate, to convince them that there is no reason to believe the interaction would ever be of importance.
Use the operating characteristic (OC) curves appropriate for the experiment you have. If you did the right power analysis initially, your sample size should reflect a power of 80% or greater. Then, if the interaction turns out not to be significant, you can argue that a larger sample size would add little: if you already have, say, 90% power, adding a few more samples will not increase the resulting power by much (see the sketch below). This is the law of diminishing returns in statistical analysis, and it is the reason for doing a sample size analysis before any experiment: you want a sample size that will clearly show whatever result there is, but at the same time you don't want to use more samples than are absolutely necessary, because of cost, time, etc.
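To make the diminishing-returns point concrete, here is a small sketch using statsmodels; the "medium" effect size f = 0.25 and treating the 2 x 2 design as four cells of a one-way ANOVA are simplifying assumptions for illustration only.

```python
# Power as a function of total N for a fixed effect size: once power is high,
# extra observations add very little.
from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()
for n in (40, 80, 160, 320, 640):
    p = analysis.power(effect_size=0.25, nobs=n, alpha=0.05, k_groups=4)
    print(f"N = {n:4d}  power = {p:.3f}")
```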
Statistical significance of an effect only answers the question "Is my effect-measuring tool good enough to measure a non-zero effect?" The "effect-measuring tool" is the combination of the collected data and the statistical test employed. Whether the measured non-zero effect is important depends entirely on the application. As another comment points out, exactly-zero effects are very rare, so a "better measuring tool" is always able to find a non-zero effect. Significance is a statement about your tool, not about the application.
Since you expressed interest in the simplest way to convince your readers, I am wondering if you have considered simply reporting (and preferably illustrating) the confidence interval for the estimate of the interaction and noting that the width of confidence intervals is proportional to 1/sqrt(N). These two facts probably will make it apparent to the reader that even a very large increase in sample size would not change the result. (This is akin to David Gomez's suggestion but less likely to raise objections from some readers.) Of course, an equivalence argument is definitive but also a little more complex and some will quibble with the arbitrariness of the equivalence interval you specify.
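A two-line sketch of that 1/sqrt(N) argument (the standard deviation of 1 is an assumed value, purely for illustration):

```python
# The half-width of a 95% CI shrinks like 1/sqrt(N): quadrupling N only halves it.
import numpy as np
from scipy import stats

sigma = 1.0   # assumed standard deviation, for illustration only
for n in (50, 200, 800, 3200):
    hw = stats.t.ppf(0.975, df=n - 1) * sigma / np.sqrt(n)
    print(f"N = {n:5d}  95% CI half-width = {hw:.3f}")
```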
If we are talking about p-values, you cannot prove that you will never get an effect, because if you tested infinitely many participants, you would!
Rather, you might want to look at effect sizes (η²). For instance, a minimum-effect null hypothesis would state that an effect is practically negligible if η² < .01 (less than 1% of explained variance).
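For illustration, here is a small sketch of that minimum-effect check; the simulated data, column names, and the .01 bound are placeholders.

```python
# Compute partial eta-squared for a 2 x 2 interaction from an ANOVA table and
# compare it to a negligibility bound of .01.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
d = pd.DataFrame({"a": rng.integers(0, 2, n), "b": rng.integers(0, 2, n)})
d["y"] = 0.5 * d["a"] + 0.5 * d["b"] + rng.normal(size=n)   # no true interaction

fit = smf.ols("y ~ C(a) * C(b)", data=d).fit()
aov = sm.stats.anova_lm(fit, typ=2)
ss_int, ss_err = aov.loc["C(a):C(b)", "sum_sq"], aov.loc["Residual", "sum_sq"]
eta_p2 = ss_int / (ss_int + ss_err)
print(f"partial eta^2 = {eta_p2:.4f} ->", "negligible" if eta_p2 < 0.01 else "not negligible")
```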
If you are able to understand German, have a look at this:
Murphy, K. R., & Myors, B. (2004). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Not sure if this has already been "clearly" pointed out, but if you increase the sample size, you are at some point guaranteed to reject the null hypothesis of no effect. The slightest difference can be found to be statistically significant, given a large enough sample. What you want to demonstrate is practically impossible in the natural world.
To calculate statistical power, one could simply simulate data according to the desired parameters, conduct the test a large number of times, and see the proportion of trials in which the test correctly rejects the null hypothesis. Simulation makes it easier to apply power analysis to unconventional tests, and perhaps to things like interaction terms.
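Here is a minimal sketch of that simulation approach for a 2 x 2 interaction; the effect size, cell coding, and sample size below are arbitrary assumptions.

```python
# Simulate data with a known interaction, test it repeatedly, and report the
# proportion of significant results, i.e. the simulated power.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def simulated_power(beta_int=0.3, n=200, n_sims=500, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        a = rng.integers(0, 2, n)                      # factor A, coded 0/1
        b = rng.integers(0, 2, n)                      # factor B, coded 0/1
        y = 0.5 * a + 0.5 * b + beta_int * a * b + rng.normal(size=n)
        d = pd.DataFrame({"y": y, "a": a, "b": b})
        hits += smf.ols("y ~ a * b", data=d).fit().pvalues["a:b"] < alpha
    return hits / n_sims

print("simulated power:", simulated_power())
```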
I agree with the others. I don't think a power analysis at this point is appropriate, nor will it provide the information that you really want to show. If you want to show the likelihood of replicating your results, I would suggest reporting the p-rep value (see Killeen, 2005, attached). Feel free to contact me directly if you have any specific questions. Good luck.
Pardon me, but I disagree. Power analysis is a possible solution to Catherine's problem (see my comment above).
Roughly, it works like this:
1) You have to define a "minimal effect size": if your empirical effect is smaller, you can consider it practically negligible, even if it is statistically significant.
2) You have to test a lot of participants. How large your sample needs to be depends on your design (degrees of freedom) and, of course, on your predefined "minimal effect size".
You can look up the optimal sample size in a table here (or compute it directly, as in the sketch below):
Bortz J, Döring N. 2006. Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler. 4th ed. Berlin, Heidelberg, New York: Springer.
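If a lookup table is not at hand, the required sample size can also be computed in software. A minimal sketch, treating the 2 x 2 design as four cells of a one-way ANOVA and using Cohen's f = 0.10 as an arbitrary "minimal effect":

```python
# Solve for the total N needed to detect the chosen minimal effect with 80% power.
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(effect_size=0.10, alpha=0.05,
                                        power=0.80, k_groups=4)
print(f"total N needed: {n_total:.0f}")
```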
The power analysis approach is quite useful for applied research, because there it makes sense to consider whether an effect size is worth the effort of an intervention. For instance, if you want to sell an expensive drug against a deadly disease, few people will buy it if they survive 3 minutes longer relative to baseline (although 3 minutes may be a highly significant increase in survival time if you had tested a million people).
I don't know whether power analysis is accepted in fundamental research, though. Maybe it is worth giving it a try.
Noelle and Gabor, the p-rep calculation that Killeen developed was temporarily mandatory for Psychological Science papers. But then this commentary and response came out, and now it's not anymore.
Iverson, G. J., Lee, M. D., & Wagenmakers, E.-J. (2009). Prep misestimates the probability of replication. Psychonomic Bulletin & Review, 16(2), 424-429.
Lecoutre, B., & Killeen, P. R. (2010). Replication is not coincidence: Reply to Iverson, Lee, and Wagenmakers (2009). Psychonomic Bulletin & Review, 17(2), 263-269. doi: 10.3758/PBR.17.2.263
Iverson, G. J., Lee, M. D., & Wagenmakers, E.-J. (2010). The random effects prep continues to mispredict the probability of replication. Psychonomic Bulletin & Review, 17(2), 270-272. doi: 10.3758/PBR.17.2.270
While I think the answers already given cover the ground quite well, I would direct your attention to the counternull statistic. Any GLM statistic value is associated with a range of possible values. One can use the size of the observed confidence interval to generate a statistical conclusion of the form "I am x% certain that the observed value is less than y." Now, instead of arguing about the possibility of true null effects, you can evaluate your observations against effect sizes that would be clearly unimportant in your context.
The best explication of the counternull I have found is in Rubin and Rosenthal's book on contrast analysis. There are also some easily accessible articles, e.g. doi:10.1111/j.1467-9280.1994.tb00281.x
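For what it's worth, here is a tiny sketch of both quantities, assuming a normal (symmetric) sampling distribution; the estimate and standard error are made-up numbers.

```python
# For a symmetric sampling distribution the counternull is twice the observed value
# (Rosenthal & Rubin, 1994); a one-sided upper confidence bound gives a statement of
# the form "I am x% certain the effect is less than y".
from scipy import stats

est, se, conf = 0.04, 0.05, 0.95          # observed interaction, its SE, confidence level
counternull = 2 * est                      # as well supported by the data as the null of 0
upper = est + stats.norm.ppf(conf) * se
print(f"counternull = {counternull:.3f}")
print(f"I am {conf:.0%} certain the interaction is smaller than {upper:.3f}")
```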
Personally I think Bayes factors could be useful here, if you are genuinely interested in obtaining evidence for the null as opposed to not obtaining a significant effect (of course this depends on your theory). Bayes Factors for a 2 x 2 interaction aren't too tricky to calculate and I can point you to some relevant material if you're interested.
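One simple approximation (not necessarily the calculation meant above) is the BIC-based Bayes factor (Wagenmakers, 2007): compare the models with and without the interaction. A sketch on simulated placeholder data:

```python
# BF01 > 1 favours the model without the interaction ("evidence for the null").
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 200
d = pd.DataFrame({"a": rng.integers(0, 2, n), "b": rng.integers(0, 2, n)})
d["y"] = 0.5 * d["a"] + 0.5 * d["b"] + rng.normal(size=n)   # no true interaction

full    = smf.ols("y ~ a * b", data=d).fit()   # with interaction
reduced = smf.ols("y ~ a + b", data=d).fit()   # without interaction

bf01 = np.exp((full.bic - reduced.bic) / 2)    # BIC approximation to the Bayes factor
print(f"BF01 = {bf01:.2f}, P(H0 | data) = {bf01 / (1 + bf01):.2f} (equal prior odds)")
```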
Apparently you have decided what the result should be.
I don't condone prejudice, simply put: pre-judgement. Anyway, the answer is fairly straightforward. A power analysis is generally employed in the design of a study. Apparently, the study is done, so one needs only to generate a valid confidence interval based on the data; if it includes the point of no effect or overlaps with the region of no effect, then you are done. If you are still accruing patients (or whatever) and want to show futility, then that is another discussion.
As other commentators have pointed out, it may be fruitless to try and prove that you'll "never get the effect". However, it is of course possible to test if the lack of a significant effect may be due to a lack of power.
To do so, you may compute a post-hoc power analysis. That is, you calculate power based on an assumed population effect size, the actual N of your study, and the alpha level of your significance test. Regarding the population effect size, Jacob Cohen has introduced conventional values for "small", "medium", and "large" effect sizes in his book "Statistical power analysis for the behavioral sciences".
A relatively simple way of doing the calculations you need is to use G*Power, a software that is free for non-commercial use (download here: http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/download-and-register ).
Once you have started the program, choose the kind of statistical test you are applying and in the field "type of power analysis" choose "post-hoc". Then enter the input parameters, including the assumed population effect size. When you drag the mouse pointer over the effect size field, the program will show you Cohen's conventions for small, medium, and large effects. Of course you may wish to test all three possibilities.
Your final report may then look like this: "To examine whether the absence of a significant interaction effect may be due to low power, I conducted a post-hoc power analysis using G*Power (ref.). This analysis showed that, for detecting a medium-sized population effect (f = .25; see Cohen, 1988) at an alpha level of .05, the achieved power was .xx."
Of course you wish ".xx" to be as large as possible. :o)
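If you prefer to script the calculation rather than use G*Power, here is a rough equivalent for the F test of a 2 x 2 interaction (1 numerator df), using the noncentral F distribution; f = 0.25, alpha = .05, and N = 120 are placeholder inputs.

```python
# Post-hoc power: the noncentrality parameter is f^2 * N (the convention G*Power uses
# for fixed-effects ANOVA), and power is the probability of exceeding the critical F.
from scipy import stats

f, alpha, n_total, n_cells = 0.25, 0.05, 120, 4
df1, df2 = 1, n_total - n_cells
ncp = f**2 * n_total
f_crit = stats.f.ppf(1 - alpha, df1, df2)
power = 1 - stats.ncf.cdf(f_crit, df1, df2, ncp)
print(f"achieved power = {power:.2f}")
```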
Power analysis will give you added room, and you can learn about power best by diagramming the normal curve and drawing in lines for where the power gives you extra room on your graph. Perhaps this is not what you are asking, but the more you increase your sample, the greater the power, and if you keep testing larger and larger samples you will eventually end up with a normal bell curve. This is inevitable and will definitely be your result if you continually increase the sample size.
Best,
Patrice
Khan Academy will answer your question with a video discussion. This will clarify the issue for you.
Go to the SAS forums, but first look up Jeff Kromrey (USF) and his macros for finding power. He is a specialist on power. It is described well in our doctoral text; however, look up Dr. [email protected]. He will help you. He usually has us draw out the normal curve and the normal variance, then put in the power line and see the added room. Look up a video and it is really clear. By tomorrow I will get a video URL for you if you do not have your answer yet.
You could also use MorePower 6.0 (Campbell & Thompson, 2012). MorePower 6.0 computes sample size, effect size and power statistics for a specified ANOVA effect. It also calculates confidence intervals for the effect based on formulas from Jarmasz and Hollands (2009), as well as Bayesian posterior probabilities for the null and alternative hypotheses using the Bayesian Information Criterion (Masson, 2011; Wagenmakers, 2007). The program affords a straightforward comparison of these alternative approaches to interpretation of ANOVA. MorePower 6.0 is freely available at https://wiki.usask.ca/pages/viewpageattachments.action?pageId=420413544
The MorePower calculator is very easy to use, but let me know if you have any questions.
Cheers
Jamie
Campbell, J. I. D., & Thompson, V. A. (2012). MorePower 6.0 for ANOVA with relational confidence intervals and Bayesian analysis. Behavior Research Methods, 44, 1255-1265. doi: 10.3758/s13428-012-0186-0
Dr. Kromrey, my professor, is a specialist in (or obsessed with) power analysis. However, if your results are not significant, you need to look at the research design, not just at whether your power was good. It does not really matter to the world if you have great power; the point of having more power is to get a significant result. It sounds like you need to start with your sample size and how your experiment is designed. When the design of the study is appropriate, you will most probably get better results, and a good study should treat adequate power as a given.
Sorry, I was answering a slightly different question. If you increase your sample size you will get more power, and the results will eventually produce a normal curve; when you increase the sample to an enormous size, for instance, the central limit theorem will help you. You need to increase your sample size, and you could also add another variable, which will give you more leverage in your design and therefore more power.
@Patrice: My point was to convince a reviewer that the lack of interaction is not due to the fact that I did not test enough participants.
Post hoc power is often not very convincing, though. You could try to find out if there is a simple Bayes factor calculator for your test; then you can just compute the probability that there is no effect.