Journals certainly ought to consider failures to replicate, but whoever is involved needs to consider some key points.
One is the effect size and the issue of power. An effect found with n = x and a p value close to .05 has little better than a 50% chance of replicating at p < .05. That is, if the true distribution of sample effects is centred near the cut-off value rather than at the null, researchers sampling in that region are going to see p values > .05 a fair bit of the time. Related to this point, I suspect the shoestring budgets (and hence small samples) common in psychology bias published findings toward inflated effect sizes that are difficult to replicate.
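To put a rough number on that, here is a quick simulation I sketched (the per-group n of 30 and the two-group t-test design are my own assumptions, not taken from any particular study): it sets the true effect to exactly the size that lands right at p = .05 for that sample, then asks how often an identical study would come out significant.

```python
# Rough sketch: if the true effect is exactly the size that yields p = .05 with
# n per group, how often does an identical replication reach p < .05?
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 30                                                   # assumed per-group sample size
d = stats.t.ppf(0.975, df=2 * n - 2) * np.sqrt(2 / n)    # effect that sits right at p = .05

reps, hits = 10_000, 0
for _ in range(reps):
    a = rng.normal(d, 1, n)                              # "treatment" group around the true effect
    b = rng.normal(0, 1, n)                              # control group
    if stats.ttest_ind(a, b).pvalue < 0.05:
        hits += 1

print(f"replication 'success' rate: {hits / reps:.2f}")  # lands close to 0.5
```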
Another issue is the occasionally adversarial tone in the debates. Like it or not, replication is haunted by the possibility of fraud, especially given evidence that fraud exists (Stapel et al.), not to mention dubious tactics around selecting effects to write up for publication. But an effect can be bounded by cultural or historical context as well as experimental conditions, which means failure to replicate can generate thoughtful discussion as long as it isn't derailed by the taint of cheating.
Finally, the logic of decision-making around our statistics works against replication. In the logical framework we generally use, failure to replicate is failure to reject the null, which is not "evidence of absence." It's an inherently weaker position compared to the researcher with an effect size in her pocket. Another way to frame that issue is the relative difficulty of finding a counterfactual. Confidence intervals are one way to go about it, but that's a demanding standard for establishing failure to replicate.
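One way to make the confidence-interval route concrete (all the numbers here are invented for illustration, and the standard-error formula is the usual large-sample approximation for Cohen's d): report the replication's interval and ask whether it excludes the original estimate, not just whether it crosses zero.

```python
# Sketch of the CI framing: a failure to replicate says more when the replication's
# interval excludes the original effect, rather than merely straddling zero.
# The effect sizes and sample size below are invented for illustration.
import numpy as np
from scipy import stats

d_original = 0.60        # hypothetical published effect (Cohen's d)
d_replication = 0.10     # hypothetical replication estimate
n_per_group = 80         # hypothetical replication sample size

se = np.sqrt(2 / n_per_group + d_replication**2 / (4 * n_per_group))  # approx. SE of d
z = stats.norm.ppf(0.975)
lo, hi = d_replication - z * se, d_replication + z * se

print(f"replication 95% CI: [{lo:.2f}, {hi:.2f}]")
print("includes zero:           ", lo <= 0 <= hi)
print("excludes original effect:", not (lo <= d_original <= hi))
```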
I can see why journals would be reluctant to publish failures to replicate with those conditions, but I do think we should try. I like Deak's reference to Psych File Drawer above. Perhaps if replication efforts were routinely stored there, we could periodically review them and arrive at a more accurate estimate of the true effect size. Meta-analysis is a good master's thesis topic!
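For what it's worth, the periodic review I have in mind could be as simple as an inverse-variance (fixed-effect) average over whatever estimates had accumulated in the repository; the numbers below are invented purely to show the mechanics.

```python
# Minimal fixed-effect meta-analysis: pool an original study plus several replication
# attempts (nulls included) into one inverse-variance-weighted estimate.
# All effect sizes and standard errors are invented for illustration.
import numpy as np

d  = np.array([0.62, 0.15, 0.05, 0.30, -0.02])   # original + replication estimates (Cohen's d)
se = np.array([0.28, 0.18, 0.20, 0.22, 0.15])    # their standard errors

w = 1 / se**2                                    # inverse-variance weights
d_pooled = np.sum(w * d) / np.sum(w)
se_pooled = np.sqrt(1 / np.sum(w))

print(f"pooled d = {d_pooled:.2f} "
      f"(95% CI {d_pooled - 1.96 * se_pooled:.2f} to {d_pooled + 1.96 * se_pooled:.2f})")
```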
I would not hesitate to try to publish an outright replication attempt regardless of how it came out. Currently this is such a "hot" topic, owing to well-publicized failures to replicate hard-to-believe (and ultimately spurious) findings by Bem, Bargh and others, that I think editors will now be amenable to publishing replications of controversial studies. Of course the replication should be as good as or better than the original: I've seen a few sad examples where a student tried to replicate a finding that no one cared about, using the same flawed method. Talk about a waste of time!
I would go further and urge editors to give publication priority to well-conducted replications of findings that were originally published in their journal, especially (yes, especially) if the findings did not replicate. In other words, journals should be obliged to clean up their own messes. I can think of a recent case where a group did an improved version of a study and ended up debunking a crazy claim about infant social knowledge that had been published in Nature. Nature was originally going to publish the better-controlled debunking study, but after talking to (possibly being pressured by?) the authors of the original study, declined to publish. It was thereafter published in a more reputable journal. But I think that sort of thing constitutes scientific dishonesty by a journal.
At any rate, if journal publication proves difficult, another good option is to submit the findings to Psych File Drawer, an open repository of replication attempts: see http://www.psychfiledrawer.org/.
I like to replicate the original study before conducting my own research, and I ask my students to do the same. However, most of our replications fail and end up in the file drawer. Years ago, I tried to submit five failed replications to a journal. The editor rejected the manuscript immediately, writing: "null results are typically not publishable in psychology." I believe this is why people hesitate to submit failed replications. As far as I know, psychologists have been trying to improve this since the scandal involving the Dutch psychologist Stapel. However, I am not sure whether these efforts can really improve the progress of psychology. I am very interested in this issue, and anyone is welcome to contact me to discuss it.
That is a maddening response from your editor, not least because it is tautological: we don't publish replications because replications aren't published.
I think the best approach is to frame the work as an effort to find the best estimate of the putative effect size, rather than as a replication. There is a paper I very much liked in Psy Methods by Schmidt about systematic review and building power (unhelpfully, I don't have it to hand right now). He wasn't especially concerned with failure to replicate, but I find the paper instructive for that purpose.
What aggregation of multiple studies does is get at under what conditions the effect exists or doesn't exist. For a crude example, think of the cross-cultural efforts to replicate the "fundamental" attribution error. It is more culturally-bound than originally thought, but we actually have some sense of when and why it exists, which is an advance in knowledge. If an effect only exists under rare circumstances, then perhaps it isn't a meaningful effect.
As Matthew said" What aggregation of multiple studies does is get at under what conditions the effect exists or doesn't exist." However, what bother us is that most papers available for estimating effect size is published paper, which often with larger (or even false positive) effect size. So, although I agree with encouraging aggregation of multiple studies (e.g., meta-analysis), I believed that this make little advance for the situation of psychology. Unless encouraging and publishing replication, there will be always false positive in psychology.
I have a manuscript that I am currently writing up whose results are completely opposite to what one of the main papers in the area has published. But I am not going to stop, because I have my reasons to publish my results and I have validation for my findings. By publishing a finding completely opposite to what others have published, you may open doors to alternative thinking, but we need to be clear on our point, stay focused, make sure we explain our findings clearly, and validate our hypothesis.
Lakshmi - by "opposite", do you mean you have statistically significant effects in the opposite direction from previous work? I am sure your paper will receive scrutiny, as a challenge to the status quo. But, in your position, I would be hopeful about publication, at least compared to a set of null results (a failure to replicate).