I would argue that if there is a causal relationship between two variables, then there will have to be a correlation between them.
Thinking of this within a simple regression framework, you would have Y = a + b(treatment), where the coefficient b tells us whether there is a statistically significant effect of treatment on Y (note that this assumes the data are from an RCT, and that the relationship is indeed causal). As such, this relationship also represents a correlation between Y and treatment.
However, if these data are not from an RCT, we would still have the same regression (albeit with added covariates to control for confounding), but we can only assume that the relationship is causal. Thus we may have a correlation between Y and treatment, but the causal interpretation rests on the assumption that we have controlled for all sources of confounding (something that in practice we can never know). Conservative commentators will argue in this case that we can claim correlation, but not causation, because the data are not derived from a randomized trial.
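To make the RCT versus observational contrast concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than anything from real data: the single confounder `z`, the true treatment effect of 1.0, the logistic assignment rule, and the helper names `simulate_confounded` and `ols_coefs`. It only shows that the coefficient b from Y = a + b(treatment) can be far from the causal effect until the confounder is added as a covariate.

```python
import numpy as np


def simulate_confounded(n=20_000, true_effect=1.0, seed=0):
    """Simulate observational data where z drives both treatment and Y (assumed setup)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                          # hypothetical confounder
    p_treat = 1.0 / (1.0 + np.exp(-2.0 * z))        # treatment is more likely when z is high
    treatment = (rng.random(n) < p_treat).astype(float)
    y = 0.5 + true_effect * treatment + 2.0 * z + rng.normal(size=n)
    return treatment, z, y


def ols_coefs(columns, y):
    """Ordinary least squares with an intercept; returns [a, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


treatment, z, y = simulate_confounded()

# Y = a + b(treatment): b mixes the causal effect with the confounding through z.
b_unadjusted = ols_coefs([treatment], y)[1]

# Y = a + b(treatment) + c(z): b is close to the true effect, but only because
# we happen to know and measure the one confounder in this toy example.
b_adjusted = ols_coefs([treatment, z], y)[1]

print(f"unadjusted b = {b_unadjusted:.2f}, adjusted b = {b_adjusted:.2f}")
```

In real observational data we never get to check the adjusted estimate against a known truth, which is exactly why the causal claim stays an assumption.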
Ariel, correct me if I'm wrong here, but I believe there could be relatively rare cases where two factors (e.g. treatment, which I'll call X, and Y) that are causally related could be empirically uncorrelated. This could occur if there are two causal pathways between X and Y mediated by different factors (e.g. X->A->Y and X->B->Y). If one of these pathways has a negative effect, the other has a positive effect and the absolute magnitude of these effects is equal or very nearly equal, we would observe no correlation between X and Y. In reality, such situations will be very rare, but we are left unable to say that causation always implies correlation.
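A small simulation sketch of the two-pathway case, in case it helps. The path coefficients (1.0 for X->A and X->B, +0.8 and -0.8 into Y), the noise terms, and the function name `simulate_cancelling_paths` are all assumptions chosen so that the two indirect effects cancel exactly.

```python
import numpy as np


def simulate_cancelling_paths(n=50_000, seed=1):
    """X affects Y through A (positive) and through B (negative), with equal magnitude."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    a = 1.0 * x + rng.normal(size=n)              # X -> A
    b = 1.0 * x + rng.normal(size=n)              # X -> B
    y = 0.8 * a - 0.8 * b + rng.normal(size=n)    # A -> Y (+), B -> Y (-)
    return x, a, b, y


x, a, b, y = simulate_cancelling_paths()

r_xy = np.corrcoef(x, y)[0, 1]   # near zero: the two pathways cancel
r_ay = np.corrcoef(a, y)[0, 1]   # clearly positive
r_by = np.corrcoef(b, y)[0, 1]   # clearly negative

print(f"corr(X, Y) = {r_xy:.3f}, corr(A, Y) = {r_ay:.3f}, corr(B, Y) = {r_by:.3f}")
```

The mediators remain correlated with Y even though X and Y are not, so the causal structure only becomes visible once A and B are measured. Changing either 0.8 slightly breaks the cancellation, which is one way to see why the situation should be rare.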
You bring up a good point. From a theoretical standpoint, there would have to be a 100% indirect effect, that is, X impacts M and M impacts Y, but X does not directly impact Y. From a practical standpoint I have never seen such a case where there is absolutely no direct effect of X on Y.
So from a theoretical perspective, there could be a scenario where there is causation but no correlation, however in the real world this is a highly unlikely scenario.
By the way, I am attaching a paper I recently wrote on mediation, which discusses these issues in a more specific case.
Ariel
Article Using mediation analysis to identify causal mechanisms in di...
Thank you, Ariel! A very nice paper. I look forward to the increasing application of mediation analyses to diverse fields. On the original question, I will also note that in highly nonlinear systems we cannot rely on a simple correlation being present when there is a true causal link. Here is a short and accessible editorial from physicist Mark Buchanan discussing the issue.
I appreciate your answers and responses to my rather naive question, Jessica and Ariel.
In nonlinear relationships, or for that matter in the simplest nonlinear relationship, a quadratic of the form y = a + bx + cx^2, the differential is dy = (b + 2cx) dx, which is linear in x.
What would we say about the correlation between y and x?
Dr Gupta, in the quadratic example you pose, whether you could find a correlation would depend on where in the state space you look and with what tools. It should be straightforward to find a quadratic relationship, given that you know this is the correct functional form. If you want to find a correlation on the linear scale, there are some parts of the curve where this would be possible. But failure to find a linear relationship doesn't mean there's no causation. In the attached graph, there is indeed causation between x and y. I know because I simulated the data as y = 2.5 - x^2 with some stochastic noise added. Yet we can't detect the relationship with linear regression. Of course, in this simple case, a look at the graph would quickly set us on the right path. In more complicated systems, I think it can be more of a challenge to ensure we have an appropriate functional form. Is this any help?
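For anyone who wants to reproduce something like the graph, here is a rough sketch. The functional form y = 2.5 - x^2 is as described above; the range of x (uniform on -2 to 2), the noise standard deviation of 0.5, the sample size, and the seed are assumptions, since the original simulation settings are not given.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2_000

# Assumed settings: x symmetric around zero, Gaussian noise with sd 0.5.
x = rng.uniform(-2.0, 2.0, size=n)
y = 2.5 - x**2 + rng.normal(scale=0.5, size=n)

# Linear correlation over the whole range: close to zero despite the causal link.
r_linear = np.corrcoef(x, y)[0, 1]

# Fitting the correct functional form recovers the relationship.
# np.polyfit returns coefficients from the highest degree down: [c2, c1, c0].
c2, c1, c0 = np.polyfit(x, y, deg=2)

# Restricting to one arm of the curve (x > 0) makes a linear correlation detectable.
right = x > 0
r_right_half = np.corrcoef(x[right], y[right])[0, 1]

print(f"corr(x, y) over full range: {r_linear:.3f}")
print(f"quadratic fit: y = {c0:.2f} + {c1:.2f} x + {c2:.2f} x^2")
print(f"corr(x, y) for x > 0 only: {r_right_half:.3f}")
```

The near-zero correlation over the full range depends on x being roughly symmetric around the vertex of the parabola; shift the sampling range and a linear association reappears, which is the "where in the state space" point.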
My question hints at the challenge that we might face in analyzing big data, where simple linear relationships may not exist. This may be the time to look at ways of understanding data other than the association-correlation-causation-regression methods.
I agree. This is an issue I've been thinking about quite a bit lately. It seems to me that the investigation of complex, nonlinear and even chaotic systems is both the promise and the peril of "big data". In epidemiology, we have embraced graphical models, especially directed acyclic graphs (DAGs), as a way to sort through the multitude of interacting associations and improve our estimates of causal links. But I haven't seen them applied quite as widely to dynamical systems.
On moving beyond association, I'm not sure quite how. I don't think I have yet come upon an empirical method that does not rely on association in one form or another to infer causation (not that I've come upon them all). As statistician Edward Tufte said, "Correlation is not causation, but it sure is a hint." Simulation modeling is one way to tackle the problem from the other end, i.e. understand the generative process from first principles, making plausible assumptions, and see if you can explain the observed phenomena. But that also has its issues.
I see very interesting theoretical work on this issue coming out of the field of ecology. Perhaps this blog post I stumbled upon recently would interest you.
I absolutely agree with Ariel Linden's statement: "..from a theoretical perspective, there could be a scenario where there is causation but no correlation, however in the real world this is a highly unlikely scenario."