We have been told that ‘association does not mean causation’ but in the last 30 years there has been a revolution in our understanding of what we need to make the causal leap when we have observational data, and just as crucially, what not to do. The knowledge has come from a set of varied sources, including machine learning, philosophy, statistics, public health, econometrics, and epidemiology.
Something of the excitement of this work is captured in this website:
http://csm.lshtm.ac.uk/themes/causal-inference/
Here is a short quote:
“Over the last thirty years, however, a formal statistical language has been developed in which causal effects can be unambiguously defined, and the assumptions needed for their estimation clearly stated. This clarity has led to increased awareness of causal pitfalls ..... and the building of a new and extensive toolbox of statistical methods especially designed for making causal inferences from non-ideal data under transparent, less restrictive and more plausible assumptions than were hitherto required. Of course this does not mean that all causal questions can be answered, but at least they can be formally addressed in a quantitative fashion.”
If you want to read an excellent collection that sets out the key papers for observational (that is, non-experimental) data, see
Davies, P. (2014) Data Inference in Observational Settings, 1648 pages, 4-volume set in the Sage Benchmark series. Yes, that is 1648 pages, but they are summarised in this PowerPoint overview.
For a single paper summarizing recent developments for a social science audience, see
Gangl, M. (2010) Causal Inference in Sociological Research. Annual Review of Sociology 36: 21-47.
However, as Troy points out, these developments are not a panacea and might restrict what we study; for a cogent development of this argument see
Deaton, A. (2009) Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development. Proceedings of the British Academy 162: 123-160.
For the social sciences, experiments are difficult to conduct, due to ethical issues among other things. In some cases it is simply not possible to carry out the randomisation and implement the controls required for a true experiment. This makes causal inference quite difficult. Even when there is strong correlation, theory is indispensable for drawing conclusions about a causal relationship. There are several classic examples of this (e.g. storks nesting and the birth of human babies). Although there is correlation, there may be a common cause of the phenomena under consideration which, when accounted for, explains the previously observed spurious correlation.
Statistical models and techniques are tools that can identify relationships within the data. Almost any two variables that are analysed will show some kind of correlation, so causality is not to be inferred from statistical results alone. Rather, there must be some basis for believing that the statistical relationships observed even make sense. Theory is indispensable for causality, so first of all social scientists need to really understand their field of study.
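The common-cause point above is easy to demonstrate with a small simulation. This is only an illustrative sketch with made-up numbers: two variables that never influence each other end up strongly correlated because both depend on a shared cause, and adjusting for that cause makes the association vanish.

```python
# Spurious correlation from a common cause: a minimal simulation.
# All numbers here are invented purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

z = rng.normal(size=n)            # common cause (e.g. degree of urbanisation)
x = z + 0.5 * rng.normal(size=n)  # "storks" - driven by z, not by y
y = z + 0.5 * rng.normal(size=n)  # "babies" - driven by z, not by x

# Raw correlation is large even though x and y are causally unrelated.
r_xy = np.corrcoef(x, y)[0, 1]

# Adjust for the common cause: regress each variable on z and
# correlate the residuals (a partial correlation).
rx = x - z * np.dot(x, z) / np.dot(z, z)
ry = y - z * np.dot(y, z) / np.dot(z, z)
r_adj = np.corrcoef(rx, ry)[0, 1]

print(f"raw correlation:      {r_xy:.2f}")   # strongly positive
print(f"adjusted correlation: {r_adj:.2f}")  # close to zero
```

With these settings the raw correlation is about 0.8, while the correlation after adjusting for z is essentially zero, which is exactly the storks-and-babies pattern: the statistics alone cannot tell you which of the two situations you are in; only theory about what z might be can.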
While agreeing with Troy about the key role of theory, there do exist methods for reasonably inferring and estimating causal effects from observational data (propensity modeling, counterfactual-based estimation, etc.). The corresponding "cost" is usually a set of fairly restrictive assumptions, though.
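To make the propensity-based idea concrete, here is a minimal sketch of inverse-probability-of-treatment weighting (IPW), one of the propensity methods mentioned above. The data are simulated, with the true treatment effect fixed at 2.0 so the estimate can be checked; the restrictive assumptions at work are exactly the ones flagged in the post (no unmeasured confounding, a correctly specified propensity model).

```python
# Inverse-probability weighting on simulated observational data.
# The confounder x drives both treatment uptake and the outcome,
# so the naive treated-vs-control difference is biased.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

x = rng.normal(size=n)                       # observed confounder
p_treat = 1 / (1 + np.exp(-x))               # true propensity score
t = rng.binomial(1, p_treat)                 # non-randomised treatment
y = 2.0 * t + 3.0 * x + rng.normal(size=n)   # outcome; true effect = 2.0

# Step 1: estimate the propensity score e(x) = P(T = 1 | X = x).
e = LogisticRegression().fit(x.reshape(-1, 1), t).predict_proba(
    x.reshape(-1, 1))[:, 1]

# Step 2: weight each unit by the inverse probability of the treatment
# it actually received and compare weighted means.
ate = np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

naive = y[t == 1].mean() - y[t == 0].mean()
print(f"naive difference: {naive:.2f}")   # inflated by confounding
print(f"IPW estimate:     {ate:.2f}")     # close to the true effect 2.0
```

The naive comparison overstates the effect badly (here by more than 2), while the weighted estimate recovers it, but only because every confounder was measured and modelled correctly; in real observational data that is precisely the assumption one cannot verify.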
Some very informative papers you have shared with us, Kelvyn... I am really interested in your comment: 'Of course this does not mean that all causal questions can be answered, but at least they can be formally addressed in a quantitative fashion.'
Yes, not EVERYTHING can be answered through numbers!