Causality is a complex and challenging topic in statistics and data science, and no single method can mathematically prove causality. However, some approaches can provide evidence or support for causal claims, depending on the type and quality of the data available.
One approach is to use **experimental methods**, such as randomized controlled trials (RCTs), in which you manipulate one variable (the treatment) and observe its effect on another variable (the outcome), while randomization balances out confounding factors. This can provide strong evidence for causality, but it may not be feasible or ethical in some situations.
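As a minimal sketch of why randomization works, here is a simulation with entirely hypothetical data: because treatment assignment is independent of the confounder, a plain difference in means recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A confounder that affects the outcome regardless of treatment.
confounder = rng.normal(size=n)

# Randomized assignment: treatment is independent of the confounder.
treated = rng.integers(0, 2, size=n).astype(bool)

# True treatment effect is 2.0; the confounder adds variation for everyone.
outcome = 2.0 * treated + 1.5 * confounder + rng.normal(size=n)

# Because assignment was random, a simple difference in means
# estimates the causal effect despite the confounder.
effect = outcome[treated].mean() - outcome[~treated].mean()
print(round(effect, 2))
```

With observational data, the same difference in means would be biased whenever the confounder also influenced who got treated.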
Another approach is to use **observational methods**, such as regression analysis, where you model the relationship between two variables using data that are collected without intervention. This can provide some evidence for causality, but it may be subject to bias, confounding, or reverse causality. To address these issues, you may need to use additional techniques, such as:
- Instrumental variables (IV): These are variables that are correlated with the treatment variable, affect the outcome only through the treatment, and are independent of any confounders. They can be used as proxies for the treatment variable to estimate its causal effect on the outcome variable.
- Propensity score matching (PSM): This is a technique that matches units (individuals, groups, etc.) that have similar probabilities of receiving the treatment, based on their observed characteristics. This can reduce the imbalance between the treatment and control groups and improve the comparability of the outcomes.
- Difference-in-differences (DID): This is a technique that compares the changes in outcomes between two groups (treatment and control) before and after an intervention. This can account for time-invariant confounders and measure the causal effect of the intervention.
- Granger causality: This is a technique that tests whether past values of one variable help predict future values of another, using time series data. This can suggest a causal direction between the two variables, although Granger causality measures predictive ability rather than true causation and can be misled by an unobserved common driver.
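To illustrate the IV idea from the list above, here is a hedged sketch on simulated data (all variable names and coefficients are hypothetical): an unobserved confounder biases the naive regression slope, while the simple Wald/IV estimator cov(z, y) / cov(z, x) recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Unobserved confounder u affects both treatment x and outcome y,
# so an ordinary regression of y on x is biased.
u = rng.normal(size=n)

# Instrument z: correlated with x, but affects y only through x.
z = rng.normal(size=n)
x = 1.0 * z + 1.0 * u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true causal effect of x is 2.0

# Naive OLS slope: biased upward because u pushes x and y together.
naive = np.cov(x, y)[0, 1] / np.var(x)

# Wald/IV estimator: cov(z, y) / cov(z, x), valid because z is
# independent of u and touches y only through x.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(round(naive, 2), round(iv, 2))
```

The naive slope lands well above 2.0, while the IV estimate sits near the true value; in practice the hard part is arguing that a real instrument satisfies the exclusion restriction.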
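The DID logic can likewise be sketched with toy group means (the numbers below are hypothetical): the control group's change over time stands in for the common trend, and subtracting it isolates the intervention's effect.

```python
# Difference-in-differences with toy averages (hypothetical numbers).
# Outcome means for the treatment and control groups, before and after.
treat_before, treat_after = 10.0, 16.0
ctrl_before, ctrl_after = 9.0, 12.0

# Each group's change over time; the control group's change captures
# the time trend shared by both groups.
treat_change = treat_after - treat_before  # 6.0
ctrl_change = ctrl_after - ctrl_before     # 3.0

# DID estimate: treatment change minus the shared trend, valid under
# the parallel-trends assumption.
did = treat_change - ctrl_change
print(did)  # 3.0
```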
A third approach is to use **graphical methods**, such as causal diagrams or directed acyclic graphs (DAGs), where you represent the variables and their causal relationships using nodes and arrows. This can help you visualize and understand the causal structure of the data, identify potential confounders or mediators, and design appropriate methods for causal inference.
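As a small illustration of the DAG idea (the node names are hypothetical), a graph can be stored as an adjacency mapping and scanned for common ancestors of the treatment and outcome, which are candidate confounders to adjust for. A full back-door analysis requires more care than this sketch.

```python
# A DAG as an adjacency mapping: node -> nodes it points to.
# Hypothetical example: Z confounds the X -> Y relationship,
# and M mediates it.
dag = {
    "Z": ["X", "Y"],  # Z causes both X and Y (confounder)
    "X": ["M"],       # treatment
    "M": ["Y"],       # mediator
    "Y": [],          # outcome
}

def parents(dag, node):
    """Nodes with an arrow pointing into `node`."""
    return {p for p, children in dag.items() if node in children}

def ancestors(dag, node):
    """All nodes with a directed path into `node` (excluding itself)."""
    seen = set()
    stack = [node]
    while stack:
        for p in parents(dag, stack.pop()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Common ancestors of treatment X and outcome Y are candidate confounders;
# the mediator M is correctly excluded because it descends from X.
common = ancestors(dag, "X") & ancestors(dag, "Y")
print(sorted(common))  # ['Z']
```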
In addition to Antonov's excellent answer, you might also consider correlating the suspected dependent variable against a time-lagged version of your causal variable. For example, suppose you suspect that an increase in interest rates leads to increased unemployment, but that it takes 3 months for the increase to take effect. You would then look for a higher correlation between interest rates 3 months in the past and current unemployment, and check that this lag yields the highest correlation.
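A minimal sketch of this lagged-correlation check on simulated data (the series and the 3-step lag are hypothetical, echoing the interest-rate example): the correlation is computed at several candidate lags, and the true lag stands out.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
lag = 3  # the true delay, e.g. 3 months (hypothetical)

# Simulated "interest rate" series and an "unemployment" series that
# responds to the rate 3 steps later, plus noise.
rate = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=n)
unemployment = np.empty(n)
unemployment[:lag] = noise[:lag]
unemployment[lag:] = 0.8 * rate[:-lag] + noise[lag:]

def lagged_corr(x, y, k):
    """Correlation between x shifted k steps into the past and current y."""
    return np.corrcoef(x[:-k], y[k:])[0, 1] if k else np.corrcoef(x, y)[0, 1]

corrs = {k: lagged_corr(rate, unemployment, k) for k in range(7)}
best = max(corrs, key=corrs.get)
print(best)  # prints 3, the true lag
```

Note that a high lagged correlation is still only suggestive: a third variable driving both series with different delays would produce the same pattern.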
In addition to some nice answers above, I want to recommend the following book:
Glymour, M., Pearl, J., & Jewell, N. P. (2016). Causal inference in statistics: A primer. John Wiley & Sons.
This book offers a good, brief introduction to causal inference in an "intuitive and theoretical way". It will help you understand concepts such as confounding variables.
For practical applications, see the methods recommended above, such as instrumental variables.