Causality is a complex and challenging topic in statistics and data science, and no single method can mathematically prove causality. However, some approaches can provide evidence or support for causal claims, depending on the type and quality of the data available.
One approach is to use **experimental methods**, such as randomized controlled trials (RCTs), in which you manipulate one variable (the treatment) and observe its effect on another variable (the outcome), while randomization balances out confounding factors. This can provide strong evidence for causality, but it may not be feasible or ethical in some situations.
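As a minimal sketch of why randomization works, here is a simulation with entirely hypothetical data: because treatment assignment is independent of the confounder, a plain difference in means recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A confounder that affects the outcome regardless of treatment.
confounder = rng.normal(size=n)

# Randomized assignment: treatment is independent of the confounder.
treated = rng.integers(0, 2, size=n).astype(bool)

# True treatment effect is 2.0; the confounder adds variation for everyone.
outcome = 2.0 * treated + 1.5 * confounder + rng.normal(size=n)

# Because assignment was random, a simple difference in means
# estimates the causal effect despite the confounder.
effect = outcome[treated].mean() - outcome[~treated].mean()
print(round(effect, 2))
```

With observational data, the same difference in means would be biased whenever the confounder also influenced who got treated.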
Another approach is to use **observational methods**, such as regression analysis, where you model the relationship between two variables using data that are collected without intervention. This can provide some evidence for causality, but it may be subject to bias, confounding, or reverse causality. To address these issues, you may need to use additional techniques, such as:
- Instrumental variables (IV): These are variables that are correlated with the treatment variable, affect the outcome only through the treatment, and are independent of any confounders. They can be used as proxies for the treatment variable to estimate its causal effect on the outcome variable.
- Propensity score matching (PSM): This is a technique that matches units (individuals, groups, etc.) that have similar probabilities of receiving the treatment, based on their observed characteristics. This can reduce the imbalance between the treatment and control groups and improve the comparability of the outcomes.
- Difference-in-differences (DID): This is a technique that compares the changes in outcomes between two groups (treatment and control) before and after an intervention. This can account for time-invariant confounders and measure the causal effect of the intervention.
- Granger causality: This is a technique that tests whether past values of one variable help predict future values of another, using time series data. This can suggest a causal direction between the two variables, although Granger causality measures predictive ability rather than true causation and can be misled by an unobserved common driver.
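To illustrate the IV idea from the list above, here is a hedged sketch on simulated data (all variable names and coefficients are hypothetical): an unobserved confounder biases the naive regression slope, while the simple Wald/IV estimator cov(z, y) / cov(z, x) recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Unobserved confounder u affects both treatment x and outcome y,
# so an ordinary regression of y on x is biased.
u = rng.normal(size=n)

# Instrument z: correlated with x, but affects y only through x.
z = rng.normal(size=n)
x = 1.0 * z + 1.0 * u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true causal effect of x is 2.0

# Naive OLS slope: biased upward because u pushes x and y together.
naive = np.cov(x, y)[0, 1] / np.var(x)

# Wald/IV estimator: cov(z, y) / cov(z, x), valid because z is
# independent of u and touches y only through x.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(round(naive, 2), round(iv, 2))
```

The naive slope lands well above 2.0, while the IV estimate sits near the true value; in practice the hard part is arguing that a real instrument satisfies the exclusion restriction.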
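The DID logic can likewise be sketched with toy group means (the numbers below are hypothetical): the control group's change over time stands in for the common trend, and subtracting it isolates the intervention's effect.

```python
# Difference-in-differences with toy averages (hypothetical numbers).
# Outcome means for the treatment and control groups, before and after.
treat_before, treat_after = 10.0, 16.0
ctrl_before, ctrl_after = 9.0, 12.0

# Each group's change over time; the control group's change captures
# the time trend shared by both groups.
treat_change = treat_after - treat_before  # 6.0
ctrl_change = ctrl_after - ctrl_before     # 3.0

# DID estimate: treatment change minus the shared trend, valid under
# the parallel-trends assumption.
did = treat_change - ctrl_change
print(did)  # 3.0
```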
A third approach is to use **graphical methods**, such as causal diagrams or directed acyclic graphs (DAGs), where you represent the variables and their causal relationships using nodes and arrows. This can help you visualize and understand the causal structure of the data, identify potential confounders or mediators, and design appropriate methods for causal inference.
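As a small illustration of the DAG idea (the node names are hypothetical), a graph can be stored as an adjacency mapping and scanned for common ancestors of the treatment and outcome, which are candidate confounders to adjust for. A full back-door analysis requires more care than this sketch.

```python
# A DAG as an adjacency mapping: node -> nodes it points to.
# Hypothetical example: Z confounds the X -> Y relationship,
# and M mediates it.
dag = {
    "Z": ["X", "Y"],  # Z causes both X and Y (confounder)
    "X": ["M"],       # treatment
    "M": ["Y"],       # mediator
    "Y": [],          # outcome
}

def parents(dag, node):
    """Nodes with an arrow pointing into `node`."""
    return {p for p, children in dag.items() if node in children}

def ancestors(dag, node):
    """All nodes with a directed path into `node` (excluding itself)."""
    seen = set()
    stack = [node]
    while stack:
        for p in parents(dag, stack.pop()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Common ancestors of treatment X and outcome Y are candidate confounders;
# the mediator M is correctly excluded because it descends from X.
common = ancestors(dag, "X") & ancestors(dag, "Y")
print(sorted(common))  # ['Z']
```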
In addition to Antonov's excellent answer, you might also consider correlating the suspected dependent variable against a time-lagged version of your causal variable. For example, suppose you suspect that an increase in interest rates leads to increased unemployment, but that it takes 3 months for the increase to take effect. You would then look for a higher correlation between interest rates 3 months in the past and current unemployment, and check that this lag yields the highest correlation.
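A minimal sketch of this lagged-correlation check on simulated data (the series and the 3-step lag are hypothetical, echoing the interest-rate example): the correlation is computed at several candidate lags, and the true lag stands out.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
lag = 3  # the true delay, e.g. 3 months (hypothetical)

# Simulated "interest rate" series and an "unemployment" series that
# responds to the rate 3 steps later, plus noise.
rate = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=n)
unemployment = np.empty(n)
unemployment[:lag] = noise[:lag]
unemployment[lag:] = 0.8 * rate[:-lag] + noise[lag:]

def lagged_corr(x, y, k):
    """Correlation between x shifted k steps into the past and current y."""
    return np.corrcoef(x[:-k], y[k:])[0, 1] if k else np.corrcoef(x, y)[0, 1]

corrs = {k: lagged_corr(rate, unemployment, k) for k in range(7)}
best = max(corrs, key=corrs.get)
print(best)  # prints 3, the true lag
```

Note that a high lagged correlation is still only suggestive: a third variable driving both series with different delays would produce the same pattern.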
In addition to some nice answers above, I want to recommend the following book:
Glymour, M., Pearl, J., & Jewell, N. P. (2016). Causal inference in statistics: A primer. John Wiley & Sons.
This book offers a good, brief introduction to causal inference in an "intuitive and theoretical way". It will help you understand concepts such as confounding variables.
For practical applications, see the methods recommended above, such as instrumental variables.