
A Review of Causal Inference Methods: Potential Outcomes, Structural Causal Models, and Recent Advances

This article reviews the two main streams of causal inference, the potential-outcome framework (the Rubin Causal Model) and structural causal models (Pearl's causal diagrams); surveys classic techniques such as A/B testing, instrumental variables, matching, difference-in-differences, synthetic controls, matrix completion, and heterogeneous treatment effect estimation; and discusses modern machine-learning-based approaches and causal discovery algorithms.

DataFunTalk

Causal inference is a crucial branch of data science used for product iteration, algorithm evaluation, and incentive design, aiming to estimate the effect of interventions by combining data, experiments, or statistical models. The article reviews the two dominant paradigms: the Potential Outcomes Model (Rubin Causal Model) and the Structural Causal Model (Causal Diagrams), summarizing their core ideas, assumptions, and recent methodological developments.

1. Potential Outcomes Model

The potential‑outcome framework defines for each unit i a treatment indicator T_i (1 for treated, 0 for control) and two potential outcomes Y_i(1) and Y_i(0), of which only one is observed. The average treatment effect (ATE) is E[Y_i(1)‑Y_i(0)]. Estimating ATE is challenging because we only observe one potential outcome per unit, and selection bias may arise when treatment assignment is not random. The article discusses A/B testing as the most common implementation, emphasizing the Stable Unit Treatment Value Assumption (SUTVA) and the need to address interference, budget constraints, and split‑traffic designs.
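As a toy illustration of these definitions, the following sketch (with entirely synthetic data and an assumed true effect of 2.0) simulates a randomized experiment and recovers the ATE with a simple difference in group means, which is unbiased under random assignment:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulate both potential outcomes; the true ATE is 2.0 by construction.
y0 = rng.normal(5.0, 1.0, n)          # Y_i(0)
y1 = y0 + 2.0                          # Y_i(1)

# Random assignment (as in an A/B test) makes T independent of (Y(0), Y(1)).
t = rng.integers(0, 2, n)
y = np.where(t == 1, y1, y0)           # only one potential outcome is observed

# Under randomization, the difference in group means estimates the ATE.
ate_hat = y[t == 1].mean() - y[t == 0].mean()
```

If assignment were instead correlated with `y0` (selection bias), this same difference in means would no longer recover the true effect, which is exactly the problem the rest of the article's methods address.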

2. Instrumental Variable (IV) Methods

IV methods address endogeneity in linear regression by using a variable Z that affects the treatment X but is independent of unobserved confounders U. Two‑stage least squares (2SLS) is the standard estimator. The article notes the risk of weak instruments and mentions Deep IV, which replaces the two stages with deep neural networks to relax parametric assumptions. Real‑world examples include LinkedIn’s network‑sampling experiments and Airbnb’s two‑sided platform designs.
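To make the endogeneity problem concrete, here is a minimal synthetic sketch: an unobserved confounder `u` biases the naive OLS slope, while the instrument recovers the true coefficient. With a single instrument and no other regressors, 2SLS reduces to the Wald ratio used below (all coefficients and variable names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

u = rng.normal(size=n)                 # unobserved confounder
z = rng.normal(size=n)                 # instrument: shifts x, independent of u
x = 0.8 * z + u + rng.normal(size=n)   # treatment, endogenous via u
y = 1.5 * x + u + rng.normal(size=n)   # true causal effect of x on y is 1.5

# Naive OLS slope is biased upward because cov(x, u) > 0.
ols = np.cov(x, y)[0, 1] / np.cov(x, y)[0, 0]

# 2SLS with one instrument collapses to the Wald ratio cov(z, y) / cov(z, x):
# stage 1 projects x on z, stage 2 regresses y on the fitted values.
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
```

Shrinking the 0.8 first-stage coefficient toward zero illustrates the weak-instrument problem mentioned above: the denominator of the Wald ratio becomes noisy and the IV estimate unstable.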

3. Matching Methods

Matching creates comparable treated and control groups by pairing units with similar covariates (e.g., coarsened exact matching or propensity‑score matching). It is a non‑parametric approach that mimics randomized experiments and can be combined with difference‑in‑differences for low‑penetration features.
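A minimal sketch of the idea, using one-nearest-neighbor matching on a single covariate with synthetic, confounded data (a deliberately simplified stand-in for propensity-score matching; in this one-covariate setting matching on `x` and on the propensity score coincide):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Confounded assignment: units with larger x are more likely to be treated.
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                     # true propensity score
t = rng.binomial(1, p)
y = 2.0 * t + 3.0 * x + rng.normal(size=n)   # true treatment effect = 2.0

treated, control = np.where(t == 1)[0], np.where(t == 0)[0]

# 1-nearest-neighbor matching with replacement: for each treated unit,
# find the control unit with the closest covariate value via binary search.
order = np.argsort(x[control])
xc_sorted = x[control][order]
pos = np.clip(np.searchsorted(xc_sorted, x[treated]), 1, len(xc_sorted) - 1)
nearer = np.where(np.abs(xc_sorted[pos] - x[treated]) <
                  np.abs(xc_sorted[pos - 1] - x[treated]), pos, pos - 1)
matches = control[order][nearer]

# Average treated-minus-matched-control difference estimates the ATT.
att_hat = (y[treated] - y[matches]).mean()
```

A raw treated-vs-control mean difference on these data would be badly inflated by the `3.0 * x` confounding; matching removes most of that bias by comparing like with like.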

4. Panel‑Data Methods

Traditional panel‑data techniques include Difference‑in‑Differences (DiD) with the regression model y_it = α_0 + α_1 Treat_i + α_2 Post_t + α_3 Treat_i·Post_t + ε_it, where α_3 estimates the causal effect under the parallel‑trend assumption. Extensions such as two‑way fixed effects, triple‑difference, synthetic control, and synthetic DiD (SDID) are discussed, highlighting their assumptions and recent literature (Abadie et al., Arkhangelsky et al.). Matrix‑completion methods are introduced as a way to recover missing counterfactuals when treatment timing varies across units.
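The DiD regression above has a closed-form equivalent in the two-group, two-period case: α_3 equals the difference of before/after differences. A synthetic sketch (group baselines, trend, and the 0.5 true effect are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4_000                                # units per (group, period) cell

# Two groups, two periods. Parallel trends hold by construction: both groups
# gain +1.0 post-period; the treated group gains an extra +0.5 (true effect).
base = {0: 10.0, 1: 12.0}                # different baselines are allowed

def outcome(treat, post):
    mu = base[treat] + 1.0 * post + 0.5 * treat * post
    return mu + rng.normal(0.0, 1.0, n)

y00, y01 = outcome(0, 0), outcome(0, 1)  # control: pre, post
y10, y11 = outcome(1, 0), outcome(1, 1)  # treated: pre, post

# DiD = (treated post - treated pre) - (control post - control pre),
# which equals the OLS coefficient alpha_3 on Treat_i * Post_t.
did = (y11.mean() - y10.mean()) - (y01.mean() - y00.mean())
```

Note that the estimator is valid here only because both groups share the +1.0 trend; a group-specific trend would violate the parallel-trend assumption and bias α_3.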

5. Heterogeneous Treatment Effect (HTE) Estimation

HTE methods aim to estimate individual treatment effects τ_i = Y_i(1)‑Y_i(0) or the conditional average treatment effect (CATE) τ(x) = E[Y(1)‑Y(0) | X=x]. Approaches include causal forests (non‑parametric trees that split on covariates to maximize treatment‑effect heterogeneity), meta‑learners (T‑Learner, S‑Learner, X‑Learner), and double machine learning (DML) and doubly robust (DR) frameworks, both implemented in Microsoft's EconML library. Practical guidance on model selection, regularization, and statistical inference is provided.
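The T-Learner is the simplest of these to sketch: fit one outcome model per treatment arm and subtract the predictions. Below, each arm's model is an ordinary least-squares fit of y on [1, x], an assumption made purely to keep the example dependency-free; in practice any regressor (gradient boosting, neural nets, EconML's learners) can be plugged in. The data and the true CATE τ(x) = 1 + 2x are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

x = rng.uniform(-1, 1, n)
t = rng.integers(0, 2, n)                # randomized treatment
tau = 1.0 + 2.0 * x                      # true CATE varies with x
y = 3.0 * x + tau * t + rng.normal(0.0, 1.0, n)

# Least-squares fit of y on [1, x] -- the stand-in outcome model per arm.
def fit(xs, ys):
    X = np.column_stack([np.ones_like(xs), xs])
    return np.linalg.lstsq(X, ys, rcond=None)[0]

b1 = fit(x[t == 1], y[t == 1])           # model for E[Y | X, T=1]
b0 = fit(x[t == 0], y[t == 0])           # model for E[Y | X, T=0]

# T-Learner CATE estimate: difference of the two arms' predictions.
def cate(x_new):
    return (b1[0] + b1[1] * x_new) - (b0[0] + b0[1] * x_new)
```

The S-Learner differs only in fitting a single model with the treatment indicator as a feature; the X-Learner adds a cross-imputation step that helps when the two arms are badly imbalanced.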

6. Causal Discovery Algorithms

When the causal graph is unknown, two families of algorithms are used: constraint‑based algorithms (e.g., PC, IC), which test conditional independencies, and score‑based algorithms (e.g., NOTEARS, CGNN), which optimize a differentiable score. Assumptions such as Causal Markov, Causal Sufficiency, and Causal Faithfulness are required. Recent work combines reinforcement learning and deep generative models for scalable causal discovery.
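The conditional-independence tests at the heart of constraint-based methods can be illustrated with a partial-correlation test, the standard Gaussian CI test used by PC on linear-Gaussian data. On a synthetic chain X → Y → Z, X and Z are correlated marginally but (approximately) uncorrelated given Y, which is the signal PC uses to delete the X–Z edge:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000

# Chain structure X -> Y -> Z: X and Z are dependent marginally,
# but conditionally independent given Y.
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def partial_corr(a, b, c):
    # First-order partial correlation of a and b given c; zero (in the
    # Gaussian case) iff a and b are conditionally independent given c.
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

marginal = corr(x, z)                    # clearly nonzero
conditional = partial_corr(x, z, y)      # approximately zero
```

A full PC implementation repeats such tests over growing conditioning sets and then orients edges via v-structures; this sketch shows only the single test that drives edge removal.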

7. Stable Learning and Future Directions

Stable learning seeks models whose performance is invariant across environments, linking causal structure to out‑of‑distribution robustness. The article cites recent research on stable learning, its connections to causal inference, and applications in recommendation systems, autonomous driving, and NLP, emphasizing the growing synergy between causality and machine learning.

Overall, the piece provides a concise yet comprehensive overview of causal inference techniques, their theoretical foundations, practical implementations, and emerging research trends.

Tags: machine learning, A/B testing, causal inference, treatment effect, econometrics
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
