
Why Causal Relationships Matter: From Prediction to Counterfactuals

This article explains the limits of predictive machine learning, introduces counterfactual reasoning, and covers potential outcomes, treatment effects, and bias, distinguishing correlation from causation with a simple example: tablet distribution in schools.


Why Causal Relationships Matter

Machine learning excels at prediction, but it cannot answer counterfactual "what‑if" questions that require causal reasoning. As Ajay Agrawal, Joshua Gans, and Avi Goldfarb note in *Prediction Machines*, AI's current wave provides powerful prediction tools, not true intelligence.

Prediction works when the problem can be framed as mapping inputs to an outcome (e.g., translating English to Portuguese, detecting faces, or steering an autonomous car). However, ML fails when the data distribution shifts or when we need to know the effect of an intervention.

Such "inverse‑causality" problems demand counterfactual reasoning: what would happen if we changed a price, a diet, a credit limit, or gave every child a tablet? These questions are fundamentally causal and cannot be solved by correlation‑based prediction alone.

Every decision—whether about business revenue, education policy, immigration, or personal choices—poses a causal question. Unfortunately, causal questions are harder to answer because they involve unobservable potential outcomes.

When Correlation Is Causal

Intuitively we know that correlation does not imply causation. For example, schools that provide tablets often have more resources, so their better performance may be due to wealth, not the tablets themselves.

To discuss causality formally we introduce notation: let t denote the treatment (e.g., giving a tablet), and y the observed outcome (e.g., test score). The fundamental problem of causal inference is that we can never observe both the treated and untreated outcome for the same individual.

> Two roads diverged in a yellow wood,
> And sorry I could not travel both
> And be one traveler, long I stood
> And looked down one as far as I could
> To where it bent in the undergrowth;

We therefore consider *potential outcomes*: y₀ (outcome without treatment) and y₁ (outcome with treatment). The observed outcome is either y₀ or y₁, never both.

The *individual treatment effect* is τᵢ = y₁ᵢ − y₀ᵢ, which is unobservable. Instead we estimate aggregate quantities such as the *average treatment effect (ATE)*:

E[y₁ - y₀]

and the *average treatment effect on the treated (ATT)*:

E[y₁ - y₀ | t = 1]

Consider a toy dataset of four schools, where t indicates whether tablets were provided and y is the test score:

| i | y₀  | y₁  | t | y   | te   |
|---|-----|-----|---|-----|------|
| 0 | 500 | 450 | 0 | 500 | -50  |
| 1 | 600 | 600 | 0 | 600 | 0    |
| 2 | 800 | 600 | 1 | 600 | -200 |
| 3 | 700 | 750 | 1 | 750 | 50   |

The average of the last column (te) is –50, meaning that, on average, tablets would reduce scores by 50 points in this artificial example. The ATT (average effect among treated schools) is –75.
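As a minimal sketch (assuming pandas is available), we can reproduce the ATE and ATT from this toy table, where both potential outcomes are visible only because the data is artificial:

```python
import pandas as pd

# Toy dataset from the table above; in real data we would never
# observe both y0 and y1 for the same school.
df = pd.DataFrame({
    "y0": [500, 600, 800, 700],  # outcome without tablets
    "y1": [450, 600, 600, 750],  # outcome with tablets
    "t":  [0, 0, 1, 1],          # treatment indicator
})
df["y"] = df["y0"].where(df["t"] == 0, df["y1"])  # observed outcome
df["te"] = df["y1"] - df["y0"]                    # individual effect

ate = df["te"].mean()                    # E[y1 - y0]
att = df.loc[df["t"] == 1, "te"].mean()  # E[y1 - y0 | t = 1]
print(ate, att)  # -50.0 -75.0
```

The `te` column exists only in this god's‑eye view of the data; dropping `y0` and `y1` gives the realistic setting where only `y` and `t` are observed.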

In reality we cannot observe the missing potential outcomes, so naïvely comparing treated and untreated averages conflates causal effect with bias.

Bias

Bias arises when the treated and untreated groups differ in ways other than the treatment itself. In the tablet example, richer schools are more likely to receive tablets, so their higher scores may stem from wealth, not the tablets.

Formally, the difference in observed means decomposes into the treatment effect on the treated plus a bias term that captures pre‑treatment differences:

E[y | t = 1] − E[y | t = 0] = E[y₁ − y₀ | t = 1] + ( E[y₀ | t = 1] − E[y₀ | t = 0] )

that is, Observed Difference = ATT + Bias.

If the groups are comparable before treatment (i.e., the bias term is zero), the observed difference equals the causal effect.
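The decomposition can be checked numerically on the toy data. This sketch (assuming pandas) computes each term; note that in practice y₀ for the treated schools is unobserved, so the bias term cannot normally be computed directly:

```python
import pandas as pd

# Same artificial data, where all potential outcomes are known.
df = pd.DataFrame({
    "y0": [500, 600, 800, 700],
    "y1": [450, 600, 600, 750],
    "t":  [0, 0, 1, 1],
})
df["y"] = df["y0"].where(df["t"] == 0, df["y1"])  # observed outcome

treated, control = df[df["t"] == 1], df[df["t"] == 0]

observed_diff = treated["y"].mean() - control["y"].mean()  # 675 - 550 = 125
att = (treated["y1"] - treated["y0"]).mean()               # -75
bias = treated["y0"].mean() - control["y0"].mean()         # 750 - 550 = 200

assert observed_diff == att + bias  # 125 == -75 + 200
```

The naive comparison suggests tablets *raise* scores by 125 points, yet the true effect on the treated is −75: the positive bias of 200 (richer schools were treated) completely masks the negative effect.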

Randomized assignment eliminates bias because treatment is independent of other factors, making the simple difference in means an unbiased estimate of the average causal effect.
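With only four schools we can make this concrete by brute force. The sketch below (plain Python, no external libraries) enumerates all six ways to randomly treat two of the four schools and shows that the difference in means, averaged over assignments, recovers the ATE exactly:

```python
from itertools import combinations

# Potential outcomes for the four toy schools.
y0 = [500, 600, 800, 700]
y1 = [450, 600, 600, 750]
n = len(y0)
ate = sum(a - b for a, b in zip(y1, y0)) / n  # -50.0

diffs = []
for treated in combinations(range(n), 2):  # every way to treat 2 of 4 schools
    t_mean = sum(y1[i] for i in treated) / 2
    c_mean = sum(y0[i] for i in range(n) if i not in treated) / 2
    diffs.append(t_mean - c_mean)

# Any single assignment can be far off, but the average over all
# equally likely assignments equals the ATE: the estimator is unbiased.
print(sum(diffs) / len(diffs))  # -50.0
```

Individual assignments still yield noisy estimates (here they range from −225 to +125); randomization removes *bias*, not variance.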

Visual illustrations (omitted here) show how bias and treatment effects combine, and how randomization removes the bias component.

Understanding these concepts is the first step toward mastering methods that identify causal relationships and eliminate bias.

Next, we will explore basic techniques for estimating causal effects, starting with randomized experiments as the gold standard.

Source: https://github.com/xieliaing/CausalInferenceIntro

Figures (omitted): causal diagram; treatment vs. control; potential outcomes; no‑bias scenario; randomized assignment.
Tags: Machine Learning, causal inference, treatment effect, bias, counterfactuals
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
