
Why Adding Non‑Confounding Controls Can Boost Causal Estimates (And When They Hurt)

This article explains how adding covariates that are not confounders can reduce outcome variance and sharpen causal estimates, why controlling for variables that only predict treatment inflates estimation error, and how controlling for post‑treatment variables introduces selection bias.


Considerations Beyond Controlling Confounders

Good Controls

We have seen that controlling for confounders in a regression is what identifies a causal effect. In the era of big data it is tempting to throw hundreds of extra variables into the model, but most of them are unnecessary for identification and some actively hurt it. We therefore turn our attention to non‑confounding controls and illustrate both helpful and harmful cases.

Case 1

Imagine you are a data scientist at a fintech debt‑collection team and want to know the effect of sending an email asking borrowers to negotiate their debt. The outcome is the amount paid by overdue customers.

The team randomly selects 5,000 overdue customers and flips a coin for each: heads receive the email, tails form the control group. Because assignment is random, the simple difference in means estimates the average treatment effect (ATE).
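A quick simulation (purely hypothetical numbers, not the team's data) shows why this estimator can be unbiased yet imprecise when the outcome is noisy:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Hypothetical: payments vary a lot across customers,
# while the true effect of the email is a modest +5.
email = rng.integers(0, 2, n)                  # coin-flip assignment
payments = rng.normal(600, 100, n) + 5 * email

# Under randomization, the difference in means estimates the ATE...
diff_in_means = payments[email == 1].mean() - payments[email == 0].mean()

# ...but its standard error is driven by the large outcome variance.
se = np.sqrt(payments[email == 1].var() / (email == 1).sum()
             + payments[email == 0].var() / (email == 0).sum())
print(diff_in_means, se)
```

With an outcome standard deviation of 100 and only 5,000 customers, the standard error is close to 3, so a modest true effect is hard to distinguish from zero in any single sample.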

Regressing payments on the treatment indicator alone estimates an ATE of –0.62: a puzzling negative point estimate with a high p‑value, whose confidence interval comfortably covers zero.

Credit limit and risk score are strong predictors of payment, but they are not confounders. Adding them as controls can reduce the variance of the outcome and make the treatment effect easier to detect.

We first regress payments on credit limit and risk score, then regress the residuals on the treatment indicator. Removing the variation explained by the covariates lowers the outcome variance from 10,807 to a residual variance of 5,652.

<code>Payments Variance 10807.61241599994
Payments Residual Variance 5652.453558466197
Email Variance 0.24991536000001294
Email Residual Variance 0.24918421069820038</code>

After controlling for credit limit and risk score, the estimated ATE becomes +4.4, with a confidence interval that excludes zero.

In practice, simply adding the control variables to the regression yields the same estimate.
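Both routes can be sketched with ordinary least squares on simulated data (all numbers below are hypothetical, chosen so the true effect is +5). Under random assignment, the two‑step residual regression and the single regression with controls land on nearly the same answer:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process; the true email effect is +5.
credit_limit = rng.normal(1000, 200, n)
risk_score = rng.normal(0, 1, n)
email = rng.integers(0, 2, n)                  # randomized assignment
payments = (0.1 * credit_limit - 20 * risk_score
            + 5 * email + rng.normal(0, 30, n))

# Step 1: regress payments on the predictive covariates only.
X = np.column_stack([np.ones(n), credit_limit, risk_score])
beta, *_ = np.linalg.lstsq(X, payments, rcond=None)
resid = payments - X @ beta                    # denoised outcome

# Step 2: regress the residuals on the treatment indicator.
T = np.column_stack([np.ones(n), email])
ate_two_step = np.linalg.lstsq(T, resid, rcond=None)[0][1]

# One-shot alternative: treatment and controls in a single regression.
XT = np.column_stack([np.ones(n), email, credit_limit, risk_score])
ate_direct = np.linalg.lstsq(XT, payments, rcond=None)[0][1]
print(ate_two_step, ate_direct)   # both close to the true effect of +5
```

The residual variance is roughly half the raw payment variance here, which is exactly what shrinks the standard error of the treatment coefficient.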

Case 2

Consider two hospitals running randomized trials of a new drug that shortens hospital stays. Hospital A treats 90% of its patients; Hospital B treats only 10%. Hospital A also tends to receive the more severe cases.

Regressing length of stay on treatment alone produces a misleading positive effect because severity is a confounder.

Including severity as a covariate removes the bias. Controlling for hospital instead also removes the bias, but at the cost of a much noisier estimate: hospital strongly predicts treatment while telling us little about the outcome beyond severity, so it soaks up treatment variation without reducing outcome noise.
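A small simulation makes the contrast concrete. All numbers are hypothetical (true drug effect: –1 day of stay); the point is the comparison of the three standard errors, not the specific values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hospital A (1) sees more severe patients and treats 90% of them;
# hospital B (0) treats only 10%. True drug effect: -1 day.
hospital = rng.integers(0, 2, n)
severity = rng.normal(5 + 2 * hospital, 1)
treated = (rng.random(n) < np.where(hospital == 1, 0.9, 0.1)).astype(float)
days = 2 * severity - 1 * treated + rng.normal(0, 1, n)

def ols(cols, y):
    """OLS with intercept; return the first slope and its standard error."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta[1], se[1]

naive, _ = ols([treated], days)                # biased: drug looks harmful
by_severity = ols([treated, severity], days)   # unbiased, tight
by_hospital = ols([treated, hospital.astype(float)], days)  # unbiased, noisy
print(naive, by_severity, by_hospital)
```

Both adjusted estimates recover the negative effect, but the severity‑adjusted standard error is several times smaller than the hospital‑adjusted one.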

The key lesson: add variables that predict the outcome (even if they are not confounders) to reduce variance, but avoid variables that only predict treatment, since they inflate the variance of the estimate and, in non‑randomized settings, can even amplify any remaining bias.

Bad Controls – Selection Bias

We return to the debt‑collection example. Variables such as “opened” (whether the email was opened) and “payment agreement” are downstream of the treatment; controlling for them blocks part of the causal pathway and biases the estimate.
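Conditioning on a mediator like “opened” can be demonstrated in a few lines. The mechanism below is invented for illustration (the email only works if it is opened), but it shows how controlling for the mediator erases a real effect:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000

# Hypothetical mechanism: the email's entire effect flows through opening it.
email = rng.integers(0, 2, n).astype(float)
opened = ((rng.random(n) < 0.4) & (email == 1)).astype(float)
payments = 500 + 30 * opened + rng.normal(0, 50, n)

def slope(cols, y, k=1):
    """Coefficient k from an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0][k]

total_effect = slope([email], payments)            # ~ 0.4 * 30 = 12
after_control = slope([email, opened], payments)   # pathway blocked: ~ 0
print(total_effect, after_control)
```

Once “opened” is held fixed, the regression can only see the direct effect of the email, which is zero by construction, so the estimate collapses even though the email genuinely raises payments.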

Selection bias arises when we control for a variable that lies on the causal path from treatment to outcome or that is a common effect of both.

Examples of common pitfalls include:

1. Adding a dummy for full debt repayment when estimating the effect of a collection strategy.
2. Controlling for occupation when estimating education’s effect on income.
3. Controlling for loan conversion when estimating the impact of interest rates on loan duration.
4. Controlling for marital happiness when estimating the effect of children on infidelity.
5. Splitting expenditure into a participation model and a positive‑outcome model (the “Bad COP”).

The “Bad COP” illustrates how decomposing a randomized outcome into two conditional models can introduce bias: dropping the zero‑outcome observations makes the treatment and control groups no longer comparable.
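A sketch of the Bad COP, with an invented spending model (treatment raises everyone's latent willingness to spend by 10, but many customers still spend nothing):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000

# Hypothetical: treatment shifts latent spending up by 10 for everyone.
treated = rng.integers(0, 2, n)
latent = rng.normal(20, 30, n) + 10 * treated
spend = np.maximum(latent, 0.0)        # many customers spend nothing

# Valid under randomization: compare everyone, zeros included.
ate = spend[treated == 1].mean() - spend[treated == 0].mean()

# "Bad COP": compare only customers who spent something. Treatment drags
# marginal (low) spenders into the positive group, so the two conditional
# samples are no longer comparable.
cop = (spend[(treated == 1) & (spend > 0)].mean()
       - spend[(treated == 0) & (spend > 0)].mean())
print(ate, cop)   # the conditional comparison understates the effect
```

The conditional-on-positives contrast is smaller than the true ATE because the treated positive group now contains customers who would have spent nothing without treatment.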

In summary, always include true confounders and strong predictors of the outcome, but never control for mediators, common effects of treatment and outcome, or variables that predict treatment without adding outcome information.

Key Takeaways

Non‑confounding controls that predict the outcome reduce variance and increase power.

Controls that only predict treatment inflate variance without removing any bias.

Never condition on variables that lie on the causal path from treatment to outcome.

Source: https://github.com/xieliaing/CausalInferenceIntro

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
