
Why Identical Statistics Can Hide Very Different Data: The Lesson of Anscombe’s Quartet

Anscombe’s Quartet shows that four data sets can share nearly identical means, variances, regression lines, and correlation coefficients yet produce completely different scatter plots. It is a reminder of why visualisation is crucial and why summary statistics alone can mislead analysts.

Model Perspective

Anscombe’s Quartet

Anscombe’s Quartet, introduced by statistician Francis Anscombe in 1973, consists of four distinct two‑dimensional data sets, each containing 11 (x, y) pairs. Their summary statistics (means, variances, the regression line y = 3 + 0.5x, and a correlation coefficient of about 0.82) are virtually identical, yet their visual patterns differ dramatically.

Statistical characteristics shared by the four sets:

Mean : average x = 9, average y = 7.5.

Variance : sample variance of x = 11, sample variance of y ≈ 4.1.

Linear regression : same fitted line y = 3 + 0.5x, with r² ≈ 0.67 (correlation r ≈ 0.82).
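The shared statistics can be checked directly. The sketch below is a minimal example that uses only the Python standard library. The data values are the ones commonly published for Anscombe's 1973 quartet; they are not listed in this article, so treat them as an assumption. The names QUARTET, SUMMARIES and summarize are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's data as commonly published (assumption: not listed in the article).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by sets 1 to 3
QUARTET = {
    "Dataset 1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Dataset 2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Dataset 3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Dataset 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, least-squares line and correlation."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                # ordinary least-squares slope
    intercept = my - slope * mx      # fitted line passes through the means
    r = sxy / sqrt(sxx * syy)        # Pearson correlation coefficient
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": variance(xs), "var_y": variance(ys),  # sample variance (n - 1)
        "slope": slope, "intercept": intercept,
        "r": r, "r2": r * r,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows agree to about two decimal places, which is what makes the quartet instructive: nothing in these numbers hints at how different the data sets are.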

Plotting Reveals the Truth

Scatter‑plot visualizations show distinct shapes:

Dataset 1: points scatter around a straight line, the noisy linear relationship that a regression model assumes.

Dataset 2: despite the same regression result, points follow a clear curved pattern, exposing a non‑linear relationship.

Dataset 3: most points align on a line but one obvious outlier heavily influences the regression.

Dataset 4: every point but one shares the same x value (x = 8), so there is almost no horizontal variation; a single high‑leverage point at x = 19 determines the regression line and forces the same equation as the other sets, which makes the fit misleading.
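The four shapes are easy to reproduce. The sketch below draws a rough plain-text scatter plot for each set using only the Python standard library; a plotting library would normally be used instead. The helper ascii_scatter, its grid size, and its axis limits are hypothetical choices made for illustration, and the data values are again the commonly published ones for the quartet rather than values taken from this article.

```python
# Anscombe's data as commonly published (assumption: not listed in the article).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by sets 1 to 3
QUARTET = {
    "Dataset 1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Dataset 2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Dataset 3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Dataset 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(xs, ys, width=40, height=12, x_max=20.0, y_max=14.0):
    """Return a plain-text scatter plot with '*' marking each (x, y) point.

    Both axes start at 0; x_max and y_max are illustrative limits chosen
    to contain every point in the quartet.
    """
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round(x / x_max * (width - 1))
        row = round(y / y_max * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    lines = ["|" + "".join(cells).rstrip() for cells in grid]
    lines.append("+" + "-" * width)  # x axis
    return "\n".join(lines)


for name, (xs, ys) in QUARTET.items():
    print(name)
    print(ascii_scatter(xs, ys))
    print()
```

Even at this coarse resolution the differences described above are visible: Dataset 2 bends, Dataset 3 has one point far above the rest, and Dataset 4 occupies only two columns.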

Takeaway

The quartet demonstrates that relying solely on summary statistics such as means, variances, or correlation coefficients can be deceptive. Visual inspection is essential to uncover underlying patterns, outliers, or non‑linear relationships that numbers alone may hide. In practice, always complement statistical analysis with appropriate visualizations.

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
