Understanding A/B Testing: Statistical Foundations, Metric Evaluation, and Practical Applications
This article explains the principles of A/B testing; introduces statistical concepts such as population, sample, hypothesis testing, p‑values, and t‑tests; describes how to calculate p‑values for rate and mean metrics; and illustrates a real‑world experiment with its evaluation method.
What is A/B Testing
A/B testing is a data‑driven method that splits traffic so that different versions of a product run simultaneously; by recording and analysing user behaviour on each version, it provides a scientific comparison that supports product decision‑making.
Core Principles
The core of A/B testing includes ensuring similarity and uniformity of the experimental population, adhering to the single‑variable principle, and using scientific effect evaluation.
Application Scenarios
Quote from Zhang Yiming: "Even if you are 99.9% sure a name is the best, just test it. What’s the harm?"
Role of Statistics in A/B Testing
A/B testing is essentially a comparative experiment; statistical theory provides the scientific basis for drawing conclusions from sample data.
Statistical Concepts
Population: the whole set of users of a website or app.
Sample: a subset drawn from the population, representing the control and test groups.
Parameter: a numeric description of the population (e.g., the overall mean).
Statistic: a numeric description of the sample (e.g., the sample mean).
Mean, variance, normal distribution: basic descriptive measures and the theoretical foundation for many inference methods.
Sampling and Parameter Estimation
Sampling must produce a representative sample; otherwise, estimates lack logical basis. Parameter estimation can be point estimation (single value) or interval estimation (range with confidence level).
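The two estimation styles can be sketched in a few lines: the sample mean is a point estimate, and wrapping it in a confidence interval turns it into an interval estimate. This is a minimal illustration using only the Python standard library and made-up numbers; it uses a normal approximation, so for small samples a t‑based interval would be more accurate.

```python
from statistics import NormalDist

def mean_confidence_interval(xs, confidence=0.95):
    """Point estimate (sample mean) plus a normal-approximation
    interval estimate for the population mean."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
    se = (var / n) ** 0.5                             # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # ~1.96 for 95%
    return mean, (mean - z * se, mean + z * se)

# Hypothetical observations, e.g. per-user session minutes:
mean, (lo, hi) = mean_confidence_interval([12, 15, 14, 10, 13, 16, 11, 14])
```

The point estimate is the single value `mean`; the pair `(lo, hi)` is the interval estimate at the chosen confidence level.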
Hypothesis Testing
Two hypotheses are defined: the null hypothesis (H₀), the default assumption of "no effect" that we seek evidence against, and the alternative hypothesis (H₁), the effect we hope to demonstrate. Two kinds of error can occur: a Type I error (probability α) rejects a true H₀, while a Type II error (probability β) fails to reject a false H₀; typical choices are α = 0.05 and β = 0.2.
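The meaning of α can be made concrete with an A/A simulation: both groups are drawn from the same distribution, so H₀ is true by construction and every "significant" result is a Type I error. This is an illustrative sketch with arbitrary parameters, not part of the experiment described later.

```python
import random
from statistics import NormalDist

def aa_test_pvalue(rng, n=200):
    """One A/A test: both groups come from the same distribution,
    so H0 holds and any rejection is a Type I error."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    z = (mean_a - mean_b) / ((var_a / n + var_b / n) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

rng = random.Random(42)
pvals = [aa_test_pvalue(rng) for _ in range(2000)]
false_positive_rate = sum(p <= 0.05 for p in pvals) / len(pvals)
```

With the threshold at 0.05, roughly 5% of these A/A tests come out "significant" purely by chance, which is exactly what α = 0.05 promises.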
Significance Level (p‑value)
The p‑value is the probability, assuming H₀ is true, of observing a result at least as extreme as the one actually obtained. In practice, a p‑value ≤ 0.05 leads to rejecting H₀, indicating a statistically significant result.
Statistical Significance
If the test leads us to reject H₀, the result is called statistically significant; otherwise, it is not significant.
t‑Test
Common hypothesis‑testing methods include z‑test, t‑test, and chi‑square test. For A/B testing, an independent two‑sample t‑test is appropriate.
Variables: x̄₁, x̄₂ (sample means); S₁, S₂ (sample standard deviations); n₁, n₂ (sample sizes). The t‑statistic is t = (x̄₁ − x̄₂) / √(S₁²/n₁ + S₂²/n₂), which is then converted to a p‑value.
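A minimal sketch of this computation from the summary statistics above (the function name and input numbers are illustrative; for the large samples typical of A/B tests, the t distribution is well approximated by the normal, which is what the p-value below uses):

```python
from statistics import NormalDist

def welch_t_test(x1, x2, s1, s2, n1, n2):
    """Independent two-sample (Welch) t-statistic from summary stats:
    sample means x1, x2; standard deviations s1, s2; sizes n1, n2.
    The p-value uses a normal approximation, fine for large n."""
    se = (s1 ** 2 / n1 + s2 ** 2 / n2) ** 0.5
    t = (x1 - x2) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided
    return t, p

# Hypothetical summary statistics for two experiment groups:
t, p = welch_t_test(x1=5.2, x2=5.0, s1=1.1, s2=1.0, n1=4000, n2=4000)
```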
Rate Metric (Bernoulli) p‑Value Calculation
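For a rate metric such as click-through rate, each user is a Bernoulli trial (clicked or not), and the standard approach is a two-proportion z-test with the proportions pooled under H₀. The sketch below is a stdlib-only illustration with made-up counts, not the experiment's actual data.

```python
from statistics import NormalDist

def two_proportion_z_test(successes1, n1, successes2, n2):
    """Two-sided z-test for a rate (Bernoulli) metric such as CTR.
    Pools the proportion under H0: p1 == p2."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts: clicks out of exposed users in each group.
z, p = two_proportion_z_test(successes1=300, n1=10000,
                             successes2=360, n2=10000)
```

Here a CTR of 3.0% versus 3.6% over 10,000 exposures each yields p below 0.05, so the difference would be called significant.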
Mean Metric (Gaussian) p‑Value Calculation
Metric Evaluation Methods
Composite metrics (e.g., conversion rate) require using the denominator of the composite metric as the effective sample size when calculating p‑values.
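Concretely, for CTR the numerator is the click count and the denominator is exposure UV, so exposure UV (not, say, total registered users) is the n that goes into the test. A self-contained sketch with hypothetical counts:

```python
from statistics import NormalDist

def composite_rate_p_value(numer_a, denom_a, numer_b, denom_b):
    """p-value for a composite rate metric. The metric's own
    denominator (e.g. exposure UV for CTR) is the effective
    sample size n used in the two-proportion test."""
    pooled = (numer_a + numer_b) / (denom_a + denom_b)
    se = (pooled * (1 - pooled) * (1 / denom_a + 1 / denom_b)) ** 0.5
    z = (numer_a / denom_a - numer_b / denom_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical CTR data: clicks (numerator) over exposure UV (denominator).
p = composite_rate_p_value(240, 8000, 300, 8000)
```

Using the wrong denominator (a larger or smaller population than the one the metric is actually defined over) inflates or deflates n and distorts the p-value in the same direction.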
Practical Example: High‑School Coupon Pop‑Up
Traffic is split between the two versions: the control group (30% of users) sees a coupon pop‑up, while the test group (70% of users) sees a premium‑course pop‑up. The click‑through rate is a composite metric derived from two base metrics: the calculation uses exposure UV as the denominator and click count as the numerator. The resulting p‑value curve shows a significant result (p = 0.0329 < 0.05) on 2019‑12‑21, indicating that the premium‑course pop‑up performs better.
Conclusion
A/B testing puts the user at the centre of product decisions, offering scientific, data‑driven insights that improve decision efficiency and reduce adverse user impact. It has been widely adopted across internet companies and is now applied to product revisions, UI styles, recommendation systems, and advertising within the online school platform.
Xueersi Online School Tech Team
The Xueersi Online School Tech Team, dedicated to innovating and promoting internet education technology.