How to Calculate Minimum Sample Size for Reliable A/B Tests
This article explains common pain points in A/B testing, introduces essential statistical concepts such as sampling distribution, parameter estimation, confidence intervals, and hypothesis testing, and provides step‑by‑step formulas and a concrete example for calculating the minimum sample size needed to run a trustworthy experiment.
Preface
A/B experiments are forward‑looking, statistically grounded, and scientific. Used correctly, they make full use of big‑data analysis to solve problems and provide strong evidence for decisions, yet users often run into pain points and doubts when running them.
Pain Points
Not knowing how much traffic each experiment needs.
Not knowing how long an experiment should run.
Solutions
Determine the required traffic to verify a specific feature.
Decide the appropriate experiment duration.
Statistical Basics
Research Object
Population $X$: the metric of interest.
Individual: a single element $x_i$ of the population.
Sample: a subset of $n$ individuals, $X_1, X_2, \dots, X_n$.
Statistical Tools
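(1) Sample mean, which reflects the population mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
(2) Sample variance, the average squared deviation from the sample mean, which reflects the population variance. Dividing by $n-1$ instead of $n$ (the sample correction) makes it an unbiased estimator: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
(3) Sample standard deviation, the square root of the sample variance: $S = \sqrt{S^2}$
(4) Sample k‑th moment: $A_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$
(5) Sample k‑th central moment: $B_k = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^k$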
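The five statistics above can be computed directly. The following is a minimal Python sketch; the function names and the small data set are hypothetical illustrations. Note that only the variance uses the $n-1$ correction, while the k‑th central moment divides by $n$, so `sample_central_moment(data, 2)` comes out slightly smaller than `sample_variance(data)`.

```python
import math
from typing import Sequence


def sample_mean(xs: Sequence[float]) -> float:
    """Sample mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def sample_variance(xs: Sequence[float]) -> float:
    """Sample variance with the n - 1 (Bessel) correction."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_std(xs: Sequence[float]) -> float:
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


def sample_moment(xs: Sequence[float], k: int) -> float:
    """Sample k-th raw moment: (1/n) * sum of x_i ** k."""
    return sum(x ** k for x in xs) / len(xs)


def sample_central_moment(xs: Sequence[float], k: int) -> float:
    """Sample k-th central moment: (1/n) * sum of (x_i - mean) ** k."""
    m = sample_mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample, n = 8
print(sample_mean(data))               # 5.0
print(sample_variance(data))           # 32/7, about 4.571
print(sample_central_moment(data, 2))  # 4.0 (divides by n, not n - 1)
```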
Sampling Distribution
Detailed discussion is omitted; the concepts are used later in derivations.
Standard normal distribution N(0,1)
Chi‑square distribution
t‑distribution
F‑distribution
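Later sections repeatedly use upper quantiles of N(0,1), written $Z_p$ with $P(Z > Z_p) = p$. As a sketch, Python's standard-library `statistics.NormalDist` covers the normal case; the helper name `z_upper` is hypothetical. Quantiles of the chi‑square, t, and F distributions are not in the standard library (SciPy's `scipy.stats.chi2`, `scipy.stats.t`, and `scipy.stats.f` expose them through `.ppf`).

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def z_upper(p: float) -> float:
    """Upper p-quantile of N(0, 1): the value z with P(Z > z) = p."""
    return std_normal.inv_cdf(1.0 - p)


print(round(z_upper(0.05), 3))   # 1.645 (one-sided test, alpha = 0.05)
print(round(z_upper(0.025), 3))  # 1.96  (two-sided test, alpha = 0.05)
print(round(z_upper(0.20), 3))   # 0.842 (beta = 0.2, i.e. power 0.8)
```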
Parameter Estimation
Using sample statistics to estimate population parameters, e.g., sample mean estimates population mean, sample proportion estimates population proportion, sample variance estimates population variance.
(1) Point estimation vs. interval estimation
Point estimation directly uses the sample statistic as the estimate.
Interval estimation provides a range (confidence interval) for the population parameter.
(2) Confidence interval and confidence level
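A confidence interval is a range, constructed from a sample statistic, that is likely to contain the true parameter; the confidence level is the long-run fraction of such intervals that do contain it. Example: if 100 samples are drawn and a 95% confidence interval is built from each, about 95 of the 100 intervals will contain the true value.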
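The repeated-sampling interpretation can be checked by simulation. This sketch is illustrative only: it draws hypothetical normal samples (μ = 310, σ = 12, n = 10, borrowing the numbers from the rice example that follows), builds a known‑σ 95% interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ from each, and reports the fraction that contain μ. The function name and the fixed seed are assumptions made for reproducibility.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(mu: float, sigma: float, n: int, trials: int,
                  level: float = 0.95, seed: int = 0) -> float:
    """Fraction of known-sigma z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]  # one simulated sample
        m = sum(xs) / n
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / trials


# 100 intervals from 100 independent samples: roughly 95 should cover mu.
print(coverage_rate(mu=310.0, sigma=12.0, n=10, trials=100))
```

With only 100 intervals the observed fraction lands near 0.95 rather than exactly on it; increasing `trials` tightens it.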
Hypothesis Testing Example
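Rice yield: the expected yield is 310 kg/acre, and a sample of 10 plots has a mean yield of 320 kg/acre. Assuming yields follow N(μ, 144), i.e. σ = 12, test H₀: μ = 310 at α = 0.05, using the upper critical values $Z_{0.05} = 1.645$ for a one-sided test and $Z_{0.025} = 1.96$ for a two-sided test. Use a Z‑test when the population variance is known, as here, and a t‑test otherwise.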
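Working the numbers through gives $Z = (320 - 310)/(12/\sqrt{10}) \approx 2.635$, which exceeds both 1.645 and 1.96, so H₀ is rejected at α = 0.05 under either a one-sided (μ > 310) or a two-sided (μ ≠ 310) alternative. A minimal sketch; the function name is hypothetical and the pair of alternatives is our framing of the example.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Z statistic for H0: mu = mu0 when the population sigma is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


z = z_statistic(sample_mean=320.0, mu0=310.0, sigma=math.sqrt(144.0), n=10)
p_one_sided = 1.0 - NormalDist().cdf(z)  # H1: mu > 310
p_two_sided = 2.0 * p_one_sided          # H1: mu != 310

print(round(z, 3))          # 2.635
print(z > 1.645, z > 1.96)  # True True: reject H0 under either alternative
print(p_one_sided, p_two_sided)
```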
A Simple Complete A/B Test Example
Background and Setup
A web app integrates the Volcano Engine A/B testing SDK to report events.
Goal: improve registration conversion rate.
The current flow uses an image captcha; the proposed flow replaces it with SMS verification to reduce user friction.
Core metric: registration conversion rate.
Two versions: control (image captcha) and experiment (SMS code).
Traffic split: 50% of total traffic enters the experiment, divided evenly between the two versions (25% each).
Result Analysis
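After about two weeks, each version had received 25% of users. The new version increased conversion by about 10%, with a 95% confidence interval of [8%, 12%]. Because the interval lies entirely above zero, the uplift is statistically significant at the 5% level, which indicates a high probability that it is real.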
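An interval like the one above comes from comparing the two conversion rates. The sketch below uses the standard Wald (normal-approximation) interval for the difference of two proportions. The counts are hypothetical and are not this experiment's data; the difference computed here is absolute (in rate points), and a relative uplift would divide it by the control rate.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(conv_a: int, n_a: int, conv_b: int, n_b: int,
                       level: float = 0.95) -> tuple[float, float, float]:
    """Wald CI for p_b - p_a, the absolute difference in conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Standard error of the difference of two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts for illustration only (not the experiment's data).
diff, lo, hi = diff_proportion_ci(conv_a=2000, n_a=10000, conv_b=2200, n_b=10000)
print(f"diff={diff:.4f}, 95% CI=[{lo:.4f}, {hi:.4f}]")
print("significant" if lo > 0 or hi < 0 else "not significant")
```

The decision rule in the last line is the same one applied to [8%, 12%]: an interval that excludes zero means significance at the chosen level.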
Decision
The product manager decides to roll out the SMS verification to all users, significantly boosting the registration conversion rate.
Detailed Sample Size Calculation
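The minimum sample size per group is calculated by:
$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 (\sigma_1^2 + \sigma_2^2)}{\Delta^2}$$
Where n is the sample size per group; α and β are the type‑I and type‑II error probabilities (commonly 0.05 and 0.2, i.e. 95% confidence and 80% power); $Z_p$ is the upper p‑quantile of the standard normal distribution, so $Z_{\alpha/2} = 1.96$ and $Z_{\beta} \approx 0.84$ for these values; Δ is the expected difference between the groups; and $\sigma_1$, $\sigma_2$ are the standard deviations of the metric in the two groups.
Assuming equal variance ($\sigma_1 = \sigma_2 = \sigma$), the formula simplifies to:
$$n = \frac{2 (Z_{\alpha/2} + Z_{\beta})^2 \sigma^2}{\Delta^2}$$
Example: registration rates e₁ = 50% and e₂ = 60%, α = 0.05, power 0.8. For a 0/1 metric the variance is e(1 − e), so $\sigma_1^2 + \sigma_2^2 = 0.25 + 0.24 = 0.49$ and $\Delta = 0.1$; then $n \approx (1.96 + 0.84)^2 \times 0.49 / 0.1^2 \approx 384.2$, and rounding up, each group needs at least 385 samples.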
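The per-group formula is easy to script. A minimal sketch, assuming a 0/1 conversion metric (so each group's variance is e(1 − e)) and the normal approximation; the function name is hypothetical. With exact quantiles it reproduces the 385 from the example.

```python
import math
from statistics import NormalDist


def min_sample_size_per_group(p1: float, p2: float, alpha: float = 0.05,
                              power: float = 0.8) -> int:
    """Minimum users per group to detect a conversion change p1 -> p2.

    n = (Z_{alpha/2} + Z_beta)^2 * (sigma1^2 + sigma2^2) / Delta^2,
    with sigma_i^2 = p_i * (1 - p_i) for a 0/1 metric.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile
    z_beta = NormalDist().inv_cdf(power)           # upper beta quantile, beta = 1 - power
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    delta = p2 - p1
    return math.ceil((z_alpha + z_beta) ** 2 * var_sum / delta ** 2)


print(min_sample_size_per_group(0.50, 0.60))  # 385
```

Rounding up with `math.ceil` matters: the raw value is about 384.6, and truncating would leave the test slightly underpowered.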
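If the two versions have unequal traffic weights, adjust the total sample N using:
$$N = \frac{(Z_{\alpha/2} + Z_{\beta})^2}{\Delta^2}\left(\frac{\sigma_1^2}{w_1} + \frac{\sigma_2^2}{w_2}\right)$$
where $w_1$ and $w_2$ ($w_1 + w_2 = 1$) are the shares of experiment traffic assigned to the two versions and $\sigma_1$, $\sigma_2$ are the metric's standard deviations in each version; version i then needs $w_i N$ samples. With an even split ($w_1 = w_2 = 0.5$), N is exactly twice the per-group n.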
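A sketch of the unequal-weight case, assuming version i receives a share $w_i$ of the experiment traffic ($w_1 + w_2 = 1$), so that $N = (Z_{\alpha/2}+Z_\beta)^2(\sigma_1^2/w_1 + \sigma_2^2/w_2)/\Delta^2$. The function name and the 20/80 split are hypothetical; the standard deviations come from the 50% vs 60% registration example.

```python
import math
from statistics import NormalDist


def total_sample_size(sigma1: float, sigma2: float, delta: float,
                      w1: float, w2: float, alpha: float = 0.05,
                      power: float = 0.8) -> int:
    """Total users N when version i receives a share w_i of experiment traffic."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile
    z_beta = NormalDist().inv_cdf(power)           # upper beta quantile
    var_term = sigma1 ** 2 / w1 + sigma2 ** 2 / w2
    return math.ceil((z_alpha + z_beta) ** 2 * var_term / delta ** 2)


sigma_a, sigma_b = math.sqrt(0.5 * 0.5), math.sqrt(0.6 * 0.4)
even = total_sample_size(sigma_a, sigma_b, delta=0.1, w1=0.5, w2=0.5)
skewed = total_sample_size(sigma_a, sigma_b, delta=0.1, w1=0.2, w2=0.8)
print(even)    # 770, i.e. 385 per version
print(skewed)  # 1217
```

A skewed split needs more total traffic for the same power; when the two variances are equal, $1/w_1 + 1/w_2$ is smallest at an even split.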
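Method 2 derives the sample size from the power of the hypothesis test. For a two-sided test with n samples per group, the power is approximately:
$$1 - \beta = \Phi\left(\frac{|\Delta|}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} - Z_{\alpha/2}\right)$$
where Φ is the standard normal CDF, $\sigma_1$, $\sigma_2$ are the metric's standard deviations in the two versions, and the negligible probability of rejecting in the wrong direction is ignored. Setting the argument of Φ equal to $Z_{\beta}$ and solving for n gives the required sample per version:
$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 (\sigma_1^2 + \sigma_2^2)}{\Delta^2}$$
which agrees with the first method.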
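The power view can be checked numerically. This sketch computes the approximate power $\Phi\left(|\Delta|/\sqrt{(\sigma_1^2+\sigma_2^2)/n} - Z_{\alpha/2}\right)$ of a two-sided two-sample z-test and searches for the smallest n that reaches 80% power; both function names are hypothetical, and the inputs reuse the 50% vs 60% registration example. The search lands on the same 385 as the closed form.

```python
import math
from statistics import NormalDist


def power_two_sample(n: int, sigma1: float, sigma2: float, delta: float,
                     alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test, n samples per group.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha / 2)             # upper alpha/2 quantile
    se = math.sqrt((sigma1 ** 2 + sigma2 ** 2) / n)  # standard error of the difference
    return std.cdf(abs(delta) / se - z_alpha)


def smallest_n_for_power(sigma1: float, sigma2: float, delta: float,
                         alpha: float = 0.05, target: float = 0.8) -> int:
    """Smallest per-group n whose power reaches the target (linear search)."""
    n = 2
    while power_two_sample(n, sigma1, sigma2, delta, alpha) < target:
        n += 1
    return n


s1, s2 = math.sqrt(0.5 * 0.5), math.sqrt(0.6 * 0.4)
print(round(power_two_sample(385, s1, s2, 0.1), 4))
print(smallest_n_for_power(s1, s2, 0.1))  # 385, matching the closed form
```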
Conclusion
For typical A/B scenarios, assuming equal population variance, the presented formulas allow practitioners to compute the minimum sample size needed to achieve a desired confidence level and statistical power, guiding traffic allocation and experiment duration.
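Once the per-group sample size is known, the run time follows from the traffic that reaches the experiment. A minimal sketch, assuming steady daily traffic, each eligible user counted once, and an even split of the experiment's share across versions; the function name and the daily-traffic figure are hypothetical.

```python
import math


def experiment_days(n_per_group: int, num_groups: int, daily_users: int,
                    experiment_share: float) -> int:
    """Days until every group reaches n_per_group users.

    Assumes steady daily traffic, each user counted once, and an even split
    of the experiment's share of traffic across the groups.
    """
    users_needed = n_per_group * num_groups
    users_per_day = daily_users * experiment_share
    return math.ceil(users_needed / users_per_day)


# Hypothetical traffic: 1,000 new users per day, 50% of them enter the experiment.
print(experiment_days(n_per_group=385, num_groups=2,
                      daily_users=1000, experiment_share=0.5))  # 2
```

The result is a lower bound; many teams also round the run time up to whole weeks so that weekday and weekend behaviour are both represented.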
ByteDance Data Platform
The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.