Understanding Online Experiments: Origins, Types, and Applications
This article explains the concept, history, and various forms of online experiments such as AB testing, ABn, AA, and multivariate tests, highlighting their role in causal inference, value evaluation, risk control, and product optimization within modern internet businesses.
In internet business, growth is a perpetual theme, but the era of abundant traffic has faded, and growing effectively is hard when the effect of a strategy cannot be observed directly. Since Google first applied them in 2000, online experiments have become essential tools for strategy validation, product iteration, algorithm optimization, and risk control.
Alibaba's advertising division, Alibaba Mama, has accumulated extensive practice and technology in online experiments and has distilled it into a complete set of experimental techniques and methods. These will be shared in a series of articles covering topics such as “Understanding Online Experiments,” “AB Testing under Online Traffic Splitting,” and “AB Testing under Offline Sampling.”
1. What is an online experiment
Online experiments originate from randomized controlled trials (RCT) in biomedicine, which compare treatment groups with control groups to draw causal conclusions; the method has been extended to many fields and now underpins data‑driven growth in internet companies.
The internet has made large-scale randomization feasible, easing the traditional constraints of sample size, cost, and control of confounding factors, and thereby providing reliable causal inference for product decisions.
2. Why conduct experiments
AB testing is regarded as the gold standard for causal inference because random assignment makes it possible to determine whether a change in a variable actually causes an observed effect. It also provides quantitative value assessment (e.g., revenue, user growth) and supports risk control, since a change can be tested on a small share of traffic before full rollout.
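Testing on a small share of traffic can be illustrated with a minimal sketch of deterministic, hash-based assignment. This is only one common way to do it, not necessarily the method used by the authors; the function name `assign_group`, the 10,000-bucket granularity, and the 5% default exposure are all assumptions made for the example.

```python
import hashlib

def assign_group(user_id: str, experiment: str, traffic_fraction: float = 0.05) -> str:
    """Map a user to 'treatment', 'control', or 'not_in_experiment'.

    Hypothetical helper: the same (user_id, experiment) pair always gets the
    same answer, so a user sees a consistent version across requests.
    """
    # Salting with the experiment name keeps assignments in different
    # experiments independent of each other.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000          # bucket in 0..9999
    exposed_buckets = round(traffic_fraction * 10_000)
    if bucket >= exposed_buckets:
        return "not_in_experiment"                 # user keeps the current product
    # Split the exposed traffic evenly between the two arms.
    return "treatment" if bucket % 2 == 0 else "control"

if __name__ == "__main__":
    print(assign_group("user-42", "new_ranking_v2"))
```

Raising `traffic_fraction` step by step widens exposure while users who were already in the experiment stay in the same arm, which is what makes a gradual rollout possible.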
3. Classification of experiments
3.1 AB testing
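3.1.1 AB testing
AB testing compares two versions of a single variable: units are randomly assigned to one version or the other, and the difference in the target metric is evaluated with a statistical hypothesis test (typically a two-sample t-test, or a z-test for large samples).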
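As a rough illustration of the large-sample case, the sketch below runs a two-sided z-test on the conversion rates of two versions. It assumes the metric is a binary conversion and uses the pooled-variance form of the test; the function name and the sample numbers are invented for the example.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). Assumes samples are large enough for the
    normal approximation to hold.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both groups share one rate, so pool them.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

if __name__ == "__main__":
    # Made-up counts: 10.0% vs 11.0% conversion on 10,000 users each.
    z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
    print(f"z = {z:.3f}, p = {p:.4f}")
```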
The typical workflow includes defining goals, determining randomization units, calculating sample size (power analysis), and performing hypothesis testing.
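The sample-size step of this workflow can be sketched with the standard normal-approximation formula for comparing two proportions. The baseline rate, the minimum detectable effect, and the defaults of α = 0.05 and 80% power are assumptions chosen for illustration, not values taken from the article.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(baseline: float, mde_abs: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per group for a two-sided test.

    baseline: conversion rate of the control version
    mde_abs:  smallest absolute lift worth detecting
    """
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value of the test
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

if __name__ == "__main__":
    print(sample_size_per_group(baseline=0.10, mde_abs=0.01))
```

Because the required sample grows with the inverse square of the effect size, halving the minimum detectable effect roughly quadruples the traffic needed.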
3.1.2 ABn testing
ABn testing compares multiple versions of a single variable, using either pairwise or multi-sample hypothesis tests.
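The pairwise approach can be sketched as follows. Running several pairwise tests inflates the chance of at least one false positive, so this example applies a Bonferroni correction; that correction is a choice made here for illustration and is not prescribed by the article. Function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def compare_variants_to_control(control, variants, alpha=0.05):
    """Test each variant against control with a Bonferroni-adjusted threshold.

    control:  (conversions, users)
    variants: dict mapping variant name -> (conversions, users)
    Returns a dict mapping name -> (p_value, is_significant).
    """
    adjusted_alpha = alpha / len(variants)   # Bonferroni: split alpha across tests
    results = {}
    for name, (conv, n) in variants.items():
        p = z_test_p_value(control[0], control[1], conv, n)
        results[name] = (p, p < adjusted_alpha)
    return results

if __name__ == "__main__":
    # Made-up counts for a control and three variants.
    out = compare_variants_to_control(
        (1000, 10_000),
        {"B": (1100, 10_000), "C": (1150, 10_000), "D": (1005, 10_000)},
    )
    for name, (p, significant) in out.items():
        print(name, round(p, 4), significant)
```

With three variants the per-test threshold drops to about 0.0167, so a variant whose unadjusted p-value falls between that and 0.05 is no longer declared significant.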
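3.1.3 AA testing
AA testing runs two identical versions against each other to verify the experimental setup. Because the versions are the same, a significant difference points either to a flaw in the design (for example, in traffic splitting or metric logging) or to random error, which at significance level α is expected in about a fraction α of AA tests.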
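A small simulation can make the role of random error concrete. The sketch below repeatedly draws two groups from the same conversion rate and counts how often a z-test reports significance; in a healthy setup that share should sit near the chosen significance level. All parameters (rate, group size, number of repetitions, seed) are assumptions for the example.

```python
import random
from math import sqrt
from statistics import NormalDist

def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:                      # no variation at all: nothing to detect
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def aa_false_positive_rate(rate=0.1, n_per_group=500, n_tests=1000,
                           alpha=0.05, seed=0) -> float:
    """Share of simulated AA tests that come out 'significant'."""
    rng = random.Random(seed)        # seeded for reproducibility
    false_positives = 0
    for _ in range(n_tests):
        # Both groups are drawn from the SAME true conversion rate.
        conv_a = sum(rng.random() < rate for _ in range(n_per_group))
        conv_b = sum(rng.random() < rate for _ in range(n_per_group))
        if z_test_p_value(conv_a, n_per_group, conv_b, n_per_group) < alpha:
            false_positives += 1
    return false_positives / n_tests

if __name__ == "__main__":
    print(aa_false_positive_rate())
```

If the observed share of significant AA tests on real traffic is well above the significance level, the splitting or logging pipeline deserves a closer look before any AB result is trusted.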
3.1.4 Multivariate testing (MVT)
Multivariate testing evaluates multiple variables simultaneously, which allows both the independent contribution of each variable and the interaction effects between them to be analyzed.
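For the simplest case of two variables with two levels each, independent contributions and the interaction can be read off the four cell means, as in the sketch below. It computes point estimates only, with no significance testing, and uses one common convention for the interaction term (other texts halve it). The variable names and rates are made up.

```python
def factorial_effects(cell_means):
    """Point estimates for a 2x2 full-factorial test.

    cell_means maps (a_level, b_level), each 0 or 1, to the mean metric
    observed in that cell. Returns (main_a, main_b, interaction).
    """
    y00 = cell_means[(0, 0)]   # neither change applied
    y10 = cell_means[(1, 0)]   # only variable A changed
    y01 = cell_means[(0, 1)]   # only variable B changed
    y11 = cell_means[(1, 1)]   # both changed
    # Main effect: average lift from switching one variable,
    # averaged over both levels of the other variable.
    main_a = ((y10 + y11) - (y00 + y01)) / 2
    main_b = ((y01 + y11) - (y00 + y10)) / 2
    # Interaction: how much the lift from B differs when A is on vs off.
    interaction = (y11 - y10) - (y01 - y00)
    return main_a, main_b, interaction

if __name__ == "__main__":
    # Hypothetical conversion rates: A = button color, B = headline.
    rates = {(0, 0): 0.10, (1, 0): 0.12, (0, 1): 0.11, (1, 1): 0.15}
    print(factorial_effects(rates))
```

A non-zero interaction means the variables cannot be tuned one at a time, which is the situation MVT is designed to reveal.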
3.2 Quasi‑experimental designs
When randomization is infeasible, quasi-experiments are used instead, such as pre-post designs, non-randomized control groups, or self-controlled studies, following similar principles of variable control.
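One widely used way to combine a pre-post design with a non-randomized control group is a difference-in-differences estimate. The article does not name this estimator, so the sketch below is offered only as an illustration of the general idea; it assumes both groups would have followed parallel trends without the change, and the data are invented.

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post) -> float:
    """Difference-in-differences point estimate.

    Each argument is a list of metric values for one group in one period.
    The control group's pre-to-post change stands in for what would have
    happened to the treated group without the change (parallel trends).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

if __name__ == "__main__":
    # Hypothetical daily orders for two regions before and after a launch.
    effect = diff_in_diff(
        treated_pre=[10, 12], treated_post=[15, 17],
        control_pre=[9, 11], control_post=[11, 13],
    )
    print(effect)
```

Subtracting the control group's change removes shared trends such as seasonality, which a plain pre-post comparison would wrongly attribute to the change.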
4. Summary
Online experimentation has evolved into a distinct scientific discipline that merges statistical theory with internet technology, enabling rigorous hypothesis testing at scale and supporting data‑driven decision making; future articles will explore specific applications within Alibaba Mama’s business scenarios.
DataFunTalk