
Designing Evaluation Metrics and Building an Overall Evaluation Criterion (OEC) for AB Testing

The article explains how to design experiment evaluation metrics—from top‑down objectives to core, quality, and observation types—and construct an Overall Evaluation Criterion by processing, weighting, and aggregating metrics, providing a robust, scalable framework for credible AB‑test assessment and product optimization.

Alimama Tech

The article is part of the "Alimama Data Science Series" and focuses on designing experiment evaluation metrics and constructing a comprehensive metric system for AB testing.

It explains why a well‑defined metric system is essential for judging the success of an AB test, ensuring experiment credibility, and facilitating scaling decisions.

Experiment Evaluation Metric Design: Metrics should be derived top‑down from the experiment’s objectives. High‑level goals (e.g., user activity, adoption) are broken down into concrete definitions, and multiple metrics can be aggregated into a single objective function such as an Overall Evaluation Criterion (OEC).

Metric Characteristics include:

Sensitivity – how responsive the metric is to the factor being tested.

Robustness – how insensitive the metric is to unrelated factors.

Metric distribution – analysis of historical data to understand the metric’s statistical properties.

Both sensitivity and robustness can be validated through small‑scale pilots or A/A tests.
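The A/A validation above can be sketched in plain Python: split the same traffic into two random halves many times and check that the metric rarely shows a "significant" difference. The data and thresholds here are illustrative, not from the article.

```python
import math
import random

def aa_test(values, n_trials=200, seed=7):
    """Repeatedly split identical traffic into two random halves (an A/A test)
    and count how often the two-sample t statistic exceeds ~1.96.
    For a robust metric this false-positive rate should stay near the
    nominal 5% level."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_trials):
        shuffled = values[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        a, b = shuffled[:half], shuffled[half:2 * half]
        mean_a = sum(a) / half
        mean_b = sum(b) / half
        var_a = sum((x - mean_a) ** 2 for x in a) / (half - 1)
        var_b = sum((x - mean_b) ** 2 for x in b) / (half - 1)
        se = math.sqrt(var_a / half + var_b / half)
        t = abs(mean_a - mean_b) / se if se > 0 else 0.0
        if t > 1.96:
            flagged += 1
    return flagged / n_trials

# Synthetic per-user consumption data (hypothetical): the A/A
# false-positive rate should land close to 5%.
gen = random.Random(1)
users = [gen.gauss(10.0, 3.0) for _ in range(2000)]
false_positive_rate = aa_test(users)
```

A rate far above 5% would suggest the metric (or the splitting mechanism) is not robust enough to trust in a real A/B comparison.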

Metric Classification and Selection:

Core metrics – highly sensitive and directly affected by the experiment (e.g., click‑through rate, conversion rate, per‑user consumption). They should be few (three or fewer) and clearly tied to the product's business stage.

Quality metrics – serve as safety nets or constraints (e.g., performance indicators, user experience scores).

Observation metrics – auxiliary metrics that help explain changes in core metrics or reveal side effects (e.g., exposure, secondary retention).

The article provides practical guidance on choosing each type of metric based on product lifecycle and business goals.

Construction of the Overall Evaluation Criterion (OEC) involves four main steps:

Establish a good metric system tailored to different business scenarios.

Process metrics: directionality adjustment (convert inverse or moderate metrics to positive direction) and dimensionless scaling (e.g., normalization, standardization).
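This processing step can be sketched in plain Python: flip inverse or moderate metrics so that larger always means better, then min‑max scale to [0, 1]. The latency figures are hypothetical.

```python
def to_positive(values, kind="positive", target=None):
    """Directionality adjustment: convert a metric so larger is always better.
    kind="inverse": smaller raw values are better (e.g. page latency);
    kind="moderate": values closest to `target` are better (e.g. ad density)."""
    if kind == "inverse":
        return [-v for v in values]
    if kind == "moderate":
        return [-abs(v - target) for v in values]
    return list(values)

def min_max(values):
    """Dimensionless scaling to [0, 1] so metrics with different units
    become comparable before weighting and aggregation."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

latency_ms = [120, 90, 200, 150]  # inverse metric: lower is better
scaled = min_max(to_positive(latency_ms, kind="inverse"))
# the fastest page (90 ms) now scores 1.0, the slowest (200 ms) scores 0.0
```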

Assign weights: combine subjective methods (expert weighting, Analytic Hierarchy Process) with objective methods (variance‑coefficient, correlation, entropy).
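Of the objective methods listed, the entropy method is easy to sketch: metrics whose (normalized, non‑negative) values vary more across units carry more information and receive higher weight. The input matrix below is illustrative.

```python
import math

def entropy_weights(matrix):
    """Objective weighting via the entropy method.
    `matrix` holds non-negative, already-normalized metric values:
    one row per experiment unit, one column per metric."""
    n = len(matrix)
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [v / total for v in col]  # each unit's share of metric j
        entropy = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)
        divergences.append(1.0 - entropy)  # higher = more informative
    s = sum(divergences)
    return [d / s for d in divergences]

# Two metrics over four units: the first is constant (no information),
# the second varies, so nearly all weight goes to the second.
data = [[0.5, 0.1], [0.5, 0.9], [0.5, 0.2], [0.5, 0.8]]
weights = entropy_weights(data)
```

In practice the article's advice is to blend such objective weights with subjective ones (expert judgment, AHP) rather than rely on either alone.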

Aggregate weighted metrics using linear, geometric, hybrid, or model‑based methods to obtain a composite score.
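The two simplest aggregation choices can be sketched as follows; the scores and weights are hypothetical, assumed already processed and weighted as in the earlier steps.

```python
import math

def linear_oec(scores, weights):
    """Weighted arithmetic mean: simple and easy to interpret."""
    return sum(w * s for w, s in zip(weights, scores))

def geometric_oec(scores, weights):
    """Weighted geometric mean: a metric collapsing toward zero drags the
    whole score down, so no single metric can be sacrificed for the rest."""
    return math.exp(sum(w * math.log(s) for w, s in zip(weights, scores)))

# Scaled scores for e.g. CTR, conversion rate, per-user consumption
# (illustrative values), with weights summing to 1.
scores = [0.8, 0.6, 0.9]
weights = [0.5, 0.3, 0.2]
linear_score = linear_oec(scores, weights)   # ≈ 0.76
geo_score = geometric_oec(scores, weights)   # slightly lower (AM >= GM)
```

Hybrid and model‑based aggregation follow the same pattern: a single function mapping the weighted metric vector to one composite score.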

Advantages of OEC include a holistic view of experiment health, avoidance of multiple‑testing problems, a standardized framework for cross‑business comparison, and the ability to track longitudinal performance.

In summary, a scientifically designed metric system and a well‑constructed OEC are crucial for accurate AB test evaluation, decision‑making, and long‑term product optimization.

Tags: AB testing, analytics, data science, experiment evaluation, metric design, OEC
Written by Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.