
How NetEase Yanxuan Detects and Diagnoses Metric Anomalies at Scale

This article explains NetEase Yanxuan's end‑to‑end practice for automatically detecting, classifying, and diagnosing metric anomalies in e‑commerce, covering background motivation, three anomaly types, statistical detection frameworks (GESD, volatility, trend), post‑processing, contribution‑decomposition methods, dimension‑explosion challenges, and practical optimizations.


Background Introduction

Metrics are tightly linked to business health; rapid, accurate detection of abnormal metrics helps identify and resolve issues promptly. As e‑commerce evolves with fast iteration and complex logic, the number of metrics grows, their distributions vary widely, and manual alarm thresholds become error‑prone and costly.

Goals of an automated solution are:

Automation without user input: no need for manually defined rules or dimensions.

Generality: adapt to diverse metric distributions.

Timeliness: support day‑level and hour‑level detection.

Accuracy and proactivity: enable data‑driven root‑cause identification.

Metric Anomaly Detection

1. Types of Anomalies

We define a metric anomaly as any abnormal spike, drop, or trend that warrants an alert and diagnosis. We distinguish three categories, each detected independently:

Absolute value anomaly – points that deviate from the metric’s inherent distribution (statistical outliers).

Volatility anomaly – sudden large increases or decreases compared to the previous period.

Trend anomaly – medium‑ to long‑term upward or downward trends indicating potential risk.

These categories may co‑occur; for example, a single data point can be both an absolute‑value anomaly and a volatility anomaly.

2. Detection Framework

We designed an unsupervised, statistical‑test‑based framework to achieve generality, automation, and timeliness.

Absolute Value Detection

Absolute‑value detection is based on the Generalized ESD (GESD) test, which assumes at most r outliers. The algorithm iteratively finds the sample farthest from the mean, computes the test statistic R_i, and compares it with a critical value λ_i derived from the t‑distribution (significance level α ≈ 0.05). After r iterations, the number of outliers is taken as the largest i for which R_i > λ_i. The test needs only an upper bound r rather than the exact number of outliers, and it overcomes the low detection rate of the classic 3‑sigma rule.
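The GESD iteration can be sketched in Python. This is an illustrative implementation, not NetEase Yanxuan's production code; for brevity it approximates the t quantile with a normal quantile (stdlib only), where a real implementation would use scipy.stats.t.ppf.

```python
import math
from statistics import NormalDist, mean, stdev

def gesd_outliers(series, r, alpha=0.05):
    """Return indices of outliers found by the Generalized ESD test.

    Normal approximation of the t quantile is used for simplicity;
    production code would use scipy.stats.t.ppf instead.
    """
    data = list(enumerate(series))          # keep original indices
    n = len(data)
    removed, last_significant = [], 0
    for i in range(1, r + 1):
        values = [v for _, v in data]
        m, s = mean(values), stdev(values)
        if s == 0:
            break
        # Sample farthest from the mean and its test statistic R_i.
        pos, (orig_idx, val) = max(enumerate(data),
                                   key=lambda t: abs(t[1][1] - m))
        r_i = abs(val - m) / s
        # Critical value lambda_i (normal approximation of the t quantile).
        p = 1 - alpha / (2 * (n - i + 1))
        t = NormalDist().inv_cdf(p)
        lam = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        removed.append(orig_idx)
        if r_i > lam:
            last_significant = i            # largest i with R_i > lambda_i
        data.pop(pos)
    # GESD rule: the outlier count is the largest i with R_i > lambda_i.
    return removed[:last_significant]
```

For example, a tight series with one extreme value flags only that value, while the same series without the spike flags nothing, which is exactly the behavior the 3‑sigma rule often fails to deliver on small samples.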

Volatility Detection

Volatility anomalies are identified by locating inflection points in the volatility distribution using second‑order derivatives and distance measures. Since most volatility series are non‑normal, we cannot apply GESD directly; instead, we detect points where volatility exceeds its turning‑point bounds. When inflection points are missing or too early, fallback methods such as quantile‑based thresholds are used.
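One way to realize the inflection‑point idea is to sort the absolute change rates and look for a sharp elbow via the discrete second difference, falling back to a quantile threshold when no usable elbow exists. The parameter names, the elbow‑position cutoff, and the 95% fallback quantile below are illustrative assumptions, not the exact production logic.

```python
def volatility_threshold(rates, quantile=0.95, min_pos=0.5):
    """Pick an anomaly threshold for period-over-period change rates."""
    vols = sorted(abs(r) for r in rates)
    n = len(vols)
    # Discrete second difference of the sorted volatility curve; a sharp
    # elbow marks where routine fluctuation ends and abnormal jumps begin.
    second = [vols[i + 1] - 2 * vols[i] + vols[i - 1] for i in range(1, n - 1)]
    if second and max(second) > 0:
        elbow = second.index(max(second)) + 1
        if elbow / n >= min_pos:        # ignore elbows that come too early
            return vols[elbow]
    # Fallback: quantile-based threshold.
    return vols[min(n - 1, int(quantile * n))]

# Day-over-day change rates; the last period jumped by 50%.
rates = [0.01, -0.02, 0.015, 0.03, -0.01, 0.02, 0.5]
thr = volatility_threshold(rates)
anomalies = [i for i, r in enumerate(rates) if abs(r) > thr]
```

Here the elbow separates the routine ±3% fluctuation from the 50% jump, so only the final period is flagged.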

Trend Detection

Trend anomalies are detected with the Mann‑Kendall test. We compute the statistic S from pairwise sign comparisons, standardize it to obtain Z, and convert Z to a p‑value. A p‑value below 0.05 indicates a statistically significant trend. This non‑parametric test works for any distribution and does not require a continuous series, which matters because absolute‑value outliers are removed beforehand.
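The test follows the standard Mann‑Kendall definitions; the sketch below omits the tie correction in Var(S) for brevity, which a production implementation should include.

```python
import math
from statistics import NormalDist

def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test (no tie correction in Var(S))."""
    n = len(series)
    # S: sum of pairwise sign comparisons over all i < j.
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n) for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Standardize S to Z with the usual continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    trend = "no trend"
    if p < alpha:
        trend = "increasing" if s > 0 else "decreasing"
    return trend, p
```

A strictly rising series yields a significant increasing trend, while an oscillating series does not, regardless of either series' distribution.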

Post‑Processing

After detecting the three anomaly types, we apply post‑processing to reduce false alarms:

Data‑level filtering: if a volatility anomaly is caused by a previous period’s spike (e.g., a 100% rise followed by a 50% drop), we suppress the current alert unless the absolute anomaly aligns.

Promotion‑level filtering: during large‑scale campaigns where anomalies are expected, we mute alerts to avoid noise.
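The two filtering rules reduce to a small boolean decision; the function below is a sketch of that logic under assumed flag names, not the actual alerting code.

```python
def should_alert(vol_anomaly, abs_anomaly, prev_abs_anomaly, in_promotion):
    """Post-processing rules: mute during promotions, and suppress
    volatility alerts that merely echo a previous-period spike
    (e.g., a 100% rise followed by a 50% drop back to normal)."""
    if in_promotion:
        return False                # campaign anomalies are expected noise
    if vol_anomaly and prev_abs_anomaly and not abs_anomaly:
        return False                # drop back from a spike, not a real issue
    return vol_anomaly or abs_anomaly
```

For instance, a volatility anomaly caused only by yesterday's spike is suppressed, but one that coincides with a current absolute‑value anomaly still fires.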

Metric Anomaly Diagnosis

1. Diagnosis Hierarchy

We categorize diagnostic conclusions into three layers based on feasibility and certainty: deterministic inference, probabilistic inference, and speculative inference.

2. Diagnosis Method Comparison

Different inference layers correspond to distinct methods:

Deterministic inference – uses contribution‑decomposition algorithms (additive, multiplicative, or divisional) to pinpoint the exact impact of each sub‑metric.

Probabilistic inference – employs machine‑learning models (regression, SHAP values, Bayesian networks) for importance ranking, but lacks single‑instance explainability.

Speculative inference – relies on expert experience and is not covered in this article.

3. Business Context

NetEase Yanxuan transitioned from platform e‑commerce to brand e‑commerce, focusing on multi‑channel flagship products. Metrics are organized into three layers: strategic (e.g., GMV as the north‑star), tactical (department‑level KPIs), and execution (product‑level KPIs). Accurate, explainable diagnostics are essential for this hierarchical structure.

4. Contribution Decomposition Calculation

The contribution of each sub‑metric is computed via three formulas:

Additive decomposition: for Y = Σ X_i, the contribution of sub‑metric X_i is ΔX_i / Y_0.

Multiplicative decomposition (LMDI): for Y = Π X_i, apply the logarithmic mean to convert factor changes into additive equivalents.

Divisional decomposition: separates volatility contribution (A_Xi) and structural change contribution (B_Xi), addressing Simpson’s paradox scenarios.

All contributions are additive, satisfying the MECE principle; summing contributions across a dimension yields the overall change ΔY%.
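The additive property can be checked numerically. For a multiplicative metric such as GMV = traffic × conversion × AOV (the factor values below are made up for illustration), LMDI contributions sum exactly to ΔY / Y_0:

```python
import math

def log_mean(a, b):
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    return a if a == b else (a - b) / (math.log(a) - math.log(b))

def additive_contrib(x0, x1, y0):
    """Additive sub-metric contribution: delta_X_i / Y_0."""
    return (x1 - x0) / y0

def lmdi_contrib(factors0, factors1):
    """LMDI: for Y = X1 * X2 * ..., convert multiplicative factor
    changes into additive contributions summing to delta_Y / Y_0."""
    y0 = math.prod(factors0)
    y1 = math.prod(factors1)
    lm = log_mean(y1, y0)
    return [lm * math.log(b / a) / y0 for a, b in zip(factors0, factors1)]

# GMV = traffic * conversion * AOV: 10000 -> 12000, i.e. +20%.
contribs = lmdi_contrib((1000, 0.05, 200), (1200, 0.04, 250))
```

The three contributions in `contribs` (traffic up, conversion down, price up) add up to exactly 0.2, the overall ΔY%, which is what makes the decomposition MECE.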

5. Dimension‑Explosion Problem

When decomposing a top‑level metric (e.g., GMV) across many dimensions (channels, regions, product categories, etc.), the number of intermediate tables grows exponentially (2^n for n dimensions), leading to massive storage and computation costs.
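The 2^n growth is easy to make concrete with a quick count (the dimension names below are hypothetical):

```python
from itertools import combinations

# Five hypothetical analysis dimensions.
dims = ["channel", "region", "category", "brand", "platform"]

# Every subset of dimensions is a candidate group-by cube: 2^n in total.
n_cubes = sum(1 for k in range(len(dims) + 1)
              for _ in combinations(dims, k))

# Capping each combination at two dimensions cuts the count sharply.
n_capped = sum(1 for k in range(3)
               for _ in combinations(dims, k))
```

With only five dimensions there are already 32 cubes versus 16 when capped at two dimensions; real metric systems with a dozen dimensions make the uncapped count untenable.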

6. Optimizations for Dimension Explosion

We applied three key optimizations:

Aggregation‑based contribution: compute the finest‑grain contributions once, then group‑by required dimensions, eliminating intermediate‑table I/O.

Pruning via dimension grouping and combination limits: avoid redundant combinations (e.g., if channel‑level is already aggregated, skip lower‑level combos) and restrict the number of dimensions per combination to two or three.

Dimension ranking with Gini coefficient: prioritize dimensions where top‑value contributions are large and Gini is low, enabling precise root‑cause identification.
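A sketch of the ranking idea: compute a Gini coefficient over each dimension's contribution distribution and the share explained by its top value. The scoring rule below (rank by top‑value share; Gini reported alongside) is an assumption for illustration, not the exact production formula.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = even, ~1 = concentrated)."""
    vals = sorted(values)
    n = len(vals)
    total = sum(vals)
    if total == 0:
        return 0.0
    cum = sum((i + 1) * v for i, v in enumerate(vals))
    return (2 * cum) / (n * total) - (n + 1) / n

def rank_dimensions(dim_contribs):
    """Rank drill-down dimensions by how much their top value explains.

    dim_contribs maps dimension name -> {value: contribution}. The
    article's rule also factors in the Gini coefficient; here we rank
    by top-value share only, as a sketch.
    """
    def top_share(contribs):
        mags = [abs(c) for c in contribs.values()]
        return max(mags) / sum(mags) if sum(mags) else 0.0
    return sorted(dim_contribs.items(),
                  key=lambda kv: top_share(kv[1]), reverse=True)

# Hypothetical contributions of a -20% GMV change across two dimensions.
ranked = rank_dimensions({
    "channel": {"app": -0.18, "web": -0.01, "store": 0.01},
    "region": {"north": -0.06, "south": -0.06, "east": -0.06},
})
```

Here the channel dimension ranks first because one value (app) explains almost the entire drop, whereas the region dimension spreads the change evenly and offers no single root cause.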

QA

Q1: How do you evaluate the accuracy of the diagnosis?

A1: Deterministic diagnosis yields clear conclusions; accuracy is ensured by the underlying code implementation. Business‑level validation involves collecting bad cases to assess false‑positive/negative rates.

Q2: Do you mix decomposition methods (e.g., additive then multiplicative) for GMV?

A2: Yes. A greedy search can select the best dimension at each step, switching between additive and multiplicative based on contribution reduction. In practice, NetEase Yanxuan mainly uses additive decomposition for brand e‑commerce because other factors (traffic, conversion) are treated as black boxes.

Tags: e-commerce, statistical analysis, data monitoring, metric anomaly detection, contribution decomposition
Written by Data Thinking Notes

Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.