
Advances in Pre‑Ranking: The COLD System for Large‑Scale Advertising

This article reviews the evolution of coarse‑ranking in large‑scale ad systems, explains the two main technical routes—set selection and precise value estimation—introduces the Computing‑Power‑Cost‑Aware Online Lightweight Deep (COLD) pre‑ranking framework, and presents experimental results and future directions for deeper integration with fine‑ranking.

DataFunSummit

In large‑scale ranking scenarios such as search, recommendation, and advertising, cascade ranking architectures are widely adopted. Alibaba’s online advertising pipeline typically consists of recall, coarse ranking, fine ranking, and re‑ranking. Coarse ranking must select a few hundred candidates from tens of thousands within a strict latency budget of 10‑20 ms.

The article is organized around three themes: the historical development of coarse ranking, the latest COLD (Computing‑Power‑Cost‑Aware Online Lightweight Deep) system, and a summary with outlook.

1. Background of Coarse Ranking

Coarse ranking sits between recall and fine ranking, aiming to satisfy compute and response‑time constraints while selecting candidates that meet downstream objectives. Compared with fine ranking, it faces stricter latency budgets, a far larger scoring volume, and a more severe solution‑space mismatch: it must score the broad recall output, while its training signal comes from the much narrower set that fine ranking ultimately evaluates.

Two Main Technical Routes

Set‑selection techniques model the candidate set directly. Common methods include multi‑channel selection, listwise approaches such as LambdaMART, and sequence‑generation strategies that greedily build a set.

Precise value estimation directly predicts the final system metric (e.g., eCPM = pCTR × bid) in a pointwise fashion, offering strong controllability at the cost of higher compute.
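As a minimal sketch of the pointwise route, the snippet below scores each candidate independently by eCPM = pCTR × bid and keeps the top‑K set. The function and data names are illustrative, not from the COLD system itself.

```python
def select_top_k(candidates, k):
    """Pointwise value estimation: rank ads by eCPM = pCTR * bid.

    candidates: list of (ad_id, pctr, bid) tuples.
    Returns the k candidates with the highest expected revenue.
    """
    scored = [(ad_id, pctr * bid) for ad_id, pctr, bid in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

ads = [("a", 0.02, 1.5), ("b", 0.01, 4.0), ("c", 0.05, 0.5)]
print(select_top_k(ads, 2))  # "b" (eCPM 0.04) ranks above "a" (0.03)
```

Because each ad is scored in isolation, the per‑candidate cost is fixed, which is what makes the compute budget the binding constraint at coarse‑ranking scale.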

2. Evolution of Coarse Ranking

Early generations relied on static quality scores and simple logistic‑regression models. The third generation introduced deep dual‑tower models that compute user and ad embeddings and score them via inner‑product, greatly improving expressiveness while keeping inference cheap.
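The dual‑tower design can be sketched as follows. The "towers" here are toy one‑layer maps standing in for deep MLPs (an assumption for brevity); the key property is that the only cross‑tower interaction is an inner product, so ad embeddings can be precomputed and cached offline.

```python
import math

def embed(features, weights):
    # Toy "tower": weighted sum + tanh per output dimension,
    # standing in for a deep MLP over user or ad features.
    return [math.tanh(sum(f * w for f, w in zip(features, col)))
            for col in weights]

def score(user_emb, ad_emb):
    # The only user-ad interaction is this dot product, which is
    # why inference stays cheap: ad embeddings are precomputable.
    return sum(u * a for u, a in zip(user_emb, ad_emb))

user = embed([1.0, 2.0], [[0.5, 0.0], [0.0, 0.5]])
ad = embed([0.3, -0.1], [[1.0, 0.0], [0.0, 1.0]])
print(score(user, ad))
```

The cost of this cheapness is expressiveness: no cross‑features between user and ad can be used, which is exactly the limitation COLD relaxes.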

The fourth generation, COLD, treats compute as a variable and co‑optimizes model architecture and latency. It uses a group‑wise embedding network (GwEN) with flexible depth, supports arbitrary cross‑features, and incorporates engineering optimizations to balance effectiveness and resource usage.

Model Architecture

Feature selection is performed with a Squeeze‑and‑Excitation (SE) block that assigns an importance score s_i to each feature embedding e_i. The top‑K features are chosen based on offline trade‑off metrics (GAUC, QPS, and response time). Structured pruning multiplies each neuron's output by a scaling factor γ and applies sparsity regularization during training, removing neurons whose γ is driven to zero.
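The SE‑based selection step can be sketched as below. This is a simplified, assumed form: the squeeze step average‑pools each embedding to a scalar, and the excite step is reduced to a one‑parameter‑per‑feature gate rather than the two‑layer bottleneck used in full SE blocks.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_importance(embeddings, gate_weights):
    # Squeeze: average-pool each feature embedding e_i to a scalar.
    # Excite: a toy per-feature gate produces importance s_i in (0, 1).
    squeezed = [sum(e) / len(e) for e in embeddings]
    return [sigmoid(w * z) for w, z in zip(gate_weights, squeezed)]

def top_k_features(embeddings, gate_weights, k):
    # Rank feature fields by importance and keep the top-K; in COLD
    # the cut-off K is tuned offline against GAUC / QPS / latency.
    s = se_importance(embeddings, gate_weights)
    ranked = sorted(range(len(s)), key=lambda i: s[i], reverse=True)
    return ranked[:k]
```

A usage example: with three feature fields whose pooled activations differ, the gate simply surfaces the strongest ones, e.g. `top_k_features([[1.0, 1.0], [3.0, 3.0], [2.0, 2.0]], [1.0, 1.0, 1.0], 2)` keeps fields 1 and 2.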

Engineering Optimizations

Parallelism: independent ad computations are parallelized across threads and GPUs.

Row‑column transformation: sparse feature matrices are reorganized for column‑wise access, enabling SIMD‑accelerated kernels.

Float16 & Mixed‑Precision: most layers run in Float16, while batch‑norm layers stay in Float32; a custom linear_log activation handles the reduced dynamic range.

Multi‑Process Service (MPS) and NPU acceleration further double QPS.
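The row‑column transformation above amounts to transposing the per‑ad feature layout so that one feature is contiguous across all ads, which is the access pattern SIMD kernels want. A minimal sketch (plain Python standing in for the actual vectorized kernels):

```python
def rows_to_columns(batch):
    # batch: one feature row per ad (row-major). Returning per-feature
    # columns lets a kernel stream one feature across all ads
    # contiguously, the layout SIMD instructions operate on.
    return [list(col) for col in zip(*batch)]

batch = [[1, 2, 3],   # ad 0: feature values f0, f1, f2
         [4, 5, 6]]   # ad 1
print(rows_to_columns(batch))  # [[1, 4], [2, 5], [3, 6]]
```

In the real system the matrices are sparse and the kernels are hand‑optimized; the point here is only the layout change from ad‑major to feature‑major.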
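The linear_log activation mentioned in the Float16 discussion compresses large magnitudes into the half‑precision dynamic range by being linear near zero and logarithmic in the tails. The piecewise form below follows the commonly cited definition from the COLD paper; treat the exact constants as an assumption.

```python
import math

def linear_log(x):
    # Linear on [-1, 1], logarithmic outside, and continuous at the
    # joins (linear_log(1) = 1, linear_log(-1) = -1), so large inputs
    # no longer overflow Float16's narrow dynamic range.
    if x > 1.0:
        return math.log(x) + 1.0
    if x < -1.0:
        return -math.log(-x) - 1.0
    return x
```

For example, an input of e maps to 2.0 and -e maps to -2.0, while values inside [-1, 1] pass through unchanged.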

3. Online Service Architecture

COLD supports real‑time training and scoring, allowing rapid adaptation to data distribution shifts and improving cold‑start performance for new ads and users. The system integrates feature computation and neural inference in a unified pipeline.

Experimental Results

Offline evaluation shows COLD outperforms the vector‑inner‑product baseline in GAUC and recall. Online A/B tests report +6.1 % CTR and +6.5 % RPM on regular traffic, with larger gains (+9.1 % CTR, +10.8 % RPM) during peak events. System‑level metrics place COLD between the fast inner‑product model and the slower fine‑ranking model, achieving a balanced trade‑off.

Future Directions

Two possible paths are discussed: (1) deeper integration of coarse and fine ranking through joint training and knowledge distillation from fine‑ranking logs, reducing pipeline inconsistency and operational cost; (2) a return to pure set‑selection modeling that directly optimizes the downstream candidate set, potentially saving compute by avoiding unnecessary internal ordering.
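For path (1), one plausible distillation objective (a sketch, not the system's actual loss) is a listwise KL divergence that pushes the pre‑ranker's score distribution over a candidate list toward the fine‑ranker's:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def distill_loss(teacher_scores, student_scores):
    # Listwise KL(teacher || student) over one candidate list:
    # the pre-ranker (student) learns to reproduce the fine-ranker's
    # (teacher's) relative ordering rather than absolute values.
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the two rankings' score distributions match and grows as they diverge, which is the sense in which joint training reduces pipeline inconsistency.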

In summary, coarse ranking has fully entered the deep‑learning era, with vector‑inner‑product and COLD representing the two dominant paradigms. The choice between them depends on specific latency, compute, and feature‑richness constraints.

Tags: advertising, machine learning, deep learning, feature selection, pre‑ranking, COLD, online systems
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
