
STAR: Star Topology Adaptive Recommender for Multi-Scenario CTR Prediction

STAR introduces a star‑shaped CTR prediction architecture that jointly learns shared and scenario‑specific patterns via a fully‑connected network with central and private parameters, partitioned normalization, and an auxiliary scenario network, delivering consistent offline gains and +8% online CTR improvement while scaling to many domains without extra cost.

Alimama Tech

The Alimama display advertising team needs to provide ranking capability for a large number of scenarios. To cope with the rapid growth in the number of scenarios, the team investigated multi-scenario joint modeling and proposed a star-topology CTR prediction model called STAR. The work was accepted by CIKM 2021.

Background – Advertising ranking must serve many contexts such as the Taobao homepage, post‑purchase pages, various promotional activities, and external traffic redirects. Traditional practice builds separate models per scenario, which suffers from data sparsity in long‑tail scenarios and high system and manpower costs. Multi‑scenario modeling must capture both commonalities across scenarios and their differences.

The authors identify three challenges: (1) many scenarios with long‑tail distribution, (2) large distribution gaps among scenarios, and (3) limited resources for model development.

Problem Definition – In single‑scenario CTR modeling, samples are drawn from one domain assuming i.i.d. data. In multi‑scenario modeling, the model predicts CTR for a sample x together with a domain indicator p. Data are sampled from multiple related but distribution‑different scenarios, and samples are i.i.d. only within each scenario.

Multi‑scenario modeling differs from multi‑task learning: the former handles the same task (CTR) across different domains, while the latter handles different tasks within the same domain.

STAR Model – The core idea is to learn scenario‑specific behavior and shared behavior simultaneously. STAR consists of three components:

STAR Topology Fully‑Connected Network: each fully‑connected (FC) layer has a shared central parameter and a scenario‑private parameter. The final parameters for a scenario are obtained by element‑wise product of the two.
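A minimal NumPy sketch of one such layer (not the authors' code; shapes and names are illustrative). Following the paper, weights are combined by element-wise product and biases by addition:

```python
import numpy as np

def star_fc(x, w_shared, b_shared, w_priv, b_priv):
    # Effective scenario weight: element-wise product of the shared
    # (centre) weight and the scenario-private weight; biases are summed
    # (the combination described in the STAR paper).
    w = w_shared * w_priv
    b = b_shared + b_priv
    return x @ w + b

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))                       # batch of 2, 4-dim input
w_shared, b_shared = rng.normal(size=(4, 3)), np.zeros(3)
w_priv, b_priv = rng.normal(size=(4, 3)), np.zeros(3)
out = star_fc(x, w_shared, b_shared, w_priv, b_priv)  # shape (2, 3)
```

Because each scenario only multiplies in its own private weight, a forward pass for any one scenario costs the same as a plain FC layer.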

Partitioned Normalization (PN): unlike standard Batch Normalization (BN), which uses shared statistics across all samples, PN maintains domain‑specific moving means and variances and applies domain‑specific affine transformations. This preserves scenario‑specific distribution characteristics.
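An inference-time sketch of PN under the affine form reported in the paper (a simplification, not the production implementation; `gamma_p`/`beta_p` denote the scenario-private scale and bias):

```python
import numpy as np

class PartitionedNorm:
    """Simplified inference-time Partitioned Normalization.

    PN keeps per-scenario moving statistics and combines a global affine
    (gamma, beta) with a scenario-private one (gamma_p, beta_p):
    y = gamma * gamma_p * z_hat + (beta + beta_p).
    """
    def __init__(self, dim, n_scenarios, eps=1e-5):
        self.eps = eps
        self.gamma, self.beta = np.ones(dim), np.zeros(dim)
        self.gamma_p = np.ones((n_scenarios, dim))
        self.beta_p = np.zeros((n_scenarios, dim))
        self.mean = np.zeros((n_scenarios, dim))   # per-scenario moving mean
        self.var = np.ones((n_scenarios, dim))     # per-scenario moving variance

    def __call__(self, z, p):
        # Normalize with the moving statistics of scenario p only,
        # instead of one shared set of statistics as in standard BN.
        z_hat = (z - self.mean[p]) / np.sqrt(self.var[p] + self.eps)
        return self.gamma * self.gamma_p[p] * z_hat + self.beta + self.beta_p[p]

pn = PartitionedNorm(dim=3, n_scenarios=2)
pn.mean[0] = 5.0                 # suppose scenario 0's inputs are shifted
z = np.full((1, 3), 5.0)
out0 = pn(z, 0)                  # centred using scenario 0's own statistics
out1 = pn(z, 1)                  # scenario 1's statistics leave z near 5
```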

Auxiliary Network: an additional small network receives scenario‑related features; its output is added to the STAR network output, allowing scenario features to influence the final predicted CTR explicitly.

The STAR architecture enables shared learning of common patterns while preserving scenario‑specific nuances without extra computational overhead.

Comparison with MMoE – While many industrial systems use Multi‑Gate Mixture‑of‑Experts (MMoE) for multi‑scenario modeling, STAR offers several advantages: shared parameters learn common behavior, explicit domain knowledge is retained via private parameters, computational cost does not increase with the number of scenarios, and new scenarios can be added by initializing private parameters to 1, avoiding cold‑start issues.
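The cold-start claim follows directly from the element-wise combination: with private weights initialized to ones, the new scenario's effective parameters equal the shared ones. A tiny sketch (function name is ours):

```python
import numpy as np

def init_private_params(shape):
    # Ones-initialised private weights: w_shared * w_priv == w_shared,
    # so a newly added scenario starts out behaving exactly like the
    # shared model and then specialises as its own data arrives.
    return np.ones(shape)

w_shared = np.random.default_rng(1).normal(size=(4, 3))
w_priv = init_private_params((4, 3))
assert np.allclose(w_shared * w_priv, w_shared)
```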

Offline Experiments – Experiments on a production advertising dataset compared STAR with baselines such as Shared‑Bottom, MulANN, MMoE, and Cross‑Stitch. STAR consistently improved performance across all scenarios. In online streaming training, samples are split by scenario and shuffled to avoid distribution shocks. Deployment of STAR achieved +8% CTR and +6% RPM gains without extra features, compute, or latency.

Beyond Multi‑Scenario Modeling

“If a business doesn’t have multiple scenarios, can STAR still be used?”

The authors argue that STAR’s approach is applicable to any mixed‑distribution setting, such as gender‑based user differences or ad category differences, providing fine‑grained modeling by treating any meaningful partition as a “scenario”. Experiments show GAUC improvements of >0.2% for various partition schemes.

References

1. Sheng et al., “One Model to Serve All: Star Topology Adaptive Recommender for Multi‑Domain CTR Prediction”, CIKM 2021. https://arxiv.org/pdf/2101.11427

2. Zhou et al., “Deep Interest Network for Click‑Through Rate Prediction”, KDD 2018.

Tags: advertising, machine learning, CTR prediction, multi-scenario modeling, partitioned normalization, STAR model
Written by Alimama Tech, the official Alimama tech channel showcasing Alimama's technical innovations.