AutoHERI: Hierarchical Representation Automatic Aggregation for CVR Estimation in Advertising
AutoHERI is a hierarchical representation automatic aggregation model whose aggregation architecture is discovered via one‑shot neural architecture search. It jointly learns CTR and CVR (and other downstream tasks) to capture their cascade relationships, achieves superior AUC on large‑scale Alibaba advertising datasets, delivers conversion‑rate lifts online, and has been fully deployed in production.
This article introduces the AutoHERI model (Automated Hierarchical Representation Integration) developed by the Alimama Advertising Algorithm Team for conversion rate (CVR) prediction and multi‑task learning. The model leverages hierarchical representation automatic aggregation to improve post‑click conversion estimation across multiple advertising scenarios, and the work has been published at CIKM 2021.
CVR estimation predicts the probability that a user who clicks an ad will later convert. In out‑of‑site advertising, conversion events are sparse, which causes data‑scarcity problems and a sample selection bias: the model is trained only on clicked impressions but must serve predictions over all traffic.
To mitigate sparsity, CVR models are often jointly trained with richer upstream tasks such as click‑through‑rate (CTR) prediction. Existing multi‑task approaches (e.g., ESMM, DBMTL, GMSL, ESM2, Multi‑DR, AITM) treat tasks as loosely coupled. AutoHERI explicitly models the cascade relationship among tasks (e.g., CTR → CVR) by aggregating intermediate representations from upstream tasks into downstream task networks.
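The cascade structure these models exploit is the probabilistic decomposition over the full impression space: pCTCVR = pCTR × pCVR, so the post‑click CVR head can be supervised jointly with CTR on all impressions rather than on clicks alone. A minimal numeric sketch (variable names and the sample values are illustrative, not from the paper):

```python
import numpy as np

def cascade_probs(p_ctr, p_cvr):
    """Cascade decomposition over impressions: pCTCVR = pCTR * pCVR.

    p_ctr: P(click | impression)
    p_cvr: P(conversion | click)
    Returns P(click and conversion | impression).
    """
    return p_ctr * p_cvr

p_ctr = np.array([0.10, 0.05, 0.20])   # predicted click probabilities
p_cvr = np.array([0.02, 0.04, 0.01])   # predicted post-click conversion probabilities
p_ctcvr = cascade_probs(p_ctr, p_cvr)  # each entry is 0.002 for these inputs
```

Training the CTCVR product on impressions and CTR on impressions lets the CVR tower be learned without ever sampling only from clicks, which is what mitigates the selection bias described above.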
The core of AutoHERI is a one‑shot neural architecture search that automatically discovers optimal layer‑wise aggregation connections between tasks. Each candidate binary edge indicates whether a representation from a layer of the CTR network should be fed into a layer of the CVR network. The search relaxes binary edges to continuous probabilities (Bernoulli expectations) and optimizes them jointly with network parameters via gradient descent, incorporating an entropy regularizer.
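The relaxation step can be sketched in a few lines: each candidate edge carries a learnable logit, its Bernoulli expectation (a sigmoid) gates the corresponding upstream representation, and an entropy penalty pushes the edge probabilities toward 0 or 1 so the search converges to a discrete architecture. This is a hand‑rolled numpy sketch of the idea, not the authors' implementation; all names and dimensions are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_entropy(p, eps=1e-12):
    """Bernoulli entropy per edge; minimizing it drives p toward 0 or 1."""
    return -(p * np.log(p + eps) + (1.0 - p) * np.log(1.0 - p + eps))

# One logit per candidate edge (CTR layer -> CVR layer), optimized by
# gradient descent jointly with the network weights.
edge_logits = np.array([2.0, -1.5, 0.1])
p_edge = sigmoid(edge_logits)            # continuous relaxation of binary edges

# Expected aggregated input: gate each candidate upstream representation
# by its edge probability and sum.
ctr_hiddens = np.random.randn(3, 8)      # three candidate CTR-layer outputs, dim 8
expected_agg = (p_edge[:, None] * ctr_hiddens).sum(axis=0)

reg = edge_entropy(p_edge).sum()         # entropy regularizer added to the loss
```

After the search converges, edges with probability near 1 are kept and the rest are pruned, yielding the final discrete aggregation architecture.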
In the basic two‑task setting (CTR and CVR), the base network consists of a shared embedding layer followed by two separate deep neural networks. Hierarchical aggregation adds layer‑wise connections from the CTR network to the CVR network, allowing the downstream task to benefit from richer, higher‑level features learned by the upstream task.
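A two‑tower sketch of this setup: a shared embedding feeds separate CTR and CVR MLPs, and a CVR layer consumes a selected CTR hidden state via concatenation followed by a learned linear map. Dimensions, initialization, and layer choices here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

emb_dim, hid, batch = 16, 8, 4
x = rng.normal(size=(batch, emb_dim))        # shared embedding output for a batch

# Separate towers on top of the shared embedding.
w_ctr = rng.normal(size=(emb_dim, hid))
w_cvr = rng.normal(size=(emb_dim, hid))

h_ctr = relu(x @ w_ctr)                      # upstream (CTR) hidden state
h_cvr = relu(x @ w_cvr)                      # downstream (CVR) hidden state

# Hierarchical aggregation: concatenate the selected CTR representation into
# the CVR tower and project back to the hidden size with a linear map.
w_agg = rng.normal(size=(2 * hid, hid))
h_cvr_next = relu(np.concatenate([h_cvr, h_ctr], axis=1) @ w_agg)
```

In the searched model, which CTR layers feed which CVR layers is exactly what the one‑shot architecture search decides.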
For scenarios with more than two tasks (e.g., click‑through, post‑click activation, and conversion), AutoHERI extends the architecture to three DNNs with shared embeddings and learns aggregation edges among them, primarily focusing on adjacent task pairs.
Experimental evaluation on a public dataset (Ali‑CCP, 80M impressions) and a large‑scale Alibaba out‑of‑site advertising dataset (400M impressions) shows that AutoHERI consistently outperforms baseline models in AUC and negative log‑likelihood for CVR and CTCVR tasks. Ablation studies confirm that hierarchical aggregation yields larger gains than parallel aggregation, and that one‑shot search achieves comparable performance to evolutionary search while reducing training time.
Online A/B tests in production demonstrate a 4.9% lift in conversion rate and a 5.8% reduction in payment cost, leading to full‑scale deployment across multiple advertising products.
The authors conclude that hierarchical automatic aggregation, powered by one‑shot architecture search, effectively captures task cascade relationships and adapts to new business scenarios. Future work includes exploring richer aggregation operators beyond simple concatenation‑and‑linear‑mapping and applying AutoML to other components of multi‑task models.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.