
Advancements in Keyword Recall for Search Advertising: From Binary Retrieval to Hierarchical Bidding Graph

The paper reports a year‑long evolution of Alibaba’s search‑advertising keyword recall, replacing the traditional two‑stage rewrite‑and‑score pipeline with a low‑storage binary retrieval model and then a joint recall framework built on a hierarchical bidding graph, delivering near‑full‑precision recall, 16× memory savings, and quota‑free global ranking.

Alimama Tech

This article shares a year‑long practice of keyword recall in Alibaba’s search advertising system, describing two major algorithmic upgrades: moving from a two‑stage approximate scoring pipeline to a single‑stage binary retrieval, and then to a joint recall framework based on a hierarchical bidding graph.

Background: In e‑commerce search, advertisers bid on Bidwords that link user queries to ads. Traditional keyword recall relies on a two‑stage process—first rewriting, then scoring—which suffers from quota allocation issues, high computational cost, and a performance ceiling.

Challenges identified are (1) strong compute constraints on inverted‑index retrieval, (2) the need for quota‑free global ranking, and (3) the desire for joint recall without the “rewrite‑then‑score” bottleneck.

From Two‑Stage to Single‑Stage: Binary Retrieval: The authors replace the two‑stage pipeline with a low‑storage, low‑compute binary representation model. A 128‑bit vector and bit‑wise operations enable massive candidate scoring with minimal memory and latency, eliminating the quota problem and shortening the index‑update chain.
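The core trick described here—scoring candidates with 128‑bit codes instead of float vectors—can be sketched as Hamming‑distance retrieval via XOR and popcount. This is a minimal NumPy illustration, not the authors' implementation; the random codes and candidate count are placeholders.

```python
import numpy as np

def pack_bits(signs: np.ndarray) -> np.ndarray:
    """Pack a {0,1} matrix of shape (n, 128) into 16 bytes per row."""
    return np.packbits(signs, axis=1)  # shape (n, 16), dtype uint8

def hamming_scores(query_code: np.ndarray, ad_codes: np.ndarray) -> np.ndarray:
    """Score every candidate with XOR + popcount; lower distance = better match."""
    xor = np.bitwise_xor(ad_codes, query_code)         # (n, 16) uint8
    # popcount per row: unpack bytes back to bits and sum
    return np.unpackbits(xor, axis=1).sum(axis=1)      # (n,)

rng = np.random.default_rng(0)
query = pack_bits(rng.integers(0, 2, size=(1, 128), dtype=np.uint8))[0]
ads = pack_bits(rng.integers(0, 2, size=(1000, 128), dtype=np.uint8))
top10 = np.argsort(hamming_scores(query, ads))[:10]   # best-matching candidates
```

Because each code is only 16 bytes and the distance is a handful of integer instructions, the candidate set can be far larger than a float dot‑product budget would allow.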

Quantization Techniques: Two families of learning‑to‑hash methods are examined—activation‑based (e.g., tanh‑approximated sign) and regularization‑based (e.g., Laplacian prior). Experiments show that regularization‑based binary quantization offers more stable recall while preserving enough representation power for large‑scale retrieval.
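The two families can be contrasted in a few lines. This is a simplified sketch under my own assumptions about the loss shapes: the activation‑based route replaces the non‑differentiable sign with a temperature‑annealed tanh, while the regularization‑based route keeps real‑valued embeddings and adds a penalty that is zero only at ±1.

```python
import numpy as np

def tanh_sign(x: np.ndarray, beta: float) -> np.ndarray:
    """Activation-based hashing: tanh(beta * x) anneals toward sign(x) as beta grows."""
    return np.tanh(beta * x)

def bimodal_regularizer(x: np.ndarray) -> float:
    """Regularization-based hashing: penalty vanishes iff every activation is ±1,
    nudging embeddings toward binary codes during training (illustrative form)."""
    return float(np.abs(1.0 - np.abs(x)).mean())

x = np.array([-0.9, -0.1, 0.2, 1.3])
soft = tanh_sign(x, beta=1.0)    # smooth, trainable surrogate
hard = tanh_sign(x, beta=50.0)   # numerically ≈ sign(x)
penalty = bimodal_regularizer(x) # > 0 until activations saturate at ±1
```

The paper's finding that the regularization‑based family is more stable is consistent with this picture: the penalty shapes the embedding distribution without the vanishing gradients that a sharply annealed tanh introduces.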

Low‑Precision Correction: To mitigate the accuracy loss from binarization, an auxiliary task discriminating ad categories is added during pairwise training, and a propensity‑score debiasing layer balances ads across creation time and click volume.
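The combination of a pairwise ranking objective, an auxiliary category‑classification term, and propensity weighting might look like the following. Every concrete form here—the hinge margin, the cross‑entropy auxiliary, the inverse‑propensity weight, and the 0.1 mixing coefficient—is an assumption for illustration, not the paper's actual loss.

```python
import numpy as np

def pairwise_loss_with_aux(pos_score: float, neg_score: float,
                           cat_logits: np.ndarray, cat_label: int,
                           propensity: float,
                           margin: float = 1.0, aux_weight: float = 0.1) -> float:
    """Hinge-style pairwise ranking loss plus an auxiliary ad-category
    cross-entropy, down-weighted by an (assumed) inverse-propensity factor."""
    rank = max(0.0, margin - (pos_score - neg_score))
    # numerically stable softmax cross-entropy over ad categories
    z = cat_logits - cat_logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    aux = -log_probs[cat_label]
    return (rank + aux_weight * aux) / propensity

# well-ranked pair with a confident, correct category prediction -> small loss
good = pairwise_loss_with_aux(2.0, 0.0, np.array([5.0, 0.0]), 0, propensity=1.0)
# inverted pair with a wrong category prediction -> large loss
bad = pairwise_loss_with_aux(0.0, 2.0, np.array([0.0, 5.0]), 0, propensity=1.0)
```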

Experiments & Results: Binary retrieval achieves Recall@1000 within 5% of full‑precision models and reduces memory usage by 16×. Time complexity drops from 64 floating‑point multiplications to a single XOR‑plus‑bitcount operation.

From Single‑Stage to Joint Recall: Hierarchical Bidding Graph: The authors construct a tree‑structured index where nodes represent aggregated Bidwords. By jointly learning a dense retrieval model and the tree structure, they enable simultaneous scoring of each node and its associated Bidwords during beam search, achieving a unified recall.
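The retrieval pattern over such a tree is layer‑wise beam search: keep the top‑k nodes at each level, expand only their children, and return the surviving leaves. A minimal sketch follows; the toy tree, scores, and beam width are invented for illustration and stand in for the learned model's node scoring.

```python
import heapq

def beam_search(tree: dict, score, root: str, beam_width: int = 2) -> list:
    """Layer-wise beam search over a tree index: at each level keep only the
    `beam_width` best-scoring nodes and expand just their children."""
    frontier = [root]
    while any(tree.get(n) for n in frontier):          # stop at the leaf level
        children = [c for n in frontier for c in tree.get(n, [])]
        frontier = heapq.nlargest(beam_width, children, key=score)
    return frontier

# toy index: internal nodes aggregate Bidwords, leaves are concrete Bidwords
tree = {"root": ["shoes", "phones"],
        "shoes": ["running shoes", "hiking boots"],
        "phones": ["android phone", "foldable phone"]}
scores = {"shoes": 0.9, "phones": 0.4,
          "running shoes": 0.8, "hiking boots": 0.3,
          "android phone": 0.7, "foldable phone": 0.2}
result = beam_search(tree, scores.get, "root", beam_width=2)
```

The cost is O(beam_width × branching × depth) rather than linear in the Bidword library, which is what makes quota‑free global ranking over the full candidate space feasible.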

Graph Pre‑training: A heterogeneous graph containing multiple node and edge types is pre‑trained using metapath sampling and multi‑view fusion, providing robust node embeddings for the hierarchical index.
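Metapath sampling constrains random walks to follow a fixed type pattern so that heterogeneous neighborhoods stay semantically coherent. The sketch below uses an assumed query→Bidword schema and `(type, name)` node tuples; the real graph's node types and metapaths are not specified at this level of detail in the article.

```python
import random

def metapath_walk(graph: dict, start: tuple, metapath: list, length: int) -> list:
    """Sample a walk whose i-th node has type metapath[i % len(metapath)],
    a standard heterogeneous-graph pre-training recipe (schema assumed)."""
    walk = [start]
    for i in range(length - 1):
        next_type = metapath[(i + 1) % len(metapath)]
        # only neighbors matching the next type in the metapath are eligible
        neighbors = [n for n in graph.get(walk[-1], []) if n[0] == next_type]
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return walk

# toy graph: "q" = query nodes, "b" = Bidword nodes (illustrative types)
graph = {("q", "red shoes"): [("b", "running shoes"), ("b", "sneakers")],
         ("b", "running shoes"): [("q", "red shoes")],
         ("b", "sneakers"): [("q", "red shoes")]}
walk = metapath_walk(graph, ("q", "red shoes"), metapath=["q", "b"], length=4)
```

The sampled walks then feed a skip‑gram‑style objective, and fusing embeddings from several metapaths ("multi‑view fusion") gives each node a representation robust to any single view's sparsity.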

Joint Retrieval Benefits: The hierarchical approach improves offline recall and category‑match rates, compresses the inverted index, and supports DNN‑level similarity scoring. Offline experiments show consistent gains over brute‑force full‑library scoring.

Business Reflections: The authors discuss the trade‑off between recall diversity and ranking precision, the need for quota‑free multi‑objective recall channels, and future directions toward end‑to‑end gradient flow from downstream bidding and ranking modules back to the recall stage.

machine learning · AI · Search Advertising · large-scale retrieval · binary retrieval · hierarchical bidding graph · keyword recall
Written by Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.