User Modeling for Search Ranking: Practices, Model Design, and Experimental Analysis at Alibaba

This article presents Alibaba's comprehensive approach to user modeling for search CTR/CVR ranking, detailing the abstraction of user information, multi‑scale behavior processing, enhanced transformer‑based model structures, client‑side click and exposure modeling, and experimental results showing significant AUC improvements.

DataFunTalk

Aug 29, 2020

Background and Significance

User modeling is a core technology for search and recommendation systems. In Taobao search, each ranking sample is a <user, query, item> triple. Item features are dense while user features are sparse, so the user side depends on a large number of generalized features.

Information Abstraction

The authors enrich user modeling along three lines: static user profiles, preference tags mined from behavior, and fine‑grained real‑time behavior modeling that captures current interests.

Information Processing

User behavior data are organized along two axes: behavior cycle (short‑term vs. long‑term) and content (explicit vs. implicit signals). Short‑term sequences are filtered by the categories predicted for the current query. Long‑term data span two years and are divided into quarterly sub‑sequences.
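The two processing steps can be illustrated with a minimal sketch. The record fields (`item_id`, `category`, `date`) and the helper names are hypothetical and are not taken from the original system; the sketch only shows category filtering for the short‑term sequence and quarterly bucketing for the long‑term one.

```python
from collections import defaultdict
from datetime import date

def quarter_key(d):
    """Map a date to a (year, quarter) pair, with quarters numbered 1 to 4."""
    return (d.year, (d.month - 1) // 3 + 1)

def filter_short_term(behaviors, predicted_categories):
    """Keep behaviors whose item category is among the query's predicted categories."""
    return [b for b in behaviors if b["category"] in predicted_categories]

def split_long_term(behaviors):
    """Group item ids into quarterly sub-sequences, each ordered by time."""
    buckets = defaultdict(list)
    for b in sorted(behaviors, key=lambda b: b["date"]):
        buckets[quarter_key(b["date"])].append(b["item_id"])
    return dict(sorted(buckets.items()))

# Hypothetical behavior log (field names are assumptions for illustration).
behaviors = [
    {"item_id": "dress_1", "category": "dress", "date": date(2020, 7, 2)},
    {"item_id": "phone_1", "category": "phone", "date": date(2020, 8, 15)},
    {"item_id": "coat_1", "category": "coat", "date": date(2019, 12, 1)},
]

short_term = filter_short_term(behaviors, {"dress"})
long_term = split_long_term(behaviors)
```

With two years of history, `split_long_term` would yield up to eight quarterly sub‑sequences per user.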

Model Architecture

The overall model concatenates the user profile, features from multiple behavior sequences, item features, and contextual signals (e.g., weather, network) and feeds the result into a DNN classifier. Sequence modeling uses an optimized self‑attention mechanism that replaces the dot product with cosine similarity, followed by a query‑attention pooling layer that emphasizes behavior relevant to the current query.
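A minimal NumPy sketch of self‑attention with cosine similarity in place of the dot product is given below. It is a simplification under stated assumptions: the learned query/key/value projections and multi‑head structure of a real transformer layer are omitted, and the `scale` value of 10.0 is an arbitrary illustrative choice, not a figure from the source.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cosine_self_attention(seq, scale=10.0, eps=1e-8):
    """Self-attention over seq of shape (T, d) using scaled cosine similarity.

    Returns the attended sequence (T, d) and the attention weights (T, T).
    No learned projections are applied (assumption made for brevity).
    """
    unit = seq / (np.linalg.norm(seq, axis=-1, keepdims=True) + eps)
    logits = scale * (unit @ unit.T)  # cosine lies in [-1, 1], so logits lie in [-scale, scale]
    weights = softmax(logits, axis=-1)
    return weights @ seq, weights

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 8))  # 5 behaviors, 8-dimensional embeddings
out, weights = cosine_self_attention(seq)
```

Because cosine similarity is bounded, the logits stay in a fixed range regardless of embedding norms, and the scale factor alone controls how sharply the softmax concentrates.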

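Short‑Term Sequence Improvements

Replacing the dot product with a scaled cosine similarity bounds the softmax logits, with the scale factor controlling how peaked the attention distribution is. Query‑attention pooling then weights historical actions by their relevance to the current query intent, avoiding the heavier target‑attention computation.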
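Query‑attention pooling can be sketched as follows. This is an illustrative version, assuming the query is already embedded as a single vector in the same space as the behavior embeddings; the function name, the use of cosine similarity for the pooling scores, and the `scale` value are assumptions rather than details confirmed by the source.

```python
import numpy as np

def query_attention_pooling(query_vec, seq, scale=10.0, eps=1e-8):
    """Pool a behavior sequence into one vector, weighted by similarity to the query.

    query_vec: (d,) query embedding; seq: (T, d) behavior embeddings.
    Returns the pooled vector (d,) and the attention weights (T,).
    """
    q = query_vec / (np.linalg.norm(query_vec) + eps)
    k = seq / (np.linalg.norm(seq, axis=-1, keepdims=True) + eps)
    logits = scale * (k @ q)
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()
    return weights @ seq, weights

# Three past behaviors; the first and third point in roughly the query's direction.
seq = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query_vec = np.array([1.0, 0.0])
pooled, weights = query_attention_pooling(query_vec, seq)
```

In this formulation the pooled vector depends only on the user and the query, so it could be computed once per request and shared across all candidate items, whereas target attention would repeat the attention step for every candidate.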

Long‑Term Sequence Modeling

Each quarterly sequence is embedded, masked, and passed through multi‑layer self‑attention and attention pooling. The pooled quarterly vectors are then concatenated into a single long‑term preference representation, which helps capture seasonal preferences.
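The mask, pool, and concatenate steps might look like the sketch below. It omits the multi‑layer self‑attention stage and uses a single hypothetical pooling vector `pool_vec` to score items; both are simplifications for illustration, and the shapes (8 quarters, 4 slots, 3 dimensions) are arbitrary.

```python
import numpy as np

def masked_attention_pooling(seq, mask, pool_vec):
    """Pool one padded quarter into a single vector.

    seq: (T, d) padded embeddings; mask: (T,) bool, True for real items.
    Assumes at least one real item per quarter.
    """
    logits = seq @ pool_vec
    logits = np.where(mask, logits, -1e9)  # padded slots get effectively zero weight
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()
    return weights @ seq

def long_term_representation(quarters, masks, pool_vec):
    """Pool each quarter separately, then concatenate the pooled vectors."""
    pooled = [masked_attention_pooling(s, m, pool_vec)
              for s, m in zip(quarters, masks)]
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
num_quarters, max_len, dim = 8, 4, 3  # two years of quarterly sub-sequences
quarters = rng.normal(size=(num_quarters, max_len, dim))
masks = np.ones((num_quarters, max_len), dtype=bool)
masks[:, -1] = False  # treat the last slot of each quarter as padding
pool_vec = rng.normal(size=dim)

long_term = long_term_representation(quarters, masks, pool_vec)
```

Keeping one pooled vector per quarter, rather than pooling all two years at once, preserves the position of each season in the final representation.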

Client‑Side Click and Exposure Modeling

Click sequences captured on the device are available with millisecond‑level latency and carry richer signals (e.g., dwell time, button clicks). Exposure sequences (items shown but not clicked) are modeled via mean pooling, together with an auxiliary loss that enforces a margin between the distances to positive and negative items, improving the representation of what the user is not interested in.
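The source does not give the exact form of the auxiliary loss. One common way to enforce such a margin is a triplet‑style hinge loss, sketched below under that assumption: the user vector acts as the anchor, the mean‑pooled clicked items as the positive, and the mean‑pooled exposed‑but‑not‑clicked items as the negative. All names and the margin values are hypothetical.

```python
import math

def mean_pool(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

clicked = [[1.0, 0.0], [0.8, 0.2]]       # embeddings of clicked items
exposed_only = [[0.0, 1.0], [0.2, 0.8]]  # embeddings of items shown but not clicked
user_vec = [0.9, 0.1]                    # hypothetical user interest vector

pos = mean_pool(clicked)
neg = mean_pool(exposed_only)
loss = margin_loss(user_vec, pos, neg, margin=1.0)
```

Here the negative pool is already more than one unit farther from the user vector than the positive pool, so the loss is zero; a larger margin would produce a positive loss and push the two apart during training.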

Experiments and Analysis

The authors evaluate models by AUC on online exposure and click data. The proposed sequence‑modeling improvements yield up to a 0.3% absolute AUC gain, and adding the new sequence features contributes roughly a further 0.7%. Attention‑weight visualizations show that the model learns both positional importance and inter‑item relationships.
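For reference, AUC can be read as the probability that a randomly chosen clicked sample is scored above a randomly chosen non‑clicked one. The sketch below computes it by direct pairwise comparison on made‑up labels and scores; it is meant only to make the metric concrete and is quadratic in the number of samples, so it would not suit production‑scale logs.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. labels are 1 (click) or 0 (no click).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: two clicks, two non-clicks.
labels = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.6]
result = auc(labels, scores)  # 3 of 4 pairs are ordered correctly -> 0.75
```

On this scale, an absolute gain of 0.3% corresponds to moving from, say, 0.750 to 0.753.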

Conclusion and Outlook

The user modeling techniques presented here significantly boost Taobao search performance during major events. Future work will focus on finer‑grained perception of user data, more principled data organization, and further model refinements.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
