User Modeling for Search Ranking: Practices, Model Design, and Experimental Analysis at Alibaba

This article presents Alibaba's comprehensive approach to user modeling for search CTR/CVR ranking, detailing the abstraction of user information, multi‑scale behavior processing, enhanced transformer‑based model structures, client‑side click and exposure modeling, and experimental results showing significant AUC improvements.

DataFunTalk

Background and Significance

User modeling is a core technology for search and recommendation systems. In Taobao search, the ranking target is a <user, query, item> triple. Item features are dense while user features are sparse, so the user side requires extensive generalized features.

Information Abstraction

The authors enrich user modeling through static user profiles, preference tags mined from behavior, and fine‑grained real‑time behavior modeling to capture current interests.

Information Processing

User behavior data are organized by behavior cycle (short‑term vs. long‑term) and content (explicit vs. implicit signals). Short‑term sequences are filtered by predicted query categories, while long‑term data span two years and are divided into quarterly sub‑sequences.
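
The two processing steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the record layout (`item`, `category`, `time`), the function names, and the 91-day approximation of a quarter are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def filter_short_term(behaviors, predicted_categories):
    """Keep only behaviors whose category matches a predicted query category."""
    allowed = set(predicted_categories)
    return [b for b in behaviors if b["category"] in allowed]

def split_into_quarters(behaviors, now, num_quarters=8, days_per_quarter=91):
    """Bucket behaviors from roughly the last two years into quarterly sub-sequences.

    Bucket 0 is the most recent quarter. Behaviors older than the window
    (or dated in the future) are dropped. The 91-day quarter is an assumption.
    """
    buckets = [[] for _ in range(num_quarters)]
    for b in behaviors:
        idx = (now - b["time"]).days // days_per_quarter
        if 0 <= idx < num_quarters:
            buckets[idx].append(b)
    return buckets

# Hypothetical behavior log used only for illustration.
now = datetime(2024, 1, 1)
behaviors = [
    {"item": "a", "category": "shoes", "time": now - timedelta(days=3)},
    {"item": "b", "category": "phones", "time": now - timedelta(days=100)},
    {"item": "c", "category": "shoes", "time": now - timedelta(days=800)},
]
short_term = filter_short_term(behaviors, ["shoes"])
quarters = split_into_quarters(behaviors, now)
```

Here the category filter keeps items `a` and `c`, while the quarterly split places `a` in the most recent bucket, `b` in the next one, and drops `c` because it falls outside the two-year window.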

Model Architecture

The overall model concatenates user profile, multiple behavior sequence features, item features, and contextual signals (e.g., weather, network) before feeding them into a DNN classifier. Sequence modeling uses an optimized self‑attention mechanism with cosine similarity (replacing dot‑product) and a query‑attention pooling layer to emphasize behavior relevant to the current query.
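
As a rough sketch of the pooling and concatenation described above, the following pure-Python example weights each behavior vector by its similarity to a query vector and then flattens all feature groups into one input vector. The dot-product scoring, the function names, and the toy dimensions are assumptions for illustration; the source does not specify the exact scoring function or feature layout.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def query_attention_pool(query_vec, behavior_vecs):
    """Weight each behavior by its similarity to the query, then sum.

    Dot-product scoring is an assumption made for this sketch.
    """
    weights = softmax([dot(query_vec, h) for h in behavior_vecs])
    dim = len(behavior_vecs[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, behavior_vecs))
              for i in range(dim)]
    return pooled, weights

def build_dnn_input(user_profile, pooled_sequences, item_features, context):
    """Concatenate all feature groups into one flat vector for the classifier."""
    out = list(user_profile)
    for seq in pooled_sequences:
        out.extend(seq)
    out.extend(item_features)
    out.extend(context)
    return out

# Toy vectors, purely illustrative.
query_vec = [1.0, 0.0]
behavior_vecs = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = query_attention_pool(query_vec, behavior_vecs)
dnn_input = build_dnn_input([0.5], [pooled], [0.1, 0.2], [1.0])
```

The behavior aligned with the query receives the larger weight, so the pooled vector leans toward it. The cost is one score per behavior, independent of how many candidate items are being ranked.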

Short‑Term Sequence Improvements

Replacing dot‑product with scaled cosine similarity improves the softmax logits in self‑attention. Query‑attention pooling further aligns historical actions with the current intent while avoiding the heavier target‑attention computation.
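
One way to picture the cosine-with-scaling idea is the sketch below. Cosine similarity is bounded in [-1, 1], so a softmax over raw cosine values stays close to uniform; multiplying by a scale factor sharpens it. The scale value of 10.0 and the function names are assumptions for illustration, not values reported by the authors.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_attention_weights(queries, keys, scale=10.0):
    """Attention weights from scaled cosine similarity instead of dot-product.

    Returns one softmax-normalized row per query. The scale is a
    hypothetical hyperparameter chosen for this sketch.
    """
    qn = [l2_normalize(q) for q in queries]
    kn = [l2_normalize(k) for k in keys]
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in kn])
        for q in qn
    ]

# Same directions, very different magnitudes: the weights should match.
w_unit = cosine_attention_weights([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
w_big = cosine_attention_weights([[100.0, 0.0]], [[5.0, 0.0], [0.0, 0.1]])
w_flat = cosine_attention_weights([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], scale=1.0)
```

Because the vectors are normalized first, the weights depend only on direction, not on embedding magnitude. Comparing `w_unit` with `w_flat` shows how the scale controls how peaked the distribution is.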

Long‑Term Sequence Modeling

Quarterly sequences are embedded, masked, and processed through multi‑layer self‑attention and attention‑pooling, then concatenated to form a comprehensive long‑term preference representation, aiding seasonal personalization.
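
The masking, pooling, and concatenation steps might look roughly like the sketch below. It omits the multi-layer self-attention stage entirely and uses a fixed `probe` vector in place of a learned attention query; both simplifications, along with all names, are assumptions made to keep the example short.

```python
import math

def masked_attention_pool(vecs, mask, probe):
    """Softmax-pool only unmasked positions; return zeros if all are padding."""
    dim = len(probe)
    kept = [v for v, m in zip(vecs, mask) if m]
    if not kept:
        return [0.0] * dim
    scores = [sum(p * x for p, x in zip(probe, v)) for v in kept]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [sum((e / total) * v[i] for e, v in zip(exps, kept))
            for i in range(dim)]

def long_term_representation(quarters, masks, probe):
    """Pool each quarterly sub-sequence and concatenate the results in order."""
    out = []
    for vecs, mask in zip(quarters, masks):
        out.extend(masked_attention_pool(vecs, mask, probe))
    return out

# Two padded quarters of length 2; the second quarter has no real behaviors.
quarters = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
masks = [[1, 0], [0, 0]]
probe = [1.0, 1.0]  # stand-in for a learned pooling query
rep = long_term_representation(quarters, masks, probe)
```

Keeping one fixed-size slot per quarter means the position in the concatenated vector encodes which quarter a preference came from, which is one plausible way such a representation could support seasonal patterns.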

Client‑Side Click and Exposure Modeling

Click sequences captured on the device provide millisecond‑level latency and richer signals (e.g., dwell time, button clicks). Exposure sequences (items shown but not clicked) are modeled via mean pooling and an auxiliary loss that enforces a margin between positive and negative item distances, improving the representation of user disinterest.
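
A margin-based auxiliary loss of this kind is often written in a triplet (hinge) form, sketched below. The summary does not give the exact formulation, so the Euclidean distance, the margin of 1.0, the use of a user vector as the anchor, and every function name here are assumptions rather than the authors' definition.

```python
import math

def mean_pool(vecs):
    """Average a list of equal-length vectors element-wise."""
    n = len(vecs)
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / n for i in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def margin_aux_loss(user_vec, clicked_vecs, exposed_vecs, margin=1.0):
    """Hinge loss: clicked items should sit closer to the user than
    exposed-but-unclicked items by at least `margin` (assumed form)."""
    pos = mean_pool(clicked_vecs)   # positive: mean-pooled clicked items
    neg = mean_pool(exposed_vecs)   # negative: mean-pooled exposed items
    return max(0.0, margin + euclidean(user_vec, pos) - euclidean(user_vec, neg))

# Toy embeddings, purely illustrative.
user_vec = [1.0, 0.0]
clicked = [[1.0, 0.0], [0.8, 0.2]]
exposed = [[-1.0, 0.0], [-1.0, 0.2]]
loss_separated = margin_aux_loss(user_vec, clicked, exposed)
loss_swapped = margin_aux_loss(user_vec, exposed, clicked)
```

When the clicked items are already much closer to the user than the exposed ones, the loss is zero and contributes no gradient; when the roles are swapped, the loss is positive and would push the two groups apart.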

Experiments and Analysis

Using online exposure and click data, the authors evaluate the models by AUC. Incorporating the proposed sequence improvements yields up to 0.3% absolute AUC gain, while adding new sequence features contributes an additional ~0.7% gain. Attention weight visualizations show the model learns both positional importance and inter‑item relationships.
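
For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen clicked sample is scored above a randomly chosen unclicked one. The sketch below computes it by direct pairwise comparison. The labels and scores are made-up toy values, not data from the article, and the quadratic loop is for clarity rather than efficiency.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(P * N) time.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy click labels (1 = clicked) and model scores, for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
toy_auc = auc(labels, scores)
```

In this toy case three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.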

Conclusion and Outlook

The presented user modeling techniques significantly boost Taobao search performance during major events, and future work will focus on finer user data perception, more scientific data organization, and further model refinements.

Alibaba, Deep Learning, CTR prediction, attention mechanism, search ranking, user modeling, behavior sequence
Written by

DataFunTalk
