
A Survey of User Behavior Sequence Modeling for Search and Recommendation Advertising

User behavior sequence modeling is crucial for ranking in search and recommendation advertising. It has evolved from simple pooling to attention, RNN, capsule, and Transformer architectures, with industrial applications across e‑commerce, social, video, and music platforms; future directions include time‑aware, multi‑dimensional, and self‑supervised approaches.


Introduction

User behavior sequence modeling plays a vital role in the recall and ranking modules of internet search and recommendation advertising systems. This article reviews classic works and development trends of user behavior sequence modeling in the deep learning era.

1. Applications of User Behavior Sequence Modeling

It is widely used in e‑commerce (Taobao, JD, Pinduoduo, Amazon), social platforms (WeChat, Facebook), search engines (Baidu, Google), short‑video (Douyin, Kuaishou), video (iQIYI, YouTube), music (NetEase Cloud Music, Tencent Music), and life‑service apps (Ele.me, Meituan). Search and recommendation advertising systems retrieve and rank items from massive catalogs to provide real‑time personalized services.

The key problem in recall and ranking is modeling users' real‑time interests, which are reflected in their behavior sequences on the app.

Modeling the behavior sequence yields a user interest vector that can be fed into recall and ranking modules for personalized service. The figure below shows the application of behavior‑sequence modeling in Taobao search and recommendation.

2. Main Works on User Behavior Sequence Modeling

In the deep‑learning era, some 22 representative models have been proposed, spanning pooling, attention, RNN, capsule, and Transformer stages. Transformer‑based models are now the mainstream in industrial search and recommendation systems (e.g., Taobao, JD, Meituan, Kuaishou).

2.1 Time Window of the Sequence

Recent sequence: Most works (e.g., DNN, DIN, DIEN) focus on the last few days/weeks/months, which best reflect the user's short‑term interest.

Long‑term sequence: Works such as MIMN, SIM, DMT, and AliSearch model behavior over months or years to capture long‑term preferences (categories, brands, price ranges, etc.).

Session sequence: A session (typically a few minutes of continuous activity) is treated as a unit; DSIN and TISSA model each session separately before aggregating across sessions.

2.2 Attribute Information of User Behaviors

Item ID (high‑dimensional embedding).

Item attributes (brand, shop, title, etc.).

Behavior timestamp (time gap to current ranking).

Behavior type (click, add‑to‑cart, favorite, purchase).

Detailed behavior info (dwell time, image click, comment view, etc.).
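A common way to use these attributes is to embed each one and concatenate them into a single behavior vector. The sketch below illustrates this with NumPy; all vocabulary sizes, bucket boundaries, and random embedding tables are hypothetical stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary sizes and embedding dimension (illustrative only).
N_ITEMS, N_BRANDS, N_TYPES, N_TIME_BUCKETS, DIM = 1000, 50, 4, 8, 16

item_emb = rng.normal(size=(N_ITEMS, DIM))
brand_emb = rng.normal(size=(N_BRANDS, DIM))
type_emb = rng.normal(size=(N_TYPES, DIM))   # click / cart / favorite / purchase
time_emb = rng.normal(size=(N_TIME_BUCKETS, DIM))

def time_gap_bucket(gap_seconds: float) -> int:
    """Bucketize the gap between the behavior and the current request."""
    bounds = [60, 3600, 86400, 7 * 86400, 30 * 86400, 180 * 86400, 365 * 86400]
    for i, b in enumerate(bounds):
        if gap_seconds < b:
            return i
    return len(bounds)  # oldest bucket

def encode_behavior(item_id, brand_id, behavior_type, gap_seconds):
    """Concatenate all attribute embeddings into one behavior vector."""
    return np.concatenate([
        item_emb[item_id],
        brand_emb[brand_id],
        type_emb[behavior_type],
        time_emb[time_gap_bucket(gap_seconds)],
    ])

v = encode_behavior(item_id=42, brand_id=3, behavior_type=1, gap_seconds=5000)
print(v.shape)  # (64,) = 4 fields x 16 dims
```

The resulting per-behavior vectors form the sequence that the models in Section 3 consume.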

2.3 Multiple Behavior Sequences

Researchers construct several sequences based on time windows and behavior types, e.g., recent/long‑term click/favorite/add‑to‑cart/purchase sequences, query sequences, and exposure‑without‑click sequences.

Mixed modeling: AliSearch mixes recent sequences by time order.

Separate modeling: DMT models each sequence independently to obtain multiple interest vectors.

2.4 Negative Feedback

Implicit negative feedback: Items shown but not clicked.

Explicit negative feedback: User explicitly marks “dislike”.

Works such as AliSearch, DFN, FINN, and DSTN have begun modeling negative‑feedback sequences to improve recommendation quality and user experience.

2.5 Multi‑Dimensional Modeling

Most works concatenate all attribute information into a single vector; a few (e.g., DHAN) explicitly model multiple dimensions (price, style, brand) to capture richer preferences.

3. Detailed Introduction of Modeling Methods

We categorize existing models by the underlying machine‑learning technique.

3.1 Pooling

Methods such as DNN aggregate behavior embeddings by mean, sum, or max pooling, treating all behaviors equally.

DNN (RecSys'16) applies pooling to video‑watch and search sequences for YouTube recommendation.
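As a minimal sketch of this stage, the three pooling operators can be written in a few lines of NumPy (random vectors stand in for learned behavior embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
behaviors = rng.normal(size=(20, 16))  # 20 behavior embeddings, dim 16

# Each pooling operator collapses the sequence into one interest vector,
# weighting every behavior equally regardless of relevance or recency.
mean_pooled = behaviors.mean(axis=0)
sum_pooled = behaviors.sum(axis=0)
max_pooled = behaviors.max(axis=0)
print(mean_pooled.shape)  # (16,)
```

The equal weighting is exactly the limitation that motivates the attention-based models in the next subsection.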

3.2 Attention

Pooling cannot differentiate the importance of each item. Attention‑based models (DIN, DSTN) compute an attention score between each historical item and the target ad, using it as a weight for aggregation.

DIN (KDD'18) computes attention scores without softmax normalization, preserving the absolute intensity of user interest; because scores vary across target ads, the same sequence yields different interest vectors, enabling multi‑interest modeling.
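The idea can be sketched as follows. Note this is a simplification: DIN scores each (behavior, target) pair with a small activation MLP, whereas here a random bilinear matrix stands in for the learned scorer; the un-normalized weighted sum matches the paper's design.

```python
import numpy as np

def din_attention(behaviors, target, W):
    """DIN-style target attention (simplified).
    behaviors: (T, d) historical item embeddings
    target:    (d,)   candidate ad embedding
    W:         (d, d) stand-in for DIN's learned activation network
    Scores are deliberately NOT softmax-normalized, so the total
    activation strength can differ across target ads."""
    scores = behaviors @ W @ target   # (T,) relevance of each behavior
    return scores @ behaviors         # weighted sum -> interest vector (d,)

rng = np.random.default_rng(0)
T, d = 10, 8
behaviors = rng.normal(size=(T, d))
W = rng.normal(size=(d, d))
interest = din_attention(behaviors, behaviors[3], W)
print(interest.shape)  # (8,)
```

Because the weights depend on the target, two different candidate ads activate different parts of the same history.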

DSTN (KDD'19) models click, non‑click, and nearby‑ad sequences for Shenma search.

3.3 Recurrent Neural Networks (RNN)

RNN‑based models capture temporal order, which pooling and attention ignore.

GRU4Rec (ICLR'16) uses GRU for session‑based recommendation.
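A minimal GRU encoder over a behavior sequence, in the spirit of GRU4Rec, looks like the sketch below; the random weight matrices are placeholders for trained parameters, and the final hidden state serves as the short‑term interest vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_encode(seq, params):
    """Run a single-layer GRU over a behavior sequence and return the
    final hidden state as the short-term interest vector."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    h = np.zeros(Uz.shape[0])
    for x in seq:
        z = sigmoid(x @ Wz + h @ Uz)              # update gate
        r = sigmoid(x @ Wr + h @ Ur)              # reset gate
        h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
        h = (1 - z) * h + z * h_tilde
    return h

rng = np.random.default_rng(0)
d, hdim = 8, 16
params = tuple(rng.normal(scale=0.1, size=s)
               for s in [(d, hdim), (hdim, hdim)] * 3)
h = gru_encode(rng.normal(size=(12, d)), params)
print(h.shape)  # (16,)
```

Unlike pooling, the recurrence makes the output depend on the order of behaviors.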

DIEN (AAAI'19) employs a two‑layer RNN with an auxiliary loss to model evolution of user interest.

DUPN (KDD'18) combines RNN and attention, incorporating attribute information.

HUP (WSDM'20) introduces a pyramid RNN to model micro‑behaviors and multiple interest levels.

DHAN (SIGIR'20) uses hierarchical attention to model preferences across categories, price, brand, etc.

Figures of each model are included in the original article (omitted here for brevity).

3.4 Capsule

Pooling‑based recall often returns very similar items. Capsule models (MIND, ComiRec) perform implicit clustering of behavior sequences to produce multiple interest vectors.

MIND (CIKM'19) applies capsule networks for Tmall recommendation recall.
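The behavior‑to‑interest routing at the heart of MIND can be sketched as the dynamic‑routing loop below. This is a simplification: a single random bilinear map and randomly initialized routing logits stand in for the paper's learned/initialized parameters, but the soft clustering of T behavior capsules into K interest capsules follows the same scheme.

```python
import numpy as np

def squash(v):
    """Capsule nonlinearity: shrink the norm into (0, 1), keep direction."""
    n2 = (v * v).sum(axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + 1e-9)

def multi_interest_routing(behaviors, K=4, iters=3, seed=0):
    """MIND-style behavior-to-interest dynamic routing (simplified)."""
    T, d = behaviors.shape
    rng = np.random.default_rng(seed)
    S = rng.normal(scale=0.1, size=(d, d))   # shared bilinear map (stand-in)
    u_hat = behaviors @ S                    # (T, d) behavior "votes"
    b = rng.normal(scale=0.1, size=(K, T))   # routing logits, random init
    for _ in range(iters):
        b_shift = b - b.max(axis=0, keepdims=True)        # stable softmax
        c = np.exp(b_shift) / np.exp(b_shift).sum(axis=0, keepdims=True)
        interests = squash(c @ u_hat)        # (K, d) interest capsules
        b = b + interests @ u_hat.T          # routing by agreement
    return interests

rng = np.random.default_rng(1)
interests = multi_interest_routing(rng.normal(size=(30, 16)), K=4)
print(interests.shape)  # (4, 16)
```

Each of the K output vectors can be used as a separate query in recall, which is how multi-interest models avoid returning only near-duplicates.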

ComiRec (KDD'20) combines capsule or self‑attention with a controllable multi‑interest aggregation module.

3.5 Transformer

Transformers address long‑range dependencies and interactions among behaviors. Many industrial systems now rely on Transformer‑based models.
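The core building block shared by the models below is scaled dot‑product self‑attention over the behavior sequence; a minimal single‑head sketch (random projection matrices in place of learned ones) is:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a behavior
    sequence: every behavior attends to every other, capturing the
    pairwise interactions that pooling and RNNs miss."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)            # row-wise softmax
    return A @ V                                  # (T, d) contextualized behaviors

rng = np.random.default_rng(0)
T, d = 10, 16
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(scale=0.3, size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (10, 16)
```

Production models stack multiple heads and layers and add positional or time-gap encodings on top of this primitive.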

ATRank (AAAI'18) uses self‑attention for multi‑behavior sequences.

BST (DLP‑KDD'19) applies Transformer for Taobao recommendation.

DSIN (IJCAI'19) splits behaviors into sessions, models each with Transformer, then aggregates sessions with RNN.

TISSA (WWW'19) combines RNN and self‑attention for session modeling.

SDM (CIKM'19) fuses short‑term Transformer‑based interest with long‑term attention via a gating mechanism.

KFAtt (NeurIPS'20) introduces Kalman Filtering Attention to merge multiple sessions for JD search.

DFN (IJCAI'20) models explicit and implicit feedback with Transformer in a deep‑feedback module.

SIM (CIKM'20) searches long‑term sequences with hard/soft search and self‑attention.
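SIM's hard search is essentially a category filter over the lifelong sequence before any attention is applied; a minimal sketch (with a hypothetical dict-based behavior record) is:

```python
def hard_search(long_sequence, target_category, max_len=100):
    """SIM-style hard search: from a years-long behavior sequence, keep
    only behaviors sharing the target item's category, capped at max_len."""
    hits = [b for b in long_sequence if b["category"] == target_category]
    return hits[-max_len:]  # keep the most recent matches

seq = [{"item": i, "category": i % 3} for i in range(10)]
print(hard_search(seq, target_category=1))  # items 1, 4, 7 match category 1
```

The retrieved sub-sequence is short enough for full self-attention against the target, which is what makes lifelong sequences tractable at serving time.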

DMT (CIKM'20) uses multiple Transformers for click, add‑to‑cart, purchase sequences with MMoE and bias‑DNN.

AliSearch (Taobao) employs separate Transformers for recent mixed sequences, long‑term click, long‑term purchase, recent on‑device click, and exposure sequences.

4. Future Directions

Potential research avenues include:

Better modeling of temporal attributes (e.g., neural point processes for just‑in‑time recommendation).

Explicit multi‑dimensional modeling of behavior attributes.

Improved handling of noisy negative feedback.

Modeling relationships among multiple behavior sequences.

Capturing interactions between short‑term and long‑term interests.

Integrating sequence modeling with multi‑task learning for click, purchase, dwell‑time, etc.

Leveraging self‑supervised learning to enhance sequence models.

References are listed at the end of the original article.

Tags: E-commerce, Deep Learning, Transformer, User Behavior Modeling, Attention, Recommendation Systems, Sequence Modeling
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
