
TWIN: Two-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction

This paper presents TWIN, a two-stage interest network that aligns the similarity metrics of coarse‑grained and fine‑grained modules to improve lifelong user behavior modeling for CTR prediction in large‑scale online recommendation systems.

Kuaishou Tech

Background: Kuaishou's recommendation system relies on lifelong user behavior modeling to extract interests from tens of thousands of historical video interactions for accurate CTR prediction.

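Motivation: Existing two‑stage models such as SIM (Search-based Interest Model) use inconsistent similarity metrics in their two stages: the General Search Unit (GSU), which performs coarse‑grained selection, and the Exact Search Unit (ESU), which performs fine‑grained attention. As a result, GSU can pass ESU items that ESU considers irrelevant while dropping items it would have weighted highly, which degrades performance.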

TWIN Algorithm: The proposed TWIN makes GSU and ESU use the same relevance metric, multi‑head target attention, so that the two stages behave like twins and the items GSU selects are the ones ESU would attend to.

Feature Splitting and Linear Mapping: To scale the attention mechanism from about 100 behaviors to 10 000–100 000, each behavior embedding is split into inherent features (properties of the video itself, shared across users) and user‑item cross features. The linear projections of inherent features are precomputed and served from lookup tables, while each cross feature is compressed to a single dimension.
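The split and the one-dimension-per-feature compression can be sketched as follows. This is a minimal illustration, not Kuaishou's implementation: the layout of the embedding (inherent part first, then equal-sized cross-feature chunks) and the names `split_behavior`, `compress_cross`, and `w_c` are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def split_behavior(
    embedding: Sequence[float], inherent_dim: int
) -> Tuple[List[float], List[float]]:
    """Split one behavior embedding into (inherent part, cross part).

    Assumed layout: the first `inherent_dim` values are inherent features,
    the remainder are user-item cross features.
    """
    return list(embedding[:inherent_dim]), list(embedding[inherent_dim:])


def compress_cross(
    cross: Sequence[float],
    feature_dim: int,
    w_c: Sequence[Sequence[float]],
) -> List[float]:
    """Map each cross feature (a chunk of `feature_dim` values) to one scalar.

    `w_c[j]` is the hypothetical learned weight vector for cross feature j.
    """
    scalars = []
    for j, weights in enumerate(w_c):
        chunk = cross[j * feature_dim:(j + 1) * feature_dim]
        scalars.append(sum(c * w for c, w in zip(chunk, weights)))
    return scalars


# One behavior: 4 inherent dims, then two cross features of 2 dims each.
inherent, cross = split_behavior([1.0, 2.0, 3.0, 4.0, 0.5, 1.5, 2.0, 1.0], 4)
compressed = compress_cross(cross, 2, [[1.0, 1.0], [0.5, -1.0]])
```

After compression, each behavior carries only one scalar per cross feature, so the per-request cost of the cross-feature term grows with the number of cross features rather than with their embedding width.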

Attention Mechanism in TWIN: GSU scores each behavior as the sum of two terms: (1) a scaled dot product between the linearly projected inherent features and the projected target query, and (2) a bias term built from the cross features, each compressed to one dimension and combined with a learned per‑feature weight. ESU then applies multi‑head target attention to the top‑100 selected behaviors using the same metric.
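The two-stage scoring can be sketched in a few lines. The sketch below assumes a single attention head, assumes the projections have already been applied (`k_h_proj`, `q_proj`), and assumes the score takes the form of a scaled dot product plus a cross-feature bias; all function and parameter names are illustrative rather than taken from the source.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def relevance_scores(k_h_proj, q_proj, k_c_scalars, beta) -> List[float]:
    """Score per behavior: <k_h, q> / sqrt(d) + <k_c, beta>.

    k_h_proj:    projected inherent features, one row per behavior
    q_proj:      projected target query
    k_c_scalars: compressed cross features, one row per behavior
    beta:        assumed learned weight per cross feature
    """
    scale = math.sqrt(len(q_proj))
    return [
        dot(kh, q_proj) / scale + dot(kc, beta)
        for kh, kc in zip(k_h_proj, k_c_scalars)
    ]


def gsu_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Coarse stage: indices of the k highest-scoring behaviors."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def esu_pool(values, scores, selected) -> List[float]:
    """Fine stage: softmax over the SAME scores, restricted to `selected`."""
    weights = softmax([scores[i] for i in selected])
    dim = len(values[0])
    return [
        sum(w * values[i][j] for w, i in zip(weights, selected))
        for j in range(dim)
    ]


k_h_proj = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [2.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]]
q_proj = [2.0, 0.0, 0.0, 0.0]
k_c_scalars = [[0.0], [3.0], [0.0], [0.0]]
beta = [0.5]

scores = relevance_scores(k_h_proj, q_proj, k_c_scalars, beta)
selected = gsu_top_k(scores, 2)
interest = esu_pool(k_h_proj, scores, selected)
```

Because `esu_pool` reuses the scores that `gsu_top_k` ranked by, the behaviors with the largest attention weights in the fine stage are by construction the ones the coarse stage kept.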

System Design: The solution comprises an online training pipeline that processes 46 billion user‑video interactions daily, an offline pre‑computation step that caches inherent feature mappings for 8 billion video IDs covering 97 % of traffic, and an online service that handles 30 million video requests per second by saving 99.3 % of computation through the lookup‑table strategy.
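The offline pre-computation and online lookup can be pictured with a small cache class. This is a toy sketch under stated assumptions: the class name `InherentProjectionCache`, the in-memory dictionary, and the behavior on a cache miss (project on the fly if an embedding is supplied, otherwise return `None`) are all hypothetical, since the source does not say how uncached videos are handled.

```python
from typing import Dict, List, Optional, Sequence

Matrix = Sequence[Sequence[float]]  # row-major, shape (in_dim, out_dim)


def project(x: Sequence[float], w: Matrix) -> List[float]:
    """Return x @ w for a row vector x."""
    return [
        sum(x[i] * w[i][j] for i in range(len(x)))
        for j in range(len(w[0]))
    ]


class InherentProjectionCache:
    """Caches projected inherent features keyed by video ID."""

    def __init__(self, w_h: Matrix) -> None:
        self._w_h = w_h
        self._table: Dict[int, List[float]] = {}
        self.hits = 0
        self.misses = 0

    def precompute(self, video_embeddings: Dict[int, Sequence[float]]) -> None:
        """Offline step: project every candidate video once."""
        for video_id, emb in video_embeddings.items():
            self._table[video_id] = project(emb, self._w_h)

    def lookup(
        self, video_id: int, fallback_embedding: Optional[Sequence[float]] = None
    ) -> Optional[List[float]]:
        """Online step: return the cached projection if present.

        Miss handling here is an assumption, not the documented behavior.
        """
        cached = self._table.get(video_id)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        if fallback_embedding is None:
            return None
        return project(fallback_embedding, self._w_h)


cache = InherentProjectionCache(w_h=[[1.0, 0.0], [0.0, 2.0]])
cache.precompute({7: [1.0, 1.0]})
hit = cache.lookup(7)
miss = cache.lookup(8)
computed = cache.lookup(8, fallback_embedding=[2.0, 0.0])
```

The point of the design is that the projection of an inherent feature depends only on the video, not on the user, so it can be computed once offline and reused across every request that touches that video.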

Experimental Results: Compared with state-of-the-art two‑stage methods, TWIN achieves a higher top‑100 hit rate, closer to the oracle, and online A/B tests show significant CTR improvements across multiple business modules, confirming the effectiveness of metric consistency and the proposed optimizations.
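A hit rate of this kind can be computed as the overlap between the coarse stage's top-k and the top-k that the fine stage's own scores would pick. The definition below is an assumption for illustration, since the summary does not spell out the exact formula; `hit_rate` and the toy scores are hypothetical.

```python
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def hit_rate(
    gsu_scores: Sequence[float], esu_scores: Sequence[float], k: int
) -> float:
    """Fraction of the oracle top-k (ranked by ESU scores) that GSU retrieved."""
    if k <= 0 or k > len(esu_scores):
        raise ValueError("k must be between 1 and the number of behaviors")
    oracle = set(top_k(esu_scores, k))
    retrieved = set(top_k(gsu_scores, k))
    return len(oracle & retrieved) / k


esu_scores = [0.9, 0.1, 0.8, 0.3, 0.7]         # fine-stage relevance
mismatched_gsu = [0.2, 0.9, 0.8, 0.7, 0.1]     # coarse stage using a different metric

consistent = hit_rate(esu_scores, esu_scores, 3)      # same metric in both stages
inconsistent = hit_rate(mismatched_gsu, esu_scores, 3)
```

When both stages rank by the same scores the overlap is complete, while a coarse stage with its own metric retrieves only part of the oracle set; the gap between the two is what metric consistency is meant to close.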

Tags: CTR prediction, recommendation system, user behavior modeling, attention mechanism, two-stage network, Kuaishou, feature splitting, TWIN