
Real-time Attention-based Look-alike Model for Recommender Systems

This talk presents a real-time attention-based look‑alike model (RALM) designed to address the long‑tail problem in recommendation systems by efficiently expanding seed users, leveraging user representation learning, attention mechanisms, and clustering to deliver timely, diverse content without retraining the model.

DataFunTalk

This article summarizes a presentation on a real-time attention-based look-alike model (RALM) proposed to solve the long-tail problem in large-scale recommendation systems such as the WeChat "Top Stories" (看一看) feed.

Background: Traditional CTR-based models struggle with the Matthew effect, where a small fraction of items receives most of the traffic and long-tail content remains under-exposed. A root cause is incomplete modeling of item behavior: feature abstraction discards much of the interaction signal that would let fresh and niche items compete.

Look-alike Idea: Replace items with a set of seed users who have historically interacted with the item, turning the problem into a user-to-user similarity task. Two classic variants exist: similarity-based (fast but less accurate) and regression-based (accurate but costly). Neither fits the real-time, high-throughput requirements of news feed recommendation.
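For intuition, the similarity-based variant can be approximated in a few lines: score a candidate user by cosine similarity against the mean embedding of the seed group. This is a minimal numpy sketch; `mean_seed_score` and the random embeddings are illustrative, not the production setup.

```python
import numpy as np

def mean_seed_score(target_emb, seed_embs):
    """Classic similarity-based look-alike: cosine similarity between
    the candidate user's embedding and the mean seed embedding."""
    centroid = seed_embs.mean(axis=0)
    denom = np.linalg.norm(target_emb) * np.linalg.norm(centroid)
    return float(target_emb @ centroid) / denom

rng = np.random.default_rng(0)
seeds = rng.normal(size=(100, 16))   # embeddings of 100 seed users
target = rng.normal(size=16)         # embedding of the candidate user
score = mean_seed_score(target, seeds)  # in [-1, 1]
```

This is fast (one dot product per candidate) but loses all per-seed structure, which is exactly the gap the attention mechanisms below address.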

Core Requirements:

Real‑time: new items must be served without retraining.

Efficiency: the model must improve long‑tail exposure while preserving CTR.

Speed: online inference must be low‑latency.

RALM Architecture:

1. User-to-user model: The item is represented by the embeddings of its seed users, turning the classic user-item CTR model into a user-user model.

2. Seeds representation: Captures both global information (common to the whole seed group) and local information (specific to the target user) using two attention mechanisms: global self-attention and target-aware multiplicative attention.

3. Real-time deployment: After offline training, only the local (target-to-seed) attention is computed online; global embeddings and cluster centroids are pre-computed.
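The two attention mechanisms above can be sketched as follows. This is a hedged numpy illustration: the weight matrices `W_g`, `v_g`, `W_l` and the exact scoring forms are assumptions standing in for the learned parameters, not the paper's precise parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_seed_rep(E, W_g, v_g):
    """Global self-attention: weights depend only on the seed matrix E,
    so this representation is target-independent and can be cached."""
    a = softmax(np.tanh(E @ W_g) @ v_g)   # one attention weight per seed
    return a @ E

def local_seed_rep(E, W_l, u):
    """Local, target-aware multiplicative attention: weights depend on
    the candidate user u and must be computed per request."""
    a = softmax(E @ W_l @ u)
    return a @ E

rng = np.random.default_rng(1)
E = rng.normal(size=(50, 8))              # 50 seed-user embeddings
u = rng.normal(size=8)                    # candidate-user embedding
g = global_seed_rep(E, rng.normal(size=(8, 8)), rng.normal(size=8))
loc = local_seed_rep(E, rng.normal(size=(8, 8)), u)
```

The global representation summarizes what the seed group has in common; the local one emphasizes the seeds most relevant to the particular candidate, which is why only the local part must run online.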

Offline Training consists of two stages:

User Representation Learning: Inspired by YouTube's multi-domain representation model, it merges heterogeneous user features via a self-attention merge layer, producing high-order user embeddings.
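The merge layer can be sketched as per-domain attention weights replacing plain concatenation, so one strongly predictive feature domain cannot drown out the weaker ones. An illustrative numpy sketch; `attention_merge` and its scoring form are assumptions, not the exact layer from the talk.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_merge(H, W):
    """Merge per-domain embeddings H (n_domains x d) with learned
    attention weights instead of concatenating them directly."""
    a = softmax(np.tanh(H @ W).sum(axis=1))   # one weight per domain
    return (a[:, None] * H).sum(axis=0)       # weighted merge, shape (d,)

rng = np.random.default_rng(2)
H = rng.normal(size=(6, 32))   # e.g. interests, demographics, history, ...
merged = attention_merge(H, rng.normal(size=(32, 32)))
```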

Look-alike Learning: Uses the user embeddings from the first stage; seed users are clustered (k-means) to reduce computation, then local and global attentions are learned to predict similarity scores.
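Clustering keeps the attention cost bounded: instead of attending over possibly millions of raw seed users, the model attends over k centroids. A minimal k-means sketch over seed embeddings (illustrative only; a production system would use an optimized library such as scikit-learn):

```python
import numpy as np

def kmeans_centroids(X, k, iters=20, seed=0):
    """Plain k-means over seed-user embeddings; downstream attention
    then runs over the k centroids instead of every seed."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                 # nearest centroid per seed
        for j in range(k):
            members = X[labels == j]
            if len(members):
                C[j] = members.mean(axis=0)       # recenter each cluster
    return C

rng = np.random.default_rng(3)
seeds = rng.normal(size=(1000, 16))               # seed-user embeddings
centroids = kmeans_centroids(seeds, k=20)
```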

System Architecture:

Offline training produces user embeddings and look‑alike model parameters, cached in a KV store.

Asynchronous online processing updates seed‑user lists, global embeddings, and cluster centroids on a minute‑level schedule.

Online service retrieves the target user embedding, the pre‑computed global embeddings and cluster centers, computes the local attention on‑the‑fly, and returns a cosine similarity score used as a feature for the downstream CTR ranker.
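Putting the online path together: per request, only the local attention over the k cached centroids is computed, while the global embedding and centroids come straight from the KV store. In this sketch the combination weights `alpha`/`beta` and the parameter matrix `W_l` are illustrative assumptions, not the deployed values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def online_score(u, centroids, global_emb, W_l, alpha=0.3, beta=0.7):
    """Per-request work is only the local attention over k centroids;
    centroids and global_emb are pre-computed and cached."""
    a = softmax(centroids @ W_l @ u)      # target-aware attention weights
    local_emb = a @ centroids
    return alpha * cosine(u, global_emb) + beta * cosine(u, local_emb)

rng = np.random.default_rng(4)
u = rng.normal(size=16)                   # target-user embedding
centroids = rng.normal(size=(20, 16))     # cached cluster centers
global_emb = rng.normal(size=16)          # cached global seed embedding
score = online_score(u, centroids, global_emb, rng.normal(size=(16, 16)))
```

The resulting score is the similarity feature handed to the downstream CTR ranker.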

Experimental Results: Offline evaluation and online A/B tests showed stable or slightly improved CTR while significantly increasing diversity and long-tail exposure. The similarity scores from RALM were used either as a hard exposure threshold or as an additional feature for the CTR model.

Additional Details:

Attention‑based merge layer mitigates the imbalance between strong and weak feature domains.

Dropout and a simple stacking of multi‑domain user embeddings prevent over‑fitting.

Cold‑start items are initially served by a linear model based on semantic features until enough seed users are collected.

Q&A Highlights:

RALM can be used in the recall stage with a configurable exposure threshold or as an extra signal for the CTR model.

End‑to‑end training is difficult due to model size and the need for diverse domain‑specific user representations.

Reference: "Real-time Attention Based Look-alike Model for Recommender System" (arXiv:1906.05022).

Tags: real-time, clustering, attention, recommender systems, long-tail, look-alike, user representation
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
