Model‑Based Recall in Momo's Social Recommendation: Technical Exploration and Practical Applications
This article presents a comprehensive technical overview of Momo's model‑based recall system for social recommendation, detailing the underlying user‑scenario behavior models, social graph embeddings, multimodal content semantics, and deployment results that improve matching relevance and user interaction rates.
The article introduces Momo's social recommendation platform and explains why model‑based recall is crucial for the "Nearby Moments" and "Nearby People" scenarios, which must serve both content consumption and social matching.
It then describes the basic concepts of model‑based recall, contrasting traditional functional, hot (popularity‑based), and business recall with modern embedding‑based approaches such as shallow models, deep matching models, and semantic models, and outlines the two‑stage framework of offline model training followed by online approximate nearest neighbor (ANN) retrieval.
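The two-stage framework can be illustrated with a minimal sketch. It assumes the item and user embeddings have already been produced by some offline model (the vectors and the names `build_index` and `retrieve` are hypothetical), and it uses an exact cosine-similarity scan as a stand-in for a real ANN index.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def build_index(item_embeddings):
    """Offline stage: store normalized item vectors (assumed model output)."""
    return {item_id: normalize(vec) for item_id, vec in item_embeddings.items()}

def retrieve(index, user_vec, k):
    """Online stage: exact top-k by cosine similarity.
    A production system would replace this linear scan with an ANN index."""
    query = normalize(user_vec)
    scored = [
        (sum(a * b for a, b in zip(query, vec)), item_id)
        for item_id, vec in index.items()
    ]
    # Highest score first; ties broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:k]]

# Hypothetical toy embeddings, not taken from the article.
item_embeddings = {
    "post_a": [1.0, 0.0],
    "post_b": [0.8, 0.6],
    "post_c": [0.0, 1.0],
}
index = build_index(item_embeddings)
top2 = retrieve(index, [0.9, 0.1], k=2)
```

Keeping the index build separate from the query path mirrors the offline/online split: the index can be rebuilt on a schedule while the online path only performs lookups.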
The technical evolution section reviews recent industry advances (e.g., DSSM, YouTube‑DNN, MIND) and highlights Momo's own progress from 2013 to 2020, emphasizing improvements in representation power and generalization.
In the application part, the article details four major recall channels: (1) similarity‑based content recall (I2I), (2) direct preference recall (U2I), (3) similar‑preference user recall (U2U2I), and (4) social‑matching recall (U2U). Each channel leverages user interaction sequences, profile embeddings, and social graph representations.
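As one possible reading of the U2U2I channel, the sketch below first finds the users most similar to a target user and then recalls items those users interacted with. The scoring rule (sum of neighbor similarities), the function name `u2u2i_recall`, and all data are assumptions for illustration, not the article's confirmed method.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def u2u2i_recall(target, user_vecs, interactions, n_users, k):
    """Recall items via similar users (user -> user -> item).

    1. Rank other users by dot-product similarity to the target.
    2. Score each unseen item by the summed similarity of the
       neighbors who interacted with it.
    """
    sims = sorted(
        ((dot(user_vecs[target], vec), uid)
         for uid, vec in user_vecs.items() if uid != target),
        key=lambda pair: (-pair[0], pair[1]),
    )[:n_users]

    seen = interactions.get(target, set())
    scores = {}
    for sim, uid in sims:
        for item in interactions.get(uid, set()):
            if item not in seen:  # skip items the target already consumed
                scores[item] = scores.get(item, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]

# Hypothetical toy data.
user_vecs = {
    "u1": [1.0, 0.0],
    "u2": [0.9, 0.1],
    "u3": [0.0, 1.0],
    "u4": [0.7, 0.3],
}
interactions = {
    "u1": {"i1"},
    "u2": {"i1", "i2"},
    "u3": {"i3"},
    "u4": {"i2", "i4"},
}
recalled = u2u2i_recall("u1", user_vecs, interactions, n_users=2, k=5)
```

Here `u2` and `u4` are the nearest users to `u1`, so their unseen items are returned, with the item both of them touched ranked first.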
User preference modeling combines long‑term and short‑term behavior sequences via Transformer‑based multi‑head attention, and employs a weighted‑hinge loss with batch‑negative and global‑negative sampling to improve training.
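The exact form of the weighted hinge loss is not given here, so the following is only a sketch under stated assumptions: each training row has a hypothetical per-sample weight, batch negatives are the positive items of the other rows in the batch, and global negatives are item vectors sampled from the whole catalog.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_hinge_loss(user_vecs, pos_vecs, weights, global_negs, margin=1.0):
    """Mean weighted hinge loss over all (row, negative) pairs.

    Negatives for row i are the positives of the other rows
    (batch negatives) plus every vector in global_negs.
    """
    total, count = 0.0, 0
    for i, (user, pos) in enumerate(zip(user_vecs, pos_vecs)):
        pos_score = dot(user, pos)
        batch_negs = [v for j, v in enumerate(pos_vecs) if j != i]
        for neg in batch_negs + global_negs:
            # Penalize negatives that score within `margin` of the positive.
            total += weights[i] * max(0.0, margin - pos_score + dot(user, neg))
            count += 1
    return total / count if count else 0.0

# Hypothetical toy batch of two (user, positive item) pairs.
loss = weighted_hinge_loss(
    user_vecs=[[1.0, 0.0], [0.0, 1.0]],
    pos_vecs=[[1.0, 0.0], [0.0, 1.0]],
    weights=[1.0, 2.0],
    global_negs=[[0.5, 0.5]],
    margin=1.0,
)
```

In this toy batch the batch negatives are already separated by the full margin and contribute nothing, so only the global negative produces a loss, which hints at why mixing both negative sources can be useful.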
For social matching, a full‑platform user graph is built with edges for friendships, chats, and blacklist relations; Graph Convolutional Networks (GCN) learn node embeddings, and virtual edges are added based on shared interests to enhance generalization.
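A simplified sketch of the two ideas above: adding virtual edges between users with shared interests, then running one graph propagation step. It makes several assumptions: the shared-interest threshold `min_shared` is hypothetical, edge types (friendship, chat, blacklist) are not distinguished, and the propagation is plain mean aggregation with self-loops rather than a full GCN layer with learned weights and a nonlinearity.

```python
def add_virtual_edges(edges, interests, min_shared):
    """Return the real edges plus virtual edges between users who
    share at least `min_shared` interests."""
    result = {frozenset(edge) for edge in edges}
    users = sorted(interests)
    for i, a in enumerate(users):
        for b in users[i + 1:]:
            if len(interests[a] & interests[b]) >= min_shared:
                result.add(frozenset((a, b)))
    return result

def propagate(features, edges):
    """One mean-aggregation step over each node's neighborhood,
    including the node itself (self-loop)."""
    neighbors = {u: {u} for u in features}
    for edge in edges:
        a, b = tuple(edge)
        neighbors[a].add(b)
        neighbors[b].add(a)
    dim = len(next(iter(features.values())))
    return {
        u: [sum(features[v][d] for v in ns) / len(ns) for d in range(dim)]
        for u, ns in neighbors.items()
    }

# Hypothetical toy graph: one real edge, three users.
features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
real_edges = [("u1", "u2")]
interests = {
    "u1": {"hiking", "music"},
    "u2": {"film"},
    "u3": {"hiking", "music", "film"},
}
all_edges = add_virtual_edges(real_edges, interests, min_shared=2)
hidden = propagate(features, all_edges)
```

Without the virtual edge, `u3` would have no neighbors and its representation would never mix with the rest of the graph; the shared-interest edge to `u1` gives it a path for information to flow.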
Dynamic content semantics are captured by a multimodal model that encodes text (e.g., Bi‑LSTM or Transformer) and images (ResNet), integrates them through an interactive attention module, and aligns text‑image representations to reflect real‑time publishing intent.
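The interactive attention module is not specified in detail, so the sketch below shows only one generic way text and image representations can interact: the image vector acts as a query over text token vectors, and the attended text summary is concatenated with the image vector. The toy vectors stand in for real encoder outputs, and the `fuse` function is a hypothetical name.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    """Dot-product attention: weight each key by its similarity to
    the query, then return the weighted sum and the weights."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(keys[0])
    pooled = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return pooled, weights

def fuse(text_tokens, image_vec):
    """Image-guided pooling of text tokens, then concatenation."""
    text_summary, weights = attend(image_vec, text_tokens)
    return text_summary + image_vec, weights

# Hypothetical encoder outputs: two text tokens and one image vector.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_vec = [2.0, 0.0]
fused, attn = fuse(text_tokens, image_vec)
```

The token that aligns with the image receives most of the attention weight, so the fused vector emphasizes the part of the text that the image supports.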
Extensive offline experiments and online A/B tests show that the model‑based recall pipelines raise both interaction conversion rates and social matching rates by more than 10%.
The article concludes with a forward‑looking discussion, advocating a problem‑driven rather than purely model‑driven approach for future recommendation research.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.