WeChat "Look" Content Recall Architecture and Deep Learning Techniques
This article details the technical architecture behind WeChat's "Look" content recall, covering content sourcing, profiling, multimodal tagging, knowledge‑graph representations, propensity and target detection, multi‑stage recall pipelines, and a range of deep learning models including sequence, translation, BERT, dual‑tower, hybrid, and graph neural network approaches.
WeChat's "Look" feature serves as a major content recommendation product, aggregating vast amounts of user‑generated and external media to provide personalized information streams. By leveraging implicit user feedback and extensive content metadata, the system builds detailed content profiles and multimodal tags for text, images, and video.
Content profiling involves normalizing diverse data sources, extracting static attributes, and aligning external tags with internal taxonomies. Content understanding is defined along two dimensions: attributes derived from the content itself and signals derived from user behavior, which together enable fine-grained tagging via topic models, tag clustering, and entity extraction with deep models (BiLSTM+Attention+CRF, BERT).
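To make the BiLSTM+Attention+CRF pipeline concrete, the sketch below shows only the final CRF decoding step: given per-token emission scores (which a BiLSTM would produce) and tag-to-tag transition scores, Viterbi search recovers the best BIO tag sequence. The scores and the three-tag scheme are invented for illustration, not taken from the article.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence given per-token emission
    scores (T x K) and tag-to-tag transition scores (K x K)."""
    T, K = emissions.shape
    score = emissions[0].copy()            # best score ending in each tag
    backptr = np.zeros((T, K), dtype=int)  # back-pointers for reconstruction
    for t in range(1, T):
        # total[j, k] = score of ending at tag j then moving to tag k
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # follow back-pointers from the best final tag
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

# toy example: 3 tokens, tags {O=0, B-ENT=1, I-ENT=2}; all scores made up
emissions = np.array([[0.1, 2.0, 0.0],
                      [0.2, 0.1, 1.5],
                      [1.0, 0.3, 0.2]])
transitions = np.array([[0.5, 0.1, -2.0],   # O  -> {O, B, I}; O->I discouraged
                        [0.0, -1.0, 1.0],   # B  -> I encouraged
                        [0.3, 0.0, 0.5]])   # I  -> {O, B, I}
print(viterbi_decode(emissions, transitions))  # → [1, 2, 0], i.e. B-ENT I-ENT O
```

The transition matrix is what the CRF layer contributes beyond per-token classification: it lets the decoder rule out invalid sequences such as an I-ENT that does not follow a B-ENT.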
Multimedia understanding incorporates facial recognition, video embeddings, OCR, and quality assessments, enriching the feature set for downstream recall and filtering models. Knowledge graphs integrate keyword relationships and user‑behavior‑derived entities to form an interpretable knowledge base for cold‑start and interest‑expansion scenarios.
Embedding techniques transform textual and visual signals into dense vectors at word, sentence, and document levels, while sequence modeling treats user click histories as natural language corpora, applying Word2Vec, RNN, Transformer, and seq2seq models to predict future interests.
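"Click histories as natural language corpora" means each user's click sequence is treated as a sentence and item IDs as words. A minimal sketch of the data-preparation step, generating the (center, context) pairs that skip-gram Word2Vec trains on; the session data and item IDs are hypothetical.

```python
def skipgram_pairs(click_sessions, window=2):
    """Turn per-user click sequences into (center, context) training pairs,
    exactly as Word2Vec treats words within a sentence window."""
    pairs = []
    for session in click_sessions:
        for i, center in enumerate(session):
            lo, hi = max(0, i - window), min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, session[j]))
    return pairs

# two users' click histories over made-up article IDs
sessions = [["a12", "a7", "a33", "a7"],
            ["a33", "a5"]]
pairs = skipgram_pairs(sessions, window=1)
print(len(pairs), pairs[:3])
```

Items that co-occur in many users' sessions end up with nearby embeddings, which is what makes the vectors usable for similarity-based recall.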
Propensity detection tags content with demographic and regional tendencies, and target detection assigns deployment goals (e.g., click‑rate, watch‑time) for various feed streams, feeding both online recall and content library construction.
The recall architecture follows a multi‑path design, combining model‑based, similarity‑based, attribute‑based, social, exploratory, and operational strategies. Queue evolution progressed from attribute‑driven recall to collaborative/social, then to exploratory and deep‑model recall, each addressing diversity, robustness, and cold‑start challenges.
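The article does not specify how the parallel recall paths are combined, but a common pattern is a round-robin merge with deduplication so that no single path dominates the candidate set. A sketch under that assumption, with invented path names and item IDs:

```python
from itertools import zip_longest

def merge_recall_queues(queues, limit=10):
    """Interleave candidates from several recall paths round-robin,
    dropping duplicates, until `limit` items are collected."""
    seen, merged = set(), []
    for group in zip_longest(*queues.values()):
        for item in group:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
                if len(merged) == limit:
                    return merged
    return merged

# hypothetical outputs of three recall paths for one user
queues = {
    "model":   ["v1", "v2", "v3"],
    "similar": ["v2", "v4"],
    "social":  ["v5", "v1", "v6"],
}
print(merge_recall_queues(queues, limit=5))  # → ['v1', 'v2', 'v5', 'v4', 'v3']
```

Interleaving rather than concatenating is one simple way to preserve the diversity that the multi-path design is meant to provide.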
Deep‑model recall is categorized into sequence models, translation models, BERT models, dual‑tower (DSSM, Multi‑View DNN) models, hybrid deep neural networks, and graph models. Each class leverages specific strengths: sequence models capture long‑term interests, translation models enhance diversity, BERT provides bidirectional context, dual‑tower models enable efficient user‑item similarity search, hybrid models fuse heterogeneous features, and graph neural networks exploit the inherent graph structure of user‑item interactions.
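The dual-tower idea can be sketched in a few lines of NumPy: a user tower and an item tower project different feature spaces into one shared embedding space, item vectors are precomputed offline, and online recall reduces to an inner-product top-k search (approximated in production by an ANN index). The dimensions, weights, and single-layer towers below are illustrative assumptions, not WeChat's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def tower(x, W):
    """One-layer 'tower': linear projection + tanh + L2 normalization,
    so that inner product equals cosine similarity."""
    h = np.tanh(x @ W)
    return h / np.linalg.norm(h, axis=-1, keepdims=True)

# hypothetical dims: 16-d user features, 24-d item features, 8-d shared space
W_user, W_item = rng.normal(size=(16, 8)), rng.normal(size=(24, 8))
user_vec  = tower(rng.normal(size=(1, 16)), W_user)      # computed online
item_vecs = tower(rng.normal(size=(1000, 24)), W_item)   # precomputed index

# recall = top-k items by inner product with the user vector
scores = (item_vecs @ user_vec.T).ravel()
top_k = np.argsort(-scores)[:5]
print(top_k, scores[top_k])
```

The key property is that the two towers never interact until the final dot product, which is what allows the item side to be indexed ahead of time and searched efficiently.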
Graph‑based approaches progress from shallow embeddings (DeepWalk, LINE, node2vec) to GraphSAGE, GAT, and multi‑task GAT, scaling to billions of nodes and edges while incorporating attention mechanisms and multi‑objective learning.
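The "shallow embedding" stage is easy to illustrate: DeepWalk samples truncated random walks over the interaction graph and then feeds each walk to Word2Vec as if it were a sentence. Below is a minimal walk-sampling sketch over a tiny, made-up bipartite user-item graph.

```python
import random

def random_walks(adj, walk_len=5, walks_per_node=2, seed=42):
    """Generate truncated random walks (the DeepWalk corpus); each walk
    is later given to Word2Vec exactly like a sentence of node IDs."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                neighbors = adj[node]
                if not neighbors:
                    break           # dead end: truncate the walk
                node = rng.choice(neighbors)
                walk.append(node)
            walks.append(walk)
    return walks

# tiny bipartite user-item interaction graph (IDs are hypothetical)
adj = {"u1": ["i1", "i2"], "u2": ["i2"],
       "i1": ["u1"],       "i2": ["u1", "u2"]}
walks = random_walks(adj)
print(len(walks), walks[0])
```

GraphSAGE and GAT replace this two-step pipeline with end-to-end neighbor aggregation, which is what lets them incorporate node features and, in the multi-task GAT variant, multiple training objectives.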
Overall, the system continuously iterates on model architectures, feature engineering, and deployment pipelines to balance accuracy, diversity, and latency, while maintaining extensibility for new data sources and business scenarios.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.