Embedding Techniques in Tencent Mobile News Recommendation System

This article reviews the practical use of embedding technologies in Tencent's mobile news recommendation pipeline, covering the fundamentals of embeddings, their historical development, item and image embeddings, user embeddings, various vector‑based recall methods, clustering strategies, and recent advances and challenges.

DataFunTalk

01 What is Embedding

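An embedding is a dense vector representation that replaces sparse one-hot encoding, so that natural language and other categorical data can be compared and computed on efficiently. Unlike one-hot codes, which are fixed and mutually orthogonal, embeddings are learned as neural network parameters. Their meaning is relative rather than absolute: it lies in how the vectors sit with respect to one another, not in the values of any single vector.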
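The contrast between the two encodings can be sketched in a few lines. This is a minimal illustration, not code from the talk: the three-word vocabulary and the hand-picked 3-dimensional vectors are assumptions, and in a real model the embedding table would be learned.

```python
# Toy vocabulary; words and vector values are made up for illustration.
vocab = ["news", "sports", "football"]
index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Sparse encoding: one dimension per vocabulary entry."""
    vec = [0.0] * len(vocab)
    vec[index[word]] = 1.0
    return vec


# In a real model this table is a learned parameter matrix (assumed values here).
embedding_table = [
    [0.9, 0.1, 0.0],  # news
    [0.1, 0.8, 0.5],  # sports
    [0.0, 0.9, 0.6],  # football
]


def embed(word):
    """Dense encoding: a row lookup, equivalent to one_hot(word) times the table."""
    return embedding_table[index[word]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (dot(a, a) ** 0.5 * dot(b, b) ** 0.5)


# One-hot vectors of different words are always orthogonal.
print(cosine(one_hot("sports"), one_hot("football")))
# Embeddings can place related words closer together than unrelated ones.
print(cosine(embed("sports"), embed("football")), cosine(embed("sports"), embed("news")))
```

With one-hot codes every pair of distinct words has similarity zero, so no notion of relatedness survives; with embeddings the similarity depends on the learned positions.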

02 Embedding Development Milestones

Embedding traces back to Hinton's 1986 proposal of distributed representations and reached industrial scale with word2vec. Along the way it grew from ideas rooted in implicit matrix factorization into a core component of modern recommendation pipelines, where it supports feature engineering, user profiling, and efficient nearest-neighbor search with tools such as Faiss.

03 Item Embedding

Items are vectorized from their text and images: text with word2vec-derived embeddings, images with ResNet-based embeddings. Text embeddings fall into two groups. Static embeddings (word2vec, fastText, GloVe) assign each word one fixed vector, while dynamic embeddings (ELMo, GPT, BERT) compute a word's vector from its context and therefore handle polysemy better.
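One common way to turn static word vectors into an item vector is to average the vectors of the words in the item's title or body. The sketch below assumes that averaging scheme (the source does not say which pooling is used), and the `word_vectors` table is a hypothetical stand-in for a trained word2vec or fastText model.

```python
# Hypothetical static word vectors; a trained model would supply these in practice.
word_vectors = {
    "apple": [1.0, 0.0],
    "phone": [0.0, 1.0],
    "fruit": [1.0, 1.0],
}


def mean_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


# Out-of-vocabulary tokens are skipped (an assumption of this sketch).
title_a = mean_vector(["apple", "phone"], word_vectors)
title_b = mean_vector(["apple", "fruit", "unknownword"], word_vectors)
print(title_a, title_b)
```

Note that "apple" contributes the same vector to both titles, whether it means the company or the fruit. That is the polysemy limitation of static embeddings that context-dependent models address.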

04 Image Embedding

Images are encoded via ResNet, image captioning, face recognition (FaceNet), OCR for comics, and style transfer. Lower-level CNN features are generic while higher-level features are task-specific, which motivates reusing pretrained lower layers and fine-tuning only the higher layers.

05 User Embedding

User vectors are first built as weighted sums over high-importance profile features (tags, media IDs, categories, topics). Later versions use DSSM and BERT+LSTM models, which place users and items in a shared vector space.
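The weighted-sum construction can be sketched as follows. The feature names, the weights, and the choice to divide by the total weight are assumptions made for illustration; the source only states that profile features are combined by weighted sums.

```python
def user_vector(profile, feature_vectors):
    """Weighted average of profile feature embeddings.

    profile: {feature_name: weight}, e.g. interest strength from the user profile.
    feature_vectors: {feature_name: embedding} for tags, media IDs, categories, topics.
    Features without an embedding are ignored (an assumption of this sketch).
    """
    dim = len(next(iter(feature_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for feature, weight in profile.items():
        vec = feature_vectors.get(feature)
        if vec is None:
            continue
        for d in range(dim):
            total[d] += weight * vec[d]
        weight_sum += weight
    if weight_sum == 0:
        return None
    return [x / weight_sum for x in total]


# Hypothetical feature embeddings and profile weights.
feature_vectors = {"tag:nba": [1.0, 0.0], "cat:finance": [0.0, 1.0]}
profile = {"tag:nba": 3.0, "cat:finance": 1.0, "tag:unseen": 5.0}
print(user_vector(profile, feature_vectors))
```

Because the user vector is built from the same embeddings that describe items, it can be compared with item vectors directly, which is what makes user-to-item recall possible.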

06 Embedding‑Based Recall

Once item and user vectors are available, they drive several recall strategies, most of which rely on a single embedding. i2i (item-to-item) recall uses fastText+Faiss, tag2vec, and media2vec; u2i (user-to-item) recall uses user2vec, DSSM, and cross-tag techniques.
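Both i2i and u2i recall reduce to the same operation: find the items whose vectors are closest to a query vector. The sketch below uses a brute-force cosine search in plain Python so that it runs anywhere; it is a stand-in for the approximate nearest-neighbor index (such as Faiss) used in production, and the item IDs and vectors are made up.

```python
import heapq
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k(query, item_vectors, k, exclude=()):
    """Return the IDs of the k items most cosine-similar to the query vector."""
    q = normalize(query)
    scored = []
    for item_id, vec in item_vectors.items():
        if item_id in exclude:
            continue
        v = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), item_id))
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Hypothetical item embeddings.
items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}

# i2i: the query is an item vector; the seed item itself is excluded.
similar_to_a = top_k(items["a"], items, k=1, exclude={"a"})
# u2i: the query is a user vector in the same space as the items.
for_user = top_k([0.1, 1.0], items, k=2)
print(similar_to_a, for_user)
```

A linear scan like this costs time proportional to the catalog size per query, which is why large systems replace it with an index that trades a little accuracy for speed.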

07 Incremental Clustering

Clusters are precomputed with K-means. Each new data point is assigned to its nearest centroid, and the centroids are periodically recomputed during low-traffic periods so that the clusters stay accurate.
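The assign-then-refresh loop can be sketched as below. The class name and its methods are hypothetical, and `refresh` performs only a single mean update per cluster rather than a full K-means rerun, which is a simplification of whatever the production job does.

```python
def nearest_centroid(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


class IncrementalClusters:
    """Hypothetical helper: centroids come from an offline K-means run."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]
        self.members = [[] for _ in centroids]

    def add(self, point):
        """Online path: assign a new point without moving any centroid."""
        idx = nearest_centroid(point, self.centroids)
        self.members[idx].append(list(point))
        return idx

    def refresh(self):
        """Offline path (e.g. low-traffic hours): recompute centroids as member means."""
        for i, pts in enumerate(self.members):
            if pts:
                dim = len(pts[0])
                self.centroids[i] = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]


clusters = IncrementalClusters([[0.0, 0.0], [10.0, 10.0]])
assignments = [clusters.add(p) for p in ([1.0, 1.0], [9.0, 9.0], [3.0, 1.0])]
clusters.refresh()
print(assignments, clusters.centroids)
```

Keeping centroids fixed between refreshes makes online assignment cheap and stable; the cost is that cluster centers drift from the data until the next refresh.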

08 Dynamic Rule Clustering

User interests are extracted, weighted, and merged into interest tags. Small clusters are then merged into similar larger ones, and the process repeats until cluster sizes are balanced; this improved CTR by about 3%.
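The merge step can be illustrated with a small loop that repeatedly folds the smallest cluster into its most similar neighbor. The cosine similarity measure, the `min_size` threshold, and the size-weighted centroid update are all assumptions of this sketch; the source does not specify the actual rules.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_small_clusters(clusters, min_size):
    """clusters: {name: (centroid, size)}. Returns a new dict after merging.

    While the smallest cluster is below min_size, merge it into the most
    similar remaining cluster (which is at least as large, by construction).
    """
    clusters = {name: (list(c), n) for name, (c, n) in clusters.items()}
    while len(clusters) > 1:
        name = min(clusters, key=lambda k: clusters[k][1])
        centroid, size = clusters[name]
        if size >= min_size:
            break
        del clusters[name]
        target = max(clusters, key=lambda k: cosine(centroid, clusters[k][0]))
        t_centroid, t_size = clusters[target]
        total = size + t_size
        # Size-weighted average keeps the merged centroid near the bulk of members.
        merged = [(c * size + t * t_size) / total for c, t in zip(centroid, t_centroid)]
        clusters[target] = (merged, total)
    return clusters


# Hypothetical interest-tag clusters with centroids and member counts.
tags = {
    "nba": ([1.0, 0.0], 90),
    "cba": ([0.9, 0.1], 10),
    "stocks": ([0.0, 1.0], 50),
}
balanced = merge_small_clusters(tags, min_size=20)
print(sorted(balanced), balanced["nba"][1])
```

Here the small "cba" cluster is absorbed by "nba" because their centroids point in nearly the same direction, while "stocks" is already above the threshold and stays untouched.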

09 Other Embedding Recall Algorithms

Deep neural network based recall (CNN, attention, YouTube-style models) combines multiple feature embeddings (category, media ID, topic, knowledge graph) into long vectors. Open challenges include the short lifecycle of news items, sparse samples, and embedding space optimization (residual embeddings, frequency-aware coding).
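Combining several feature embeddings into one long vector amounts to concatenating them in a fixed field order. The sketch below shows only that step; the field names, dimensions, lookup tables, and the zero-vector fallback for missing features are assumptions, and the network that would consume the long vector is out of scope.

```python
def concat_features(features, tables, dims):
    """Concatenate per-field embeddings into one long vector.

    features: {field: feature_id or None} for one item.
    tables:   {field: {feature_id: embedding}} lookup tables (hypothetical).
    dims:     {field: embedding_dim}; its insertion order fixes the layout.
    Missing or unknown features fall back to a zero vector (an assumption).
    """
    out = []
    for field, dim in dims.items():
        vec = tables[field].get(features.get(field))
        out.extend(vec if vec is not None else [0.0] * dim)
    return out


dims = {"category": 2, "media_id": 1, "topic": 2}
tables = {
    "category": {"sports": [0.2, 0.8]},
    "media_id": {"m42": [0.5]},
    "topic": {"nba_finals": [0.3, 0.7]},
}

# A fresh article with no topic assigned yet: the topic slot is zero-filled.
long_vector = concat_features({"category": "sports", "media_id": "m42"}, tables, dims)
print(long_vector)
```

The zero-filled slot hints at the sparse-sample problem: newly published news often lacks some features, so part of the long vector carries no signal until those features are available.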

Overall, embedding remains a powerful technique for recommendation. It has evolved from static word vectors to sophisticated end-to-end deep models, while work continues on incremental updates, multi-feature integration, and long-tail data.
