
Content Understanding for Personalized Feed Recommendation: Interest Graph and Techniques

This article explains how Tencent tackles content understanding for personalized feed recommendation by combining traditional classification, keyword, and entity methods with deep-learning embeddings. It introduces an interest graph composed of taxonomy, concept, entity, and event layers to capture the full context of an interest point and infer why users consume content.

DataFunTalk

In modern feed recommendation, content understanding consists of two main parts: legacy technologies from the portal and search eras (classification, keywords, knowledge graphs) and deep‑learning‑driven embeddings. While classification is coarse and embeddings lack interpretability, Tencent proposes a solution that overcomes these issues.

1. Evolution of Content Understanding

The portal era (1995‑2002) relied on manually curated content types and later automated text classification. The search/social era (2003‑present) added keyword extraction and knowledge graphs to resolve entity ambiguity. The intelligent era (2012‑present) introduced personalized recommendation, demanding richer content understanding.

2. Recommendation vs. Search

Search sorts documents by the intersection of query terms, preserving full context. Recommendation sorts by the union of user interest terms, which can lose the contextual relationship between terms (e.g., "Wang Baoqiang" and "Ma Rong" become separate interests). Therefore, recommendation requires preserving the complete context of an interest point.
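This intersection-versus-union distinction can be made concrete with a minimal sketch (the terms and documents here are illustrative, not production data):

```python
# Contrast how search and recommendation match a document against terms.

def search_match(query_terms, doc_terms):
    """Search: a document must cover the query terms together,
    so the co-occurrence context of the terms is preserved."""
    return set(query_terms) <= set(doc_terms)

def feed_match(interest_terms, doc_terms):
    """Recommendation: a document matching ANY single interest term can
    be recalled, so terms from one event drift apart into isolated interests."""
    return bool(set(interest_terms) & set(doc_terms))

doc = ["Wang Baoqiang", "Ma Rong", "divorce"]
# Search keeps "Wang Baoqiang" and "Ma Rong" bound together in one query:
print(search_match(["Wang Baoqiang", "Ma Rong"], doc))  # True

# A feed treats them as independent interests: a document mentioning only
# "Ma Rong" in an unrelated context still matches the user's interest set.
print(feed_match(["Wang Baoqiang", "Ma Rong"], ["Ma Rong", "fashion"]))  # True
```

The second call illustrates the context loss: the union-based match fires even though the retrieved document has nothing to do with the event that created the interest.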

3. Why Users Consume Content

Traditional methods answer "what the article is" but ignore "why a user consumes it". Understanding the underlying intent (e.g., brand preference, safety concerns) is essential for effective recommendation.

4. Limitations of Traditional NLP Techniques

Classification: coarse granularity, limited to thousands of categories.

Keyword extraction: massive scale but suffers from ambiguity.

Entity words: precise but can create filter bubbles.

LDA: similar granularity issues as classification.

Embedding: unlimited scale but hard to interpret.

5. Interest Graph

The interest graph consists of four layers:

Category layer – a strict tree built by product managers (~1,000 nodes).

Concept layer – groups of entities sharing attributes (e.g., "fuel‑efficient cars").

Entity layer – knowledge‑graph entities such as "Liu Dehua".

Event layer – specific events like "Wang Baoqiang divorce".

This structure captures both operational needs (category layer) and reasoning about user intent (concept layer), while entities and events provide fine‑grained recall.
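One way to picture the four layers is as a single node store with layer tags and cross-layer edges. The sketch below is a toy illustration of that shape; the node names and links are invented, not Tencent's actual schema:

```python
from collections import defaultdict

class InterestGraph:
    """Toy four-layer interest graph: nodes carry a layer tag,
    edges link nodes within and across layers."""
    LAYERS = ("category", "concept", "entity", "event")

    def __init__(self):
        self.layer_of = {}             # node name -> layer tag
        self.edges = defaultdict(set)  # node name -> linked node names

    def add_node(self, name, layer):
        assert layer in self.LAYERS
        self.layer_of[name] = layer

    def link(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors(self, node, layer=None):
        """Cross-layer lookup, e.g. which concepts an entity belongs to."""
        return {n for n in self.edges[node]
                if layer is None or self.layer_of[n] == layer}

g = InterestGraph()
g.add_node("automobile", "category")
g.add_node("fuel-efficient cars", "concept")
g.add_node("Civic", "entity")
g.link("fuel-efficient cars", "automobile")
g.link("Civic", "fuel-efficient cars")
print(g.neighbors("Civic", layer="concept"))  # {'fuel-efficient cars'}
```

The cross-layer lookup is what enables intent reasoning: from a consumed entity, walk up to its concepts to infer the underlying interest, then back down to recall fresh entities and events under the same concept.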

6. Concept Mining

Concepts are short phrases lacking labeled training data, so a weak-supervision approach is used: search click data provides weak labels, and UGC data helps determine the appropriate granularity.

7. Hot Event Mining

Queries with bursty search volume indicate hot events. A DTW‑based similarity to a predefined trend template identifies bursts, followed by clustering similar queries into topics and filtering non‑event topics using URL‑based features.
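The burst-detection step can be sketched with a classic DTW distance against a hand-made spike template. The template shape and threshold below are illustrative assumptions, not the production values:

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def normalize(xs):
    hi = max(xs) or 1.0
    return [x / hi for x in xs]

# Predefined trend template: flat baseline, sharp spike, decay --
# the characteristic shape of a hot event's search volume.
TEMPLATE = [0.1, 0.1, 0.1, 1.0, 0.7, 0.4, 0.2]

def is_bursty(daily_counts, threshold=1.5):
    """Flag a query whose volume curve warps closely onto the template."""
    return dtw_distance(normalize(daily_counts), TEMPLATE) < threshold

print(is_bursty([10, 12, 11, 480, 300, 150, 60]))    # spike -> True
print(is_bursty([100, 98, 103, 101, 99, 102, 100]))  # flat  -> False
```

DTW rather than pointwise distance is the sensible choice here because real bursts peak on different days and at different speeds; warping aligns the spike to the template regardless of its exact position.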

8. Association Relationships

Entity co‑occurrence and sequential search behavior provide positive samples; random negative sampling yields a 1:3 ratio. Pairwise loss trains entity embeddings, enabling association scoring even for rarely co‑occurring pairs.
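A minimal numpy sketch of this training setup, using a pairwise hinge (margin) loss over one positive pair and its negatives. The entities, margin, learning rate, and fixed negative list are all illustrative stand-ins; the talk only specifies the signal sources and the roughly 1:3 positive-to-negative ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
entities = ["Wang Baoqiang", "Ma Rong", "Civic", "fuel price"]
idx = {e: i for i, e in enumerate(entities)}
E = rng.normal(scale=0.1, size=(len(entities), 16))  # embedding table

def score(a, b):
    """Association score: dot product of entity embeddings."""
    return float(E[idx[a]] @ E[idx[b]])

def train_step(anchor, pos, neg, margin=1.0, lr=0.05):
    """Pairwise hinge loss: require score(anchor, pos) > score(anchor, neg) + margin."""
    if score(anchor, pos) - score(anchor, neg) >= margin:
        return  # constraint satisfied, zero loss, no update
    a, p, n = idx[anchor], idx[pos], idx[neg]
    # Negative gradients of max(0, margin - s_pos + s_neg) per embedding.
    ga, gp, gn = E[p] - E[n], E[a].copy(), -E[a].copy()
    E[a] += lr * ga
    E[p] += lr * gp
    E[n] += lr * gn

# One positive pair trained against three sampled negatives (the 1:3 ratio).
negatives = ["Civic", "fuel price", "Civic"]
for _ in range(200):
    for neg in negatives:
        train_step("Wang Baoqiang", "Ma Rong", neg)

print(score("Wang Baoqiang", "Ma Rong") > score("Wang Baoqiang", "Civic"))  # True
```

Because scoring happens in embedding space rather than on raw counts, two entities that rarely co-occur can still receive a high association score if their neighborhoods place them close together.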

9. Content Understanding Components

9.1 Text Classification

PM‑defined taxonomy is refined using user click clustering and subsequent PM labeling.

9.2 Keyword Extraction

Traditional features + GBRank are used, followed by a re‑ranking layer that incorporates association‑relationship embeddings to demote misleading high‑score terms.
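The re-ranking idea can be sketched by blending a first-stage score (a stand-in for the GBRank output) with the association-embedding similarity between each candidate term and the document's core entity. All vectors and weights below are toy values, not the production model:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy association embeddings (in practice, from the pairwise-trained model).
emb = {
    "Wang Baoqiang": [1.0, 0.1],
    "Ma Rong":       [0.9, 0.2],  # strongly associated with the core entity
    "stock market":  [0.0, 1.0],  # high first-stage score, weak association
}

def rerank(core_entity, candidates, alpha=0.5):
    """Blend the first-stage score with association similarity, so that
    misleading high-score terms unrelated to the core entity get demoted."""
    out = []
    for term, base_score in candidates:
        sim = 1.0 if term == core_entity else cosine(emb[core_entity], emb[term])
        out.append((term, alpha * base_score + (1 - alpha) * sim))
    return sorted(out, key=lambda t: -t[1])

cands = [("stock market", 0.9), ("Ma Rong", 0.8)]
print(rerank("Wang Baoqiang", cands))  # "Ma Rong" now outranks "stock market"
```

Despite its lower first-stage score, the associated term wins after blending, which is exactly the demotion behavior the re-ranking layer is meant to provide.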

9.3 Semantic Matching

Concept and event tags are retrieved via a two‑stage recall (relationship recall from the interest graph and semantic vector recall) and then ranked using interaction‑based features.
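The two recall channels and the ranking step can be sketched as follows. The graph, tag vectors, and the cosine-based "ranker" are toy stand-ins (the real system uses interaction-based features for ranking):

```python
import math

# Channel 1: interest-graph relationships (entity -> concept tags).
graph = {"Civic": {"fuel-efficient cars"}}

# Channel 2: semantic vectors for candidate tags.
tag_vecs = {"fuel-efficient cars": [0.9, 0.1],
            "luxury cars":         [0.1, 0.9]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def recall_tags(doc_entities, doc_vec, k=2):
    """Union of relationship recall and top-k semantic vector recall."""
    cands = set()
    for e in doc_entities:                      # (1) relationship recall
        cands |= graph.get(e, set())
    by_sim = sorted(tag_vecs, key=lambda t: -cosine(doc_vec, tag_vecs[t]))
    cands |= set(by_sim[:k])                    # (2) vector recall
    return cands

def rank(cands, doc_vec):
    """Stand-in ranker: order candidates by similarity to the document."""
    return sorted(cands, key=lambda t: -cosine(doc_vec, tag_vecs[t]))

tags = rank(recall_tags({"Civic"}, [0.8, 0.2], k=1), [0.8, 0.2])
print(tags[0])  # 'fuel-efficient cars'
```

Running both channels and ranking their union hedges each channel's weakness: the graph recalls tags for entities the vector model represents poorly, while vector recall covers documents whose entities are missing from the graph.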

10. Online Results

Adding concept and event layers to the baseline (which only used entities and categories) yields a significant lift in key metrics, confirming the effectiveness of the proposed interest‑graph‑based content understanding.

Overall, the talk demonstrates how a multi‑layer interest graph combined with weak‑supervision mining and embedding‑based association can overcome the shortcomings of traditional NLP techniques and substantially improve personalized recommendation performance.

Tags: Personalization · embedding · recommendation systems · NLP · Content Understanding · interest graph
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.