
Intelligent Entity Recommendation in Search Scenarios: Architecture, Relevance, Sparse Data Recall, and Multi‑Domain Strategies

This article presents a comprehensive overview of intelligent entity recommendation for search, covering scenario introduction, relevance modeling, handling sparse query and entity data with graph‑based methods, and multi‑domain, multi‑scenario ranking techniques to improve user experience.

Guest Speaker: Chen Xi, Researcher at Tencent
Editor: Wu Xiao, Southeast University
Platform: DataFunTalk

Overview: The talk is divided into four parts: scenario introduction and overview, entity recommendation relevance, sparse‑data entity recall, and multi‑domain multi‑scenario entity recommendation.

01. Scenario Introduction and Overview

The recommendation system aims to return entities related to the user query and present them across different dimensions.

General Recommendation: For a query like "Andy Lau", recommend related persons and movies.

Vertical Domain Recommendation: For a query in the film domain, recommend related TV series and co‑actors; for a query in the novel domain, recommend similar novels or works by the same author.

Product Technical Framework

The framework consists of four layers: basic data, underlying capabilities, recommendation system, and application scenarios.

1. Basic Data: Raw logs (search logs, exposure clicks) plus knowledge graph, vertical knowledge bases, and document content.

2. Underlying Capabilities: Text understanding (intent detection, entity recognition, disambiguation) and entity understanding (quality, classification, association).

3. Recommendation System: Recall, ranking, and quality control. Recall uses multi‑channel queues (collaborative filtering, content‑based, semantic/knowledge‑graph methods). Ranking merges CTR‑based fusion with diversity and user‑experience considerations. Quality control filters low‑quality or sensitive results.
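The recall, ranking, and quality-control stages described above can be sketched as a simple pipeline. This is an illustrative toy, not the production system; the channel functions, CTR model, and blocklist below are all invented for the example.

```python
def multi_channel_recall(query, channels):
    """Merge candidate entities from several recall queues, deduplicating by entity id.

    Each channel is a callable: query -> [(entity_id, recall_score), ...].
    """
    seen, merged = set(), []
    for channel in channels:
        for entity, score in channel(query):
            if entity not in seen:
                seen.add(entity)
                merged.append((entity, score))
    return merged

def rank(candidates, ctr_model, blocked):
    """Fuse recall score with a CTR estimate, then drop low-quality/sensitive entities."""
    scored = [(e, ctr_model(e) * s) for e, s in candidates if e not in blocked]
    return [e for e, _ in sorted(scored, key=lambda x: -x[1])]

# Toy usage: two stub recall channels (collaborative filtering, knowledge graph)
cf = lambda q: [("movie_a", 0.9), ("movie_b", 0.7)]
kg = lambda q: [("movie_b", 0.8), ("actor_c", 0.6)]
result = rank(multi_channel_recall("andy lau", [cf, kg]),
              ctr_model=lambda e: 0.5,        # constant CTR stub
              blocked={"movie_b"})            # quality-control filter
# result == ["movie_a", "actor_c"]
```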

4. Application Scenarios: Besides the search results page, the system is applied in QQ Browser’s novel reader, encyclopedia, and third‑party pages.

02. Entity Recommendation Relevance

Ensuring relevance between the query and recommended entities is crucial. The approach predicts the implicit category of the query and then constrains the recommendation results.

Queries are grouped into three types:

Query does not contain an explicit entity but implies a need (e.g., "Olympic mascot" → virtual character).

Query contains one or more entities (e.g., "Plot of Renjian" → film entities).

Query is exactly an entity name (e.g., "Tian Long Ba Bu" → multiple knowledge‑graph IDs).

To enrich short queries, multi‑scenario information is added, such as knowledge‑graph attributes and search‑scene features (clicked titles, site information).

User session behavior is also leveraged: for ambiguous queries, the system selects the entity type that matches the user’s recent behavior (e.g., game vs. book).

All features (search‑scene, knowledge‑graph, session behavior) are embedded and fused via a multi‑tower architecture: each tower predicts the probability of one category, and dynamically set per‑category thresholds decide which categories are passed on to the recall stage.
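The multi-tower category prediction above can be sketched as follows. The tower weights, feature dimensions, and thresholds here are placeholders chosen for illustration; the talk does not give concrete values.

```python
import numpy as np

def tower(x, w, b):
    """One tower: a tiny MLP ending in a sigmoid category probability."""
    h = np.maximum(x @ w, 0.0)                   # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h.sum() + b)))  # sigmoid over pooled activations

def predict_categories(features, towers, thresholds):
    """Keep every category whose tower probability clears its own threshold."""
    probs = {name: tower(features, w, b) for name, (w, b) in towers.items()}
    return {name for name, p in probs.items() if p >= thresholds[name]}

# Toy example: two category towers over a 4-dim fused feature vector
features = np.ones(4)
towers = {
    "film": (np.full((4, 2), 0.5), 0.0),   # weights lean positive for this query
    "book": (np.full((4, 2), -0.5), 0.0),  # weights lean negative
}
active = predict_categories(features, towers, {"film": 0.6, "book": 0.6})
# active == {"film"}
```

In practice each category can get its own threshold, so rare but high-precision categories are not drowned out by a single global cutoff.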

03. Sparse‑Data Entity Recall

After confirming user intent, the recall stage faces two sparsity challenges: query sparsity (cold‑start or newly emerging queries) and entity sparsity (long‑tail entities).

Query Sparsity Solutions:

itemCF: Leverages user behavior to retrieve related queries, even with spelling errors.

IR: Inverted‑index based lexical similarity.

SR: Dual‑tower semantic model for semantically similar queries.

The dual‑tower model is trained with an auxiliary loss that predicts the query's intrinsic recall ability (e.g., its historical click or entity count), keeping that signal consistent with the main relevance prediction.
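A minimal sketch of that training objective: the main term matches query and entity embeddings against the click label, while the auxiliary term regresses a head's prediction toward the query's historical recall ability. The loss form, weighting, and shapes are assumptions for illustration, not the production objective.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def dual_tower_loss(q_emb, e_emb, label, recall_head_pred, recall_ability, alpha=0.3):
    """Main query-entity matching loss plus an auxiliary recall-ability regression."""
    sim = cosine(q_emb, e_emb)
    main = (sim - label) ** 2                       # squared error vs. click label
    aux = (recall_head_pred - recall_ability) ** 2  # align with historical click/entity count
    return main + alpha * aux

# Toy example: a perfectly matched pair with a well-calibrated auxiliary head
q = np.array([1.0, 0.0])
e = np.array([1.0, 0.0])
loss = dual_tower_loss(q, e, label=1.0, recall_head_pred=0.5, recall_ability=0.5)
# loss == 0.0
```

The auxiliary term mainly helps cold-start queries, whose sparse behavior would otherwise give the main loss little to learn from.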

Entity Sparsity Solutions: Use Graph Neural Networks to generalize entity embeddings.

Entity Graph Construction: Combine session‑derived relations, knowledge‑graph links, and document co‑occurrence.

Positive/Negative Sampling: Biased random walks for positives; easy negatives via random or popularity sampling; hard negatives via type‑aware or long‑walk sampling.
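The biased-random-walk positive sampling above can be sketched as follows. Edge weights (from session co-clicks, knowledge-graph links, or co-occurrence) bias the next-hop choice, and entities visited within a window on the same walk become positive pairs. The graph shape and window size are illustrative assumptions.

```python
import random

def biased_walk(graph, start, length, rng):
    """graph: {node: [(neighbor, edge_weight), ...]}. Returns a weight-biased walk."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = graph.get(walk[-1])
        if not nbrs:
            break  # dead end: stop the walk early
        nodes, weights = zip(*nbrs)
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

def positive_pairs(walk, window=2):
    """Entities within `window` steps of each other form positive training pairs."""
    return {(walk[i], walk[j])
            for i in range(len(walk))
            for j in range(i + 1, min(i + window + 1, len(walk)))}

# Toy example: a single-neighbor chain, so the walk is deterministic
rng = random.Random(0)
graph = {"a": [("b", 1.0)], "b": [("c", 1.0)], "c": []}
walk = biased_walk(graph, "a", 3, rng)   # ["a", "b", "c"]
pairs = positive_pairs(walk, window=2)   # {("a","b"), ("a","c"), ("b","c")}
```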

To enrich node representations, GraphSAGE is adopted: sample up to 2‑hop neighbors (30% uniform, 70% popularity), aggregate with attention, and inject pretrained EGES embeddings.
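A rough sketch of the sampling and aggregation just described: a 30% uniform / 70% popularity-biased neighbor sample, then a softmax-attention weighted combination of neighbor embeddings. The exact split comes from the talk, but the scoring function and shapes here are assumptions.

```python
import numpy as np

def sample_neighbors(neighbors, popularity, k, rng):
    """Sample k neighbors: ~30% uniformly, ~70% proportional to popularity."""
    k_uniform = max(1, int(round(0.3 * k)))
    uniform = list(rng.choice(neighbors, size=k_uniform, replace=True))
    pops = np.array([popularity[n] for n in neighbors], dtype=float)
    biased = list(rng.choice(neighbors, size=k - k_uniform, replace=True,
                             p=pops / pops.sum()))
    return uniform + biased

def attention_aggregate(node_emb, neighbor_embs):
    """Weight neighbors by a softmax over dot-product similarity to the node."""
    scores = np.array([node_emb @ n for n in neighbor_embs])
    weights = np.exp(scores - scores.max())  # stable softmax
    weights /= weights.sum()
    return sum(w * n for w, n in zip(weights, neighbor_embs))

# Toy usage
rng = np.random.default_rng(0)
sampled = sample_neighbors(["e1", "e2", "e3"], {"e1": 1, "e2": 5, "e3": 4},
                           k=10, rng=rng)
agg = attention_aggregate(np.array([1.0, 0.0]),
                          [np.array([1.0, 0.0]), np.array([1.0, 0.0])])
```

In a full GraphSAGE layer this aggregation would be repeated per hop and concatenated with the node's own (here, pretrained EGES) embedding before a learned transform.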

04. Multi‑Domain Multi‑Scenario Entity Recommendation

After graph‑based entity embeddings, vector retrieval enriches recall alongside behavior‑based and knowledge‑graph recall queues.

1. Multi‑Dimensional Feature Construction:

Query dimension: Keywords, intent, click signals, diversity.

Entity dimension: Quality, historical performance, relevance on result page.

Joint dimension: Historical interaction, query‑entity similarity, entity awareness.

2. Multi‑Domain Model Construction: Different domains have distinct entity attributes and data imbalance (head vs. tail). Shared query and generic entity features are trained jointly, while domain‑specific features have dedicated sub‑networks.
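The shared-plus-domain-specific layout can be sketched as a shared bottom that encodes query and generic entity features for all domains, with a small dedicated head per domain. Dimensions, initialization, and the absence of training code are all simplifications for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class MultiDomainRanker:
    """Shared bottom for all domains; one lightweight head per domain."""

    def __init__(self, dim, domains, rng):
        self.shared = rng.normal(size=(dim, dim)) * 0.1          # shared bottom
        self.heads = {d: rng.normal(size=dim) * 0.1 for d in domains}  # per-domain heads

    def score(self, features, domain):
        h = relu(features @ self.shared)   # shared representation, trained on all data
        return float(h @ self.heads[domain])  # domain-specific scoring head

# Toy usage
rng = np.random.default_rng(0)
model = MultiDomainRanker(dim=4, domains=["film", "novel"], rng=rng)
s_film = model.score(np.ones(4), "film")
s_novel = model.score(np.ones(4), "novel")
```

The shared bottom lets sparse tail domains borrow statistical strength from head domains, while the per-domain heads preserve domain-specific entity attributes.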

3. Multi‑Scenario Refinement: Besides image, name, and description, category hints are displayed to help users understand recommendations.

In summary, the system integrates query enrichment, graph‑based entity embeddings, and multi‑tower fusion to deliver relevant, diverse, and domain‑aware entity recommendations even under sparse data conditions.

Tags: graph neural networks, knowledge graph, search, sparse data, entity recommendation, multi-domain
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
