
Knowledge Representation Learning for Knowledge Graphs: Business Overview, Algorithms, and Applications

This article presents an overview of Xiaomi's knowledge graph platform, introduces text‑augmented knowledge representation learning methods such as Jointly and DKRL, and details their practical applications in entity linking, entity recommendation, and knowledge graph completion within AI‑driven services.

DataFunSummit

01 Business Introduction

1. Xiaomi Knowledge Graph Team

The Xiaomi Knowledge Graph team researches construction and application technologies for open‑domain and industry‑specific knowledge graphs. It builds large‑scale, high‑quality graphs that provide capabilities such as entity search, entity linking, and a concept graph, serving products such as XiaoAi, Xiaomi.com, and the Xiaomi news feed.

2. Knowledge Graph Empowering XiaoAi

When a user asks a question like "Where is Gong Li from?", the system retrieves the relevant entity information from the graph and returns the answer.

For a query such as "What good food is around Wuhan University?", the answer flow includes:

① Speech recognition transcribes the spoken query into text.

② Intent analysis identifies the user’s interest in "food".

③ Entity matching resolves the core entity ("Wuhan University") and the attribute ("surrounding food").

④ The corresponding result is fetched from the graph.

⑤ An entity recommendation is generated to suggest similar questions.
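The five steps above can be sketched as a minimal pipeline. Everything here is an illustrative stand‑in: the keyword intent rule, the hard‑coded entity matcher, and the toy graph are invented for the sketch, not Xiaomi's actual services.

```python
# Minimal sketch of the XiaoAi-style answer flow (steps ②-④).
# All data and rules are illustrative stand-ins for real models.

TOY_GRAPH = {
    ("Wuhan University", "surrounding food"): ["Hubu Alley snacks", "hot dry noodles"],
}

def detect_intent(query: str) -> str:
    # ② intent analysis: a trivial keyword rule standing in for a classifier
    return "food" if "food" in query else "general"

def match_entity(query: str) -> tuple[str, str]:
    # ③ entity matching: hard-coded for the running example
    if "Wuhan University" in query:
        return "Wuhan University", "surrounding food"
    raise KeyError("no entity matched")

def answer(query: str) -> list[str]:
    detect_intent(query)                     # ② intent
    entity, attribute = match_entity(query)  # ③ entity + attribute
    return TOY_GRAPH[(entity, attribute)]    # ④ graph lookup

print(answer("What good food is around Wuhan University?"))
```

Step ⑤ (recommending similar questions) builds on entity recommendation, covered in section 03.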

02 Algorithm Introduction

Knowledge representation learning maps entities or relations into low‑dimensional dense vectors so that semantically similar objects are close in the vector space.

Typical methods model the factual triples directly (translation‑based models, tensor factorization, neural networks, GNNs), but they struggle with long‑tail entities that have few or no triples, where representations suffer from severe sparsity.

To address long‑tail issues, two strategies are used:

① Leverage other information inside the KG, such as textual descriptions, entity types, relation paths, logical rules, attributes, temporal data, and graph structure.

② Exploit massive external information from the Web that contains additional facts about entities and relations.

1. Advantages of Integrating Textual Descriptions

Discover semantic relevance between entities: precise textual cues (e.g., a description mentioning "Sa Beining's spouse") help uncover hidden connections.

Enable representation of newly added entities that only have descriptive text, which traditional KG embedding methods cannot handle.

We construct a description by concatenating the entity type, its textual description and important triples, producing a richer text than a simple label.

2. Alignment of Text and Knowledge Graph

The classic Jointly model aligns entity, relation and word vectors in a shared semantic space. It consists of three parts:

Text embedding: a skip‑gram model that learns word‑word similarity via Euclidean distance.

Knowledge embedding: a TransE model that learns structural constraints between entities.

Alignment model: constraints that force entity vectors and word vectors from the description to reside in the same space.
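The knowledge and alignment components can be sketched with toy vectors. This is a scoring sketch only, not a trainable implementation; the random vectors, the names `transe_score` and `alignment_loss`, and the centroid form of the alignment constraint are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy shared space: entities, relations, and description words in R^dim.
entity = {e: rng.normal(size=dim) for e in ["GongLi", "China"]}
relation = {"nationality": rng.normal(size=dim)}
word = {w: rng.normal(size=dim) for w in ["actress", "film"]}

def transe_score(h, r, t):
    # Knowledge model: a lower ||h + r - t|| means a more plausible triple.
    return float(np.linalg.norm(entity[h] + relation[r] - entity[t]))

def alignment_loss(e, desc_words):
    # Alignment model (illustrative form): pull the entity vector toward
    # its description words so entities and words share one space.
    centroid = np.mean([word[w] for w in desc_words], axis=0)
    return float(np.linalg.norm(entity[e] - centroid))

print(transe_score("GongLi", "nationality", "China"))
print(alignment_loss("GongLi", ["actress", "film"]))
```

Training minimizes both quantities jointly, which is what forces the three vector families into one semantic space.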

The follow‑up DKRL model combines a translation‑based structural embedding with a text‑aware component, using a continuous bag‑of‑words encoder and a deep convolutional encoder to embed entity descriptions.

TransE learns entity and relation embeddings from triples.

CBOW ignores word order, while the CNN captures sequential information.

Each entity thus receives both a structure‑based and a description‑based embedding; DKRL fuses them by summing the TransE energies over the four structure/description combinations of head and tail.
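A minimal sketch of that fusion, assuming toy vectors and the four‑term energy form from the DKRL paper (the function names are invented for illustration):

```python
import numpy as np

def transe_energy(h, r, t):
    # Plausibility of one triple under the translation assumption h + r ≈ t.
    return float(np.linalg.norm(h + r - t))

def dkrl_energy(h_s, h_d, r, t_s, t_d):
    # DKRL sums the four cross energies over structural (s) and
    # description-based (d) embeddings, so both views must agree.
    return (transe_energy(h_s, r, t_s)    # structure-structure
            + transe_energy(h_s, r, t_d)  # structure-description
            + transe_energy(h_d, r, t_s)  # description-structure
            + transe_energy(h_d, r, t_d)) # description-description

rng = np.random.default_rng(1)
vecs = [rng.normal(size=16) for _ in range(5)]
print(dkrl_energy(*vecs))
```

The cross terms (structure with description) are what push the two embedding spaces toward each other during training.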

03 Algorithm Application

1. Entity Linking

The goal is to link mentions in text to the corresponding entities in the knowledge base.

Example: for the question "What lines does Li Bai have in King of Glory?", the pipeline is:

① Identify the core mention "Li Bai".

② Retrieve all candidate entities for "Li Bai" from the KG.

③ Disambiguate and select the correct entity.

Challenges include multiple surface forms for the same entity (e.g., "Qinglian Jushi", "Li Taibai") and the same surface form referring to different entities depending on context.

Disambiguation consists of a coarse‑ranking stage and a fine‑ranking stage.

Coarse ranking: Uses three features—Context (dot product between candidate entity vector and each query word vector), Coherence (consistency among candidate entities of different mentions), and LinkCount (prior frequency). A multilayer perceptron fuses these features to produce a score, and the top‑N candidates are passed to the fine stage.
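The coarse stage can be sketched as feature computation plus a small MLP. The feature functions, layer sizes, and random weights below are illustrative assumptions; in practice the MLP is trained and LinkCount comes from corpus statistics.

```python
import numpy as np

def context_feature(entity_vec, query_word_vecs):
    # Context: average dot product between the candidate entity vector
    # and each query word vector.
    return float(np.mean([entity_vec @ w for w in query_word_vecs]))

def coherence_feature(entity_vec, other_candidate_vecs):
    # Coherence: average agreement with candidate entities of the
    # query's other mentions.
    return float(np.mean([entity_vec @ v for v in other_candidate_vecs]))

def coarse_score(features, W1, b1, w2, b2):
    # One-hidden-layer perceptron fusing [context, coherence, link_count].
    hidden = np.maximum(0.0, W1 @ features + b1)  # ReLU
    return float(w2 @ hidden + b2)

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # untrained toy weights
w2, b2 = rng.normal(size=4), 0.0

feats = np.array([
    context_feature(np.ones(5), [np.ones(5), np.zeros(5)]),
    coherence_feature(np.ones(5), [np.ones(5)]),
    0.7,  # LinkCount prior, e.g. a normalized link frequency
])
print(coarse_score(feats, W1, b1, w2, b2))
```

Scoring every candidate this way and keeping the top‑N is what feeds the fine stage.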

Fine ranking: A BERT‑based sentence‑pair classifier takes the mention‑marked query (text_a) and the candidate entity description (text_b), extracts the [CLS] vector, concatenates it with the coarse‑ranking features, and outputs a final relevance score. The top‑1 entity is selected.
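Constructing the sentence pair for the fine stage amounts to marking the mention span and pairing it with the candidate description. The `#` markers below are an illustrative convention for span marking, not a fixed standard, and the description text is invented:

```python
def build_sentence_pair(query: str, mention: str, description: str):
    # text_a: the query with the mention span marked so the classifier
    # can attend to it; text_b: the candidate entity's description.
    text_a = query.replace(mention, f"#{mention}#", 1)
    return text_a, description

text_a, text_b = build_sentence_pair(
    "What lines does Li Bai have in King of Glory?",
    "Li Bai",
    "Li Bai is an assassin hero in the game King of Glory.",
)
print(text_a)  # → What lines does #Li Bai# have in King of Glory?
```

The pair is then tokenized as a standard BERT sentence‑pair input, and the [CLS] vector plus the coarse features yield the final score.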

2. Entity Recommendation

The task is to recommend a set of related entities given a target entity. Current work focuses on similarity‑based recommendation without personalization.

Applications include:

Generating related QA pairs for smart‑assistant queries (e.g., recommending "Tsinghua University" when the user asks about "Wuhan University").

Suggesting related entities on a KG construction platform when a user searches for an entity.

The cold‑start problem is mitigated by treating co‑occurring entities in encyclopedia triples and news articles as positive samples, while random entities serve as negatives.
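A minimal sketch of that sampling scheme, assuming co‑occurring pairs have already been extracted (the function name and entity list are invented for illustration):

```python
import random

def build_training_samples(cooccurring_pairs, all_entities, neg_per_pos=1, seed=0):
    # Positives (label 1): entity pairs co-occurring in encyclopedia
    # triples or news articles. Negatives (label 0): the same head
    # paired with random other entities.
    rng = random.Random(seed)
    samples = []
    for head, tail in cooccurring_pairs:
        samples.append((head, tail, 1))
        pool = [e for e in all_entities if e not in (head, tail)]
        for _ in range(neg_per_pos):
            samples.append((head, rng.choice(pool), 0))
    return samples

pairs = [("Wuhan University", "Tsinghua University")]
entities = ["Wuhan University", "Tsinghua University", "Gong Li", "Li Bai"]
print(build_training_samples(pairs, entities))
```

Random negatives are noisy (a random entity may still be related), so real pipelines usually add filtering or harder negative mining.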

The recommendation model comprises:

Representation model: the DKRL model (section 2) learns joint embeddings of entities and their textual descriptions, with BERT replacing the original text encoder.

Matching model: a DSSM‑style network re‑uses the DKRL parameters to encode two entities and computes cosine similarity as the relevance score.
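The matching step reduces to encoding both entities with the shared encoder and ranking by cosine similarity. Here a dictionary of toy vectors stands in for the DKRL/BERT encoder:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(target, candidates, encode, top_k=2):
    # DSSM-style matching: one shared encoder (stood in for by `encode`)
    # embeds both towers; cosine similarity is the relevance score.
    target_vec = encode(target)
    ranked = sorted(candidates,
                    key=lambda c: cosine(target_vec, encode(c)),
                    reverse=True)
    return ranked[:top_k]

# Toy embeddings standing in for the learned representations.
toy_vecs = {
    "Wuhan University": np.array([1.0, 0.0]),
    "Tsinghua University": np.array([0.9, 0.1]),
    "Gong Li": np.array([0.0, 1.0]),
}
print(recommend("Wuhan University",
                ["Tsinghua University", "Gong Li"],
                toy_vecs.__getitem__, top_k=1))
```

Because the two towers share parameters, candidate embeddings can be precomputed and served with a nearest‑neighbor index.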

3. Knowledge Completion

When constructing a KG, many triples are incomplete (e.g., missing hyperlinks for persons). Knowledge completion aims to fill such gaps.

Given a head entity, a relation, and a tail mention, the pipeline is:

① Use the schema to determine the tail entity type.

② Generate candidate tail entities based on the mention.

③ Apply a triple‑classification model to judge the correctness of each candidate triple.

④ Rank the triples and select the top‑1 with a confidence above a threshold.
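Steps ③ and ④ can be sketched as score‑rank‑threshold. The candidate triples and their confidences below are invented stand‑ins for the triple‑classification model's outputs:

```python
def complete_tail(candidate_triples, score_fn, threshold=0.5):
    # ③ score each candidate triple, ④ rank and keep the top-1 only if
    # its confidence clears the threshold; otherwise abstain (None).
    best = max(candidate_triples, key=score_fn)
    return best if score_fn(best) >= threshold else None

# Illustrative confidences, standing in for the classification model.
toy_scores = {
    ("Gong Li", "birthplace", "Shenyang"): 0.9,
    ("Gong Li", "birthplace", "Shenyang Palace"): 0.2,
}
candidates = list(toy_scores)
print(complete_tail(candidates, toy_scores.get))
```

Abstaining below the threshold matters in production: a wrong completion pollutes the graph, while a missing one can be retried later.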

The model follows the KG‑BERT design: text_a is the head description, text_b is the relation name, text_c is the tail description. After BERT encoding, a fully‑connected layer produces semantic features that are fused with handcrafted structural features to compute a final score.
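A sketch of the input construction and score fusion, with illustrative weights; the real model learns the fully‑connected layer and the fusion weights end to end:

```python
import numpy as np

def kg_bert_input(head_desc, relation_name, tail_desc):
    # The three text segments joined into one BERT input sequence.
    return f"[CLS] {head_desc} [SEP] {relation_name} [SEP] {tail_desc} [SEP]"

def fused_score(cls_vec, structural_feats, w_sem, w_struct, bias=0.0):
    # Linear fusion of the semantic features (the [CLS] vector through a
    # fully-connected layer, abstracted to one dot product here) with
    # handcrafted structural features. All weights are illustrative.
    return float(w_sem @ cls_vec + w_struct @ structural_feats + bias)

seq = kg_bert_input("Gong Li is a Chinese actress.", "birthplace",
                    "Shenyang is a city in Liaoning.")
print(seq)

score = fused_score(np.ones(4), np.array([0.5, 1.0]),
                    w_sem=np.full(4, 0.25), w_struct=np.array([0.2, 0.1]))
print(score)
```

Keeping handcrafted structural features alongside the BERT score hedges against cases where the descriptions are short or missing.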

04 Summary and Outlook

This article briefly introduced how knowledge representation learning is applied to entity linking, entity recommendation, and knowledge graph completion, emphasizing industrial practicality, model efficiency, and the need for continued research in AI‑driven knowledge graph technologies.

References

Wang Z, Zhang J, Feng J, et al. Knowledge graph and text jointly embedding. EMNLP 2014: 1591‑1601.

Zhong H, Zhang J, Wang Z, et al. Aligning knowledge and text embeddings by entity descriptions. EMNLP 2015: 267‑272.

Xie R, Liu Z, Jia J, et al. Representation learning of knowledge graphs with entity descriptions. AAAI 2016, 30(1).

Xiao H, Huang M, Meng L, et al. SSP: semantic space projection for knowledge graph embedding with text descriptions. AAAI 2017, 31(1).

Reimers N, Gurevych I. Sentence‑BERT: Sentence embeddings using siamese BERT‑networks. arXiv:1908.10084, 2019.

Yao L, Mao C, Luo Y. KG‑BERT: BERT for knowledge graph completion. arXiv:1909.03193, 2019.

Liu Z, Sun M, Lin Y, et al. Advances in knowledge representation learning. Journal of Computer Research and Development, 2016, 53(2): 247.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
