
Knowledge Graph Construction and Entity Linking: Techniques, Applications, and Recent Advances

This article provides a comprehensive overview of knowledge graphs and entity linking. It covers their definitions; practical uses in question answering, search, and recommendation; the standard pipeline of mention detection, candidate generation, and candidate scoring; challenges such as scalability and multilinguality; and recent research advances, including dual-encoder models, RELIC, deep retrieval, and multilingual BERT-based approaches. It closes with a discussion of modern knowledge-graph construction methods.

DataFunTalk

Knowledge graphs (KGs) are heterogeneous graph structures that represent entities, relations, types, and attributes, bridging the gap between textual mentions and semantic concepts; popular public KGs include Wikidata, DBpedia, Freebase, and Google Knowledge Graph.
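At its core, a KG can be viewed as a set of (subject, relation, object) triples. The following minimal in-memory sketch (a hypothetical `TinyKG` class, not any particular KG store's API) illustrates that structure and a simple neighbor query:

```python
from collections import defaultdict

class TinyKG:
    """Minimal in-memory knowledge graph storing (subject, relation, object) triples."""
    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)

    def add(self, subj, rel, obj):
        self.triples.add((subj, rel, obj))
        self.by_subject[subj].add((rel, obj))

    def neighbors(self, subj):
        """Return all (relation, object) pairs attached to a subject entity."""
        return sorted(self.by_subject[subj])

kg = TinyKG()
kg.add("Taylor_Swift", "instance_of", "human")
kg.add("Taylor_Swift", "occupation", "singer")
kg.add("Shake_It_Off", "performer", "Taylor_Swift")

print(kg.neighbors("Taylor_Swift"))
# [('instance_of', 'human'), ('occupation', 'singer')]
```

Real KGs such as Wikidata add typed entities, qualifiers, and attribute values on top of this triple backbone, but the query pattern is the same.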

KGs enable various applications such as intelligent question answering, faceted search, and recommendation, where mapping textual mentions to KG entities allows precise reasoning and improves diversity and interpretability of results.

The core task of entity linking (EL) involves three stages: detecting mentions, generating candidate entities, and scoring candidates. Mention detection can use sequence labeling, rule-based, or dictionary methods, while modern approaches leverage pretrained language models such as BERT.
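The dictionary-based variant of mention detection can be sketched as a greedy longest-match over token spans against an alias set. This is a toy illustration only (the alias set and tokenization are assumed; production systems use trained sequence labelers for ambiguous spans):

```python
def detect_mentions(tokens, alias_dict, max_len=3):
    """Dictionary-based mention detection: greedy longest-match over token spans."""
    mentions, i = [], 0
    while i < len(tokens):
        match = None
        # Try the longest span first so "New York" beats "New".
        for span_len in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + span_len])
            if span.lower() in alias_dict:
                match = (i, i + span_len, span)
                break
        if match:
            mentions.append(match)
            i = match[1]  # skip past the matched span
        else:
            i += 1
    return mentions

aliases = {"taylor swift", "new york"}
toks = "Taylor Swift performed in New York".split()
print(detect_mentions(toks, aliases))
# [(0, 2, 'Taylor Swift'), (4, 6, 'New York')]
```

A BERT-based detector would instead tag each token with BIO labels, which handles mentions absent from any dictionary.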

Traditional candidate generation relies on alias tables, which suffer from strong dependence on curated mappings and struggle with fresh or low‑frequency entities; recent work proposes deep retrieval with dual‑encoder models to learn dense representations for mentions and entities.
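The dual-encoder idea is that mentions and entities are embedded into the same vector space by two (often shared) encoders, so candidate generation becomes nearest-neighbor search instead of an alias-table lookup. The sketch below substitutes a hashed character-trigram encoder for the learned BERT encoders the papers use; the entity names and descriptions are invented for illustration:

```python
import math
import zlib

def encode(text, dim=64):
    """Toy encoder: hashed character-trigram bag, L2-normalized.
    A stand-in for a learned BERT mention/entity encoder."""
    vec = [0.0] * dim
    s = f"#{text.lower()}#"
    for i in range(len(s) - 2):
        vec[zlib.crc32(s[i:i + 3].encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Entity vectors are computed once offline; retrieval is then a fast dot
# product (approximate nearest neighbors at web scale).
entities = {
    "Taylor_Swift": "Taylor Swift American singer and songwriter",
    "Taylor_Lautner": "Taylor Lautner American actor",
    "Swift_(language)": "Swift programming language by Apple",
}
entity_vecs = {name: encode(desc) for name, desc in entities.items()}

def top_candidates(mention, k=2):
    q = encode(mention)
    scored = sorted(
        ((sum(a * b for a, b in zip(q, v)), name) for name, v in entity_vecs.items()),
        reverse=True,
    )
    return [name for _, name in scored[:k]]

print(top_candidates("taylor swift singer"))  # Taylor_Swift should rank highly
```

Because entity vectors are precomputed, new or low-frequency entities only require encoding their descriptions once, which is what makes this approach attractive for fresh and tail entities.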

Scoring methods have evolved from feature‑engineered shallow models to reading‑comprehension approaches that jointly encode mentions and entity descriptions using cross‑attention, achieving strong zero‑shot performance.
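The key difference from the dual-encoder is that a cross-attention scorer reads the mention context and the entity description *together*, so every context token can interact with every description token. The toy scorer below approximates that interaction with exact-match lookups (a real cross-encoder such as BERT over the concatenated pair learns these interactions; the example texts are invented):

```python
def cross_score(mention_context, entity_desc):
    """Toy stand-in for cross-attention scoring: the score is the fraction of
    context tokens that find a match in the entity description."""
    ctx = mention_context.lower().split()
    desc = set(entity_desc.lower().split())
    return sum(tok in desc for tok in ctx) / len(ctx)

context = "Swift released a new album"
candidates = {
    "Taylor_Swift": "taylor swift american singer released album",
    "Swift_(language)": "swift programming language by apple",
}
best = max(candidates, key=lambda name: cross_score(context, candidates[name]))
print(best)  # Taylor_Swift
```

This pairwise reading is what enables zero-shot linking: an unseen entity can be scored purely from its textual description, at the cost of one full forward pass per mention-candidate pair.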

Notable systems include RELIC (dual‑encoder with BERT), DEER, and multilingual extensions using mBERT, which address challenges of large‑scale KGs, tail‑entity coverage, and cross‑language linking.

Knowledge‑graph construction now largely relies on weak supervision: extracting mentions, relations, and performing entity linking and coreference resolution from unstructured text, often using tools such as DeepDive, Snorkel, and transformer‑based models.
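The weak-supervision idea behind tools like Snorkel is to replace hand-labeled data with many noisy labeling functions whose votes are aggregated. The pure-Python sketch below shows the spirit of the approach, not Snorkel's actual API; the labeling heuristics and the PERFORMS_IN task are illustrative assumptions:

```python
ABSTAIN, NEG, POS = -1, 0, 1

# Each labeling function votes on whether a sentence expresses PERFORMS_IN,
# or abstains when it has no opinion.
def lf_contains_performed(sentence):
    return POS if "performed" in sentence.lower() else ABSTAIN

def lf_contains_concert(sentence):
    return POS if "concert" in sentence.lower() else ABSTAIN

def lf_cancelled(sentence):
    return NEG if "cancelled" in sentence.lower() else ABSTAIN

def majority_vote(sentence, lfs):
    """Aggregate non-abstaining votes (Snorkel learns a weighted model instead)."""
    votes = [lf(sentence) for lf in lfs]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return POS if sum(v == POS for v in votes) > sum(v == NEG for v in votes) else NEG

lfs = [lf_contains_performed, lf_contains_concert, lf_cancelled]
print(majority_vote("Taylor Swift performed a concert in Tokyo", lfs))  # 1 (POS)
```

The aggregated labels then train a relation extractor, which generalizes beyond the heuristics themselves.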

A typical pipeline tokenizes text, performs POS tagging, NER, dependency parsing, extracts entities (e.g., "Taylor Swift"), identifies relations (e.g., PERFORMS_IN), resolves coreference, and links mentions to KG entities, followed by error analysis and iterative model improvement.
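The entity- and relation-extraction steps of that pipeline can be sketched end to end with crude stand-ins: a capitalization heuristic in place of a trained NER model, and a surface pattern in place of a dependency-parse-based relation extractor (the pattern and example sentence are assumptions for illustration):

```python
import re

def extract_entities(text):
    """Toy NER: capitalized multi-word spans stand in for a trained tagger."""
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)

def extract_relations(text, entities):
    """Toy relation extraction: a surface 'performs in' pattern between two
    detected entities emits a PERFORMS_IN triple."""
    rels = []
    for a in entities:
        for b in entities:
            if a != b and re.search(
                re.escape(a) + r"\s+performs? in\s+" + re.escape(b), text
            ):
                rels.append((a, "PERFORMS_IN", b))
    return rels

text = "Taylor Swift performs in Tokyo."
ents = extract_entities(text)
print(ents)                            # ['Taylor Swift', 'Tokyo']
print(extract_relations(text, ents))   # [('Taylor Swift', 'PERFORMS_IN', 'Tokyo')]
```

In a production pipeline each stage would be a learned component (e.g. a transformer tagger and parser), and the extracted mentions would then pass through coreference resolution and entity linking before triples enter the KG.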

Open research directions include low‑resource KG construction, temporal knowledge extraction, multimodal KG building, and vertical domain KG creation.

Tags: AI, Natural Language Processing, semantic search, Knowledge Graph, multilingual, entity linking
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
