Knowledge Graph Construction, Applications, and Recent Advances in Entity Linking
This article reviews the fundamentals of knowledge graphs and their practical uses in question answering, search, and recommendation; surveys recent research on entity linking, including dual‑encoder retrieval, BERT‑based models, multilingual approaches, and zero‑shot methods; and outlines modern knowledge‑graph construction pipelines and open challenges.
The speaker, a senior researcher from Tencent, introduces knowledge graphs (KGs) as heterogeneous graph structures that represent entities, relations, types, and attributes, bridging the textual world and the semantic world. Prominent public KGs such as Wikidata, DBpedia, Freebase, and Google Knowledge Graph are mentioned.
KGs enable high‑quality reasoning for tasks like intelligent question answering, faceted search, and semantic understanding; examples illustrate how mapping ambiguous mentions to KG entities resolves user queries and improves recommendation diversity.
Entity linking (EL) is defined as detecting mentions in text and mapping them to KG entities; the mapping step is also called entity disambiguation or entity resolution. The typical EL pipeline consists of mention detection (often via NER), candidate generation (traditionally using alias tables), and candidate scoring (using feature‑based models or neural approaches).
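The three-stage pipeline above can be sketched end to end. This is a minimal toy illustration, not any production system: the alias table, entity IDs, context cues, and the overlap-count scorer are all hypothetical stand-ins for an NER model, a real alias table, and a learned scoring model.

```python
# Toy alias table: mention string -> candidate entity IDs (hypothetical IDs).
ALIAS_TABLE = {
    "apple": ["Q312_Apple_Inc", "Q89_apple_fruit"],
    "paris": ["Q90_Paris_France", "Q830149_Paris_Texas"],
}

# Hypothetical context cues per candidate, standing in for learned features.
CONTEXT_CUES = {
    "Q312_Apple_Inc": {"iphone", "company", "stock"},
    "Q89_apple_fruit": {"pie", "orchard", "fruit"},
    "Q90_Paris_France": {"france", "eiffel"},
    "Q830149_Paris_Texas": {"texas"},
}

def detect_mentions(text):
    """Stage 1 (stand-in for NER): flag tokens that appear in the alias table."""
    return [tok.strip(".,").lower() for tok in text.split()
            if tok.strip(".,").lower() in ALIAS_TABLE]

def generate_candidates(mention):
    """Stage 2: traditional alias-table lookup."""
    return ALIAS_TABLE.get(mention, [])

def score(candidate, context_tokens):
    """Stage 3 (stand-in for a feature-based scorer): count context-cue overlap."""
    return len(CONTEXT_CUES.get(candidate, set()) & set(context_tokens))

def link(text):
    """Run the full pipeline and return mention -> best entity."""
    tokens = [t.strip(".,").lower() for t in text.split()]
    results = {}
    for mention in detect_mentions(text):
        candidates = generate_candidates(mention)
        if candidates:
            results[mention] = max(candidates, key=lambda c: score(c, tokens))
    return results

print(link("Apple unveiled a new iPhone at its company event."))
# -> {'apple': 'Q312_Apple_Inc'}  (context words pick the company, not the fruit)
```

Note how the ambiguous mention "apple" is resolved by context: the surrounding tokens overlap with the company's cues, so the scorer prefers the organization entity.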
Recent advances include:
Dual‑encoder (dual‑tower) retrieval models that encode mentions and entities into a shared vector space for efficient nearest‑neighbor search.
RELIC, a BERT‑based dual‑tower model that learns contextual entity representations and reports state‑of‑the‑art results on entity‑disambiguation benchmarks.
Dense‑retrieval methods that replace alias tables with learned dense representations, enabling multilingual EL via mBERT.
Zero‑shot EL using reading‑comprehension style models that jointly encode mention and entity description with cross‑attention.
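The dual-encoder idea behind several of these methods can be sketched in a few lines. The toy embeddings below stand in for the outputs of trained BERT mention and entity towers (all vectors and word lists are hypothetical); the key point is that once both towers map into a shared space, linking reduces to nearest-neighbor search by dot product.

```python
# Hypothetical pre-computed entity embeddings (output of the entity tower).
ENTITY_EMB = {
    "Apple_Inc":    [0.9, 0.1, 0.0],
    "Apple_fruit":  [0.1, 0.9, 0.0],
    "Paris_France": [0.0, 0.1, 0.9],
}

def dot(u, v):
    """Dot-product similarity between two vectors."""
    return sum(a * b for a, b in zip(u, v))

def encode_mention(context_words):
    """Stand-in for the mention tower: average toy word vectors."""
    word_vecs = {
        "iphone": [1.0, 0.0, 0.0], "stock":  [0.8, 0.2, 0.0],
        "pie":    [0.0, 1.0, 0.0], "france": [0.0, 0.0, 1.0],
    }
    vecs = [word_vecs[w] for w in context_words if w in word_vecs]
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

def retrieve(context_words, k=1):
    """Brute-force nearest-neighbor search over all entity embeddings."""
    query = encode_mention(context_words)
    ranked = sorted(ENTITY_EMB, key=lambda e: -dot(ENTITY_EMB[e], query))
    return ranked[:k]

print(retrieve(["iphone", "stock"]))  # -> ['Apple_Inc']
```

At real scale the brute-force loop is replaced by an approximate nearest-neighbor index, which is what makes dual encoders efficient over very large entity sets.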
Challenges highlighted are heavy reliance on alias tables, difficulty handling long‑tail entities, scalability to billions of entities, and multilingual coverage.
The article then shifts to knowledge‑graph construction techniques, describing the evolution from manual annotation to rule‑based and machine‑learning pipelines. Core components include mention extraction, relation extraction, entity linking, coreference resolution, and entity alignment, often built with tools such as DeepDive, Snorkel, and transformer models.
A concrete example walks through extracting entities (e.g., "Taylor Swift" and "New Jersey"), their relation (PERFORMS_IN), and linking them to a KG, illustrating the full pipeline from raw text to structured triples.
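The example can be condensed into a toy text-to-triple sketch. The surface pattern standing in for a learned relation extractor, and the KG identifiers, are hypothetical; the point is the shape of the pipeline from raw sentence to a structured (subject, relation, object) triple.

```python
# Hypothetical mention -> KG identifier mapping (the entity-linking step).
KG_IDS = {"Taylor Swift": "kg:TaylorSwift", "New Jersey": "kg:NewJersey"}

# Hand-written surface pattern standing in for a learned relation extractor.
PATTERNS = {" performs in ": "PERFORMS_IN"}

def extract_triples(sentence):
    """Extract (subject, relation, object) triples and link mentions to the KG."""
    triples = []
    for pattern, relation in PATTERNS.items():
        if pattern in sentence:
            subj_text, obj_text = sentence.split(pattern, 1)
            subj_text = subj_text.strip()
            obj_text = obj_text.strip(" .")
            # Entity-linking step: map mention strings to KG identifiers.
            subj = KG_IDS.get(subj_text, subj_text)
            obj = KG_IDS.get(obj_text, obj_text)
            triples.append((subj, relation, obj))
    return triples

print(extract_triples("Taylor Swift performs in New Jersey."))
# -> [('kg:TaylorSwift', 'PERFORMS_IN', 'kg:NewJersey')]
```

Real pipelines replace the pattern with trained extraction models and add the coreference-resolution and entity-alignment stages mentioned above, but the output contract is the same: structured triples ready to insert into the graph.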
Finally, the speaker discusses open research problems such as low‑resource KG construction, noise reduction in distant supervision, temporal knowledge, multimodal KG integration, and vertical domain KG building.
DataFunSummit