Joint Entity and Relation Extraction: Methods and Document‑Level Approaches
This presentation reviews the importance of entity‑relation extraction for knowledge‑graph construction, contrasts sentence‑level extraction with more complex contexts, and surveys joint extraction techniques—including sequence labeling, table filling, and seq2seq models—as well as document‑level graph‑based methods and future research directions.
Entity relation extraction is a crucial step in knowledge graph construction and information extraction, involving the identification of semantic relationships between entities.
Traditional sentence‑level extraction focuses on simple contexts, while complex contexts involve multiple triples within a single sentence or relations that span sentences, as illustrated by the DocRED dataset, where over 40% of relational facts require reasoning over multiple sentences.
The talk reviews three major families of joint extraction methods: (1) sequence‑labeling approaches such as the NovelTagging scheme (ACL 2017) that encode relation tags with Begin/Inside/End/Single markers and use LSTM‑CRF models; (2) table‑filling approaches that represent entities and relations in a matrix, later extended with multi‑head selection and sigmoid‑based overlapping relation handling; (3) sequence‑to‑sequence models like CopyRE and its improvements (CopyMTL, Seq2UMTree) that treat triples as generated sequences and employ copy mechanisms to recover multi‑token entities.
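To make the sequence‑labeling family concrete, here is a minimal sketch of a NovelTagging‑style scheme: each token receives a single tag combining a position marker (B/I/E/S), a relation type, and a role index (1 = first entity in the triple, 2 = second), with "O" for all other tokens. The relation abbreviation "CP" (Country‑President) and the example sentence are illustrative, not taken from the talk.

```python
def tag_sentence(tokens, triples):
    """Assign one NovelTagging-style tag per token.

    triples: list of (entity1_span, relation, entity2_span), where each span
    is an inclusive (start, end) pair of token indices.
    """
    tags = ["O"] * len(tokens)
    for span1, rel, span2 in triples:
        for (start, end), role in ((span1, 1), (span2, 2)):
            if start == end:  # single-token entity
                tags[start] = f"S-{rel}-{role}"
            else:
                tags[start] = f"B-{rel}-{role}"
                for i in range(start + 1, end):
                    tags[i] = f"I-{rel}-{role}"
                tags[end] = f"E-{rel}-{role}"
    return tags

tokens = ["Trump", "is", "president", "of", "the", "United", "States"]
# One triple: (United States, Country-President, Trump)
triples = [((5, 6), "CP", (0, 0))]
print(tag_sentence(tokens, triples))
# → ['S-CP-2', 'O', 'O', 'O', 'O', 'B-CP-1', 'E-CP-1']
```

An LSTM‑CRF model trained on such tags can then decode triples directly from the tag sequence, though this basic scheme cannot express entities shared across multiple triples—one motivation for the table‑filling and seq2seq alternatives.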
Document‑level relation extraction is addressed by building graph representations of entire documents. Early methods aggregate sentence‑level predictions, while later works employ graph neural networks: GCNN (ACL 2019) constructs word‑level graphs with syntactic, coreference, and adjacency edges; EOG (EMNLP 2019) introduces heterogeneous edges among mentions, entities, and sentences; LSR (ACL 2020) learns latent graph structures end‑to‑end; and Double Graph (EMNLP 2020) separates mention‑level and entity‑level graphs, using GCN or random walks followed by MLP classification.
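The graph‑based methods above all rely on some form of message passing over a document graph. The following is a minimal sketch of one GCN propagation step in that spirit; the toy graph (two mention nodes, two entity nodes, one sentence node, with mention–entity and mention–sentence edges) and all dimensions are illustrative assumptions, not the architecture of any specific paper.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU activation

rng = np.random.default_rng(0)

# Toy document graph: nodes 0-1 are mentions, 2-3 entities, 4 a sentence.
A = np.zeros((5, 5))
for i, j in [(0, 2), (1, 3), (0, 4), (1, 4)]:  # mention-entity, mention-sentence edges
    A[i, j] = A[j, i] = 1.0

H = rng.standard_normal((5, 8))               # initial node features
W = rng.standard_normal((8, 8))               # layer weights
H_new = gcn_layer(H, A, W)
print(H_new.shape)                            # (5, 8)
```

After a few such layers, entity‑pair representations are typically concatenated and fed to an MLP classifier, as in the Double Graph pipeline described above; stacking too many layers invites the over‑smoothing problem noted in the outlook.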
Experimental results on CDR, CHR, and DocRED datasets show that graph‑based models consistently outperform baselines, and that edge design (especially cross‑sentence edges) and graph refinement are critical for performance.
The concluding outlook highlights open challenges such as mitigating label‑bias in seq2seq decoding, exploring sequence‑to‑set formulations, addressing over‑smoothing in GNNs, and improving information flow among heterogeneous nodes in document‑level graphs.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.