Databases 19 min read

Subgraph Matching in Graph Databases: Concepts, Algorithms, and Optimizations

This article introduces graph databases, contrasts them with relational databases, explains the subgraph‑matching problem and its computational complexity, surveys backtracking and multi‑way join algorithms, discusses worst‑case‑optimal joins, set‑intersection acceleration, hardware support, and presents PKUMOD’s gStore research and its distributed extensions.

DataFunSummit

Apr 24, 2022

Subgraph Matching in Graph Databases: Concepts, Algorithms, and Optimizations

The lecture begins by defining databases and reviewing relational (RDBMS) concepts, then explains how graph databases directly map E‑R models to nodes and edges, highlighting the distinction between conceptual (schema‑first) and physical (schema‑less) models.

It contrasts relational and graph databases, emphasizing that relational systems rely on predefined schemas while graph databases allow flexible, attribute‑rich nodes and edges, making them suitable for heterogeneous, user‑generated data.

The core topic is subgraph matching: given a query graph Q and a data graph G, each vertex of Q must be injectively mapped to a vertex of G. The problem is NP‑Complete for decision and NP‑Hard for enumeration, but practical systems use filtering and indexing to improve performance.

Two main algorithmic families are presented: (1) Backtracking Search (depth‑first with recursion) and (2) Multi‑way Join (breadth‑first). The latter includes Binary Join and Worst‑Case‑Optimal Join, the latter guaranteeing output‑size‑proportional intermediate results.

Worst‑Case‑Optimal Join relies heavily on set‑intersection operations. The authors propose SIMD‑based data layouts to accelerate set intersections, achieving significant speedups in graph algorithms and in systems such as Stanford’s graph processing engine.

Hardware acceleration is also discussed: GPU‑friendly implementations of set‑intersection avoid write conflicts without locks, enabling massive parallelism.

PKUMOD’s contributions are highlighted: the RDF‑graph database gStore (SPARQL answered via subgraph matching), its distributed version for large RDF graphs, and optimizations such as set‑intersection acceleration and hybrid join strategies (combining Worst‑Case‑Optimal and Binary Joins) that improve query performance.

References to the gStore project website, cloud service, and related publications are provided for further exploration.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

graph databases gStore SPARQL subgraph matching

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.