Design and Implementation of a Graph Database Layer on Aerospike
This article describes the motivation, design, and implementation of a label-property graph database built as a thin layer over Aerospike. It covers the data model and the API operations for managing graphs, indexes, vertices, and edges, as well as the query capabilities.
Background: Starting last year, the team built a knowledge-graph pipeline that needed an online graph database to serve user queries. After evaluating open-source options such as Neo4j, Titan, OrientDB, Graph Engine, Cayley, and ArangoDB, the team decided to develop its own solution, reusing ideas from Titan while delegating storage and indexing to the distributed NoSQL store Aerospike.
Graph Model – Label Property Graph: The model consists of four core concepts. A Vertex represents an entity with a stable unique ID, an optional label (e.g., Person, Product), and a set of properties. An Edge represents a relationship as a subject-predicate-object (SPO) triple (from_id, label, to_id) with optional key-value attributes; an edge is uniquely identified by this triple and can be bidirectional. A Property is a key-value pair whose key is always a string and whose value may be a number, string, boolean, list, or custom map type. A Label categorizes vertices and edges.
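To make the four concepts concrete, here is a minimal Python sketch of the model. The class and field names (`Vertex`, `Edge`, `key`) are illustrative assumptions, not the project's actual types; the only facts taken from the description above are that a vertex has an ID, an optional label, and properties, and that an edge is identified by its (from_id, label, to_id) triple.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class Vertex:
    id: str                                   # stable unique ID
    label: Optional[str] = None               # optional category, e.g. "Person"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    from_id: str
    label: str
    to_id: str
    properties: Dict[str, Any] = field(default_factory=dict)

    @property
    def key(self) -> Tuple[str, str, str]:
        # An edge is uniquely identified by its (from_id, label, to_id) triple.
        return (self.from_id, self.label, self.to_id)


alice = Vertex("v1", "Person", {"name": "Alice", "tags": ["vip"]})
phone = Vertex("v2", "Product", {"name": "Phone", "price": 499.0})
bought = Edge("v1", "BOUGHT", "v2", {"quantity": 1})

# Keying edges by the triple means a repeated triple replaces the earlier
# edge instead of creating a duplicate.
edges = {bought.key: bought}
again = Edge("v1", "BOUGHT", "v2", {"quantity": 2})
edges[again.key] = again
```

Because the triple is the identity, two edges between the same pair of vertices must differ in label; this sketch has no separate edge ID.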
Interface Overview: The system provides a set of REST‑like APIs (illustrated with screenshots) for managing graphs, indexes, vertices, edges, and queries.
Graph API: create a graph (note that Aerospike does not support creating namespaces dynamically), load an existing graph, delete a graph, and clear all data.
Index API: create, delete, and rebuild indexes. Indexes support exact-match and range queries, nested attribute paths, and array-typed values; geo and full-text search are planned for the future.
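The index features listed above can be illustrated with a small in-memory sketch. It is not how Aerospike or this layer stores indexes; `values_at` and `PropertyIndex` are hypothetical names, and the dotted-path syntax is an assumption. The sketch only shows the idea: a nested path is resolved to a value, a list value contributes one index entry per element, and a sorted entry list serves both exact-match and range lookups.

```python
import bisect
from typing import Any, Dict, List, Tuple


def values_at(properties: Dict[str, Any], path: str) -> List[Any]:
    """Resolve a dotted path such as 'address.city'.

    A list value yields one entry per element (array-type indexing).
    """
    current: Any = properties
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return []
        current = current[part]
    return list(current) if isinstance(current, list) else [current]


class PropertyIndex:
    """Sorted (value, vertex_id) pairs; values under one path must be mutually comparable."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: List[Tuple[Any, str]] = []

    def add(self, vertex_id: str, properties: Dict[str, Any]) -> None:
        for value in values_at(properties, self.path):
            bisect.insort(self.entries, (value, vertex_id))

    def range(self, low: Any, high: Any) -> List[str]:
        # "" sorts before every non-empty ID, so this finds the first value >= low.
        start = bisect.bisect_left(self.entries, (low, ""))
        ids: List[str] = []
        for value, vertex_id in self.entries[start:]:
            if value > high:
                break
            ids.append(vertex_id)
        return ids

    def exact(self, value: Any) -> List[str]:
        return self.range(value, value)


people = {
    "v1": {"age": 31, "tags": ["vip", "new"], "address": {"city": "Guangzhou"}},
    "v2": {"age": 45, "tags": ["vip"], "address": {"city": "Beijing"}},
}
city, tags, age = PropertyIndex("address.city"), PropertyIndex("tags"), PropertyIndex("age")
for vertex_id, props in people.items():
    for index in (city, tags, age):
        index.add(vertex_id, props)

beijing = city.exact("Beijing")      # nested path, exact match
vips = tags.exact("vip")             # array-type indexing
thirties = age.range(30, 39)         # range query
```

A rebuild, in this picture, would amount to clearing `entries` and calling `add` again for every vertex.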
Entity API: add, update, and delete vertices. Updates may be partial, and setting a property's value to null deletes that property. Deleting a vertex automatically removes its incident edges.
Relationship API: add, update (partial updates allowed), and delete edges.
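The update and delete semantics of the Entity and Relationship APIs can be sketched with an in-memory stand-in. `InMemoryGraph` and its method names are hypothetical, and for brevity the sketch folds "add" and "update" into a single upsert, which the real API keeps separate. It demonstrates three behaviours described above: partial updates leave unmentioned properties alone, a null (`None`) value deletes a property, and deleting a vertex removes its incident edges.

```python
from typing import Any, Dict, Optional, Tuple

EdgeKey = Tuple[str, str, str]   # (from_id, label, to_id)


class InMemoryGraph:
    """Hypothetical stand-in for the storage layer; not the real API."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Dict[str, Any]] = {}
        self.edges: Dict[EdgeKey, Dict[str, Any]] = {}

    @staticmethod
    def _merge(target: Dict[str, Any], changes: Dict[str, Any]) -> None:
        # Partial update: unmentioned keys are kept; None removes the key.
        for key, value in changes.items():
            if value is None:
                target.pop(key, None)
            else:
                target[key] = value

    def upsert_vertex(self, vertex_id: str, changes: Dict[str, Any]) -> None:
        self._merge(self.vertices.setdefault(vertex_id, {}), changes)

    def upsert_edge(self, from_id: str, label: str, to_id: str,
                    changes: Optional[Dict[str, Any]] = None) -> None:
        self._merge(self.edges.setdefault((from_id, label, to_id), {}), changes or {})

    def delete_edge(self, from_id: str, label: str, to_id: str) -> None:
        self.edges.pop((from_id, label, to_id), None)

    def delete_vertex(self, vertex_id: str) -> None:
        self.vertices.pop(vertex_id, None)
        # Cascade: drop every edge that starts or ends at the vertex.
        incident = [key for key in self.edges if vertex_id in (key[0], key[2])]
        for key in incident:
            del self.edges[key]


g = InMemoryGraph()
g.upsert_vertex("v1", {"name": "Alice", "age": 30})
g.upsert_vertex("v2", {"name": "Phone"})
g.upsert_edge("v1", "BOUGHT", "v2", {"quantity": 1})

g.upsert_vertex("v1", {"age": None, "city": "Guangzhou"})   # drop age, add city
g.upsert_edge("v1", "BOUGHT", "v2", {"quantity": 2})        # partial edge update
quantity_after_update = g.edges[("v1", "BOUGHT", "v2")]["quantity"]
g.delete_vertex("v2")                                       # also removes the edge
```

In a distributed store the cascade in `delete_vertex` would need to find incident edges without scanning every edge, which is one reason to keep adjacency information alongside each vertex; the sketch ignores that concern.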
Graph Query API: retrieve vertices and edges by ID, and perform graph traversals using the Gremlin-style DSL provided in the Java client. A query definition combines clauses for filtering and projection.
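The source does not show the DSL itself, so the following is only a sketch of what a Gremlin-style fluent traversal with filtering and projection clauses looks like in general. It is written in Python rather than Java, and the step names `out`, `has`, and `values` are borrowed from Gremlin conventions as an assumption; they are not confirmed to match the project's client.

```python
from typing import Any, Dict, List, Set, Tuple

Vertices = Dict[str, Dict[str, Any]]
Edges = Set[Tuple[str, str, str]]    # (from_id, label, to_id)


class Traversal:
    """Each step returns a new Traversal, so steps chain fluently."""

    def __init__(self, vertices: Vertices, edges: Edges, ids: List[str]) -> None:
        self._vertices, self._edges, self._ids = vertices, edges, ids

    def out(self, label: str) -> "Traversal":
        # Follow outgoing edges with the given label from the current vertices.
        current = set(self._ids)
        reached = sorted({to for frm, lbl, to in self._edges
                          if lbl == label and frm in current})
        return Traversal(self._vertices, self._edges, reached)

    def has(self, key: str, value: Any) -> "Traversal":
        # Filtering clause: keep vertices whose property equals the value.
        kept = [i for i in self._ids if self._vertices.get(i, {}).get(key) == value]
        return Traversal(self._vertices, self._edges, kept)

    def values(self, key: str) -> List[Any]:
        # Projection clause: return one property from each remaining vertex.
        return [self._vertices[i][key] for i in self._ids
                if key in self._vertices.get(i, {})]


vertices = {
    "v1": {"name": "Alice"},
    "v2": {"name": "Bob", "city": "Beijing"},
    "v3": {"name": "Carol", "city": "Guangzhou"},
}
edges = {("v1", "KNOWS", "v2"), ("v1", "KNOWS", "v3"), ("v2", "KNOWS", "v3")}

# "Names of the people Alice knows who live in Guangzhou."
names = (Traversal(vertices, edges, ["v1"])
         .out("KNOWS")
         .has("city", "Guangzhou")
         .values("name"))
```

A builder of this shape can also record its steps instead of executing them, which is how such a client could serialize a query definition and send it to a server.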
Implementation Note: Currently, the only client is a Java client offering a Gremlin-like DSL; there is no multi-language SDK and no flexible string-based query language. Future work may include a Python SDK or a custom HTTP API.
Author: Zheng Zhibin, a bilingual Computer Science student at South China University of Technology, with experience in e‑commerce, open platforms, mobile browsers, search advertising back‑ends, and big data/AI development at BAT.
Baidu Intelligent Testing