Building a Distributed Graph Database on Aerospike – Usage Example, Features, and Limitations
This article presents a practical example of creating a distributed graph database on Aerospike, covering graph creation, JSON data loading, query and traversal operations, a detailed feature checklist, and the current limitations of the platform.
Yesterday, Zheng Zhibin introduced GraphDB and linked to the first part of his series on building a distributed graph database on Aerospike. Today, he walks through a concrete usage example.
1. Create Graph
The Titan sample graph is used as the example. (The original article showed the creation steps as screenshots.)
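A minimal in-memory sketch of the same idea follows. It assumes the "Titan graph" is Titan's well-known Graph of the Gods sample and uses only a few of its vertices. The `Graph`, `add_vertex`, and `add_edge` names are hypothetical; they are not the GraphDB Java API and say nothing about how GraphDB stores records in Aerospike.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    vid: str
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


class Graph:
    """Hypothetical in-memory label property graph (not the GraphDB API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.vertices: Dict[str, Vertex] = {}
        self.edges: List[Edge] = []

    def add_vertex(self, vid: str, label: str, **props: Any) -> Vertex:
        vertex = Vertex(vid, label, dict(props))
        self.vertices[vid] = vertex
        return vertex

    def add_edge(self, src: str, dst: str, label: str, **props: Any) -> Edge:
        # Both endpoints must already exist before an edge can connect them.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError(f"unknown endpoint in edge {src} -> {dst}")
        edge = Edge(src, dst, label, dict(props))
        self.edges.append(edge)
        return edge


# A few vertices and edges from the "Graph of the Gods" sample.
g = Graph("gods")
g.add_vertex("saturn", "titan", name="saturn", age=10000)
g.add_vertex("jupiter", "god", name="jupiter", age=5000)
g.add_vertex("hercules", "demigod", name="hercules", age=30)
g.add_edge("jupiter", "saturn", "father")
g.add_edge("hercules", "jupiter", "father")
```

Each vertex carries a label plus free-form properties, which matches the label property graph model listed under Features.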
2. Load Data via JSON
Loading pre-saved offline data from a JSON file is much simpler. The file data/product.json follows the format shown in the original screenshots.
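The screenshots showing the layout of data/product.json are not available. The sketch below therefore assumes a hypothetical layout with top-level `vertices` and `edges` arrays, where each vertex has an `id`, a `label`, and free-form `properties`. It parses that layout with Python's standard `json` module and rejects edges whose endpoints were not declared.

```python
import json
from typing import Any, Dict, List, Tuple


def load_graph_json(text: str) -> Tuple[Dict[str, Dict[str, Any]], List[Dict[str, Any]]]:
    """Parse a hypothetical {"vertices": [...], "edges": [...]} document."""
    doc = json.loads(text)
    vertices: Dict[str, Dict[str, Any]] = {}
    for v in doc.get("vertices", []):
        # Each vertex needs an id and a label; its properties are free-form (schema-less).
        vertices[v["id"]] = {"label": v["label"], "props": v.get("properties", {})}
    edges: List[Dict[str, Any]] = []
    for e in doc.get("edges", []):
        # Refuse edges whose endpoints were not declared in the same file.
        if e["from"] not in vertices or e["to"] not in vertices:
            raise ValueError(f"edge references an unknown vertex: {e}")
        edges.append({"from": e["from"], "to": e["to"], "label": e["label"],
                      "props": e.get("properties", {})})
    return vertices, edges


SAMPLE = """
{
  "vertices": [
    {"id": "p1", "label": "product", "properties": {"name": "phone", "price": 399}},
    {"id": "c1", "label": "category", "properties": {"name": "electronics"}}
  ],
  "edges": [
    {"from": "p1", "to": "c1", "label": "belongs_to"}
  ]
}
"""

vertices, edges = load_graph_json(SAMPLE)
```

To load a real file, pass the contents of `data/product.json` instead of `SAMPLE`, after adapting the field names to the file's actual format.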
3. Query
The original article showed screenshots of queries that retrieve edges and vertices.
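The following is a minimal sketch of the two kinds of lookup, using hypothetical toy data loosely based on the Graph of the Gods. Edges are kept in per-vertex adjacency lists, so an edge query only touches the edges of one vertex. The function names are illustrative and are not the GraphDB API.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Optional, Tuple

# Hypothetical toy data: vertex id -> (label, properties); edges are (src, label, dst).
VERTICES: Dict[str, Tuple[str, Dict[str, Any]]] = {
    "saturn": ("titan", {"age": 10000}),
    "jupiter": ("god", {"age": 5000}),
    "neptune": ("god", {"age": 4500}),
    "hercules": ("demigod", {"age": 30}),
}
EDGES = [
    ("jupiter", "father", "saturn"),
    ("hercules", "father", "jupiter"),
    ("jupiter", "brother", "neptune"),
    ("neptune", "brother", "jupiter"),
]

# Per-vertex adjacency lists, so an edge query never scans the whole edge list.
OUT: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
IN: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
for src, label, dst in EDGES:
    OUT[src].append((label, dst))
    IN[dst].append((label, src))


def query_edges(vid: str, direction: str = "out", label: Optional[str] = None) -> List[str]:
    """Ids of vertices reached from `vid` over edges with the given direction and label."""
    adjacency = OUT if direction == "out" else IN
    return [other for lbl, other in adjacency.get(vid, []) if label is None or lbl == label]


def query_vertices(label: str, **equals: Any) -> List[str]:
    """Sorted ids of vertices with `label` whose properties equal every keyword given."""
    return sorted(vid for vid, (lbl, props) in VERTICES.items()
                  if lbl == label and all(props.get(k) == v for k, v in equals.items()))
```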
4. Traversal
The original article showed several traversal examples as screenshots. Updating and deleting vertices and edges is straightforward, so those operations are omitted here.
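Traversal builds on edge queries: each step expands the current set of vertices along matching edges. The following is a minimal breadth-first sketch that assumes a hypothetical in-memory edge list and a `max_depth` cut-off. The visited set keeps cyclic graphs from looping forever.

```python
from collections import deque
from typing import Deque, List, Optional, Set, Tuple

# Hypothetical toy edge list (src, label, dst), loosely modeled on the Graph of the Gods.
EDGES: List[Tuple[str, str, str]] = [
    ("hercules", "father", "jupiter"),
    ("jupiter", "father", "saturn"),
    ("jupiter", "brother", "neptune"),
    ("jupiter", "brother", "pluto"),
    ("neptune", "brother", "pluto"),
]


def out_neighbours(vid: str, label: Optional[str]) -> List[str]:
    # Linear scan for brevity; a real store would use per-vertex adjacency lists.
    return [dst for src, lbl, dst in EDGES if src == vid and (label is None or lbl == label)]


def traverse(start: str, label: Optional[str] = None, max_depth: int = 10) -> List[Tuple[str, int]]:
    """Breadth-first traversal over outgoing edges, returning (vertex, depth) pairs."""
    seen: Set[str] = {start}  # visited set: cycles cannot loop forever
    reached: List[Tuple[str, int]] = []
    queue: Deque[Tuple[str, int]] = deque([(start, 0)])
    while queue:
        vid, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in out_neighbours(vid, label):
            if nxt not in seen:
                seen.add(nxt)
                reached.append((nxt, depth + 1))
                queue.append((nxt, depth + 1))
    return reached
```

For example, `traverse("hercules", "father")` follows only `father` edges and reaches jupiter at depth 1 and saturn at depth 2.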
Features
Graph Model
Label property graph model – Phase 1
Schema-less – Phase 1
Index-based – Phase 1
Document-oriented (works seamlessly with JSON) – Phase 1
Allow unspecified labels? – Phase 2 (planned)
Indexes
Exact match (match index) – Phase 1
Range query (range index) – Phase 1
Nested property index – Phase 1
Array index – Phase 1
Geo query – Phase 2 (planned)
Full-text search – Phase 3 (planned)
Multi-condition index optimization – Phase 2 (planned)
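The Phase 1 index types can be illustrated with a small in-memory model. In this sketch, which is an assumption and not the GraphDB implementation, a nested property is addressed by a dotted path such as `maker.country`. An array property is indexed once per element, so a lookup on any element finds the record. The range index keeps sorted `(value, id)` pairs and uses Python's `bisect` module.

```python
import bisect
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Set, Tuple


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'maker.country' inside a nested document."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


class MatchIndex:
    """Exact-match index: value -> ids. Array values are indexed once per element."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: DefaultDict[Any, Set[str]] = defaultdict(set)

    def add(self, rid: str, doc: Dict[str, Any]) -> None:
        value = get_path(doc, self.path)
        for v in (value if isinstance(value, list) else [value]):
            if v is not None:
                self.entries[v].add(rid)

    def lookup(self, value: Any) -> Set[str]:
        return set(self.entries.get(value, set()))


class RangeIndex:
    """Range index: numeric values kept in a sorted list of (value, id) pairs."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.keys: List[Tuple[float, str]] = []

    def add(self, rid: str, doc: Dict[str, Any]) -> None:
        value = get_path(doc, self.path)
        if isinstance(value, (int, float)):
            bisect.insort(self.keys, (value, rid))

    def between(self, lo: float, hi: float) -> List[str]:
        """Ids whose value lies in the inclusive range [lo, hi]."""
        start = bisect.bisect_left(self.keys, (lo, ""))
        result = []
        for value, rid in self.keys[start:]:
            if value > hi:
                break
            result.append(rid)
        return result


# Hypothetical product documents with a nested object and an array property.
DOCS = {
    "p1": {"name": "phone", "price": 399, "tags": ["mobile", "5g"], "maker": {"country": "CN"}},
    "p2": {"name": "laptop", "price": 1299, "tags": ["pc"], "maker": {"country": "US"}},
    "p3": {"name": "tablet", "price": 599, "tags": ["mobile"], "maker": {"country": "CN"}},
}
by_country = MatchIndex("maker.country")  # nested property index
by_tag = MatchIndex("tags")               # array index
by_price = RangeIndex("price")            # range index
for rid, doc in DOCS.items():
    for index in (by_country, by_tag, by_price):
        index.add(rid, doc)
```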
APIs
addEdge, updateEdge, deleteEdge, queryEdge – Phase 1
addVertex, updateVertex, deleteVertex (requires edge cleanup), queryVertex – Phase 1
createIndex, dropIndex, reIndex – Phase 1
loadGraph – Phase 1
dropData – Phase 1
Other graph, index, vertex, and edge APIs – Phase 1
createGraph – not yet supported (depends on future Aerospike releases)
loadData (bulk import) – Phase 2 (planned)
deleteGraph – not yet supported (depends on future Aerospike releases)
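The note that deleteVertex requires edge cleanup presumably reflects that a vertex should not be removed while edges still reference it. The sketch below shows one way to cascade the deletion. It assumes a hypothetical store that keeps per-vertex sets of outgoing and incoming edge ids; the class and method names are illustrative only.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Set, Tuple


class AdjacencyStore:
    """Hypothetical store: vertices plus per-vertex sets of incident edge ids."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Dict[str, Any]] = {}
        self.edges: Dict[str, Tuple[str, str, str]] = {}  # edge id -> (src, label, dst)
        self.out_edges: DefaultDict[str, Set[str]] = defaultdict(set)
        self.in_edges: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_vertex(self, vid: str, **props: Any) -> None:
        self.vertices[vid] = dict(props)

    def add_edge(self, eid: str, src: str, label: str, dst: str) -> None:
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError(f"unknown endpoint in edge {eid}")
        self.edges[eid] = (src, label, dst)
        self.out_edges[src].add(eid)
        self.in_edges[dst].add(eid)

    def delete_edge(self, eid: str) -> None:
        src, _label, dst = self.edges.pop(eid)
        self.out_edges[src].discard(eid)
        self.in_edges[dst].discard(eid)

    def delete_vertex(self, vid: str) -> None:
        # Collect incident edges first; the union is a new set, so deleting is safe.
        incident = self.out_edges.get(vid, set()) | self.in_edges.get(vid, set())
        for eid in incident:
            self.delete_edge(eid)  # also detaches the edge from the other endpoint
        self.out_edges.pop(vid, None)
        self.in_edges.pop(vid, None)
        del self.vertices[vid]


store = AdjacencyStore()
for name in ("a", "b", "c"):
    store.add_vertex(name)
store.add_edge("e1", "a", "knows", "b")
store.add_edge("e2", "b", "knows", "c")
store.add_edge("e3", "c", "knows", "a")
store.delete_vertex("b")  # removes e1 and e2 along with b; e3 survives
```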
Graph Traversal
Traversal through the Java SDK – Phase 1
orderBy, limit, and values selection – Phase 1
Gremlin syntax parsing – Phase 2/3 (planned)
Python SDK – Phase 2/3 (planned)
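orderBy, limit, and values selection are chained after traversal steps. The sketch below mimics that shape with a hypothetical Python `Traversal` class. The Phase 1 SDK is Java, and its method names and semantics may differ. Each step returns a new traversal over the current list of vertex ids.

```python
from typing import Any, Dict, List, Optional, Tuple


class Traversal:
    """Hypothetical fluent traversal; every step returns a new Traversal."""

    def __init__(self, vertices: Dict[str, Dict[str, Any]],
                 edges: List[Tuple[str, str, str]], current: List[str]) -> None:
        self._vertices = vertices
        self._edges = edges
        self._current = list(current)

    def _next(self, ids: List[str]) -> "Traversal":
        return Traversal(self._vertices, self._edges, ids)

    def out(self, label: Optional[str] = None) -> "Traversal":
        """Follow outgoing edges (optionally only those with `label`)."""
        return self._next([dst for vid in self._current for src, lbl, dst in self._edges
                           if src == vid and (label is None or lbl == label)])

    def order_by(self, key: str, descending: bool = False) -> "Traversal":
        return self._next(sorted(self._current, key=lambda vid: self._vertices[vid][key],
                                 reverse=descending))

    def limit(self, n: int) -> "Traversal":
        return self._next(self._current[:n])

    def values(self, *keys: str) -> List[Dict[str, Any]]:
        """Project each current vertex onto the requested property keys."""
        return [{k: self._vertices[vid].get(k) for k in keys} for vid in self._current]


VERTICES = {
    "jupiter": {"name": "jupiter", "age": 5000},
    "neptune": {"name": "neptune", "age": 4500},
    "pluto": {"name": "pluto", "age": 4000},
    "sky": {"name": "sky"},
}
EDGES = [("jupiter", "brother", "neptune"), ("jupiter", "brother", "pluto"),
         ("jupiter", "lives", "sky")]

# Youngest brother of jupiter: out("brother") -> orderBy(age) -> limit(1) -> values(name).
youngest = (Traversal(VERTICES, EDGES, ["jupiter"])
            .out("brother").order_by("age").limit(1).values("name"))
```

Returning a new object from every step keeps partial traversals reusable, which is a common design for fluent query builders.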
Future Roadmap
Aggregations (count, max, min, sum, group by, etc.) – Phase 3/4
Platformization – Phase 4: system monitoring, cluster management, namespace/database management, CRUD UI, log viewing, custom UDF support, authentication and authorization, fine-grained permissions, online requests, visual query UI, and more
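Until aggregations are available, simple ones can be computed on the client after records are fetched. The following is a minimal sketch of count, sum, min, and max grouped by one property. The function name and record layout are hypothetical, and for large result sets this approach moves all the data to the client.

```python
from typing import Any, Dict, Iterable


def aggregate(records: Iterable[Dict[str, Any]], group_by: str,
              metric: str) -> Dict[Any, Dict[str, float]]:
    """Client-side count, sum, min and max of `metric`, grouped by the `group_by` property."""
    groups: Dict[Any, Dict[str, float]] = {}
    for record in records:
        value = record.get(metric)
        if value is None:
            continue  # records without the metric are skipped, not counted
        stats = groups.setdefault(record.get(group_by),
                                  {"count": 0, "sum": 0, "min": value, "max": value})
        stats["count"] += 1
        stats["sum"] += value
        stats["min"] = min(stats["min"], value)
        stats["max"] = max(stats["max"], value)
    return groups


records = [
    {"label": "god", "name": "jupiter", "age": 5000},
    {"label": "god", "name": "neptune", "age": 4500},
    {"label": "demigod", "name": "hercules", "age": 30},
    {"label": "location", "name": "sky"},
]
stats_by_label = aggregate(records, "label", "age")
```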
Limitations
Dynamic namespace management requires source code changes.
A namespace can hold at most 256 secondary indexes.
Bin names are limited to 14 characters.
Secondary-index queries do not support logical operators (AND, OR, NOT); each query can filter on a single attribute only.
No built‑in aggregation functions; UDFs are possible but inefficient.
Queries do not support pagination or ordering.
Only exact-match and range queries are supported; there is no full-text search.
Batch reads are supported, but batch writes are not.
Queries without an appropriate index fail instead of falling back to a full scan.
A query that does not specify a set name searches only records that do not belong to any set.
Fast restart and graph partitioning optimizations are pending.
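Several of these limitations can be worked around on the client, at the cost of moving more data over the network. The sketch below is illustrative only and uses hypothetical helper names. It validates bin names against the 14-character limit stated above, emulates AND and OR by intersecting or uniting the id sets returned by separate single-attribute queries, and sorts and paginates fetched results locally.

```python
from typing import Any, Dict, List, Set

MAX_BIN_NAME_LEN = 14  # limit stated in the article; counted here as characters


def check_bin_name(name: str) -> str:
    """Reject over-long bin names before they reach the database."""
    if len(name) > MAX_BIN_NAME_LEN:
        raise ValueError(f"bin name {name!r} is longer than {MAX_BIN_NAME_LEN} characters")
    return name


def query_and(*id_sets: Set[str]) -> Set[str]:
    """Emulate AND: intersect the ids returned by separate single-attribute queries."""
    if not id_sets:
        return set()
    return set.intersection(*map(set, id_sets))


def query_or(*id_sets: Set[str]) -> Set[str]:
    """Emulate OR: union of the ids returned by separate single-attribute queries."""
    return set().union(*id_sets)


def paginate(records: List[Dict[str, Any]], sort_key: str, page: int, page_size: int,
             descending: bool = False) -> List[Dict[str, Any]]:
    """Sort all fetched records locally, then return one zero-based page."""
    ordered = sorted(records, key=lambda r: r[sort_key], reverse=descending)
    start = page * page_size
    return ordered[start:start + page_size]
```

Set difference can stand in for NOT only when the complete candidate id set is already known.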
Author Introduction
Zheng Zhibin, a graduate of South China University of Technology (Bilingual Class), has worked at BAT on e‑commerce, open platforms, mobile browsers, search ad back‑ends, and big data/AI. He specializes in business architecture, platformization, and solution design.
Readers who missed the theoretical part can find it in the previous article in this series.
Baidu Intelligent Testing