
Nebula Graph Architecture, Deployment Strategies, and Performance Optimization at Ctrip

This article describes Ctrip's adoption of Nebula Graph, covering the reasons for choosing the open‑source distributed graph database, its modular architecture, multi‑data‑center and blue‑green deployment patterns, middleware integration, client session management, query language extensions, and a series of performance‑tuning practices that solved stability and CPU issues in production.

Ctrip Technology

Patrick Yu, a Ctrip Cloud‑Native R&D expert, introduces the background of increasing data complexity and the growing demand for graph technologies, which led Ctrip to evaluate Neo4j and JanusGraph before finally selecting the open‑source Nebula Graph for large‑scale deployment.

Nebula Graph was chosen because its open‑source version supports horizontal scaling, uses a native storage engine that outperforms third‑party storage back ends, is compatible with openCypher, has an active community, and offers clean code with low technical debt.

The system follows a compute‑storage separation architecture composed of three services: graphd (computing), metad (metadata), and storaged (graph data). Three deployment modes are provided:

Three‑data‑center deployment for high availability, tolerating a whole‑site failure.

Single‑data‑center deployment for non‑critical services, avoiding cross‑site latency.

Blue‑green (dual‑active) deployment, combining read‑write separation and traffic‑splitting to meet high‑performance core‑service requirements.

Operations are managed via a Kubernetes CRD and Operator, with side‑car monitoring that collects core metrics and forwards them to Ctrip's Hickwall system. An internal domain‑name allocation mechanism ensures stable internal routing without IP changes.

Client‑side improvements include a robust Session Pool that serialises session creation, reuses sessions safely, and pre‑warms a configurable number of sessions to avoid spikes during startup.
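The ideas behind that Session Pool can be sketched as follows. This is a minimal illustration, not Ctrip's actual implementation: `GraphSession` is a stand‑in for a real Nebula client session, and the pool simply serialises creation, reuses idle sessions, and pre‑warms a configurable number at startup.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a real Nebula client session (illustrative only).
class GraphSession {
    private static final AtomicInteger created = new AtomicInteger();
    final int id;
    GraphSession() { this.id = created.incrementAndGet(); }
    static int createdCount() { return created.get(); }
}

class SessionPool {
    private final BlockingQueue<GraphSession> idle;

    // Pre-warm a configurable number of sessions so the first burst of
    // requests does not cause a spike of session creations.
    SessionPool(int capacity, int preWarm) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < Math.min(preWarm, capacity); i++) {
            idle.offer(createSession());
        }
    }

    // Serialise session creation: only one thread at a time asks the
    // server for a new session.
    private synchronized GraphSession createSession() {
        return new GraphSession();
    }

    // Reuse an idle session when one is available; create lazily otherwise.
    GraphSession borrow() {
        GraphSession s = idle.poll();
        return (s != null) ? s : createSession();
    }

    // Return the session for safe reuse; drop it if the pool is full.
    void giveBack(GraphSession s) {
        idle.offer(s);
    }

    int idleCount() { return idle.size(); }
}
```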

Blue‑green deployment is extended with read‑write separation: reads can be served by both clusters, while writes target only one, and traffic is allocated proportionally using custom routing logic instead of Virtual IP.
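The proportional routing described above can be sketched like this. The cluster names and the single weight parameter are assumptions for illustration, not Ctrip's routing API; the point is that writes always target one designated cluster while reads are split by ratio in client‑side logic rather than behind a Virtual IP.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative blue-green router: one write target, weighted read split.
class BlueGreenRouter {
    private final String blue;
    private final String green;
    private final double blueReadWeight; // fraction of reads sent to blue

    BlueGreenRouter(String blue, String green, double blueReadWeight) {
        this.blue = blue;
        this.green = green;
        this.blueReadWeight = blueReadWeight;
    }

    // All writes go to a single designated cluster so the two clusters
    // never accept conflicting writes.
    String routeWrite() { return blue; }

    // Reads are served by both clusters, allocated proportionally.
    String routeRead() {
        return ThreadLocalRandom.current().nextDouble() < blueReadWeight
                ? blue : green;
    }
}
```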

For query convenience, Nebula Graph supports both openCypher and its native nGQL. A procedural builder API was added, for example:

Builder.match()
    .vertex("v")
    .hasTag("user")
    .property("name", "XXX", DataType.String())
    .edge("e", Direction.OUTGOING)
    .type("follow")
    .type("serve")
    .vertex("v2")
    .ret("v2", "Friends");
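A builder chain like the one above would correspond roughly to an openCypher‑style MATCH statement such as the following (a sketch only; the exact edge‑alternation and property‑access syntax depends on the Nebula Graph version):

```
MATCH (v:user)-[e:follow|serve]->(v2)
WHERE v.user.name == "XXX"
RETURN v2 AS Friends;
```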

System‑level tuning cases are presented:

Session metadata overflow caused by an unset session_idle_timeout_secs (so idle sessions were never reclaimed) combined with an overly frequent session_reclaim_interval_secs sync; setting the idle timeout and extending the reclaim interval reduced WAL traffic.
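In gflags form, the fix amounts to something like the following graphd settings (illustrative values; the right numbers depend on the workload):

```
# graphd.conf — session reclamation (sketch)
--session_idle_timeout_secs=28800      # actually reclaim sessions idle past the timeout
--session_reclaim_interval_secs=60     # sync session metadata less often to cut WAL traffic
```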

High CPU on storaged due to lock contention in the RocksDB block cache; solutions included increasing rocksdb_block_cache, enabling prefix filtering, disabling auto‑compactions, enlarging write_buffer_size, and limiting background compactions.
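Translated into storaged flags, that tuning might look roughly like this (the flag names come from Nebula's gflags configuration; all values here are illustrative assumptions, not Ctrip's production numbers):

```
# storaged.conf — RocksDB tuning (sketch)
--rocksdb_block_cache=8192                                  # block cache size in MB
--enable_rocksdb_prefix_filtering=true                      # prefix bloom filter on keys
--rocksdb_column_family_options={"disable_auto_compactions":"true","write_buffer_size":"268435456"}
--rocksdb_db_options={"max_background_compactions":"4"}     # cap compaction threads
```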

Lock contention from dense vertices; attempts such as data balance, compaction, cache bucket tuning, and switching to ClockCache were evaluated, with ClockCache ultimately deemed unsuitable.

Thread‑pool reduction (lowering num_io_threads, num_worker_threads, and reader_handlers) stabilized CPU usage.

Disabling block cache and related index caching, and setting max_open_files = -1 eliminated lock overhead, keeping CPU below 30% even under doubled traffic.
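Those last two steps combine into a storaged configuration along these lines (again a sketch built from Nebula's gflags; the thread counts are assumptions):

```
# storaged.conf — thread pools and cache-off tuning (sketch)
--num_io_threads=8                                             # fewer I/O threads than cores
--num_worker_threads=8
--reader_handlers=8
--rocksdb_db_options={"max_open_files":"-1"}                   # keep all SSTs open, skip table-cache locking
--rocksdb_block_based_table_options={"no_block_cache":"true"}  # rely on the OS page cache instead
```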

Additional operational enhancements include commands for moving shards and transferring leaders, and a roadmap that plans integration with Ctrip's big‑data platform, slow‑log analysis, parameterised queries, and richer visualisation features.

Tags: Kubernetes, graph database, performance tuning, distributed database, session management, Nebula Graph