
Vivo Feature Storage Practice: Architecture, Design, and Future Directions Using Nebula Graph

Vivo's feature-storage platform is built on Nebula Graph's Raft-based, storage-compute-separated architecture and exposed through Redis-compatible proxies. It meets massive, low-latency AI data demands while offering strong consistency, horizontal scalability, backup, and active-active replication, with a roadmap toward general-purpose KV storage, cloud-native integration, and advanced storage engines.

vivo Internet Technology

This article introduces Vivo's internal feature‑storage practice, its evolution, and future outlook, aiming to inspire further ideas.

1. Requirement Analysis

AI techniques are increasingly used inside Vivo, and feature data play a crucial role in offline training and online inference. The storage system must handle very large values, massive data volume, high concurrency, low‑latency reads/writes, point‑lookup access patterns, periodic bulk ingestion, and ease of use.

2. Potential Needs

Extend to a general‑purpose disk‑based KV to support various scenarios.

Reuse resources across different NoSQL/NewSQL databases (graph, time‑series, object storage, etc.).

Maintain strong maintainability by using a mainstream language and minimizing third‑party dependencies.

3. System "Iceberg"

The final decision is to build a Redis‑compatible service that internally provides extensive reliability guarantees.

4. Nebula Overview

Nebula Graph is a high‑performance, highly available, strongly consistent open‑source distributed graph database.

4.1 Storage‑Compute Separation

Nebula separates stateful storage services from stateless compute services, exposing a simple KV interface while allowing compute to focus on user logic.

4.2 Strong Consistency (Raft)

Nebula uses Raft for multi‑replica consistency and has already passed Jepsen linearizability tests.

4.3 Scalability

Horizontal scaling is achieved via a hash‑based multi‑Raft implementation with a built‑in balancer.

4.4 Maintainability

Nebula is implemented in C++, matching Vivo's tech stack, with clean abstractions for multiple storage engines.

5. Nebula Raft Details

5.1 Leader Election & Terms

Each Raft group progresses through consecutive terms with a single leader per term.
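The single-leader-per-term property follows from one rule: a node grants at most one vote per term. The toy sketch below illustrates just that rule, under assumed names; it is not Nebula's election code.

```python
# Toy illustration of Raft's voting rule: a node grants its vote at most
# once per term, so each term can elect at most one leader.
class Node:
    def __init__(self):
        self.current_term = 0
        self.voted_for = None  # candidate voted for in current_term

    def request_vote(self, term: int, candidate: str) -> bool:
        if term < self.current_term:
            return False            # stale candidate, reject
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None   # a new term resets the vote
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False

nodes = [Node() for _ in range(3)]
votes = sum(n.request_vote(1, "A") for n in nodes)
print(votes >= 2)  # True: "A" wins term 1 with a majority
print(any(n.request_vote(1, "B") for n in nodes))  # False: no second leader in term 1
```

Since two candidates cannot both collect a majority of single-use votes, split-brain within a term is impossible.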

5.2 Log Replication & Compaction

Writes are turned into operation logs stored in WAL files; logs are replicated to followers and compacted according to the configurable wal_ttl.

5.3 Snapshot Mechanism

When adding members, Nebula snapshots the RocksDB state and streams it to new nodes.

5.4 Multi‑Raft

Data is sharded across multiple Raft groups using a hash‑based approach, enabling horizontal scaling.
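Hash-based sharding boils down to a deterministic key-to-partition function that every node computes identically, so routing needs no central lookup. The hash function and partition count below are illustrative assumptions, not Nebula's exact scheme.

```python
import zlib

# Hash-based placement of keys onto Raft groups (partitions).
# CRC32 and the partition count are illustrative choices.
NUM_PARTS = 16

def partition_of(key: bytes) -> int:
    # Every node computes the same partition for a key, so a request
    # can be routed without consulting a central directory.
    return zlib.crc32(key) % NUM_PARTS

print(partition_of(b"user:42:emb") == partition_of(b"user:42:emb"))  # True: deterministic
```

Rebalancing then moves whole partitions (Raft groups) between hosts, which is what the built-in balancer mentioned above automates.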

6. Feature Storage Platform

6.1 System Architecture

The platform is built on Nebula with added components: Redis Proxy, Redis‑Cluster Proxy, and platform services. Meta stores cluster metadata (routing, spaces) as a Raft group; Storage nodes hold data replicas; Graph provides API services; the Redis and Redis‑Cluster proxies expose Redis protocols.

6.2 Performance Optimizations

Cluster tuning based on workload.

Integration of WiscKey/TitanDB to separate large values into BlobFiles, reducing write amplification.

TTL mechanism using Compaction Filter; custom handling for large values stored in BlobFiles.

7. Ease of Use

Redis‑protocol compatibility via a KVrocks‑based proxy.

Bulk import from Hive to KV.

Platform‑level operations: one‑click deployment, monitoring, health checks, and DBaaS integration.
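Redis-protocol compatibility means clients talk RESP framing to the proxy. The snippet below shows only the standard RESP command encoding for a GET, to make concrete what "Redis-compatible" implies on the wire; the key name is made up.

```python
# RESP (Redis serialization protocol) framing for a command: an array
# header (*N) followed by length-prefixed bulk strings ($len).
def encode_command(*args: bytes) -> bytes:
    out = b"*%d\r\n" % len(args)
    for a in args:
        out += b"$%d\r\n%s\r\n" % (len(a), a)
    return out

print(encode_command(b"GET", b"user:42:emb"))
# b'*2\r\n$3\r\nGET\r\n$11\r\nuser:42:emb\r\n'
```

Speaking this protocol lets existing Redis client libraries use the platform without modification, which is the whole point of the proxy layer.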

8. Disaster Recovery

Cold backup via Nebula's snapshot mechanism.

Hot backup in two phases: first, incremental backup with possible data loss; then, Learner‑based backup that replicates both incremental and full data.

9. Cross‑Datacenter Active‑Active

Phase 1: simple dual‑active without conflict resolution.

Phase 2: CRDT (LWW Register) based conflict handling for eventual consistency.

10. Future Outlook

General‑purpose KV storage with higher reliability and lower cost.

Enhanced platform capabilities and automated correctness checks.

Improved scheduling, possible Region‑based sharding, and cloud‑native integration.

Hot‑cold data separation, more storage engines (e.g., pure‑memory, AEP), remote HDFS cold backup.

Adoption of SPDK, with the potential to nearly double throughput.

Exploration of KV‑SSD to eliminate write amplification.

Support for graph, time‑series, and object‑metadata workloads.

Conclusion

The practice requires continuous resource coordination, requirement collection, and product iteration to broaden scenario coverage and achieve a virtuous cycle.
