Databases 20 min read

KGraph: Design and Implementation of a High‑Performance Distributed Graph Database

This article summarizes the design, architecture, and performance optimizations of KGraph, Kuaishou’s high‑throughput distributed graph database, covering its graph model, storage engine built on PMem, the KRPC network framework, scalability, key challenges, and future directions.

Kuaishou Tech
Kuaishou Tech
Kuaishou Tech
KGraph: Design and Implementation of a High‑Performance Distributed Graph Database

Based on the QCon 2021 Shanghai talk by Zhang Shihang, this article presents KGraph, Kuaishou’s distributed graph database, and its design motivations.

As data scales and recommendation algorithms become more complex, traditional databases struggle with multi‑hop queries; graph databases excel in such scenarios, e.g., a three‑hop query on a social graph with average out‑degree 300 touches ~30 million edges.

KGraph achieves single‑machine 20 million QPS and cluster‑level billions of queries per second by combining a high‑performance PMem‑based storage engine with the custom KRPC network framework.

Graph Database Overview

Graph databases store vertices and edges with properties, offering natural advantages for relationship‑heavy workloads such as social networks, recommendation systems, and knowledge graphs.

Typical use cases include:

Social networks – users as vertices, relationships as edges.

Recommendation systems – users and videos as vertices, likes/comments as edges.

Knowledge graphs – entities and their relations.

Graph databases have grown in popularity, as shown by DB‑Engines trends.

They can be native (e.g., Nebula, Neo4j) or built on existing KV/Table stores (e.g., JanusGraph, KGraph).

Query languages include Gremlin (Apache TinkerPop) and Cypher (Neo4j).

Data Model

KGraph uses a directed heterogeneous property graph: vertices represent entities (e.g., users, videos) and edges represent relations (e.g., likes). Each vertex has an uint64_t or string ID, a type, and a set of key‑value attributes; each edge stores source, destination, type, and attributes.

Usage

KGraph can be accessed via C++/Java SDKs, gRPC, or Gremlin Server.

Two Gremlin examples illustrate queries for outgoing neighbors of node 666 and for the top‑10 video creators followed by fans of a specific creator.

Overall Architecture

The system consists of three independent layers that can be scaled horizontally:

KV storage layer – high‑performance RPC (KRPC) + Intel PMem.

Graph logic layer – defines vertices/edges and provides CRUD APIs.

Graph compute layer – GraphServer (proxy + algorithms) and GremlinServer (Gremlin execution).

KV Storage Layer

Clusters contain Clients, a Master, and DataNodes. Data is sharded across DataNodes; the Master maintains node health, shard placement, and routing information.

High‑Performance Requirements

Design goals include single‑machine tens of millions of QPS, persistent storage to avoid long startup times, and linear scalability for both compute and storage.

PMem Overview

PMem (Intel Persistent Memory) offers DRAM‑like latency with capacities far beyond DRAM and speeds orders of magnitude faster than SSDs. It can operate in memory mode (as volatile memory) or App Direct mode (as a persistent device with XFS/DAX).

Storage Engine Construction

On top of PMem (mounted with XFS + DAX) KGraph uses LSM‑Tree and hash engines; a custom LRU + thread‑local cache mitigates hotspot reads. Write paths are designed to avoid contention, enabling DataNodes to sustain tens of millions of QPS.

PMem Challenges

During random‑write benchmarks, write latency halved after a period due to reduced XFS log‑commit contention. Setting recycle_log_file_num=0 eliminated the latency swing, confirming that pre‑allocating WAL space stabilizes performance.

High‑Performance RPC Framework (KRPC)

KRPC targets 40‑core machines delivering 40 million QPS. It minimizes thread contention, avoids shared pointers, reduces string copies, and uses thread‑local caches for frequently accessed metadata.

KRPC Design

Four‑layer architecture: network layer (asio), message layer (packet framing), logic layer (core RPC), and protocol layer (extensible for gRPC, Redis, HTTP).

KRPC Performance

Benchmarks show single‑machine 50 million QPS, < 100 µs latency, multi‑protocol support, and high stability (daily calls > 10 trillion).

KGraph Performance

KGraph delivers up to 20 million QPS per machine, sub‑millisecond latency, linear scalability, and supports super‑nodes. A 12‑node cluster handles 1.3 billion QPS with average 600 µs latency.

Cluster Scale

KGraph is deployed in dozens of clusters, serving hundreds of servers, handling tens of billions of QPS and over 10 trillion edges.

Key Problem Analysis

Edge encoding strategies include per‑edge keys, packed out‑edge lists, and B‑tree‑like KV partitioning for super‑nodes. Configurable thresholds allow dynamic switching between these schemes to balance write amplification and read efficiency.

Conclusion and Outlook

KGraph provides a directed heterogeneous property graph model with Gremlin support and super‑node handling. By leveraging extreme single‑machine performance via PMem and KRPC, it reduces operational costs while achieving massive throughput. Ongoing work focuses on further PMem engine optimizations and native graph indexing.

Author

Zhang Shihang – Kuaishou platform R&D, speaker at QCon 2021, responsible for KGraph reconstruction.

Graph Databasehigh performanceDistributed StorageKGraphKRPCPMem
Kuaishou Tech
Written by

Kuaishou Tech

Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.