High‑Performance Distributed Storage: Ceph vs Alibaba Pangu 2.0 vs XSKY INFINI
This article compares three high‑performance distributed storage systems (Ceph, Alibaba's Pangu 2.0, and XSKY INFINI), examining their architectures and the key techniques they use to exploit modern flash hardware: run‑to‑completion (RTC) thread models, append‑only writes, kernel bypass, RDMA, data compression, and metadata management.
1. Ceph
Ceph is built on the RADOS object store and offers block, file, and object services. All three services are layered on top of RADOS, so RADOS performance largely determines the performance of the whole system.
(1) Block Service
Clients access block devices (RBD images) through librbd or the kernel rbd driver. Each image is stored as many RADOS objects, so block-service performance hinges on RADOS.
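To see why every block I/O becomes RADOS work, here is a minimal sketch of how a byte range on an image could be split into per-object pieces. It assumes plain striping with RBD's default 4 MiB object size and no custom stripe settings; the `rbd_data.<id>.<object number>` name format is shown for illustration, and `map_extent` is a hypothetical helper, not a librbd function.

```python
# Assumption: plain striping with a 4 MiB object size; names are illustrative.
OBJECT_SIZE = 4 * 1024 * 1024

def map_extent(image_id: str, offset: int, length: int,
               object_size: int = OBJECT_SIZE) -> list[tuple[str, int, int]]:
    """Split an image byte range into (object name, offset in object, length) pieces."""
    pieces = []
    while length > 0:
        obj_no, obj_off = divmod(offset, object_size)
        n = min(length, object_size - obj_off)      # stop at the object boundary
        pieces.append((f"rbd_data.{image_id}.{obj_no:016x}", obj_off, n))
        offset += n
        length -= n
    return pieces

# A 20-byte write straddling the first object boundary touches two RADOS objects.
for piece in map_extent("1234abcd", OBJECT_SIZE - 10, 20):
    print(piece)
```

Each piece is then an independent RADOS object operation, which is why RBD latency and throughput track RADOS so closely.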
(2) File System Service
CephFS provides a POSIX‑compatible distributed file system with strong consistency. Metadata is managed by the metadata server (MDS), which caches it in memory so that metadata access does not become a bottleneck. The metadata itself is persisted in RADOS, and file systems with large numbers of inodes require significant MDS memory.
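The trade-off between MDS memory and metadata misses can be illustrated with a simplified LRU inode cache bounded by a memory budget. This is a sketch, not the MDS implementation: in Ceph the budget is set with the `mds_cache_memory_limit` option, while `bytes_per_inode`, `InodeCache`, and the `load_from_rados` callback here are hypothetical illustrative values and names.

```python
from collections import OrderedDict
from typing import Callable

class InodeCache:
    """LRU cache of inode metadata bounded by an approximate memory budget."""

    def __init__(self, memory_limit_bytes: int, bytes_per_inode: int) -> None:
        # bytes_per_inode is a hypothetical average; real MDS memory use varies
        self.capacity = max(1, memory_limit_bytes // bytes_per_inode)
        self.entries: "OrderedDict[int, dict]" = OrderedDict()
        self.misses = 0

    def get(self, ino: int, load_from_rados: Callable[[int], dict]) -> dict:
        if ino in self.entries:
            self.entries.move_to_end(ino)          # hit: mark as most recently used
            return self.entries[ino]
        self.misses += 1
        meta = load_from_rados(ino)                # miss: fetch persisted metadata from RADOS
        self.entries[ino] = meta
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)       # evict the least recently used inode
        return meta
```

When the working set of inodes exceeds the budget, every access to an evicted inode turns into a RADOS read, which is why large CephFS deployments provision MDS memory generously.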
(3) Object Service
Object storage is provided by the RADOS Gateway (RGW), a thin layer over RADOS that implements S3‑compatible APIs; the gateway itself is generally not the performance bottleneck.
(4) RADOS Storage Engine
RADOS (Reliable Autonomic Distributed Object Store) uses the CRUSH algorithm for data placement, so no central service has to be consulted to locate an object. Given the cluster map, clients compute object locations locally, which removes a lookup round trip from each request.
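The idea of computing placement locally can be sketched as two deterministic steps: hash the object name onto a placement group (PG), then pick OSDs for that PG with weighted "straw2-style" draws. This is a simplified illustration, not Ceph's CRUSH: it uses SHA-256 instead of Ceph's rjenkins hash, ignores failure domains and CRUSH rules, and `object_to_pg` and `straw2_select` are hypothetical names.

```python
import hashlib
import math

def _hash_unit(*parts: object) -> float:
    """Map the inputs to a deterministic float in (0, 1]."""
    digest = hashlib.sha256("/".join(map(str, parts)).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64

def object_to_pg(obj_name: str, pg_num: int) -> int:
    """Hash an object name onto a placement group."""
    digest = hashlib.sha256(obj_name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % pg_num

def straw2_select(pg: int, osds: dict[str, float], replicas: int) -> list[str]:
    """Pick `replicas` distinct OSDs for a PG using weighted straw2-style draws."""
    chosen: list[str] = []
    r = 0
    while len(chosen) < replicas:
        best, best_draw = None, -math.inf
        for osd, weight in osds.items():
            if osd in chosen or weight <= 0:
                continue
            # ln(u) <= 0; a larger weight pulls the draw toward 0, so heavier OSDs win more often
            draw = math.log(_hash_unit(pg, osd, r)) / weight
            if draw > best_draw:
                best, best_draw = osd, draw
        if best is None:
            raise ValueError("not enough OSDs with positive weight")
        chosen.append(best)
        r += 1
    return chosen

osds = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
pg = object_to_pg("rbd_data.1234abcd.0000000000000000", pg_num=128)
print(pg, straw2_select(pg, osds, replicas=3))
```

Because every client runs the same function over the same map, they all agree on locations without asking a server; an OSD's chance of being picked first is proportional to its weight.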
2. Pangu
Pangu is Alibaba's storage foundation. Version 2.0 consists of clients, a Master service using Raft for metadata, and ChunkServers that store data chunks. The system exposes an append‑only FlatLogFile interface.
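The split between a metadata Master, data-holding ChunkServers, and an append-only file interface can be modeled in a few lines. This is a toy sketch under stated assumptions: `Master`, `Chunk`, `AppendOnlyFile`, the round-robin placement, and the (chunk id, offset) addressing are all hypothetical and do not reflect Pangu's actual FlatLogFile API; Raft replication of the Master's state is omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: int
    servers: list[str]                     # ChunkServers holding replicas
    data: bytearray = field(default_factory=bytearray)
    sealed: bool = False                   # sealed chunks accept no more appends

class Master:
    """Toy metadata service mapping each file to its ordered chunk list (Raft omitted)."""

    def __init__(self, servers: list[str], replicas: int = 3) -> None:
        self.servers, self.replicas = servers, replicas
        self.files: dict[str, list[Chunk]] = {}
        self.next_id = 0

    def allocate_chunk(self, name: str) -> Chunk:
        start = self.next_id % len(self.servers)   # round-robin placement, illustrative only
        picked = [self.servers[(start + i) % len(self.servers)] for i in range(self.replicas)]
        chunk = Chunk(self.next_id, picked)
        self.next_id += 1
        self.files.setdefault(name, []).append(chunk)
        return chunk

class AppendOnlyFile:
    """Client-side handle: records can be appended and read back, never overwritten."""

    def __init__(self, master: Master, name: str, chunk_size: int) -> None:
        self.master, self.name, self.chunk_size = master, name, chunk_size
        self.chunks = master.files.setdefault(name, [])

    def append(self, record: bytes) -> tuple[int, int]:
        if len(record) > self.chunk_size:
            raise ValueError("record larger than a chunk")
        tail = self.chunks[-1] if self.chunks else None
        if tail is None or len(tail.data) + len(record) > self.chunk_size:
            if tail is not None:
                tail.sealed = True                 # roll over to a fresh chunk
            tail = self.master.allocate_chunk(self.name)
        offset = len(tail.data)
        tail.data.extend(record)                   # a real client would send this to every replica
        return tail.chunk_id, offset

    def read(self, chunk_id: int, offset: int, length: int) -> bytes:
        chunk = next(c for c in self.chunks if c.chunk_id == chunk_id)
        return bytes(chunk.data[offset:offset + length])
```

The Master only handles chunk allocation and lookup, while the bulk data flows directly between clients and ChunkServers.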
(1) High‑Performance Techniques
Pangu 1.0 used a kernel file system (ext4) and TCP.
Pangu 2.0 adopts a user‑space storage file system (USSFS) to bypass the kernel, reducing context‑switch overhead.
It provides an append‑only write path to exploit SSD sequential write performance.
(2) CPU Optimizations
ChunkServers run on USSOS, a user‑space storage operating system, with a run‑to‑completion thread model.
USSOS builds on user‑space I/O stacks (SPDK/DPDK) and RDMA, enabling zero‑copy data transfers.
Clients perform erasure coding (EC) and compression to cut network and storage usage.
Hardware offload (FPGA) handles compression, reducing CPU load.
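Client-side compression followed by erasure coding can be sketched as below. To keep it short, this uses the simplest erasure code, a single XOR parity fragment (k data + 1 parity, like RAID 5), rather than the Reed-Solomon-style codes production systems typically use, and software zlib in place of FPGA offload; `client_encode` and `client_decode` are hypothetical names.

```python
import zlib
from functools import reduce
from typing import Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def client_encode(payload: bytes, k: int) -> tuple[list[bytes], int]:
    """Compress first, then cut into k equal data fragments plus one XOR parity fragment."""
    packed = zlib.compress(payload)
    frag_len = -(-len(packed) // k)                  # ceiling division
    padded = packed.ljust(frag_len * k, b"\0")      # zero-pad to a multiple of k
    data = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, data)                 # XOR of all data fragments
    return data + [parity], len(packed)

def client_decode(fragments: list[Optional[bytes]], packed_len: int) -> bytes:
    """Rebuild at most one lost fragment from the survivors, then decompress."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code tolerates only one lost fragment")
    frags = list(fragments)
    if missing:
        frags[missing[0]] = reduce(xor_bytes, [f for f in frags if f is not None])
    return zlib.decompress(b"".join(frags[:-1])[:packed_len])
```

Compressing before encoding means both the network transfer and the stored fragments shrink, and the client, not the ChunkServer, pays the CPU cost (or offloads it).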
3. XSKY INFINI
XSKY's "Star‑Fly" all‑flash product follows an architecture similar to Pangu's, with ChunkServers and an append‑only interface.
Key Design Highlights
NVMe‑oF shared‑everything storage fabric for direct SSD access.
End‑to‑end NVMe over Fabrics with polling mode and NUMA binding, achieving ~100 µs latency.
Append‑only writes aligned to SSD page size, reducing write amplification.
Intel QAT offloads compression/decompression, moving that workload off the host CPU.
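Page-aligned append writes can be sketched as a buffer that only ever writes whole pages to the device, coalescing small records and zero-padding the partial tail page on flush. This assumes a 4 KiB flash page, a common size that varies by device, and `PageAlignedLog` is a hypothetical name; XSKY's actual mechanism is not described in the source.

```python
class PageAlignedLog:
    """Buffers small appends and writes only whole, page-aligned pages to the device."""

    def __init__(self, page_size: int = 4096) -> None:    # assumed flash page size
        self.page_size = page_size
        self.device = bytearray()    # stands in for the SSD; grows one page at a time
        self.buffer = bytearray()    # records waiting to fill a page
        self.page_writes = 0

    def append(self, record: bytes) -> int:
        offset = len(self.device) + len(self.buffer)       # logical offset of this record
        self.buffer.extend(record)
        while len(self.buffer) >= self.page_size:          # emit every full page
            self._write_page(bytes(self.buffer[:self.page_size]))
            del self.buffer[:self.page_size]
        return offset

    def flush(self) -> None:
        """Zero-pad the partial tail page (e.g. on fsync) and write it."""
        if self.buffer:
            self._write_page(bytes(self.buffer).ljust(self.page_size, b"\0"))
            self.buffer.clear()

    def _write_page(self, page: bytes) -> None:
        assert len(page) == self.page_size and len(self.device) % self.page_size == 0
        self.device.extend(page)
        self.page_writes += 1
```

Because the device never sees a partial or misaligned page, the SSD does not have to read-modify-write pages internally, which is where the reduction in write amplification comes from.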
4. Core Technologies Across Systems
(1) RTC Thread Model
All three systems adopt, or are moving to, a run‑to‑completion model to eliminate costly context switches: Ceph's next‑generation Crimson OSD and its SeaStore backend are built on the Seastar framework, Pangu 2.0 uses USSOS, and XSKY uses NUMA‑bound threads.
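The essence of a run-to-completion, shared-nothing design can be simulated in a single Python thread: each "core" owns a private shard of state and a task queue, requests are routed to the owning core by key, and each task runs start to finish with no locks and no handoff. This is only a model; real engines such as Seastar-based ones run one loop per pinned thread in C++, and `Core` and `ShardedEngine` are hypothetical names.

```python
import zlib
from collections import deque
from typing import Callable

class Core:
    """One pinned thread's worth of state: a task queue and a private data shard."""

    def __init__(self) -> None:
        self.tasks: deque = deque()
        self.store: dict[str, bytes] = {}      # touched only by this core, so no locks

    def run_until_idle(self) -> int:
        done = 0
        while self.tasks:
            task = self.tasks.popleft()
            task(self)        # runs to completion: no blocking call, no handoff to another thread
            done += 1
        return done

class ShardedEngine:
    def __init__(self, n_cores: int) -> None:
        self.cores = [Core() for _ in range(n_cores)]

    def core_for(self, key: str) -> Core:
        # Deterministic key-to-core routing (crc32 rather than Python's salted hash())
        return self.cores[zlib.crc32(key.encode()) % len(self.cores)]

    def put(self, key: str, value: bytes, on_done: Callable[[], None]) -> None:
        def task(core: Core) -> None:
            core.store[key] = value
            on_done()                         # completion callback, e.g. send the ack
        self.core_for(key).tasks.append(task)

    def poll(self) -> int:
        """One polling pass over all cores."""
        return sum(core.run_until_idle() for core in self.cores)
```

Since no task ever waits on another thread, the scheduler never has to switch contexts, and shard ownership removes the need for locks on the data path.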
(2) Append‑Only Writes
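Append‑only semantics were popularized by log‑structured designs such as LSM‑trees, which turned random writes into sequential ones on HDDs. On SSDs they improve performance by keeping writes sequential and aligned, which reduces latency and write amplification.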
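How random overwrites become sequential appends can be sketched with a log plus a logical-to-physical mapping table. This is a generic log-structured illustration, not the on-disk format of any of the three systems, and `LogStructuredVolume` is a hypothetical name.

```python
class LogStructuredVolume:
    """Turns in-place block overwrites into sequential appends plus a mapping table."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.log: list[bytes] = []          # physical log; index = physical block number
        self.mapping: dict[int, int] = {}   # logical block -> physical block

    def write(self, lba: int, block: bytes) -> int:
        if len(block) != self.block_size:
            raise ValueError("writes must be exactly one block")
        pba = len(self.log)                 # always the tail, so media writes stay sequential
        self.log.append(block)
        self.mapping[lba] = pba             # any older copy of this lba becomes garbage
        return pba

    def read(self, lba: int) -> bytes:
        return self.log[self.mapping[lba]]

    def garbage_blocks(self) -> int:
        """Stale copies that a background compaction/GC pass would reclaim."""
        return len(self.log) - len(self.mapping)
```

The cost moves from the write path to background garbage collection, which is why append-only systems pay close attention to compaction.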
(3) Kernel‑Bypass (User‑Space Stacks)
RDMA, SPDK, and DPDK move data paths to user space, achieving zero‑copy transfers and lower CPU overhead.
(4) RDMA Networking
All three platforms support RDMA/InfiniBand for high‑throughput, low‑latency communication.
(5) Data Compression & Erasure Coding
Compression is offloaded to dedicated hardware (FPGA in Pangu, Intel QAT in XSKY); EC is used to improve storage efficiency while maintaining performance.
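The storage-efficiency argument reduces to simple arithmetic: a k+m erasure code stores (k+m)/k raw bytes per user byte and survives m lost fragments, and n-way replication is the special case k=1, m=n-1. The k and m values below are illustrative examples, not the parameters any of the three systems necessarily uses.

```python
def raw_bytes_per_user_byte(k: int, m: int, compression_ratio: float = 1.0) -> float:
    """Raw capacity consumed per logical byte with a k+m erasure code.
    compression_ratio = original size / compressed size (1.0 = incompressible)."""
    if k < 1 or m < 0 or compression_ratio <= 0:
        raise ValueError("invalid parameters")
    return (k + m) / k / compression_ratio

for label, k, m in [("3-way replication", 1, 2), ("EC 4+2", 4, 2), ("EC 8+3", 8, 3)]:
    print(f"{label}: {raw_bytes_per_user_byte(k, m):.3f}x raw, survives {m} lost fragments")
```

For example, 3-way replication costs 3.0x raw capacity while an 8+3 code costs 1.375x with one more tolerated failure, and 2:1 compression on top halves either figure.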
5. Metadata Management
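Ceph relies on the CRUSH algorithm, which removes the central metadata lookup from the data path. Pangu and XSKY instead use centralized metadata servers, which simplify data placement and management but add an extra round trip (RTT) to look up chunk locations when the client does not already have them cached.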
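The extra RTT of a centralized metadata service is usually amortized with a client-side location cache, so the lookup hop is paid only on a miss or after invalidation. The sketch below counts those metadata round trips; `ChunkLocationCache` and its `lookup` callback are hypothetical and stand in for whatever RPC a real client uses.

```python
from typing import Callable

class ChunkLocationCache:
    """Client-side cache in front of a metadata service (hypothetical interface)."""

    def __init__(self, lookup: Callable[[int], list[str]]) -> None:
        self.lookup = lookup                     # performs the metadata RPC (one RTT)
        self.cache: dict[int, list[str]] = {}
        self.metadata_rtts = 0

    def locate(self, chunk_id: int) -> list[str]:
        if chunk_id not in self.cache:
            self.cache[chunk_id] = self.lookup(chunk_id)   # extra hop only on a miss
            self.metadata_rtts += 1
        return self.cache[chunk_id]

    def invalidate(self, chunk_id: int) -> None:
        """Drop a stale entry, e.g. after a ChunkServer rejects a request."""
        self.cache.pop(chunk_id, None)
```

A CRUSH-based client skips this step entirely, computing locations from its copy of the cluster map, and pays only for occasional map updates.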
6. Conclusion
For small‑to‑medium teams with limited resources, Ceph remains the most accessible open‑source option. Large enterprises may prefer Alibaba’s Pangu or XSKY’s commercial solutions for their advanced user‑space optimizations and hardware offloads.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.