LiteIO: Open‑Source High‑Performance Cloud‑Native Block Device Service

LiteIO is an open‑source, high‑performance, cloud‑native block device service that uses NVMe‑oF and SPDK to provide point‑to‑point storage pooling, enabling efficient FinOps, serverless scaling, hot upgrades, zero‑copy I/O, snapshots, and thin provisioning for databases and applications in Kubernetes.

AntTech

LiteIO is a high‑performance, easily extensible cloud‑native block device service designed for hyper‑converged Kubernetes environments. Developed within Ant Group and now open‑sourced, it pools local disks or logical volumes and shares them over the network using a point‑to‑point architecture.

The design targets FinOps goals: it raises storage utilization and lowers cost. It addresses the uneven utilization and limited scalability of traditional distributed storage, and it avoids the overhead of multi‑replica schemes.

LiteIO adopts a decentralized design built on the SPDK data engine and the NVMe‑over‑Fabric (NVMe‑oF) protocol, connecting compute nodes directly to remote storage nodes. This yields near‑local disk performance while simplifying I/O paths.
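
To make the point‑to‑point attach concrete, here is a minimal sketch of how a compute node could connect to a remote NVMe‑oF target over TCP with the standard nvme-cli tool. It is not LiteIO's own attach code, which this article does not show. The function names, the address, and the NQN are hypothetical; 4420 is the port conventionally used for NVMe‑oF.

```python
import subprocess
from typing import List


def nvme_tcp_connect_argv(traddr: str, nqn: str, trsvcid: str = "4420") -> List[str]:
    """Build the nvme-cli command line for attaching a remote NVMe/TCP subsystem."""
    return [
        "nvme", "connect",
        "--transport=tcp",          # NVMe-oF over TCP
        f"--traddr={traddr}",       # IP address of the storage node (hypothetical)
        f"--trsvcid={trsvcid}",     # TCP port of the target
        f"--nqn={nqn}",             # NVMe Qualified Name of the exported volume
    ]


def connect(traddr: str, nqn: str, trsvcid: str = "4420", dry_run: bool = True) -> List[str]:
    """Return the command; run it only when dry_run is False (needs root and nvme-cli)."""
    argv = nvme_tcp_connect_argv(traddr, nqn, trsvcid)
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```

With `dry_run=True` the helper only returns the argument list, so it can be inspected on a machine without nvme-cli installed. After a real connect, the remote volume appears on the compute node as an ordinary NVMe block device.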

Key technical features include:

High‑performance protocol: NVMe‑oF over TCP provides close‑to‑native SSD performance, outperforming iSCSI.

Simplified I/O chain: Single‑hop point‑to‑point access eliminates metadata servers and reduces latency.

Zero‑copy: Shared memory and DMA remapping remove redundant data copies, achieving sub‑microsecond latency.

Hot upgrade: Fork‑based target replacement enables seamless upgrades with <100 ms I/O disruption.

Hot migration: Volume data can be moved between targets without service interruption using multi‑round incremental copying.

Snapshots and volume expansion: CSI integration supports snapshot creation and online volume expansion on both the LVM and SPDK engines.

Multi‑disk aggregation: Fragments are combined into logical volumes, offering flexible storage composition.

Thin provisioning: Over‑commitment of storage space is possible, with rapid migration when capacity runs low.
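
The multi‑round incremental copying mentioned under hot migration can be sketched as a small simulation. This is an illustration under assumptions, not LiteIO's implementation: the `Volume` class, the per‑block dirty set, the `cutover_threshold` parameter, and the brief final cutover step are all hypothetical. The idea is to copy everything once, then repeatedly recopy only the blocks written in the meantime until the remainder is small enough to finish in one short step.

```python
from typing import Callable, List, Set


class Volume:
    """Toy block volume that records which blocks were written (hypothetical)."""

    def __init__(self, nblocks: int) -> None:
        self.blocks: List[bytes] = [b"\0"] * nblocks
        self.dirty: Set[int] = set()

    def write(self, idx: int, data: bytes) -> None:
        self.blocks[idx] = data
        self.dirty.add(idx)

    def take_dirty(self) -> Set[int]:
        """Return the blocks written since the last call and reset tracking."""
        dirty, self.dirty = self.dirty, set()
        return dirty


def migrate(src: Volume, dst: Volume,
            during_round: Callable[[int], None] = lambda n: None,
            max_rounds: int = 8, cutover_threshold: int = 2) -> int:
    """Copy src to dst in rounds; return the number of bulk rounds performed."""
    src.take_dirty()                        # start dirty tracking from a clean slate
    pending = set(range(len(src.blocks)))   # round 1 copies every block
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        for i in sorted(pending):
            dst.blocks[i] = src.blocks[i]
        during_round(rounds)                # simulate foreground writes in this round
        pending = src.take_dirty()          # only these blocks need another pass
        if len(pending) <= cutover_threshold:
            break
    # Final cutover (assumed): the small remainder is copied in one short step
    # before I/O is redirected to the new target.
    for i in pending:
        dst.blocks[i] = src.blocks[i]
    return rounds
```

If the workload keeps writing fewer blocks each round, the loop converges quickly. `max_rounds` bounds the work when the write rate never drops below the threshold.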

In production, LiteIO is deployed on tens of thousands of Ant Group servers, increasing overall storage utilization by about 25% while adding only ~2.1 µs of extra I/O latency. Because its storage‑compute separation is generic, it benefits databases and other compute workloads and enables serverless scaling within Kubernetes.

The project invites community contributions and provides a public GitHub repository for further development.
