
Design and Implementation of a Next‑Generation Multi‑Protocol Unstructured Storage System for Machine Learning

This article describes the challenges of storing massive machine-learning datasets, evaluates existing storage solutions, and details the design of OrangeFS: a cloud-native, multi-protocol, multi-tenant unstructured storage system that unifies object and file interfaces, optimizes its metadata service, supports hot upgrades, and provides the scalability and reliability that AI workloads require.

DataFunSummit

With the rapid growth of artificial-intelligence technologies, Didi faces unprecedented storage demands from machine-learning training data: petabyte-scale capacity, high throughput, low-latency metadata operations, and access over multiple protocols (POSIX, S3, HDFS).

None of the storage systems already deployed in-house (GIFT object storage, Ceph, HDFS, and GlusterFS) met every requirement: each lacked at least one needed feature, such as multi-protocol access, atomic rename, or efficient random writes. This prompted an evaluation of combined approaches and of third-party options such as JuiceFS.

OrangeFS was designed by synthesizing lessons from GIFT and open‑source projects (Ceph, CubeFS, JuiceFS, SeaweedFS). Its architecture includes:

A unified entry service that parses S3 and GIFT V2 protocols.

A metadata service (MDS) built on RDS with multi‑Raft, in‑memory transactions, and dynamic configuration.

A storage engine that stores data blocks either in in-house BS engines or in public-cloud object stores, organizing data in a Chunk-Blob-Block hierarchy to enable high concurrency.

Separate VFS (POSIX) and PathFS (S3/HDFS) layers to provide seamless multi‑protocol file operations.
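The article does not give the sizes or exact semantics of chunks, blobs, and blocks. As a minimal sketch under stated assumptions, the function below maps a file byte range onto fixed-size blocks, assuming a hypothetical layout of 64 MiB chunks divided into 4 MiB blocks; the blob level is left out because the article does not describe it. Because each resulting piece touches exactly one block, the pieces can be fetched or uploaded in parallel, which is the kind of concurrency such a hierarchy is meant to enable. `split_range`, `BlockRange`, and both size constants are illustrative names, not OrangeFS identifiers.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sizes: the article does not state OrangeFS's real values.
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MiB per chunk (assumption)
BLOCK_SIZE = 4 * 1024 * 1024    # 4 MiB per block (assumption)


@dataclass(frozen=True)
class BlockRange:
    chunk_index: int   # which chunk of the file
    block_index: int   # which block within that chunk
    offset: int        # start offset inside the block
    length: int        # number of bytes touched in this block


def split_range(file_offset: int, length: int) -> List[BlockRange]:
    """Split a file byte range into pieces that each touch one block.

    Independent pieces can be served by concurrent requests to the
    block store. This is an illustrative sketch, not OrangeFS code.
    """
    if file_offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces = []
    pos, end = file_offset, file_offset + length
    while pos < end:
        chunk_index, in_chunk = divmod(pos, CHUNK_SIZE)
        block_index, in_block = divmod(in_chunk, BLOCK_SIZE)
        # Stop at the end of the current block. Blocks never straddle
        # chunks because CHUNK_SIZE is a multiple of BLOCK_SIZE.
        n = min(BLOCK_SIZE - in_block, end - pos)
        pieces.append(BlockRange(chunk_index, block_index, in_block, n))
        pos += n
    return pieces
```

For example, a 10 MiB read starting at offset 62 MiB splits into three pieces: the last 2 MiB of block 15 in chunk 0, then all of blocks 0 and 1 in chunk 1.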

Key innovations include:

Lower metadata latency through single-RPC operations, queue-based request processing, TCP transport, and Raft batching.

Correctness guarantees via optimistic in‑memory transactions and serialized writes for high‑conflict operations.

Scalability achieved with follower reads, learner nodes, and dynamic load‑balancing.

Stability improvements such as snapshot‑based recovery, log‑compaction control, and busy/slow queues.

POSIX client features: high throughput, read/write decoupling via in-memory snapshots, and hot upgrades that complete within seconds without interrupting service.

Multi‑tenant isolation, TTL‑based automatic deletion, read‑only modes, and fine‑grained permission control.

A recycle‑bin mechanism that preserves deleted files for recovery while handling TTL expiration.
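The list names optimistic in-memory transactions for ordinary metadata operations and serialized writes for high-conflict ones, without describing either in detail. The following is a minimal sketch of generic optimistic concurrency control over an in-memory map, not OrangeFS's implementation: a transaction records the version of each key it reads, buffers its writes, and at commit validates that none of those versions changed before applying the writes; on conflict it retries. `VersionedStore`, `Txn`, `run_optimistic`, and `run_serialized` are hypothetical names, and `run_serialized` simply admits high-conflict operations one at a time through a lock, as one possible reading of "serialized writes".

```python
import threading


class ConflictError(Exception):
    """Raised when a key the transaction read has changed before commit."""


class VersionedStore:
    """Hypothetical in-memory metadata map; every key carries a version."""

    def __init__(self):
        self.data = {}                       # key -> (version, value)
        self.commit_lock = threading.Lock()  # guards validate-and-apply only

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # buffered writes, applied at commit

    def get(self, key):
        if key in self.writes:   # read-your-own-writes
            return self.writes[key]
        version, value = self.store.data.get(key, (0, None))
        self.read_versions.setdefault(key, version)
        return value

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        data = self.store.data
        with self.store.commit_lock:
            # Validate: abort if any key we read was changed by another commit.
            for key, seen in self.read_versions.items():
                if data.get(key, (0, None))[0] != seen:
                    raise ConflictError(key)
            # Apply: install buffered writes, bumping each key's version.
            for key, value in self.writes.items():
                data[key] = (data.get(key, (0, None))[0] + 1, value)


def run_optimistic(store, fn, max_retries=100):
    """Run fn(txn) without locks, retrying from scratch on conflict."""
    for _ in range(max_retries):
        txn = store.begin()
        fn(txn)
        try:
            txn.commit()
            return
        except ConflictError:
            continue  # another commit changed a key we read
    raise ConflictError("retry limit exceeded")


_serial_lock = threading.Lock()


def run_serialized(store, fn):
    """Hypothetical path for high-conflict operations: admit them one at a
    time so they cannot abort one another (they may still retry against
    concurrent optimistic transactions)."""
    with _serial_lock:
        run_optimistic(store, fn)
```

Only the validate-and-apply step holds a lock; the transaction body runs lock-free, which is what makes the scheme optimistic. In a replicated metadata service the apply step would also be proposed through Raft, which this sketch leaves out.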

OrangeFS now supports tens of petabytes across dozens of teams, offering lossless multi‑protocol access, robust QoS, cloud‑native deployment via CSI, and seamless integration with both private and public cloud environments.
