Ozone: The Next‑Generation Distributed Storage System Aiming to Replace HDFS
This article explains how Apache Ozone, built on the HDDS layer, addresses the scalability, memory, and performance limitations of HDFS by splitting metadata services, using RocksDB, implementing fine‑grained locking, RAFT‑based HA, and offering rich APIs, while outlining current challenges and future roadmap.
Apache Ozone is presented as the successor to HDFS, aiming to become the next‑generation distributed object storage system within the Hadoop ecosystem. The article first reviews HDFS’s master/slave architecture, highlighting its reliance on a single NameNode that keeps all metadata in memory — a design that caps scalability, drives GC pressure, serializes operations behind a global lock, and makes startup slow for large clusters.
Ozone’s design mitigates these issues by separating metadata management into two services—OzoneManager (OM) for object metadata and StorageContainerManager (SCM) for container management—both backed by RocksDB, which removes the need for a massive heap and lets metadata scale with disk capacity rather than memory.
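The key shift is from an all-in-heap namespace to metadata persisted in an embedded key-value store. The sketch below illustrates the idea only — the class name and key layout are hypothetical, with a plain dict standing in for RocksDB; Ozone's actual schema differs.

```python
# Illustrative sketch (not Ozone's actual schema): object metadata is kept in
# an embedded KV store keyed by /volume/bucket/key, so each lookup touches only
# the entry it needs instead of requiring the whole namespace in heap memory.

class KVMetadataStore:
    """Hypothetical stand-in for an OM-style metadata table on RocksDB."""

    def __init__(self):
        self._db = {}  # a real deployment persists this to disk

    def put(self, volume, bucket, key, meta):
        self._db[f"/{volume}/{bucket}/{key}"] = meta

    def get(self, volume, bucket, key):
        # Point lookup by composed key; no in-memory namespace tree needed.
        return self._db.get(f"/{volume}/{bucket}/{key}")

store = KVMetadataStore()
store.put("vol1", "logs", "2024/01/app.log", {"size": 4096, "replication": 3})
print(store.get("vol1", "logs", "2024/01/app.log"))  # → {'size': 4096, 'replication': 3}
```

Because the store is disk-backed, the metadata footprint grows with storage capacity rather than with the JVM heap, which is what removes the NameNode-style memory ceiling.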
The article describes the implementation of a NameNode on HDDS (Hadoop Distributed Data Store) that provides traditional file‑system semantics on top of Ozone’s object store, allowing HDFS‑compatible APIs (Spark, Presto, Hive, etc.) and S3 gateway access, while also supporting Kubernetes CSI and Goofys for mounting.
Key architectural advantages include: (1) splitting the NameNode into OM and SCM, (2) using RocksDB for persistent metadata, (3) introducing storage containers to reduce block‑level reporting overhead, and (4) employing a tiered metadata cache to keep hot data in memory and cold data in RocksDB.
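Point (4) above — keeping hot metadata in memory and spilling cold entries to RocksDB — can be sketched as a small LRU front tier over a KV-backed cold tier. This is a minimal illustration, not Ozone's implementation; the class name is hypothetical and a dict stands in for the on-disk store.

```python
from collections import OrderedDict

class TieredMetadataCache:
    """Sketch of a hot/cold tiered cache: an in-memory LRU in front of a
    KV-backed cold tier (a dict stands in for RocksDB here)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.hot = OrderedDict()   # hot tier: bounded in-memory LRU
        self.cold = {}             # cold tier: persistent KV store stand-in

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        if len(self.hot) > self.capacity:
            # Demote the least-recently-used entry to the cold tier.
            evicted_key, evicted_val = self.hot.popitem(last=False)
            self.cold[evicted_key] = evicted_val

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # refresh recency
            return self.hot[key]
        if key in self.cold:
            value = self.cold.pop(key)
            self.put(key, value)       # promote back to the hot tier
            return value
        return None
```

The design choice this illustrates: frequently accessed metadata stays at memory speed, while the total metadata volume is bounded only by disk, not by heap size.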
The development roadmap is divided into stages: a base stage implementing HDDSFS (an HDFS‑like filesystem) with client and NameNode support; a performance stage adding fine‑grained locking, inspired by the approaches of Alluxio and Meituan, to improve concurrency; and a final stage that adds KV‑based metadata, RAFT‑based HA with three‑node NameNode groups, and lock‑pool management to detect deadlocks.
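The fine‑grained locking idea in the roadmap — replacing one global namespace lock with per‑path locks handed out from a shared pool — can be sketched as follows. This is a hypothetical illustration (class and function names are invented, and real deadlock detection in a lock pool is more involved than the simple lock‑ordering shown here).

```python
import threading
from collections import defaultdict

class PathLockPool:
    """Sketch of fine-grained namespace locking: one lock per path instead
    of a single global lock, so unrelated operations can run concurrently."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, path):
        # The pool itself is guarded so concurrent callers get the same
        # lock object for the same path.
        with self._guard:
            return self._locks[path]

pool = PathLockPool()

def rename(src, dst):
    # Acquire the two path locks in sorted order, so two concurrent renames
    # touching the same pair of paths cannot deadlock on each other.
    first, second = sorted([src, dst])
    with pool.lock_for(first), pool.lock_for(second):
        pass  # mutate metadata for src and dst here
```

Sorting the paths before acquisition is a classic deadlock-avoidance technique; a lock-pool manager as described in the roadmap would instead track wait-for relationships to detect cycles at runtime.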
Challenges discussed include missing append/truncate support, longer RPC paths due to split services, lack of folder metadata, limited replication options in Ratis, and the need for container balancers and disk reservation features. Ozone’s community activity, contributions from Tencent and other companies, and resources such as videos, tutorials, and open‑source tools (e.g., Goofuse, HCFSFuse) are also highlighted.
In conclusion, the authors assert that Ozone’s layered design, extensive community involvement, and phased development approach position it to eventually replace HDFS for large‑scale big‑data workloads, while emphasizing rigorous testing, CI/CD, and collaborative engineering practices.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.