Ozone vs HDFS: Why Ozone Cannot Replace Hadoop’s Core Storage
In this article, senior Alibaba engineer Zheng Kai analyzes Ozone’s role in the Hadoop ecosystem, arguing that despite its usefulness, Ozone cannot solve Hadoop’s core challenges of complexity, cost, and performance, and that Hadoop must focus on storage innovation, compute‑storage separation, and cloud integration to stay relevant.
Author Zheng Kai, a senior technical expert at Alibaba and founder of Apache Kerby, shares his perspective on the Hadoop ecosystem, particularly the emerging Ozone project, after years of optimizing Hadoop/Spark for large customers on Alibaba Cloud.
He acknowledges that Ozone is a valuable object‑storage system but argues that it cannot rescue Hadoop, which faces far larger challenges than Ozone can address.
Looking ahead, he predicts that while newer engines such as Spark and Flink will continue to evolve, Hadoop’s future depends on its core positioning as a big‑data platform, with storage and scheduling as its two pillars.
He explains Hadoop’s core storage role and notes that recent community work has focused on Ozone, an object‑storage system similar to AWS S3, introduced about five years ago.
He critiques Ozone’s relevance to Hadoop’s primary pain points—complex deployment, high cost, and performance—highlighting that Hadoop’s typical users are medium‑to‑large clusters where HDFS’s design still makes sense.
He describes the operational complexity of an HA HDFS deployment (active and standby NameNodes, a ZooKeeper quorum, JournalNodes, etc.) and argues that this complexity pushes users toward simpler alternatives such as Spark‑centric platforms and Databricks.
He discusses cost reduction via Erasure Coding (EC) and the historical attempts to integrate EC into HDFS, noting delays and community fragmentation that slowed adoption.
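The cost argument for erasure coding comes down to simple arithmetic. As a rough sketch (the RS(6,3) parameters below match the `RS-6-3-1024k` policy HDFS ships with, but the helper functions are illustrative, not part of any Hadoop API):

```python
# Storage-overhead comparison: 3x replication vs Reed-Solomon erasure coding.
# RS(6,3) stripes data across 6 data units plus 3 parity units, tolerating
# the loss of any 3 units while storing only 1.5x raw bytes instead of 3x.

def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per logical byte under N-way replication."""
    return float(replicas)

def ec_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per logical byte under RS(data, parity)."""
    return (data_units + parity_units) / data_units

rep = replication_overhead(3)  # 3.0x raw storage (200% extra space)
ec = ec_overhead(6, 3)         # 1.5x raw storage (50% extra space)

print(f"3x replication: {rep:.1f}x raw storage")
print(f"RS(6,3) EC:     {ec:.1f}x raw storage")
print(f"Savings:        {1 - ec / rep:.0%}")  # -> 50%
```

Halving raw storage at cluster scale is exactly why the delays in landing EC that the author describes were so costly to Hadoop's competitive position.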
He points out that HDFS has struggled to adopt modern storage formats (Parquet, ORC, Arrow) and to support storage‑compute separation via caching layers such as Alluxio.
He argues that Hadoop’s ecosystem is becoming bloated with overlapping projects, and that the lack of unified leadership (e.g., the rivalry between Cloudera and Hortonworks) hampers coordinated progress.
Regarding Ozone, he concludes that it adds further complexity without substantially reducing cost or improving performance, and that most cloud providers already offer mature object‑storage services.
He recommends three focus areas for Hadoop: (1) strengthen big‑data storage solutions by accelerating EC and simplifying HDFS architecture; (2) support storage‑compute separation with caching layers like Alluxio and modern file formats; (3) embrace public‑cloud and cloud‑native technologies, integrating YARN with Kubernetes rather than building parallel object‑storage stacks.
He encourages community participation in emerging projects such as Ozone, Submarine, and JindoFS, emphasizing that open‑source contributions remain vital for Hadoop’s continued evolution.