How to Adapt Hadoop for Domestic Big Data Requirements

This article reviews Hadoop's declining momentum, the market dominance of CDH and HDP, and the security pressure created by unpatched vulnerabilities. It then outlines the technical work required to build a domestic ARM-based Hadoop distribution (hardware adaptation, component selection, dependency resolution, compilation, Ambari integration, packaging, and testing) and notes that the authors have released a free HDP 3.3.1 build.

Past Memory Big Data

Oct 29, 2022

Hadoop Status and Market Landscape

Hadoop became an Apache top-level project in January 2008 and has been under development for more than a decade since. With the rise of public-cloud products it is gradually "declining," yet many companies still maintain large private deployments.

According to a June 2019 survey by the China Academy of Information and Communications Technology, there were 39 domestic Hadoop platform vendors. Over 70% of them repackaged the community editions of Cloudera's CDH or Hortonworks's HDP, and 24% built on Apache sources. CDH and HDP effectively monopolised the market because they were free and most enterprises lacked the technical capacity to develop their own distributions.

CDH is Cloudera's flagship product, and HDP is Hortonworks's. The two companies merged in January 2019, and the new Cloudera introduced CDP and discontinued the community editions. Since 31 January 2021, all Cloudera software has required a paid subscription. CDH 6 and HDP 3 are the last enterprise releases and reached end of support (EoS) in March 2022, leaving their users without new features or vendor support.

Security Drivers and Domestic Push

The Apache Log4j 0-day vulnerability (Log4Shell, CVE-2021-44228) showed that the free CDH/HDP editions no longer receive timely patches. Enterprises, especially state-owned ones with strict security requirements, were forced to look for a way to upgrade. In addition, geopolitical factors such as the U.S. export-control "entity list" increase the urgency of self-reliant IT infrastructure and domestic alternatives.

Technical Roadmap for a Domestic Hadoop Distribution

Building a domestic Hadoop distribution involves ten major aspects:

Base hardware/software adaptation: Most existing CDP builds target x86_64 Intel CPUs. Domestic CPUs such as Kunpeng and Phytium (Feiteng) use the ARM (aarch64) architecture, which Apache Hadoop supports only from the 3.3 series onward, and no ready-made ARM-based Hadoop distribution exists. Operating-system adaptation (e.g., Red Flag, Kylin) is relatively straightforward because many domestic OSes are binary-compatible with foreign ones.

Hadoop ecosystem component selection: Core components include ZooKeeper, HDFS, YARN, MapReduce, Hive, Tez, Spark, Flink, HBase, Kafka, Ranger, and Ambari, plus newer lake-house components such as Ozone, Iceberg, and Kyuubi. Some components pull in nested dependencies of their own (e.g., Solr, Phoenix), so making a sound selection requires deep technical knowledge.

Component dependency analysis: Integrating many components exposes version conflicts, especially in shared JARs such as Log4j. Some JARs are forward- or backward-compatible across versions and others are not, which leads to "dependency hell."

Cross-component JAR dependency resolution: There are two approaches. The first is to choose mutually compatible component versions while avoiding EOL releases. The second is to modify source code so that a component works with a mismatched library (e.g., adjusting Hive 3.1 to work with a newer ZooKeeper or Guava).

Front-end framework adaptation: Many Hadoop UI components rely on Node.js, PhantomJS, and similar tools. ARM binaries exist for some of these tools, but others must be compiled manually, which adds effort.

Component compilation: Most components ship build scripts for Docker, Maven, Ant, sbt, or Make. A successful build often depends on external network access, because dependencies are downloaded during compilation.

Ambari assembly: Ambari remains the primary open-source Hadoop cluster manager. Building Ambari for a new stack requires adjusting numerous JAR versions, re-configuring component integrations, and handling missing support for newer components (e.g., Ozone, Kyuubi).

RPM/DEB packaging: Ambari's package management expects RPM or DEB bundles. Options include using Apache Bigtop, Maven-generated POMs, or rpmbuild spec files. Packaging must respect Ambari's directory conventions, which differ from upstream Apache binaries.

Deployment and functional testing: After packaging, extensive regression testing is mandatory. Even components already integrated into Ambari (e.g., HDFS) can exhibit class-loading issues on ARM. New components (Ozone, Kyuubi, Flink) often require additional validation and iterative fixes.
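As an illustration of the dependency-analysis step described above, the following is a minimal sketch of how conflicting library versions across components might be surfaced. It is not the authors' tooling. The component names and version numbers are hypothetical placeholders, and the input dictionary is assumed to have been collected beforehand, for example from each component's Maven dependency report.

```python
from collections import defaultdict


def find_version_conflicts(component_deps):
    """Return libraries that are pinned to more than one version.

    component_deps maps a component name to {library: version}.
    The result maps each conflicting library to {version: [components]}.
    """
    versions_by_lib = defaultdict(lambda: defaultdict(list))
    for component, deps in component_deps.items():
        for library, version in deps.items():
            versions_by_lib[library][version].append(component)

    # Keep only libraries for which at least two distinct versions appear.
    return {
        library: {ver: sorted(comps) for ver, comps in by_version.items()}
        for library, by_version in versions_by_lib.items()
        if len(by_version) > 1
    }


# Hypothetical input: names and versions are invented for illustration only.
component_deps = {
    "component-a": {"guava": "19.0", "log4j": "1.2.17"},
    "component-b": {"guava": "27.0-jre", "log4j": "1.2.17"},
    "component-c": {"guava": "27.0-jre", "log4j-core": "2.17.1"},
}

conflicts = find_version_conflicts(component_deps)
for library, by_version in sorted(conflicts.items()):
    print(library, by_version)
```

A report like this only shows where versions diverge. Whether a given pair of versions is actually compatible still has to be decided case by case, which is where the two resolution approaches listed above come in.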

Our Implementation

After six months of effort, the HiDataPlus team released a free HDP 3.3.1 distribution for the aarch64 architecture (CentOS 7.9). The component versions are listed in a diagram in the original article.

The distribution is available via Baidu Cloud:

Link: https://pan.baidu.com/s/1z_Yk-inzpZnOvtG8EHo_ow

Extraction code: wj68
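Because the build targets aarch64 on CentOS 7.9, it may be worth checking a host before installing. The sketch below is one possible pre-install check, not part of the distribution. The function names are hypothetical, and it assumes the host exposes a standard /etc/os-release file, which reports only the major version, so the 7.9 minor release is not verified here.

```python
import platform


def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def matches_build_target(machine, os_release_text,
                         target_arch="aarch64",
                         target_os="centos",
                         target_major="7"):
    """Return True if the architecture and OS major version match the target."""
    fields = parse_os_release(os_release_text)
    major = fields.get("VERSION_ID", "").split(".")[0]
    return (machine == target_arch
            and fields.get("ID") == target_os
            and major == target_major)


def check_this_host(os_release_path="/etc/os-release"):
    """Run the check against the current machine (assumes the file exists)."""
    with open(os_release_path, encoding="utf-8") as handle:
        return matches_build_target(platform.machine(), handle.read())


# Illustrative os-release content, trimmed to the two fields the check reads.
sample = 'ID="centos"\nVERSION_ID="7"\n'
print(matches_build_target("aarch64", sample))
print(matches_build_target("x86_64", sample))
```

On a real node, `check_this_host()` performs the same comparison using `platform.machine()` and the local os-release file.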

Motivation and Call for Collaboration

We released the distribution for three reasons: personal fulfillment and a contribution to domesticization, gathering more user feedback to improve the product, and seeking cooperation with partners on resources (hardware, OS, application scenarios).

We invite interested users to download the build, test it, and provide feedback through our WeChat public account (QR code shown in the original article). Companies with resources are also welcome to contact us for joint development.
