
HDFS Upgrade from 2.6.0‑cdh to 3.1.2 with DataNode Federation and Mixed Deployment

This article details the background, planning, step‑by‑step procedures, encountered issues, and rollback strategies for upgrading a Hadoop HDFS cluster from version 2.6.0‑cdh to 3.1.2, including mixed‑deployment of DataNodes across different federations and necessary configuration changes.

360 Smart Cloud

1. Background

Hadoop consists of HDFS, MapReduce, and YARN. Compared with Hadoop 2.x, Hadoop 3.x adds features such as erasure coding, support for more than two NameNodes, an intra‑DataNode disk balancer, and Router‑Based Federation. Using these new HDFS 3.x features improves cluster stability and meets diverse business needs.

Historically, 360 has operated multiple HDFS versions, including CDH. To leverage Hadoop 3.x capabilities, the HDFS version was upgraded from 0.20 to Hadoop‑3.1.2, and a 2.6.0‑cdh cluster was further upgraded to Hadoop‑3.1.2. Additionally, two clusters located in the same data center were merged by mixing their DataNodes.

This article covers two main aspects: (1) how to mix DataNodes from different HDFS versions and federations; and (2) how to upgrade HDFS from 2.6.0‑cdh to 3.1.2, including changing the super‑admin user and moving the metadata directory.

2. Solution Selection

The final result is a single cluster in which the original 2.6.0‑cdh HDFS is upgraded to 3.1.2 and the DataNodes of both clusters are mixed.

Upgrade steps include:

a) Mix DataNodes of the two versions.

b) Gradually decommission 2.6.0‑cdh DataNodes, upgrade them, and re‑join the cluster.

c) Upgrade each federation sequentially, including JournalNode, NameNode, and zkfc components.

3. Implementation Details

Step 1 – DataNode Mixed Deployment

Goal: make the 3.1.2 DataNodes report blocks to both the old (2.6.0‑cdh) and the new (3.1.2) NameNodes, then use decommissioning to phase out the old DataNodes.

Challenges:

a) Different ClusterId values prevent 3.1.2 DataNode from starting. Solution: modify the source code to bypass the ClusterId check.

b) Block report protocol differences (protobuf field positions). Solution: change the protobuf definition in 3.1.2 to optional uint64 fullBlockReportLeaseIdnew = 5 [default = 0], overriding the conflicting field.
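The protobuf change can be sketched as below. Only the renamed field line comes from the article; the message name and the neighboring fields are assumptions based on where fullBlockReportLeaseId is declared in stock Hadoop (HeartbeatResponseProto in DatanodeProtocol.proto):

```protobuf
// Hedged sketch of the DatanodeProtocol.proto change; only the renamed
// field on the last line is taken from the article.
message HeartbeatResponseProto {
  repeated DatanodeCommandProto cmds = 1;          // assumed existing field
  required NNHAStatusHeartbeatProto haStatus = 2;  // assumed existing field
  // Renamed so that 2.6.0-cdh and 3.1.2 nodes no longer disagree on field 5:
  optional uint64 fullBlockReportLeaseIdnew = 5 [default = 0];
}
```

Because protobuf identifies fields by number rather than name, renaming the field does not change the wire format; the point of the patch is to control how the conflicting field number is interpreted on the 3.1.2 side.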

Step 2 – NameNode Upgrade

The upgrade involves changing the HDFS super‑user and the metadata directory. Changing the super‑user requires a short service interruption (~10 s) because zkfc cannot kill a NameNode started by another user.

Key issues:

1) Switch the super‑user from account A to account B by granting B access with setfacl -m u:B:rwx -R /data/dfs/nn and updating the directory permissions accordingly.

2) Move the metadata directory by editing hdfs-site.xml, relocating the existing data, and restarting the services.

3) zkfc cannot perform an automatic failover when the active and standby NameNodes run under different Unix accounts, so a manual failover (stopping the old NameNode first) is required.

Rollback Plan

Two rollback methods are considered: Rollback and RollingDowngrade. Because the layout versions of 2.6.0‑cdh and 3.1.2 may differ, a full downgrade may fail. The plan includes:

1) Back up all FsImage and EditLog files.

2) Before switching the active NameNode, keep the standby and JournalNode ready for immediate rollback.

3) After the new NameNode is up, if layout versions differ, stop services, convert EditLog layout version via custom code, copy the converted EditLog to JournalNode, and then roll back the NameNode.
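Step 1 of the plan (backing up FsImage and EditLog files) amounts to copying the metadata directory aside before touching anything. The sketch below uses mktemp placeholders so it is runnable anywhere; on a real cluster the source would be the actual dfs.namenode.name.dir (e.g. /data/dfs/nn), and the file names are illustrative:

```shell
# Placeholder for the real NameNode metadata dir (dfs.namenode.name.dir).
NN_DIR=$(mktemp -d)
mkdir -p "$NN_DIR/current"
# Stand-ins for the real FsImage and EditLog files.
echo dummy > "$NN_DIR/current/fsimage_0000000000000000000"
echo dummy > "$NN_DIR/current/edits_inprogress_0000000000000000001"

# Copy the whole current/ directory to a backup location before the upgrade.
BACKUP_DIR=$(mktemp -d)
cp -a "$NN_DIR/current" "$BACKUP_DIR/"
ls "$BACKUP_DIR/current"
```

The same pattern applies to the JournalNode edits directories; keeping a pristine copy of current/ is what makes the later "convert and copy back" rollback step possible.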

Upgrade Procedure

1. Upgrade Order

Upgrade JournalNode, zkfc, and NameNode together because account changes affect zkfc behavior.

2. Preparation

a) Create the new B account and prepare deployment directories.

b) Download Java and Hadoop packages, extract them, and adjust parameters using the older configuration files.

3. Upgrade JournalNode

a) On each node, as root, grant B user access:

setfacl -m u:B:rwx -R /data/dfs/jn

b) Stop JournalNode processes, modify hdfs-site.xml to point to the new edit log location, move existing edit logs, and restart JournalNode.
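The hdfs-site.xml edit in step b) might look like the fragment below; dfs.journalnode.edits.dir is the standard HDFS property for the JournalNode edit log location, but the concrete path shown is a placeholder, not taken from the article:

```xml
<!-- hdfs-site.xml on each JournalNode; the value is a placeholder path. -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/dfs/jn-new</value>
</property>
```

After moving the existing edit logs into the new location, restart the JournalNode and confirm it serves the same journal before proceeding to the next node.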

4. Create Rollback Files

On the active NameNode (root), run:

hdfs dfsadmin -rollingUpgrade prepare

to generate a rollback‑ready FsImage.

Check status with:

hdfs dfsadmin -rollingUpgrade query

5. Upgrade ZKFC and NameNode

a) Stop ZKFC on the standby and active nodes, switch to the B user, and verify that ZKFC can start with hdfs --daemon start zkfc.

b) Stop the standby NameNode, switch to the B user, modify hdfs-site.xml for the new metadata path, move the data, and start the NameNode with hdfs --daemon start namenode -rollingUpgrade started.
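The metadata-path change in hdfs-site.xml might look like this; dfs.namenode.name.dir is the standard property for the NameNode metadata location, and the path shown is a placeholder:

```xml
<!-- hdfs-site.xml on the NameNode; the value is a placeholder path. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/dfs/nn-new</value>
</property>
```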

c) After the new NameNode is up, stop the old active NameNode, then start ZKFC on the standby to complete failover.
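Steps b) and c) reduce to the following command sequence, run on the relevant hosts as the B user. This is an operational sequence for a live cluster, not a runnable sketch; the commands are the standard hdfs --daemon forms quoted in the steps above:

```shell
# On the standby NameNode (as user B): start with the rolling-upgrade flag.
hdfs --daemon start namenode -rollingUpgrade started

# On the old active NameNode: stop it so the upgraded standby can take over.
hdfs --daemon stop namenode

# On the upgraded standby: start ZKFC, which then completes the failover.
hdfs --daemon start zkfc
```

Starting ZKFC only after the old active is down avoids the cross-account failover problem described earlier.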

6. Finish Rolling Upgrade

On the active NameNode (B user), finalize the upgrade:

hdfs dfsadmin -rollingUpgrade finalize

At this point the federation upgrade is successful.

4. Issues Encountered During Cluster Integration

Mixed DataNode deployment problems: different ClusterId values (resolved by disabling the check).

Protocol mismatches between 2.6.0‑cdh and 3.1.2 preventing block reports (resolved by modifying source code).

Compilation failures on CentOS 6 due to missing bzip2-devel and OpenSSL libraries; fixed by installing the required packages and setting OPENSSL_ROOT_DIR and OPENSSL_LIBRARIES.
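The build fix can be sketched as below. The package names and the two CMake variables come from the article; the OpenSSL install prefix and the Maven invocation (the standard Hadoop native profile) are assumptions:

```shell
# Install the missing build dependencies on CentOS 6.
yum install -y bzip2-devel openssl-devel

# Point the native build at the OpenSSL installation (prefix is a placeholder).
export OPENSSL_ROOT_DIR=/usr/local/openssl
export OPENSSL_LIBRARIES=/usr/local/openssl/lib

# Rebuild Hadoop with native libraries enabled.
mvn package -Pdist,native -DskipTests -Dtar
```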

After the NameNode upgrade, deleted files did not free space on the DataNodes because the rollingUpgrade flag remained true; this was worked around temporarily by restarting the DataNodes and fixed permanently by correcting the protocol field once the DataNodes were upgraded.

JournalNode upgrade error: Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol. This is a known issue (HDFS‑14942); the log message was downgraded to DEBUG level and it does not affect the upgrade.

In summary, careful planning, thorough testing, and timely communication with the Hadoop community are essential to ensure a smooth HDFS upgrade while minimizing risks.

We hope this article provides useful references for your HDFS upgrade. Good luck!

If you are interested in our storage products, please visit the Zhihui Cloud website: Zhihui Cloud – Enterprise Digital Core Engine (360.cn) .
