Why Did Multiple HDFS DataNodes Crash? Memory, GC, and Block Overload Explained
This article analyzes a midnight HDFS DataNode failure caused by excessive GC and OOM due to Spark batch jobs, examines how an unexpected surge in block count overloaded default memory settings, and presents concrete remediation steps and optimization recommendations to stabilize the cluster.
Problem Description
In the early hours, a DataNode (hadoop24) in an HDFS cluster went down, and four other DataNodes (hadoop1, hadoop5, hadoop21, hadoop22) appeared hung: their processes were still running, but they had stopped sending heartbeats to the NameNode, which marked them offline.
Investigation Process
2.1 Direct Causes
(1) Log inspection of the failed DataNode showed frequent GC and OOM events just before the crash.
The initial hypothesis was that a Spark batch job running at night consumed excessive memory, affecting the DataNode. After restarting the DataNode around 10 am when resources were less constrained, the service recovered.
(2) About an hour after restart, the DataNode crashed again, this time without any Spark job running, indicating the issue stemmed from the DataNode’s own memory limits rather than resource sharing.
Further inspection of the HDFS Overview showed that the total block count had surged past 5.7 million (from roughly 2 million the previous day), with a single DataNode holding more than 900,000 blocks. That is far more block metadata and bookkeeping than the default 1 GB DataNode heap is sized for, which explains the GC pressure and the crashes.
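A back-of-envelope calculation shows how quickly a 1 GB heap gets crowded at that block count. The per-replica byte cost below is an assumed rough figure for replica metadata plus associated bookkeeping, not a measured value from this cluster.

```python
# Rough heap-pressure estimate for ~900k block replicas on one DataNode.
# BYTES_PER_REPLICA is an assumption (replica metadata + bookkeeping),
# not a measured number.
BYTES_PER_REPLICA = 300
blocks = 900_000

metadata_bytes = blocks * BYTES_PER_REPLICA
print(f"replica metadata alone: ~{metadata_bytes / 1024**2:.0f} MiB")

# Metadata is only part of the load: full block reports, the block
# scanner, and ordinary request handling all churn objects in the same
# heap, so the remaining budget shrinks fast and GC pauses grow with
# block count.
headroom = 1 * 1024**3 - metadata_bytes
print(f"remaining budget in a 1 GiB heap: ~{headroom / 1024**2:.0f} MiB")
```

The point is not the exact numbers but the trend: heap demand scales linearly with replica count, while the default heap is fixed, so a sudden block surge turns into GC thrash.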
Resolution: Increase NameNode and DataNode memory to 3 GB and reduce Spark Worker memory allocation on the same nodes, then restart the cluster.
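The heap increase can be applied in hadoop-env.sh. The snippet below is a sketch using Hadoop 2.x-style variable names (Hadoop 3 renamed these to HDFS_NAMENODE_OPTS / HDFS_DATANODE_OPTS); adjust to your version.

```shell
# etc/hadoop/hadoop-env.sh -- raise NameNode and DataNode heaps to 3 GB.
# Hadoop 2.x-style names; Hadoop 3 uses HDFS_NAMENODE_OPTS / HDFS_DATANODE_OPTS.
export HADOOP_NAMENODE_OPTS="-Xms3g -Xmx3g ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-Xms3g -Xmx3g ${HADOOP_DATANODE_OPTS}"
```

Setting -Xms equal to -Xmx avoids heap-resizing pauses; remember to also shrink the Spark Worker memory on the co-located nodes so the totals still fit in physical RAM.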
2.2 Cause of Block Count Explosion
Comparing the original table with the cleaned-data table for the same date showed the block count jumping from 68 to 4,736. The Spark cleaning job read eight source tables and wrote each result separately into the same output table, producing a large number of small files.
Solution: union the eight RDDs, then call repartition(32) before writing to HDFS, which normalized the block count per partition.
After this change, the total block count returned to normal levels.
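The observed totals are consistent with simple arithmetic. The per-write file count below is inferred from the numbers above, not taken from the job's logs.

```python
# Illustrative arithmetic for the small-files explosion.
# files_per_separate_write is inferred (4736 / 8), not logged.
source_tables = 8
files_per_separate_write = 592

# Eight separate writes into one result table multiply the file count,
# and each small file occupies at least one HDFS block.
without_fix = source_tables * files_per_separate_write

# Union the eight RDDs and repartition(32) before writing: the output
# is capped at 32 files (hence roughly 32 blocks) per run.
with_fix = 32

print(without_fix, with_fix)
```

This is why the fix targets the write path rather than the read path: the block count is determined by how many partitions exist at the moment of the write.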
Optimization Recommendations
(1) Add monitoring metrics for HDFS block count.
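One lightweight way to collect this metric is the NameNode's /jmx endpoint, whose FSNamesystem bean exposes BlocksTotal. The sketch below parses a canned payload of that shape rather than a live cluster; the sample values, alert threshold, and the host/port in the comment are assumptions for illustration.

```python
import json

# Sample payload mimicking the NameNode /jmx FSNamesystem bean.
# In production you would fetch something like
#   http://<namenode>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
# and parse the HTTP response instead. Values here are illustrative.
payload = json.loads("""
{"beans": [{"name": "Hadoop:service=NameNode,name=FSNamesystem",
            "BlocksTotal": 5700000, "FilesTotal": 3100000}]}
""")

def blocks_total(jmx: dict) -> int:
    """Extract BlocksTotal from the FSNamesystem bean."""
    for bean in jmx["beans"]:
        if bean["name"].endswith("name=FSNamesystem"):
            return bean["BlocksTotal"]
    raise KeyError("FSNamesystem bean not found")

THRESHOLD = 4_000_000  # assumed alert threshold for this cluster size
count = blocks_total(payload)
if count > THRESHOLD:
    print(f"ALERT: BlocksTotal={count} exceeds {THRESHOLD}")
```

Alerting on the day-over-day delta, not just the absolute value, would have caught the 2 M to 5.7 M jump described above before the DataNodes started thrashing.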
(2) Align NameNode JVM parameters with the expected number of filesystem objects (files + blocks):
File object count 10,000,000: -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M
File object count 20,000,000: -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G
File object count 50,000,000: -Xms32G -Xmx32G -XX:NewSize=3G -XX:MaxNewSize=3G
File object count 100,000,000: -Xms64G -Xmx64G -XX:NewSize=6G -XX:MaxNewSize=6G

(3) Recommended JVM settings for a DataNode based on its average block count:

2,000,000 blocks: -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M
5,000,000 blocks: -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G

(4) When multiple tables write to the same result table in Spark, monitor the resulting block count and apply repartition if necessary before persisting.
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.