Why Did My Hadoop Node’s Memory Spike at 3 AM? A Step‑by‑Step Debug Guide
This article details a systematic investigation of a Hadoop NameNode/DataNode that showed high memory usage at 3 AM, identifies zombie crond/sendmail/postdrop processes caused by a failed Postfix service, and provides cleanup commands and preventive measures for memory, disk, and inode issues.
2. Investigation Process
Root access was used to fully diagnose the issue.
2.1 Check component startup status
The listed processes belong to Hadoop, Yarn, and Zookeeper components; no abnormal Java processes were observed.
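The component check above could be scripted roughly as follows. This is a minimal sketch: the helper name missing_daemons is hypothetical, and the daemon list is an assumption about what a combined Hadoop/Yarn/Zookeeper node runs, not something taken from the investigation. It also assumes the JDK's jps tool is on the PATH.

```shell
# Hypothetical helper: read `jps` output ("PID NAME" per line) on stdin
# and report which of the expected daemon names are absent.
missing_daemons() {
    listing=$(cat)
    for name in "$@"; do
        printf '%s\n' "$listing" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
            echo "not running: $name"
    done
}

# Assumed daemon list; adjust to the roles this node actually hosts.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons NameNode DataNode ResourceManager NodeManager QuorumPeerMain
fi
```

No output means every expected daemon appeared in the jps listing.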
2.2 Examine resource consumption of Hadoop processes
Resource usage for processes under the hadoop user appeared normal, with no high‑consumption entries.
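One way to reproduce this check is to list the largest processes owned by the hadoop user and total their resident memory. The sketch below assumes a procps-style ps; the helper name sum_rss_mb is hypothetical.

```shell
# Hypothetical helper: sum a column of RSS values (in KiB) from stdin
# and print the total in MiB.
sum_rss_mb() {
    awk '{ kb += $1 } END { printf "%d\n", kb / 1024 }'
}

# Ten largest processes owned by the hadoop user (PID, RSS in KiB, command),
# followed by their combined RSS in MiB. Skipped if the user does not exist.
if id hadoop >/dev/null 2>&1; then
    ps -u hadoop -o pid= -o rss= -o comm= | sort -k2,2nr | head -n 10
    ps -u hadoop -o rss= | sum_rss_mb
fi
```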
2.3 Investigate abnormal processes
Running top revealed many crond processes, each consuming minimal resources.
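Because each process is small, a pile-up like this is easier to spot by counting processes per command name than by reading top. A minimal sketch, with top_commands as a hypothetical helper name:

```shell
# Hypothetical helper: read one command name per line on stdin and print
# the N most frequent names with their counts.
top_commands() {
    sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Count running processes per command name; a crond pile-up shows at the top.
ps -e -o comm= | top_commands 10
```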
Command to view specific processes (pid stands for the process ID of interest):
<code>ps -ef | grep pid</code>
The sendmail process is started by crond, and postdrop is started by sendmail. When Postfix is not running, these processes cannot exit and accumulate as numerous lingering ("zombie") processes.
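The parent/child chain described above can be inspected in one listing by printing each process's parent PID and state. The sketch below assumes a procps-style ps; filter_mail_procs is a hypothetical helper name.

```shell
# Hypothetical helper: keep only rows whose command column (5th field)
# is crond, sendmail, or postdrop.
filter_mail_procs() {
    awk '$5 ~ /^(crond|sendmail|postdrop)$/'
}

# Columns: PID, parent PID, state, elapsed time, command name.
# A state beginning with Z marks a true zombie (defunct) process.
ps -e -o pid= -o ppid= -o stat= -o etime= -o comm= | filter_mail_procs
```

Matching each row's parent PID against the PID column shows which crond started which sendmail, and which sendmail started which postdrop.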
3. Resolution
Terminate the related sendmail and postdrop processes:
<code>ps -ef | grep -E "sendmail|postdrop" | grep -v grep | awk '{print $2}' | xargs kill</code>
To prevent recurrence, disable crond email notifications by setting MAILTO="" in /etc/crontab and /etc/cron.d/0hourly, or by adding MAILTO="" as the first line of the user crontab (edited with crontab -e).
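After editing, the setting can be verified with a small check. This is a sketch only: check_mailto is a hypothetical helper, and it only looks for a line that sets MAILTO to an empty value anywhere in the file.

```shell
# Hypothetical helper: report whether each given cron file contains a
# line that sets MAILTO to an empty value (MAILTO="" or MAILTO=).
check_mailto() {
    for file in "$@"; do
        if grep -Eq '^[[:space:]]*MAILTO=("")?[[:space:]]*$' "$file" 2>/dev/null; then
            echo "ok: $file"
        else
            echo "MAILTO not disabled: $file"
        fi
    done
}

check_mailto /etc/crontab /etc/cron.d/0hourly
```

A missing or unreadable file is reported the same way as a file without the setting.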
4. Outcome
Memory usage returned to normal levels.
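A quick way to confirm the recovery is to compute the used-memory percentage from /proc/meminfo. The sketch assumes a Linux kernel that reports MemAvailable; mem_used_pct is a hypothetical helper name.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the percentage of memory in use, based on MemTotal and MemAvailable.
mem_used_pct() {
    awk '/^MemTotal:/ { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }'
}

if [ -r /proc/meminfo ]; then
    mem_used_pct < /proc/meminfo
fi
```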
5. Derived Issues
5.1 System disk full
Running du -h --max-depth=1 on the root directory timed out due to a massive /var/log/maillog file (32 GB). The log was cleared to free space.
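When a scan of the whole root directory is too slow, the same search can be narrowed to one directory at a time. A minimal sketch, with largest_entries as a hypothetical helper name:

```shell
# Hypothetical helper: list the N largest entries one level below DIR,
# sizes in KiB, staying on a single filesystem (-x).
largest_entries() {
    du -x -k -d 1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

largest_entries /var/log 5
```

The first row is normally the total for the directory itself. du only descends into subdirectories, so a single oversized file directly inside the directory shows up in that total rather than as its own row.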
5.2 Inode exhaustion on system disk
Although disk usage decreased, inode usage remained high because /var/spool/postfix/maildrop/ contained over 640,000 small files generated by failed sendmail services.
Cleanup command (run from inside that directory):
<code>ls -f | xargs -n 1 rm -rf</code>
After removal, inode metrics returned to normal.
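The ls -f listing also includes . and .., which rm refuses to remove, and xargs splits file names on whitespace. A find-based variant avoids both. This is an alternative sketch, not the command used in the original cleanup; purge_dir and its --delete switch are hypothetical.

```shell
# Hypothetical helper: count regular files directly under DIR (dry run),
# or delete them when the second argument is --delete.
purge_dir() {
    dir=$1
    if [ "${2:-}" = "--delete" ]; then
        find "$dir" -maxdepth 1 -type f -delete
    else
        find "$dir" -maxdepth 1 -type f | wc -l
    fi
}

# Dry run only: print how many files would be removed.
if [ -d /var/spool/postfix/maildrop ]; then
    purge_dir /var/spool/postfix/maildrop
fi
```

Running the dry run first shows the file count, so the deletion can be confirmed against the expected number before --delete is passed.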
6. Optimization Recommendations
(1) Disable crond email notifications by setting MAILTO="" in relevant cron configuration files.
(2) Add disk usage monitoring and alerts.
(3) For systems that generate many files (e.g., databases, caches, file storage), implement inode usage monitoring and alerts.
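Recommendations (2) and (3) can share one check, since df reports block usage and inode usage in the same column layout. The sketch below uses a hypothetical helper, over_threshold, and an assumed 80% limit. It also assumes mount points contain no spaces.

```shell
# Hypothetical helper: read `df -P` or `df -P -i` output on stdin and print
# the mount point and usage of every filesystem at or above LIMIT percent.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= limit) print $6, pct "%"
    }'
}

df -P    | over_threshold 80   # disk space usage
df -P -i | over_threshold 80   # inode usage
```

Run from cron or a monitoring agent, any output line is a candidate for an alert.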
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.