Why Is Airflow Draining CPU? A Step‑by‑Step Diagnosis and Fix
A high‑CPU anomaly on a machine running Spark was investigated in three steps: application checks, analysis of TIME_WAIT network connections, and inspection of Airflow. Kernel parameter tweaks cleared the TIME_WAIT backlog but not the CPU load; an Airflow configuration change finally restored normal CPU usage.
1. Problem Phenomenon
Machine A runs Spark Master, Airflow, Hive, Sqoop and other heavy workloads, so its memory and CPU usage are normally high. Over the past three days, however, CPU usage stayed above 95% for most of the day, notably even after 18:00, when few Spark tasks are running.
2. Investigation Process
2.1 Check Applications
At around 09:30 the CPU was high while five SparkSubmit tasks were running; no abnormal applications were found and no single app showed excessive CPU or memory consumption.
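A per-process view can hide load that is spread across many small processes, so it may help to group CPU usage by command name as well. The following is a minimal sketch, assuming output in the shape produced by `ps -eo pid,pcpu,pmem,comm`; the sample values and the helper names `parse_ps`, `top_by_cpu` and `cpu_by_command` are made up for illustration and are not taken from the incident.

```python
# Hypothetical sample in the format of `ps -eo pid,pcpu,pmem,comm`.
SAMPLE_PS = """\
  PID %CPU %MEM COMMAND
 1201 12.5  3.1 java
 1544 11.0  0.8 python
 1545 10.5  0.8 python
 1546 10.0  0.8 python
 1547  9.5  0.8 python
"""


def parse_ps(output):
    """Turn ps output (pid, %cpu, %mem, command) into a list of dicts."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # ignore malformed or truncated rows
        pid, pcpu, pmem, comm = parts
        rows.append({"pid": int(pid), "cpu": float(pcpu),
                     "mem": float(pmem), "cmd": comm.strip()})
    return rows


def top_by_cpu(rows, n=5):
    """Return the n individual processes with the highest CPU share."""
    return sorted(rows, key=lambda r: r["cpu"], reverse=True)[:n]


def cpu_by_command(rows):
    """Sum CPU share per command name to expose many-small-process load."""
    totals = {}
    for row in rows:
        totals[row["cmd"]] = totals.get(row["cmd"], 0.0) + row["cpu"]
    return totals


rows = parse_ps(SAMPLE_PS)
for cmd, total in sorted(cpu_by_command(rows).items(),
                         key=lambda kv: kv[1], reverse=True):
    print(f"{cmd}: {total:.1f}% CPU in total")
```

In this invented sample no single process stands out, yet the grouped total for one command is far larger than any individual entry, which is the pattern worth checking for when the per-process list looks unremarkable.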
2.2 Check Network Connections
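netstat showed more than 3,700 connections in the TIME_WAIT state, mostly to MySQL on hadoop11. The kernel parameters were adjusted in /etc/sysctl.conf:
<code>net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1</code>
After applying the change with /sbin/sysctl -p, the number of TCP connections returned to normal, but CPU usage remained high, indicating that the issue was not caused by network sockets. Note that net.ipv4.tcp_tw_recycle is known to break connections from clients behind NAT and was removed in Linux 4.12, so this setting only applies to older kernels.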
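The TIME_WAIT check described above can be sketched as a small tally over `netstat -ant` output. This is an illustrative sketch only: the IP addresses, ports and the function name `count_states` are hypothetical, and the parser assumes the usual six-column layout for TCP rows.

```python
from collections import Counter

# Hypothetical excerpt in the format of `netstat -ant`; addresses are made up.
SAMPLE_NETSTAT = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.16:51234         10.0.0.11:3306          TIME_WAIT
tcp        0      0 10.0.0.16:51236         10.0.0.11:3306          TIME_WAIT
tcp        0      0 10.0.0.16:51240         10.0.0.11:3306          ESTABLISHED
"""


def count_states(netstat_output):
    """Count TCP connections per state and TIME_WAIT sockets per remote peer."""
    states = Counter()
    time_wait_peers = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local address, foreign address, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip banner, header and non-TCP rows
        foreign, state = fields[4], fields[5]
        states[state] += 1
        if state == "TIME_WAIT":
            time_wait_peers[foreign] += 1
    return states, time_wait_peers


states, peers = count_states(SAMPLE_NETSTAT)
print(dict(states))
print(peers.most_common(3))
```

Grouping TIME_WAIT sockets by remote address is what points at a single peer, such as a database port, rather than at general network churn.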
2.3 Check Airflow
Machine A (hadoop16) connects to MySQL on hadoop11 only via Airflow. Airflow runs webserver, scheduler, master, and worker processes, using CeleryExecutor with a parallelism of 16.
(1) Confirm Airflow as the cause
Restarting Airflow lowered CPU usage temporarily, but it spiked again as soon as Airflow was back up. This confirmed the correlation with Airflow without solving the root problem.
(2) Research similar issues
References include a StackOverflow discussion and the Airflow documentation on min_file_process_interval.
(3) Apply fix
The Airflow configuration file airflow.cfg was updated:
<code>min_file_process_interval = 10</code>
After restarting Airflow, CPU usage returned to normal. This is consistent with the new file‑scan interval: the scheduler now waits at least 10 seconds before re‑parsing a DAG file.
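To confirm which value is actually in effect after an edit, the setting can be read back from the configuration file. The sketch below uses Python's standard `configparser` on an inline sample; it assumes the option lives in the `[scheduler]` section and that the executor settings sit under `[core]`, as in Airflow 1.x and 2.x layouts. The helper name `scheduler_scan_interval` and the sample text are hypothetical.

```python
import configparser

# Hypothetical excerpt of airflow.cfg; section placement is an assumption
# based on Airflow 1.x/2.x layouts, not taken from the original machine.
SAMPLE_CFG = """
[core]
executor = CeleryExecutor
parallelism = 16

[scheduler]
min_file_process_interval = 10
"""


def scheduler_scan_interval(cfg_text, default=None):
    """Return min_file_process_interval in seconds, or default if it is unset."""
    # Disable interpolation so literal '%' characters in other options
    # (for example log format strings) do not raise parsing errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint("scheduler", "min_file_process_interval",
                         fallback=default)


print(scheduler_scan_interval(SAMPLE_CFG))
```

For a real installation, the same function could be fed the contents of the airflow.cfg file in use; environment variables that override the file are not covered by this sketch.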
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.