
Root Cause Analysis of CPU Sys Spikes and Memory Pressure in Linux Services

This article investigates two real-world performance incidents: one caused by excessive disk I/O from a misconfigured Filebeat, the other by kernel memory compaction under memory pressure after a trace feature was enabled. It covers the observations, the Linux diagnostic commands used, the analysis, and practical remediation steps.

Beijing SF i-TECH City Technology Team

The author begins by noting that high CPU usage usually comes from compute-heavy user code or frequent context switches. The two incidents described here have different root causes: one is disk I/O, the other is memory pressure.

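Disk case: During a PHP service load test in October 2021, CPU busy and I/O saturation both reached 100%. The author used uptime to read the load average, then pidstat -d 1 and iotop to identify the Filebeat process as the main source of disk I/O. Setting the configuration parameter filebeat.registry.flush: 2s and restarting the service restored normal operation. With a flush interval, Filebeat batches its registry updates rather than writing them to disk after every published batch of events, which is the likely reason the write load dropped.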
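The triage sequence for the disk case can be sketched as a pair of shell functions. This is a minimal sketch, not the author's script: the helper names load_per_cpu and disk_triage are hypothetical, and it assumes the sysstat package (for pidstat) is installed. The optional iotop line usually needs root.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the disk-case triage; not from the article.

# load_per_cpu LOAD NCPU: normalize a load average by the CPU count.
# On Linux the load average counts runnable tasks and tasks in
# uninterruptible (typically disk) wait, so a value that stays above 1.00
# can point at either CPU or I/O contention.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# disk_triage: print the 1-minute load per CPU, then sample per-process disk I/O.
disk_triage() {
    load1=$(cut -d ' ' -f 1 /proc/loadavg)
    ncpu=$(getconf _NPROCESSORS_ONLN)
    echo "1-min load per CPU: $(load_per_cpu "$load1" "$ncpu")"
    pidstat -d 1 5        # per-process kB read/written per second, five 1 s samples
    # iotop -o -b -n 3    # optional: only processes doing I/O, batch mode, 3 iterations
}
```

Running disk_triage on the affected host during the load test should surface the heaviest writer in the pidstat output; in the incident described, that process was Filebeat.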

Memory case: After trace was enabled on an API gateway, intermittent spikes in CPU Sys appeared during peak traffic (9 to 11 am). Correlated metrics showed a drop in page cache and increased UDP packet loss. A sample taken with perf record -e cpu-clock -ag -p [pid] -- sleep 30 showed CPU time concentrated in the kernel function isolate_free_pages_block, which is part of the kernel's memory compaction path. The author explains the Linux buddy allocator, the meaning of the free and buff/cache columns in the output of free, and the three memory watermarks (min, low, high) derived from min_free_kbytes.
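The relationship between min_free_kbytes and the three watermarks can be illustrated with a little arithmetic. The sketch below uses the classic ratios, low = min + min/4 and high = min + min/2. This is an approximation under stated assumptions: the kernel computes watermarks per zone, and newer kernels also take vm.watermark_scale_factor into account, so the real values in /proc/zoneinfo can differ. The function name watermarks_kb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; approximates system-wide watermarks from min_free_kbytes.

# watermarks_kb MIN_FREE_KB: print "min low high" in kB using the classic ratios
#   low  = min + min/4  (background reclaim by kswapd starts below this)
#   high = min + min/2  (kswapd stops reclaiming once free memory is above this)
# Below min, allocating tasks have to reclaim memory themselves (direct reclaim),
# which is accounted as Sys CPU time.
watermarks_kb() {
    min=$1
    echo "$min $((min + min / 4)) $((min + min / 2))"
}

# Usage on a live host:
#   watermarks_kb "$(cat /proc/sys/vm/min_free_kbytes)"
#   grep -E 'Node|min|low|high' /proc/zoneinfo   # actual per-zone values, in pages
```

For example, watermarks_kb 67584 (about 66 MB) prints 67584 84480 101376, which shows how narrow the gap between the watermarks is at that setting: kswapd has little headroom before allocations fall through to direct reclaim.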

The recommended fix is to raise min_free_kbytes (e.g., echo 1048576 > /proc/sys/vm/min_free_kbytes) or, as a manual stopgap, to drop caches with sync && echo 3 > /proc/sys/vm/drop_caches. In the author's tests, setting the value to 1 GB or 2 GB eliminated the Sys spikes, while the default of 66 MB reproduced them.
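Applying the change could look like the following sketch. The helper names gib_to_kb and raise_min_free are hypothetical, and the commands must run as root. The kernel documentation warns that setting min_free_kbytes too high can trigger out-of-memory conditions, so the value should be sized against the host's total RAM.

```shell
#!/bin/sh
# Hypothetical helpers; not from the article.

# gib_to_kb GIB: convert whole GiB to the kB unit that vm.min_free_kbytes expects.
gib_to_kb() {
    echo "$(( $1 * 1024 * 1024 ))"
}

# raise_min_free GIB: print the current reserve, then apply the new one at runtime.
raise_min_free() {
    kb=$(gib_to_kb "$1")
    echo "before: $(cat /proc/sys/vm/min_free_kbytes) kB"
    sysctl -w vm.min_free_kbytes="$kb"
}

# Usage (as root):
#   raise_min_free 1
# To keep the setting across reboots, put the line
#   vm.min_free_kbytes = 1048576
# in a file under /etc/sysctl.d/ (the file name is your choice) and run:
#   sysctl --system
```

A runtime change made with sysctl -w or echo is lost at reboot, which is why the persistent sysctl.d entry is worth adding once the value has been validated under peak traffic.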

The article concludes with a summary of the tools used (uptime, pidstat, iotop, perf, free), a brief overview of kernel memory management, and practical advice for troubleshooting similar performance problems.
