Diagnosing and Solving Node.js Out‑of‑Memory (OOM) Issues in Production
This article walks through a real‑world Node.js out‑of‑memory crash, explains how to detect the OOM killer via system logs, uses Heap Profiler and process.memoryUsage() to pinpoint memory growth, and presents a practical fix by throttling file‑write operations and improving monitoring.
In the introduction the author, a backend engineer at Qunar, describes encountering an unexpected Node.js service termination caused by an out-of-memory (OOM) condition, a failure mode that is routinely diagnosed in Java or .NET services but often masked in Node because process managers silently restart the crashed process.
The first clue comes from a log line like qnc:pm2-messenger process exit with code: 1, prompting the author to add listeners for various process events. The sample code added includes:
process.on('uncaughtException', (err, origin) => {
console.error(err, origin);
});
function handle(signal) {
console.log(`Received ${signal}`);
}
process.on('SIGINT', handle);
process.on('SIGHUP', handle);
process.on('SIGBREAK', handle);
process.on('SIGTERM', handle);

System logs from /var/log/messages reveal that the kernel invoked the OOM killer for the Node process, showing entries such as:
kernel: node invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
kernel: Out of memory: Kill process 12562 (node) score 786 or sacrifice child

Understanding Linux's OOM scoring, the author proceeds to locate the memory leak by analysing Heap Profiler snapshots. Comparing snapshots taken at the start (18:35) and just before the kill (19:02), the biggest size delta appears in Node/SimpleWriteWrap, although the total heap remains a small fraction of system memory.
To get a clearer picture, the built‑in process.memoryUsage() API is used to record RSS, heapTotal, heapUsed, and external memory every minute. Example output:
{"rss":279068672,"heapTotal":64028672,"heapUsed":23900184,"external":4878125}

The data shows RSS rising from ~270 MB to over 1 GB shortly before termination, while CPU usage stays high and system memory usage climbs from 7% to 54%.
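The per-minute recording described above can be sketched as a small sampler; the one-minute interval matches the article, while logging to the console stands in for whatever metrics sink a production service would use:

```javascript
// Sketch: sample process.memoryUsage() once a minute and emit it as JSON.
function sampleMemory() {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  return { rss, heapTotal, heapUsed, external }; // all values are in bytes
}

// unref() keeps the timer from holding the process open on its own.
const timer = setInterval(
  () => console.log(JSON.stringify(sampleMemory())),
  60 * 1000
);
timer.unref();
```

Watching rss diverge from heapUsed, as happened here, is the hint that growth lies outside the V8 heap, which is why the heap snapshots alone looked deceptively small.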
After confirming the OOM is caused by excessive resource consumption rather than a true leak, the author inspects the code and finds a call, forkedProcess.send(xxObject), that triggers heavy file-write activity in a child process. Throttling the write to run only once per minute dramatically reduces memory pressure, as shown by a rapid drop in RSS after the change.
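The throttling idea can be sketched as a generic gate that lets at most one call through per interval. The article only says writes were limited to once per minute, so the wrapper name, the 60-second window, and the drop-on-excess policy below are assumptions:

```javascript
// Sketch: wrap an expensive operation so it runs at most once per interval;
// calls arriving inside the window are dropped rather than queued.
function throttleOncePer(intervalMs, writeFn) {
  let last = 0;
  return (...args) => {
    const now = Date.now();
    if (now - last < intervalMs) return false; // drop this write
    last = now;
    writeFn(...args);
    return true;
  };
}

// Hypothetical usage against the call the author found:
// const throttledSend = throttleOncePer(60 * 1000, (obj) => forkedProcess.send(obj));
```

Dropping rather than queueing is the key design choice: queued writes would keep the backlog (and memory) growing, which is exactly the pressure being relieved.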
Finally, the article offers operational recommendations: limit I/O frequency, regularly check kernel logs with dmesg or similar, and monitor oom_score via /proc/[pid]/oom_score for new services, adjusting oom_score_adj when necessary.
Through systematic log inspection, profiling, and targeted code changes, the Node.js service stabilises and avoids future OOM crashes.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.