TCP Out‑of‑Memory on AWS EC2: Diagnosis and Kernel Parameter Tuning
An AWS EC2 instance behind an Elastic Load Balancer became unresponsive due to repeated TCP out‑of‑memory errors, which were resolved by examining kernel messages, adjusting tcp_mem‑related kernel parameters, and rebooting the server after a long uptime.
In a production environment, several EC2 instances run a Java 8/Tomcat 8 application behind an Elastic Load Balancer. One instance suddenly stopped responding while the others continued to handle traffic; requests routed to it returned a proxy error page reading “Proxy Error – The proxy server received an invalid response from an upstream server – Reason: Error reading from remote server”.
APM monitoring showed normal CPU and memory usage but no traffic reaching the problematic instance, and standard diagnostics (vmstat, iostat, netstat, top, df) revealed nothing abnormal. Restarting Tomcat also failed to restore responsiveness.
Running dmesg on the instance displayed many lines like “[4486500.513856] TCP: out of memory -- consider tuning tcp_mem”, indicating that the shortage originated in the TCP layer rather than in the application code.
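Before changing anything, the live TCP memory usage can be compared against the tcp_mem thresholds. A minimal diagnostic sketch (Linux-only paths; the figures will of course differ per instance):

```shell
# /proc/net/sockstat reports current TCP memory usage in pages ("mem N").
# net.ipv4.tcp_mem's three thresholds (low, pressure, high) are also in
# pages (typically 4 KiB each); the kernel logs "TCP: out of memory" when
# usage crosses the "high" threshold.
grep '^TCP:' /proc/net/sockstat
cat /proc/sys/net/ipv4/tcp_mem
```

Comparing the "mem" figure from sockstat with the third tcp_mem number shows how close the instance is to the limit. Note that tcp_mem is expressed in pages, while tcp_rmem and tcp_wmem are in bytes.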
Web searches for the exact error yielded almost no useful results; the issue was temporarily fixed by rebooting the EC2 instance, which had been running for over 70 days, suggesting that long‑term operation exhausted TCP memory resources.
A colleague recommended checking several kernel parameters: net.core.netdev_max_backlog, net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_max_syn_backlog, net.ipv4.tcp_rmem, and net.ipv4.tcp_wmem. The current values were:
net.core.netdev_max_backlog = 1000
net.core.rmem_max = 212992
net.core.wmem_max = 212992
net.ipv4.tcp_max_syn_backlog = 256
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_wmem = 4096 20480 4194304
Increasing these limits to the following values resolved the problem:
net.core.netdev_max_backlog = 30000
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 87380 67108864
Key takeaways:
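Values set with sysctl -w take effect immediately but are lost on reboot; to make the raised limits persistent they can be added to /etc/sysctl.conf (or a drop-in file under /etc/sysctl.d/) and reloaded with sudo sysctl -p. A sketch of the fragment, assuming the values above:

```
# /etc/sysctl.conf -- raised network/TCP buffer limits
net.core.netdev_max_backlog = 30000
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 87380 67108864
```

A single value can also be applied on the spot, e.g. sudo sysctl -w net.core.rmem_max=134217728, which is useful for verifying the fix before persisting it.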
Modern APM tools have limits: they may not surface low‑level kernel memory issues.
dmesg is valuable: kernel logs can reveal the root cause when applications appear hung.
Memory problems can occur outside the application layer: TCP and other kernel buffers can be exhausted, requiring tuning of sysctl parameters.
Cognitive Technology Team