Debugging Rare Core Dumps and Memory Leaks in High‑Concurrency Nginx with OpenSSL
The article describes a real-world investigation of extremely low-probability core dumps and memory leaks in a heavily modified Nginx + OpenSSL stack. It covers the debugging workflow, a custom high-concurrency test harness, the use of tools such as GDB, perf, Valgrind, and AddressSanitizer, and the performance-hotspot analysis that followed once the bugs were resolved.
The author recounts encountering a rare core‑dump bug in a custom‑modified Nginx that uses OpenSSL for HTTPS handshakes, where a null‑pointer dereference occurs under extreme concurrency (tens of thousands of QPS), making the bug hard to reproduce.
Initial attempts with GDB and debug logs proved ineffective: the asynchronous, event-driven architecture obscured the call stack that led to the crash, and enabling full DEBUG logging reduced performance dramatically.
To improve observability, the author built a high-concurrency pressure-testing framework using wrk (wrk -t500 -c2000 -d30s https://127.0.0.1:8443/index.html) and a distributed controller that drives many client machines, which allowed the core dumps to be reproduced reliably.
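As a rough illustration of how such a controller might fan the same wrk run out to several client machines, here is a minimal Python sketch. The article does not say how its controller works, so the use of ssh, the helper names, and the host names are all assumptions; only the wrk flags come from the command quoted above.

```python
import re
import shlex


def build_wrk_command(url, threads=500, connections=2000, duration="30s"):
    """Build the wrk argument list; defaults mirror the command in the article."""
    if connections < threads:
        # wrk refuses to start with fewer connections than threads.
        raise ValueError("connections must be >= threads")
    return ["wrk", f"-t{threads}", f"-c{connections}", f"-d{duration}", url]


def remote_commands(hosts, url, **kwargs):
    """Map each client host to an ssh invocation (ssh transport is an assumption)."""
    cmd = shlex.join(build_wrk_command(url, **kwargs))
    return {host: ["ssh", host, cmd] for host in hosts}


_RPS = re.compile(r"^Requests/sec:\s+([\d.]+)", re.MULTILINE)


def requests_per_sec(wrk_output):
    """Extract the throughput line from wrk's summary, or None if it is absent."""
    match = _RPS.search(wrk_output)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # "client-1" and "client-2" are hypothetical host names.
    plan = remote_commands(["client-1", "client-2"],
                           "https://127.0.0.1:8443/index.html")
    for host, argv in plan.items():
        print(host, argv)
```

Each argument list can be handed to subprocess.run on the controller, and requests_per_sec can then be applied to the captured output to confirm that every client actually reached the intended load.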
Bug reproduction required constructing abnormal network conditions and request patterns, such as random TCP connection closures, premature SSL handshake termination, and malformed HTTPS payloads, to trigger the failure consistently.
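The three kinds of abnormal traffic listed above can be approximated with a small fault-injecting client. The following Python sketch is not the author's tool: the function names are hypothetical, the SO_LINGER packing assumes a Linux-style struct linger, and the target should only ever be a test listener.

```python
import random
import socket
import struct


def tls_record_header(claimed_len):
    """TLS record header: type 0x16 (handshake), version 3.1, 16-bit length."""
    return bytes([0x16, 0x03, 0x01]) + struct.pack("!H", claimed_len)


def random_payload(size, rng):
    """Random bytes to send where a ClientHello is expected."""
    return bytes(rng.getrandbits(8) for _ in range(size))


def _connect(host, port, timeout=2.0):
    return socket.create_connection((host, port), timeout=timeout)


def abrupt_close(host, port):
    """Connect, then reset the connection (RST rather than an orderly FIN)."""
    sock = _connect(host, port)
    # l_onoff=1, l_linger=0 makes close() send a RST (Linux struct layout).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def truncated_handshake(host, port):
    """Promise a 512-byte handshake record, send only the header, hang up."""
    with _connect(host, port) as sock:
        sock.sendall(tls_record_header(512))


def malformed_payload(host, port):
    """Send 64 bytes of noise instead of a TLS handshake."""
    with _connect(host, port) as sock:
        sock.sendall(random_payload(64, random.Random()))


FAULTS = (abrupt_close, truncated_handshake, malformed_payload)


def inject(host, port, rounds, seed=0):
    """Fire a seeded random mix of faults and count how often each was used."""
    rng = random.Random(seed)
    counts = {fault.__name__: 0 for fault in FAULTS}
    for _ in range(rounds):
        fault = rng.choice(FAULTS)
        try:
            fault(host, port)
        except OSError:
            pass  # the server may reset or refuse first; that is expected here
        counts[fault.__name__] += 1
    return counts
```

Running something like inject("127.0.0.1", 8443, rounds=1000) alongside ordinary wrk load mixes broken connections into otherwise healthy traffic. Seeding the generator keeps a failing sequence repeatable.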
After stabilizing the test environment, the author identified the root cause: under massive load, a connection structure still in use by an asynchronous proxy computation was recycled because it had not been marked non-reusable, so events that fired later on that connection dereferenced NULL pointers.
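The failure mode can be shown with a toy model. The Python sketch below is not Nginx code: Connection, Pool, and the pin parameter are hypothetical names, and None stands in for a NULL pointer. It only illustrates why a connection with pending asynchronous work must be excluded from recycling.

```python
class Connection:
    def __init__(self, cid):
        self.cid = cid
        self.data = None        # per-request state; None plays the role of NULL
        self.reusable = False


class Pool:
    def __init__(self, size):
        self.free = [Connection(i) for i in range(size)]
        self.active = []

    def get(self, data):
        if not self.free:
            self.drain()        # under pressure, try to reclaim something
        if not self.free:
            raise RuntimeError("no connections available")
        conn = self.free.pop()
        conn.data = data
        conn.reusable = True    # in this toy model, reclaimable by default
        self.active.append(conn)
        return conn

    def drain(self):
        """Reclaim the oldest active connection still marked reusable."""
        for conn in self.active:
            if conn.reusable:
                self.release(conn)
                return

    def release(self, conn):
        conn.data = None        # state is wiped when the connection is recycled
        conn.reusable = False
        self.active.remove(conn)
        self.free.append(conn)


def start_async_work(conn, pin):
    """Return the completion callback of a pending asynchronous operation."""
    if pin:
        conn.reusable = False   # the fix: exclude the connection from recycling

    def on_complete():
        return conn.data["request"]   # fails if conn.data was reset to None

    return on_complete


def run(pin):
    pool = Pool(size=1)
    conn = pool.get({"request": "GET /"})
    on_complete = start_async_work(conn, pin)
    pool.drain()                # simulate the pool running short under load
    try:
        return on_complete()
    except TypeError:
        return "crash: state was recycled"


if __name__ == "__main__":
    print(run(pin=False))
    print(run(pin=True))
```

With pin=False the callback runs against wiped state, the analogue of the NULL dereference. With pin=True the drain pass skips the connection and the callback completes normally.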
While the core dump was being fixed, a severe memory leak surfaced during high-load tests. The author evaluated Valgrind, which slows the server too much to sustain that load, and AddressSanitizer (enabled via --with-cc="clang" --with-cc-opt="-g -fPIC -fsanitize=address -fno-omit-frame-pointer"), and found the latter suitable for large-scale testing.
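When an AddressSanitizer build runs under an automated load test, it helps to total the leak reports rather than read them by hand. The Python sketch below assumes the summary line that the bundled LeakSanitizer usually prints at process exit. The exact wording can vary between compiler versions, so treat the pattern as an assumption to check against your own logs.

```python
import re

# Assumed format of the summary line, e.g.
# "SUMMARY: AddressSanitizer: 4096 byte(s) leaked in 2 allocation(s)."
_SUMMARY = re.compile(
    r"SUMMARY: AddressSanitizer: (\d+) byte\(s\) leaked in (\d+) allocation\(s\)"
)


def leak_totals(log_text):
    """Return (bytes, allocations) summed over every leak summary in the log."""
    total_bytes = 0
    total_allocs = 0
    for match in _SUMMARY.finditer(log_text):
        total_bytes += int(match.group(1))
        total_allocs += int(match.group(2))
    return total_bytes, total_allocs


if __name__ == "__main__":
    # Illustrative log text, not output captured from the article's tests.
    sample = (
        "==101==ERROR: LeakSanitizer: detected memory leaks\n"
        "SUMMARY: AddressSanitizer: 4096 byte(s) leaked in 2 allocation(s).\n"
        "==102==ERROR: LeakSanitizer: detected memory leaks\n"
        "SUMMARY: AddressSanitizer: 128 byte(s) leaked in 1 allocation(s).\n"
    )
    print(leak_totals(sample))
```

Comparing the totals from one load-test run to the next gives a quick signal of whether a candidate fix actually reduced the leak.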
Performance-hotspot analysis was performed with Linux tools such as perf, oprofile, gprof, and systemtap. Flame graphs generated from perf record -F 99 -p PID -g -- sleep 10 and subsequent processing showed RSA-related functions consuming the majority of CPU time, which guided further optimizations.
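Flame graphs are commonly produced by piping perf script output through the stackcollapse-perf.pl and flamegraph.pl scripts from the FlameGraph project. The intermediate "folded" format has one line per unique stack: frames joined by semicolons, then a space and a sample count. The article does not detail its own processing step, so the Python sketch below simply assumes such a folded file is available and measures how many samples pass through frames matching a keyword. The sample stacks are invented for illustration and are not measurements from the article.

```python
from collections import Counter


def parse_folded(lines):
    """Parse folded stacks ('frame1;frame2;... count') into a Counter."""
    stacks = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        stacks[tuple(stack.split(";"))] += int(count)
    return stacks


def inclusive_share(stacks, needle):
    """Fraction of samples whose stack has any frame containing `needle`."""
    total = sum(stacks.values())
    if total == 0:
        return 0.0
    needle = needle.lower()
    hits = sum(count for frames, count in stacks.items()
               if any(needle in frame.lower() for frame in frames))
    return hits / total


if __name__ == "__main__":
    # Invented sample data in folded format.
    folded = [
        "nginx;ngx_ssl_handshake;SSL_do_handshake;RSA_private_decrypt;BN_mod_exp_mont 70",
        "nginx;ngx_ssl_handshake;SSL_do_handshake;EVP_DigestFinal 10",
        "nginx;ngx_http_process_request;ngx_http_write_filter 20",
    ]
    stacks = parse_folded(folded)
    print(f"RSA share: {inclusive_share(stacks, 'rsa'):.0%}")
```

A number like this complements the picture: the flame graph shows where the wide towers are, and the share quantifies how much CPU time one family of functions accounts for before and after an optimization.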
The article concludes with reflections on the debugging mindset, emphasizing systematic testing, collaborative discussion, and the value of turning painful bugs into learning opportunities.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.