
Debugging a Memory Leak in Baidu's Bigpipe Broker Using GDB and pmap

This article describes how Baidu's QA team identified and resolved a memory-leak issue in the Bigpipe Broker backend by analyzing a live process with pmap and GDB, tracing the problem to reference counting, and fixing the atomic_add misuse that caused the leak.

Baidu Intelligent Testing

In Baidu's Quality Department, QA's core skills are discovering, locating, and verifying proper fixes for bugs. This case study collects several bug-discovery and analysis examples, focusing on a memory-leak problem in the Bigpipe Broker backend server.

Problem description: Bigpipe is Baidu's internal distributed transmission system. Its Broker module uses an asynchronous framework and heavy reference counting to manage object lifetimes. During stress testing, the Broker's memory usage grew steadily, indicating a leak.

Preliminary analysis: A recently added monitoring feature increments a reference count on each parameter read and decrements it after use. Several parameter objects exist, and the team needed to pinpoint which one leaked.
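The article does not show the Broker's actual code, so the following is a minimal sketch (with hypothetical names) of the increment-per-read, decrement-after-use pattern the monitoring feature follows, using GCC's __sync builtins:

```c
#include <stdlib.h>

/* Hypothetical parameter object; the real Broker types are not shown
 * in the article. */
typedef struct {
    int ref;    /* reference count */
    int value;  /* the monitored parameter */
} param_t;

static param_t *param_create(int value) {
    param_t *p = malloc(sizeof(*p));
    p->ref = 1;        /* the creator holds the first reference */
    p->value = value;
    return p;
}

/* Increment before each read, as the monitoring feature does. */
static void param_acquire(param_t *p) {
    __sync_add_and_fetch(&p->ref, 1);
}

/* Decrement after use; free when the count drops to zero.
 * __sync_sub_and_fetch returns the NEW value, so the == 0 test
 * correctly fires for the last holder. */
static int param_release(param_t *p) {
    if (__sync_sub_and_fetch(&p->ref, 1) == 0) {
        free(p);
        return 1;  /* freed */
    }
    return 0;      /* other holders remain */
}
```

When every acquire is paired with a release, the count returns to zero and the object is freed; the leak described below comes from the release test never seeing zero.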

Code & business analysis: Using Valgrind was considered, but reproducing the exact leak was difficult, and Valgrind can miss leaks where objects remain reachable through containers. The team therefore decided to inspect the still-running leaking process with GDB, accepting the risk that attaching pauses the process and could cause the Broker to exit.

Proposed solution: Use GDB to print memory information and infer the leak location.

Step 1: Run pmap -x {PID} to view the process's memory layout (e.g., pmap -x 24671).

Step 2: Start GDB and attach to the process: gdb ./bin/broker, then attach 24671.

Step 3: Disable output paging with set height 0 and enable logging with set logging on; output is saved to gdb.txt.

Step 4: Dump the anonymous heap region with x/18497024a 0x000000000109d000. Commands can be stored in command.txt and executed via source command.txt.
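The article does not reproduce command.txt; based on Steps 3 and 4, a plausible reconstruction of its contents is:

```gdb
# command.txt -- hypothetical reconstruction of the batch file for Steps 3-4
set height 0                       # disable output paging
set logging file gdb.txt           # gdb.txt is also the default log file
set logging on
x/18497024a 0x000000000109d000     # dump the anon heap region as addresses
set logging off
detach                             # let the Broker keep running
```

Dumping with the `a` (address) format is what makes GDB annotate each word with the nearest symbol, which the next step relies on.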

Step 5: Analyze gdb.txt. The first column shows the address; the following columns show the stored values and any symbol information. For example, address 0x22c2f00 contains a virtual function table pointer (vptr) for bigpipe::BigpipeDIEngine, whose vtable is at 0x10200d0 and whose destructor is at 0x53e2c6.

Symbol occurrences were then counted with cat gdb.txt | grep "<" | awk -F '<' '{print $2}' | awk -F '>' '{print $1}' | sort | uniq -c | sort -rn > result.txt, and filtered to project classes with cat result.txt | grep -P "bmq|Bigpipe|bigpipe|bmeta" | grep "_ZTV" > result2.txt. The most frequent project-related object was CConnect.

Root cause: Inspection of the atomic_add implementation revealed that it returns the value before the increment, but callers assumed it returned the value after the increment. This misunderstanding caused reference counts to stay non-zero, preventing _free from being called and leading to the leak. The issue mirrors the difference between __sync_fetch_and_add and __sync_add_and_fetch.
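The Broker's atomic_add wrapper is not shown in the article, but the bug it describes can be reproduced in miniature with the two GCC builtins it names. The helper names below are hypothetical:

```c
/* Buggy release: __sync_fetch_and_add returns the value BEFORE the
 * addition. When the count is 1, it decrements to 0 but the test sees
 * 1, so the free path never runs -- the leak described in the article. */
static int release_buggy(int *ref) {
    return __sync_fetch_and_add(ref, -1) == 0;  /* sees pre-decrement value */
}

/* Fixed release: __sync_add_and_fetch returns the value AFTER the
 * addition, so the last holder correctly sees 0 and frees. */
static int release_fixed(int *ref) {
    return __sync_add_and_fetch(ref, -1) == 0;  /* sees post-decrement value */
}
```

Both functions leave the count at zero; only the fixed one reports it, which is why the leak showed up as objects with non-released memory rather than corrupted counts.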

Solution: Modify the code to use the atomic operation with the intended return semantics (the original post shows the corrected implementation in an image).

Summary :

Locating leaks in asynchronous frameworks requires combining logs, GDB, and pmap.

Valgrind is not the only tool and has limitations.

Function names should clearly reflect their behavior to avoid misuse.

Always read library documentation to understand usage.

The presented method works when a leaking process is still running and leaves identifiable symbols in memory; it may not work if the leak leaves no trace.

Tags: Backend, Debugging, C, Memory Leak, Reference Counting, GDB