Using Go pprof for Online Performance Profiling: Case Studies and Lessons
The article demonstrates how Go’s built‑in pprof tools can be used for live performance profiling, walking through two real‑world cases—one where a malformed JSON request caused massive object allocation and CPU spikes, and another where per‑call self‑referencing structs leaked memory—while offering practical tips on input validation, allocation reduction, and GC monitoring.
This article introduces online performance problem diagnosis and optimization for developers, focusing on profiling as a powerful tool. Profiling collects runtime events and samples, enabling precise pinpointing of bottlenecks. The Go language’s built‑in runtime/pprof and net/http/pprof packages, together with visual tools, are used as examples.
What profiling is
Profiling (performance analysis) records CPU usage, memory consumption, thread states, and blocking information while a program runs. By examining these metrics, developers can locate the root cause of performance issues.
Go support for profiling
Go provides the runtime/pprof library and an HTTP endpoint for on‑the‑fly analysis. Adding a single import and starting an HTTP server is enough to expose profiling data:
import (
    "log"
    "net/http"
    _ "net/http/pprof"
)

func main() {
    go func() {
        log.Println(http.ListenAndServe("0.0.0.0:8005", nil))
    }()
    // ... business logic
}

After deployment, developers can fetch a 30-second CPU profile with:
go tool pprof -http=:1234 http://your-prd-addr:8005/debug/pprof/profile?seconds=30

The tool renders a flame graph that visualizes which functions consume CPU.
Case Study 1 – CPU usage spikes to 99%
Symptoms:
CPU idle drops to ~0% on three machines simultaneously.
Issue is intermittent and resolves after about two hours.
Steps taken:
Enabled pprof on the production service.
Collected a CPU profile when the problem re‑occurred.
The flame graph highlighted GetLeadCallRecordByLeadId as the dominant CPU consumer, especially its database calls. A deeper look revealed unusually high activity in runtime.gcBgMarkWorker, indicating GC pressure caused by a massive number of short-lived objects.
Further investigation showed that the endpoint /lp-api/v2/leadCallRecord/getLeadCallRecord was receiving a JSON string for leadId (should be an integer). The malformed request caused the SQL builder to treat the string as a parameter, pulling millions of rows and creating billions of objects.
[net/http.HandlerFunc.ServeHTTP/server.go:1947] _com_request_in||traceid=091d682895eda2fsdffsd0cbe3f9a95||spanid=297b2a9sdfsdfsdfb8bf739||hintCode=||hintContent=||method=GET||host=10.88.128.40:8000||uri=/lp-api/v2/leadCallRecord/getLeadCallRecord||params=leadId={"id":123123}||from=10.0.0.0||proto=HTTP/1.0

Root cause: the backend function GetLeadCallRecord accepted leadId as a string without type validation, directly embedding the value into the SQL query.
func GetLeadCallRecord(leadId string, bizType int) ([]model.LeadCallRecords, error) {
    sql := "SELECT record.* FROM lead_call_record AS record " +
        "where record.lead_id = {{leadId}} and record.biz_type = {{bizType}}"
    conditions := make(map[string]interface{}, 2)
    conditions["leadId"] = leadId
    conditions["bizType"] = bizType
    cond, vals, err := builder.NamedQuery(sql, conditions)
    if err != nil {
        return nil, err
    }
    // ... execute cond with vals and scan the rows (omitted)
}

Fix: enforce correct parameter types and validate inputs before building SQL.
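The validation step can be as small as rejecting anything that is not a plain decimal integer before it reaches the SQL builder. A minimal sketch; parseLeadId is an illustrative helper, not from the original service:

```go
package main

import (
    "fmt"
    "strconv"
)

// parseLeadId rejects any leadId that is not a plain decimal integer,
// so a JSON object such as {"id":123123} can never reach the SQL builder.
func parseLeadId(raw string) (int64, error) {
    id, err := strconv.ParseInt(raw, 10, 64)
    if err != nil {
        return 0, fmt.Errorf("invalid leadId %q: %w", raw, err)
    }
    return id, nil
}

func main() {
    if _, err := parseLeadId(`{"id":123123}`); err != nil {
        fmt.Println("rejected:", err)
    }
    id, _ := parseLeadId("123123")
    fmt.Println("accepted:", id)
}
```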
Case Study 2 – Memory usage climbs to 90%+
Symptoms:
CPU remains low (idle >85%).
Memory grows rapidly from 2 GB to 15 GB within weeks.
Profiling the heap revealed that 92% of live objects originated from event.GetInstance, each occupying only 16 bytes.
var (
    firstActivationEventHandler FirstActivationEventHandler
    firstOnlineEventHandler     FirstOnlineEventHandler
)
func GetInstance(eventType string) Handler {
    if eventType == FirstActivation {
        firstActivationEventHandler.ChildHandler = firstActivationEventHandler
        return firstActivationEventHandler
    } else if eventType == FirstOnline {
        firstOnlineEventHandler.ChildHandler = firstOnlineEventHandler
        return firstOnlineEventHandler
    }
    // ... other cases omitted
    return nil
}

The function creates a self-referencing struct value on every call, building up a massive linked list of 16-byte objects that the GC cannot reclaim. The corrected code assigns the pointers once, at startup:
func init() {
    firstActivationEventHandler.ChildHandler = &firstActivationEventHandler
    firstOnlineEventHandler.ChildHandler = &firstOnlineEventHandler
    // ... omitted
}

Underlying Go runtime details explain the 16-byte size: an interface value is two words, a pointer to the type/method table and a data pointer:
type iface struct {
    tab  *itab
    data unsafe.Pointer
}

type eface struct {
    _type *_type
    data  unsafe.Pointer
}

func convT2I(tab *itab, elem unsafe.Pointer) (i iface) {
    t := tab._type
    if raceenabled {
        raceReadObjectPC(t, elem, getcallerpc(), funcPC(convT2I))
    }
    if msanenabled {
        msanread(elem, t.size)
    }
    x := mallocgc(t.size, t, true)
    typedmemmove(t, x, elem)
    i.tab = tab
    i.data = x
    return
}

Fix: initialize the singleton handlers once (e.g., in init) and avoid per-call allocations.
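The cost of storing a struct value (rather than a pointer) in an interface can be measured directly with testing.AllocsPerRun from the standard library. A minimal sketch under assumed types; bigHandler is a stand-in for the real event handlers, not code from the original service:

```go
package main

import (
    "fmt"
    "testing"
)

type Handler interface{ Handle() }

// bigHandler stands in for the event handler structs above; storing a
// non-pointer struct in an interface copies it to the heap (see convT2I).
type bigHandler struct{ state [4]int64 }

func (bigHandler) Handle() {}

var (
    h    bigHandler
    sink Handler // global sink so the conversions are not optimized away
)

func returnValue() Handler   { return h }  // copies h into a fresh heap object per call
func returnPointer() Handler { return &h } // stores the existing pointer, no copy

func main() {
    fmt.Println("value allocs/op:", testing.AllocsPerRun(100, func() { sink = returnValue() }))
    fmt.Println("pointer allocs/op:", testing.AllocsPerRun(100, func() { sink = returnPointer() }))
}
```

Returning the value typically reports one allocation per call while returning the pointer reports zero, which is exactly the per-call garbage the init-based fix removes.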
Key Takeaways
When GC-related functions dominate CPU, inspect object counts (-inuse_objects / -alloc_objects).
For CPU‑bound issues, focus on object allocation volume.
For memory-bound issues, monitor allocated space (-inuse_space / -alloc_space).
Always validate input types, especially for SQL parameters.
Prefer passing pointers to structs to reduce copying and heap pressure.
Avoid unnecessary cyclic references and per‑call initialization.
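For ongoing GC monitoring between profiling sessions, runtime.ReadMemStats exposes the counters behind the symptoms above. A minimal sketch; the choice of fields and the logMemStats helper are illustrative:

```go
package main

import (
    "fmt"
    "runtime"
)

var sink []byte // global sink keeps the allocations below from being optimized away

// logMemStats prints the counters that matter when chasing allocation-heavy
// code: live heap size, cumulative allocation count, and completed GC cycles.
func logMemStats() runtime.MemStats {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("heap_inuse=%d KiB mallocs=%d gc_cycles=%d\n",
        m.HeapInuse>>10, m.Mallocs, m.NumGC)
    return m
}

func main() {
    before := logMemStats()
    for i := 0; i < 100000; i++ {
        sink = make([]byte, 64) // short-lived heap objects drive Mallocs and GC
    }
    after := logMemStats()
    fmt.Println("new mallocs:", after.Mallocs-before.Mallocs)
}
```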
By leveraging Go’s pprof tooling, developers can systematically trace performance problems from high‑level symptoms down to concrete code defects.
Didi Tech
Official Didi technology account