Cutting Go Memory Allocations by 100×: Profiling, Tracing, and Fixing Middleware
This article walks through generating load with Vegeta, using pprof and Go trace to pinpoint massive heap allocations caused by the chi compression middleware, and shows how upgrading the library and disabling the middleware reduced allocations by nearly a hundred‑fold while improving GC performance.
In this post I share how I analyzed an open‑source Go project (Flipt) to find and fix a severe memory‑allocation issue, ultimately reducing allocations by almost 100× and cutting GC overhead.
1. Generate Load
To expose performance problems I needed realistic traffic, so I used the HTTP load‑testing tool Vegeta to generate a steady stream of POST requests to the /api/v1/evaluate endpoint.
<code>echo 'POST http://localhost:8080/api/v1/evaluate' | vegeta attack -rate 1000 -duration 1m -body evaluate.json</code>

This command sends 1,000 requests per second for one minute, enough sustained load to observe heap behavior.
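The `-body` flag points Vegeta at a JSON file containing the request payload. A hypothetical `evaluate.json` might look like the following; the field names follow Flipt's evaluation API, but the flag key, entity ID, and context values here are made up for illustration:

```json
{
  "flag_key": "my-feature-flag",
  "entity_id": "user-123",
  "context": {
    "region": "us-east-1"
  }
}
```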
2. Measure
Go’s built‑in profiling tool pprof was used to capture heap profiles while the load ran. With the chi middleware middleware.Compress(gzip.DefaultCompression) enabled, the application produced a large number of allocations.
<code>pprof -http=localhost:9090 localhost:8080/debug/pprof/heap</code>

Inspecting the heap profile showed that alloc_objects and alloc_space grew dramatically, tracing back to flate.NewWriter called from the compression middleware.
3. Fix
Commenting out the compression line eliminated the massive allocations:
<code>// r.Use(middleware.Compress(gzip.DefaultCompression))</code>

To capture more detailed execution data, I lowered the request rate to 100 rps and recorded a Go trace:
<code>echo 'POST http://localhost:8080/api/v1/evaluate' | vegeta attack -rate 100 -duration 2m -body evaluate.json</code>

<code>wget 'http://localhost:8080/debug/pprof/trace?seconds=60' -O profile/trace</code>

<code>go tool trace profile/trace</code>

The trace confirmed that the compression middleware was the culprit.
4. Upgrade the Library
Checking the chi version (github.com/go-chi/chi v3.3.4+incompatible) revealed that the compression middleware still created a new flate.Writer for each response. The maintainer’s recent PR introduced a sync.Pool for writers, reducing allocation overhead.
After pulling the latest changes and rebuilding, the memory‑allocation problem disappeared.
5. Result
Running the load test again showed a stable heap growth curve, far fewer GC cycles, and dramatically lower GC pause time.
Conclusion
Never assume popular open‑source libraries are fully optimized.
A tiny issue can cause massive performance regressions under load.
Using sync.Pool for reusable objects can cut allocations.
Load testing and profiling are essential for uncovering hidden inefficiencies.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.