Cutting Go Memory Allocations by 100×: Profiling, Tracing, and Fixing Middleware
This article walks through generating load with Vegeta, using pprof and Go trace to pinpoint massive heap allocations caused by the chi compression middleware, and shows how upgrading the library and disabling the middleware reduced allocations by nearly a hundred‑fold while improving GC performance.
In this post I share how I analyzed an open‑source Go project (Flipt) to find and fix a severe memory‑allocation issue, ultimately reducing allocations by almost 100× and cutting GC overhead.
1. Generate Load
To expose performance problems I needed realistic traffic, so I used the HTTP load‑testing tool Vegeta to generate a steady stream of POST requests to the /api/v1/evaluate endpoint.
<code>echo 'POST http://localhost:8080/api/v1/evaluate' | vegeta attack -rate 1000 -duration 1m -body evaluate.json</code>

This command sends 1,000 requests per second for one minute, enough sustained load to observe heap behavior.
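The `-body` flag points Vegeta at a JSON file containing the request payload. A hypothetical `evaluate.json` might look like the following; the field names follow Flipt's evaluation API, but the flag key, entity ID, and context values here are made up for illustration:

```json
{
  "flag_key": "my-feature-flag",
  "entity_id": "user-123",
  "context": {
    "region": "us-east-1"
  }
}
```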
2. Measure
Go’s built‑in profiling tool pprof was used to capture heap profiles while the load ran. With the chi middleware middleware.Compress(gzip.DefaultCompression) enabled, the application produced a large number of allocations.
<code>pprof -http=localhost:9090 localhost:8080/debug/pprof/heap</code>

Inspecting the heap profile showed that alloc_objects and alloc_space grew dramatically, tracing back to flate.NewWriter called from the compression middleware.
3. Fix
Commenting out the compression line eliminated the massive allocations:
<code>// r.Use(middleware.Compress(gzip.DefaultCompression))</code>

To capture more detailed execution data, I lowered the request rate to 100 rps and recorded a Go trace:
<code>echo 'POST http://localhost:8080/api/v1/evaluate' | vegeta attack -rate 100 -duration 2m -body evaluate.json</code>

<code>wget 'http://localhost:8080/debug/pprof/trace?seconds=60' -O profile/trace</code>

<code>go tool trace profile/trace</code>

The trace confirmed that the compression middleware was the culprit.
4. Upgrade the Library
Checking the chi version (github.com/go-chi/chi v3.3.4+incompatible) revealed that the compression middleware still created a new flate.Writer for each response. The maintainer’s recent PR introduced a sync.Pool for writers, reducing allocation overhead.
After pulling the latest changes and rebuilding, the memory‑allocation problem disappeared.
5. Result
Running the load test again showed a stable heap growth curve, far fewer GC cycles, and dramatically lower GC pause time.
Conclusion
Never assume popular open‑source libraries are fully optimized.
A tiny issue can cause massive performance regressions under load.
Using sync.Pool for reusable objects can cut allocations.
Load testing and profiling are essential for uncovering hidden inefficiencies.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.