Tuning a Go Service to Reach 200k QPS: GC Adjustment and UDP Optimizations
The article describes how a Go‑based high‑throughput service was tuned from 80k to over 200k QPS by enlarging the GC heap, reusing UDP connections with sync.Pool, reducing system‑call overhead, and applying several lightweight logging and discovery optimizations.
A colleague needed to launch a Go service originally designed for a few hundred thousand QPS, but scaling from 8 virtual CPUs (23k QPS) to 40 virtual CPUs yielded only 80k QPS, far below the 150k–200k QPS target comparable to a legacy C++ implementation.
By allocating a large, never-touched byte slice (≈1 GB of "ballast") at startup and raising GOGC, GC pressure dropped dramatically: Go triggers a collection when the heap grows by GOGC percent over the live set, so the ballast lets each cycle absorb far more real allocation before firing. QPS rose from 80k to 140k while memory usage grew to about 4 GB, and GC time became almost invisible in flame graphs.
The service also suffered heavy CPU usage from UDP client calls, which consumed more CPU than equivalent Redis TCP calls. Profiling showed that each UDP send involved many system calls (socket creation, fcntl, setsockopt, connect, epoll registration, write, close), and a new UDP socket was created for every packet.
To mitigate this, a reuse strategy was introduced: for each remote address, a sync.Pool of UDP connections is kept in a sync.Map. When a request arrives, the appropriate pool supplies a pre‑opened connection; after use the connection is returned to the pool. Connections the pool evicts are later reclaimed by the GC, which closes their file descriptors.
Potential concurrency issues such as packet reordering or loss were considered. Because UDP is connection‑less, using the same local port for multiple goroutines could cause cross‑talk between responses. The solution is to keep client behavior consistent with the server (either send‑only or send‑and‑receive) and avoid mixing both modes on the same host.
After applying the connection‑reuse pool, QPS increased from 140k to 170k, with the dial overhead largely eliminated. Additional minor optimizations (log reduction, metric aggregation, and better client‑side service‑discovery selection) pushed throughput close to 200k QPS.
The article also explains why Go's UDP API includes a Dial operation: it performs a lightweight connect syscall that records the remote address, so subsequent Write calls need not specify it. For unconnected UDP, every WriteTo must carry the destination address, forcing the kernel to re-resolve the destination (effectively a temporary connect) on each send, which incurs extra overhead.
In summary, a handful of parameter tweaks and a simple connection‑reuse pool saved significant resources and demonstrated that Go can handle very high QPS workloads when GC and networking are carefully tuned.
High Availability Architecture
Official account for High Availability Architecture.