
Why Your Golang Service Misses System DNS Cache and How to Fix It

This article explains why a Golang service running on AWS EC2 failed to use the system‑level DNS cache provided by nscd, causing excessive DNS queries that triggered request timeouts, and describes the investigation and optimization steps that resolved the issue.

37 Interactive Technology Team

Introduction

The Golang service deployed on an AWS EC2 instance did not benefit from the system‑level DNS cache (nscd), leading to an abnormally high DNS query rate and request timeouts.

Background

In a real‑world scenario, Business A's EC2 instance pushes data to Business B's EC2 via a domain name that resolves to a load balancer. During peak traffic, the Golang client occasionally reports request timeouts.

Investigation Scope

Requests from Business A to Business B timed out before reaching the load balancer, indicating the problem originated on Business A's EC2 server.

Abnormal Metrics

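Server-side CPU, memory, and network bandwidth appeared normal, but the linklocal_allowance_exceeded counter kept increasing. This ENA metric counts packets dropped because traffic to link-local services (the instance metadata service, Amazon DNS, and the time sync service) exceeded the per-interface limit of 1024 packets per second (metadata requests + DNS queries > 1024/s).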
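The counter can be read on the instance with `ethtool -S <interface>`. The following is a minimal sketch of extracting it from that output in Go. The helper name parseCounter and the sample text are illustrative assumptions, not data from the incident.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCounter scans `ethtool -S <iface>` style output ("name: value" per line)
// and returns the value of the named counter. Hypothetical helper for illustration.
func parseCounter(output, name string) (uint64, bool) {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		key, val, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if !ok || strings.TrimSpace(key) != name {
			continue
		}
		n, err := strconv.ParseUint(strings.TrimSpace(val), 10, 64)
		if err != nil {
			return 0, false
		}
		return n, true
	}
	return 0, false
}

func main() {
	// Illustrative sample only; real input would come from running
	// `ethtool -S eth0` (the interface name is an assumption).
	sample := "NIC statistics:\n" +
		"     bw_in_allowance_exceeded: 0\n" +
		"     linklocal_allowance_exceeded: 1523\n"

	n, ok := parseCounter(sample, "linklocal_allowance_exceeded")
	fmt.Println(n, ok)
}
```

Because the counter is cumulative, sampling it periodically and alerting on any increase is usually more useful than reading its absolute value.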

Root Cause Confirmation

Packet captures showed that although nscd was running on the server and caching DNS answers, the Golang process bypassed the system resolver and sent its own DNS queries, none of which were served from the cache. Further captures showed these queries exceeding 1024/s, confirming DNS rate throttling as the root cause.

Problem Analysis

Two issues were identified:

Why the nscd DNS cache was ineffective.

Why Golang’s default connection reuse was not applied.

Why DNS Cache Was Ineffective

By default on Linux, Go's net package uses its own pure-Go resolver. It reads /etc/resolv.conf and sends queries straight to the configured nameservers instead of calling the C library's getaddrinfo, so the nscd cache is never consulted.
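One way to observe this behavior without touching the network is to hook the resolver's Dial function. The sketch below forces the pure-Go resolver with PreferGo and counts how often it tries to contact a nameserver. The dial is deliberately rejected, so every lookup fails, and the hostname is a hypothetical placeholder.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"sync/atomic"
	"time"
)

// countResolverDials performs `lookups` sequential lookups of host with the
// pure-Go resolver and returns how many times it tried to dial a nameserver.
// The dial is blocked on purpose, so no DNS traffic leaves the machine.
func countResolverDials(host string, lookups int) int {
	var dials int64
	r := &net.Resolver{
		PreferGo: true, // use Go's built-in resolver, as the service did by default
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			atomic.AddInt64(&dials, 1)
			return nil, errors.New("nameserver dial blocked for this demo")
		},
	}
	for i := 0; i < lookups; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		_, _ = r.LookupHost(ctx, host) // the error is expected: the dial is blocked
		cancel()
	}
	return int(atomic.LoadInt64(&dials))
}

func main() {
	// "service-b.example.com." is a placeholder name, not the real endpoint.
	n := countResolverDials("service-b.example.com.", 3)
	fmt.Println("nameserver dial attempts for 3 lookups:", n)
}
```

The exact count depends on the nameservers and retry settings in /etc/resolv.conf, but it grows with every lookup: nothing between the Go process and the nameserver caches the answer.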

Why Connection Reuse Was Disabled

The DisableKeepAlives field of Go's http.Transport (default false) controls connection reuse. The business code set DisableKeepAlives=true, so every request opened a new connection, and each new connection required a fresh DNS lookup.
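The effect of the flag can be measured locally. The sketch below is an illustration rather than the team's code: it starts a throwaway httptest server and uses net/http/httptrace to count how many requests needed a brand-new connection. The test server is addressed by IP, so no DNS is involved here. In the real service, each new connection to the load balancer's domain name would also mean a lookup.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// countNewConns sends `requests` sequential GETs to a local test server and
// returns how many of them had to open a new connection instead of reusing one.
func countNewConns(disableKeepAlives bool, requests int) int {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	defer srv.Close()

	tr := &http.Transport{DisableKeepAlives: disableKeepAlives}
	defer tr.CloseIdleConnections()
	client := &http.Client{Transport: tr}

	newConns := 0
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			if !info.Reused {
				newConns++
			}
		},
	}

	for i := 0; i < requests; i++ {
		req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
		if err != nil {
			panic(err)
		}
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
		resp, err := client.Do(req)
		if err != nil {
			panic(err)
		}
		// Drain and close the body so the connection can return to the idle pool.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return newConns
}

func main() {
	fmt.Println("keep-alives disabled:", countNewConns(true, 5), "new connections")
	fmt.Println("keep-alives enabled: ", countNewConns(false, 5), "new connections")
}
```

With keep-alives disabled, every request opens its own connection. With them enabled, sequential requests share a single connection.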

Solution

Two possible fixes were considered:

Force Golang to use the system DNS resolver by setting the environment variable GODEBUG=netdns=cgo (non‑standard and risky).

Enable connection reuse by setting DisableKeepAlives=false or removing the flag (preferred).

Effect

After enabling connection reuse, DNS query volume dropped by about 90%, the linklocal_allowance_exceeded metric stayed within limits, and request timeouts disappeared.

Appendix

References:

Monitoring linklocal_allowance_exceeded: https://docs.aws.amazon.com/zh_cn/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html

AWS DNS rate‑limit documentation: https://docs.aws.amazon.com/zh_cn/vpc/latest/userguide/AmazonDNS-concepts.html

Golang DNS client implementation: https://go.dev/src/net/dnsclient_unix.go

Golang DisableKeepAlives parameter: https://pkg.go.dev/net/http#Transport.DisableKeepAlives

Diagram showing DNS query flow and bottleneck