Why Your Kubernetes Pod Can't Reach the Server: DNS Search Domain Pitfalls and Fixes
An agent service running in a Kubernetes pod appeared healthy, but the server never received its heartbeats. The cause was a DNS resolution error: an unintended HOST search domain made the server's name resolve to the wrong IP. This article walks through the investigation, explains how DNS works inside Kubernetes, and shows how lowering ndots or using fully qualified domain names fixes the problem.
1. Fault Phenomenon
We deployed an agent service to a Kubernetes cluster; the pod status is Running, but the server never receives heartbeat signals. Logs show many "tcp timeout" messages when trying to connect to a specific IP address.
2. Fault Investigation Process
Log analysis revealed numerous I/O timeout errors when connecting to the IP. The service only contacts the server, whose domain name is set via an environment variable. The server is reachable from the host and the node, so the issue is not the server itself.
Testing from inside the pod shows that the server cannot be reached, even though ping does resolve the domain name. However, the resolved IP is not the server's external IP, which points to a DNS resolution problem. Running nslookup (after installing dnsutils or bind-utils) shows that the name actually being queried carries an unexpected trailing HOST suffix.
Inspecting /etc/resolv.conf reveals that the search list includes HOST, so the resolver appends this suffix to the server's domain name when it expands the name through the search list.
Any name ending in HOST resolves to an unexpected IP, because HOST is a top-level domain with wildcard resolution.
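The inspection step above can be scripted. Below is a minimal Python sketch, not an existing tool: parse_resolv_conf and suspicious_search_domains are hypothetical helpers. It assumes the cluster domain is cluster.local (the common default, though it is configurable) and treats any search suffix outside that domain as worth a second look.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvConf:
    nameservers: list[str] = field(default_factory=list)
    search: list[str] = field(default_factory=list)
    options: dict[str, str] = field(default_factory=dict)


def parse_resolv_conf(text: str) -> ResolvConf:
    """Parse the subset of resolv.conf syntax that matters here."""
    conf = ResolvConf()
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with '#' or ';' are comments.
        if not line or line[0] in "#;":
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            conf.nameservers.append(args[0])
        elif keyword in ("search", "domain"):
            # As in glibc, the last search/domain line wins.
            conf.search = [a.rstrip(".") for a in args]
        elif keyword == "options":
            for opt in args:
                key, _, value = opt.partition(":")
                conf.options[key] = value
    return conf


def suspicious_search_domains(conf: ResolvConf,
                              cluster_domain: str = "cluster.local") -> list[str]:
    """Return search suffixes outside the cluster domain (case-insensitive)."""
    cluster_domain = cluster_domain.lower()
    return [
        d for d in conf.search
        if d.lower() != cluster_domain
        and not d.lower().endswith("." + cluster_domain)
    ]


if __name__ == "__main__":
    # Hypothetical resolv.conf with an extra HOST suffix at the end.
    sample = """nameserver 10.68.0.2
search devops.svc.cluster.local svc.cluster.local cluster.local HOST
options ndots:5"""
    conf = parse_resolv_conf(sample)
    print(suspicious_search_domains(conf))  # ['HOST']
    # Inside a real pod: parse_resolv_conf(open("/etc/resolv.conf").read())
```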
3. Fault Cause Analysis
Understanding how Kubernetes resolves service names is essential. Inside a pod, DNS queries are sent to the IP of the cluster's kube-dns (or CoreDNS) service (e.g., 10.68.0.2), as defined in /etc/resolv.conf. The file typically contains:
<code>nameserver 10.68.0.2
search devops.svc.cluster.local svc.cluster.local cluster.local
options ndots:5</code>
For calls to a service in the same namespace, a short name such as b is expanded with the search list; the first suffix already yields b.devops.svc.cluster.local. External domains go through the same search list first, unless they contain at least ndots dots.
When a name has fewer than five dots, the resolver first appends each search suffix in turn, which generates several DNS queries before the name itself is tried. This was demonstrated by capturing packets with tcpdump for baidu.com and for a.b.c.d.com (four dots, so still below the threshold). The captures show three DNS lookups, one per search suffix, for the short-dot case, whereas a domain with five or more dots is queried directly as an absolute name.
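To connect this lookup sequence to the fault, here is a toy simulation. Everything in it is hypothetical: the server name, the IP addresses (taken from the RFC 5737 documentation ranges), the position of HOST at the end of the search list, and the in-memory "DNS" that answers any name under HOST with a wildcard address. It sketches the mechanism only and does not send real queries.

```python
SEARCH = ["devops.svc.cluster.local", "svc.cluster.local",
          "cluster.local", "HOST"]
REAL_RECORDS = {"server.example.com.": "198.51.100.7"}
WILDCARD_ZONE, WILDCARD_IP = "host.", "203.0.113.10"


def lookup(fqdn):
    """Pretend DNS server: exact records, then a wildcard under HOST."""
    fqdn = fqdn.lower()  # DNS names are case-insensitive
    if fqdn in REAL_RECORDS:
        return REAL_RECORDS[fqdn]
    if fqdn.endswith("." + WILDCARD_ZONE):
        return WILDCARD_IP
    return None  # NXDOMAIN


def resolve(name, search=SEARCH, ndots=5):
    """Return (answer, names queried) using a glibc-style search order."""
    if name.endswith("."):
        candidates = [name]
    else:
        expanded = [f"{name}.{s}." for s in search]
        if name.count(".") >= ndots:
            candidates = [name + "."] + expanded
        else:
            candidates = expanded + [name + "."]
    sent = []
    for fqdn in candidates:
        sent.append(fqdn)
        answer = lookup(fqdn)
        if answer is not None:
            return answer, sent
    return None, sent


if __name__ == "__main__":
    print(resolve("server.example.com"))           # wildcard IP after 4 queries
    print(resolve("server.example.com", ndots=2))  # real IP after 1 query
    print(resolve("server.example.com."))          # real IP after 1 query
```

With ndots:5 the two-dot name walks the search list, the three cluster suffixes return NXDOMAIN, and the HOST suffix hits the wildcard before the real name is ever tried.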
Two optimization strategies are presented:
Optimization 1: Use Fully Qualified Domain Names
Appending a trailing dot (e.g., a.b.c.com.) forces the resolver to treat the name as absolute, bypassing the search list and eliminating the extra DNS queries.
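Because the agent reads the server's domain from an environment variable, the trailing dot can also be added in the application itself. Below is a minimal sketch under that assumption. SERVER_HOST and server_fqdn are hypothetical names, not the agent's real configuration. Some HTTP and TLS clients may treat example.com. and example.com differently (Host headers, certificate name checks), so this should be tested before it is relied on.

```python
import ipaddress
import os


def server_fqdn(env_var: str = "SERVER_HOST") -> str:
    """Return the server host from `env_var` as an absolute DNS name.

    SERVER_HOST is a hypothetical variable name. IP literals are returned
    unchanged, because the trailing dot only has meaning for DNS names.
    """
    host = os.environ.get(env_var, "").strip()
    if not host:
        raise RuntimeError(f"{env_var} is not set")
    try:
        ipaddress.ip_address(host)
        return host  # an IP address needs no DNS lookup at all
    except ValueError:
        pass
    # A trailing dot makes the resolver skip the search list.
    return host if host.endswith(".") else host + "."


if __name__ == "__main__":
    os.environ["SERVER_HOST"] = "server.example.com"
    print(server_fqdn())  # server.example.com.
```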
<code>nslookup a.b.c.com.</code>
Optimization 2: Adjust ndots Value
Lowering ndots from the Kubernetes default of 5 to a smaller value (e.g., 2) means that any name with at least that many dots is tried as an absolute name first, so external domains go through the search list far less often. This can be set per deployment via dnsConfig:
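<code>spec:
  containers:
    - name: srv-inner-proxy
      image: xxx/devops/srv-inner-proxy
      ...
  # dnsConfig and dnsPolicy belong to the pod spec (spec.template.spec
  # in a Deployment), at the same level as containers
  dnsConfig:
    options:
      - name: ndots
        value: "2"
  dnsPolicy: ClusterFirst</code>
Kubernetes supports four DNS policies for pods: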
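None – the pod ignores the cluster's DNS settings; all DNS configuration, including at least one nameserver, must come from dnsConfig.
Default – the pod inherits the name resolution configuration of the node it runs on (the node's /etc/resolv.conf). Despite its name, this is not the default policy.
ClusterFirst – the default when dnsPolicy is unset; queries go to the cluster DNS service, which forwards names outside the cluster domain suffix to the upstream nameservers inherited from the node.
ClusterFirstWithHostNet – for pods running with hostNetwork, still use the cluster DNS (with plain ClusterFirst, such pods fall back to the Default behavior).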
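With ClusterFirst, the kubelet generates the pod's resolv.conf and merges the dnsConfig entries into it; as we understand it, an option with the same name, such as ndots, replaces the generated one. The sketch below is a simplified model of that merge, not the kubelet's code. merge_options and render_resolv_conf are hypothetical helpers.

```python
def merge_options(base: list[str], overrides: list[dict]) -> list[str]:
    """Merge resolv.conf options, letting dnsConfig entries win by name.

    base: option strings such as "ndots:5" generated for the DNS policy.
    overrides: dnsConfig.options entries, e.g. {"name": "ndots", "value": "2"}.
    """
    merged: dict[str, str] = {}
    for opt in base:
        name, _, value = opt.partition(":")
        merged[name] = value
    for opt in overrides:
        # "value" is optional in dnsConfig; a missing value means a flag option.
        merged[opt["name"]] = opt.get("value") or ""
    return [f"{k}:{v}" if v else k for k, v in merged.items()]


def render_resolv_conf(nameservers: list[str], search: list[str],
                       options: list[str]) -> str:
    """Render the pieces back into resolv.conf text."""
    lines = [f"nameserver {ns}" for ns in nameservers]
    if search:
        lines.append("search " + " ".join(search))
    if options:
        lines.append("options " + " ".join(options))
    return "\n".join(lines)


if __name__ == "__main__":
    opts = merge_options(["ndots:5"], [{"name": "ndots", "value": "2"}])
    print(render_resolv_conf(
        ["10.68.0.2"],
        ["devops.svc.cluster.local", "svc.cluster.local", "cluster.local"],
        opts,
    ))  # last line: options ndots:2
```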
4. Conclusion
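By keeping dnsPolicy: ClusterFirst in the deployment and lowering ndots to 2 through dnsConfig, names with at least two dots are tried as absolute names first. Provided the server's domain contains at least two dots, it no longer picks up the problematic HOST search suffix, and the pod resolves the server's correct IP address. The case highlights the importance of understanding Kubernetes DNS internals when troubleshooting connectivity issues.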
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.