How to Diagnose and Fix CoreDNS Timeout Issues in Kubernetes
This article explains why CoreDNS may experience DNS resolution timeouts in a Kubernetes cluster, how to analyze logs and timeout settings, locate upstream DNS problems, and apply practical solutions such as adjusting timeout values, switching upstream DNS servers, and deploying a local DNS service.
Problem Background
On a sunny day, a container running a business application began failing DNS resolution frequently, peaking at about 10 failures per second.
Problem Analysis
1. Check the CoreDNS logs for i/o timeout entries. These indicate that CoreDNS forwarded the request to the upstream DNS server (e.g., 100.x.x.1) but received no response.
2. Analyze the CoreDNS timeout settings. In version 1.10.1, CoreDNS treats a forwarded query as timed out if the upstream DNS does not respond within 2 seconds.
3. Trace the problem to the internal DNS (100.x.x.1): it did not return results within 2 seconds, and this upstream DNS does not guarantee an SLA.
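The log check in step 1 can be sketched as a small script. This is a minimal sketch rather than the tooling used in the incident: it assumes that timeout errors end in "->UPSTREAM:PORT: i/o timeout", and the sample lines and the 192.0.2.1 documentation address are invented for illustration. It counts timeouts per upstream address, which shows whether a single server accounts for the failures.

```python
import re
from collections import Counter

# Assumed shape of the error tail: "...->UPSTREAM:PORT: i/o timeout".
TIMEOUT_RE = re.compile(r"->(?P<upstream>\S+):\s+i/o timeout")


def count_upstream_timeouts(lines):
    """Return a Counter mapping each upstream address to its i/o timeout count."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("upstream")] += 1
    return counts


# Hypothetical log lines for illustration only (not taken from the incident).
SAMPLE_LOG = [
    "[ERROR] plugin/errors: 2 example.com. A: read udp 10.0.0.5:41234->192.0.2.1:53: i/o timeout",
    '[INFO] 10.0.0.9:5353 - 1234 "A IN example.org. udp 40 false 512" NOERROR qr,rd,ra 56 0.001s',
    "[ERROR] plugin/errors: 2 example.net. AAAA: read udp 10.0.0.5:50211->192.0.2.1:53: i/o timeout",
]

if __name__ == "__main__":
    for upstream, hits in count_upstream_timeouts(SAMPLE_LOG).most_common():
        print(f"{upstream}\t{hits}")
```

In practice the lines would come from the CoreDNS pod logs (for example, the output of kubectl logs saved to a file) instead of SAMPLE_LOG.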
Solution Measures
1. Make the timeout configurable, rebuild the CoreDNS image, and set the timeout to 5 seconds.
2. Change the upstream DNS to Alibaba Cloud DNS by updating the forward address in the configuration file.
3. Deploy a local DNS service inside the cluster.
The local DNS service receives queries first; if it has no cached record, it forwards the query to CoreDNS, which reduces the load on CoreDNS.
Adjust the attempts option in the container's resolv.conf so that the libc resolver automatically retries failed DNS lookups.
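The resolv.conf adjustment can be illustrated with a short helper. This is a sketch under stated assumptions: the values attempts:3 and timeout:2 are illustrative and not taken from the incident, the function names are hypothetical, and the caps of 5 attempts and 30 seconds are the glibc limits documented in resolv.conf(5). It renders the same settings both as a resolv.conf options line and as a pod-level dnsConfig fragment, which is how Kubernetes lets a pod add resolver options.

```python
import json

# glibc limits documented in resolv.conf(5); larger values are silently capped.
MAX_ATTEMPTS = 5
MAX_TIMEOUT_SECONDS = 30


def resolver_options(attempts, timeout_seconds):
    """Validate the settings and return them as (name, value) string pairs."""
    if not 1 <= attempts <= MAX_ATTEMPTS:
        raise ValueError(f"attempts must be between 1 and {MAX_ATTEMPTS}")
    if not 1 <= timeout_seconds <= MAX_TIMEOUT_SECONDS:
        raise ValueError(f"timeout must be between 1 and {MAX_TIMEOUT_SECONDS}")
    return [("attempts", str(attempts)), ("timeout", str(timeout_seconds))]


def resolv_conf_line(options):
    """Render the options as a single resolv.conf 'options' line."""
    return "options " + " ".join(f"{name}:{value}" for name, value in options)


def pod_dns_config(options):
    """Render the options as a pod spec dnsConfig fragment (values are strings)."""
    return {
        "dnsConfig": {
            "options": [{"name": name, "value": value} for name, value in options]
        }
    }


if __name__ == "__main__":
    opts = resolver_options(attempts=3, timeout_seconds=2)  # illustrative values
    print(resolv_conf_line(opts))
    print(json.dumps(pod_dns_config(opts), indent=2))
```

Setting the options through dnsConfig in the pod spec avoids editing /etc/resolv.conf inside the container, where the file is generated for the pod at startup.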
Problem Summary
The root cause is that domain names could not be resolved within the 2-second timeout, which may be due to excessively long recursive DNS lookups or to packet loss on the CoreDNS network path.
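To help tell the two candidate causes apart, one option is to time queries against the upstream directly. The following is a hedged sketch, not a method described in the incident: it hand-builds a minimal DNS A query, sends it over UDP, and reports whether an answer arrived within the timeout. The function names, the server address, and the timeout values are assumptions chosen for illustration.

```python
import random
import socket
import struct
import sys
import time


def build_query(name, txid, qtype=1):
    """Build a minimal DNS query packet (recursion desired, class IN)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def probe(server, name, timeout=2.0, port=53):
    """Send one query; return ("answered" | "timeout" | "mismatch", seconds)."""
    txid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_query(name, txid), (server, port))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout", time.monotonic() - start
        elapsed = time.monotonic() - start
    # A reply counts only if it echoes the transaction ID that was sent.
    if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == txid:
        return "answered", elapsed
    return "mismatch", elapsed


# Usage (hypothetical): python probe.py <upstream-ip> <domain>
if __name__ == "__main__" and len(sys.argv) == 3:
    for limit in (2.0, 5.0):
        status, elapsed = probe(sys.argv[1], sys.argv[2], timeout=limit)
        print(f"timeout={limit}s status={status} elapsed={elapsed:.3f}s")
```

If queries that time out at 2 seconds are answered when the limit is raised to 5 seconds, slow recursion is the more likely cause; if they still go unanswered, packet loss becomes more plausible. A single probe proves neither, so repeated runs are needed before drawing a conclusion.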
Linux Ops Smart Journey
The operations journey never stops—pursuing excellence endlessly.