Understanding DNS Resolver Caching Issues in Salt-Minion and glibc res_init() Behavior

The article explains how salt‑minion’s DNS resolution can become stale after a data‑center migration because each Linux process caches resolver settings in a private _res structure initialized by glibc’s res_init(), and it offers practical ways to force a refresh or redesign the lookup mechanism.

Hujiang Technology

Failures are a routine part of operations work, and resolving them is the core of the job. This article follows the usual pattern: start from the symptom, analyze the cause, then propose solutions.

The issue surfaced when developers reported that salt-minion was behaving abnormally and could not resolve hostnames. Although /etc/resolv.conf already pointed to the DNS server in the new data center, the running process kept sending queries to the old DNS server address it had cached.

Investigation showed that salt-minion does not use a third‑party DNS library; it relies on Python’s socket.getaddrinfo, which is a wrapper around glibc’s getaddrinfo. The first time a process calls getaddrinfo, glibc invokes res_init(), which reads /etc/resolv.conf and stores the configuration in a static _res structure.
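The call path described above can be exercised from Python in a few lines. This is a minimal sketch, not salt-minion's actual code: the helper name resolve is hypothetical, and only the standard library's socket.getaddrinfo is used.

```python
import socket


def resolve(host, port=None):
    """Return the unique IP addresses that getaddrinfo reports for host.

    `resolve` is a hypothetical helper for illustration. On Linux with glibc,
    socket.getaddrinfo ends up in glibc's getaddrinfo, so the first lookup in
    a process is what loads the resolver configuration.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

Calling resolve("some-hostname") in a long-running process goes through that process's own resolver state, which is why two processes on the same host can disagree about which DNS server to ask.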

Because glibc’s static data is duplicated for each process, every process gets its own independent copy of _res. The first DNS query in a process triggers res_init() once; after that, the nameserver information stays fixed in that process’s copy of _res.

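Consequently, modifying /etc/resolv.conf while a process is running does not affect that process, since its private _res still holds the old nameserver configuration. (glibc 2.26 and later reload the file automatically when it changes; the glibc 2.12 discussed here does not.)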
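When diagnosing this kind of staleness, it helps to list what the file on disk currently says and compare it with the server the process is actually querying (for example, as seen in a packet capture). The following is a rough sketch under that assumption; parse_nameservers is a hypothetical helper, and it only covers the nameserver keyword, not the rest of the resolv.conf syntax.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses found in resolv.conf text, in file order.

    Hypothetical helper: it reads only `nameserver` lines and skips comment
    lines, which start with '#' or ';'.
    """
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers
```

For instance, parse_nameservers(open("/etc/resolv.conf").read()) shows the on-disk view; a process started before the file was edited may still hold a different list in its _res.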

Documentation excerpts illustrate the behavior on different platforms:

BSD: The res_init() function initializes the __res_state structure by reading the "TCPIP.DATA" configuration file (the name used on IBM TCP/IP stacks; on Linux the equivalent file is /etc/resolv.conf).

Linux: Traditional resolver interfaces such as res_init() and res_query() use static global state stored in _res, making them non‑thread‑safe; newer re‑entrant interfaces like res_ninit() accept a res_state argument.

On CentOS 6.5 with glibc‑2.12, res_init() calls __res_vinit(), which fills in a struct __res_state; res_state is a pointer typedef for that structure, and _res is the instance that the traditional interfaces use. The original article illustrated this internal flow with several diagrams, which are not reproduced here.

Solutions based on the root cause include:

Directly invoke the low‑level resolver interface and call res_init() again, either periodically or whenever a change to /etc/resolv.conf is detected (higher implementation cost).

Restart the user process so that the first DNS lookup after restart triggers res_init() (low implementation cost).

Implement a full DNS client within the process (high implementation cost).

Use unconventional methods to modify the process’s in‑memory _res structure (high implementation cost).
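The first option in the list above can be sketched from Python with ctypes. This is an illustration under several assumptions, not the article's implementation: load_res_init and ResolvConfWatcher are hypothetical names, the symbol lookup tries the names __res_init and res_init and gives up quietly on platforms where neither is exported, and the sketch assumes a single-threaded caller.

```python
import ctypes
import os


def load_res_init():
    """Return the C library's res_init as a callable, or None if unavailable."""
    try:
        libc = ctypes.CDLL(None)  # symbols already loaded into this process
    except (OSError, TypeError):
        return None
    # Assumption: glibc exports the function as __res_init; other C libraries
    # may use a different name or not export it at all.
    for name in ("__res_init", "res_init"):
        try:
            return getattr(libc, name)
        except AttributeError:
            continue
    return None


class ResolvConfWatcher:
    """Re-run a reload function when the resolver config file changes."""

    def __init__(self, path="/etc/resolv.conf", reload_fn=None):
        self.path = path
        self.reload_fn = reload_fn or load_res_init()
        self._seen = self._signature()

    def _signature(self):
        # Inode, mtime, and size together catch both in-place edits
        # and files replaced by rename.
        try:
            st = os.stat(self.path)
        except OSError:
            return None
        return (st.st_ino, st.st_mtime_ns, st.st_size)

    def check(self):
        """Return True if the file changed since the last check."""
        current = self._signature()
        if current == self._seen:
            return False
        self._seen = current
        if self.reload_fn is not None:
            self.reload_fn()  # re-read the resolver configuration
        return True
```

A long-running daemon could call watcher.check() from its main loop or just before each lookup. Polling stat() is cheap, but it is a design choice made for this sketch; an inotify-based watcher would avoid the polling at the cost of more code.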

Extending the scenario: many services (Redis, Memcached, RabbitMQ, Codis, Zookeeper, databases, etc.) are addressed by hostname in configuration. When IP addresses change because of failures or migrations, updating hundreds of configuration files or restarting numerous applications is painful. A more robust approach is to call the low‑level resolver functions directly instead of relying on language‑specific DNS lookups, so that connection pools re‑run res_init() before each reconnection, or to adopt a custom DNS SDK that can refresh the resolver state dynamically.
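The reconnection idea can be sketched as a helper that refreshes resolver state, resolves the hostname again, and only then connects. Everything named here is an assumption for illustration: connect_fresh is a hypothetical function, and refresh_resolver stands in for whatever wraps res_init() in a given application (it defaults to a no-op).

```python
import socket


def connect_fresh(host, port, refresh_resolver=lambda: None, timeout=5.0):
    """Refresh resolver state, re-resolve host, and connect to the first
    address that accepts the connection.

    `refresh_resolver` is a hypothetical hook, for example a wrapper
    around res_init(); by default it does nothing.
    """
    refresh_resolver()
    last_error = None
    for family, socktype, proto, _name, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
    raise last_error or OSError("no addresses returned for %r" % (host,))
```

A connection pool would call connect_fresh(host, port, refresh_resolver=...) each time it replaces a dead connection, rather than reusing an address resolved at startup.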

Future operations‑related articles will be published in the "Hujiang Technology Academy".
