This topic describes the domain name resolution and caching policies for container pods in a Kubernetes cluster.
DNS resolution pipelines
The following figures show the domain name resolution pipelines for three types of application deployments.
For information about terms such as timeout and attempts in the figures, see the Resolution policies and Caching policies sections.
Non-containerized applications run directly on ECS instances.
Example: An application runs on an ECS instance.

Containerized applications run in Kubernetes. The pods use the ClusterFirst DNS policy.
Example: An application runs in a Kubernetes container pod.

Containerized applications run in Kubernetes. The pods use NodeLocal DNSCache.
Example: An application runs in a Kubernetes container pod, and NodeLocal DNSCache is deployed.

Resolution policies
Client Side
In most cases, applications resolve domain names using the interfaces provided by Glibc. The following table describes the configurable domain name resolution parameters in the /etc/resolv.conf file that Glibc uses.
Parameter | Description | Default value in Glibc | ECS | Pod with DNSPolicy set to ClusterFirst | Pod with DNSPolicy set to Default | Pod that uses NodeLocal DNSCache | Pod with DNSPolicy set to Default and that uses the host network |
nameserver | The DNS server used to resolve domain names. | Empty | VPC DNS servers② | CoreDNS ClusterIP③ | VPC DNS servers | NodeLocal DNSCache IP④ and CoreDNS ClusterIP③ | VPC DNS servers |
search | For requests involving a domain name that is not a fully qualified domain name (FQDN), the search domains in this list are appended to the domain name in turn. | Empty | Empty | <ns>.svc.cluster.local svc.cluster.local cluster.local | Empty | <ns>.svc.cluster.local svc.cluster.local cluster.local | Empty |
ndots | If the number of dots in a domain name string is greater than or equal to the ndots value, the domain name is treated as an FQDN and queried as-is first. If the number of dots is less than the ndots value, the search suffixes are appended before the name is queried as-is. | 1 | 1 | 5 | 1 | 3 | 1 |
timeout | The timeout period for a single domain name resolution request, in seconds. | 5 | 2 | 5 | 5 | 1 | 2 |
attempts① | The number of retries if a domain name resolution fails. | 2 | 3 | 2 | 2 | 2 | 3 |
rotate | Queries DNS servers in a round-robin fashion. | Disabled | Enabled | Disabled | Disabled | Disabled | Enabled |
single-request-reopen | If this option is enabled and two requests are sent using the same socket, the resolver closes the socket after sending the first request and opens a new socket before sending the second request. | Disabled | Enabled | Disabled | Disabled | Disabled | Enabled |
①The attempts parameter takes effect only in specific scenarios, such as when the server returns SERVFAIL, NOTIMP, or REFUSED, or when the server returns NOERROR but without a resolution result. For more information, see attempts parameter request details.
②VPC DNS servers are the default DNS servers configured on ECS instances. Their IP addresses are 100.100.2.136 and 100.100.2.138. They are responsible for resolving domain names in PrivateZone and authoritative domain names.
③The CoreDNS ClusterIP is the IP address of the kube-dns service provided by the default CoreDNS deployment in the kube-system namespace of a Kubernetes cluster. It is responsible for resolving internal service domain names and forwarding resolution requests for PrivateZone and authoritative domain names.
④The NodeLocal DNSCache IP is 169.254.20.10. When the NodeLocal DNSCache component is deployed, it listens on this IP address on each node.
For more information about resolv.conf configuration, see resolv.conf.
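The interaction between ndots and search can be sketched in a few lines of Python. This is a simplified model of the lookup order described in resolv.conf(5), not Glibc's actual implementation. The function name candidate_names and the sample namespace default are assumptions made for illustration.

```python
def candidate_names(name, search, ndots):
    """Return the query names a Glibc-style resolver would try, in order.

    Simplified model: ignores timeouts, retries, and record types.
    """
    # A trailing dot marks the name as absolute: no search suffix is applied.
    if name.endswith("."):
        return [name.rstrip(".")]

    as_is = [name]
    with_suffix = [f"{name}.{domain}" for domain in search]

    # At least `ndots` dots: try the name as-is first, then the search list.
    if name.count(".") >= ndots:
        return as_is + with_suffix
    # Fewer dots: walk the search list first and try the bare name last.
    return with_suffix + as_is


# Hypothetical pod in the "default" namespace with the ClusterFirst settings.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for query in candidate_names("www.example.com", search, ndots=5):
    print(query)
```

With ndots set to 5, www.example.com contains only two dots, so the three suffixed names are tried before www.example.com itself. With ndots set to 1 the name is queried as-is first, and with a trailing dot only the name itself is queried.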
In some cases, the domain name resolution policy on the client side may differ from the preceding configurations:
If you use Alpine as the container image, its built-in Musl library replaces Glibc, which causes significant differences in resolution behavior. For example:
Alpine does not adhere to the single-request and single-request-reopen options in /etc/resolv.conf.
Alpine 3.3 and earlier versions do not support the `search` parameter or search domains, which prevents service discovery from working.
Concurrent requests to multiple DNS servers configured in /etc/resolv.conf cause NodeLocal DNSCache optimizations to become ineffective.
Using the same socket to concurrently request A and AAAA records triggers Conntrack source port conflicts on older kernel versions, which leads to packet loss.
Note: For more information about resolution behavior, see musl libc.
If you use programming languages such as Go or Node.js, the application might use the language's built-in domain name resolver, whose behavior can also differ significantly.
In-cluster DNS Servers
By default, the /etc/resolv.conf file of CoreDNS uses the ECS configuration. However, CoreDNS forwards DNS requests through its built-in Forward plug-in, and the parameters of that plug-in control the forwarding behavior.
NodeLocal DNSCache uses a built-in CoreDNS for DNS service forwarding. The configuration method is the same as for CoreDNS.
The following table describes the parameters that control the resolution policy of the Forward plug-in. For more information about the CoreDNS Forward plug-in, see Forward.
Parameter | Description | CoreDNS default value | NodeLocal DNSCache default value |
prefer_udp | Preferably uses UDP to communicate with the upstream server. | Enabled | Disabled |
force_tcp | Forcibly uses TCP to communicate with the upstream server. | Disabled | Enabled |
max_fails | The number of consecutive failed health checks before an upstream server is considered unhealthy. | 2 | 2 |
expire | How long a cached connection to the upstream server is kept before it expires. | 10s | 10s |
policy | The policy for selecting an upstream server. | random | random |
health_check | The health check interval. | 0.5s | 0.5s |
max_concurrent | The maximum number of concurrent queries to the upstream server. | None | None |
dial timeout | The timeout for connecting to the upstream server. | 30s. The value dynamically decreases based on the actual time consumed. | 30s. The value dynamically decreases based on the actual time consumed. |
read timeout | The timeout for waiting for data from the upstream server. | 2s | 2s |
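How the failure threshold and the random selection policy interact can be illustrated with a small Python model. This is a sketch under stated assumptions, not CoreDNS code: the Upstream class and pick_upstream function are hypothetical names, the threshold is assumed to be a positive number, and the fallback to a random upstream when every server is marked down is an assumption based on the Forward plug-in documentation.

```python
import random


class Upstream:
    """Hypothetical stand-in for one upstream DNS server."""

    def __init__(self, addr, max_fails=2):
        self.addr = addr
        self.max_fails = max_fails  # assumed to be >= 1 in this sketch
        self.fails = 0              # consecutive failed health checks

    def record_health_check(self, ok):
        # A successful check resets the counter; a failure increments it.
        self.fails = 0 if ok else self.fails + 1

    def is_down(self):
        return self.fails >= self.max_fails


def pick_upstream(upstreams, rng=random):
    """Pick a random healthy upstream (the "random" policy)."""
    healthy = [u for u in upstreams if not u.is_down()]
    # Assumption: if every upstream is down, still try a random one.
    return rng.choice(healthy or upstreams)


servers = [Upstream("100.100.2.136"), Upstream("100.100.2.138")]
servers[0].record_health_check(False)
servers[0].record_health_check(False)  # second consecutive failure
print(pick_upstream(servers).addr)
```

After two consecutive failed health checks the first server is skipped, so the query goes to the second server until a health check on the first one succeeds again.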
Caching policies
Client Side
The caching policy on the client side varies by container image and application, and depends on your specific configuration.
In-cluster DNS Servers
Parameter | Description | CoreDNS community default configuration | NodeLocal DNSCache ACK default configuration | CoreDNS ACK default configuration |
success Max TTL | The maximum time-to-live (TTL) for the cache of successful domain name resolution results. | 3600s | 30s | 30s |
success Min TTL | The minimum TTL for the cache of successful domain name resolution results. | 5s | 5s | 5s |
success Capacity | The number of successful domain name resolution results to cache. | 9984 | 9984 | 9984 |
denial Max TTL | The maximum TTL for the cache of failed domain name resolution results. | 1800s | 5s | 30s |
denial Min TTL | The minimum TTL for the cache of failed domain name resolution results. | 5s | 5s | 5s |
denial Capacity | The number of failed domain name resolution results to cache. | 9984 | 9984 | 9984 |
ServerError TTL | The TTL for resolution results when the upstream DNS server is abnormal. | 5s | 0s (The default is 5s for NodeLocal DNSCache Helm Chart versions earlier than 1.5.0) | 0s (The default is 5s for CoreDNS versions earlier than 1.8.4.2) |
serve_stale | Allows expired local cache entries to be served when the upstream DNS server cannot be reached. | Disabled | Enabled (Disabled by default for NodeLocal DNSCache Helm Chart versions earlier than 1.5.0) | Disabled by default for CoreDNS versions earlier than 1.12.1. Enabled by default for CoreDNS 1.12.1 and later. |
The actual TTL of the cache is determined by the TTL of the domain name resolution result, the maximum TTL, and the minimum TTL. The rules are as follows:
If the result TTL is greater than the Max TTL, the actual TTL is the Max TTL.
If the result TTL is less than the Min TTL, the actual TTL is the Min TTL.
If the result TTL is between the Min TTL and the Max TTL, the actual TTL is the result TTL.
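The three rules above amount to clamping the result TTL into the range between the Min TTL and the Max TTL. A minimal Python sketch follows. The function name effective_ttl is an assumption for illustration, and the sample values use the 5s minimum and 30s maximum from the table.

```python
def effective_ttl(result_ttl, min_ttl, max_ttl):
    """Clamp the TTL of a DNS answer into [min_ttl, max_ttl], in seconds."""
    return max(min_ttl, min(result_ttl, max_ttl))


# Success cache with Min TTL 5s and Max TTL 30s.
for result_ttl in (600, 1, 20):
    print(result_ttl, "->", effective_ttl(result_ttl, min_ttl=5, max_ttl=30))
# 600 -> 30
# 1 -> 5
# 20 -> 20
```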
Optimization suggestions
This topic describes the resolution paths and parameter configurations in a Kubernetes cluster. You can modify the parameters by editing the Pod YAML, CoreDNS ConfigMap, or NodeLocal DNSCache ConfigMap. The following is an example.
When you set dnsPolicy: Default for a client pod, the VPC DNS server settings on the ECS instance are copied to the /etc/resolv.conf file in the container.
apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: default
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/example-ns/example:v1
    name: example
  # The dnsPolicy value in the Pod YAML is Default.
  dnsPolicy: Default

# The /etc/resolv.conf file in the container at this time.
# cat /etc/resolv.conf
nameserver 100.100.2.136
nameserver 100.100.2.138

Compared to an ECS instance, the container's configuration is missing the rotate single-request-reopen timeout:2 attempts:3 options. Occasional network jitter might cause domain name resolution to fail for your services. You can add these parameters to improve fault tolerance. Adjust the Pod YAML as follows:
apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: default
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/example-ns/example:v1
    name: example
  # The dnsPolicy value in the Pod YAML is Default.
  dnsPolicy: Default
  # Add the following fault tolerance configuration.
  dnsConfig:
    options:
    - name: timeout
      value: "2"
    - name: attempts
      value: "3"
    - name: rotate
    - name: single-request-reopen

# After modification, redeploy the Pod. The options parameter is added to /etc/resolv.conf in the container.
# cat /etc/resolv.conf
nameserver 100.100.2.136
nameserver 100.100.2.138
options rotate single-request-reopen timeout:2 attempts:3

Use serve_stale to ensure DNS stability
The serve_stale option is a specific implementation of the stale serving feature in the CoreDNS cache plug-in. When serve_stale is enabled, CoreDNS continues to serve expired entries from the cache if the upstream DNS server is inaccessible. This feature can improve the reliability of DNS resolution and prevent resolution failures caused by upstream DNS service jitter or occasional exceptions.
This configuration is enabled by default in unmanaged CoreDNS 1.12.1 and later. For more information about this feature, see RFC 8767.
Configuration format
serve_stale [DURATION] [REFRESH_MODE]
DURATION: How long an expired entry may still be served. The default value is 1h. Once an entry has been expired for longer than DURATION without being refreshed, CoreDNS stops serving it.
REFRESH_MODE: The policy for serving expired entries:
verify: Before sending an expired entry to the client, CoreDNS first checks whether the upstream DNS service can provide the entry. This might increase the resolution latency for the client, but a new entry is returned immediately if one is available.
immediate: CoreDNS sends the expired entry to the client immediately, and then checks the upstream DNS service. This provides an immediate response, but the returned entry may lag behind updates in the upstream DNS service.
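The difference between the two refresh modes can be made concrete with a small decision function. This is a simplified Python model of the behavior described above, not the CoreDNS implementation. The function name answer_for_expired_entry and the labels "fresh", "stale", and "error" are hypothetical.

```python
def answer_for_expired_entry(expired_for, duration, mode, upstream_ok):
    """Describe how an expired cache entry is handled.

    expired_for: seconds since the entry's TTL ran out
    duration:    the serve_stale DURATION, in seconds
    mode:        "verify" or "immediate"
    upstream_ok: whether the upstream DNS server answers right now
    """
    if mode not in ("verify", "immediate"):
        raise ValueError(f"unknown REFRESH_MODE: {mode}")

    # Beyond DURATION the stale entry is no longer eligible to be served.
    if expired_for > duration:
        return "fresh" if upstream_ok else "error"

    if mode == "immediate":
        # Answer from the stale entry right away; the refresh happens afterwards.
        return "stale"

    # verify: ask the upstream first and fall back to the stale entry on failure.
    return "fresh" if upstream_ok else "stale"


# An entry that expired 10s ago, with a reachable upstream.
print(answer_for_expired_entry(10, 30, "verify", upstream_ok=True))       # fresh
print(answer_for_expired_entry(10, 3600, "immediate", upstream_ok=True))  # stale
```

In this model, immediate returns the expired entry even when the upstream is healthy, whereas verify returns it only when the upstream cannot be reached.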
Configuration example
The following configuration is used by default in unmanaged CoreDNS v1.12.1.2 and later.
cache 30 {
    ...
    serve_stale 30s verify
}
Default configuration for unmanaged CoreDNS v1.12.1.1-4035d7a99-aliyun:
cache 30 {
    ...
    serve_stale 1h immediate
}
When you use the preceding default configuration, in some extreme scenarios (for example, when a client performs DNS resolution during the iterative update of a headless service), DNS might return an expired entry. If this situation occurs frequently, you can change the policy to verify as shown in the Configuration example.