Container Service for Kubernetes: DNS resolution and caching policies

Last Updated: Nov 20, 2025

This topic describes the domain name resolution and caching policies for container pods in a Kubernetes cluster.

DNS resolution pipelines

The following figures show the domain name resolution pipelines for three types of application deployments.

Note

For information about terms such as timeout and attempts in the figures, see the Resolution policies and Caching policies sections.

  • Non-containerized applications run directly on ECS instances.

    Example: An application runs on an ECS instance.

    [Figure: DNS resolution pipeline 1, application on an ECS instance]

  • Containerized applications run in Kubernetes. The pods use the ClusterFirst DNS policy.

    Example: An application runs in a Kubernetes container pod.

    [Figure: DNS resolution pipeline 2, pod that uses the ClusterFirst DNS policy]

  • Containerized applications run in Kubernetes. The pods use NodeLocal DNSCache.

    Example: An application runs in a Kubernetes container pod, and NodeLocal DNSCache is deployed.

    [Figure: DNS resolution pipeline 3, pod that uses NodeLocal DNSCache]

Resolution policies

Client Side

In most cases, applications resolve domain names using the interfaces provided by Glibc. The following table describes the configurable domain name resolution parameters in the /etc/resolv.conf file that Glibc uses.

| Parameter | Description | Default value in Glibc | ECS | Pod with DNSPolicy set to ClusterFirst | Pod with DNSPolicy set to Default | Pod that uses NodeLocal DNSCache | Pod with DNSPolicy set to Default and that uses the host network |
|---|---|---|---|---|---|---|---|
| nameserver | The DNS server used to resolve domain names. | Empty | VPC DNS servers | CoreDNS ClusterIP | VPC DNS servers | NodeLocal DNSCache IP, CoreDNS ClusterIP | VPC DNS servers |
| search | For requests involving a domain name that is not a fully qualified domain name (FQDN), the resolver appends the search suffix to form an FQDN before the request is sent. | Empty | Empty | <ns>.svc.cluster.local svc.cluster.local cluster.local | Empty | <ns>.svc.cluster.local svc.cluster.local cluster.local | Empty |
| ndots:n | If the number of dots in a domain name is greater than or equal to the ndots value, the name is first tried as an FQDN and resolved directly. If the number of dots is less than the ndots value, the search suffix is appended before the query. | 1 | 1 | 5 | 1 | 3 | 1 |
| timeout:n | The timeout period for a single domain name resolution request, in seconds. | 5 | 2 | 5 | 5 | 1 | 2 |
| attempts:n | The number of times the resolver sends a query before giving up when resolution fails. | 2 | 3 | 2 | 2 | 2 | 3 |
| rotate | Queries DNS servers in a round-robin fashion. | Disabled | Enabled | Disabled | Disabled | Disabled | Enabled |
| single-request-reopen | If this option is enabled and two requests are sent using the same socket, the resolver closes the socket after sending the first request and opens a new socket before sending the second request. | Disabled | Enabled | Disabled | Disabled | Disabled | Enabled |

The attempts parameter takes effect only in specific scenarios, such as when the server returns SERVFAIL, NOTIMP, or REFUSED, or when the server returns NOERROR but without a resolution result. For more information, see attempts parameter request details.

VPC DNS servers are the default DNS servers configured on ECS instances. Their IP addresses are 100.100.2.136 and 100.100.2.138. They are responsible for resolving domain names in PrivateZone and authoritative domain names.

The CoreDNS ClusterIP is the IP address of the kube-dns service provided by the default CoreDNS deployment in the kube-system namespace of a Kubernetes cluster. It is responsible for resolving internal service domain names and forwarding resolution requests for PrivateZone and authoritative domain names.

The NodeLocal DNSCache IP is 169.254.20.10. When the NodeLocal DNSCache component is deployed, it listens on this IP address on each node.

Note

For more information about resolv.conf configuration, see resolv.conf.
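
The interaction between search and ndots can be sketched in a few lines of Python. This is a simplified model of the behavior described in resolv.conf(5), not the Glibc implementation: a name with at least ndots dots is tried as-is first, and any other name is tried with each search suffix first. The function name candidate_queries is hypothetical, and the search list assumes a pod in the default namespace.

```python
def candidate_queries(name, search, ndots):
    """Return the names a glibc-style resolver would try, in order (simplified)."""
    # A trailing dot marks the name as fully qualified, so nothing is appended.
    if name.endswith("."):
        return [name]
    as_is = name + "."
    expanded = [f"{name}.{suffix}." for suffix in search]
    # With at least `ndots` dots, the name is tried as-is before the search list.
    if name.count(".") >= ndots:
        return [as_is] + expanded
    # Otherwise every search suffix is tried first and the bare name last.
    return expanded + [as_is]


# Search list of a ClusterFirst pod, assuming the pod runs in the "default" namespace.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query in candidate_queries("example.com", search, ndots=5):
    print(query)
```

With ndots set to 5, the external name example.com is looked up under three cluster suffixes before it is tried as-is. Writing the name with a trailing dot (example.com.) skips the search list entirely.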

In some cases, the domain name resolution policy on the client side may differ from the preceding configurations:

  • If you use Alpine as the container image, its built-in Musl library replaces Glibc, which causes significant differences in resolution behavior. For example:

    • Alpine ignores the single-request and single-request-reopen options in /etc/resolv.conf.

    • Alpine 3.3 and earlier versions do not support the search parameter or search domains, which prevents service discovery from working.

    • Musl sends requests concurrently to all DNS servers configured in /etc/resolv.conf, which makes NodeLocal DNSCache optimizations ineffective.

    • Musl requests A and AAAA records concurrently over the same socket, which triggers Conntrack source port conflicts on older kernel versions and leads to packet loss.

    Note

    For more information about resolution behavior, see musl libc.

  • If you use programming languages such as Go or Node.js, the application might use the language's built-in domain name resolver, whose behavior also differs significantly from that of Glibc.

In-cluster DNS Servers

By default, the /etc/resolv.conf file of CoreDNS uses the ECS configuration. However, CoreDNS uses the built-in Forward plug-in to forward DNS requests.

NodeLocal DNSCache uses a built-in CoreDNS for DNS service forwarding. The configuration method is the same as for CoreDNS.

The following table describes the parameters that control the resolution policy of the Forward plug-in. For more information about the CoreDNS Forward plug-in, see Forward.

| Parameter | Description | CoreDNS default value | NodeLocal DNSCache default value |
|---|---|---|---|
| prefer_udp | Prefers UDP for communication with the upstream server. | Enabled | Disabled |
| force_tcp | Forces TCP for communication with the upstream server. | Disabled | Enabled |
| max_fails | The number of consecutive failed health checks before an upstream server is considered unhealthy. | 2 | 2 |
| expire | The time after which cached connections to the upstream server expire. | 10s | 10s |
| policy | The policy for selecting an upstream server. | random | random |
| health_check | The health check interval. | 0.5s | 0.5s |
| max_concurrent | The maximum number of concurrent connections to the upstream server. | None | None |
| dial timeout | The timeout for connecting to the upstream server. | 30s. The value dynamically decreases based on the actual time consumed. | 30s. The value dynamically decreases based on the actual time consumed. |
| read timeout | The timeout for waiting for data from the upstream server. | 2s | 2s |

Caching policies

Client Side

Client-side caching varies with the container image and the application, so the actual caching policy depends on your specific configuration.

In-cluster DNS Servers

| Parameter | Description | CoreDNS community default configuration | NodeLocal DNSCache ACK default configuration | CoreDNS ACK default configuration |
|---|---|---|---|---|
| success Max TTL | The maximum time-to-live (TTL) for the cache of successful domain name resolution results. | 3600s | 30s | 30s |
| success Min TTL | The minimum TTL for the cache of successful domain name resolution results. | 5s | 5s | 5s |
| success Capacity | The number of successful domain name resolution results to cache. | 9984 | 9984 | 9984 |
| denial Max TTL | The maximum TTL for the cache of failed domain name resolution results. | 1800s | 5s | 30s |
| denial Min TTL | The minimum TTL for the cache of failed domain name resolution results. | 5s | 5s | 5s |
| denial Capacity | The number of failed domain name resolution results to cache. | 9984 | 9984 | 9984 |
| ServerError TTL | The TTL for caching resolution results when the upstream DNS server is abnormal. | 5s | 0s (The default is 5s for NodeLocal DNSCache Helm Chart versions earlier than 1.5.0) | 0s (The default is 5s for CoreDNS versions earlier than 1.8.4.2) |
| serve_stale | Allows expired local cache entries to be served when the upstream DNS server cannot be reached. | Disabled | Enabled (Disabled by default for NodeLocal DNSCache Helm Chart versions earlier than 1.5.0) | Disabled by default for CoreDNS versions earlier than 1.12.1. Enabled by default for CoreDNS 1.12.1 and later. |

Note

The actual TTL of the cache is determined by the TTL of the domain name resolution result, the maximum TTL, and the minimum TTL. The rules are as follows:

  • If the result TTL is greater than the Max TTL, the actual TTL is the Max TTL.

  • If the result TTL is less than the Min TTL, the actual TTL is the Min TTL.

  • If the result TTL is between the Min TTL and the Max TTL, the actual TTL is the result TTL.
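
The three rules amount to clamping the result TTL into the range from Min TTL to Max TTL. A minimal sketch follows. The helper name effective_ttl is hypothetical, and the sample values assume the ACK success-cache defaults of 5s and 30s listed in the table.

```python
def effective_ttl(result_ttl, min_ttl, max_ttl):
    """Clamp the TTL of a resolution result into [min_ttl, max_ttl] (seconds)."""
    return max(min_ttl, min(result_ttl, max_ttl))


# Assumed sample values: ACK defaults for successful results (Min TTL 5s, Max TTL 30s).
for result_ttl in (600, 1, 20):
    print(result_ttl, "->", effective_ttl(result_ttl, min_ttl=5, max_ttl=30))
```

A record published with a TTL of 600s is therefore cached for only 30s under these defaults, while a record with a TTL of 1s is still cached for 5s.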

Optimization suggestions

The preceding sections describe the resolution pipelines and parameter configurations in a Kubernetes cluster. You can modify the parameters by editing the Pod YAML, the CoreDNS ConfigMap, or the NodeLocal DNSCache ConfigMap. The following example modifies the Pod YAML.

When you set dnsPolicy: Default for a client pod, the VPC DNS server settings on the ECS instance are copied to the /etc/resolv.conf file in the container.

apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: default
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/example-ns/example:v1
    name: example
  # The dnsPolicy value in the Pod YAML is Default.
  dnsPolicy: Default

# The /etc/resolv.conf file in the container at this time.
# cat /etc/resolv.conf
nameserver 100.100.2.136
nameserver 100.100.2.138

Compared to an ECS instance, the container's configuration is missing the rotate single-request-reopen timeout:2 attempts:3 options. Occasional network jitter might cause domain name resolution to fail for your services. You can add these parameters to improve fault tolerance. Adjust the Pod YAML as follows:

apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: default
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/example-ns/example:v1
    name: example
  # The dnsPolicy value in the Pod YAML is Default.
  dnsPolicy: Default
  # Add the following fault tolerance configuration.
  dnsConfig:
    options:
    - name: timeout
      value: "2"
    - name: attempts
      value: "3"
    - name: rotate
    - name: single-request-reopen

# After modification, redeploy the Pod. The options parameter is added to /etc/resolv.conf in the container.
# cat /etc/resolv.conf
nameserver 100.100.2.136
nameserver 100.100.2.138
options rotate single-request-reopen timeout:2 attempts:3
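
To confirm that the options took effect, the contents of /etc/resolv.conf can be checked programmatically. The following is a minimal sketch rather than a complete resolv.conf parser: the function name parse_resolv_conf is hypothetical, comment handling is simplified, and the sample text is the expected file content shown above.

```python
def parse_resolv_conf(text):
    """Parse nameserver, search, and options lines from resolv.conf content."""
    config = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        # Simplification: treat everything after '#' as a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword == "search":
            config["search"] = values
        elif keyword == "options":
            for option in values:
                name, _, value = option.partition(":")
                # Flags such as rotate carry no value and are stored as True.
                config["options"][name] = int(value) if value.isdigit() else True
    return config


sample = """\
nameserver 100.100.2.136
nameserver 100.100.2.138
options rotate single-request-reopen timeout:2 attempts:3
"""

expected = {"rotate": True, "single-request-reopen": True, "timeout": 2, "attempts": 3}
parsed = parse_resolv_conf(sample)
missing = {k: v for k, v in expected.items() if parsed["options"].get(k) != v}
print(missing or "all fault tolerance options are present")
```

Inside a pod, the same function could be applied to the text read from /etc/resolv.conf. The check compares options by name, so the order in which they appear in the file does not matter.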

Use serve_stale to ensure DNS stability

The serve_stale option is a specific implementation of the stale serving feature in the CoreDNS cache plug-in. When serve_stale is enabled, CoreDNS continues to serve expired entries from the cache if the upstream DNS server is inaccessible. This feature can improve the reliability of DNS resolution and prevent resolution failures caused by upstream DNS service jitter or occasional exceptions.

The serve_stale option is enabled by default in unmanaged CoreDNS 1.12.1 and later. For more information about this feature, see RFC 8767.

Configuration format

serve_stale [DURATION] [REFRESH_MODE]

  • DURATION: How long an entry can still be served after it expires. The default value is 1h. After an entry has been expired for longer than DURATION without being refreshed, CoreDNS stops serving it.

  • REFRESH_MODE: The policy for serving expired entries:

    • verify: Before sending an expired entry to the client, verify whether the upstream DNS service is active. This method might increase the resolution latency for the client, but it can provide a new entry immediately if an update is detected.

    • immediate: Immediately send the expired entry to the client, and then verify whether the upstream DNS service is active. This provides an immediate response, but the update time may lag behind the upstream DNS service update.
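
The difference between the two refresh modes can be illustrated with a simplified model. This is a sketch of the behavior described above, not the CoreDNS implementation: the names CacheEntry, lookup, and query_upstream are hypothetical, and the background refresh that follows an immediate answer is omitted.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    value: str
    expires_at: float  # time at which the entry's TTL runs out, in seconds


def lookup(entry, now, stale_duration, refresh_mode, query_upstream):
    """Return (value, source) for a cached entry under a serve_stale-like policy.

    query_upstream is a callable that returns a fresh value, or None when the
    upstream DNS server cannot be reached.
    """
    if now < entry.expires_at:
        return entry.value, "cache"
    # An expired entry may be served only while it is within DURATION.
    stale_ok = (now - entry.expires_at) <= stale_duration
    if refresh_mode == "immediate" and stale_ok:
        # immediate: answer from the stale entry first (refresh not modeled here).
        return entry.value, "stale"
    # verify mode, or no usable stale entry: ask the upstream server first.
    fresh = query_upstream()
    if fresh is not None:
        return fresh, "upstream"
    if stale_ok:
        return entry.value, "stale"
    return None, "failure"


entry = CacheEntry(value="10.0.0.1", expires_at=100.0)
print(lookup(entry, 110.0, 30.0, "verify", lambda: "10.0.0.2"))
print(lookup(entry, 110.0, 30.0, "immediate", lambda: "10.0.0.2"))
```

In this model, verify returns the updated address as soon as the upstream server is reachable, while immediate returns the expired address for the same query. That is the trade-off between latency and freshness described above.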

Configuration example

The following configuration is used by default in unmanaged CoreDNS v1.12.1.2 and later.

cache 30 {
  ...
  serve_stale 30s verify
}

Important

Default configuration for unmanaged CoreDNS v1.12.1.1-4035d7a99-aliyun:

cache 30 { 
  ... 
  serve_stale 1h immediate
}

When you use the preceding default configuration, in some extreme scenarios (for example, when a client performs DNS resolution during the iterative update of a headless service), DNS might return an expired entry. If this situation occurs frequently, you can change the policy to verify as shown in the Configuration example.