Container Service for Kubernetes: DNS resolution and caching policies

Last Updated: Mar 26, 2026

This topic describes the DNS resolution workflows, client-side behaviors, and server-side caching policies in Alibaba Cloud Container Service for Kubernetes (ACK) clusters.

DNS resolution architectures

DNS resolution behavior in ACK depends on where the application runs and whether the NodeLocal DNSCache add-on is active.

For details about parameters such as timeout and attempts referenced in the diagrams, see Resolution policies and Caching policies.

Scenario 1: Host-based applications (non-containerized)

Applications running directly on Elastic Compute Service (ECS) instances use the host's /etc/resolv.conf, which points to the VPC DNS servers.

[Figure: DNS resolution path 1]

Scenario 2: Standard containerized pods (dnsPolicy: ClusterFirst)

By default, pods use the ClusterFirst policy. All DNS queries go to the CoreDNS service within the cluster.

[Figure: DNS resolution path 2]

Scenario 3: Pods with NodeLocal DNSCache enabled

When NodeLocal DNSCache is active, pods send queries to a local caching agent on the same node. This provides two benefits:

  • Reduced latency: DNS queries resolve locally, skipping the network hop to CoreDNS.

  • Conntrack table protection: Queries go to the local agent without creating new conntrack table entries, reducing conntrack races and preventing UDP DNS entries from exhausting conntrack tables.

[Figure: DNS resolution path 3]
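As a rough illustration of the three scenarios, the sketch below maps the first nameserver in a resolv.conf file to the component that answers the query. It relies on the fixed addresses given later in this topic (100.100.2.136 and 100.100.2.138 for the VPC DNS servers, 169.254.20.10 for NodeLocal DNSCache) and assumes that any other address is the CoreDNS ClusterIP. The function name classify_nameserver is hypothetical.

```python
import ipaddress

# Addresses stated in this topic. Any other address is ASSUMED to be the
# CoreDNS ClusterIP, which differs from cluster to cluster.
VPC_DNS_SERVERS = {"100.100.2.136", "100.100.2.138"}
NODELOCAL_DNSCACHE_IP = "169.254.20.10"


def classify_nameserver(address: str) -> str:
    """Return the component that a nameserver address most likely points to."""
    # ip_address() raises ValueError for malformed input.
    ip = str(ipaddress.ip_address(address))
    if ip in VPC_DNS_SERVERS:
        return "VPC DNS server"
    if ip == NODELOCAL_DNSCACHE_IP:
        return "NodeLocal DNSCache"
    return "CoreDNS ClusterIP (assumed)"
```

For example, classify_nameserver("169.254.20.10") returns "NodeLocal DNSCache", which corresponds to Scenario 3.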

Resolution policies

Client side

The parameters below come from /etc/resolv.conf and are interpreted by the glibc resolver. The following is a representative configuration for a standard pod using ClusterFirst:

nameserver 10.x.x.x          # CoreDNS ClusterIP
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5 timeout:5 attempts:2
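To make the three directives concrete, the following minimal sketch parses a resolv.conf file into nameservers, search suffixes, and options. It is a simplification for illustration, not a reimplementation of the glibc parser: the function name parse_resolv_conf is hypothetical, and treating everything after "#" or ";" as a comment is an assumption made so that the annotated sample above can be parsed as-is.

```python
def parse_resolv_conf(text: str) -> dict:
    """Parse resolv.conf text into nameservers, search suffixes, and options."""
    config = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        # Simplification: drop anything after '#' or ';' as a comment.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword == "search":
            config["search"] = values  # a later search line replaces an earlier one
        elif keyword == "options":
            for opt in values:
                name, _, value = opt.partition(":")
                if not value:
                    config["options"][name] = True  # flag such as rotate
                elif value.isdigit():
                    config["options"][name] = int(value)
                else:
                    config["options"][name] = value
    return config
```

Applied to the sample above, the result has one nameserver, three search suffixes, and the options ndots=5, timeout=5, and attempts=2.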

The following table lists the default values across all deployment environments.

| Parameter | Description | Default value in glibc | ECS | Pod with dnsPolicy set to ClusterFirst | Pod with dnsPolicy set to Default | Pod that uses NodeLocal DNSCache | Pod with dnsPolicy set to Default and that uses the host network |
| --- | --- | --- | --- | --- | --- | --- | --- |
| nameserver | The DNS server used to resolve domain names. | None | VPC DNS servers ^②^ | CoreDNS ClusterIP ^③^ | VPC DNS servers | NodeLocal DNSCache IP ^④^ and CoreDNS ClusterIP | VPC DNS servers |
| search | If a queried name is not a fully qualified domain name (FQDN), the resolver appends each search suffix in turn to form an FQDN before sending the request. | None | None | <ns>.svc.cluster.local svc.cluster.local cluster.local | None | <ns>.svc.cluster.local svc.cluster.local cluster.local | None |
| ndots:n | If the number of dots in a domain name is greater than or equal to the ndots value, the name is first tried as an absolute name. Otherwise, the search suffixes are tried first. | 1 | 1 | 5 | 1 | 3 | 1 |
| timeout:n | The timeout for a single DNS resolution request. Unit: seconds. | 5 | 2 | 5 | 5 | 1 | 2 |
| attempts:n ^①^ | The maximum number of retries if a DNS resolution fails. | 2 | 3 | 2 | 2 | 2 | 3 |
| rotate | Queries DNS servers in a round-robin manner. | Disabled | Enabled | Disabled | Disabled | Disabled | Enabled |
| single-request-reopen | If enabled and two requests are sent using the same socket, the resolver closes the socket after the first request and opens a new socket before the second request. | Disabled | Enabled | Disabled | Disabled | Disabled | Enabled |

^①^ The attempts parameter takes effect only in specific scenarios: when the server returns SERVFAIL, NOTIMP, or REFUSED, or when the server returns NOERROR but without a resolution result. For details, see Attempts parameter request details.

^②^ VPC DNS servers are the default DNS servers configured on ECS instances. Their IP addresses are 100.100.2.136 and 100.100.2.138. They resolve domain names in PrivateZone and authoritative domain names.

^③^ The CoreDNS ClusterIP is the IP address of the kube-dns service in the kube-system namespace. It resolves internal service domain names and forwards resolution requests for PrivateZone and authoritative domain names.

^④^ The NodeLocal DNSCache IP is 169.254.20.10. When the NodeLocal DNSCache add-on is deployed, it listens on this IP address on each node.

For additional /etc/resolv.conf options, see resolv.conf.
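The interaction between search and ndots is easier to see with a worked example. The sketch below lists the names a resolver would try, in order, for a given input. It assumes the glibc rule that a name with at least ndots dots is tried as-is first and the search suffixes afterward, while a name with fewer dots goes through the search suffixes first. The function name candidate_queries is hypothetical, and the model ignores retries and the A/AAAA split.

```python
def candidate_queries(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Return the FQDNs tried for a name, in order, under a simplified glibc model."""
    if name.endswith("."):
        return [name]  # a trailing dot marks the name as already fully qualified
    absolute = name + "."
    expanded = [f"{name}.{suffix}." for suffix in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # too few dots: walk the search list first


search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for fqdn in candidate_queries("www.example.com", search, ndots=5):
    print(fqdn)
```

With ndots:5, the external name www.example.com has only two dots, so three in-cluster lookups are attempted before the name itself. With ndots:1, the same name is tried as-is first. Appending a trailing dot (www.example.com.) skips the search list entirely.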

Non-standard resolvers

The glibc defaults above apply only when the container uses glibc. Two common exceptions:

  • Alpine (musl libc): Alpine's built-in musl library replaces glibc and behaves differently in several ways:

    • Does not honor the single-request and single-request-reopen options in /etc/resolv.conf.

    • Alpine 3.3 and earlier do not support the search parameter or search domains, which breaks service discovery.

    • Sends requests to all configured DNS servers concurrently, which makes NodeLocal DNSCache optimizations ineffective.

    • Sends A and AAAA queries concurrently over the same socket, which triggers conntrack race conditions on older kernel versions and causes intermittent packet loss.

    For details on musl resolution behavior, see musl libc.

  • Languages with built-in resolvers (Go, Node.js): These runtimes can use their own resolver implementation instead of the glibc resolver, so they may interpret /etc/resolv.conf differently and exhibit different resolution behaviors from the system resolver.

In-cluster DNS servers

By default, CoreDNS reads its upstream servers from the ECS instance's /etc/resolv.conf and uses the built-in forward plug-in to forward DNS requests. NodeLocal DNSCache runs an embedded CoreDNS instance and forwards requests with the same plug-in.

The following table lists the parameters that control the forward plug-in resolution policy. For the full reference, see Forward.

| Parameter | Description | CoreDNS default value | NodeLocal DNSCache default value |
| --- | --- | --- | --- |
| prefer_udp | Uses UDP to communicate with the upstream server when possible. | Enabled | Disabled |
| force_tcp | Forces TCP for all upstream communication. | Disabled | Enabled |
| max_fails | The number of consecutive failed health checks before an upstream server is marked unhealthy. | 2 | 2 |
| expire | How long to keep the connection to the upstream server open. | 10s | 10s |
| policy | The policy for selecting an upstream server. | random | random |
| health_check | The health check interval. | 0.5s | 0.5s |
| max_concurrent | The maximum number of concurrent upstream connections. | None | None |
| dial timeout | The timeout for connecting to the upstream server. The value decreases dynamically based on actual connection time. | 30s | 30s |
| read timeout | The timeout for waiting for data from the upstream server. | 2s | 2s |

Caching policies

Client side

Client-side caching varies by container image and application. The effective policy depends on your specific configuration.

In-cluster DNS servers

The following table lists the cache parameters for CoreDNS and NodeLocal DNSCache in ACK.

| Parameter | Description | CoreDNS community default | NodeLocal DNSCache ACK default | CoreDNS ACK default |
| --- | --- | --- | --- | --- |
| success Max TTL | The maximum time-to-live (TTL) for cached successful DNS resolution results. | 3600s | 30s | 30s |
| success Min TTL | The minimum TTL for cached successful DNS resolution results. | 5s | 5s | 5s |
| success Capacity | The number of successful DNS resolution results to cache. | 9984 | 9984 | 9984 |
| denial Max TTL | The maximum TTL for cached failed DNS resolution results. | 1800s | 5s | 30s |
| denial Min TTL | The minimum TTL for cached failed DNS resolution results. | 5s | 5s | 5s |
| denial Capacity | The number of failed DNS resolution results to cache. | 9984 | 9984 | 9984 |
| ServerError TTL | The TTL applied when the upstream DNS server is unavailable. | 5s | 0s (default is 5s for NodeLocal DNSCache Helm Chart versions earlier than 1.5.0) | 0s (default is 5s for CoreDNS versions earlier than 1.8.4.2) |
| serve_stale | Allows CoreDNS to serve expired cache entries when the upstream DNS server is unreachable. | Disabled | Enabled (disabled by default for NodeLocal DNSCache Helm Chart versions earlier than 1.5.0) | Enabled (disabled by default for CoreDNS versions earlier than 1.12.1) |

Note

The effective TTL is determined by the resolution result TTL, Max TTL, and Min TTL as follows:

  • If Result TTL > Max TTL, the effective TTL is the Max TTL.

  • If Result TTL < Min TTL, the effective TTL is the Min TTL.

  • If Min TTL ≤ Result TTL ≤ Max TTL, the effective TTL is the Result TTL.
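The three rules above amount to clamping the result TTL into the range from Min TTL to Max TTL. A minimal sketch, with a hypothetical function name and all values in seconds:

```python
def effective_ttl(result_ttl: int, min_ttl: int, max_ttl: int) -> int:
    """Clamp the TTL of a DNS answer into the range [min_ttl, max_ttl]."""
    if min_ttl > max_ttl:
        raise ValueError("min_ttl must not exceed max_ttl")
    return max(min_ttl, min(result_ttl, max_ttl))


# ACK CoreDNS defaults for successful answers: Min TTL 5s, Max TTL 30s.
print(effective_ttl(600, 5, 30))  # capped at Max TTL
print(effective_ttl(1, 5, 30))    # raised to Min TTL
print(effective_ttl(20, 5, 30))   # within range, unchanged
```

With the ACK defaults, a record published with a 600-second TTL is therefore cached for only 30 seconds, so changes to it reach clients sooner at the cost of more upstream queries.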

Optimization suggestions

Adjust DNS behavior by editing the pod YAML, CoreDNS ConfigMap, or NodeLocal DNSCache ConfigMap.

Enhance fault tolerance

When dnsPolicy: Default is set on a pod, the container inherits the VPC DNS server settings from the ECS instance's /etc/resolv.conf. However, it does not inherit the rotate, single-request-reopen, timeout:2, and attempts:3 options. Without these, network jitter can cause intermittent DNS resolution failures.

The following shows the inherited configuration:

apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: default
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/example-ns/example:v1
    name: example
  # The dnsPolicy value in the Pod YAML is Default.
  dnsPolicy: Default

# The /etc/resolv.conf file in the container at this time.
# cat /etc/resolv.conf
nameserver 100.100.2.136
nameserver 100.100.2.138

Add dnsConfig to restore the missing fault-tolerance options:

apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: default
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/example-ns/example:v1
    name: example
  # The dnsPolicy value in the pod YAML is Default.
  dnsPolicy: Default
  # Add the following fault tolerance configuration.
  dnsConfig:
    options:
    - name: timeout
      value: "2"
    - name: attempts
      value: "3"
    - name: rotate
    - name: single-request-reopen

# After modification, redeploy the pod. The options parameter is added to /etc/resolv.conf in the container.
# cat /etc/resolv.conf
nameserver 100.100.2.136
nameserver 100.100.2.138
options rotate single-request-reopen timeout:2 attempts:3
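To confirm that a redeployed pod actually carries the fault-tolerance options, the container's /etc/resolv.conf can be checked programmatically. The following sketch is one way to do that. The function name missing_options and the REQUIRED mapping are hypothetical, and the expected values are the four options named in this section.

```python
# Options this section recommends; None marks a flag that takes no value.
REQUIRED = {
    "rotate": None,
    "single-request-reopen": None,
    "timeout": "2",
    "attempts": "3",
}


def missing_options(resolv_conf_text: str, required: dict = REQUIRED) -> list[str]:
    """Return the required options that are absent or set to a different value."""
    present = {}
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                present[name] = value or None
    # "absent" is a sentinel that never equals a required value.
    return sorted(
        name for name, value in required.items()
        if present.get(name, "absent") != value
    )
```

Run against the inherited configuration (two nameserver lines only), the function reports all four options as missing. Run against the file produced after adding dnsConfig, it returns an empty list.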

High availability with serve_stale

The serve_stale feature lets CoreDNS return expired cache entries when upstream DNS servers are unreachable. This prevents resolution failures caused by transient upstream outages.

serve_stale is enabled by default in CoreDNS unmanaged edition v1.12.1 and later. For the specification, see RFC 8767.

Configuration format

serve_stale [DURATION] [REFRESH_MODE]

  • DURATION: How long expired entries remain eligible to be served after expiry. Defaults to 1h. If an entry has been expired longer than this duration without a successful refresh, CoreDNS stops serving it.

  • REFRESH_MODE: Controls how CoreDNS handles expired entries:

    • verify: First checks whether the upstream DNS service is reachable, then returns a result to the client: the fresh entry if the upstream responds, or the expired entry if it does not. This increases latency on stale responses but prevents serving an outdated entry when a fresh one is available.

    • immediate: Returns the expired entry to the client right away, then checks the upstream in the background. This gives a faster response but may serve stale data if the upstream has been updated.
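The difference between the two refresh modes can be summarized as a small decision function. This is a simplified model of the behavior described above, not CoreDNS's implementation. The function name answer_source and its return labels are hypothetical, and the model assumes that a reachable upstream always returns a fresh answer.

```python
def answer_source(seconds_since_expiry: float, stale_duration: float,
                  refresh_mode: str, upstream_reachable: bool) -> str:
    """Return where the client's answer comes from in this simplified model."""
    if refresh_mode not in ("verify", "immediate"):
        raise ValueError(f"unknown refresh mode: {refresh_mode}")
    if seconds_since_expiry <= 0:
        return "cached"  # the entry has not expired yet
    if seconds_since_expiry > stale_duration:
        # Too old to be served stale: only the upstream can answer.
        return "upstream" if upstream_reachable else "failure"
    if refresh_mode == "immediate":
        return "stale"   # served right away, refreshed in the background
    # verify: ask the upstream first and fall back to the expired entry.
    return "upstream" if upstream_reachable else "stale"
```

For an entry that expired 10 seconds ago, serve_stale 30s verify yields "upstream" while the upstream is reachable and "stale" during an outage, whereas serve_stale 1h immediate yields "stale" in both cases.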

Example

The following configuration is the default in CoreDNS unmanaged edition v1.12.1.2 and later:

cache 30 {
  ...
  serve_stale 30s verify
}

Important

The default configuration for CoreDNS unmanaged edition v1.12.1.1-4035d7a99-aliyun is:

cache 30 {
  ...
  serve_stale 1h immediate
}

With serve_stale 1h immediate, CoreDNS may return an expired entry in extreme scenarios, such as when a client performs DNS resolution while a headless service is being updated. If this occurs frequently, change the refresh mode to verify.