
Container Service for Kubernetes:DNS best practices

Last Updated: Apr 03, 2026

DNS is a critical service in Kubernetes clusters. Under certain conditions, such as improper client configuration or large cluster scale, DNS resolution may time out or fail. This guide provides best practices to prevent these issues.

Usage notes

This topic does not apply to Container Service for Kubernetes (ACK) clusters that use the managed edition of CoreDNS or have Auto Mode enabled. These clusters scale automatically based on load and require no manual adjustment.

DNS best practices cover both client-side and server-side optimizations, as described in the following sections.

For more information about CoreDNS, see the official CoreDNS documentation.

Optimize DNS resolution requests

DNS resolution is a frequent network operation in a Kubernetes cluster. You can optimize or avoid many of these requests to reduce latency and load on the DNS infrastructure:

  • (Recommended) Use connection pools. If your application frequently requests another service, we recommend using a connection pool. A connection pool caches active connections to upstream services in memory. This practice eliminates the overhead of DNS resolution and TCP handshakes for each request.

  • Use asynchronous requests or long polling to obtain the IP addresses that are mapped to a domain name.

  • Use DNS caching:

    • (Recommended) If you cannot refactor your application to use a connection pool to connect to another service, consider caching DNS resolution results on the client side. For more information, see Use NodeLocal DNSCache.

    • If you cannot use NodeLocal DNSCache, you can use the built-in Name Service Cache Daemon (NSCD) in your containers. For more information about how to use the NSCD cache, see Use NSCD in Kubernetes clusters.

  • Optimize the resolv.conf file: Because of how the ndots and search parameters in the resolv.conf file work, the way you write a domain name in a container determines how efficiently it is resolved. For more information about the mechanisms of the ndots and search parameters, see DNS policy configuration and domain name resolution. A sketch that lowers ndots through dnsConfig follows this list.

  • Optimize domain name configuration. When an application accesses a domain name, specify it according to the following principles. This minimizes DNS resolution attempts and reduces latency.

    • When a pod accesses a Service in the same namespace, use <service-name> to access the Service. service-name indicates the name of the Service.

    • When a pod accesses a Service in a different namespace, use <service-name>.<namespace-name> to access the Service. namespace-name indicates the namespace where the Service resides.

    • When a pod accesses an external domain, prioritize using an FQDN, which is a domain name that ends with a period (.), to avoid unnecessary lookups caused by appending search domains. For example, to access www.aliyun.com, use the FQDN form www.aliyun.com. (note the trailing period).

      • In clusters that run Kubernetes 1.33 or later, you can configure the search domain as a single period (.) (see Issue 125883). This effectively turns all DNS requests into FQDN requests, preventing unnecessary search domain iterations:

        dnsPolicy: None
        dnsConfig:
          nameservers: ["192.168.0.10"]  ## Replace 192.168.0.10 with the clusterIP of the kube-dns service.
          searches:
          - .
          - default.svc.cluster.local  ## Note: Replace default with the actual namespace.
          - svc.cluster.local
          - cluster.local

        With the preceding configuration, the /etc/resolv.conf file in the pod is as follows:

        search . default.svc.cluster.local svc.cluster.local cluster.local
        nameserver 192.168.0.10

        With . as the first search domain, the system immediately treats the target domain name of a resolution request as an FQDN, attempts to resolve it directly, and avoids unnecessary searches.

        Important

        You must set dnsPolicy to None for this configuration to take effect.

        Complete workload example

        apiVersion: apps/v1
        kind: Deployment
        metadata:
          labels:
            app: nginx
          name: nginx
          namespace: default
        spec:
          progressDeadlineSeconds: 600
          replicas: 3
          revisionHistoryLimit: 10
          selector:
            matchLabels:
              app: nginx
          strategy:
            rollingUpdate:
              maxSurge: 25%
              maxUnavailable: 25%
            type: RollingUpdate
          template:
            metadata:
              labels:
                app: nginx
            spec:
              containers:
              - image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
                imagePullPolicy: Always
                name: nginx
                resources: {}
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
              dnsPolicy: None
              dnsConfig:
                nameservers: ["192.168.0.10"]  ## Replace 192.168.0.10 with the clusterIP of the kube-dns service.
                searches:
                - .
                - default.svc.cluster.local
                - svc.cluster.local
                - cluster.local
              hostname: nginx
              restartPolicy: Always
              schedulerName: default-scheduler
              securityContext: {}
              subdomain: subdomain
              terminationGracePeriodSeconds: 30
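
The following is a minimal sketch of the ndots optimization mentioned in the list above. It assumes a workload that mostly resolves external domain names containing at least two dots (such as www.aliyun.com) and lowers ndots from the Kubernetes default of 5 to 2, so that such names are tried as absolute names before the search domains are appended:

dnsPolicy: ClusterFirst
dnsConfig:
  options:
  - name: ndots
    value: "2"  ## Names with two or more dots are now resolved as absolute names first.

With this setting, cluster-internal short names such as <service-name> still use the search domains, while most external domains skip the extra search iterations.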

Understand DNS configuration in containers

  • Different DNS resolvers have subtle implementation differences, which can cause dig <domain> to succeed while ping <domain> fails. For example, dig implements its own resolution logic, whereas ping uses the resolver in the system's C library, so the two can behave differently with the same configuration.

  • We strongly recommend using base images such as Debian or CentOS instead of Alpine Linux. The musl libc library used in Alpine has several implementation differences compared to the standard glibc, leading to issues including but not limited to:

    • Alpine versions 3.18 and earlier do not support fallback to TCP for truncated (TC) responses.

    • Alpine versions 3.3 and earlier do not support the search parameter or search domains. This breaks service discovery.

    • Concurrent requests to multiple DNS servers that are configured in /etc/resolv.conf can bypass NodeLocal DNSCache optimizations.

    • Using the same socket for concurrent A and AAAA record requests can trigger conntrack source port conflicts in older kernel versions, which causes packet loss.

    For more information about these issues, see the musl libc documentation.

  • If you use Go applications, understand the differences between the DNS resolvers in cgo and pure Go implementations.
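
For example, you can switch a Go application between the two resolvers at runtime through the GODEBUG environment variable. The following container spec fragment is a sketch:

env:
- name: GODEBUG
  value: "netdns=go"  ## Force the pure Go resolver. Use "netdns=cgo" to force the cgo resolver instead.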

Mitigate intermittent DNS timeouts in IPVS mode

When a cluster uses IPVS as the kube-proxy load balancing mode, you may encounter intermittent DNS resolution timeouts when CoreDNS is scaled down or restarted. This issue is caused by a bug in the Linux kernel. For more information, see IPVS.

You can mitigate the impact of this IPVS defect by using NodeLocal DNSCache, as described in the following section.

Use NodeLocal DNSCache

CoreDNS may occasionally encounter the following issues:

  • In rare cases, packet loss may occur due to concurrent A and AAAA queries, which causes DNS resolution failures.

  • A full conntrack table on a node may cause packet loss, which leads to DNS resolution failures.

To improve the stability and performance of the DNS service in your cluster, we recommend that you install the NodeLocal DNSCache add-on. This add-on improves cluster DNS performance by running a local DNS cache on each node. For detailed information about NodeLocal DNSCache and how to deploy it in an ACK cluster, see Use the NodeLocal DNSCache add-on.

Important

After you install NodeLocal DNSCache, you must inject the DNS cache configuration into your pods. Run the following command to add a label to the specified namespace. New pods created in this namespace will automatically have the DNS cache configuration injected. For more information about other injection methods, see the preceding documentation.

kubectl label namespace default node-local-dns-injection=enabled
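
To verify the injection, you can create a test pod in the labeled namespace and inspect its resolv.conf. The following sketch assumes NodeLocal DNSCache listens on its default address 169.254.20.10; the pod name and image are placeholders:

kubectl run dns-test -n default --image=registry.openanolis.cn/openanolis/nginx:1.14.1-8.6 --restart=Never
kubectl exec -n default dns-test -- cat /etc/resolv.conf  ## The nameserver should point to the local cache address.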

Use an appropriate CoreDNS version

CoreDNS offers good backward compatibility with Kubernetes. However, it is critical to keep CoreDNS updated to the latest stable version. You can install, upgrade, and configure CoreDNS on the Add-ons page in the ACK console. Check the status of the CoreDNS add-on there. If an upgrade is available, schedule it during off-peak hours.

CoreDNS versions earlier than v1.7.0 have known potential risks.

The recommended minimum CoreDNS version varies based on the Kubernetes version of your cluster. The following table describes the details.

| Cluster version                           | Recommended minimum CoreDNS version |
| ----------------------------------------- | ----------------------------------- |
| Earlier than 1.14.8                       | v1.6.2 (end of life)                |
| 1.14.8 or later, but earlier than 1.20.4  | v1.7.0.0-f59c03d-aliyun             |
| 1.20.4 or later, but earlier than 1.21.0  | v1.8.4.1-3a376cc-aliyun             |
| 1.21.0 or later                           | v1.11.3.2-f57ea7ed6-aliyun          |

Monitor the operational status of CoreDNS

Monitor metrics

CoreDNS exposes health metrics, such as DNS resolution results, through a standard Prometheus interface to help detect errors on the CoreDNS server and even on upstream DNS servers.

Managed Service for Prometheus provides built-in metrics monitoring and alerting rules for CoreDNS. You can enable Prometheus and its dashboard features in the ACK console. For more information, see Monitor the CoreDNS component.

If you use a self-managed Prometheus instance to monitor your Kubernetes cluster, you can observe relevant metrics in Prometheus and set alerts for key metrics. For more information, see the official CoreDNS Prometheus documentation.
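
For example, an alert on the ratio of SERVFAIL responses can catch upstream or plugin failures early. The following is a minimal sketch of a Prometheus alerting rule; it assumes the metric names exposed by CoreDNS v1.7.0 and later, and the 1% threshold is a placeholder to tune for your workload:

groups:
- name: coredns
  rules:
  - alert: CoreDNSServfailHigh
    expr: |
      sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]))
        / sum(rate(coredns_dns_responses_total[5m])) > 0.01
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: More than 1% of CoreDNS responses are SERVFAIL.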

Logs

In the event of a DNS error, CoreDNS logs help you quickly diagnose the root cause. We recommend that you enable CoreDNS domain name resolution logging and Log Service collection. For more information, see Analyze and monitor CoreDNS logs.

Kubernetes event delivery

In CoreDNS v1.9.3.6-32932850-aliyun and later, you can enable the k8s_event plugin to deliver critical CoreDNS logs to the event center as Kubernetes events. For more information about the k8s_event plugin, see k8s_event.

This feature is enabled by default in new CoreDNS deployments. If you upgrade CoreDNS from an earlier version to v1.9.3.6-32932850-aliyun or later, you need to manually modify the configuration file to enable this feature.

  1. Run the following command to open the CoreDNS configuration file:

    kubectl -n kube-system edit configmap/coredns
  2. Add the kubeapi and k8s_event plugins.

    apiVersion: v1
    data:
      Corefile: |
        .:53 {
            errors
            health {
                lameduck 15s
            }

            # Start of addition (ignore other differences)
            kubeapi
            k8s_event {
              level info error warning  # Delivers critical logs of the info, error, and warning levels.
            }
            # End of addition

            kubernetes cluster.local in-addr.arpa ip6.arpa {
                pods verified
                fallthrough in-addr.arpa ip6.arpa
            }
            # ... (remaining configuration omitted)
        }
  3. Check the status and logs of the CoreDNS pod. If the logs contain the word reload, the modification is successful.
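
    A sketch of the check, assuming the default k8s-app=kube-dns label on the CoreDNS pods:

    kubectl -n kube-system get pods -l k8s-app=kube-dns
    kubectl -n kube-system logs -l k8s-app=kube-dns | grep reload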

CoreDNS high availability

CoreDNS is the authoritative DNS for your cluster. A CoreDNS failure can prevent access to Services within the cluster and may cause widespread application outages. You can take the following measures to ensure the high availability of CoreDNS:

Assess the load on CoreDNS

You can run a DNS stress test in your cluster to assess the load. Open source tools such as dnsperf can help with this (see the sample run after the following list). If you cannot accurately assess the DNS load in your cluster, refer to the following recommendations.

  • We recommend that you set the number of CoreDNS pods to at least 2 in all scenarios. The resource limit for a single pod must be at least 1 core and 1 GiB of memory.

  • The DNS resolution QPS that CoreDNS can provide is positively correlated with CPU consumption. With NodeLocal DNSCache, each CPU core can typically support over 10,000 QPS. The QPS requirements for DNS requests vary significantly among different types of services. You can observe the peak CPU usage of each CoreDNS pod. If CPU consumption exceeds one core during peak hours, scale out CoreDNS replicas. If you cannot determine peak CPU usage, use a conservative 1:8 replica-to-node ratio.
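
The following is a minimal dnsperf run against the cluster DNS service. The query file contents and the test duration are placeholders; replace 192.168.0.10 with the clusterIP of the kube-dns service:

cat <<'EOF' > queries.txt
kubernetes.default.svc.cluster.local A
www.aliyun.com A
EOF
dnsperf -s 192.168.0.10 -d queries.txt -l 30 -c 10  ## 10 concurrent clients for 30 seconds.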

Adjust CoreDNS replicas

The number of CoreDNS replicas directly determines the computing resources that CoreDNS can use. You can adjust the number of CoreDNS replicas based on your assessment.

Important

Due to the lack of a retransmission mechanism in UDP, if there is a risk of packet loss on cluster nodes due to the IPVS UDP defect, scaling in or restarting CoreDNS pods may cause DNS resolution timeouts or errors across the entire cluster for up to five minutes. For more information about how to resolve resolution errors that are caused by the IPVS defect, see Troubleshoot DNS resolution errors.

  • Automatically adjust based on the recommended policy

    You can deploy the following dns-autoscaler. It automatically adjusts the number of CoreDNS replicas in real time based on the recommended 1:8 replica-to-node ratio. The number of replicas is calculated by using the following formula: replicas = max(ceil(cores × 1/coresPerReplica), ceil(nodes × 1/nodesPerReplica)). The result is also constrained by the max and min values. For example, with the default parameters below (coresPerReplica=64, nodesPerReplica=8), a 40-node cluster with 16 cores per node (640 cores in total) gets max(ceil(640/64), ceil(40/8)) = max(10, 5) = 10 replicas.

    dns-autoscaler

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dns-autoscaler
      namespace: kube-system
      labels:
        k8s-app: dns-autoscaler
    spec:
      selector:
        matchLabels:
          k8s-app: dns-autoscaler
      template:
        metadata:
          labels:
            k8s-app: dns-autoscaler
        spec:
          serviceAccountName: admin
          containers:
          - name: autoscaler
            image: registry.cn-hangzhou.aliyuncs.com/acs/cluster-proportional-autoscaler:1.8.4
            resources:
              requests:
                cpu: "200m"
                memory: "150Mi"
            command:
            - /cluster-proportional-autoscaler
            - --namespace=kube-system
            - --configmap=dns-autoscaler
            - --nodelabels=type!=virtual-kubelet
            - --target=Deployment/coredns
            - --default-params={"linear":{"coresPerReplica":64,"nodesPerReplica":8,"min":2,"max":100,"preventSinglePointFailure":true}}
            - --logtostderr=true
            - --v=9
  • Manually adjust

    You can run the following command to manually adjust the number of CoreDNS replicas:

    kubectl scale --replicas={target} deployment/coredns -n kube-system # Replace {target} with the desired number of replicas.
  • Do not use workload auto-scaling

    While workload auto-scaling mechanisms, such as Horizontal Pod Autoscaler (HPA) and CronHPA, can also adjust the number of replicas, they perform frequent scaling. Because scaling in can cause resolution errors, do not use workload auto-scaling for CoreDNS.

Adjust CoreDNS pod specifications

You can also adjust CoreDNS resources by modifying pod specifications. In an ACK managed Pro cluster, the default memory limit for CoreDNS pods is 2 GiB, with no CPU limit. For consistent performance, we recommend a CPU limit of at least 1024m. You can adjust these resource requests and limits in the console.
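
For reference, the corresponding container resources stanza would look like the following sketch. The request values are illustrative placeholders; adjust them for your cluster:

resources:
  limits:
    cpu: 1024m   ## Recommended minimum CPU limit.
    memory: 2Gi  ## Default memory limit in ACK managed Pro clusters.
  requests:
    cpu: 512m
    memory: 1Gi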

Modify CoreDNS configuration in the console

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Add-ons.

  3. Click the Network tab and find the CoreDNS card. Click Configuration on the card.


  4. Modify the CoreDNS configuration and click OK.


Schedule CoreDNS pods

Important

Incorrect scheduling configurations can prevent CoreDNS pods from being deployed, leading to CoreDNS failures. Before you perform this operation, make sure you are familiar with scheduling.

We recommend that you deploy CoreDNS pods in different availability zones and on different cluster nodes to prevent single points of failure. CoreDNS add-on versions before v1.8.4.3 have a default preferred (soft) node anti-affinity, so pods may still be deployed on the same node when resources are insufficient. If this occurs, delete the pods to trigger rescheduling, or upgrade the add-on to the latest version. CoreDNS add-on versions earlier than v1.8 are no longer maintained. Upgrade to the latest version as soon as possible.

Ensure that the nodes running CoreDNS are not saturated with high CPU or memory usage, as this affects the QPS and response latency of DNS resolution. If cluster node resources permit, consider using custom parameters to schedule CoreDNS to independent cluster nodes to provide stable DNS resolution services.
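
If you manage a CoreDNS Deployment yourself, the following fragment sketches a required anti-affinity rule that spreads replicas across nodes. It assumes the standard k8s-app=kube-dns pod label; the ACK add-on manages its own scheduling policy, so this is only a reference for self-managed deployments:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns
      topologyKey: kubernetes.io/hostname  ## One replica per node. Use topology.kubernetes.io/zone to spread across zones.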

Use custom parameters to deploy CoreDNS on dedicated nodes

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Nodes > Nodes.

  3. On the Nodes page, click Manage Labels and Taints.

  4. On the Manage Labels and Taints page, select the target nodes and click Add Label.

    Note

    The number of nodes must be greater than the number of CoreDNS replicas to avoid running multiple CoreDNS replicas on a single node.

  5. In the Add dialog box, set the following parameters and click OK.

    • Name: node-role-type

    • Value: coredns

  6. In the left-side navigation pane of the cluster management page, choose Operations > Add-ons, and then search for CoreDNS.

  7. On the CoreDNS card, click Configuration. In the Configuration dialog box, click + Add to the right of NodeSelector, set the following parameters, and then click OK.

    • Key: node-role-type

    • Value: coredns

    CoreDNS is rescheduled to the nodes with the specified label.
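
Alternatively, you can add the label from the command line, with <node-name> as a placeholder:

kubectl label node <node-name> node-role-type=coredns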

Optimize CoreDNS configuration

ACK provides a default configuration for CoreDNS. You should review the parameters in the configuration and optimize them to ensure that CoreDNS can provide proper DNS services for your business containers. CoreDNS is highly configurable. For more information, see Configure DNS policies and domain name resolution and the official CoreDNS documentation.

You can also use the scheduled inspection and fault diagnosis features of Container Intelligence Service to check the CoreDNS configuration file. If the service reports an abnormal CoreDNS ConfigMap configuration, review the configuration against the recommendations in this topic.

Note

CoreDNS may consume extra memory when refreshing its configuration. After you modify the CoreDNS ConfigMap, monitor the pod status. If a pod has insufficient memory, modify the container memory limit in the CoreDNS Deployment in a timely manner. We recommend that you adjust the memory to 2 GiB.

Disable affinity for the kube-dns service

A session affinity (sessionAffinity) configuration on the kube-dns service can cause significant load imbalances between CoreDNS replicas. We recommend that you follow these steps to disable it:

Console

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Network > Services.

  3. In the kube-system namespace, click Edit YAML to the right of the kube-dns service.

    • If the sessionAffinity field is set to None, you can skip the following steps.

    • If the sessionAffinity field is set to ClientIP, perform the following steps.

  4. Delete the sessionAffinity and sessionAffinityConfig fields and all their sub-keys. Then, click Update.

    # Delete all of the following content.
    sessionAffinity: ClientIP
    sessionAffinityConfig:
      clientIP:
        timeoutSeconds: 10800
  5. Click Edit YAML to the right of the kube-dns service again. Verify that the sessionAffinity field is set to None. A value of None indicates that the kube-dns service is successfully updated.

CLI

  1. Run the following command to view the configuration details of the kube-dns service:

    kubectl -n kube-system get svc kube-dns -o yaml
    • If the sessionAffinity field is set to None, you can skip the following steps.

    • If the sessionAffinity field is set to ClientIP, perform the following steps.

  2. Run the following command to open and edit the service named kube-dns:

    kubectl -n kube-system edit service kube-dns
  3. Delete the sessionAffinity-related settings (sessionAffinity, sessionAffinityConfig, and all their sub-keys), and then save and exit.

    # Delete all of the following content.
    sessionAffinity: ClientIP
    sessionAffinityConfig:
      clientIP:
        timeoutSeconds: 10800
  4. After the modification is complete, run the following command again to check whether the sessionAffinity field is None. If the value is None, the kube-dns service is successfully updated.

    kubectl -n kube-system get svc kube-dns -o yaml
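
Alternatively, you can make the same change with a single patch. The following sketch sets sessionAffinity to None and removes the sessionAffinityConfig block in one strategic merge patch:

kubectl -n kube-system patch service kube-dns \
  -p '{"spec":{"sessionAffinity":"None","sessionAffinityConfig":null}}'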

Disable the autopath plugin

The autopath plugin, enabled in some early CoreDNS versions, can cause resolution errors in edge cases. Check if the plugin is enabled and disable it by editing the configuration file. For more information, see Autopath.

Note

After you disable the autopath plugin, the QPS for client-side DNS resolution requests can increase by up to three times, and the time taken to resolve a single domain name can also increase by up to three times. Monitor the CoreDNS load and the impact on your services.

  1. Run the kubectl -n kube-system edit configmap coredns command to open the CoreDNS configuration file.

  2. Delete the autopath @kubernetes line, and then save and exit.

  3. Check the status and logs of the CoreDNS pod. If the logs contain the word reload, the modification is successful.

Configure graceful shutdown for CoreDNS

lameduck is a graceful shutdown mechanism in CoreDNS. It ensures that when CoreDNS is stopped or restarted, ongoing requests are completed properly and are not abruptly interrupted. The lameduck mechanism works as follows:

  • When the CoreDNS process is about to terminate, it enters lameduck mode.

  • In lameduck mode, CoreDNS stops accepting new requests but continues to process requests it has already received until all requests are completed or the lameduck timeout period is exceeded.

Console

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Configurations > ConfigMaps.

  3. In the kube-system namespace, click Edit YAML to the right of the coredns ConfigMap.

  4. Refer to the following CoreDNS configuration file. Ensure that the health plugin is enabled and adjust the lameduck timeout to 15s. Then, click OK.

    .:53 {
        errors
        # The health plugin may have different settings in different CoreDNS versions.
        # Scenario 1: The health plugin is not enabled by default.
        # Scenario 2: The health plugin is enabled by default, but the lameduck time is not set.
        # health
        # Scenario 3: The health plugin is enabled by default, and the lameduck time is set to 5s.
        # health {
        #     lameduck 5s
        # }
        # For all three scenarios, modify the configuration as follows to set the lameduck parameter to 15s.
        health {
            lameduck 15s
        }
        # Other plugins do not need to be modified and are omitted here.
    }

If the CoreDNS pod is running as expected, the graceful shutdown configuration is successfully updated. If the CoreDNS pod is abnormal, you can view pod events and logs to identify the cause.

CLI

  1. Run the following command to open the CoreDNS configuration file:

    kubectl -n kube-system edit configmap/coredns
  2. Refer to the following Corefile. Ensure that the health plugin is enabled and adjust the lameduck parameter to 15s.

    .:53 {
        errors
        # The health plugin may have different settings in different CoreDNS versions.
        # Scenario 1: The health plugin is not enabled by default.
        # Scenario 2: The health plugin is enabled by default, but the lameduck time is not set.
        # health
        # Scenario 3: The health plugin is enabled by default, and the lameduck time is set to 5s.
        # health {
        #     lameduck 5s
        # }
        # For all three scenarios, modify the configuration as follows to set the lameduck parameter to 15s.
        health {
            lameduck 15s
        }
        # Other plugins do not need to be modified and are omitted here.
    }
  3. After you modify the CoreDNS configuration file, save and exit.

  4. If CoreDNS is running as expected, the graceful shutdown configuration is successfully updated. If the CoreDNS pod is abnormal, you can view pod events and logs to identify the cause.

Configure the forward plugin protocol

When using NodeLocal DNSCache, the communication chain is: Application → NodeLocal DNSCache (TCP) → CoreDNS (TCP). By default, CoreDNS then communicates with upstream DNS servers by using the same protocol as the source request. This means external domain name resolution requests from your workloads pass through NodeLocal DNSCache and CoreDNS, and are then sent to the VPC DNS servers (the two IP addresses 100.100.2.136 and 100.100.2.138 that are configured by default on ECS instances) over TCP.

VPC DNS servers have limited support for TCP. If you use NodeLocal DNSCache, you need to modify the CoreDNS configuration to make it always prioritize UDP for communication with upstream DNS servers to avoid resolution errors. Modify the CoreDNS configuration file, which is the ConfigMap named coredns in the kube-system namespace. For more information, see Manage ConfigMaps. In the forward plugin, specify the protocol for upstream requests as prefer_udp. After the modification, CoreDNS will prioritize UDP for upstream communication. The modification is as follows:

# Before modification
forward . /etc/resolv.conf
# After modification
forward . /etc/resolv.conf {
  prefer_udp
}

Configure the ready plugin

CoreDNS versions 1.5.0 and later require the ready plugin to be configured to enable readiness probes.

  1. Run the following command to open the CoreDNS configuration file:

    kubectl -n kube-system edit configmap/coredns
  2. Check whether the file contains a line with ready. If not, add the line ready, press the Esc key, enter :wq!, and then press Enter to save the modified configuration file and exit edit mode.

    apiVersion: v1
    data:
      Corefile: |
        .:53 {
          errors
          health {
            lameduck 15s
          }
          ready # If this line does not exist, add it. Keep the indentation aligned with the other plugins.
          kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods verified
            fallthrough in-addr.arpa ip6.arpa
          }
          prometheus :9153
          forward . /etc/resolv.conf {
            max_concurrent 1000
            prefer_udp
          }
          cache 30
          loop
          log
          reload
          loadbalance
        }
  3. Check the status and logs of the CoreDNS pod. If the logs contain the word reload, the modification is successful.

Configure the multisocket plugin

CoreDNS v1.12.1 introduced the multisocket plugin. You can enable this plugin to allow CoreDNS to listen on the same port by using multiple sockets, which improves CoreDNS performance in high-CPU scenarios. For more information about the plugin, see the community documentation.

You need to enable multisocket in the coredns ConfigMap:

.:53 {
        ...
        prometheus :9153
        multisocket [NUM_SOCKETS]
        forward . /etc/resolv.conf
        ...
}

NUM_SOCKETS determines the number of sockets that listen on the same port.

Configuration recommendation: Align NUM_SOCKETS with the estimated CPU utilization, CPU resource limits, and available cluster resources, for example:

  • If CoreDNS consumes 4 cores during peak hours and the available resources are 8 cores, set NUM_SOCKETS to 2.

  • If CoreDNS consumes 8 cores during peak hours and 64 cores are available, set NUM_SOCKETS to 8.

To determine the optimal configuration, we recommend that you test the QPS and load with different configurations.

If you do not specify NUM_SOCKETS, the default value is GOMAXPROCS, which is equal to the CPU limit of the CoreDNS pod. If the CPU limit of the pod is not set, the value is equal to the number of CPU cores on the node where the pod is running.