
Container Service for Kubernetes: Best Practices for DNS services

Last Updated: Mar 12, 2025

The Domain Name System (DNS) service is essential for Kubernetes clusters. Incorrect DNS settings on your client or the use of large clusters can lead to DNS resolution timeouts and failures. This topic describes best practices for configuring DNS services in Kubernetes clusters to prevent such issues.


The following sections cover best practices for both clients and DNS servers.

For more information about CoreDNS, see the official documentation of CoreDNS.

Optimize DNS queries

Kubernetes workloads generate frequent DNS queries, many of which can be optimized or avoided. You can optimize DNS queries by using one of the following methods:

  • (Recommended) Use a connection pool: If a client pod frequently accesses a Service, we recommend that you use a connection pool. A connection pool caches connections to upstream services in memory, which avoids the overhead of a DNS lookup and a TCP handshake on every access.

  • Use asynchronous or long polling mode to obtain the IP address that corresponds to a DNS domain name.

  • Use DNS caching:

    • (Recommended) If your application cannot be modified to connect to another Service by using a connection pool, you can cache the DNS resolution results on the application side. For more information, see Use NodeLocal DNSCache.

    • If you cannot use NodeLocal DNSCache, you can use the Name Service Cache Daemon (NSCD) to cache DNS resolution results in a container. For more information about how to use NSCD, see Use NSCD in Kubernetes clusters.

  • Optimize the resolv.conf file: The efficiency of DNS resolution depends on the way that domain names are written in the resolv.conf file in a container. This is because of the mechanisms of the ndots and search parameters. For more information about the mechanisms of the ndots and search parameters, see DNS policy configurations and DNS resolution instructions.

  • Optimize domain name configurations: When an application in a container needs to access a domain name, configure the domain name based on the following principles to minimize the number of DNS resolution attempts and reduce the time required for DNS resolution.

    • If a pod accesses a Service in the same namespace, use <service-name> to access the Service. In this example, service-name indicates the name of the Service.

    • If a pod accesses a Service across namespaces, use <service-name>.<namespace-name> to access the Service. In this example, namespace-name indicates the namespace to which the Service belongs.

    • If a pod accesses an external domain name, use a fully qualified domain name (FQDN) to access the domain name. An FQDN is specified by adding a period (.) to the end of a common domain name. This prevents multiple invalid searches caused by the concatenation of the search domain. For example, if you want to access www.aliyun.com, use the FQDN www.aliyun.com. to access the domain name.
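To see why these principles matter, consider a typical resolv.conf inside a pod in the default namespace (the nameserver IP is illustrative; the search domains and ndots value follow common Kubernetes defaults):

```
# /etc/resolv.conf inside a pod (illustrative values)
nameserver 172.21.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

# "www.aliyun.com" contains fewer than 5 dots, so the search domains are tried first:
#   www.aliyun.com.default.svc.cluster.local  -> NXDOMAIN
#   www.aliyun.com.svc.cluster.local          -> NXDOMAIN
#   www.aliyun.com.cluster.local              -> NXDOMAIN
#   www.aliyun.com                            -> answer
# "www.aliyun.com." (FQDN with a trailing period) is queried directly, in one attempt.
```

This is why `<service-name>` resolves in a single search-domain attempt for same-namespace Services, while an external name without a trailing period costs several wasted queries.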

Understand DNS configurations in containers

  • Different DNS resolvers can behave in subtly different ways because of implementation details. For example, you may find that dig <domain name> resolves a name normally while ping <domain name> fails.

  • We do not recommend that you use the Alpine base image. The musl libc library built into the Alpine container image differs from the standard glibc library, which can cause issues including, but not limited to, the following. Consider other base images such as Debian or CentOS instead.

    • Alpine 3.18 and earlier versions do not fall back to TCP when a DNS response is truncated (the TC flag is set).

    • Alpine 3.3 and earlier versions do not support the search parameter or search domains. As a result, service discovery cannot be completed.

    • musl libc queries all DNS servers configured in the /etc/resolv.conf file concurrently, which defeats the NodeLocal DNSCache optimization.

    • musl libc sends A and AAAA queries in parallel over the same socket. On earlier kernel versions, this triggers conntrack races that cause packet loss.

    For more information about the preceding issues, see musl libc.

  • If you use the Go programming language, understand the differences between the DNS resolvers that are implemented in CGO and Pure Go.
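One way to observe the resolver differences described above: `dig` queries the DNS server directly and ignores `nsswitch.conf`, `/etc/hosts`, and the search/ndots mechanisms, while `ping` and most applications resolve through the C library. `getent` exercises the same libc path, which makes it a better proxy for application behavior (a minimal sketch, assuming a Linux host):

```shell
# getent resolves through the C library (nsswitch.conf, /etc/hosts, search list),
# the same path that ping and most applications use -- unlike dig, which queries
# the DNS server directly and bypasses those mechanisms.
getent hosts localhost
```

If `dig` succeeds but `getent hosts` fails for the same name, the problem is usually in the libc-side configuration (search domains, ndots, nsswitch.conf) rather than the DNS server.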

Reduce the adverse effect of occasional DNS resolution timeouts caused by IPVS defects

If the load balancing mode of kube-proxy is set to IPVS in your cluster, DNS resolution timeouts may occur when CoreDNS pods are scaled in or restarted. This issue is caused by defects in the community Linux kernel. For more information, see IPVS.

You can use the methods described in the following sections to reduce the adverse effect of these issues.

Use NodeLocal DNSCache

CoreDNS may occasionally face issues such as:

  • Packet loss can occur due to concurrent A and AAAA queries, leading to DNS resolution failure.

  • DNS resolution may fail if packet loss happens because the node's conntrack table is full.

To enhance the stability and performance of DNS services within the cluster, we recommend that you install the NodeLocal DNSCache component, which runs a DNS cache on each cluster node. For more information about NodeLocal DNSCache and instructions on deploying it in an ACK cluster, see Use NodeLocal DNSCache.

Important

Once NodeLocal DNSCache is installed, you must inject the DNS cache configurations into pods. Execute the command below to label a specified namespace, which will automatically inject the DNS cache configurations into pods created within that namespace. For details on other injection methods, refer to the document mentioned above.

kubectl label namespace default node-local-dns-injection=enabled
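After injection, the component rewrites the DNS settings of newly created pods so that queries go to the node-local cache first. The effect is equivalent to a dnsConfig similar to the following sketch; 169.254.20.10 is the link-local address that NodeLocal DNSCache listens on, and the fallback nameserver and search domains shown are illustrative:

```yaml
# Illustrative effect of node-local-dns-injection on a pod spec (values are examples).
spec:
  dnsPolicy: None
  dnsConfig:
    nameservers:
    - 169.254.20.10   # node-local DNS cache (link-local address)
    - 172.21.0.10     # cluster DNS (kube-dns Service IP) as fallback; illustrative
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "5"
```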

Use proper CoreDNS versions

CoreDNS is compatible with Kubernetes. We recommend using a stable, recent version of CoreDNS. You can install, upgrade, and configure CoreDNS on the Add-ons page of the ACK console. If the CoreDNS component status indicates an available upgrade, we advise performing the upgrade during off-peak hours at your earliest convenience.

CoreDNS versions earlier than v1.7.0 present known risks.

The table below describes the recommended minimum CoreDNS versions for clusters running various Kubernetes versions:

Kubernetes version                           Recommended minimum CoreDNS version
Earlier than v1.14.8 (discontinued)          v1.6.2
v1.14.8 and later, earlier than v1.20.4      v1.7.0.0-f59c03d-aliyun
v1.20.4 and later, earlier than v1.21.0      v1.8.4.1-3a376cc-aliyun
v1.21.0 and later                            v1.11.3.2-f57ea7ed6-aliyun

Note

Kubernetes versions earlier than v1.14.8 are no longer supported. We recommend that you upgrade Kubernetes before upgrading CoreDNS.

Monitor the status of CoreDNS

Monitoring metrics

CoreDNS provides health metrics such as resolution results through standard Prometheus interfaces, which are essential for detecting exceptions on CoreDNS and even upstream DNS servers.

Alibaba Cloud Prometheus monitoring provides built-in metrics and alert rules for CoreDNS, which you can enable in the ACK console. For more information, see Monitor CoreDNS components.

If you use open-source Prometheus to monitor your Kubernetes cluster, you can access the related metrics in Prometheus and establish alert rules based on key metrics. For more information, see the official documentation of CoreDNS Prometheus.

Running logs

In the event of DNS resolution errors, you can examine the CoreDNS log to pinpoint the issues. It is advisable to activate the DNS resolution log and SLS log collection for CoreDNS. For more information, see Analyze and monitor CoreDNS logs.

Sink Kubernetes events

From CoreDNS v1.9.3.6-32932850-aliyun onwards, the k8s_event plug-in is available to sink Kubernetes events, including Info, Error, and Warning logs from CoreDNS, to the event center. For details about the k8s_event plug-in, see k8s_event.

By default, the k8s_event plug-in is activated upon deploying CoreDNS v1.9.3.6-32932850-aliyun or later. If you have updated from an earlier version to v1.9.3.6-32932850-aliyun or beyond, you will need to modify the CoreDNS ConfigMap to enable the k8s_event plug-in.

  1. Execute the command below to access the CoreDNS ConfigMap.

    kubectl -n kube-system edit configmap/coredns
  2. Incorporate the kubeAPI and k8s_event plug-ins.

    apiVersion: v1
    data:
      Corefile: |
        .:53 {
            errors
            health {
                lameduck 15s
            }
    
            # The start of the plug-in configuration (other differences are omitted).
            kubeapi
            k8s_event {
              level info error warning # Report Info, Error, and Warning logs as Kubernetes events.
            }
            # The end of the plug-in configuration.

            kubernetes cluster.local in-addr.arpa ip6.arpa {
                pods verified
                fallthrough in-addr.arpa ip6.arpa
            }
            # Details of other plug-ins are not shown.
        }
  3. Verify the CoreDNS pod's running status and log. The presence of the term reload in the log indicates a successful modification.
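The status and log check from the step above can be performed with kubectl. This sketch assumes the CoreDNS pods carry the standard k8s-app=kube-dns label used by most deployments:

```shell
# Check that the CoreDNS pods are Running.
kubectl -n kube-system get pods -l k8s-app=kube-dns

# Look for the "reload" entry that indicates the new Corefile was picked up.
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50 | grep -i reload
```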

Ensure high availability of CoreDNS

CoreDNS is the authoritative DNS service within the cluster. If CoreDNS fails, the cluster's access to services may be compromised, potentially leading to large-scale service outages. To ensure CoreDNS remains highly available, consider the following methods:

Evaluate the load on CoreDNS components

Perform DNS stress testing within the cluster to assess component load. Open source tools, such as DNSPerf, can facilitate this process. If precise DNS load evaluation within the cluster is challenging, consider adhering to the following recommended standards:

  • It is advisable to maintain a minimum of two CoreDNS pods under all circumstances, with each pod having at least 1 CPU core and 1 GB of memory allocated.

  • The DNS queries per second (QPS) that CoreDNS can handle is directly proportional to CPU consumption. With NodeLocal DNSCache, each CPU core can support over 10,000 DNS QPS. Service types can greatly influence DNS QPS requirements. Monitor the peak CPU usage of CoreDNS pods; if usage exceeds one core during peak times, consider scaling out CoreDNS pods. If peak CPU usage is unknown, a conservative approach is to deploy CoreDNS pods at a ratio of one pod per eight cluster nodes, adding one CoreDNS pod for every eight additional nodes.

Adjust the number of CoreDNS pods

The number of CoreDNS pods is a direct indicator of the available computing resources for CoreDNS. Adjust the pod count based on load evaluation results.

Important

UDP messages lack a retransmission mechanism. IPVS UDP defects on cluster nodes can lead to packet loss, which may result in DNS resolution timeouts or anomalies for up to five minutes across the entire cluster during CoreDNS pod scale-in or restart. For details on addressing DNS resolution issues caused by IPVS defects, see Troubleshoot DNS Resolution Exceptions.

  • Automatically adjust the number of CoreDNS pods based on the recommended policy

    Deploy the dns-autoscaler to automatically adjust the number of CoreDNS pods in real-time according to the earlier mentioned policy of one pod per eight cluster nodes. The pod count is calculated using the formula: replicas = max(ceil(cores × 1/coresPerReplica), ceil(nodes × 1/nodesPerReplica)). The number of pods is constrained by the max and min values.

    dns-autoscaler

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dns-autoscaler
      namespace: kube-system
      labels:
        k8s-app: dns-autoscaler
    spec:
      selector:
        matchLabels:
          k8s-app: dns-autoscaler
      template:
        metadata:
          labels:
            k8s-app: dns-autoscaler
        spec:
          serviceAccountName: admin
          containers:
          - name: autoscaler
            image: registry.cn-hangzhou.aliyuncs.com/acs/cluster-proportional-autoscaler:1.8.4
            resources:
              requests:
                cpu: "200m"
                memory: "150Mi"
            command:
            - /cluster-proportional-autoscaler
            - --namespace=kube-system
            - --configmap=dns-autoscaler
            - --nodelabels=type!=virtual-kubelet
            - --target=Deployment/coredns
            - --default-params={"linear":{"coresPerReplica":64,"nodesPerReplica":8,"min":2,"max":100,"preventSinglePointFailure":true}}
            - --logtostderr=true
            - --v=9
  • Manually adjust the number of CoreDNS pods

    To manually change the number of CoreDNS pods, execute the following command:

    kubectl scale --replicas={target} deployment/coredns -n kube-system # Replace target with the number of target pods
  • Avoid using workload auto scaling

    Workload auto scaling, such as the Horizontal Pod Autoscaler (HPA) and scheduled horizontal pod autoscaling (CronHPA), causes pods to be frequently created and deleted. As described above, pod scale-in can trigger DNS resolution issues, so avoid using workload auto scaling to manage CoreDNS pods.
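The dns-autoscaler formula above can be checked by hand. A minimal sketch for a hypothetical cluster with 400 cores and 100 nodes, using the default parameters coresPerReplica=64, nodesPerReplica=8, min=2, max=100:

```shell
cores=400; nodes=100
cores_per_replica=64; nodes_per_replica=8; min=2; max=100

# ceil(a/b) via integer arithmetic: (a + b - 1) / b
by_cores=$(( (cores + cores_per_replica - 1) / cores_per_replica ))   # ceil(400/64) = 7
by_nodes=$(( (nodes + nodes_per_replica - 1) / nodes_per_replica ))   # ceil(100/8) = 13

# replicas = max(by_cores, by_nodes), clamped to [min, max]
replicas=$(( by_cores > by_nodes ? by_cores : by_nodes ))
replicas=$(( replicas < min ? min : replicas ))
replicas=$(( replicas > max ? max : replicas ))
echo "$replicas"   # prints 13
```

In this example the node count dominates, so the autoscaler would run 13 CoreDNS pods.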

Adjust the specifications of CoreDNS pods

Alternatively, you can adjust the computing resources available to CoreDNS by changing the pod specifications. In ACK Pro clusters, CoreDNS pods have a default memory limit of 2 GiB and no CPU limit. We recommend that you set a CPU limit of 4096m, and no less than 1024m. You can modify the CoreDNS pod settings in the console.

Modify CoreDNS Configurations in the Console

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Operations > Add-ons.

  3. Click the Network tab, locate the CoreDNS card, and then click Configure on the card.


  4. After modifying the CoreDNS configurations, click OK.


Schedule CoreDNS pods

Important

Inappropriate scheduling configurations can prevent CoreDNS pods from deploying correctly, leading to service unavailability. Ensure familiarity with scheduling before proceeding.

To mitigate the risk of single-node or single-zone failures, we recommend that you spread CoreDNS pods across different zones and nodes. CoreDNS versions earlier than v1.8.4.3 use soft (preferred) pod anti-affinity by default, so if node resources are insufficient, multiple pods may be scheduled onto a single node. In that case, delete the pods to trigger rescheduling, or upgrade the component to the latest version. CoreDNS versions earlier than v1.8 are no longer supported, and we strongly recommend an upgrade.

Avoid deploying CoreDNS pods on nodes with fully utilized CPU and memory resources to prevent adverse effects on DNS QPS and response times. If the cluster has enough nodes, you can schedule CoreDNS pods to dedicated nodes by setting custom parameters, ensuring stable domain name resolution services.

Deploy CoreDNS Pods to Exclusive Nodes Using Custom Parameters

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Nodes > Nodes.

  3. On the Nodes page, click Label And Taint Management.

  4. On the Label And Taint Management page, select the desired node and click Add Label.

    Note

    Ensure the number of selected nodes exceeds the number of CoreDNS pods to prevent multiple pods from being deployed to the same node.

  5. In the Add dialog box, enter the following parameters and click OK:

    • Name: node-role-type

    • Value: coredns

  6. In the left-side navigation pane of the Cluster Management page, go to Maintenance > Components and locate CoreDNS.

  7. Click Configure on the CoreDNS card. In the Configure dialog box, click + next to Nodeselector, set the following parameters, and click OK.

    • Key: node-role-type

    • Value: coredns

    This reschedules CoreDNS pods to nodes with the specified label, ensuring dedicated resources.
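Conceptually, the custom parameter above adds a nodeSelector to the CoreDNS Deployment similar to the following sketch:

```yaml
# Illustrative effect of the Nodeselector custom parameter on the CoreDNS Deployment.
spec:
  template:
    spec:
      nodeSelector:
        node-role-type: coredns
```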

Optimize CoreDNS configurations

Container Service for Kubernetes (ACK) provides default configurations for CoreDNS. It is important to review and optimize these parameters to ensure CoreDNS delivers DNS services effectively for your application containers. CoreDNS configurations are highly flexible. For more information, see DNS policy configurations and DNS resolution instructions and the official CoreDNS documentation.

Default configurations from earlier CoreDNS versions deployed with Kubernetes clusters may introduce risks. We recommend that you examine and optimize these configurations by performing the operations described in the following sections.

You can also verify CoreDNS configurations by using the Regular Inspection and Diagnosis features in the Container Intelligence Service console. If the inspection reveals that the coredns ConfigMap is abnormal, review the items described in the following sections.

Note

Updating the coredns ConfigMap in ACK may increase memory usage. After modifying the coredns ConfigMap, monitor the CoreDNS pods' status. If the pods' memory resources are depleted, adjust the memory limit in the CoreDNS Deployment to 2 GB.

Disable the affinity settings of the kube-dns service

Affinity settings can lead to uneven CoreDNS pod loads. To disable these settings, follow these steps:

Use the ACK console

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Network > Services.

  3. In the kube-system namespace, find the kube-dns Service and click YAML Edit.

    • If the sessionAffinity field is None, you can skip the following steps.

    • If the sessionAffinity field is set to ClientIP, you can follow these steps.

  4. Remove the sessionAffinity and sessionAffinityConfig fields along with their subkeys, and click Update.

    # Delete all the following content.
    sessionAffinity: ClientIP
    sessionAffinityConfig:
      clientIP:
        timeoutSeconds: 10800
  5. Click YAML Edit on the right side of the kube-dns Service again and verify that the sessionAffinity field is set to None. If it is, the kube-dns Service has been successfully modified.

Use the CLI

  1. Execute the following command to view the kube-dns Service configurations.

    kubectl -n kube-system get svc kube-dns -o yaml
    • If the sessionAffinity field is set to None, you can skip the following steps.

    • If the sessionAffinity field is set to ClientIP, you can proceed with the following steps.

  2. To modify the kube-dns Service, run the following command:

    kubectl -n kube-system edit service kube-dns
  3. Remove the sessionAffinity settings (sessionAffinity, sessionAffinityConfig, and all associated subkeys), save the changes, and close the file.

    # Delete all the following content.
    sessionAffinity: ClientIP
    sessionAffinityConfig:
      clientIP:
        timeoutSeconds: 10800
  4. Once you have completed the modifications, run the command below to verify that the sessionAffinity field is set to None. If it is set to None, this indicates that the kube-dns Service has been successfully modified.

    kubectl -n kube-system get svc kube-dns -o yaml
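Alternatively, the field from step 4 can be read directly with a jsonpath query, which prints None after a successful modification:

```shell
kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.sessionAffinity}'
```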

Disable the autopath plug-in

The autopath plug-in, enabled in earlier CoreDNS versions, can lead to DNS resolution failures in certain scenarios. To disable the autopath plug-in, update the coredns ConfigMap. For more information, see Autopath.

Note

Disabling the autopath plug-in can roughly triple the number of DNS queries that clients send per second and correspondingly increase resolution time. Monitor the load on CoreDNS and assess the impact on your services.

  1. To open the coredns ConfigMap, run the kubectl -n kube-system edit configmap coredns command.

  2. Remove the line containing autopath @kubernetes, save the changes, and exit.

  3. Verify the CoreDNS pod's running status and examine the logs. If the term reload is present in the logs, this signifies a successful modification.

Configure graceful shutdown for CoreDNS

lameduck is a feature in CoreDNS designed to facilitate a graceful shutdown. It allows ongoing requests to complete without interruption during a CoreDNS stop or restart. The lameduck feature operates in the following way:

  • Before termination, the CoreDNS process enters lameduck mode.

  • In lameduck mode, CoreDNS ceases to accept new requests while still processing any received requests until they are all completed or the lameduck timeout period expires.

Use the ACK console

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Configurations > ConfigMaps.

  3. In the kube-system namespace, select YAML Edit next to the coredns configuration item.

  4. To ensure that the health plug-in is enabled and to set the lameduck timeout period to 15s, refer to the CoreDNS configuration file provided. Then, click OK.

        .:53 {
            errors       
            # The setting of the health plug-in may vary based on the CoreDNS version.
            # Scenario 1: The health plug-in is disabled by default.   
            # Scenario 2: The health plug-in is enabled by default but lameduck is not set.
            # health      
            # Scenario 3: The health plug-in is enabled by default and lameduck is set to 5s.   
            # health {
            #     lameduck 5s
            # }      
            # In the preceding scenarios, change the value of lameduck to 15s.
            health {
                lameduck 15s
            }       
            # You do not need to modify other plug-ins.
        }

If the CoreDNS pods are operating normally, CoreDNS is ready for a graceful shutdown. If the pods are not functioning correctly, examine the pod events and logs to determine the issue.

Use the CLI

  1. Execute the command below to edit the coredns ConfigMap.

     kubectl -n kube-system edit configmap/coredns
  2. Consult the Corefile below to confirm that the health plug-in is active and adjust the lameduck parameter to 15s.

        .:53 {
            errors
            # The setting of the health plug-in may vary based on the CoreDNS version.
            # Scenario 1: The health plug-in is disabled by default.
            # Scenario 2: The health plug-in is enabled by default but lameduck is not set.
            # health
            # Scenario 3: The health plug-in is enabled by default and lameduck is set to 5s.
            # health {
            #     lameduck 5s
            # }
            # In the preceding scenarios, change the value of lameduck to 15s.
            health {
                lameduck 15s
            }
            # You do not need to modify other plug-ins.
        }
  3. Save the changes to the coredns ConfigMap and exit after making the modifications.

  4. Check the CoreDNS pods' status. If they are running as expected, CoreDNS is set for a graceful shutdown. If issues are detected, review the pod events and logs for troubleshooting.

Configure the default protocol of the forward plug-in and upstream VPC DNS servers

NodeLocal DNSCache communicates with CoreDNS using TCP, and CoreDNS in turn communicates with upstream DNS servers based on the originating protocol of DNS queries. Thus, DNS queries from client pods for external domain names traverse NodeLocal DNSCache and CoreDNS before reaching VPC DNS servers over TCP. The VPC DNS servers, with IP addresses 100.100.2.136 and 100.100.2.138, are automatically configured on Elastic Compute Service (ECS) instances.

DNS servers within a VPC offer limited TCP support. If you use NodeLocal DNSCache, you must adjust the CoreDNS configuration so that it communicates with upstream DNS servers over UDP to avoid DNS resolution issues. To do so, modify the CoreDNS configuration file, which is the ConfigMap named coredns in the kube-system namespace, and set the protocol for upstream requests to prefer_udp in the forward plug-in. After this modification, CoreDNS preferentially uses UDP to communicate with upstream servers. The setting can be adjusted as follows:

# The original setting
forward . /etc/resolv.conf
# The modified setting
forward . /etc/resolv.conf {
  prefer_udp
}

Configure the ready plug-in

The ready plug-in must be configured for CoreDNS versions 1.5.0 and above.

  1. To open the coredns ConfigMap, run the following command.

    kubectl -n kube-system edit configmap/coredns
  2. Check whether the line containing ready is present. If it is not, insert it. To save the changes and exit edit mode, press the Esc key, type :wq!, and press Enter.

    apiVersion: v1
    data:
     Corefile: |
      .:53 {
        errors
        health {
          lameduck 15s
        }
        ready # Add this line and make sure that the word "ready" is aligned with the word "kubernetes".
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods verified
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf {
          max_concurrent 1000
          prefer_udp
        }
        cache 30
        loop
        log
        reload
        loadbalance
      }
  3. Verify the CoreDNS pod's running status and examine the logs. If the term reload is present in the logs, this signifies a successful modification.