The Domain Name System (DNS) service is a fundamental service for Kubernetes clusters. If the DNS settings of your clients are not properly configured or if you use a large cluster, DNS resolution timeouts and failures may occur. This topic describes best practices for configuring DNS services in Kubernetes clusters to help you avoid these issues.

Background information

The best practices in this topic involve both clients and DNS servers.

For more information about CoreDNS, see Official Documentation of CoreDNS.

Optimize DNS queries

Kubernetes workloads submit DNS queries at a high frequency, and a large number of these queries can be optimized or avoided. You can optimize DNS queries by using one of the following methods:
  • Use a connection pool: If a client pod frequently accesses a Service, we recommend that you use a connection pool. A connection pool can cache connections to the upstream Service in the memory. This way, the client pod no longer needs to send a DNS query and establish a TCP connection each time it accesses the Service.
  • Use DNS caching:
    • If you cannot use a connection pool to connect a client pod to a Service, we recommend that you cache DNS resolution results on the client pod side. For more information, see Use NodeLocal DNSCache to optimize DNS resolution.
    • If you cannot use NodeLocal DNSCache, you can use the built-in Name Service Cache Daemon (NSCD) in containers. For more information about how to use NSCD, see Use NSCD in Kubernetes clusters.
  • Optimize the resolv.conf file: The ndots and search parameters in the resolv.conf file determine how the domain names that you specify in containers are expanded and resolved, and therefore affect DNS resolution efficiency, as shown in the example after this list. For more information about the ndots and search parameters, see Configure DNS resolution.
  • Optimize domain name settings: You can specify the domain name that a client pod needs to access based on the following rules. These rules can help minimize the number of attempts for resolving the domain name and make the DNS resolution service more efficient.
    • If the client pod needs to access a Service in the same namespace, use <service-name> as the domain name. <service-name> indicates the name of the Service.
    • If the client pod needs to access a Service in another namespace, use <service-name>.<namespace-name> as the domain name. <namespace-name> indicates the namespace to which the Service belongs.
    • If the client pod needs to access an external domain name, specify the domain name in the fully qualified domain name (FQDN) format by appending a period (.) to the domain name. This prevents invalid DNS lookups caused by search domain expansion. For example, if the client pod needs to access www.aliyun.com, specify the domain name as www.aliyun.com. (with the trailing period).
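The following pod spec is a minimal sketch of the resolv.conf optimization above. It lowers ndots to 2 so that a name that contains at least two dots, such as www.aliyun.com, is resolved as an absolute name first instead of being expanded with the search domains. The pod name and image are placeholders; choose an ndots value that matches how your workloads specify domain names.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-tuned-dns        #A placeholder name.
spec:
  containers:
  - name: app
    image: nginx:stable           #A placeholder image.
  dnsConfig:
    options:
    - name: ndots
      value: "2"                  #Names with two or more dots are tried as absolute names first.
With this setting, queries for in-cluster short names such as <service-name> still work, because these names contain fewer dots than the ndots value and are therefore expanded with the search domains.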

Use proper container images

The implementation of the built-in musl libc in Alpine container images is different from that of glibc:
  • Alpine 3.3 and earlier versions do not support the search parameter, which is used to specify search domains. As a result, Services cannot be discovered by their short names.
  • musl libc sends queries to all of the DNS servers that are specified in the /etc/resolv.conf file in parallel. As a result, NodeLocal DNSCache cannot optimize DNS resolution.
  • musl libc sends A and AAAA queries in parallel over the same socket. In earlier kernel versions, this triggers a conntrack race condition that causes packet loss.
For more information, see musl libc.

If containers that are deployed in a Kubernetes cluster use Alpine as the base image, domain names may not be resolved due to the use of musl libc. We recommend that you replace the image with an image that is based on Debian or CentOS.
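If you are not sure which of your workloads are Alpine-based, one quick check is to read /etc/os-release inside a running container. This assumes that the image provides a shell and the cat utility; the pod and container names are placeholders:
kubectl exec <pod-name> -c <container-name> -- cat /etc/os-release
If the output contains Alpine Linux, the container uses musl libc.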

Reduce the adverse effect of occasional DNS resolution timeouts caused by IPVS defects

If the load balancing mode of kube-proxy is set to IPVS in your cluster, DNS resolution timeouts may occur when CoreDNS pods are scaled in or restarted. These issues are caused by Linux kernel bugs. For more information, see IPVS.

You can deploy NodeLocal DNSCache, as described in the following section, to reduce the adverse effect of this issue.

Use NodeLocal DNSCache to optimize DNS resolution

Container Service for Kubernetes (ACK) allows you to deploy NodeLocal DNSCache to improve the stability and performance of service discovery. NodeLocal DNSCache is implemented as a DaemonSet and runs a DNS caching agent on cluster nodes to improve the efficiency of DNS resolution for ACK clusters.

For more information about NodeLocal DNSCache and how to deploy NodeLocal DNSCache in ACK clusters, see Configure NodeLocal DNSCache.

Use proper CoreDNS versions

CoreDNS is backward-compatible with Kubernetes. We recommend that you use a recent stable version of CoreDNS. You can install and upgrade CoreDNS on the Add-ons page of the ACK console. If the status of the CoreDNS component indicates that an upgrade is available, we recommend that you upgrade the component during off-peak hours at the earliest opportunity.
The following issues may occur in CoreDNS versions earlier than 1.7.0:
  • If connectivity exceptions occur between CoreDNS and the API server, such as network jitters, API server restarts, or API server migrations, CoreDNS pods may be restarted because error logs cannot be written. For more information, see Set klog's logtostderr flag.
  • CoreDNS occupies extra memory resources during the initialization process. In large clusters, the default memory limit may cause out of memory (OOM) errors during this process. In severe cases, CoreDNS pods may be repeatedly restarted but fail to start. For more information, see CoreDNS uses a lot memory during initialization phase.
  • CoreDNS has issues that may affect the domain name resolution of headless Services and requests from outside the cluster. For more information, see plugin/kubernetes: handle tombstones in default processor and Data is not synced when CoreDNS reconnects to kubernetes api server after protracted disconnection.
  • Some earlier CoreDNS versions are configured with default toleration rules that may cause CoreDNS pods to be deployed on abnormal nodes and prevent the pods from being automatically evicted when node exceptions occur. This may lead to domain name resolution errors in the cluster.
The following table describes the recommended CoreDNS versions for clusters that run different Kubernetes versions.

Kubernetes version                          Recommended CoreDNS version
Earlier than 1.14.8 (discontinued)          v1.6.2
1.14.8 and later but earlier than 1.20.4    v1.7.0.0-f59c03d-aliyun
1.20.4 and later                            v1.8.4.1-3a376cc-aliyun
Note Kubernetes versions earlier than 1.14.8 are discontinued. We recommend that you upgrade the Kubernetes version before you upgrade CoreDNS.

Monitor the status of CoreDNS

Metrics

CoreDNS exposes metrics, such as DNS resolution results, in the standard Prometheus format. By collecting these metrics, you can identify exceptions in CoreDNS and upstream DNS servers at the earliest opportunity.

By default, monitoring metrics and alerting rules related to CoreDNS are predefined in Application Real-Time Monitoring Service (ARMS) Prometheus provided by Alibaba Cloud. You can log on to the ACK console to enable Prometheus and dashboards. For more information, see Enable ARMS Prometheus.

If you use open source Prometheus to monitor the Kubernetes cluster, you can view the related metrics in Prometheus and create alert rules based on the following key metrics. For more information, see CoreDNS Prometheus official documentation. The following list describes the key metrics and the recommended alert settings:
  • coredns_dns_requests_total: The number of DNS requests. You can create alert rules based on the number of requests to check whether the current DNS QPS is high.
  • coredns_dns_responses_total: The number of DNS responses. You can create alert rules based on the number of responses with specific response codes (RCODE). For example, you can create alert rules based on the SERVFAIL error.
  • coredns_panics_total: The number of abnormal exits of CoreDNS. If the value of this metric is greater than 0, abnormal exits occurred and alerts must be triggered.
  • coredns_dns_request_duration_seconds: The duration to process a DNS query. You can trigger alerts when the duration to process a DNS query exceeds the specified threshold.
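As an illustration, the following PrometheusRule resource sketches alert rules for two of the preceding metrics. It assumes that the Prometheus Operator is deployed, which provides the PrometheusRule custom resource; the rule names, thresholds, and durations are placeholder values that you need to adapt:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: coredns-alerts            #A placeholder name.
  namespace: kube-system
spec:
  groups:
  - name: coredns
    rules:
    - alert: CoreDNSPanic
      #Fires if CoreDNS exited abnormally in the last 5 minutes.
      expr: 'increase(coredns_panics_total[5m]) > 0'
      labels:
        severity: critical
    - alert: CoreDNSHighServfailRate
      #A placeholder threshold: more than 5 SERVFAIL responses per second for 3 minutes.
      expr: 'sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m])) > 5'
      for: 3m
      labels:
        severity: warning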

Operational log

When DNS resolution errors occur, you can view the CoreDNS logs to identify the causes. We recommend that you enable logging for CoreDNS and use Log Service to collect the log data. For more information, see Monitor CoreDNS and analyze the CoreDNS log.

Modify the CoreDNS Deployment

Modify the number of CoreDNS pods

We recommend that you provision at least two CoreDNS pods. You must make sure that the number of CoreDNS pods is sufficient to handle DNS queries within the cluster.

The DNS QPS of CoreDNS is related to CPU usage. If DNS caching is enabled, a single vCPU can handle more than 10,000 DNS QPS. The DNS QPS required by different workloads may vary. You can evaluate the required DNS QPS based on the peak CPU usage of each CoreDNS pod. We recommend that you increase the number of CoreDNS pods if a CoreDNS pod occupies more than one vCPU during peak hours. If you cannot determine the peak CPU usage of each CoreDNS pod, you can set the ratio of CoreDNS pods to cluster nodes to 1:8. This way, a CoreDNS pod is added each time you add eight nodes to the cluster. The total number of CoreDNS pods must not exceed 10. If your cluster contains more than 100 nodes, we recommend that you use NodeLocal DNSCache. For more information, see Use NodeLocal DNSCache to optimize DNS resolution.
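To estimate the peak CPU usage of each CoreDNS pod, you can sample the live resource usage of the pods during peak hours, for example by running kubectl top. This assumes that the metrics-server component is installed in the cluster:
kubectl top pod -n kube-system -l k8s-app=kube-dns
If any pod in the output approaches one full vCPU, increase the number of CoreDNS pods.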

Note UDP does not support retransmission. If a CoreDNS pod is deleted or restarted, in-flight UDP packets may be dropped, and the cluster may experience DNS query timeouts or failures. If UDP packet loss caused by IPVS issues occurs on cluster nodes, these timeouts or failures may last for up to 5 minutes after a CoreDNS pod is deleted or restarted. For more information about how to resolve DNS query failures caused by IPVS issues, see What do I do if DNS resolutions fail due to IP Virtual Server (IPVS) errors?.

Schedule CoreDNS pods to proper nodes

When you deploy CoreDNS pods in a cluster, we recommend that you spread the pods across nodes in multiple zones. This prevents service disruptions when a single node or zone fails. By default, CoreDNS is configured with soft node-based anti-affinity, so some or all CoreDNS pods may still be deployed on the same node if nodes are insufficient. In this case, we recommend that you delete the affected CoreDNS pods so that they are rescheduled.

CoreDNS pods must not be deployed on cluster nodes whose CPU and memory resources are fully utilized. Otherwise, DNS QPS and response time are adversely affected.
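The following snippet sketches how you could harden the default scheduling by adding hard node anti-affinity and a soft zone spread constraint to the pod template (spec.template.spec) of the CoreDNS Deployment. It assumes that your CoreDNS pods carry the common k8s-app: kube-dns label, that your nodes have the standard topology.kubernetes.io/zone label, and that your cluster runs Kubernetes 1.18 or later for topologySpreadConstraints; verify these assumptions before applying the change:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns
      topologyKey: kubernetes.io/hostname   #At most one CoreDNS pod per node.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone  #Spread pods evenly across zones.
  whenUnsatisfiable: ScheduleAnyway         #Soft constraint: prefer spreading, but do not block scheduling.
  labelSelector:
    matchLabels:
      k8s-app: kube-dns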

Manually increase the number of CoreDNS pods

If the number of cluster nodes remains unchanged for a long period of time, you can run the following command to increase the number of CoreDNS pods:
kubectl scale --replicas={target} deployment/coredns -n kube-system
Note Replace {target} with the desired number of CoreDNS pods.

Dynamically increase the number of CoreDNS pods by using cluster-proportional-autoscaler

If the number of cluster nodes increases, you can use the following YAML template to deploy cluster-proportional-autoscaler and dynamically increase the number of CoreDNS pods:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-autoscaler
  namespace: kube-system
  labels:
    k8s-app: dns-autoscaler
spec:
  selector:
    matchLabels:
      k8s-app: dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: dns-autoscaler
    spec:
      serviceAccountName: admin
      containers:
      - name: autoscaler
        image: registry.cn-hangzhou.aliyuncs.com/acs/cluster-proportional-autoscaler:1.8.4
        resources:
          requests:
            cpu: "200m"
            memory: "150Mi"
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        - --configmap=dns-autoscaler
        - --nodelabels=type!=virtual-kubelet
        - --target=Deployment/coredns
        - --default-params={"linear":{"coresPerReplica":64,"nodesPerReplica":8,"min":2,"max":100,"preventSinglePointFailure":true}}
        - --logtostderr=true
        - --v=9
In the preceding example, a linear scaling policy is used. The number of CoreDNS pods is calculated based on the following formula: Replicas (pods) = Max(Ceil(Cores × 1/coresPerReplica), Ceil(Nodes × 1/nodesPerReplica)). The number of CoreDNS pods is bounded by the values of max and min in the linear scaling policy. The following code block shows the parameters of the linear scaling policy:
{
  "coresPerReplica": 64,
  "nodesPerReplica": 8,
  "min": 2,
  "max": 100,
  "preventSinglePointFailure": true
}
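For example, in a cluster with 20 nodes and 320 vCPUs in total, the formula yields Replicas = Max(Ceil(320/64), Ceil(20/8)) = Max(5, 3) = 5 CoreDNS pods, which falls within the bounds set by min and max.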

Use HPA to increase the number of CoreDNS pods based on CPU utilization

Horizontal Pod Autoscaler (HPA) based on CPU utilization frequently triggers scale-in activities for CoreDNS pods, which can cause the DNS query timeouts or failures described in the preceding note. Therefore, we recommend that you do not use HPA for CoreDNS. If HPA is required in specific scenarios, you can refer to the following policy configuration based on CPU utilization:
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: coredns-hpa
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
Note For more information about how to use HPA, see HPA.

Properly configure CoreDNS

ACK provides only the default settings for CoreDNS. You can modify the parameters to optimize the settings so that CoreDNS can provide normal DNS services for your client pods. You can modify the configurations of CoreDNS on demand. For more information, see Configure DNS resolution and the CoreDNS official documentation.

The default configurations of earlier CoreDNS versions that are deployed together with your Kubernetes clusters may pose risks. We recommend that you check and optimize the configurations by using the following methods:

Disable the affinity settings of the kube-dns Service

Session affinity settings may cause loads to be unevenly distributed across CoreDNS pods. To disable the session affinity settings, perform the following steps:
  • Use the ACK console
    1. Log on to the ACK console.
    2. In the left-side navigation pane, click Clusters.
    3. On the Clusters page, find the cluster that you want to manage and click its name or click Details in the Actions column.
    4. In the left-side navigation pane of the details page, choose Network > Services.
    5. In the kube-system namespace, find the kube-dns Service and click View in YAML in the Actions column.
      • If the value of the sessionAffinity field is None, skip the following steps.
      • If the value of the sessionAffinity field is ClientIP, perform the following steps.
    6. Delete sessionAffinity, sessionAffinityConfig, and all of the subfields. Then, click Update.
      #Delete the following content.
      sessionAffinity: ClientIP
      sessionAffinityConfig:
        clientIP:
          timeoutSeconds: 10800
    7. Find the kube-dns Service and click View in YAML in the Actions column again to check whether the value of the sessionAffinity field is None. If the value is None, the kube-dns Service is modified.
  • Use the CLI
    1. Run the following command to query the configurations of the kube-dns Service:
      kubectl -n kube-system get svc kube-dns -o yaml
      • If the value of the sessionAffinity field is None, skip the following steps.
      • If the value of the sessionAffinity field is ClientIP, perform the following steps.
    2. Run the following command to modify the kube-dns Service:
      kubectl -n kube-system edit service kube-dns
    3. Delete all of the fields that are related to sessionAffinity, including sessionAffinity, sessionAffinityConfig, and all of the subfields. Then, save the change and exit.
      #Delete the following content.
      sessionAffinity: ClientIP
      sessionAffinityConfig:
        clientIP:
          timeoutSeconds: 10800
    4. After you modify the kube-dns Service, run the following command again to check whether the value of the sessionAffinity field is None. If the value is None, the kube-dns Service is modified.
      kubectl -n kube-system get svc kube-dns -o yaml
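Alternatively, you can remove the session affinity settings with a single patch command. This is a sketch that achieves the same result as the preceding steps; verify the outcome by running the get command again:
kubectl -n kube-system patch service kube-dns -p '{"spec":{"sessionAffinity":"None"}}'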

Disable the autopath plug-in

The autopath plug-in is enabled in some earlier CoreDNS versions and may cause DNS resolution errors in specific scenarios. If the autopath plug-in is enabled, you must disable it in the coredns ConfigMap. For more information, see #3765.

Note After you disable the autopath plug-in, the number of DNS queries sent from clients may increase by up to three times, and the amount of time required to resolve a domain name may increase accordingly. Pay close attention to the load on CoreDNS and the impact on your business.
  1. Run the kubectl -n kube-system edit configmap coredns command to modify the coredns ConfigMap.
  2. Delete the autopath @kubernetes line (see the example after these steps). Then, save the change and exit.
  3. Check the status and logs of the CoreDNS pods. If the log data contains the reload keyword, the new configuration is loaded.
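For reference, a typical Corefile with the autopath plug-in enabled looks similar to the following sketch; the zone and other plug-in details vary by cluster. Deleting the autopath line disables the plug-in:
.:53 {
    errors
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified
    }
    autopath @kubernetes   #Delete this line to disable the plug-in.
    forward . /etc/resolv.conf
}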

Configure graceful shutdown for CoreDNS

Note CoreDNS may consume additional memory resources when it reloads the updated coredns ConfigMap. After you modify the coredns ConfigMap, check the status of the CoreDNS pods. If the memory resources of the pods are exhausted, increase the memory limit of the pods in the CoreDNS Deployment. We recommend that you change the memory limit to 2 GB.
  • Use the ACK console
    1. Log on to the ACK console.
    2. In the left-side navigation pane, click Clusters.
    3. On the Clusters page, find the cluster that you want to manage and click its name or click Details in the Actions column.
    4. In the left-side navigation pane of the details page, choose Configurations > ConfigMaps.
    5. Select the kube-system namespace. Find the coredns ConfigMap and click Edit YAML in the Actions column.
    6. Refer to the following YAML content and make sure that the health plug-in is enabled. Then, set lameduck to 15s and click OK.
      .:53 {
          errors
          #The setting of the health plug-in may vary based on the CoreDNS version.
          #Scenario 1: The health plug-in is disabled by default.
          #Scenario 2: The health plug-in is enabled by default but lameduck is not set.
          health
          #Scenario 3: The health plug-in is enabled by default and lameduck is set to 5s.
          health {
              lameduck 5s
          }
          #In all of the preceding scenarios, change the value of lameduck to 15s.
          health {
              lameduck 15s
          }
          #You do not need to modify other plug-ins.
      }

    If the CoreDNS pods run normally, CoreDNS can be gracefully shut down. If the CoreDNS pods do not run normally, you can check the pod events and logs to identify the cause.

  • Use the CLI
    1. Run the following command to open the coredns ConfigMap:
      kubectl -n kube-system edit configmap coredns
    2. Refer to the following YAML content and make sure that the health plug-in is enabled. Then, set lameduck to 15s.
      .:53 {
          errors
          #The setting of the health plug-in may vary based on the CoreDNS version.
          #Scenario 1: The health plug-in is disabled by default.
          #Scenario 2: The health plug-in is enabled by default but lameduck is not set.
          health
          #Scenario 3: The health plug-in is enabled by default and lameduck is set to 5s.
          health {
              lameduck 5s
          }
          #In all of the preceding scenarios, change the value of lameduck to 15s.
          health {
              lameduck 15s
          }
          #You do not need to modify other plug-ins.
      }
    3. After you modify the coredns ConfigMap, save the change and exit.

      If the CoreDNS pods run normally, CoreDNS can be gracefully shut down. If the CoreDNS pods do not run normally, you can check the pod events and logs to identify the cause.

Configure the default protocol for the forward plug-in and upstream DNS servers of VPC

NodeLocal DNSCache uses the TCP protocol to communicate with CoreDNS. CoreDNS communicates with the upstream DNS servers by using the same protocol as the inbound DNS query. Therefore, DNS queries that a client pod sends for external domain names pass through NodeLocal DNSCache and CoreDNS, and then arrive at the DNS servers in the virtual private cloud (VPC) over TCP. The IP addresses of these DNS servers are 100.100.2.136 and 100.100.2.138, which are automatically configured on Elastic Compute Service (ECS) instances.

DNS servers in a VPC provide limited support for TCP. If you use NodeLocal DNSCache, you must modify the configurations of CoreDNS so that CoreDNS uses UDP to communicate with the upstream DNS servers. This prevents DNS resolution issues. To do this, modify the coredns ConfigMap in the kube-system namespace: in the setting of the forward plug-in, set the protocol that is used to query upstream servers to prefer_udp so that CoreDNS preferentially uses UDP. You can modify the setting based on the following example:
#The original settings.
forward . /etc/resolv.conf
#The modified settings.
forward . /etc/resolv.conf {
  prefer_udp
}
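After you save the change, you can confirm that CoreDNS has loaded the new configuration by checking the CoreDNS logs for the reload keyword, for example:
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=100 | grep -i reload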