DNS is one of the most important basic services in a Kubernetes cluster. DNS resolution timeouts and failures may occur because of improper client settings or in large-scale clusters. This topic describes best practices for DNS in Kubernetes clusters to help you prevent these issues.
Notes
This topic does not apply to ACK clusters that use the managed edition of CoreDNS or have the intelligent hosting mode enabled. The managed edition of CoreDNS automatically performs elastic scaling based on the load and does not require manual adjustments.
Contents
DNS best practices cover both the client and server sides:
On the client side, you can reduce resolution latency by optimizing domain name resolution requests. You can also reduce resolution failures using proper container images, proper node operating systems, and NodeLocal DNSCache.
On the CoreDNS server side, you can identify DNS exceptions and quickly locate their root causes by monitoring the running status of CoreDNS. You can also improve the high availability and queries per second (QPS) throughput of CoreDNS in the cluster by properly adjusting the deployment of CoreDNS.
For more information about CoreDNS, see the official CoreDNS documentation.
Optimize domain name resolution requests
Domain name resolution requests are one of the most frequent network behaviors in Kubernetes. Many of these requests can be optimized or avoided. You can optimize domain name resolution requests in the following ways:
(Recommended) Use a connection pool: If a container application needs to frequently request another service, we recommend that you use a connection pool. A connection pool caches connections to the upstream service in memory, which avoids the overhead of domain name resolution and TCP connection setup on each access.
Use an asynchronous or long polling pattern to obtain the IP address corresponding to a domain name.
Use a DNS cache:
(Recommended) If your application cannot be modified to connect to another service using a connection pool, you can cache DNS resolution results on the application side. For more information, see Use NodeLocal DNSCache.
If you cannot use NodeLocal DNSCache, you can use the built-in Name Service Cache Daemon (NSCD) in the container. For more information about how to use NSCD, see Use NSCD in a Kubernetes cluster.
Optimize the resolv.conf file: Because of how the ndots and search parameters in the resolv.conf file work, the way a domain name is written in a container determines how efficiently it is resolved. For more information about the mechanisms of the ndots and search parameters, see Configure DNS policies and resolve domain names.
Optimize domain name configurations: If an application in a container needs to access a domain name, configure the domain name according to the following principles to minimize the number of resolution attempts and reduce the time consumed.
If a pod accesses a service in the same namespace, we recommend that you use the <service-name> format, where service-name is the name of the service.

If a pod accesses a service in a different namespace, we recommend that you use the <service-name>.<namespace-name> format, where namespace-name is the namespace where the service resides.

If a pod accesses a domain name outside the cluster, use a fully qualified domain name (FQDN) to prevent multiple unnecessary lookups caused by appending search domains. An FQDN is specified by adding a trailing period (.) to the domain name. For example, to access www.aliyun.com, use the FQDN www.aliyun.com..

In clusters of version 1.33 or later, you can configure the search domain as a single "." to achieve a similar effect. For more information, see issue 125883 on GitHub:

dnsPolicy: None
dnsConfig:
  nameservers: ["192.168.0.10"]   # The IP address 192.168.0.10 is the clusterIP of the coredns service. Change it based on your actual environment.
  searches:
    - .
    - default.svc.cluster.local   # Note: Change default to the actual namespace name.
    - svc.cluster.local
    - cluster.local

After you apply the preceding configuration, the /etc/resolv.conf file in the pod is as follows:

search . default.svc.cluster.local svc.cluster.local cluster.local
nameserver 192.168.0.10

As you can see, "." is the first search domain. As a result, a domain name request always treats the target domain name as an FQDN and first attempts to resolve the domain name itself, which avoids invalid search expansions.

Important: You must set dnsPolicy to None for the preceding configuration to take effect.
Understand DNS configurations in containers
Different DNS resolvers may have minor differences in their implementations. You may encounter cases where `dig <domain name>` works but `ping <domain name>` fails.
We recommend that you do not use Alpine as the base image. The musl libc library built into the Alpine container image has some differences from the standard glibc implementation, which can cause issues such as the following. You can use other base images, such as Debian or CentOS.
Alpine 3.18 and earlier do not support fallback to the TCP protocol for truncated responses.
In Alpine 3.3 and earlier, the search parameter is not supported. This means search domains are not supported and service discovery cannot be completed.
Concurrent requests to multiple DNS servers configured in /etc/resolv.conf cause NodeLocal DNSCache optimizations to fail.
Concurrently requesting A and AAAA records using the same socket triggers conntrack source port conflicts on older kernels, which causes packet loss.
For more information about these issues, see the musl libc documentation.
If you are using a Go application, you should understand the differences between DNS resolvers in CGO and Pure Go implementations.
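For example, you can switch the resolver used by a Go application by setting the GODEBUG environment variable, without changing code. The following pod manifest is a minimal sketch; the pod name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: go-app-dns-demo                       # Hypothetical name.
spec:
  containers:
  - name: app
    image: example-registry/go-app:latest     # Placeholder image.
    env:
    - name: GODEBUG
      value: netdns=go    # Force the pure Go resolver. Use netdns=cgo for the CGO resolver.
                          # Append +1 (for example, netdns=go+1) to print resolver debug logs.

The pure Go resolver reads /etc/resolv.conf directly, while the CGO resolver calls the C library of the base image, so their behavior can differ on images such as Alpine.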
Avoid intermittent DNS resolution timeouts caused by IPVS bugs
If you use IPVS as the kube-proxy load balancing mode, you may encounter intermittent DNS resolution timeouts when CoreDNS is scaled in or restarted. This issue is caused by a Linux kernel bug. For more information, see the IPVS documentation.
You can use one of the following methods to mitigate the impact of the IPVS bug:
Use NodeLocal DNSCache. For more information, see Use NodeLocal DNSCache.
Modify the IPVS UDP session persistence timeout in kube-proxy. For more information, see How do I modify the IPVS UDP session persistence timeout in kube-proxy?.
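For reference, upstream kube-proxy exposes the IPVS UDP timeout in its KubeProxyConfiguration. The following is a minimal sketch based on the upstream field names; the exact ConfigMap layout and supported fields depend on your cluster and kube-proxy version, so follow the linked FAQ for ACK-specific steps:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
ipvs:
  udpTimeout: 10s   # Shorten IPVS UDP session persistence (the kernel default is 300s).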
Use NodeLocal DNSCache
In some cases, CoreDNS may have the following issues:
In rare cases, packet loss may occur because of concurrent A and AAAA queries, which leads to DNS resolution failures.
The node conntrack table is full, which causes packet loss and DNS resolution failures.
To improve the stability and performance of DNS services in the cluster, we recommend that you install the NodeLocal DNSCache component. This component improves the DNS performance of the cluster by running a DNS cache on the cluster nodes. For more information about NodeLocal DNSCache and how to deploy it in an ACK cluster, see Use the NodeLocal DNSCache component.
After you install NodeLocal DNSCache, you must inject the DNS cache configuration into pods. You can run the following command to add a label to a specified namespace. The DNS cache configuration is then automatically injected into new pods created in this namespace. For more information about other injection methods, see the referenced document.
kubectl label namespace default node-local-dns-injection=enabled
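After the label is added, you can verify the injection on a newly created pod. The following is a minimal sketch; the pod name and image are placeholders, and 169.254.20.10 is the common default NodeLocal DNSCache address, which you should confirm against your component configuration:

# Create a test pod in the labeled namespace, then inspect its DNS configuration.
kubectl -n default run dns-test --image=nginx --restart=Never
kubectl -n default exec dns-test -- cat /etc/resolv.conf
# Expected: the nameserver points to the NodeLocal DNSCache address (169.254.20.10 by default)
# instead of the kube-dns clusterIP.

Use a proper CoreDNS version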
CoreDNS provides good backward compatibility with Kubernetes versions. We recommend that you keep CoreDNS updated to the latest stable version. The Component Management page in the ACK console lets you install, upgrade, and configure CoreDNS. You can check the status of the CoreDNS component on the Component Management page. If an upgrade is available for the CoreDNS component, upgrade it during off-peak hours.
For more information about how to upgrade, see Automatically upgrade CoreDNS.
For more information about the release notes of CoreDNS, see CoreDNS.
CoreDNS versions earlier than v1.7.0 have potential risks, such as the following:
If the connectivity between CoreDNS and the API server is abnormal (for example, the API server is restarted or migrated, or network jitter occurs), CoreDNS restarts because it fails to write error logs. For more information, see Set klog's logtostderr flag.
CoreDNS consumes extra memory at startup. The default memory limit may trigger out-of-memory (OOM) issues in large-scale clusters. In severe cases, CoreDNS pods may repeatedly restart and fail to recover automatically. For more information, see CoreDNS uses a lot memory during initialization phase.
CoreDNS has several issues that may affect the resolution of headless service domain names and domain names outside the cluster. For more information, see plugin/kubernetes: handle tombstones in default processor and Data is not synced when CoreDNS reconnects to kubernetes api server after protracted disconnection.
If a cluster node is abnormal, the default toleration policy used by some earlier versions of CoreDNS may cause CoreDNS pods to be deployed on the abnormal node. These pods cannot be automatically evicted, which leads to abnormal domain name resolution.
The recommended minimum CoreDNS version varies based on the Kubernetes cluster version, as shown in the following table.
| Cluster version | Recommended minimum CoreDNS version |
| --- | --- |
| Earlier than 1.14.8 | v1.6.2 (no longer maintained) |
| 1.14.8 or later, earlier than 1.20.4 | v1.7.0.0-f59c03d-aliyun |
| 1.20.4 or later, earlier than 1.21.0 | v1.8.4.1-3a376cc-aliyun |
| 1.21.0 or later | v1.11.3.2-f57ea7ed6-aliyun |
Monitor the running status of CoreDNS
Metrics
CoreDNS exposes health metrics, such as resolution results, through a standard Prometheus interface. You can use these metrics to detect exceptions on the CoreDNS server and even on upstream DNS servers.
Prometheus for Alibaba Cloud provides built-in metrics monitoring and alert rules for CoreDNS. You can enable the Prometheus and dashboard features in the Container Service for Kubernetes (ACK) console. For more information, see Monitor the CoreDNS component.
If you use a self-managed Prometheus instance to monitor your Kubernetes cluster, you can observe relevant metrics in Prometheus and set alerts for key metrics. For more information, see the official CoreDNS Prometheus documentation.
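For example, the following alert rules are a minimal sketch for a self-managed Prometheus instance. They use standard metric names exposed by the CoreDNS prometheus plugin; the thresholds are illustrative and should be tuned for your cluster:

groups:
- name: coredns
  rules:
  - alert: CoreDNSPanic
    expr: increase(coredns_panics_total[5m]) > 0   # Any panic indicates a serious server problem.
    labels:
      severity: critical
    annotations:
      summary: CoreDNS panicked in the last 5 minutes.
  - alert: CoreDNSHighServfailRate
    expr: sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m])) / sum(rate(coredns_dns_responses_total[5m])) > 0.05
    labels:
      severity: warning
    annotations:
      summary: More than 5% of DNS responses are SERVFAIL.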
Running logs
In the event of a DNS exception, CoreDNS logs can help you quickly diagnose the root cause. We recommend that you enable CoreDNS domain name resolution logs and their SLS log collection. For more information, see Analyze and monitor CoreDNS logs.
Kubernetes event delivery
In CoreDNS v1.9.3.6-32932850-aliyun and later, you can enable the k8s_event plugin to deliver key CoreDNS logs to Event Hub as Kubernetes events. For more information about the k8s_event plugin, see k8s_event.
This feature is enabled by default for newly deployed CoreDNS. If you upgrade from an earlier version of CoreDNS to v1.9.3.6-32932850-aliyun or later, you must manually modify the configuration file to enable this feature.
Run the following command to open the CoreDNS configuration file:

kubectl -n kube-system edit configmap/coredns

Add the kubeapi and k8s_event plugins:

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 15s
        }
        # Add the following content (ignore other differences).
        kubeapi
        k8s_event {
            level info error warning   # Deliver key logs at the info, error, and warning levels.
        }
        # End of added content.
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods verified
            fallthrough in-addr.arpa ip6.arpa
        }
        # The following content is omitted.
    }

Check the running status and logs of the CoreDNS pod. If the logs contain the word reload, the modification is successful.
Ensure the high availability of CoreDNS
CoreDNS is the authoritative DNS in the cluster. A CoreDNS failure will cause access to services within the cluster to fail, which may lead to widespread service unavailability. You can take the following measures to ensure the high availability of CoreDNS:
Evaluate the pressure on the CoreDNS component
You can perform a DNS stress test in the cluster to evaluate the load on the component. Many open source tools, such as DNSPerf, can help you achieve this. If you cannot accurately evaluate the DNS load in the cluster, you can refer to the following recommended standards.
We recommend that you set the number of CoreDNS pods to at least 2. The resource limit of a single pod should not be less than 1 core and 1 GB.
The domain name resolution queries per second (QPS) that CoreDNS can provide is positively correlated with CPU consumption. If NodeLocal DNSCache is used, each CPU core can support more than 10,000 QPS for domain name resolution requests. The QPS requirements for domain name requests vary greatly among different types of services. You can observe the peak CPU usage of each CoreDNS pod. If it occupies more than one CPU core during peak business hours, we recommend that you scale out the CoreDNS replicas. If you cannot determine the peak CPU usage, you can conservatively deploy pods at a ratio of 1 pod to 8 cluster nodes. That is, for every 8 cluster nodes added, one CoreDNS pod is added.
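For example, you can run a quick stress test with dnsperf from a test machine or pod inside the cluster. The following is a minimal sketch; the query file and server address are placeholders, and you should not run this against a production cluster during peak hours:

# queries.txt contains one query per line, for example:
#   nginx-service.default.svc.cluster.local A
#   www.aliyun.com A
dnsperf -s 192.168.0.10 -d queries.txt -l 60 -Q 10000
# -s: DNS server address (the kube-dns clusterIP here); -l: test duration in seconds;
# -Q: maximum queries per second to send. Observe CoreDNS CPU usage during the test.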
Adjust the number of CoreDNS pods
The number of CoreDNS pods directly determines the computing resources that CoreDNS can use. You can adjust the number of CoreDNS pods based on the evaluation results.
Because the UDP protocol lacks a retransmission mechanism, if there is a risk of packet loss on a cluster node due to the IPVS UDP bug, scaling in or restarting CoreDNS pods may cause domain name resolution timeouts or exceptions for the entire cluster for up to five minutes. For more information about the solution to resolution exceptions caused by the IPVS bug, see Troubleshoot DNS resolution failures.
Automatically adjust based on the recommended policy
You can deploy the dns-autoscaler component, which automatically adjusts the number of CoreDNS pods based on a recommended policy, such as one pod for every eight cluster nodes. The number of replicas is calculated using the formula replicas = max(ceil(cores × 1/coresPerReplica), ceil(nodes × 1/nodesPerReplica)), and is constrained by the max and min values.
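The dns-autoscaler is typically based on the cluster-proportional-autoscaler, whose linear mode reads its parameters from a ConfigMap. The following is a minimal sketch with illustrative values; check the ConfigMap name that your dns-autoscaler deployment actually references before editing:

kind: ConfigMap
apiVersion: v1
metadata:
  name: dns-autoscaler
  namespace: kube-system
data:
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 8,
      "min": 2,
      "max": 100,
      "preventSinglePointFailure": true
    }

Manually adjust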
You can run the following command to manually adjust the number of CoreDNS pods.
kubectl scale --replicas={target} deployment/coredns -n kube-system   # Replace target with the target number of pods.

Do not use workload autoscaling
Although workload autoscaling features, such as horizontal pod autoscaling (HPA) and CronHPA, can also automatically adjust the number of pods, they trigger frequent scaling operations. Because pod scale-in can cause the resolution exceptions described earlier, do not use workload autoscaling to control the number of CoreDNS pods.
Adjust CoreDNS pod specifications
Another way to adjust CoreDNS resources is to adjust pod specifications. In an ACK Pro cluster, the default memory limit for a CoreDNS pod is 2 GiB, and there is no CPU limit. We recommend that you set the CPU limit to 4096m, and the minimum value should not be less than 1024m. You can adjust the CoreDNS pod configuration in the console.
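If you manage the deployment directly instead of through the console, these settings map to the resources field of the CoreDNS container. The following excerpt is a minimal sketch reflecting the recommendations above; the values are recommended bounds, not mandatory:

# In the coredns deployment (kube-system namespace), container spec excerpt:
resources:
  requests:
    cpu: "1"        # Do not go below 1 core per pod.
    memory: 1Gi
  limits:
    cpu: "4"        # Recommended CPU limit (4096m).
    memory: 2Gi     # Default memory limit in ACK Pro clusters.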
Schedule CoreDNS pods
Incorrect scheduling configurations may cause CoreDNS pods to fail to be deployed, which results in a CoreDNS failure. Before you perform this operation, make sure that you are familiar with scheduling.
We recommend that you deploy CoreDNS pods on different cluster nodes across different zones to avoid single-node and single-zone failures. CoreDNS versions earlier than v1.8.4.3 are configured with weak (preferred) node anti-affinity by default, so some or all pods may be scheduled onto the same node when node resources are insufficient. If this occurs, you can delete the pods to trigger rescheduling, or upgrade the component to the latest version. CoreDNS versions earlier than v1.8 are no longer maintained. We recommend that you upgrade them as soon as possible.
The cluster nodes where CoreDNS runs should not have full CPU and memory usage. Otherwise, the QPS and response latency of domain name resolution will be affected. If cluster node conditions permit, you can consider using custom parameters to schedule CoreDNS to independent cluster nodes to provide stable domain name resolution services.
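If you customize scheduling, the following snippet is a minimal sketch of required (hard) node anti-affinity for CoreDNS pods. It assumes the pods carry the common k8s-app: kube-dns label; verify the label on your deployment before applying:

# In the coredns deployment pod template:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns
      topologyKey: kubernetes.io/hostname   # At most one CoreDNS pod per node.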
Optimize CoreDNS configurations
ACK provides only the default configurations for CoreDNS. You should pay attention to the parameters in the configuration and optimize them to ensure that CoreDNS can provide DNS services for your application containers. The CoreDNS configuration is highly flexible. For more information, see Configure DNS policies and resolve domain names and the official CoreDNS documentation.
The default CoreDNS configuration deployed with earlier versions of Kubernetes clusters may have some risks. We recommend that you check and optimize the configuration as described in the following sections.
You can also check the CoreDNS configuration file using the scheduled inspection and fault diagnosis features of AIOps. If the inspection result from AIOps indicates that the CoreDNS ConfigMap configuration is abnormal, check the preceding items one by one.
CoreDNS may consume extra memory when it refreshes the configuration. After you modify the CoreDNS configuration items, observe the running status of the pod. If the pod has insufficient memory, modify the container memory limit in the CoreDNS deployment in a timely manner. We recommend that you adjust the memory to 2 GB.
Disable the affinity configuration of the kube-dns service
The affinity configuration may cause large load differences between CoreDNS replicas. We recommend that you disable it by following these steps:
Use the console
Log on to the ACK console. In the navigation pane on the left, click Clusters.
On the Clusters page, find the cluster you want and click its name. In the left-side navigation pane, choose Network > Services.
In the kube-system namespace, click Edit YAML for the kube-dns service.
If the value of the sessionAffinity field is None, you do not need to perform the following steps.

If the value of the sessionAffinity field is ClientIP, perform the following steps:

Delete the sessionAffinity and sessionAffinityConfig fields and all their sub-keys. Then, click Update.

# Delete all of the following content.
sessionAffinity: ClientIP
sessionAffinityConfig:
  clientIP:
    timeoutSeconds: 10800

Click Edit YAML to the right of the kube-dns service again and check whether the value of the sessionAffinity field is None. If the value is None, the kube-dns service has been successfully modified.
Use the command line
Run the following command to view the configuration information of the kube-dns service.
kubectl -n kube-system get svc kube-dns -o yaml

If the value of the sessionAffinity field is None, you do not need to perform the following steps.

If the value of the sessionAffinity field is ClientIP, perform the following steps:

Run the following command to open and edit the service named kube-dns:

kubectl -n kube-system edit service kube-dns

Delete the sessionAffinity-related settings (sessionAffinity, sessionAffinityConfig, and all their sub-keys), and then save and exit.

# Delete all of the following content.
sessionAffinity: ClientIP
sessionAffinityConfig:
  clientIP:
    timeoutSeconds: 10800

After the modification, run the following command again to check whether the value of the sessionAffinity field is None. If the value is None, the kube-dns service has been successfully modified.

kubectl -n kube-system get svc kube-dns -o yaml
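Alternatively, the same change can be made non-interactively with a JSON merge patch, which removes the sessionAffinityConfig field by setting it to null. This is a sketch; verify the result with the get command above:

kubectl -n kube-system patch service kube-dns --type merge \
  -p '{"spec":{"sessionAffinity":"None","sessionAffinityConfig":null}}'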
Disable the Autopath plugin
Some earlier versions of CoreDNS have the Autopath plugin enabled. This plugin may cause incorrect resolution results in some extreme scenarios. You should check whether it is enabled and edit the configuration file to disable it. For more information, see Autopath.
After you disable the Autopath plugin, the QPS of domain name resolution requests from the client can increase by up to three times, and the time consumed to resolve a single domain name can also increase by up to three times. Pay attention to the CoreDNS load and business impact.
Run the kubectl -n kube-system edit configmap coredns command to open the CoreDNS configuration file.

Delete the autopath @kubernetes line, then save and exit.

Check the running status and logs of the CoreDNS pod. If the logs contain the word reload, the modification is successful.
Configure graceful shutdown for CoreDNS
lameduck is a mechanism in CoreDNS that implements graceful shutdown. If CoreDNS needs to be stopped or restarted, this mechanism ensures that requests being processed can be completed normally without sudden interruption. The lameduck mechanism works as follows:
When the CoreDNS process is about to terminate, it enters lameduck mode.

In lameduck mode, CoreDNS stops accepting new requests but continues to process requests it has already received, until all of them are completed or the lameduck timeout period is exceeded.
Use the console
Log on to the ACK console. In the navigation pane on the left, click Clusters.
On the Clusters page, click the name of the cluster that you want to change. In the left-side navigation pane, choose Configurations > ConfigMaps.
In the kube-system namespace, click Edit YAML for the coredns ConfigMap.
In the following CoreDNS configuration file, make sure that the health plugin is enabled, set the lameduck timeout to 15s, and then click OK.
.:53 {
errors
# The health plugin may have different settings in different CoreDNS versions.
# Scenario 1: The health plugin is not enabled by default.
# Scenario 2: The health plugin is enabled by default, but the lameduck time is not set.
# health
# Scenario 3: The health plugin is enabled by default, and the lameduck time is set to 5s.
# health {
# lameduck 5s
# }
# For the preceding three scenarios, you should uniformly modify the configuration as follows and adjust the lameduck parameter to 15s.
health {
lameduck 15s
}
# Other plugins do not need to be modified and are omitted here.
}

If the CoreDNS pod runs normally, the graceful shutdown configuration of CoreDNS is successfully updated. If the CoreDNS pod is abnormal, you can locate the cause by viewing the pod events and logs.
Use the command line
Run the following command to open the CoreDNS configuration file:

kubectl -n kube-system edit configmap/coredns

Refer to the following Corefile, make sure that the health plugin is enabled, and adjust the lameduck parameter to 15s. After you modify the CoreDNS configuration file, save and exit.

If CoreDNS runs normally, the graceful shutdown configuration of CoreDNS is successfully updated. If the CoreDNS pod is abnormal, you can locate the cause by viewing the pod events and logs.

.:53 {
errors
# The health plugin may have different settings in different CoreDNS versions.
# Scenario 1: The health plugin is not enabled by default.
# Scenario 2: The health plugin is enabled by default, but the lameduck time is not set.
# health
# Scenario 3: The health plugin is enabled by default, and the lameduck time is set to 5s.
# health {
# lameduck 5s
# }
# For the preceding three scenarios, you should uniformly modify the configuration as follows and adjust the lameduck parameter to 15s.
health {
lameduck 15s
}
# Other plugins do not need to be modified and are omitted here.
}

Configure the default protocol for the forward plugin and upstream VPC DNS servers
NodeLocal DNSCache uses the TCP protocol to communicate with CoreDNS. CoreDNS communicates with upstream DNS servers using the protocol that is used by the source of the request. Therefore, by default, requests to resolve domain names outside the cluster from application containers pass through NodeLocal DNSCache and CoreDNS, and are finally sent over TCP to the VPC DNS servers. The VPC DNS servers are the two IP addresses, 100.100.2.136 and 100.100.2.138, that are configured by default on ECS instances.
The VPC DNS servers have limited support for the TCP protocol. If you use NodeLocal DNSCache, you must modify the CoreDNS configuration to prioritize the UDP protocol for communication with upstream DNS servers to avoid resolution failures. We recommend that you modify the CoreDNS configuration by modifying the ConfigMap named coredns in the kube-system namespace. For more information, see Manage ConfigMaps. Specify prefer_udp as the protocol for requesting the upstream in the forward plugin. After the modification, CoreDNS prioritizes the UDP protocol for communication with the upstream. The modification is as follows:
# Before modification
forward . /etc/resolv.conf
# After modification
forward . /etc/resolv.conf {
prefer_udp
}

Configure the ready plugin for readiness probes
For CoreDNS versions 1.5.0 and later, you must configure the ready plugin to enable readiness probes.
Run the following command to open the CoreDNS configuration file.
kubectl -n kube-system edit configmap/coredns

Check whether the file contains the ready line. If not, add the ready line. Press the Esc key, enter :wq!, and then press Enter to save the modified configuration file and exit edit mode.

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 15s
        }
        ready   # If this line does not exist, add it. The indentation must match that of the other plugins.
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods verified
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
            prefer_udp
        }
        cache 30
        loop
        log
        reload
        loadbalance
    }

Check the running status and logs of the CoreDNS pod. If the logs contain the word reload, the modification is successful.
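You can also verify readiness directly: the ready plugin serves an HTTP endpoint on port 8181 that returns 200 OK once all plugins that signal readiness are ready. A minimal sketch (the pod IP is a placeholder):

# Confirm that the CoreDNS pods report Ready.
kubectl -n kube-system get pods -l k8s-app=kube-dns
# Or query the readiness endpoint from a node or a debug pod.
curl -i http://<coredns-pod-ip>:8181/ready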
Configure the multisocket plugin to enhance CoreDNS resolution performance
CoreDNS v1.12.1 introduced the multisocket plugin. Enabling this plugin allows CoreDNS to listen on the same port using multiple sockets, which enhances CoreDNS performance in scenarios with high CPU usage. For a detailed introduction to the plugin, see the community documentation.
You need to enable multisocket through the coredns ConfigMap:
.:53 {
...
prometheus :9153
multisocket [NUM_SOCKETS]
forward . /etc/resolv.conf
...
}

NUM_SOCKETS determines the number of sockets listening on the same port.
Configuration recommendation: Align the value of NUM_SOCKETS with the estimated CPU utilization, CPU resource limits, and available cluster resources. For example:
If CoreDNS consumes 4 cores at its peak and the available resources are 8 cores, set NUM_SOCKETS to 2.

If CoreDNS consumes 8 cores at its peak and the available resources are 64 cores, set NUM_SOCKETS to 8.
To determine the optimal configuration, we recommend that you test the QPS and load with different configurations.
If you do not specify NUM_SOCKETS, GOMAXPROCS is used by default, which is equal to the CPU limit of the CoreDNS pod. If the pod CPU limit is not set, it is equal to the number of CPU cores on the node where the pod resides.