Container Service for Kubernetes: Deployment recommendations for the stability and high performance of unmanaged CoreDNS

Last Updated: Dec 04, 2025

CoreDNS is the default DNS server in a cluster. It handles all domain name resolution for both in-cluster Services and external domains. If CoreDNS is unavailable, other services in the cluster are severely affected. The performance and availability requirements for CoreDNS vary by scenario, and the default configurations are not suitable for all scenarios. This topic provides recommendations on how to configure the CoreDNS component based on your business scenario.

Impact assessment

In unmanaged mode, CoreDNS is treated like any other workload in the cluster. Its availability and performance depend on factors such as the number of pods, resource limits, scheduling policies, and node distribution.

High loads or improper configurations directly affect the quality of DNS services in the cluster. CoreDNS may face two main types of problems:

  • Availability issues:

    • Incorrect configurations can prevent node-level and zone-level high availability, which creates a risk of a single point of failure.

    • Insufficient resources can cause pods to be evicted, which leads to service interruptions.

  • Performance issues:

    • Resource contention with other workloads on the same node increases response latency.

    • High node load causes network I/O packet loss, which leads to failed DNS requests.

Adjust the number of CoreDNS pods

Important
  • UDP packets have no retransmission mechanism. If UDP packet loss caused by IPVS defects occurs on cluster nodes, scaling in or restarting a CoreDNS pod can cause cluster-wide domain name resolution timeouts or failures that last for up to five minutes. For more information, see Troubleshooting DNS Resolution Issues.

  • Do not use workload autoscaling: Workload autoscaling features, such as horizontal pod autoscaling (HPA) and CronHPA, can automatically adjust the number of pods. However, they frequently perform scale-out and scale-in operations. Because scaling in pods can cause resolution failures, do not use workload autoscaling to control the number of CoreDNS pods.

Assess component pressure

Many open source tools, such as DNSPerf, can assess the overall DNS pressure within a cluster. If you cannot accurately assess the pressure, use the following guidelines:

  • In all cases, set the number of CoreDNS pods to at least 2. The resource limit for a single pod should be at least 1 CPU core and 1 GB of memory.

  • The number of queries per second (QPS) for domain name resolution that CoreDNS can provide is directly proportional to its CPU consumption. When NodeLocal DNSCache is used, each CPU core can support more than 10,000 QPS of domain name resolution requests. Because QPS requirements vary greatly among business types, observe the peak CPU usage of each CoreDNS pod. If a pod uses more than one CPU core during peak business hours, scale out the CoreDNS replicas. If you cannot determine the peak CPU usage, conservatively deploy one CoreDNS pod for every eight cluster nodes.
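
For example, in a hypothetical 40-node cluster where the peak CPU usage of CoreDNS cannot be measured, the conservative guideline gives 40 / 8 = 5 CoreDNS pods, each with a limit of at least 1 CPU core and 1 GB of memory.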

After you complete the evaluation, see Configure automatic adjustments (recommended) or Adjust manually to make the adjustments.

Configure automatic adjustments (recommended)

The cluster-proportional-autoscaler component automatically adjusts the number of CoreDNS pods in real time based on the recommended policy of one pod for every eight cluster nodes. Compared to HPA, it does not depend on CoreDNS CPU load metrics. It is suitable for services where the required number of replicas is proportional to the cluster size.

In the example, the formula to calculate the number of replicas is `replicas = max(ceil(cores × 1/coresPerReplica), ceil(nodes × 1/nodesPerReplica))`. The `min` and `max` parameters limit the number of pods to a minimum of 2 and a maximum of 100.

cluster-proportional-autoscaler

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-autoscaler
  namespace: kube-system
  labels:
    k8s-app: dns-autoscaler
spec:
  selector:
    matchLabels:
      k8s-app: dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: dns-autoscaler
    spec:
      serviceAccountName: admin
      containers:
      - name: autoscaler
        image: registry.cn-hangzhou.aliyuncs.com/acs/cluster-proportional-autoscaler:1.8.4
        resources:
          requests:
            cpu: "200m"
            memory: "150Mi"
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        - --configmap=dns-autoscaler
        - --nodelabels=type!=virtual-kubelet
        - --target=Deployment/coredns
        - --default-params={"linear":{"coresPerReplica":64,"nodesPerReplica":8,"min":2,"max":100,"preventSinglePointFailure":true}}
        - --logtostderr=true
        - --v=9
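
For example, with the parameters in the preceding configuration (coresPerReplica=64, nodesPerReplica=8, min=2, max=100), a hypothetical cluster with 24 nodes and 96 total CPU cores would run max(ceil(96 × 1/64), ceil(24 × 1/8)) = max(2, 3) = 3 CoreDNS pods.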

Adjust manually

You can also manually adjust the number of CoreDNS pods using the following command.

kubectl scale --replicas=<target> deployment/coredns -n kube-system # Replace <target> with the target number of pods.
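
After scaling, you can check the number and node distribution of the CoreDNS pods. For example, the following command lists them by the default k8s-app=kube-dns label:

kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide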

Adjust CoreDNS pod specifications

In an ACK Pro cluster, the default memory limit for a CoreDNS pod is 2 GiB, and no CPU limit is set. We recommend that you set the CPU limit to 4096m and do not set it lower than 1024m. You can adjust the CoreDNS pod configuration in the console.
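
For reference, after the adjustment the resources section of the coredns Deployment would look similar to the following sketch. The limits reflect the values described above; the requests are hypothetical and should be set based on your own baseline load.

resources:
  requests:
    cpu: 1000m      # hypothetical request; set according to your baseline load
    memory: 1Gi     # hypothetical request
  limits:
    cpu: 4096m      # recommended CPU limit; do not set it lower than 1024m
    memory: 2Gi     # default memory limit in ACK Pro clusters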

Important

Adjusting CoreDNS pod specifications might cause the pods to restart. This can cause occasional DNS latency timeouts and resolution failures in the cluster. Perform this operation during off-peak hours.

  1. Log on to the ACK console. In the navigation pane on the left, choose Clusters.

  2. On the Clusters page, click the name of the target cluster. In the navigation pane on the left, click Add-ons.

  3. Click the Networking tab and find the CoreDNS card. On the card, click Configuration.

  4. Modify the CoreDNS configuration and click OK.

Deploy on a dedicated node pool

Deploy the CoreDNS component pods to a dedicated node pool. This isolates the CoreDNS workload from other applications in the cluster and protects it from resource preemption.

Important

Scheduling CoreDNS pods to a dedicated node pool might cause the pods to restart. This can cause occasional DNS latency timeouts and resolution failures in the cluster. Perform this operation during off-peak hours.

Create a dedicated node pool for CoreDNS

Create a dedicated node pool for the CoreDNS pods. Note the following:

  • CoreDNS does not require high computing resources, but it does require high network performance. Select network-enhanced instances. A specification of 4 CPU cores and 8 GB of memory is recommended.

  • By default, CoreDNS runs two pods. The node pool therefore requires at least two nodes.

  • The node pool requires taints and labels to prevent other pods from being scheduled to it. For example, you can add the system-addon: system-addon key-value pair as both a taint and a label. Set the taint Effect to NoSchedule. You will use the taint and label in the next step.
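
If you manage the taint and label directly on existing nodes with kubectl instead of in the node pool configuration, the equivalent commands are similar to the following. The node name is a placeholder.

kubectl label nodes <node-name> system-addon=system-addon
kubectl taint nodes <node-name> system-addon=system-addon:NoSchedule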

For more information, see Create and manage node pools.

Configure the CoreDNS component to schedule pods

  1. On the Add-ons page, find the CoreDNS card and click Configuration.

  2. In the NodeSelector section, add the label of the dedicated node pool.

    Do not delete existing NodeSelector labels.

  3. In the Tolerations section, add a toleration that corresponds to the taint of the dedicated node pool.

  4. Click OK to save the component configuration. Then, run the following command to confirm that the CoreDNS pods are scheduled to the dedicated node pool.

    kubectl -n kube-system get pod -o wide --show-labels | grep coredns
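
For reference, with the example system-addon: system-addon taint and label from the previous section, the resulting scheduling fields in the coredns pod specification would be similar to the following sketch. Keep any existing nodeSelector labels in addition to the node pool label.

    nodeSelector:
      system-addon: system-addon
    tolerations:
    - key: system-addon
      operator: Equal
      value: system-addon
      effect: NoSchedule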

Use scheduling policies to achieve high availability for CoreDNS

Important

The node affinity, pod anti-affinity, and topology-aware scheduling policies for CoreDNS take effect only when the pods are scheduled. If the node or zone configuration changes afterward, go to the ACK console, find the coredns Deployment, and click Redeploy to ensure that the CoreDNS pods are redistributed for high availability.
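
If you prefer the command line, restarting the Deployment with kubectl generally has the same effect as clicking Redeploy in the console. This also restarts the CoreDNS pods, so perform it during off-peak hours:

kubectl -n kube-system rollout restart deployment coredns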

Pod anti-affinity

By default, CoreDNS deploys two pods and uses a pod anti-affinity policy to distribute them across different nodes, which provides node-level disaster recovery. For the pod anti-affinity policy to be effective, ensure that the cluster has at least two available nodes with sufficient resources to satisfy the pod resource requests. The nodes must not have the following labels:

  • k8s.aliyun.com: true (nodes with the node autoscaling feature enabled)

  • type: virtual-kubelet (virtual nodes)

  • alibabacloud.com/lingjun-worker: true (Lingjun nodes)

Default CoreDNS affinity configuration

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # virtual nodes have this label
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
              # lingjun worker nodes have this label
              - key: alibabacloud.com/lingjun-worker
                operator: NotIn
                values:
                - "true" 
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              # autoscaled nodes have this label
              - key: k8s.aliyun.com
                operator: NotIn
                values:
                - "true"
            weight: 100
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - kube-dns
            topologyKey: kubernetes.io/hostname

Topology-aware scheduling

By default, CoreDNS uses topology-aware scheduling to distribute pods as evenly as possible across different zones. If the balance condition cannot be met, scheduling is denied (`DoNotSchedule`). This provides zone-level disaster recovery. The mechanism has some limitations. To ensure that it works effectively, meet the following requirements:

  • Ensure that the nodes in your cluster are in at least two different zones. Each zone must have at least one node with sufficient resources where CoreDNS can be scheduled.

  • Ensure that the nodes have the correct and consistent topology.kubernetes.io/zone label, which they have by default. Otherwise, topology awareness may fail, or pods may be concentrated in a single zone.

  • Upgrade your cluster to v1.27 or later and upgrade CoreDNS to v1.12.1.3 or later. If the cluster version is earlier than v1.27, topology-aware scheduling does not support `matchLabelKeys`. When CoreDNS performs a rolling update, the final pod distribution might be uneven or might not cover all zones.

Default CoreDNS topology-aware scheduling policy

  • For CoreDNS versions earlier than v1.12.1.3, the topology-aware scheduling policy is as follows:

    topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns
      maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule

    The difference in the number of pods between different zones does not exceed maxSkew, which is 1 by default.

    Important

    During a CoreDNS rolling update, the behavior of the topology spread constraint (topologySpreadConstraints) can cause an uneven pod distribution.

    The root cause is that the labelSelector counts pods from both the new and old ReplicaSets. To satisfy maxSkew=1, the scheduler prioritizes scheduling new pods to the zone with the fewest total pods (new and old combined). After the rolling update is complete and the old pods are destroyed, the new pods might be concentrated in only a few zones. This results in an uneven final distribution that might not cover all available zones.

  • To address this issue, Kubernetes v1.27 and later provide the matchLabelKeys feature to optimize the topology constraint configuration. Therefore, in CoreDNS v1.12.1.3 and later, the topology-aware scheduling policy is adjusted as follows:

    topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns
      matchLabelKeys:
      - pod-template-hash
      nodeTaintsPolicy: Honor
      maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule