CoreDNS is the default DNS server in an ACK cluster. It resolves all in-cluster Service names and external domains. Because every workload depends on DNS, a CoreDNS failure causes widespread service disruption across the cluster.
In unmanaged mode, CoreDNS runs like any other workload. Its reliability depends on pod count, resource limits, scheduling policies, and node distribution. The default configuration is suitable for small or low-traffic clusters. For larger or production workloads, tune these settings based on your scale and availability requirements.
Potential impacts
Misconfigured or under-resourced CoreDNS leads to two categories of problems:
- Availability: incorrect scheduling configurations create node-level or zone-level single points of failure. Insufficient resource limits cause pod eviction, which interrupts DNS service.
- Performance: resource contention with other workloads on the same node increases response latency. High node load causes network I/O packet loss, leading to failed DNS requests.
Adjust the number of CoreDNS pods
Because UDP packets lack a retransmission mechanism, scaling in or restarting a CoreDNS pod — especially when IPVS defects cause UDP packet loss on cluster nodes — can trigger cluster-wide domain name resolution timeouts or failures that last up to five minutes. For details, see Troubleshooting DNS resolution issues.
Do not use horizontal pod autoscaling (HPA) or CronHPA to manage CoreDNS pod count. These controllers frequently scale in pods, which causes resolution failures.
Assess component pressure
Before adjusting replica count, assess your cluster's DNS pressure. Tools such as DNSPerf can measure overall DNS load.
If you cannot measure DNS pressure directly, use these guidelines:
- Run at least 2 CoreDNS pods, each with resource limits of at least 1 CPU core and 1 GB of memory.
- With NodeLocal DNSCache enabled, each CPU core handles over 10,000 queries per second (QPS). Monitor peak CPU usage per pod. If any pod consistently uses more than 1 CPU core during peak hours, scale out.
- Without load data, start with 1 CoreDNS pod per 8 cluster nodes as a conservative baseline.
Scaling out pods only helps when nodes have sufficient available resources. If cluster nodes are running low on memory, adding more pods won't resolve the problem — you need to add nodes or increase per-node resources instead.
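As a sketch of the per-core QPS guideline above (the peak load figure is a hypothetical example, not a measured value):

```shell
# Hypothetical sizing: peak load of 25,000 QPS, assuming roughly
# 10,000 QPS per CPU core with NodeLocal DNSCache enabled.
peak_qps=25000
qps_per_core=10000

# Pods needed = ceil(peak_qps / qps_per_core), via integer arithmetic.
pods=$(( (peak_qps + qps_per_core - 1) / qps_per_core ))

# Never run fewer than 2 CoreDNS pods.
if [ "$pods" -lt 2 ]; then pods=2; fi

echo "$pods"   # prints 3
```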
Once you know your target replica count, use automatic adjustment (recommended) or scale manually.
Configure automatic adjustment (recommended)
The cluster-proportional-autoscaler component adjusts CoreDNS replica count in real time based on cluster size. Unlike HPA, it does not rely on CoreDNS CPU metrics and does not perform scale-in operations that could disrupt DNS. It targets 1 pod per 8 cluster nodes by default.
The replica count formula is: replicas = max(ceil(cores × 1/coresPerReplica), ceil(nodes × 1/nodesPerReplica)). The min and max parameters keep the pod count between 2 and 100.
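Worked through for a hypothetical cluster of 40 nodes and 320 total cores, using the default linear parameters, the formula gives:

```shell
# Hypothetical cluster size; linear params match the manifest defaults.
cores=320; nodes=40
coresPerReplica=64; nodesPerReplica=8
min=2; max=100

# ceil division via integer arithmetic.
by_cores=$(( (cores + coresPerReplica - 1) / coresPerReplica ))   # ceil(320/64) = 5
by_nodes=$(( (nodes + nodesPerReplica - 1) / nodesPerReplica ))   # ceil(40/8)  = 5

# replicas = max(by_cores, by_nodes), clamped to [min, max].
if [ "$by_cores" -gt "$by_nodes" ]; then replicas=$by_cores; else replicas=$by_nodes; fi
if [ "$replicas" -lt "$min" ]; then replicas=$min; fi
if [ "$replicas" -gt "$max" ]; then replicas=$max; fi

echo "$replicas"   # prints 5
```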
Deploy the autoscaler with the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-autoscaler
  namespace: kube-system
  labels:
    k8s-app: dns-autoscaler
spec:
  selector:
    matchLabels:
      k8s-app: dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: dns-autoscaler
    spec:
      serviceAccountName: admin
      containers:
      - name: autoscaler
        image: registry.cn-hangzhou.aliyuncs.com/acs/cluster-proportional-autoscaler:1.8.4
        resources:
          requests:
            cpu: "200m"
            memory: "150Mi"
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        - --configmap=dns-autoscaler
        - --nodelabels=type!=virtual-kubelet
        - --target=Deployment/coredns
        - --default-params={"linear":{"coresPerReplica":64,"nodesPerReplica":8,"min":2,"max":100,"preventSinglePointFailure":true}}
        - --logtostderr=true
        - --v=9
Scale manually
To set a specific replica count directly:
kubectl scale --replicas=<target> deployment/coredns -n kube-system # Replace <target> with the target number of pods.
Adjust CoreDNS pod specifications
In an ACK Pro cluster, the default CoreDNS pod configuration is:
| Resource | Default limit |
|---|---|
| CPU | No limit |
| Memory | 2 GiB |
Based on your observed peak usage, set a CPU limit of up to 4096m (at least 1024m).
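For reference, the resulting container resources stanza would look like the following sketch (the CPU value is an example within the suggested range, not a prescribed default):

```yaml
resources:
  limits:
    cpu: "2"        # example value between the 1024m minimum and 4096m ceiling
    memory: 2Gi     # matches the default memory limit
```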
Changing CoreDNS pod specifications restarts the pods, which can cause brief DNS latency spikes or resolution failures. Perform this operation during off-peak hours.
- Log on to the ACK console. In the left navigation pane, click Clusters.
- On the Clusters page, click the target cluster name. In the left navigation pane, click Add-ons.
- Click the Networking tab, find the CoreDNS card, and click Configuration.
- Modify the CoreDNS configuration and click OK.
Deploy on a dedicated node pool
Scheduling CoreDNS pods to a dedicated node pool isolates them from other workloads and prevents resource contention.
Rescheduling CoreDNS pods to a dedicated node pool restarts them, which can cause brief DNS latency spikes or resolution failures. Perform this operation during off-peak hours.
Create a dedicated node pool
When creating the node pool, follow these guidelines:
- CoreDNS is network-intensive but not compute-intensive. Use network-enhanced instances with at least 4 CPU cores and 8 GB of memory.
- The node pool must have at least 2 nodes, since CoreDNS runs 2 pods by default.
- Add a taint and label to prevent other pods from being scheduled to these nodes. For example, use system-addon: system-addon as both the taint key-value pair and the label. Set the taint effect to NoSchedule.
For detailed steps, see Create and manage node pools.
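On a node in the dedicated pool, the example label and taint from the guidelines above would appear as follows (a sketch; the node name is hypothetical, and in practice you set these values when creating the node pool):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: dedicated-dns-node        # hypothetical node name
  labels:
    system-addon: system-addon    # label referenced by the CoreDNS NodeSelector
spec:
  taints:
  - key: system-addon
    value: system-addon
    effect: NoSchedule            # keeps other pods off these nodes
```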
Schedule CoreDNS pods to the node pool
- On the Add-ons page, find the CoreDNS card and click Configuration.
- In the NodeSelector section, add the label of the dedicated node pool. Do not delete existing NodeSelector labels.
- In the Tolerations section, add a toleration that matches the node pool's taint.
- Click OK. Then run the following command to confirm that the CoreDNS pods are running on the dedicated node pool:
kubectl -n kube-system get pod -o wide --show-labels | grep coredns
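Together, the NodeSelector and toleration settings above correspond to a CoreDNS Deployment pod spec along these lines (a sketch assuming the example system-addon taint and label):

```yaml
spec:
  template:
    spec:
      nodeSelector:
        # keep any existing NodeSelector labels, and add:
        system-addon: system-addon
      tolerations:
      - key: system-addon
        operator: Equal
        value: system-addon
        effect: NoSchedule
```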
Use scheduling policies for high availability
To protect DNS availability, CoreDNS uses two scheduling policies by default:
- Pod anti-affinity (node-level): prevents two CoreDNS pods from running on the same node. If a node fails, DNS remains available on other nodes.
- Topology-aware scheduling (zone-level): distributes CoreDNS pods across different availability zones. If pod anti-affinity alone is used and both pods land in the same zone, a zone outage still disrupts DNS. Topology-aware scheduling addresses this by enforcing cross-zone distribution.
These scheduling policies only take effect when pods are initially scheduled. If node or zone configurations change, go to the ACK console, find the coredns Deployment, and click Redeploy to redistribute the pods.
Pod anti-affinity
CoreDNS uses a requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity rule to ensure no two CoreDNS pods share the same node. For this to work, your cluster needs at least 2 nodes with sufficient allocatable resources, excluding nodes with these labels:
- k8s.aliyun.com: true (nodes with node autoscaling enabled)
- type: virtual-kubelet (virtual nodes)
- alibabacloud.com/lingjun-worker: true (Lingjun nodes)
The default affinity configuration is:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # virtual nodes have this label
        - key: type
          operator: NotIn
          values:
          - virtual-kubelet
        # Lingjun worker nodes have this label
        - key: alibabacloud.com/lingjun-worker
          operator: NotIn
          values:
          - "true"
    preferredDuringSchedulingIgnoredDuringExecution:
    - preference:
        matchExpressions:
        # autoscaled nodes have this label
        - key: k8s.aliyun.com
          operator: NotIn
          values:
          - "true"
      weight: 100
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values:
          - kube-dns
      topologyKey: kubernetes.io/hostname
Topology-aware scheduling
By default, CoreDNS uses topology-aware scheduling to spread pods across zones. The whenUnsatisfiable: DoNotSchedule setting enforces this — pods won't be scheduled if the zone balance condition can't be met.
For this to work reliably:
- Your cluster must have nodes in at least 2 different zones, each with at least 1 node that has sufficient resources for CoreDNS.
- All nodes must have the topology.kubernetes.io/zone label (applied by default). Missing or inconsistent labels can cause scheduling failures or uneven distribution.
- Upgrade your cluster to v1.27 or later and CoreDNS to v1.12.1.3 or later to get the matchLabelKeys fix described below.
CoreDNS versions earlier than v1.12.1.3 use this topology spread constraint:
topologySpreadConstraints:
- labelSelector:
    matchLabels:
      k8s-app: kube-dns
  maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
During a rolling update, this configuration can produce an uneven final pod distribution. The labelSelector counts pods from both the new and old ReplicaSets. To satisfy maxSkew=1, the scheduler favors zones with fewer total pods (combining old and new). After the old pods are removed, the new pods may be concentrated in only a few zones.
CoreDNS v1.12.1.3 and later resolves this with the matchLabelKeys feature introduced in Kubernetes v1.27:
topologySpreadConstraints:
- labelSelector:
    matchLabels:
      k8s-app: kube-dns
  matchLabelKeys:
  - pod-template-hash
  nodeTaintsPolicy: Honor
  maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
Adding matchLabelKeys: [pod-template-hash] scopes the spread constraint to pods from the current ReplicaSet only, so rolling updates produce an even distribution across all zones.