If all replicas of a workload land in a single zone, a failure to create ECI-based pods in that zone makes the entire workload unavailable. Two failure patterns are common: the scheduler places every replica in the same zone, and ECI creation failures in some zones silently break the intended spread. This topic explains how to use Kubernetes topology spread constraints and affinity rules to prevent both problems in an ACK managed Pro cluster.
Prerequisites
Before you begin, make sure you have:
An ACK managed Pro cluster that meets the following requirements:
Kubernetes version 1.22 or later
ACK Virtual Node component version 2.10.0 or later
kube-scheduler component version 5.9 or later, with the virtual node-based pod scheduling policy enabled
Multiple zones (vSwitches) configured in the eci-profile so pods can be scheduled across zones
The nodeAffinity, podAffinity, or topologySpreadConstraints fields set in your pod spec, or a ResourcePolicy configured for the pod
To schedule pods to ARM-based virtual nodes, add tolerations that match the taints of those nodes, as in the sketch below.
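For example, assuming the ARM-based virtual nodes carry a kubernetes.io/arch=arm64:NoSchedule taint (verify the actual taints on your nodes with kubectl describe node <node-name>), the pod spec would include a toleration such as:
tolerations:
# Assumed taint key for illustration; replace it with the taint
# actually set on your ARM-based virtual nodes.
- key: "kubernetes.io/arch"
  operator: "Equal"
  value: "arm64"
  effect: "NoSchedule"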
Usage notes
Set topologyKey to topology.kubernetes.io/zone.
The scheduling behavior described in this topic does not apply when:
The k8s.aliyun.com/eci-schedule-strategy: "VSwitchOrdered" annotation is set on the pod (multi-zone scheduling follows a fixed vSwitch order instead).
The k8s.aliyun.com/eci-fail-strategy: "fail-fast" annotation is set on the pod.
Choose an approach
Two Kubernetes mechanisms serve different goals:
| Goal | Mechanism | When to use |
|---|---|---|
| Spread pods evenly across all available zones | topologySpreadConstraints | High availability — replicas are distributed so a single zone failure affects only a fraction of them |
| Pin pods to a specific zone or keep them co-located | podAffinity + nodeAffinity | Performance — low latency between pods matters, or your workload must land in a particular zone |
Both examples below use the same ECI-required fields: a toleration for virtual-kubelet.io/provider and a preferredDuringSchedulingIgnoredDuringExecution node affinity that prefers Elastic Compute Service (ECS) nodes over virtual nodes. The only thing that changes is the spread or affinity rule itself.
Example 1: Spread pods evenly across zones
This example creates a Deployment with 10 replicas distributed evenly across all available zones.
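Step 1: Create the Deployment
Save the following YAML to deployment.yaml and run kubectl apply -f deployment.yaml. This manifest is a minimal sketch rather than a verbatim reference: the name and app label (with-pod-topology-spread) match the verification commands in Step 2, the spread rule assumes maxSkew: 1 with whenUnsatisfiable: DoNotSchedule, and the toleration and node affinity are the shared ECI-required fields described above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: with-pod-topology-spread
  labels:
    app: with-pod-topology-spread
spec:
  replicas: 10
  selector:
    matchLabels:
      app: with-pod-topology-spread
  template:
    metadata:
      labels:
        app: with-pod-topology-spread
    spec:
      # Spread replicas evenly across zones: allow at most a difference of
      # one pod between any two zones, and keep pods pending otherwise.
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: with-pod-topology-spread
      affinity:
        nodeAffinity:
          # Prefer ECS nodes; fall back to virtual nodes when none fit.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
      tolerations:
      # Allow placement on virtual nodes.
      - key: "virtual-kubelet.io/provider"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: with-pod-topology-spread
        image: registry.k8s.io/pause:2.0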
Step 2: Verify the scheduling result
Run the following command to see which nodes the pods landed on:
kubectl get po -lapp=with-pod-topology-spread \
-o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
--no-headers | grep -v "<none>"
To count pods per zone:
kubectl get po -lapp=with-pod-topology-spread \
-o custom-columns=NODE:.spec.nodeName \
--no-headers | grep -v "<none>" \
| xargs -I {} kubectl get no {} -o json \
| jq '.metadata.labels["topology.kubernetes.io/zone"]' \
| sort | uniq -c
Example 2: Deploy pods in a specific zone
This example creates a Deployment with 3 replicas that must all land in the same zone. Use this pattern when low inter-pod latency matters more than spreading across zones.
Step 1: Create the Deployment
Save the following YAML to deployment.yaml and run kubectl apply -f deployment.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: with-affinity
  labels:
    app: with-affinity
spec:
  replicas: 3
  selector:
    matchLabels:
      app: with-affinity
  template:
    metadata:
      labels:
        app: with-affinity
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - with-affinity
            topologyKey: topology.kubernetes.io/zone
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
      tolerations:
      - key: "virtual-kubelet.io/provider"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: with-affinity
        image: registry.k8s.io/pause:2.0
The three scheduling-relevant sections are:
| Section | What it does |
|---|---|
| podAffinity (requiredDuringSchedulingIgnoredDuringExecution, topologyKey: topology.kubernetes.io/zone) | Requires all pods with the app: with-affinity label to land in the same zone. kube-scheduler will not place a pod unless this is satisfiable. See Inter-pod affinity and anti-affinity. |
| nodeAffinity (preferredDuringSchedulingIgnoredDuringExecution, type NotIn virtual-kubelet) | Prefers ECS nodes; falls back to virtual nodes when no ECS node is available. See Node affinity. |
| tolerations (virtual-kubelet.io/provider: Exists, NoSchedule) | Allows kube-scheduler to place pods on virtual nodes. See Taints and tolerations. |
To pin pods to a specific zone instead of letting them co-locate wherever the first pod lands, replace the podAffinity block with a requiredDuringSchedulingIgnoredDuringExecution node affinity. The following configuration schedules pods exclusively to Beijing Zone A:
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
    - key: topology.kubernetes.io/zone
      operator: In
      values:
      - cn-beijing-a
The target zone must be one of the zones (vSwitches) configured in the eci-profile; otherwise, ECI pods cannot be created there.
Step 2: Verify the scheduling result
Run the following command to see which nodes the pods landed on:
kubectl get po -lapp=with-affinity \
-o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
--no-headers | grep -v "<none>"
To count pods per zone:
kubectl get po -lapp=with-affinity \
-o custom-columns=NODE:.spec.nodeName \
--no-headers | grep -v "<none>" \
| xargs -I {} kubectl get no {} -o json \
| jq '.metadata.labels["topology.kubernetes.io/zone"]' \
| sort | uniq -c
Strict ECI pod topology spread
By default, kube-scheduler targets an even zone distribution but does not block pod placement when ECI creation fails in some zones. This can silently violate the maxSkew constraint.
For example, with maxSkew: 1 and three zones (A, B, C), kube-scheduler evenly deploys the pods of a workload across all zones. If ECI creation fails in Zone B and Zone C, pods only run in Zone A — violating the constraint specified by maxSkew.
Enable strict ECI pod topology spread to guarantee the constraint is honored; strict mode applies when whenUnsatisfiable is set to DoNotSchedule (see below). With strict mode on, kube-scheduler first dispatches a pod to each zone and holds back the remaining pods until the dispatched pods are actually created.
Even after Pod A1 is created, kube-scheduler does not schedule the next pod, because the constraint would be violated if ECI creation failed in Zone B or Zone C. Only after Pod B1 is also created does kube-scheduler schedule a pod to Zone C.
To disable strict spread and allow kube-scheduler to schedule pods regardless of constraint satisfaction, set whenUnsatisfiable: ScheduleAnyway. For parameter details, see Spread constraint definition.
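For reference, the field sits in the topologySpreadConstraints block of the pod spec. A minimal snippet, assuming the Deployment from Example 1:
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  # ScheduleAnyway makes the spread best-effort: pods are scheduled even
  # when the skew constraint cannot be met. DoNotSchedule enforces it.
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: with-pod-topology-spread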