
Container Service for Kubernetes: Spread ECI pods across zones and configure affinity scheduling

Last Updated: Mar 25, 2026

If all replicas of a workload are concentrated in a single zone and ECI pod creation fails in that zone, the entire workload becomes unavailable. Two failure patterns are common: all replicas land in the same zone during scheduling, and ECI creation failures in some zones silently violate spread guarantees. This topic explains how to use Kubernetes topology spread constraints and affinity rules to prevent both problems in an ACK managed Pro cluster.

Prerequisites

Before you begin, make sure you have:

  • An ACK managed Pro cluster that meets the following requirements:

    • Multiple zones (vSwitches) are configured in the eci-profile so that pods can be scheduled across zones (see the check after this list).

    • The nodeAffinity, podAffinity, or topologySpreadConstraints fields are set in your pod spec, or a ResourcePolicy is configured for the pod.
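
To check which vSwitches (zones) are currently configured, inspect the eci-profile. This sketch assumes the default ConfigMap name and namespace used by ACK's ECI integration:

# Show the eci-profile; the configured vSwitch IDs (one per zone) appear in its data.
kubectl get configmap eci-profile -n kube-system -o yaml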

Note

To schedule pods to ARM-based virtual nodes, add tolerations that match the taints of those nodes.
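
For example, assuming the ARM-based virtual nodes carry the kubernetes.io/arch=arm64:NoSchedule taint (verify the actual taints with kubectl describe node), the extra toleration would look like this, in addition to the virtual-kubelet.io/provider toleration shown in the examples below:

tolerations:
  - key: "kubernetes.io/arch"     # assumed taint key on ARM-based virtual nodes
    operator: "Equal"
    value: "arm64"
    effect: "NoSchedule"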

Usage notes

  • Set topologyKey to topology.kubernetes.io/zone.

  • The scheduling behavior described in this topic does not apply when:

    • The k8s.aliyun.com/eci-schedule-strategy: "VSwitchOrdered" annotation is set on the pod (multi-zone scheduling follows a fixed vSwitch order instead).

    • The k8s.aliyun.com/eci-fail-strategy: "fail-fast" annotation is set on the pod.
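
For reference, both annotations above go in the pod metadata. A minimal sketch (set only the annotation whose behavior you want):

metadata:
  annotations:
    # Multi-zone scheduling follows a fixed vSwitch order instead of the behavior in this topic.
    k8s.aliyun.com/eci-schedule-strategy: "VSwitchOrdered"
    # Uses the fail-fast strategy for ECI creation failures.
    k8s.aliyun.com/eci-fail-strategy: "fail-fast"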

Choose an approach

Two Kubernetes mechanisms serve different goals:

| Goal | Mechanism | When to use |
| --- | --- | --- |
| Spread pods evenly across all available zones | topologySpreadConstraints | High availability: replicas are distributed so that a single zone failure affects only a fraction of them |
| Pin pods to a specific zone or keep them co-located | podAffinity + nodeAffinity | Performance: low latency between pods matters, or your workload must land in a particular zone |

Both examples below use the same ECI-required fields: a toleration for virtual-kubelet.io/provider and a preferredDuringSchedulingIgnoredDuringExecution node affinity that prefers Elastic Compute Service (ECS) nodes over virtual nodes. The only thing that changes is the spread or affinity rule itself.

Example 1: Spread pods evenly across zones

This example creates a Deployment with 10 replicas distributed evenly across all available zones.

Step 1: Create the Deployment

Save the following YAML to deployment.yaml and run kubectl apply -f deployment.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: with-pod-topology-spread
  labels:
    app: with-pod-topology-spread
spec:
  replicas: 10
  selector:
    matchLabels:
      app: with-pod-topology-spread
  template:
    metadata:
      labels:
        app: with-pod-topology-spread
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: with-pod-topology-spread
      tolerations:
        - key: "virtual-kubelet.io/provider"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
      - name: with-pod-topology-spread
        image: registry.k8s.io/pause:2.0
        resources:
          requests:
            cpu: "1"
            memory: "256Mi"

The three scheduling-relevant sections are:

| Section | What it does |
| --- | --- |
| nodeAffinity (preferredDuringSchedulingIgnoredDuringExecution, type NotIn virtual-kubelet) | Prefers ECS nodes; falls back to virtual nodes when no ECS node is available. See Node affinity. |
| topologySpreadConstraints (maxSkew: 1, topologyKey: topology.kubernetes.io/zone, whenUnsatisfiable: DoNotSchedule) | Keeps the pod count difference between any two zones at most 1. kube-scheduler blocks placement if this constraint cannot be satisfied. See topologySpreadConstraints field. |
| tolerations (virtual-kubelet.io/provider: Exists, NoSchedule) | Allows kube-scheduler to place pods on virtual nodes, which carry this taint by default. See Taints and tolerations. |

Note

To schedule pods to ARM-based virtual nodes, add a toleration for the ARM-specific taint.

The topologySpreadConstraints field accepts several parameters. The table below covers the ones used in this example and optional parameters available in recent Kubernetes versions:

| Parameter | Required | Description |
| --- | --- | --- |
| maxSkew | Yes | Maximum allowed difference in pod count between any two zones |
| topologyKey | Yes | Must be topology.kubernetes.io/zone for ECI-based pods |
| whenUnsatisfiable | Yes | DoNotSchedule blocks placement if the constraint cannot be met; ScheduleAnyway allows placement on a best-effort basis. See Spread constraint definition. |
| labelSelector | Yes | Selects the pods to count when evaluating skew |
| minDomains | No | Minimum number of eligible zones to consider. Beta since Kubernetes 1.25. |
| matchLabelKeys | No | Additional pod label keys used to identify the pods that belong to the same group. Beta since Kubernetes 1.27. |
| nodeAffinityPolicy | No | Whether node affinity rules are honored when counting pods per zone. Beta since Kubernetes 1.26. |
| nodeTaintsPolicy | No | Whether node taints are honored when counting pods per zone. Beta since Kubernetes 1.26. |
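
The following sketch shows how the optional parameters fit alongside the required ones from this example. The values are illustrative, and each optional field requires a Kubernetes version in which it is available:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule    # minDomains is only allowed with DoNotSchedule
    labelSelector:
      matchLabels:
        app: with-pod-topology-spread
    minDomains: 3                 # illustrative: require at least 3 eligible zones
    matchLabelKeys:
      - pod-template-hash         # count each ReplicaSet revision as its own group
    nodeAffinityPolicy: Honor     # respect the pod's node affinity when counting pods per zone
    nodeTaintsPolicy: Honor       # respect node taints when counting pods per zone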

Step 2: Verify the scheduling result

Run the following command to see which nodes the pods landed on:

kubectl get po -lapp=with-pod-topology-spread \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
  --no-headers | grep -v "<none>"

To count pods per zone:

kubectl get po -lapp=with-pod-topology-spread \
  -o custom-columns=NODE:.spec.nodeName \
  --no-headers | grep -v "<none>" \
  | xargs -I {} kubectl get no {} -o json \
  | jq '.metadata.labels["topology.kubernetes.io/zone"]' \
  | sort | uniq -c
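
With 10 replicas, maxSkew: 1, and three available zones, the per-zone counts should differ by at most one. Hypothetical output (zone IDs depend on your region and eci-profile):

   4 "cn-hangzhou-h"
   3 "cn-hangzhou-i"
   3 "cn-hangzhou-j"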

Example 2: Deploy pods in a specific zone

This example creates a Deployment with 3 replicas that must all land in the same zone. Use this pattern when low inter-pod latency matters more than spreading across zones.

Step 1: Create the Deployment

Save the following YAML to deployment.yaml and run kubectl apply -f deployment.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: with-affinity
  labels:
    app: with-affinity
spec:
  replicas: 3
  selector:
    matchLabels:
      app: with-affinity
  template:
    metadata:
      labels:
        app: with-affinity
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - with-affinity
            topologyKey: topology.kubernetes.io/zone
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
      tolerations:
        - key: "virtual-kubelet.io/provider"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
      - name: with-affinity
        image: registry.k8s.io/pause:2.0

The three scheduling-relevant sections are:

| Section | What it does |
| --- | --- |
| podAffinity (requiredDuringSchedulingIgnoredDuringExecution, topologyKey: topology.kubernetes.io/zone) | Requires all pods with the app: with-affinity label to land in the same zone. kube-scheduler will not place a pod unless this is satisfiable. See Inter-pod affinity and anti-affinity. |
| nodeAffinity (preferredDuringSchedulingIgnoredDuringExecution, type NotIn virtual-kubelet) | Prefers ECS nodes; falls back to virtual nodes when no ECS node is available. See Node affinity. |
| tolerations (virtual-kubelet.io/provider: Exists, NoSchedule) | Allows kube-scheduler to place pods on virtual nodes. See Taints and tolerations. |

To pin pods to a specific zone instead of letting them co-locate wherever the first pod lands, remove the podAffinity block and add a requiredDuringSchedulingIgnoredDuringExecution rule to the nodeAffinity block (the preferred rule for ECS nodes can stay). The following configuration schedules pods exclusively to Beijing Zone A:

nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - cn-beijing-a

Step 2: Verify the scheduling result

Run the following command to see which nodes the pods landed on:

kubectl get po -lapp=with-affinity \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
  --no-headers | grep -v "<none>"

To count pods per zone:

kubectl get po -lapp=with-affinity \
  -o custom-columns=NODE:.spec.nodeName \
  --no-headers | grep -v "<none>" \
  | xargs -I {} kubectl get no {} -o json \
  | jq '.metadata.labels["topology.kubernetes.io/zone"]' \
  | sort | uniq -c
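
Because the required pod affinity co-locates all replicas by zone, every replica should report the same zone. Hypothetical output (the zone depends on where the first pod landed, or is cn-beijing-a if you pinned the zone):

   3 "cn-beijing-a"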

Strict ECI pod topology spread

By default, kube-scheduler targets an even zone distribution but does not block pod placement when ECI creation fails in some zones. This can silently violate the maxSkew constraint.

For example, with maxSkew: 1 and three zones (A, B, C), kube-scheduler evenly deploys the pods of a workload across all zones. If ECI creation fails in Zone B and Zone C, pods only run in Zone A — violating the constraint specified by maxSkew.

Enable strict ECI pod topology spread to guarantee the constraint is honored. With strict mode on, kube-scheduler first dispatches one pod to each zone and keeps all remaining pods pending until the dispatched pods are actually created.

Even after Pod A1 is created, kube-scheduler does not schedule the next pod, because the constraint would be violated if creation in Zone B or Zone C failed. Only after Pod B1 is also created does kube-scheduler schedule a pod to Zone C.

To disable strict spread and allow kube-scheduler to schedule pods regardless of constraint satisfaction, set whenUnsatisfiable: ScheduleAnyway. For parameter details, see Spread constraint definition.
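
As a sketch, the relaxed constraint differs from Example 1 only in the whenUnsatisfiable value (labels reused from Example 1):

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway   # best-effort spreading; never blocks scheduling
    labelSelector:
      matchLabels:
        app: with-pod-topology-spread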