
Container Service for Kubernetes: Custom priority-based scheduling for elastic resources

Last Updated: Mar 26, 2026

Custom elastic resource priority scheduling lets you define the order in which pods are scheduled across different resource types and node pools. Create a ResourcePolicy to set this order: during scale-out, pods are scheduled to resource units in the order you define; during scale-in, pods are removed in reverse order.

Warning

Do not use system-reserved labels such as alibabacloud.com/compute-class or alibabacloud.com/compute-qos in workload label selectors (for example, the spec.selector.matchLabels field of a Deployment). The system may modify these labels during custom priority scheduling, causing the controller to frequently rebuild pods and affecting application stability.
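For example, a safe workload selector keys off an application-owned label such as app rather than a system-reserved one (the names below are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo                  # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo               # application-owned label: safe to select on
      # alibabacloud.com/compute-class: ...   # system-reserved: do NOT select on this
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: nginx
```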

Prerequisites

Before you begin, ensure that you have:

  • An ACK managed cluster Pro edition, version 1.20.11 or later. For upgrade steps, see Manually upgrade a cluster.

  • A kube-scheduler version that meets the requirements for your ACK cluster version. See kube-scheduler for a full list of supported features per version.

    ACK version    | Scheduler version
    1.20           | v1.20.4-ack-7.0 or later
    1.22           | v1.22.15-ack-2.0 or later
    1.24 or later  | All versions supported
  • (Required for ECI resources) The ack-virtual-node component deployed in your cluster. See Use ECI in ACK.

Usage notes

  • Best-effort ordering: This feature uses a BestEffort policy. Pod scale-in does not strictly follow the reverse of the scheduling order in all cases.

  • Starting from scheduler version v1.x.x-aliyun-6.4, the default value of ignorePreviousPod changed to false, and ignoreTerminatingPod changed to true. Existing ResourcePolicy objects and subsequent updates are not affected.

  • This feature conflicts with the pod-deletion-cost annotation; the two cannot be used together.

  • This feature cannot be used with Elastic Container Instance (ECI) elastic scheduling implemented through ElasticResource. See Use ElasticResource for elastic scheduling of ECI pods.

  • The max field is available only in clusters of version 1.22 or later with scheduler version 5.0 or later.

  • When used with elastic node pools, this feature may cause node pools to create invalid nodes. To prevent this, include the elastic node pool in a unit and do not set the max field for that unit.

  • If your scheduler version is earlier than 5.0 or your cluster version is 1.20 or earlier, pods that exist before the ResourcePolicy is created are the first to be scaled in.

  • If your scheduler version is earlier than 6.1 or your cluster version is 1.20 or earlier, do not modify a ResourcePolicy until all pods associated with it have been completely deleted.

  • When used with auto-scaling, this feature must be used with instant elasticity. The Cluster Autoscaler may otherwise trigger incorrect node pool scaling.
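The elastic node pool guidance above can be sketched as a ResourcePolicy in which the elastic node pool is its own final unit with no max field (node pool IDs and the selector value are illustrative):

```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: elastic-example        # illustrative name
  namespace: default
spec:
  selector:
    app: demo
  strategy: prefer
  units:
  - resource: ecs
    max: 10                    # max is fine on a static node pool unit
    nodeSelector:
      alibabacloud.com/nodepool-id: np-static-****   # illustrative static pool ID
  - resource: ecs
    nodeSelector:
      alibabacloud.com/nodepool-id: np-elastic-****  # elastic pool: do not set max here
```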

Create a ResourcePolicy

Define a ResourcePolicy with the following YAML structure:

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: test
  namespace: default
spec:
  selector:
    key1: value1
  strategy: prefer
  units:
  - nodeSelector:
      unit: first
    podLabels:
      key1: value1
    podAnnotations:
      key1: value1
    resource: ecs
  - nodeSelector:
      unit: second
    max: 10
    resource: ecs
  - resource: eci
  # Optional advanced configuration
  preemptPolicy: AfterAllUnits
  ignorePreviousPod: false
  ignoreTerminatingPod: true
  matchLabelKeys:
  - pod-template-hash
  whenTryNextUnits:
    policy: TimeoutOrExceedMax
    timeout: 1m

spec fields

Field    | Description
selector | Selects pods with matching labels in the same namespace. If empty, the policy applies to all pods in the namespace.
strategy | Scheduling strategy. Only prefer is supported.
units    | Ordered list of scheduling units. Pods are scheduled in list order during scale-out and removed in reverse order during scale-in.

units fields

Field          | Description
resource       | Resource type for this unit. Supported values: ecs, eci, elastic (clusters 1.24+ with scheduler 6.4.3+), acs (clusters 1.26+ with scheduler 6.7.1+).
nodeSelector   | Selects nodes in this unit by their labels.
max            | Maximum number of pod replicas schedulable to this unit. Available in scheduler version 5.0 or later.
maxResources   | Maximum amount of resources schedulable to pods in this unit. Available in scheduler version 6.9.5 or later.
podLabels      | Labels added to pods scheduled to this unit. Only pods with these labels are counted toward this unit.
podAnnotations | Annotations added to pods scheduled to this unit. Only pods with these annotations are counted toward this unit.
The elastic resource type is being deprecated. Instead, use auto-scaling node pools by setting k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true" in podLabels.
The acs type adds the alibabacloud.com/compute-qos: default and alibabacloud.com/compute-class: general-purpose labels to pods by default. Override these by specifying different values in podLabels. If alpha.alibabacloud.com/compute-qos-strategy is specified in podAnnotations, the alibabacloud.com/compute-qos: default label is not added.
The acs and eci types add tolerations for virtual node taints to pods by default. The scheduler adds these tolerations internally — they do not appear in the pod spec, and pods can be scheduled to virtual nodes without additional taint toleration configuration.
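As an illustrative sketch of the recommended replacement for the deprecated elastic type, a units fragment can target an auto-scaling node pool and carry the wait label (the node pool ID is a placeholder):

```yaml
units:
- resource: ecs
  nodeSelector:
    alibabacloud.com/nodepool-id: np-autoscaling-****  # illustrative auto-scaling pool ID
  podLabels:
    k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true"  # wait for node pool scale-out
- resource: eci   # fallback unit tried after the wait ends
```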
Important

In scheduler versions earlier than 6.8.3, you cannot use multiple units of the acs type at the same time.

If a unit's podLabels include k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true", or if the number of pods in the unit is below its max value, the scheduler holds the pod in the current unit and waits for capacity to become available instead of immediately trying the next unit. Set the maximum wait duration in whenTryNextUnits. The k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true" label acts only as a marker for this behavior: it is not actually applied to the pod and is not used for pod counting.

Advanced configuration fields

Field                | Available from | Description
preemptPolicy        | Scheduler v6.1 | Controls when preemption is attempted across units. BeforeNextUnit: attempt preemption each time a unit fails. AfterAllUnits (default): attempt preemption only after all units fail. Not applicable to ACS. See Enable preemption.
ignorePreviousPod    | Scheduler v6.1 | When true, pods created before the ResourcePolicy are excluded from pod counting. Must be used with max.
ignoreTerminatingPod | Scheduler v6.1 | When true, pods in the Terminating state are excluded from pod counting. Must be used with max.
matchLabelKeys       | Scheduler v6.2 | Groups pods by the values of the listed labels and applies max per group. Pods missing a declared label are rejected by the scheduler. Must be used with max.
whenTryNextUnits     | Cluster 1.24+, scheduler 6.4+ | Defines when a pod is allowed to move to the next unit. See the whenTryNextUnits policies below.
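For instance, combining matchLabelKeys with max applies the per-unit quota separately to each Deployment revision during a rolling update (a sketch; the selector value and node label are illustrative):

```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: per-revision-quota    # illustrative name
  namespace: default
spec:
  selector:
    app: demo
  strategy: prefer
  matchLabelKeys:
  - pod-template-hash         # group pods by Deployment revision
  units:
  - resource: ecs
    max: 5                    # 5 pods per pod-template-hash value, not 5 in total
    nodeSelector:
      unit: first
  - resource: eci
```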

whenTryNextUnits policies

Policy | Moves to the next unit when... | Best for
LackResourceOrExceedMax (default) | The current unit runs out of resources, or the pod count reaches max. | Most general use cases
ExceedMax | max and maxResources are not set, the pod count reaches max, or the resources used in the current unit plus those of the current pod exceed maxResources. | Prioritizing auto-scaling of node pools over ECI
TimeoutOrExceedMax | If (1) max is set and the pod count is below max (or maxResources is set and current usage plus the current pod's resources are within maxResources), or (2) max is not set and podLabels contain k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true", then a pod in a unit with insufficient resources waits there for up to timeout before moving on. | Node pool scale-out with ECI fallback after a timeout
LackResourceAndNoTerminating | Resources are insufficient (or max is reached) and no pods in the current unit are Terminating. | Rolling updates: prevents new pods from spilling into the next unit while old pods terminate

The timeout field applies only when policy is TimeoutOrExceedMax. The default is 15 minutes. Not supported for ACS units (which are limited only by max).
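The decision rules above can be modeled with a short Python sketch. This is a simplified illustration of the documented behavior, not the scheduler's actual implementation: only the default and ExceedMax policies are modeled, and resources are reduced to a single scalar.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Unit:
    free: int                  # simplified scalar resource pool remaining in the unit
    pods: int                  # pods currently counted toward the unit
    max: Optional[int] = None  # per-unit pod quota; None means unset

def should_try_next_unit(unit: Unit, request: int, policy: str) -> bool:
    """Return True when the pod may move to the next unit immediately."""
    exceed_max = unit.max is not None and unit.pods >= unit.max
    lack_resource = unit.free < request
    if policy == "LackResourceOrExceedMax":  # default policy
        return lack_resource or exceed_max
    if policy == "ExceedMax":
        # Insufficient resources alone never push the pod on; it keeps
        # waiting in the unit (e.g. for node pool auto scaling) unless
        # max is unset or already reached.
        return unit.max is None or exceed_max
    raise ValueError(f"policy not modeled in this sketch: {policy}")

# A unit with quota 5, 3 pods counted, and no free resources:
unit = Unit(free=0, pods=3, max=5)
print(should_try_next_unit(unit, 1, "LackResourceOrExceedMax"))  # True: out of resources
print(should_try_next_unit(unit, 1, "ExceedMax"))                # False: max set, below max
```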

Important

If the auto-scaling node pool cannot create nodes for a long time, ExceedMax may leave pods in Pending indefinitely. The Cluster Autoscaler does not currently respect the max limit in ResourcePolicy, so the actual number of created instances may exceed max. This will be addressed in a future release.

Important

With TimeoutOrExceedMax, if a node is created during the timeout period but is not yet Ready, and the pod does not tolerate the NotReady taint, the pod is still scheduled to ECI.

Scenario examples

These scenarios produce best-effort results. Pod removal order during scale-in may not strictly follow the reverse scheduling order in all circumstances.

Prioritize one node pool over another

Goal: Run a Deployment across two node pools, with Pool A filled first and Pool B used as overflow. During scale-in, remove pods from Pool B first.

In this example, nodes cn-beijing.10.0.3.137 and cn-beijing.10.0.3.138 belong to Pool A, and cn-beijing.10.0.6.47 and cn-beijing.10.0.6.46 belong to Pool B. All nodes have 2 vCPUs and 4 GB of memory.

  1. Create a ResourcePolicy that sets the node pool scheduling order. Replace the nodepool-id values with your actual node pool IDs, which you can find on the Node Management > Node Pools page. See Create and manage a node pool.

    apiVersion: scheduling.alibabacloud.com/v1alpha1
    kind: ResourcePolicy
    metadata:
      name: nginx
      namespace: default
    spec:
      selector:
        app: nginx # Must match the pod label in the Deployment below
      strategy: prefer
      units:
      - resource: ecs
        nodeSelector:
          alibabacloud.com/nodepool-id: np7ec79f2235954e879de07b780058****
      - resource: ecs
        nodeSelector:
          alibabacloud.com/nodepool-id: npab2df797738644e3a7b7cbf532bb****
  2. Create a Deployment. The pod label app: nginx must match the selector in the ResourcePolicy.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          name: nginx
          labels:
            app: nginx # Must match the ResourcePolicy selector
        spec:
          containers:
          - name: nginx
            image: nginx
            resources:
              limits:
                cpu: 2
              requests:
                cpu: 2
  3. Apply the Deployment and verify pod placement.

    1. Apply the YAML files.

      kubectl apply -f nginx.yaml

      Expected output:

      deployment.apps/nginx created
    2. Check which nodes the pods are scheduled to.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-b****   1/1     Running   0          17s   172.29.112.216   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-k****   1/1     Running   0          17s   172.29.113.24    cn-beijing.10.0.3.138   <none>           <none>

      Both pods are on Pool A nodes, as expected.

  4. Scale out to four replicas and verify overflow to Pool B.

    1. Scale the Deployment.

      kubectl scale deployment nginx --replicas 4
    2. Check pod placement.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE    IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-b****   1/1     Running   0          101s   172.29.112.216   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-k****   1/1     Running   0          101s   172.29.113.24    cn-beijing.10.0.3.138   <none>           <none>
      nginx-9cdf7bbf9-m****   1/1     Running   0          18s    172.29.113.156   cn-beijing.10.0.6.47    <none>           <none>
      nginx-9cdf7bbf9-x****   1/1     Running   0          18s    172.29.113.89    cn-beijing.10.0.6.46    <none>           <none>

      The two new pods overflow to Pool B nodes, as Pool A is at capacity.

  5. Scale in to two replicas and verify that Pool B pods are removed first.

    1. Scale the Deployment.

      kubectl scale deployment nginx --replicas 2
    2. Check pod status.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS        RESTARTS   AGE     IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-b****   1/1     Running       0          2m41s   172.29.112.216   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-k****   1/1     Running       0          2m41s   172.29.113.24    cn-beijing.10.0.3.138   <none>           <none>
      nginx-9cdf7bbf9-m****   0/1     Terminating   0          78s     172.29.113.156   cn-beijing.10.0.6.47    <none>           <none>
      nginx-9cdf7bbf9-x****   0/1     Terminating   0          78s     172.29.113.89    cn-beijing.10.0.6.46    <none>           <none>

      Pool B pods are removed first, which is the reverse of the scheduling order.

Use subscription ECS first, then pay-as-you-go ECS, then fall back to ECI

Goal: Minimize costs by filling subscription ECS capacity first, then pay-as-you-go ECS, and finally ECI. During scale-in, remove pods in reverse order: ECI first, then pay-as-you-go ECS, then subscription ECS.

In this example, all nodes have 2 vCPUs and 4 GB of memory.

  1. Label the nodes by billing type. If you use node pools, configure labels at the node pool level instead.

    kubectl label node cn-beijing.10.0.3.137 paidtype=subscription
    kubectl label node cn-beijing.10.0.3.138 paidtype=subscription
    kubectl label node cn-beijing.10.0.6.46 paidtype=pay-as-you-go
    kubectl label node cn-beijing.10.0.6.47 paidtype=pay-as-you-go
  2. Create a ResourcePolicy that orders units by billing type.

    apiVersion: scheduling.alibabacloud.com/v1alpha1
    kind: ResourcePolicy
    metadata:
      name: nginx
      namespace: default
    spec:
      selector:
        app: nginx # Must match the pod label in the Deployment below
      strategy: prefer
      units:
      - resource: ecs
        nodeSelector:
          paidtype: subscription
      - resource: ecs
        nodeSelector:
          paidtype: pay-as-you-go
      - resource: eci
  3. Create a Deployment with two replicas.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          name: nginx
          labels:
            app: nginx # Must match the ResourcePolicy selector
        spec:
          containers:
          - name: nginx
            image: nginx
            resources:
              limits:
                cpu: 2
              requests:
                cpu: 2
  4. Apply and verify initial placement on subscription nodes.

    1. Apply the YAML files.

      kubectl apply -f nginx.yaml
    2. Check pod placement.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-b****   1/1     Running   0          66s   172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running   0          66s   172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

      Both pods are on subscription nodes.

  5. Scale out to verify overflow to pay-as-you-go ECS and then ECI.

    1. Scale to four replicas and check pod placement.

      kubectl scale deployment nginx --replicas 4
      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE     IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-4****   1/1     Running   0          16s     172.29.113.155   cn-beijing.10.0.6.47    <none>           <none>
      nginx-9cdf7bbf9-b****   1/1     Running   0          3m48s   172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-f****   1/1     Running   0          16s     172.29.113.88    cn-beijing.10.0.6.46    <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running   0          3m48s   172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

      Overflow pods are scheduled to pay-as-you-go nodes.

    2. Scale to six replicas and check pod placement.

      kubectl scale deployment nginx --replicas 6
      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE     IP               NODE                           NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-4****   1/1     Running   0          3m10s   172.29.113.155   cn-beijing.10.0.6.47           <none>           <none>
      nginx-9cdf7bbf9-b****   1/1     Running   0          6m42s   172.29.112.215   cn-beijing.10.0.3.137          <none>           <none>
      nginx-9cdf7bbf9-f****   1/1     Running   0          3m10s   172.29.113.88    cn-beijing.10.0.6.46           <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running   0          6m42s   172.29.113.23    cn-beijing.10.0.3.138          <none>           <none>
      nginx-9cdf7bbf9-s****   1/1     Running   0          36s     10.0.6.68        virtual-kubelet-cn-beijing-j   <none>           <none>
      nginx-9cdf7bbf9-v****   1/1     Running   0          36s     10.0.6.67        virtual-kubelet-cn-beijing-j   <none>           <none>

      When all ECS capacity is exhausted, the remaining pods are scheduled to ECI (virtual-kubelet nodes).

  6. Scale in to verify reverse removal order.

    1. Scale to four replicas. ECI pods are removed first.

      kubectl scale deployment nginx --replicas 4
      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS        RESTARTS   AGE     IP               NODE                           NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-4****   1/1     Running       0          4m59s   172.29.113.155   cn-beijing.10.0.6.47           <none>           <none>
      nginx-9cdf7bbf9-b****   1/1     Running       0          8m31s   172.29.112.215   cn-beijing.10.0.3.137          <none>           <none>
      nginx-9cdf7bbf9-f****   1/1     Running       0          4m59s   172.29.113.88    cn-beijing.10.0.6.46           <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running       0          8m31s   172.29.113.23    cn-beijing.10.0.3.138          <none>           <none>
      nginx-9cdf7bbf9-s****   1/1     Terminating   0          2m25s   10.0.6.68        virtual-kubelet-cn-beijing-j   <none>           <none>
      nginx-9cdf7bbf9-v****   1/1     Terminating   0          2m25s   10.0.6.67        virtual-kubelet-cn-beijing-j   <none>           <none>

      ECI pods are the first to be removed.

    2. Scale to two replicas. Pay-as-you-go ECS pods are removed next.

      kubectl scale deployment nginx --replicas 2
      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS        RESTARTS   AGE     IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-4****   0/1     Terminating   0          6m43s   172.29.113.155   cn-beijing.10.0.6.47    <none>           <none>
      nginx-9cdf7bbf9-b****   1/1     Running       0          10m     172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-f****   0/1     Terminating   0          6m43s   172.29.113.88    cn-beijing.10.0.6.46    <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running       0          10m     172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>
    3. After termination completes, only the subscription ECS pods remain.

      kubectl get pods -o wide

      Expected output:

      NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                    NOMINATED NODE   READINESS GATES
      nginx-9cdf7bbf9-b****   1/1     Running   0          11m   172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
      nginx-9cdf7bbf9-r****   1/1     Running   0          11m   172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

Troubleshooting

Pods are stuck in Pending after applying a ResourcePolicy

The scheduler may not be associating the ResourcePolicy with the correct pods. Check that the selector in the ResourcePolicy exactly matches the pod labels in your workload. If the selector uses a label that the system reserves (such as alibabacloud.com/compute-class), the system may modify it, breaking the association.

Also confirm your kube-scheduler version meets the minimum requirement for your cluster version (see Prerequisites).

Scale-in does not follow the expected reverse order

This feature is best-effort. The scheduler does not guarantee strict reverse-order removal in all cases — for example, when preemption is active or when multiple pods become eligible for removal simultaneously.

If you require stricter ordering, check the whenTryNextUnits.policy setting and consider LackResourceAndNoTerminating for rolling update scenarios.
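A sketch of such a policy (the selector value and node label are illustrative):

```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: rolling-update-example   # illustrative name
  namespace: default
spec:
  selector:
    app: demo
  strategy: prefer
  units:
  - resource: ecs
    nodeSelector:
      unit: first
  - resource: eci
  whenTryNextUnits:
    policy: LackResourceAndNoTerminating  # hold new pods while old ones terminate
```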

ResourcePolicy conflicts with pod-deletion-cost

If you have configured pod-deletion-cost annotations on pods in the same workload, the two features conflict and cannot be used together. Remove pod-deletion-cost annotations before applying a ResourcePolicy.

Node pool creates unexpected nodes when used with elastic node pools

When an auto-scaling node pool is included in a unit with the max field set, the Cluster Autoscaler may create more nodes than the max value, because it does not currently read the max limit from ResourcePolicy. To avoid this, include the elastic node pool in a unit and do not set max for that unit.
