How to customize pod scheduling in ACS - Container Compute Service

Priority-based resource scheduling is provided by Alibaba Cloud to meet elasticity requirements in pod scheduling. During application deployment or scaling, you can configure a custom ResourcePolicy to schedule application pods across Elastic Compute Service (ECS) nodes and Alibaba Cloud Container Compute Service (ACS) pods, enabling hybrid scheduling based on cost and availability priorities.

Important

In kube-scheduler v1.x.x-aliyun-6.4 and later, by default the ignorePreviousPod parameter of a ResourcePolicy is set to False and the ignoreTerminatingPod parameter is set to True. Existing ResourcePolicies that use the preceding parameters are not affected by this change or further updates.

Limits

If you use this feature together with pod-deletion-cost in scale-in scenarios, you cannot perform scale-ins in reverse order.
The max parameter is available only if your cluster runs Kubernetes 1.26 or later and the version of the scheduler installed in your cluster is 6.7 or later.
If you use this feature together with elastic node pools, invalid nodes may be added to the elastic node pools. Make sure that the elastic node pools are included in units. Do not specify the max parameter for the units.

Use priority-based resource scheduling

For more information about the priority-based scheduling feature of Alibaba Cloud Container Service for Kubernetes (ACK) Pro clusters, see Configure priority-based resource scheduling.

Prerequisites

An ACK Pro cluster that runs Kubernetes 1.26 or later is created. For more information about how to update the Kubernetes version of an ACK cluster, see Update the Kubernetes version of an ACK cluster.

Specific versions of the scheduler are required based on the Kubernetes version of the cluster. The following table describes the required scheduler versions and virtual node versions for each Kubernetes version.
| Kubernetes version | Scheduler version | Virtual node version |
| --- | --- | --- |
| ≥ 1.26 | ≥ 1.26.3-aliyun-6.7.0 | ≥ 2.13.0 |
For more information about the features of different scheduler versions, see kube-scheduler.
For more information about how to enable the computing power of ACS for an ACK Pro cluster, see Use the computing power of ACS in ACK Pro clusters.
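
The version requirements above can be checked programmatically. The following is a minimal sketch, not an official tool: the helper names `parse_scheduler_version` and `meets_prerequisites` are hypothetical, and the sketch assumes that scheduler versions follow the `<kubernetes version>-aliyun-<scheduler release>` pattern shown in the table and that the requirement means "Kubernetes 1.26 or later and scheduler release 6.7 or later".

```python
import re


def parse_scheduler_version(version):
    """Split a string such as '1.26.3-aliyun-6.7.0' into ((1, 26, 3), (6, 7, 0))."""
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)-aliyun-(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized scheduler version: {version!r}")
    k8s, release = (
        tuple(int(part) for part in group.split(".")) for group in match.groups()
    )
    return k8s, release


def meets_prerequisites(version):
    """Return True if the version satisfies the assumed minimums (1.26 and 6.7)."""
    k8s, release = parse_scheduler_version(version)
    # Tuples compare element by element, so (6, 10) ranks above (6, 7).
    return k8s[:2] >= (1, 26) and release >= (6, 7)


print(meets_prerequisites("1.26.3-aliyun-6.7.0"))
print(meets_prerequisites("v1.26.3-aliyun-6.4"))
```

Comparing integer tuples instead of raw strings avoids ranking a release such as 6.10 below 6.7.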

Colocated scheduling of ECS instances and ACS computing power

This example schedules the pods of a Deployment across subscription ECS nodes, pay-as-you-go ECS nodes, and ACS in the following priority order: subscription → pay-as-you-go → ACS. During scale-in, pods are deleted in reverse order: ACS → pay-as-you-go → subscription. The example assumes that each node has 2 vCPUs and 4 GB of memory. To configure colocated scheduling, perform the following steps:

Run the following commands to add labels that indicate the billing method of each node. You can also use node pools to add the labels to nodes automatically.
```
kubectl label node cn-beijing.10.0.3.137 paidtype=subscription 
kubectl label node cn-beijing.10.0.3.138 paidtype=subscription
kubectl label node cn-beijing.10.0.6.46 paidtype=pay-as-you-go
kubectl label node cn-beijing.10.0.6.47 paidtype=pay-as-you-go
```
Node label description
- paidtype=subscription: subscription ECS nodes
- paidtype=pay-as-you-go: pay-as-you-go ECS nodes

Create a ResourcePolicy based on the following content:

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: nginx
  namespace: default
spec:
  selector:
    app: nginx               # Must match the 'app' label in the pod template metadata. 
  strategy: prefer
  units:
  - resource: ecs
    nodeSelector:
      paidtype: subscription
  - resource: ecs
    nodeSelector:
      paidtype: pay-as-you-go
  - resource: acs

Use the following template to create a Deployment that provisions two pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx                                           # The pod label must be the same as the one that you specified for the selector in the ResourcePolicy. 
        alibabacloud.com/compute-class: general-purpose      # The compute class of the ACS computing power. Default value: general-purpose.
        alibabacloud.com/compute-qos: default                # The quality of service (QoS) class of the ACS pod. Default value: default.
    spec:
      containers:
      - name: nginx
        image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
        resources:
          limits:
            cpu: 2
          requests:
            cpu: 2

Deploy and verify the nginx application:

Deploy the nginx application:

kubectl apply -f nginx.yaml

Expected output:

deployment.apps/nginx created

Verify pod placement:

kubectl get pods -o wide

Expected output:

NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                    NOMINATED NODE   READINESS GATES
nginx-9cdf7bbf9-b****   1/1     Running   0          66s   172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
nginx-9cdf7bbf9-r****   1/1     Running   0          66s   172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

The output indicates that two pods are scheduled to nodes that have the paidtype=subscription label. In this example, cn-beijing.10.0.3.137 and cn-beijing.10.0.3.138 are used.
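
For larger Deployments, reading placement off the table by eye becomes tedious. The sketch below tallies pods per node from JSON output instead. It assumes the input is the text produced by a command such as `kubectl get pods -l app=nginx -o json`; the helper name `pods_per_node` and the sample data are made up for illustration.

```python
import json
from collections import Counter


def pods_per_node(pod_list_json):
    """Count pods per node from the JSON form of a Kubernetes PodList."""
    items = json.loads(pod_list_json).get("items", [])
    # spec.nodeName is unset until the scheduler binds the pod to a node.
    return Counter(
        item.get("spec", {}).get("nodeName", "<unscheduled>") for item in items
    )


# Trimmed, hand-written sample shaped like a PodList (not real cluster output).
sample = json.dumps({
    "items": [
        {"metadata": {"name": "nginx-a"}, "spec": {"nodeName": "cn-beijing.10.0.3.137"}},
        {"metadata": {"name": "nginx-b"}, "spec": {"nodeName": "cn-beijing.10.0.3.138"}},
    ]
})

for node, count in sorted(pods_per_node(sample).items()):
    print(node, count)
```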

Scale up the nginx application.

Increase the number of pods to four:

kubectl scale deployment nginx --replicas 4

Expected output:

deployment.apps/nginx scaled

Query the status of the pods:

kubectl get pods -o wide

Expected output:

NAME                    READY   STATUS    RESTARTS   AGE     IP               NODE                    NOMINATED NODE   READINESS GATES
nginx-9cdf7bbf9-4****   1/1     Running   0          16s     172.29.113.155   cn-beijing.10.0.6.47    <none>           <none>
nginx-9cdf7bbf9-b****   1/1     Running   0          3m48s   172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
nginx-9cdf7bbf9-f****   1/1     Running   0          16s     172.29.113.88    cn-beijing.10.0.6.46    <none>           <none>
nginx-9cdf7bbf9-r****   1/1     Running   0          3m48s   172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

The output indicates that the new pods are scheduled to nodes that have the paidtype=pay-as-you-go label, because the nodes that have the paidtype=subscription label no longer have sufficient resources. In this example, cn-beijing.10.0.6.46 and cn-beijing.10.0.6.47 are used.

Increase the number of pods to six:

kubectl scale deployment nginx --replicas 6

Expected output:

deployment.apps/nginx scaled

Query the status of the pods:

kubectl get pods -o wide

Expected output:

NAME                    READY   STATUS    RESTARTS   AGE     IP               NODE                           NOMINATED NODE   READINESS GATES
nginx-9cdf7bbf9-4****   1/1     Running   0          3m10s   172.29.113.155   cn-beijing.10.0.6.47           <none>           <none>
nginx-9cdf7bbf9-b****   1/1     Running   0          6m42s   172.29.112.215   cn-beijing.10.0.3.137          <none>           <none>
nginx-9cdf7bbf9-f****   1/1     Running   0          3m10s   172.29.113.88    cn-beijing.10.0.6.46           <none>           <none>
nginx-9cdf7bbf9-r****   1/1     Running   0          6m42s   172.29.113.23    cn-beijing.10.0.3.138          <none>           <none>
nginx-9cdf7bbf9-s****   1/1     Running   0          36s     10.0.6.68        virtual-kubelet-cn-beijing-j   <none>           <none>
nginx-9cdf7bbf9-v****   1/1     Running   0          36s     10.0.6.67        virtual-kubelet-cn-beijing-j   <none>           <none>

The output indicates that the ECS nodes have insufficient resources, so the two new pods are scheduled to ACS. In this example, the virtual node virtual-kubelet-cn-beijing-j is used.
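
The placements observed at 2, 4, and 6 replicas follow from simple capacity arithmetic. The sketch below is a toy model of that arithmetic, not the scheduler's implementation. It assumes that each 2-vCPU node fits exactly one pod that requests 2 CPUs, that there are two nodes per billing method as in this example, and that ACS capacity is unbounded. The names `UNITS` and `place` are hypothetical.

```python
# Units in ResourcePolicy order with their capacity in pods.
# None marks a unit that is treated as unbounded (ACS in this toy model).
UNITS = [("subscription", 2), ("pay-as-you-go", 2), ("acs", None)]


def place(replicas, units=UNITS):
    """Fill units in priority order and return (placement, pending pod count)."""
    placement = {}
    remaining = replicas
    for name, capacity in units:
        if remaining == 0:
            break
        take = remaining if capacity is None else min(remaining, capacity)
        if take:
            placement[name] = take
        remaining -= take
    return placement, remaining


for replicas in (2, 4, 6):
    print(replicas, place(replicas))
```

With these assumptions, two replicas fit on the subscription nodes, four replicas spill over to the pay-as-you-go nodes, and six replicas place the last two pods on ACS, which matches the outputs in the preceding steps.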

Scale down the nginx application.

Reduce the number of pods to four:

kubectl scale deployment nginx --replicas 4

Expected output:

deployment.apps/nginx scaled

Query the status of the pods:

kubectl get pods -o wide

Expected output:

NAME                    READY   STATUS        RESTARTS   AGE     IP               NODE                           NOMINATED NODE   READINESS GATES
nginx-9cdf7bbf9-4****   1/1     Running       0          4m59s   172.29.113.155   cn-beijing.10.0.6.47           <none>           <none>
nginx-9cdf7bbf9-b****   1/1     Running       0          8m31s   172.29.112.215   cn-beijing.10.0.3.137          <none>           <none>
nginx-9cdf7bbf9-f****   1/1     Running       0          4m59s   172.29.113.88    cn-beijing.10.0.6.46           <none>           <none>
nginx-9cdf7bbf9-r****   1/1     Running       0          8m31s   172.29.113.23    cn-beijing.10.0.3.138          <none>           <none>
nginx-9cdf7bbf9-s****   1/1     Terminating   0          2m25s   10.0.6.68        virtual-kubelet-cn-beijing-j   <none>           <none>
nginx-9cdf7bbf9-v****   1/1     Terminating   0          2m25s   10.0.6.67        virtual-kubelet-cn-beijing-j   <none>           <none>

The output shows that the two pods that run on ACS (virtual-kubelet-cn-beijing-j) are terminated first. Pods are removed in reverse order of scheduling priority: first from ACS, then from pay-as-you-go ECS nodes, and last from subscription ECS nodes.

Reduce the number of pods to two:

kubectl scale deployment nginx --replicas 2

Expected output:

deployment.apps/nginx scaled

Query the status of the pods:

kubectl get pods -o wide

Expected output:

NAME                    READY   STATUS        RESTARTS   AGE     IP               NODE                    NOMINATED NODE   READINESS GATES
nginx-9cdf7bbf9-4****   0/1     Terminating   0          6m43s   172.29.113.155   cn-beijing.10.0.6.47    <none>           <none>
nginx-9cdf7bbf9-b****   1/1     Running       0          10m     172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
nginx-9cdf7bbf9-f****   0/1     Terminating   0          6m43s   172.29.113.88    cn-beijing.10.0.6.46    <none>           <none>
nginx-9cdf7bbf9-r****   1/1     Running       0          10m     172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

The output indicates that the pods on the nodes that have the paidtype=pay-as-you-go label are being terminated.

Wait a few minutes, then query the pod state again:

kubectl get pods -o wide

Expected output:

NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                    NOMINATED NODE   READINESS GATES
nginx-9cdf7bbf9-b****   1/1     Running   0          11m   172.29.112.215   cn-beijing.10.0.3.137   <none>           <none>
nginx-9cdf7bbf9-r****   1/1     Running   0          11m   172.29.113.23    cn-beijing.10.0.3.138   <none>           <none>

The output indicates that pods run only on the nodes with the paidtype=subscription label.
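
The scale-in behavior in this walkthrough can be modeled the same way as the scale-out: remove pods from the lowest-priority unit first. The following is a minimal sketch of that ordering only; it does not reproduce how the scheduler or the Deployment controller actually selects pods for deletion. The function name `scale_in` and the unit names are hypothetical.

```python
def scale_in(placement, target, unit_order):
    """Remove pods from the lowest-priority unit first until `target` remain."""
    result = dict(placement)
    excess = sum(result.values()) - target
    # Walk the units from lowest to highest priority.
    for name in reversed(unit_order):
        if excess <= 0:
            break
        removed = min(result.get(name, 0), excess)
        result[name] = result.get(name, 0) - removed
        excess -= removed
    # Drop units that no longer hold any pods.
    return {name: count for name, count in result.items() if count > 0}


order = ["subscription", "pay-as-you-go", "acs"]
six = {"subscription": 2, "pay-as-you-go": 2, "acs": 2}
four = scale_in(six, 4, order)
two = scale_in(four, 2, order)
print(four)
print(two)
```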

References

When deploying workloads in an ACK cluster, use tolerations and node affinity to restrict scheduling to ECS or ACS resources, or to auto-provision ACS when ECS is insufficient. Different scheduling policies support various scaling scenarios. See Use the computing power of ACS in ACK Pro clusters.
For distributed jobs, leverage Kubernetes-native scheduling to spread workloads across zones for high availability or confine them to specific zones via affinity for high performance. See Node affinity scheduling.
