ACS delivers resources through virtual nodes, so instance availability varies dynamically. When a specific compute class or QoS tier is out of stock, pods stay in Pending state unless you have a fallback strategy. Custom priority scheduling lets you define an ordered list of compute class and QoS combinations. The scheduler tries each option in sequence until one succeeds, and uses the same ordering—in reverse—to determine which pods to remove first during scale-in.
Prerequisites
Before you begin, make sure you have:
- kube-scheduler installed at a version that meets the following requirements:

| ACS cluster version | Minimum scheduler version |
|---|---|
| 1.31 | v1.31.0-aliyun-1.2.0 |
| 1.30 | v1.30.3-aliyun-1.1.1 |
| 1.28 | v1.28.9-aliyun-1.1.0 |

- acs-virtual-node installed at v2.12.0-acs.4 or later
How it works
A ResourcePolicy is a custom resource (CRD) that sits between the cluster and individual workloads. It selects a group of pods by label and defines an ordered list of resource properties. When a pod matching the selector needs to be scheduled, the scheduler tries each entry in the list in order. If the inventory for an entry is insufficient, it moves to the next one. If all entries are out of stock, the pod stays in Pending state and the scheduler continuously retries until resources become available.
ResourcePolicy changes only affect pods created after the update. Existing running pods are not rescheduled.
Scale-in order
ACS uses the Kubernetes pod deletion cost feature to control scale-in order. In principle, the pod with the lowest deletion cost is removed first, but the actual scale-in decision also weighs other factors that depend on the pod controller's implementation.
ACS automatically sets controller.kubernetes.io/pod-deletion-cost based on the scheduling priority of each pod. If your pods already carry this annotation, ACS overwrites its value.
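For example, after scheduling, a pod may carry an annotation like the following. The cost value shown here is illustrative; ACS computes the actual value from the pod's scheduling priority.

```yaml
metadata:
  annotations:
    # Set automatically by ACS; pods with lower values are scaled in first.
    controller.kubernetes.io/pod-deletion-cost: "100"
```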
Supported compute classes
For the compute classes that support custom priority scheduling, see Compute classes.
Do not use alibabacloud.com/compute-class or alibabacloud.com/compute-qos in the spec.selector.matchLabels of a workload (such as a Deployment). ACS may modify these labels during scheduling, which causes the controller to repeatedly recreate pods and destabilizes the application.
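Instead, select pods by an application label that you control. A minimal sketch (the `app: stress` label is illustrative):

```yaml
# In the workload spec (e.g. a Deployment):
selector:
  matchLabels:
    app: stress  # Safe: a label you own and ACS never modifies
    # alibabacloud.com/compute-class: general-purpose  # Unsafe: ACS may rewrite this label
```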
Create a ResourcePolicy
Create a file named `resource-policy.yaml` with the following content. The example below defines a policy for pods with the `app: stress` label. It tries `general-purpose` + `best-effort` first, then falls back to `general-purpose` + `default`.

```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: rp-demo
  namespace: default
spec:
  selector:
    app: stress # Applies to pods with this label
  units:
    - resource: acs # First choice: general-purpose, best-effort
      podLabels:
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: best-effort
    - resource: acs # Fallback: general-purpose, default
      podLabels:
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: default
```

Apply the ResourcePolicy to the cluster.
```shell
kubectl apply -f resource-policy.yaml
```

Create a workload. Set the pod labels to match the `selector` in the ResourcePolicy. The following example uses a Job. The pod template carries `app: stress`, which associates it with the policy defined above.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
  namespace: default
spec:
  parallelism: 3
  template:
    metadata:
      labels:
        app: stress # Must match spec.selector in the ResourcePolicy
    spec:
      containers:
        - name: demo-job
          image: registry.cn-hangzhou.aliyuncs.com/acs/stress:v1.0.4
          command:
            - sleep
          args:
            - 'infinity'
          resources:
            requests:
              cpu: "1"
              memory: "1Gi"
            limits:
              cpu: "1"
              memory: "1Gi"
      restartPolicy: Never
  backoffLimit: 4
```
Advanced configuration
The following annotated YAML shows the full structure of a ResourcePolicy for ACS clusters.
This page covers common ACS configurations. For the complete ResourcePolicy field reference, see Custom Elastic Resource Priority Scheduling.
```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: rp-demo
  namespace: default
spec:
  # Selector: identifies which pods follow this policy
  selector:
    app: stress
  # Units: ordered list of resource options
  units:
    - resource: acs # Must be set to "acs"
      podLabels:
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: best-effort
      nodeSelector: # Optional: restrict to a specific zone
        topology.kubernetes.io/zone: cn-hangzhou-i
    - resource: acs
      podLabels:
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: default
```

Fields not listed here apply to non-ACS clusters and can be ignored.

Application configuration
| Field | Type | Description | Example |
|---|---|---|---|
| selector | map[string]string | Pods with all listed labels follow this ResourcePolicy. | app: stress |
Resource configuration
Each entry in units describes one resource option. The scheduler tries entries in order.
| Field | Type | Description | Allowed values |
|---|---|---|---|
| resource | string | Resource type. Required. | acs |
| nodeSelector | map[string]string | Filter virtual nodes by label, such as zone. See node affinity scheduling. | topology.kubernetes.io/zone: cn-hangzhou-i |
| podLabels[alibabacloud.com/compute-class] | string | Compute class for the pod. | general-purpose (default), performance |
| podLabels[alibabacloud.com/compute-qos] | string | Compute QoS tier for the pod. | default (default), best-effort |
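Because nodeSelector is set per unit, a single policy can also fall back across zones. A sketch with the same compute class and QoS in both units (the zone names are illustrative):

```yaml
units:
  - resource: acs # Prefer capacity in zone cn-hangzhou-i
    podLabels:
      alibabacloud.com/compute-class: general-purpose
      alibabacloud.com/compute-qos: default
    nodeSelector:
      topology.kubernetes.io/zone: cn-hangzhou-i
  - resource: acs # Fall back to zone cn-hangzhou-j
    podLabels:
      alibabacloud.com/compute-class: general-purpose
      alibabacloud.com/compute-qos: default
    nodeSelector:
      topology.kubernetes.io/zone: cn-hangzhou-j
```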
Example: priority fallback with scale-out
This example starts with a ResourcePolicy that requests `performance` + `default` resources, then updates the policy during scale-out so that newly created pods prefer `general-purpose` + `best-effort` and fall back to `performance` + `default`.
Create `resource-policy.yaml` with an initial single-unit policy.

```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: stress-demo
  namespace: default
spec:
  selector:
    app: stress
  units:
    - resource: acs
      podLabels:
        alibabacloud.com/compute-class: performance
        alibabacloud.com/compute-qos: default
```

Apply the ResourcePolicy.

```shell
kubectl apply -f resource-policy.yaml
```

Create `stress-dep.yaml` for the Deployment.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress
  template:
    metadata:
      labels:
        app: stress # Keep this consistent with the ResourcePolicy selector
    spec:
      containers:
        - name: stress
          image: registry-cn-hangzhou.ack.aliyuncs.com/acs/stress:v1.0.4
          command:
            - "sleep"
            - "infinity"
          resources:
            limits:
              cpu: '1'
              memory: 1Gi
            requests:
              cpu: '1'
              memory: 1Gi
```

Deploy the application.
```shell
kubectl apply -f stress-dep.yaml
```

Verify the pod is running with the expected compute class and QoS.

```shell
kubectl get pod -L alibabacloud.com/compute-class,alibabacloud.com/compute-qos
```

Expected output:

```
# Actual output depends on resource availability.
NAME               READY   STATUS    RESTARTS   AGE   COMPUTE-CLASS   COMPUTE-QOS
stress-xxxxxxxx1   1/1     Running   0          53s   performance     default
```

Update `resource-policy.yaml` to add a fallback unit. The updated policy tries `general-purpose` + `best-effort` first, then falls back to `performance` + `default`.

```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: stress-demo
  namespace: default
spec:
  selector:
    app: stress
  units:
    - resource: acs
      podLabels:
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: best-effort
    - resource: acs
      podLabels:
        alibabacloud.com/compute-class: performance
        alibabacloud.com/compute-qos: default
```

Apply the updated ResourcePolicy. The change takes effect for subsequently created pods.
```shell
kubectl apply -f resource-policy.yaml
```

Scale the Deployment to two replicas.

```shell
kubectl scale deployment stress --replicas=2
```

Verify both pods are running and the new replica used the first-choice resources.

```shell
kubectl get pod -L alibabacloud.com/compute-class,alibabacloud.com/compute-qos
```

Expected output:

```
# Actual output depends on resource availability.
NAME               READY   STATUS    RESTARTS   AGE     COMPUTE-CLASS     COMPUTE-QOS
stress-xxxxxxxx1   1/1     Running   0          2m14s   performance       default
stress-xxxxxxxx2   1/1     Running   0          33s     general-purpose   best-effort
```

The new replica uses `general-purpose` + `best-effort`, confirming that the updated policy applied to the newly created pod.
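You can also inspect the deletion costs that ACS assigned, which determine the scale-in order. A sketch using kubectl's custom-columns output (the `app=stress` label matches this example's workload; actual cost values depend on your cluster):

```
kubectl get pods -l app=stress \
  -o custom-columns='NAME:.metadata.name,COST:.metadata.annotations.controller\.kubernetes\.io/pod-deletion-cost'
```

Pods with lower costs are removed first when the Deployment scales in.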
What's next
- Custom Elastic Resource Priority Scheduling — full ResourcePolicy field reference
- Node affinity scheduling — supported labels for `nodeSelector`
- Compute classes — available compute classes for custom priority scheduling