ACS clusters include a built-in default-scheduler that allocates resources for all pods. For GPU-HPN workloads that require complex scheduling policies—such as gang scheduling or topology-aware placement—you can deploy a custom scheduler (for example, Koordinator or Volcano) and configure ACS to use it.
Custom schedulers are supported only for GPU-HPN pods. All other compute types use regular virtual nodes and do not support custom schedulers.
Prerequisites
Before you begin, make sure you have:
A pod with the compute type set to High-Performance Network GPU (gpu-hpn)
The acs-virtual-node add-on installed at version v2.12.0-acs.8 or later
The kube-scheduler component installed at a version that supports custom schedulers:
| ACS cluster version | Minimum kube-scheduler version |
|---|---|
| 1.32 and later | All versions supported |
| 1.31 | v1.31.0-aliyun-1.1.2 or later |
| 1.30 | v1.30.3-aliyun-1.1.2 or later |
| 1.28 | v1.28.9-aliyun-1.1.2 or later |
When using a custom scheduler for GPU-HPN pods, configure the spec.schedulerName field on each pod. For details, see Specify schedulers for pods.
Usage notes
Enabling custom schedulers changes how ACS handles GPU-HPN pods and nodes. Review the behavior differences before proceeding.
| Aspect | Custom schedulers disabled (default) | Custom schedulers enabled |
|---|---|---|
| Pod scheduler name | Not customizable. After you submit a pod, spec.schedulerName is reset to default-scheduler. | Customizable. spec.schedulerName is preserved after submission and can be set to any value. |
| Pod scheduling process | The ACS default scheduler allocates resources for all pods. | The ACS default scheduler only handles pods where spec.schedulerName is default-scheduler. All other pods are handled by your custom scheduler. |
| GPU-HPN node label and taint constraints | Adding, modifying, and deleting node labels, annotations, and taints are subject to ACS constraints. See Manage node labels and taints. | Node label, annotation, and taint constraints no longer apply. |
| Pod affinity scheduling constraints | Affinity field configuration is subject to ACS constraints. See Node affinity scheduling. | Affinity field constraints no longer apply. |
These changes apply only to GPU-HPN pods and nodes. Other compute types do not support custom schedulers.
Deploy and configure a custom scheduler
Step 1: Deploy a custom scheduler
Deploy your custom scheduler in the ACS cluster and note the scheduler name it registers, because pods select it through spec.schedulerName. For a complete example, including the required ServiceAccount and ClusterRoleBinding configuration, see the Kubernetes documentation.
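As a rough, abridged sketch of the pattern used in the Kubernetes multiple-schedulers example, a custom scheduler needs a ServiceAccount bound to the built-in system:kube-scheduler and system:volume-scheduler ClusterRoles, plus a KubeSchedulerConfiguration whose profile registers the scheduler name. The names my-scheduler and my-scheduler-config, and the kube-system namespace, are hypothetical placeholders. The Deployment that runs the scheduler binary is omitted. Koordinator and Volcano ship their own installation manifests, which you should generally prefer.

```yaml
# Hypothetical names: replace "my-scheduler" and the namespace with your own.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-scheduler
  namespace: kube-system
---
# Grant the permissions the upstream kube-scheduler uses.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-kube-scheduler
subjects:
  - kind: ServiceAccount
    name: my-scheduler
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
# Volume-binding permissions, needed if pods use PVCs.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-volume-scheduler
subjects:
  - kind: ServiceAccount
    name: my-scheduler
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
---
# Scheduler configuration, mounted into the scheduler pod and passed via --config.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-scheduler-config
  namespace: kube-system
data:
  my-scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      # Pods that set spec.schedulerName to this value are handled by this scheduler.
      - schedulerName: my-scheduler
    leaderElection:
      leaderElect: false
```

The schedulerName in the profile is the value you later set in each pod's spec.schedulerName.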
Step 2: Enable custom schedulers in ACS
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the ID of the target cluster. In the left navigation pane, choose Operations > Add-ons.
On the Add-ons page, find the Kube Scheduler card and click Configuration.
In the dialog box, select Enable custom labels and schedulers for GPU-HPN nodes, then click OK.
Step 3: Configure a custom scheduler for a pod
Create a file named dep-with-koordinator.yaml with the following content. The Deployment sets alibabacloud.com/compute-class: gpu-hpn on the pod template and assigns koord-scheduler as the scheduler. Replace koord-scheduler with the name you configured in Step 1.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dep-with-koordinator
  labels:
    app: dep-with-koordinator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dep-with-koordinator
  template:
    metadata:
      labels:
        app: dep-with-koordinator
        # Set the compute class to gpu-hpn. Other compute types do not support custom schedulers.
        alibabacloud.com/compute-class: gpu-hpn
    spec:
      containers:
        - name: demo
          image: registry.cn-hangzhou-finance.aliyuncs.com/acs/stress:v1.0.4
          command:
            - "sleep"
            - "infinity"
      restartPolicy: Always
      # Set the scheduler name to match the one deployed in Step 1.
      schedulerName: koord-scheduler
```

Apply the Deployment to the cluster.
```shell
kubectl apply -f dep-with-koordinator.yaml
```

Verify that the pod is using the custom scheduler.
```shell
kubectl get pod -l app=dep-with-koordinator -o custom-columns=NAME:.metadata.name,schedulerName:.spec.schedulerName
```

Expected output:
```
NAME                               schedulerName
dep-with-koordinator-xxxxx-xxxxx   koord-scheduler
```
FAQ
Why do I get an "Insufficient attachable-volumes-xxx" error when a pod uses a PVC with a custom scheduler?
Some custom schedulers require the node's CSINode object (the Container Storage Interface node object) to exist and to report capacity information for the corresponding CSI driver. If that condition is not met, the scheduler reports an insufficient-resource error, even though the default Kubernetes scheduler handles this case automatically.
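For illustration, a CSINode object that reports capacity for a driver looks roughly like the following. The attach limit is reported in spec.drivers[].allocatable.count. The node name, nodeID, and count below are hypothetical values. Such schedulers look for this object and field, and fail with the error above when either is absent.

```yaml
# Illustrative only: node name, nodeID, and count are hypothetical.
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: example-node            # must match the name of the Node object
spec:
  drivers:
    - name: povplugin.csi.alibabacloud.com
      nodeID: example-node
      allocatable:
        count: 16               # maximum number of this driver's volumes on the node
```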
Configure the custom scheduler to ignore specific CSI drivers. For the Volcano scheduler, add the --ignored-provisioners flag at startup:
```shell
# Separate multiple drivers with commas.
--ignored-provisioners=povplugin.csi.alibabacloud.com
```

Adjust the driver name to match the CSI driver in your cluster.
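As a sketch of where the flag goes, the fragment below appends it to the args of the scheduler container in the Volcano scheduler Deployment. The container name volcano-scheduler is an assumption about a standard Volcano installation. Keep the arguments your Deployment already has and only add the new entry.

```yaml
# Fragment of the Volcano scheduler Deployment (assumed names; adapt to your installation).
spec:
  template:
    spec:
      containers:
        - name: volcano-scheduler       # assumed container name
          args:
            # ...keep the existing arguments here unchanged...
            # Ignore these CSI drivers during volume capacity checks.
            # Separate multiple drivers with commas.
            - --ignored-provisioners=povplugin.csi.alibabacloud.com
```

After the Deployment rolls out new scheduler pods with this argument, recreate or reschedule the affected pods.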