In an ACK managed cluster Pro, you can control how GPU resources are allocated to workloads by assigning scheduling labels to GPU nodes. These labels let you configure exclusive access, shared compute across multiple pods, topology-aware assignment, or hardware-model-based routing — giving you fine-grained control over resource utilization and workload placement.
Scheduling label overview
GPU scheduling labels define the resource allocation policy for a node. A node supports only one GPU scheduling mode at a time (exclusive, shared, or topology-aware). Enabling one mode automatically sets the extended resources for all other modes to 0.
The four scheduling modes differ along two key dimensions: whether compute is isolated and whether GPU memory is isolated. Use this as your starting point when choosing a mode:
| Scheduling mode | Label | When to use |
|---|---|---|
| Exclusive scheduling (default) | ack.node.gpu.schedule: default | Performance-critical workloads that need a full GPU, such as model training and high-performance computing (HPC). |
| Shared scheduling | ack.node.gpu.schedule: cgpu | Multiple concurrent lightweight tasks, such as multitenancy and inference. Shared compute with isolated GPU memory, based on Alibaba Cloud cGPU technology. |
| | ack.node.gpu.schedule: core_mem | Isolated compute and isolated GPU memory per pod. |
| | ack.node.gpu.schedule: share | Shared compute and GPU memory with no isolation. |
| | ack.node.gpu.schedule: mps | Shared compute with isolated GPU memory, based on NVIDIA MPS isolation technology combined with Alibaba Cloud cGPU technology. |
| Placement policy (for shared scheduling on multi-GPU nodes) | ack.node.gpu.placement: binpack | Maximize GPU utilization or reduce energy use. Fills one GPU completely before moving to the next. Default. |
| | ack.node.gpu.placement: spread | High availability. Distributes pods across different GPUs to reduce the impact of a single card failure. |
| Topology-aware scheduling | ack.node.gpu.schedule: topology | Workloads sensitive to GPU-to-GPU communication latency. Assigns the optimal combination of GPUs based on physical topology within a node. |
| Card model scheduling | aliyun.accelerator/nvidia_name: <GPU_card_name> | Route jobs to nodes with a specific GPU model, or exclude a specific model using node affinity rules. |
| | aliyun.accelerator/nvidia_mem: <memory_per_card> | Filter by GPU memory per card. Use together with card model scheduling. |
| | aliyun.accelerator/nvidia_count: <total_number_of_GPU_cards> | Filter by total number of GPU cards on the node. Use together with card model scheduling. |
The ack.node.gpu.placement label only applies when shared scheduling (cgpu, core_mem, share, or mps) is enabled. The binpack and spread values are mutually exclusive: only one placement policy is active per node at a time.
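If you manage node labels with kubectl rather than through the console, the same pair of labels can be applied directly to a node. The following is a minimal sketch; the node name is a placeholder, and labels applied directly to nodes in a node pool may be reset when the pool scales or nodes are re-initialized, so setting the labels in the node pool configuration (described below) is the more durable option.

```shell
# Enable memory-isolated GPU sharing (cGPU) on the node
kubectl label node <NODE_NAME> ack.node.gpu.schedule=cgpu --overwrite
# Fill one GPU completely before scheduling onto the next (binpack placement)
kubectl label node <NODE_NAME> ack.node.gpu.placement=binpack --overwrite
```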
Enable scheduling features
Exclusive scheduling
Nodes without a GPU scheduling label use exclusive scheduling by default. Each pod gets one full GPU card.
Removing a label does not restore exclusive scheduling if another mode was previously enabled. To switch back, set the label value explicitly: kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite.
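Under exclusive scheduling, workloads request whole cards through the standard nvidia.com/gpu extended resource. A minimal pod sketch; the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: exclusive-gpu-demo        # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: main
    image: <YOUR_IMAGE>           # replace with your workload image
    resources:
      limits:
        nvidia.com/gpu: 1         # one full GPU card, exclusively assigned to this pod
```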
Shared scheduling
Shared scheduling is available only for ACK managed cluster Pro. For more information, see Limits.
Step 1: Install the ack-ai-installer component.
1. Log on to the ACK console. In the left navigation pane, click Clusters.
2. On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Applications > Cloud-native AI Suite.
3. On the Cloud-native AI Suite page, click Deploy. On the Deploy Cloud-native AI Suite page, select Scheduling Policy Extension (Batch Task Scheduling, GPU Sharing, Topology-aware GPU Scheduling). For details on setting the compute scheduling policy for the cGPU service, see Install and use the cGPU service.
4. Click Deploy Cloud-native AI Suite. On the Cloud-native AI Suite page, confirm that ack-ai-installer appears in the list of installed components.
Step 2: Enable shared scheduling on a node pool.
1. On the Clusters page, click the cluster name. In the left navigation pane, choose Nodes > Node Pools.
2. On the Node Pools page, click Create Node Pool. Configure the node labels, then click Confirm. Keep the default values for other fields. For label definitions, see Scheduling label overview.
   - Basic shared scheduling: Click the add icon next to Node Labels. Set the Key to ack.node.gpu.schedule and set the Value to one of cgpu, core_mem, share, or mps.
     Note: mps requires the MPS Control Daemon component. See Install the MPS Control Daemon component.
   - Placement policy (multi-GPU nodes only): Add a second label. Click the add icon, set the Key to ack.node.gpu.placement, and set the Value to binpack or spread.
Step 3: Verify that shared scheduling is enabled.
Run the appropriate command for your sharing mode. Replace <NODE_NAME> with a node in the target node pool.
cgpu/share/mps
kubectl get nodes <NODE_NAME> -o yaml | grep "aliyun.com/gpu-mem"
Expected output:
aliyun.com/gpu-mem: "60"
A non-zero value in the aliyun.com/gpu-mem field confirms that cgpu, share, or mps shared scheduling is active.
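Once the aliyun.com/gpu-mem resource is advertised, pods can request a slice of GPU memory instead of a whole card. A minimal sketch; the pod name, image, and 4 GiB request are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo            # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: inference
    image: <YOUR_IMAGE>           # replace with your workload image
    resources:
      limits:
        aliyun.com/gpu-mem: 4     # 4 GiB of GPU memory on a shared card
```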
core_mem
kubectl get nodes <NODE_NAME> -o yaml | grep -E 'aliyun\.com/gpu-core\.percentage|aliyun\.com/gpu-mem'
Expected output:
aliyun.com/gpu-core.percentage: "80"
aliyun.com/gpu-mem: "6"
Both the aliyun.com/gpu-core.percentage and aliyun.com/gpu-mem fields must be non-zero for core_mem shared scheduling to be active.
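With core_mem, a pod requests a share of compute and a share of memory together. A minimal sketch; the names and the 30% / 4 GiB values are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-core-mem-demo                    # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: inference
    image: <YOUR_IMAGE>                      # replace with your workload image
    resources:
      limits:
        aliyun.com/gpu-core.percentage: 30   # 30% of one card's compute
        aliyun.com/gpu-mem: 4                # 4 GiB of GPU memory
```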
binpack
Use the shared GPU resource query tool to inspect per-GPU allocation:
kubectl inspect cgpu
Expected output:
NAME IPADDRESS GPU0(Allocated/Total) GPU1(Allocated/Total) GPU2(Allocated/Total) GPU3(Allocated/Total) GPU Memory(GiB)
cn-shanghai.192.0.2.109 192.0.2.109 15/15 9/15 0/15 0/15 24/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
24/60 (40%)
Each GPUx(Allocated/Total) column shows how much memory is in use on that card. GPU0 is fully allocated (15/15) and GPU1 is partially allocated (9/15), while GPU2 and GPU3 are empty. This sequential fill pattern — packing one card before starting the next — confirms that binpack is active.
spread
kubectl inspect cgpu
Expected output:
NAME IPADDRESS GPU0(Allocated/Total) GPU1(Allocated/Total) GPU2(Allocated/Total) GPU3(Allocated/Total) GPU Memory(GiB)
cn-shanghai.192.0.2.109 192.0.2.109 4/15 4/15 0/15 4/15 12/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
12/60 (20%)
GPU0, GPU1, and GPU3 each show 4/15, while GPU2 is empty. This distributed allocation pattern — spreading pods across cards rather than filling one first — confirms that spread is active.
Topology-aware scheduling
Topology-aware scheduling is available only for ACK managed cluster Pro. For version requirements, see System component version requirements.
1. Add the topology label to the target node:
   kubectl label node <NODE_NAME> ack.node.gpu.schedule=topology
   After enabling topology-aware scheduling on a node, that node no longer accepts non-topology-aware GPU workloads. To restore exclusive scheduling, run kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite.
2. Verify that topology-aware scheduling is enabled:
   kubectl get nodes <NODE_NAME> -o yaml | grep aliyun.com/gpu
   Expected output:
   aliyun.com/gpu: "2"
   A non-zero value in the aliyun.com/gpu field confirms that topology-aware scheduling is active.
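On a topology-enabled node, GPUs are exposed through the aliyun.com/gpu resource shown above. The sketch below only illustrates how that resource is requested; actual topology-aware workloads (for example, distributed training jobs) may need additional configuration, so follow the ACK topology-aware GPU scheduling documentation for the full workflow. The pod name, image, and GPU count are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: topology-gpu-demo        # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: training
    image: <YOUR_IMAGE>          # replace with your workload image
    resources:
      limits:
        aliyun.com/gpu: 2        # the scheduler picks a topology-optimal pair of GPUs
```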
Card model scheduling
Card model scheduling uses node labels and Kubernetes node affinity rules to pin jobs to specific GPU hardware — or keep them away from it.
Step 1: Check the GPU card model on your nodes.
kubectl get nodes -L aliyun.accelerator/nvidia_name
The NVIDIA_NAME column shows the GPU model for each node:
NAME STATUS ROLES AGE VERSION NVIDIA_NAME
cn-shanghai.192.XX.XX.176 Ready <none> 17d v1.26.3-aliyun.1 Tesla-V100-SXM2-32GB
cn-shanghai.192.XX.XX.177 Ready <none> 17d v1.26.3-aliyun.1 Tesla-V100-SXM2-32GB
Step 2: Create a job with card model scheduling.
In the ACK console, go to Workloads > Jobs and click Create From YAML. Both examples below use the aliyun.accelerator/nvidia_name label to control GPU model selection.
Specify a GPU model
Use nodeSelector to pin the job to nodes with a particular GPU model. Replace Tesla-V100-SXM2-32GB with the model from your cluster.
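A minimal Job sketch; the job name, image, and GPU request are illustrative, and only the nodeSelector entry is required for model pinning:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-model-demo                                         # illustrative name
spec:
  template:
    spec:
      nodeSelector:
        aliyun.accelerator/nvidia_name: Tesla-V100-SXM2-32GB   # pin to this GPU model
      containers:
      - name: main
        image: <YOUR_IMAGE>                                    # replace with your job image
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
```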
After the job is created, go to Workloads > Pods. The pod list shows the pod scheduled to a matching node, confirming that card model label scheduling is working.
Avoid a GPU model
Use nodeAffinity with NotIn to prevent the job from running on nodes with a specific GPU model. Replace Tesla-V100-SXM2-32GB with the model to exclude.
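A minimal Job sketch; the names, image, and GPU request are illustrative, and the nodeAffinity block with the NotIn operator performs the exclusion:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-model-exclude-demo           # illustrative name
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: aliyun.accelerator/nvidia_name
                operator: NotIn
                values:
                - Tesla-V100-SXM2-32GB   # exclude nodes with this GPU model
      containers:
      - name: main
        image: <YOUR_IMAGE>              # replace with your job image
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
```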
After the job is created, the pod is not scheduled on nodes with the aliyun.accelerator/nvidia_name: Tesla-V100-SXM2-32GB label, but can run on any other GPU node.