Container Service for Kubernetes:Enable scheduling

Last Updated: Mar 26, 2026

In an ACK managed cluster Pro, you can control how GPU resources are allocated to workloads by assigning scheduling labels to GPU nodes. These labels let you configure exclusive access, shared compute across multiple pods, topology-aware assignment, or hardware-model-based routing — giving you fine-grained control over resource utilization and workload placement.

Scheduling label overview

GPU scheduling labels define the resource allocation policy for a node. A node supports only one GPU scheduling mode at a time (exclusive, shared, or topology-aware). Enabling one mode automatically sets the extended resources for all other modes to 0.

The four scheduling modes differ along two key dimensions: whether compute is isolated and whether GPU memory is isolated. Use this as your starting point when choosing a mode:

| Scheduling mode | Label | When to use |
| --- | --- | --- |
| Exclusive scheduling (default) | ack.node.gpu.schedule: default | Performance-critical workloads that need a full GPU, such as model training and high-performance computing (HPC). |
| Shared scheduling | ack.node.gpu.schedule: cgpu | Multiple concurrent lightweight tasks, such as multitenancy and inference. Shared compute with isolated GPU memory, based on Alibaba Cloud cGPU technology. |
| | ack.node.gpu.schedule: core_mem | Isolated compute and isolated GPU memory per pod. |
| | ack.node.gpu.schedule: share | Shared compute and GPU memory with no isolation. |
| | ack.node.gpu.schedule: mps | Shared compute with isolated GPU memory, based on NVIDIA MPS isolation technology combined with Alibaba Cloud cGPU technology. |
| Placement policy (for shared scheduling on multi-GPU nodes) | ack.node.gpu.placement: binpack (default) | Maximize GPU utilization or reduce energy use. Fills one GPU completely before moving to the next. |
| | ack.node.gpu.placement: spread | High availability. Distributes pods across different GPUs to reduce the impact of a single card failure. |
| Topology-aware scheduling | ack.node.gpu.schedule: topology | Workloads sensitive to GPU-to-GPU communication latency. Assigns the optimal combination of GPUs based on the physical topology within a node. |
| Card model scheduling | aliyun.accelerator/nvidia_name: <GPU_card_name> | Route jobs to nodes with a specific GPU model, or exclude a specific model using node affinity rules. |
| | aliyun.accelerator/nvidia_mem: <memory_per_card> | Filter by GPU memory per card. Use together with card model scheduling. |
| | aliyun.accelerator/nvidia_count: <total_number_of_GPU_cards> | Filter by total number of GPU cards on the node. Use together with card model scheduling. |
The ack.node.gpu.placement label only applies when shared scheduling (cgpu, core_mem, share, or mps) is enabled. The binpack and spread values are mutually exclusive — only one placement policy is active per node at a time.

Enable scheduling features

Exclusive scheduling

Nodes without a GPU scheduling label use exclusive scheduling by default. Each pod gets one full GPU card.

Removing a label does not restore exclusive scheduling if another mode was previously enabled. To switch back, set the label value explicitly: kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite.
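On an exclusive-scheduling node, a workload requests whole cards through the standard nvidia.com/gpu extended resource. A minimal sketch (the pod name, container name, and image below are illustrative, not from this guide):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: exclusive-gpu-demo        # hypothetical name
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1         # one full GPU card; not shared with other pods
```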

Shared scheduling

Shared scheduling is available only for ACK managed cluster Pro. For more information, see Limits.

Step 1: Install the ack-ai-installer component.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Applications > Cloud-native AI Suite.

  3. On the Cloud-native AI Suite page, click Deploy. On the Deploy Cloud-native AI Suite page, select Scheduling Policy Extension (Batch Task Scheduling, GPU Sharing, Topology-aware GPU Scheduling).

    For details on setting the compute scheduling policy for the cGPU service, see Install and use the cGPU service.
  4. Click Deploy Cloud-native AI Suite. On the Cloud-native AI Suite page, confirm that ack-ai-installer appears in the list of installed components.

Step 2: Enable shared scheduling on a node pool.

  1. On the Clusters page, click the cluster name. In the left navigation pane, choose Nodes > Node Pools.

  2. On the Node Pools page, click Create Node Pool. Configure the node labels, then click Confirm. Keep the default values for other fields. For label definitions, see Scheduling label overview.

    • Basic shared scheduling: Click the add icon next to Node Labels. Set the Key to ack.node.gpu.schedule and set the Value to one of: cgpu, core_mem, share, or mps. > Note: mps requires the MPS Control Daemon component. See Install the MPS Control Daemon component.

    • Placement policy (multi-GPU nodes only): Add a second label. Click the add icon next to Node Labels, set the Key to ack.node.gpu.placement, and set the Value to binpack or spread.
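For existing nodes, the same labels can also be applied directly with kubectl. A sketch (node pool labels remain the recommended path because they persist when the pool scales):

```shell
# Enable cGPU shared scheduling (shared compute, isolated GPU memory) on a node
kubectl label node <NODE_NAME> ack.node.gpu.schedule=cgpu --overwrite

# Optional, multi-GPU nodes only: choose a placement policy
kubectl label node <NODE_NAME> ack.node.gpu.placement=spread --overwrite
```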

Step 3: Verify that shared scheduling is enabled.

Run the appropriate command for your sharing mode. Replace <NODE_NAME> with a node in the target node pool.

cgpu/share/mps

kubectl get nodes <NODE_NAME> -o yaml | grep "aliyun.com/gpu-mem"

Expected output:

aliyun.com/gpu-mem: "60"

A non-zero value in the aliyun.com/gpu-mem field confirms that cgpu, share, or mps shared scheduling is active.
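Once the node advertises aliyun.com/gpu-mem, a pod requests a slice of GPU memory rather than a whole card. A minimal sketch (the pod name and image are illustrative; the value is in GiB, matching the node output above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cgpu-demo               # hypothetical name
spec:
  containers:
  - name: inference
    image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
    resources:
      limits:
        aliyun.com/gpu-mem: 4   # request 4 GiB of GPU memory, not a whole card
```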

core_mem

kubectl get nodes <NODE_NAME> -o yaml | grep -E 'aliyun\.com/gpu-core\.percentage|aliyun\.com/gpu-mem'

Expected output:

aliyun.com/gpu-core.percentage: "80"
aliyun.com/gpu-mem: "6"

Both the aliyun.com/gpu-core.percentage and aliyun.com/gpu-mem fields must be non-zero for core_mem shared scheduling to be active.
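In core_mem mode a pod declares both a compute share and a memory slice. A sketch, assuming the units implied by the node output above (percent of one card's compute, GiB of memory); the pod name and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: core-mem-demo                       # hypothetical name
spec:
  containers:
  - name: inference
    image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
    resources:
      limits:
        aliyun.com/gpu-core.percentage: 30  # 30% of one card's compute
        aliyun.com/gpu-mem: 4               # 4 GiB of GPU memory
```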

binpack

Use the shared GPU resource query tool to inspect per-GPU allocation:

kubectl inspect cgpu

Expected output:

NAME                     IPADDRESS      GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109  192.0.2.109    15/15                  9/15                   0/15                   0/15                   24/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
24/60 (40%)

Each GPUx(Allocated/Total) column shows how much memory is in use on that card. GPU0 is fully allocated (15/15) and GPU1 is partially allocated (9/15), while GPU2 and GPU3 are empty. This sequential fill pattern — packing one card before starting the next — confirms that binpack is active.

spread

kubectl inspect cgpu

Expected output:

NAME                     IPADDRESS      GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109  192.0.2.109    4/15                   4/15                   0/15                   4/15                   12/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
12/60 (20%)

GPU0, GPU1, and GPU3 each show 4/15, while GPU2 is empty. This distributed allocation pattern — spreading pods across cards rather than filling one first — confirms that spread is active.


Topology-aware scheduling

Topology-aware scheduling is available only for ACK managed cluster Pro. For version requirements, see System component version requirements.

  1. Install the ack-ai-installer component.

  2. Add the topology label to the target node:

    kubectl label node <NODE_NAME> ack.node.gpu.schedule=topology

    After enabling topology-aware scheduling on a node, that node no longer accepts non-topology-aware GPU workloads. To restore exclusive scheduling, run: kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite.
  3. Verify that topology-aware scheduling is enabled:

    kubectl get nodes <NODE_NAME> -o yaml | grep aliyun.com/gpu

    Expected output:

    aliyun.com/gpu: "2"

    A non-zero value in the aliyun.com/gpu field confirms that topology-aware scheduling is active.
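A topology-aware workload consumes GPUs through the aliyun.com/gpu resource shown in the verification above. A minimal sketch of the resource request only (pod name and image are illustrative; in practice such jobs are often submitted through higher-level tooling rather than raw pod specs):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: topology-demo        # hypothetical name
spec:
  containers:
  - name: trainer
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        aliyun.com/gpu: 2    # the scheduler picks the 2 GPUs with the best interconnect
```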

Card model scheduling

Card model scheduling uses node labels and Kubernetes node affinity rules to pin jobs to specific GPU hardware — or keep them away from it.

Step 1: Check the GPU card model on your nodes.

kubectl get nodes -L aliyun.accelerator/nvidia_name

The NVIDIA_NAME column shows the GPU model for each node:

NAME                        STATUS   ROLES    AGE   VERSION            NVIDIA_NAME
cn-shanghai.192.XX.XX.176   Ready    <none>   17d   v1.26.3-aliyun.1   Tesla-V100-SXM2-32GB
cn-shanghai.192.XX.XX.177   Ready    <none>   17d   v1.26.3-aliyun.1   Tesla-V100-SXM2-32GB

Alternatively, check GPU details from inside a container. In the ACK console, go to Workloads > Pods, click Terminal in the Actions column for a GPU pod such as tensorflow-mnist-multigpu-***, then run the following commands inside the container:

  • Card model: nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 | sed -e 's/ /-/g'

  • GPU memory per card: nvidia-smi --id=0 --query-gpu=memory.total --format=csv,noheader | sed -e 's/ //g'

  • Total GPU cards on the node: nvidia-smi -L | wc -l


Step 2: Create a job with card model scheduling.

In the ACK console, go to Workloads > Jobs and click Create From YAML. Both examples below use the aliyun.accelerator/nvidia_name label to control GPU model selection.


Specify a GPU model

Use nodeSelector to pin the job to nodes with a particular GPU model. Replace Tesla-V100-SXM2-32GB with the model from your cluster.


apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-mnist
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: tensorflow-mnist
    spec:
      nodeSelector:
        aliyun.accelerator/nvidia_name: "Tesla-V100-SXM2-32GB" # Runs the application on a Tesla V100-SXM2-32GB GPU.
      containers:
      - name: tensorflow-mnist
        image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=1000
        - --data_dir=tensorflow-sample-code/data
        resources:
          limits:
            nvidia.com/gpu: 1
        workingDir: /root
      restartPolicy: Never

After the job is created, go to Workloads > Pods. The pod list shows the pod scheduled to a matching node, confirming that card model label scheduling is working.

Avoid a GPU model

Use nodeAffinity with NotIn to prevent the job from running on nodes with a specific GPU model. Replace Tesla-V100-SXM2-32GB with the model to exclude.


apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-mnist
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: tensorflow-mnist
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: aliyun.accelerator/nvidia_name  # Card model scheduling label
                operator: NotIn
                values:
                - "Tesla-V100-SXM2-32GB"            # Prevents the pod from being scheduled to a node with a Tesla-V100-SXM2-32GB card.
      containers:
      - name: tensorflow-mnist
        image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=1000
        - --data_dir=tensorflow-sample-code/data
        resources:
          limits:
            nvidia.com/gpu: 1
        workingDir: /root
      restartPolicy: Never

After the job is created, the pod is not scheduled on nodes with the aliyun.accelerator/nvidia_name: Tesla-V100-SXM2-32GB label, but can run on any other GPU node.
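The nvidia_mem and nvidia_count labels from the overview table can be combined with nvidia_name in the same nodeSelector to filter by memory size and card count as well as model. A sketch of the pod-spec fragment (the label values are example values; match them to the output of kubectl get nodes -L for these labels in your cluster):

```yaml
nodeSelector:
  aliyun.accelerator/nvidia_name: "Tesla-V100-SXM2-32GB"  # card model
  aliyun.accelerator/nvidia_mem: "32GB"                   # memory per card (example value)
  aliyun.accelerator/nvidia_count: "4"                    # total cards on the node (example value)
```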
