
Container Service for Kubernetes: Allocate computing power by scheduling shared GPU

Last Updated: Mar 26, 2026

cGPU lets multiple pods share a single physical GPU in ACK Pro clusters by isolating both GPU memory and computing power at the software level. This topic shows how to create a node pool with computing power allocation enabled, verify the configuration, and deploy a workload that uses both resources.

Prerequisites

Before you begin, ensure that you have:

  • An ACK Pro cluster running Kubernetes 1.20 or later. For more information, see Create an ACK managed cluster.

  • A kube-scheduler version that meets the requirement for your cluster version, as listed in the following table. For the full list of features supported by each kube-scheduler version, see kube-scheduler.

    ACK cluster version   Scheduler version
    1.28                  1.28.1-aliyun-5.6-998282b9 or later
    1.26                  v1.26.3-aliyun-4.1-a520c096 or later
    1.24                  1.24.3-ack-2.0 or later
    1.22                  1.22.15-ack-2.0 or later
    1.20                  1.20.4-ack-8.0 or later
  • The GPU sharing component installed with a Helm chart version later than 1.2.0. For more information, see Manage the GPU sharing component.

  • cGPU 1.0.5 or later installed. For more information, see Update the cGPU version on a node.

Limitations

  • Job type mixing is not supported on the same node. GPU sharing supports two types of jobs: jobs that request only GPU memory, and jobs that request both GPU memory and computing power. You cannot run both types on the same node at the same time. This constraint exists because cGPU uses software-level isolation, not hardware-level isolation (such as MIG).

  • Computing power values must be multiples of 5, with a minimum of 5. The scale is 0–100, where 100 represents 100% of a GPU's computing power. For example, a value of 20 means the pod uses 20% of the GPU's computing power. If the specified value is not a multiple of 5, the job cannot be submitted.
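    The multiple-of-5 rule is easy to check before submitting a job. A minimal shell sketch (the function name and this pre-submit check are illustrative, not part of ACK):

    ```shell
    #!/bin/sh
    # Illustrative pre-submit check: a valid computing power request is a
    # multiple of 5 in the range 5-100. Returns 0 if valid, 1 otherwise.
    valid_gpu_core_pct() {
      pct="$1"
      [ "$pct" -ge 5 ] && [ "$pct" -le 100 ] && [ $((pct % 5)) -eq 0 ]
    }

    valid_gpu_core_pct 30 && echo "30 is valid"    # multiple of 5 -> accepted
    valid_gpu_core_pct 32 || echo "32 is invalid"  # not a multiple of 5 -> rejected
    ```

    Running such a check in CI before kubectl apply avoids submitting jobs that the scheduler will reject.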

  • Computing power allocation is not supported by direct node labeling. To enable computing power isolation on existing GPU-accelerated nodes, remove them from the cluster first and then add them to a node pool that supports computing power isolation. Running kubectl label nodes <NODE_NAME> ack.node.gpu.schedule=core_mem directly on existing nodes does not work.

  • Supported regions only. Computing power allocation is available in the following regions:

    Region                Region ID
    China (Beijing)       cn-beijing
    China (Shanghai)      cn-shanghai
    China (Hangzhou)      cn-hangzhou
    China (Zhangjiakou)   cn-zhangjiakou
    China (Shenzhen)      cn-shenzhen
    China (Chengdu)       cn-chengdu
    China (Heyuan)        cn-heyuan
    China (Hong Kong)     cn-hongkong
    Indonesia (Jakarta)   ap-southeast-5
    Singapore             ap-southeast-1
    Thailand (Bangkok)    ap-southeast-7
    US (Virginia)         us-east-1
    US (Silicon Valley)   us-west-1
    Japan (Tokyo)         ap-northeast-1
    China East 2 Finance  cn-shanghai-finance-1
  • Clusters created before March 1, 2022 require a manual scheduler update. Clusters created on or after March 1, 2022 automatically use the scheduler version that supports computing power allocation. For older clusters, follow these steps:

    1. Submit a ticket to apply for private preview access to shared GPU scheduling.

    2. If the installed GPU sharing Helm chart version is 1.2.0 or earlier, uninstall it:

       a. Log on to the ACK console. In the left navigation pane, click Clusters.

       b. Click the cluster name. In the left navigation pane, choose Applications > Helm.

       c. On the Helm page, find ack-ai-installer and click Delete in the Actions column. In the Delete dialog box, click OK.

    3. Install the latest GPU sharing component. For more information, see Manage the GPU sharing component.

Step 1: Create a node pool with computing power allocation

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. Click the cluster name. In the left navigation pane, choose Nodes > Node Pools.

  3. On the Node Pools page, click Create Node Pool.

  4. Configure the node pool with the following settings. For all other parameters, see Create and manage a node pool.

    Parameter       Description
    Node Pool Name  A name for the node pool. This topic uses gpu-core as an example.
    Expected Nodes  The initial number of nodes. Set to 0 if you do not want to create nodes immediately.
    ECS Tags        Labels to add to the Elastic Compute Service (ECS) instances in the node pool.
    Node Labels     Labels to add to the nodes. Configure both of the labels below. For more information, see Labels for enabling GPU scheduling policies.

    Under Node Labels, add both of the following labels:

    Key                     Value     Purpose
    ack.node.gpu.schedule   core_mem  Enables both GPU memory isolation and computing power isolation on the node.
    ack.node.gpu.placement  binpack   Uses the binpack algorithm to pack pods onto the fewest GPUs, maximizing GPU utilization.
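Once nodes join the pool, a quick way to confirm that both labels landed on them is to filter nodes by the scheduling label. A sketch assuming kubectl access to the cluster; the custom column names are arbitrary:

```shell
# List nodes with GPU memory + computing power isolation enabled,
# showing each node's placement policy label alongside its name.
kubectl get nodes -l ack.node.gpu.schedule=core_mem \
  -o custom-columns='NAME:.metadata.name,PLACEMENT:.metadata.labels.ack\.node\.gpu\.placement'
```

An empty list means no node in the cluster carries the core_mem label yet.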

Step 2: Verify computing power allocation

Run the following command to check whether computing power allocation is enabled on a node:

kubectl get nodes <NODE_NAME> -o yaml

Look for the aliyun.com/gpu-core.percentage field in the allocatable and capacity sections of the output. Its presence confirms that computing power allocation is active.
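To avoid scanning the full YAML, you can also read the field directly with a JSONPath query (a sketch assuming kubectl access; replace <NODE_NAME> with a real node name):

```shell
# Prints the node's total computing power units (for example, "400" on a
# 4-GPU node), or nothing if computing power allocation is not enabled.
kubectl get node <NODE_NAME> \
  -o jsonpath='{.status.allocatable.aliyun\.com/gpu-core\.percentage}'
```

The dots inside the resource name are escaped with backslashes so JSONPath does not treat them as path separators.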

Expected output for a node with 4 GPUs (each with 15 GiB of memory):

# Irrelevant fields are omitted.
status:
  allocatable:
    # 4 GPUs x 100% = 400 total computing power units
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    # 4 GPUs x 15 GiB = 60 GiB total GPU memory
    aliyun.com/gpu-mem: "60"
  capacity:
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    aliyun.com/gpu-mem: "60"

Step 3: Deploy a workload with computing power limits

Without computing power allocation, a pod can use 100% of a GPU's computing power and all 15 GiB of its memory. After enabling this feature, you set explicit limits on both resources.

The following example deploys a job that requests 2 GiB of GPU memory and 30% of computing power.

  1. Create a file named cuda-sample.yaml with the following content:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: cuda-sample
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: cuda-sample
        spec:
          containers:
          - name: cuda-sample
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
            command:
            - bash
            - run.sh
            - --num_batches=500000000
            - --batch_size=8
            resources:
              limits:
                aliyun.com/gpu-mem: 2                 # GPU memory in GiB
                aliyun.com/gpu-core.percentage: 30    # Percentage of GPU computing power; must be a multiple of 5
            workingDir: /root
          restartPolicy: Never

    Key resource parameters:

    Parameter Value Description
    aliyun.com/gpu-mem 2 GPU memory in GiB. The pod can use up to 2 GiB.
    aliyun.com/gpu-core.percentage 30 Percentage of GPU computing power. Must be a multiple of 5; 30 means 30%.
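    These limits also determine packing: on the example node type, each GPU offers 15 GiB of memory and 100 computing power units, so for this job the compute request, not the memory request, bounds how many copies fit on one GPU. A minimal shell sketch of that arithmetic (the function is illustrative, not part of ACK):

    ```shell
    #!/bin/sh
    # Illustrative: how many identical pods fit on one GPU, given per-GPU
    # capacity and per-pod limits. The binding resource is whichever runs
    # out first.
    pods_per_gpu() {
      gpu_mem=$1; gpu_core=$2; pod_mem=$3; pod_core=$4
      by_mem=$((gpu_mem / pod_mem))     # memory-bound count
      by_core=$((gpu_core / pod_core))  # compute-bound count
      if [ "$by_core" -lt "$by_mem" ]; then echo "$by_core"; else echo "$by_mem"; fi
    }

    # 15 GiB / 2 GiB = 7 by memory; 100 / 30 = 3 by compute -> 3 pods per GPU
    pods_per_gpu 15 100 2 30
    ```

    With the binpack placement label from Step 1, the scheduler packs those copies onto the same GPU before using the next one.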
  2. Deploy the job:

    kubectl apply -f cuda-sample.yaml

    Note: The image is large, so the initial pull may take several minutes.
  3. Verify that the job is running:

    kubectl get po -l app=cuda-sample

    Expected output:

    NAME                READY   STATUS    RESTARTS   AGE
    cuda-sample-m****   1/1     Running   0          15s

    Running in the STATUS column confirms the pod is active.

  4. Check the GPU memory and computing power used by the pod:

    kubectl exec -ti cuda-sample-m**** -- nvidia-smi

    Expected output:

    Thu Dec 16 02:53:22 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:08.0 Off |                    0 |
    | N/A   33C    P0    56W / 300W |    337MiB /  2154MiB |     30%      Default |
    |                               |                      |                  N/A |
    +-----------------------------------------------------------------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    The output confirms isolation is applied:

    • GPU memory: The pod is limited to 2,154 MiB (approximately 2 GiB), down from the full 15 GiB available before enabling the feature. Current usage is 337 MiB.

    • Computing power: The pod is limited to 30% (GPU-Util: 30%), down from 100%.

    Note that nvidia-smi reports computing power utilization per GPU, not per pod. If n pods (where n is at most 3) each request 30%, the binpack policy schedules them all to the same GPU, and the output shows n x 30% utilization.
  5. Check the pod logs to observe throttling behavior:

    kubectl logs cuda-sample-m**** -f

    Expected output:

    [CUDA Bandwidth Test] - Starting...
    Running on...
    
     Device 0: Tesla V100-SXM2-16GB
     Quick Mode
    
    time: 2021-12-16/02:50:59,count: 0,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:01,count: 1,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:02,count: 2,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:03,count: 3,memSize: 32000000,succeed to copy data from host to gpu

    Log entries appear at a lower rate compared to a pod with unrestricted computing power, which confirms that the 30% computing power limit is in effect.

  6. (Optional) Delete the job after verification:

    kubectl delete job cuda-sample

What's next