
Container Service for Kubernetes: Configure the cGPU Computing Power Scheduling Policy for Shared GPU Scheduling

Last Updated: Mar 26, 2026

ACK Pro edition clusters support GPU sharing through cGPU, which lets multiple containers share a single GPU by dividing its compute time into slices. This topic explains how to set the computing power allocation policy (the POLICY field in the cGPU DaemonSet) that controls how those time slices are distributed across containers.

For an introduction to cGPU, see What is cGPU.

Prerequisites

Before you begin, ensure that an ACK Pro edition cluster that supports GPU sharing through cGPU is available.

Constraints

  • All shared GPU nodes in a cluster must use the same policy.

  • If the cGPU isolation module is already installed on a node, restart the node after you install the shared GPU component for the policy to take effect. To restart a node, see the referenced document.

    Run cat /proc/cgpu_km/version on the node to check whether the isolation module is installed. If the command returns a version number, the module is installed.
  • If the cGPU isolation module is not installed, the policy takes effect immediately after you install the shared GPU component.
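The check described above can be sketched as a short script run on the node. This is an illustrative sketch only; the `/proc/cgpu_km/version` path is the one named in the constraints above.

```shell
# Sketch: check on a shared GPU node whether the cGPU isolation module is installed.
if [ -f /proc/cgpu_km/version ]; then
  installed=yes
  # A version number here means the module is installed; restart the node
  # after installing the shared GPU component for the policy to take effect.
  cat /proc/cgpu_km/version
else
  installed=no
  echo "cGPU isolation module not installed; the policy takes effect immediately"
fi
```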

Policy values

cGPU supports six scheduling policies. Choose the one that matches your workload requirements.

Value  Policy                              Description
0      Average scheduling                  Each container receives a fixed time slice. The ratio is 1/max_inst.
1      Preemptive scheduling               Each container takes as many time slices as possible. The ratio is 1/current_number_of_containers.
2      Weighted preemptive scheduling      Applied automatically when ALIYUN_COM_GPU_SCHD_WEIGHT is greater than 1.
3      Fixed computing power scheduling    Assigns a fixed percentage of compute to each container.
4      Weak computing power scheduling     Provides lighter isolation than preemptive scheduling.
5      Native scheduling                   Uses the GPU driver's own scheduling method.

For more details on policy behavior and examples, see cGPU Service Usage Example.
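To illustrate the difference between policies 0 and 1, the following sketch computes the per-container compute share for an assumed GPU configured for max_inst = 4 containers with 2 currently running. The numbers are illustrative only and are not produced by cGPU itself.

```shell
# Illustrative arithmetic: per-container compute share under average (policy 0)
# vs preemptive (policy 1) scheduling.
max_inst=4   # assumed maximum number of containers configured on the GPU
current=2    # assumed number of containers currently running
# Policy 0: each container always gets a fixed 1/max_inst of the time slices.
avg_share=$(awk -v n="$max_inst" 'BEGIN { printf "%.2f", 1/n }')
# Policy 1: time slices are split among the containers actually running.
pre_share=$(awk -v n="$current" 'BEGIN { printf "%.2f", 1/n }')
echo "policy 0 (average):    $avg_share of the GPU per container"
echo "policy 1 (preemptive): $pre_share while only $current containers run"
```

Under policy 0 the idle capacity of absent containers is wasted; under policy 1 running containers absorb it.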

Step 1: Check whether the shared GPU component is installed

The configuration procedure differs depending on whether the shared GPU component is already installed.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Applications > Helm.

  3. On the Helm page, check whether ack-ai-installer appears in the component list. If it appears, the shared GPU component is installed; otherwise, it is not. Follow the matching procedure in Step 2.
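As a hypothetical CLI alternative to the console check, assuming Helm v3 and kubectl access to the cluster, and assuming the release is installed in the kube-system namespace, you could look for the ack-ai-installer release directly:

```shell
# Sketch: look for the ack-ai-installer Helm release (assumed namespace: kube-system).
release=$(helm -n kube-system list -q 2>/dev/null | grep -x ack-ai-installer || true)
if [ -n "$release" ]; then
  echo "shared GPU component is installed"
else
  echo "shared GPU component not found"
fi
```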

Step 2: Configure the policy

Configure the policy when the component is not installed

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Applications > Cloud-native AI Suite.

  3. On the Cloud-native AI Suite page, click Deploy.

  4. In the Scheduling area, select Scheduling Policy Extension (Batch Task Scheduling, GPU Sharing, Topology-aware GPU Scheduling). Then click Advanced on the right.

  5. On the Parameters page, set the policy field to your target policy value. Then click OK.


  6. At the bottom of the page, click Deploy Cloud-native AI Suite.

Configure the policy when the component is installed

  1. Edit the DaemonSet that installs the cGPU isolation module:

    kubectl edit daemonset cgpu-installer -n kube-system
  2. Verify that the image version is v1.0.6 or later. The image field looks similar to:

    image: registry-vpc.cn-hongkong.aliyuncs.com/acs/cgpu-installer:<Image Version>
  3. Under spec.containers[].env, set the value of POLICY to your target policy value:

    # Other fields are omitted.
    spec:
      containers:
      - env:
        - name: POLICY
          value: "1"
    # Other fields are omitted.
  4. Save the file to apply the change.

  5. Restart the shared GPU node. For details, see the referenced document.
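The steps above can also be sketched non-interactively. This is a hypothetical alternative to `kubectl edit`, assuming kubectl access to the cluster; the image check and the environment-variable update correspond to steps 2 and 3 above.

```shell
# Sketch: inspect the image tag, then set POLICY without opening an editor.
# The || true guards only keep this sketch from aborting when no cluster is reachable.
image=$(kubectl -n kube-system get daemonset cgpu-installer \
  -o jsonpath='{.spec.template.spec.containers[0].image}' 2>/dev/null || true)
echo "current image: ${image:-<cluster not reachable>}"   # verify v1.0.6 or later
kubectl -n kube-system set env daemonset/cgpu-installer POLICY=1 2>/dev/null || true
```

`kubectl set env` rewrites the POLICY environment variable in place; restart the shared GPU node afterward, as in step 5.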

What's next