Container Service for Kubernetes: Work with GPU sharing and scheduling by using ack-co-scheduler

Last Updated: Mar 10, 2025

GPU sharing is a resource management policy that allows multiple tasks or processes to share one GPU. You can use GPU sharing in a registered cluster to avoid the resource waste caused by traditional exclusive GPU scheduling and improve GPU utilization.

Prerequisites

The cloud-native AI suite is activated before you use GPU sharing. For more information, see Overview of the cloud-native AI suite.

Billing

For more information about how the cloud-native AI suite is billed, see Billing of the cloud-native AI suite.

Limits

  • Do not set the CpuPolicy parameter to static for nodes that have GPU sharing enabled.

  • The pods managed by the DaemonSet of the shared GPU component do not have the highest priority by default. As a result, resources on a node may be allocated to pods that have higher priorities, and the node may evict the DaemonSet pods. To prevent this issue, you can modify the DaemonSet of the shared GPU component. For example, you can specify priorityClassName: system-node-critical in the gpushare-device-plugin-ds DaemonSet, which provides GPU memory sharing, to raise the priority of its pods, as shown in the example after this list.
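
The following command is a minimal sketch of how such a change could be applied with kubectl patch. The kube-system namespace is an assumption; use the namespace in which the DaemonSet is actually deployed in your cluster.

    # Raise the priority of the pods managed by the gpushare-device-plugin-ds DaemonSet.
    # The kube-system namespace is an assumption; adjust it to your deployment.
    kubectl patch daemonset gpushare-device-plugin-ds -n kube-system \
      --type merge \
      -p '{"spec":{"template":{"spec":{"priorityClassName":"system-node-critical"}}}}'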

Step 1: Install components

  1. Install the ack-ai-installer component in the registered cluster. This component implements scheduling capabilities such as GPU sharing (including GPU memory isolation) and topology-aware GPU scheduling.

    1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

    2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Helm.

    3. On the Helm page, click Create. Search for and install the ack-ai-installer component.

  2. Install the ack-co-scheduler component in the registered cluster. This component allows you to create ResourcePolicy custom resources (CRs) to use the multilevel resource scheduling feature.

    1. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Add-ons.

    2. On the Add-ons page, search for the ack-co-scheduler component, and click Install at the bottom right of the card.
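
After both components are installed, you can optionally confirm that their workloads are running. The following check is a sketch; the pod name prefixes used in the filter are assumptions and may vary by component version.

    # Optional: list the pods created by the GPU sharing and co-scheduling components.
    # The name prefixes in the filter are assumptions and may differ by version.
    kubectl get pods -n kube-system | grep -E 'gpushare|ack-ai|ack-co-scheduler'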

Step 2: Install and use the GPU inspection tool

  1. Download kubectl-inspect-cgpu. The executable file must be downloaded to a directory included in the PATH environment variable. This section uses /usr/local/bin/ as an example.

    • If you use Linux, run the following command to download kubectl-inspect-cgpu:

      wget http://aliacs-k8s-cn-beijing.oss-cn-beijing.aliyuncs.com/gpushare/kubectl-inspect-cgpu-linux -O /usr/local/bin/kubectl-inspect-cgpu
    • If you use macOS, run the following command to download kubectl-inspect-cgpu:

      wget http://aliacs-k8s-cn-beijing.oss-cn-beijing.aliyuncs.com/gpushare/kubectl-inspect-cgpu-darwin -O /usr/local/bin/kubectl-inspect-cgpu
  2. Run the following command to grant execute permissions on kubectl-inspect-cgpu:

    chmod +x /usr/local/bin/kubectl-inspect-cgpu
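
  3. (Optional) Run the following command to verify that kubectl detects the new plugin. kubectl discovers executables named kubectl-* in directories on the PATH, so the output should list /usr/local/bin/kubectl-inspect-cgpu:

    kubectl plugin list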

Step 3: Create GPU-accelerated nodes

Create an Elastic GPU Service instance, and install the NVIDIA driver and nvidia-container-runtime on the instance. For more information, see Create and manage a node pool.
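
As an optional sanity check that is not required by this procedure, you can log on to the new node and run the following command to confirm that the NVIDIA driver is installed. The command prints the driver version and the detected GPUs:

    # Run on the GPU-accelerated node after the driver is installed.
    nvidia-smi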

Note
  • Skip this step if you have added GPU-accelerated nodes to the node pool and configured the environment when you created the node pool.

  • For more information about the driver installation script, see Manually update the NVIDIA driver of a node.

  • Nodes that have GPU sharing enabled must have the ack.node.gpu.schedule=share label. You can manually add this label to on-premises nodes, as shown in the example following this note. For cloud nodes, you can instead set the label to ack.node.gpu.schedule=cgpu by using the labeling feature of the node pool to additionally enable GPU memory isolation. For more information, see Labels for enabling GPU scheduling policies.
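
For example, the following command manually adds the GPU sharing label to an on-premises node. The node name <your-node-name> is a placeholder:

    # Enable GPU sharing (without memory isolation) on a node.
    kubectl label node <your-node-name> ack.node.gpu.schedule=share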

Step 4: Work with GPU sharing

  1. Run the following command to query the GPU usage of the cluster:

    kubectl inspect cgpu

    Expected output:

    NAME                           IPADDRESS       GPU0(Allocated/Total)  GPU Memory(GiB)
    cn-zhangjiakou.192.168.66.139  192.168.66.139  0/15                   0/15
    ---------------------------------------------------------------------------
    Allocated/Total GPU Memory In Cluster:
    0/15 (0%)
  2. Create a file named GPUtest.yaml and copy the following content to the file.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gpu-share-sample
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: gpu-share-sample
        spec:
          schedulerName: ack-co-scheduler
          containers:
          - name: gpu-share-sample
            image: registry.cn-hangzhou.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                # The unit is GiB. This pod requests a total of 3 GiB of GPU memory.
                aliyun.com/gpu-mem: 3 # Specify the amount of GPU memory to request.
            workingDir: /root
          restartPolicy: Never
  3. Run the following command to deploy the sample application, which uses GPU sharing and requests 3 GiB of GPU memory:

    kubectl apply -f GPUtest.yaml
  4. Run the following command to query the memory usage of the GPU:

    kubectl inspect cgpu

    Expected output:

    NAME                           IPADDRESS       GPU0(Allocated/Total)  GPU Memory(GiB)
    cn-zhangjiakou.192.168.66.139  192.168.66.139  3/15                   3/15
    ---------------------------------------------------------------------------
    Allocated/Total GPU Memory In Cluster:
    3/15 (20%)             

    The output shows that the total GPU memory of the cn-zhangjiakou.192.168.66.139 node is 15 GiB and 3 GiB of GPU memory is allocated.
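
  5. (Optional) Run the following commands to confirm that the sample job is running and view its logs. The app=gpu-share-sample label comes from GPUtest.yaml; run the commands in the namespace in which you applied the file:

    kubectl get pods -l app=gpu-share-sample
    kubectl logs -l app=gpu-share-sample --tail=20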

References

For more information about GPU sharing, see GPU sharing.