
Container Service for Kubernetes:Achieve multi-GPU sharing

Last Updated:Apr 30, 2025

Container Service for Kubernetes (ACK) managed Pro clusters support GPU sharing. You can use GPU sharing to share GPU resources and isolate memory. This topic describes how to configure multi-GPU sharing policies.

Introduction to multi-GPU sharing

Important

Multi-GPU sharing supports only the GPU sharing policy in which computing power is shared and GPU memory is isolated.

During the model development phase, an application may need to use multiple GPUs without occupying the full resources of each GPU. Exclusively allocating entire GPUs to the development environment can waste resources. To avoid this issue, use multi-GPU sharing.

Multi-GPU sharing allows an application to request N GiB of GPU memory distributed across M GPUs, where each GPU provides N/M GiB. N/M must be an integer, and all M GPUs must reside on the same Kubernetes node. For example, if an application requests 8 GiB of memory across 2 GPUs, each GPU provides 4 GiB.

  • Single GPU sharing: A pod can request GPU resources that are provided by only one GPU.

  • Multiple GPU sharing: A pod can request GPU resources that are evenly provided by multiple GPUs.

(Figure: comparison between single GPU sharing and multiple GPU sharing)
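
The request is expressed through a pod label and a container resource limit. The following fragment is a minimal sketch (the full Job manifest is shown later in this topic): the aliyun.com/gpu-count label sets M, the number of GPUs, and the aliyun.com/gpu-mem limit sets N, the total requested GPU memory in GiB. The container name is only illustrative.

    metadata:
      labels:
        aliyun.com/gpu-count: "2"    # M: spread the request across 2 GPUs
    spec:
      containers:
      - name: app                    # illustrative container name
        resources:
          limits:
            aliyun.com/gpu-mem: 8    # N: 8 GiB in total, that is, 4 GiB per GPU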

Configure a multiple GPU sharing policy

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Workloads > Jobs.

  3. On the Jobs page, click Create from YAML. Copy the following content to the Template section and click Create:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-multigpu
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: tensorflow-mnist-multigpu
            # Add a pod label and a resource limit to request 8 GiB of memory allocated by 2 GPUs. Each GPU allocates 4 GiB of memory. 
            aliyun.com/gpu-count: "2"
        spec:
          containers:
          - name: tensorflow-mnist-multigpu
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                aliyun.com/gpu-mem: 8 # Request 8 GiB of memory. 
            workingDir: /root
          restartPolicy: Never

    YAML template description:

    • The YAML template defines a TensorFlow MNIST job. The job requests 8 GiB of GPU memory evenly provided by 2 GPUs. Each GPU provides 4 GiB of memory.

    • Add the aliyun.com/gpu-count: 2 pod label to request two GPUs.

    • Add the aliyun.com/gpu-mem: 8 resource limit to request 8 GiB of memory.
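
    If you prefer to use kubectl instead of the console, you can apply the same manifest from the command line. The following commands are a sketch that assumes the YAML template above is saved locally; the file name is only an example.

    kubectl apply -f tensorflow-mnist-multigpu.yaml
    # List the pods created by the job. The app label matches the label in the manifest.
    kubectl get pods -l app=tensorflow-mnist-multigpu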

Verify the multiple GPU sharing policy

  1. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Workloads > Pods.

  2. Click Terminal in the Actions column of the pod that you created, such as tensorflow-mnist-multigpu-***, to log on to the pod and run the following command:

    nvidia-smi

    Expected output:

    Wed Jun 14 03:24:14 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
    | N/A   38C    P0    61W / 300W |    569MiB /  4309MiB |      2%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla V100-SXM2...  On   | 00000000:00:0A.0 Off |                    0 |
    | N/A   36C    P0    61W / 300W |    381MiB /  4309MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    The output indicates that the pod can use only two GPUs. Each GPU exposes only 4,309 MiB of memory, which is approximately the 4 GiB that the pod requests per GPU. The actual memory size of each GPU is 16,160 MiB.
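
    If you have kubectl access to the cluster, you can run the same check without the console terminal. The pod name below is a placeholder; replace it with the actual pod name returned by kubectl get pods.

    # Run nvidia-smi inside the pod created by the job.
    kubectl exec -it tensorflow-mnist-multigpu-xxxxx -- nvidia-smi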

  3. Click Logs in the Actions column of the pod to view its logs. The following information is displayed:

    totalMemory: 4.21GiB freeMemory: 3.91GiB
    totalMemory: 4.21GiB freeMemory: 3.91GiB

    The device information indicates that each GPU provides about 4 GiB of memory to the pod, although the actual memory size of each GPU is 16,160 MiB. This means that GPU memory isolation is implemented.
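
    The same logs can also be retrieved with kubectl. The pod name below is a placeholder; replace it with the actual pod name.

    # Print the pod logs and keep only the lines that report GPU device memory.
    kubectl logs tensorflow-mnist-multigpu-xxxxx | grep totalMemory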