
Container Service for Kubernetes:Enable Multi-GPU Sharing with Shared GPU Scheduling

Last Updated: Mar 26, 2026

When a single model training job needs more GPU memory than one physical card provides, multi-GPU sharing lets a Pod draw equal GPU memory allocations from multiple GPU cards at the same time, without exclusively occupying any of them. ACK Pro clusters support multi-GPU sharing with GPU memory isolation, enabling finer-grained resource utilization during model development.

Prerequisites

Before you begin, ensure that you have an ACK Pro cluster with shared GPU scheduling enabled.

Limitations

Multi-GPU sharing supports GPU memory isolation with computing power sharing only. GPU memory isolation with computing power allocation is not supported.

How it works

Multi-GPU sharing lets a single Pod request GPU memory from multiple physical GPU cards simultaneously. Each card contributes an equal share.

  • Single-GPU sharing: A Pod uses a portion of one GPU card's resources.

  • Multi-GPU sharing: A Pod spans multiple GPU cards, with each card contributing the same amount of GPU memory.

Allocation formula: If a Pod requests N GiB of GPU memory from M GPU cards, each card allocates N/M GiB.

For example, a Pod requesting 8 GiB across 2 GPU cards receives 4 GiB from each card.

Constraints:

  • N/M must be an integer.

  • All M GPU cards must be on the same Kubernetes node.
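The allocation rule above can be sketched as a small helper. This is an illustrative calculation only; the function name and error handling are hypothetical and not part of the scheduler:

```python
def per_card_allocation(total_gib: int, card_count: int) -> int:
    """Return the GPU memory (GiB) each card contributes when a Pod
    requests total_gib GiB spread evenly over card_count cards."""
    if card_count <= 0:
        raise ValueError("card_count must be positive")
    if total_gib % card_count != 0:
        # The scheduler requires N/M to be an integer.
        raise ValueError(
            f"{total_gib} GiB cannot be split evenly across {card_count} cards"
        )
    return total_gib // card_count

# A Pod requesting 8 GiB across 2 cards receives 4 GiB from each card.
print(per_card_allocation(8, 2))  # 4
```

A request such as 8 GiB across 3 cards would be rejected, because 8/3 is not an integer.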


Configure multi-GPU sharing

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Jobs.

  3. On the Jobs page, click Create from YAML. Copy the following YAML into the Template area, then click Create.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-multigpu
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: tensorflow-mnist-multigpu
            aliyun.com/gpu-count: "2"    # Number of GPU cards to use
        spec:
          containers:
          - name: tensorflow-mnist-multigpu
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                aliyun.com/gpu-mem: 8    # Total GPU memory in GiB across all cards
            workingDir: /root
          restartPolicy: Never

    Key parameters:

    • aliyun.com/gpu-count (Pod label, string): Number of GPU cards to use. Set in metadata.labels. In this example, "2" means the Pod requests GPU memory from 2 cards.

    • aliyun.com/gpu-mem (resource limit, integer): Total GPU memory in GiB to request across all GPU cards. Set in resources.limits. In this example, 8 means 8 GiB in total, so each of the 2 cards provides 4 GiB.
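Because the two parameters must satisfy the N/M integer constraint together, it can help to validate them before submitting the Job. The following Python sketch is illustrative only: the manifest dict mirrors the YAML above, and the helper function is a hypothetical pre-submission check, not an ACK API.

```python
# Minimal pre-submission sanity check (hypothetical helper, not an ACK API):
# confirm the total gpu-mem request divides evenly across gpu-count cards.
job = {
    "spec": {
        "template": {
            "metadata": {"labels": {"aliyun.com/gpu-count": "2"}},
            "spec": {
                "containers": [
                    {"resources": {"limits": {"aliyun.com/gpu-mem": 8}}}
                ]
            },
        }
    }
}

def check_multi_gpu_request(job: dict) -> int:
    """Return the per-card GiB allocation, or raise if the split is uneven."""
    tmpl = job["spec"]["template"]
    cards = int(tmpl["metadata"]["labels"]["aliyun.com/gpu-count"])
    total = sum(
        c["resources"]["limits"].get("aliyun.com/gpu-mem", 0)
        for c in tmpl["spec"]["containers"]
    )
    if total % cards != 0:
        raise ValueError(f"{total} GiB does not split evenly across {cards} cards")
    return total // cards

print(check_multi_gpu_request(job))  # 4 GiB per card for this example
```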

Verify GPU memory isolation

After the Job starts, verify that the Pod can access only its allocated GPU memory.

  1. On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Pods.

  2. In the row for the Pod (for example, tensorflow-mnist-multigpu-***), click Actions > Terminal to open a terminal session. Run the following command:

    nvidia-smi

    The expected output is similar to:

    Wed Jun 14 03:24:14 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
    | N/A   38C    P0    61W / 300W |    569MiB /  4309MiB |      2%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla V100-SXM2...  On   | 00000000:00:0A.0 Off |                    0 |
    | N/A   36C    P0    61W / 300W |    381MiB /  4309MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    Confirm the following in the output:

    • Two GPU cards are listed (GPU 0 and GPU 1), matching aliyun.com/gpu-count: "2".

    • Each card shows 4309 MiB of total memory, which is approximately the requested 4 GiB per card rather than the physical 16,160 MiB. This confirms that GPU memory isolation is active.

  3. In the row for the same Pod, click Actions > Logs to view the container logs. Confirm the following output appears twice (once per card):

    totalMemory: 4.21GiB freeMemory: 3.91GiB

    A totalMemory value of approximately 4 GiB per card, rather than the physical 16,160 MiB, confirms that GPU memory isolation is working correctly from the application's perspective.
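The two readings agree once units are converted: nvidia-smi reports the per-card limit in MiB, while TensorFlow logs it in GiB. A quick illustrative conversion:

```python
# Convert the per-card limit reported by nvidia-smi (MiB) to GiB and
# compare it with the totalMemory value printed in the container logs.
reported_mib = 4309                 # per-card total from nvidia-smi
reported_gib = reported_mib / 1024  # 1 GiB = 1024 MiB
print(f"{reported_gib:.2f} GiB")    # 4.21 GiB, matching the log output
```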