
Container Service for Kubernetes:Configure GPU sharing without GPU memory isolation

Last Updated: Mar 26, 2026

Some AI/ML workloads manage GPU memory through framework-level APIs or custom allocation logic. For those workloads, ACK's GPU memory isolation layer is redundant and may interfere with the application's own memory management. This topic shows you how to enable GPU sharing on a node pool without installing the GPU memory isolation module.

Important

Use this mode only when your application already manages GPU memory limits internally.

If you need both GPU sharing and memory isolation, see Configure GPU sharing with memory isolation.

Prerequisites

Before you begin, ensure that the GPU sharing scheduling component is installed in your cluster.

How it works

When GPU sharing is enabled without memory isolation:

  • The node label ack.node.gpu.schedule=share activates GPU sharing on the node pool.

  • The aliyun.com/gpu-mem resource limit tells the scheduler how much GPU memory the pod requests. ACK uses this value for scheduling decisions and ratio calculations, but does not enforce it as a hard memory cap.

  • The pod sees the full physical GPU memory (for example, 16,384 MiB on a V100). ACK injects two environment variables — ALIYUN_COM_GPU_MEM_CONTAINER and ALIYUN_COM_GPU_MEM_DEV — that the application uses to calculate its share and stay within the requested limit.
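
To make this concrete, the following is a minimal sketch of how an application might derive its share from these variables. The TensorFlow call shown in the trailing comment is an assumption based on the TensorFlow 1.x API (which the sample image in this topic uses), not something ACK performs for you:

```python
import os

def gpu_memory_fraction(default: float = 1.0) -> float:
    """Compute this pod's share of the physical GPU memory from the
    environment variables that ACK injects when GPU sharing is enabled.

    Falls back to `default` when the variables are absent, for example
    when running outside a GPU-sharing node pool.
    """
    requested = os.environ.get("ALIYUN_COM_GPU_MEM_CONTAINER")  # e.g. "4" (GiB)
    total = os.environ.get("ALIYUN_COM_GPU_MEM_DEV")            # e.g. "16" (GiB)
    if not requested or not total:
        return default
    return float(requested) / float(total)

# With ALIYUN_COM_GPU_MEM_CONTAINER=4 and ALIYUN_COM_GPU_MEM_DEV=16,
# gpu_memory_fraction() returns 0.25. A TensorFlow 1.x application could
# then cap its allocator with, for example:
#   tf.ConfigProto(gpu_options=tf.GPUOptions(
#       per_process_gpu_memory_fraction=gpu_memory_fraction()))
```

Because the limit is not enforced in this mode, calling a framework-level cap such as this is what actually keeps the pod within its requested share.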

Step 1: Create a node pool

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.

  2. Click the name of the cluster, then choose Nodes > Node Pools in the left-side navigation pane.

  3. On the Node Pools page, click Create Node Pool.

  4. In the Create Node Pool dialog box, configure the following parameters, then click Confirm Order. For all other parameters, see Create and manage a node pool.

    Setting ack.node.gpu.schedule=share enables GPU sharing on the node pool without installing the GPU memory isolation module. For all supported GPU scheduling labels, see Labels for enabling GPU scheduling policies.
      • Instance Type: Set Architecture to GPU-accelerated and select one or more GPU instance types. This example uses V100 instances.
      • Expected Nodes: Set to 0 if you do not want to provision nodes immediately.
      • Node Labels: Click Add Label, set Key to ack.node.gpu.schedule, and set Value to share.

Step 2: Submit a job

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.

  2. Click the name of the cluster, then choose Workloads > Jobs in the left-side navigation pane.

  3. Click Create from YAML, paste the following YAML into the Template section, and click Create.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-share
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: tensorflow-mnist-share
        spec:
          containers:
          - name: tensorflow-mnist-share
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                aliyun.com/gpu-mem: 4  # Request 4 GiB of GPU memory
            workingDir: /root
          restartPolicy: Never

    The aliyun.com/gpu-mem: 4 limit requests 4 GiB of GPU memory for scheduling purposes. In this mode, it does not prevent other pods from using the remaining GPU memory on the same device.
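
An application that wants a hard cap must therefore enforce the requested amount itself, typically by converting the GiB value into bytes for its allocator. A hedged sketch follows; the PyTorch call in the comment is an assumption for illustration, not part of the sample job:

```python
import os

GIB = 1024 ** 3  # bytes per GiB

def requested_gpu_bytes() -> int:
    """Convert the GPU memory requested via aliyun.com/gpu-mem (exposed
    to the pod as ALIYUN_COM_GPU_MEM_CONTAINER, in GiB) into bytes."""
    return int(os.environ["ALIYUN_COM_GPU_MEM_CONTAINER"]) * GIB

# With aliyun.com/gpu-mem: 4, requested_gpu_bytes() is 4 * 1024**3
# = 4294967296 bytes. A PyTorch application could, for example, cap its
# caching allocator relative to the device total with:
#   torch.cuda.set_per_process_memory_fraction(
#       requested_gpu_bytes() / total_device_bytes)
```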

Verify the configuration

To confirm that GPU sharing is active without memory isolation, check that the pod sees the full physical GPU memory rather than only the requested amount.

  1. On the Clusters page, click the cluster name, then choose Workloads > Pods in the left-side navigation pane.

  2. In the Actions column of the pod (for example, tensorflow-mnist-share-***), click Terminal and run:

    nvidia-smi

    Expected output:

    Wed Jun 14 06:45:56 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
    | N/A   35C    P0    59W / 300W |    334MiB / 16384MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    What to check: The denominator in the Memory-Usage field must show the full physical GPU memory — 16384MiB for a V100. If it shows 4096MiB instead, GPU memory isolation is active on this node.
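
This check can also be scripted. The following sketch extracts the Memory-Usage denominator from `nvidia-smi` output; it assumes the standard table layout shown above (`334MiB / 16384MiB`):

```python
import re

def total_memory_mib(nvidia_smi_output: str) -> int:
    """Extract the Memory-Usage denominator (total GPU memory visible to
    the pod, in MiB) from nvidia-smi output."""
    match = re.search(r"\d+MiB\s*/\s*(\d+)MiB", nvidia_smi_output)
    if match is None:
        raise ValueError("no Memory-Usage field found in nvidia-smi output")
    return int(match.group(1))

sample = "| N/A   35C    P0    59W / 300W |    334MiB / 16384MiB |"
# total_memory_mib(sample) returns 16384: the pod sees the full V100
# memory, so GPU memory isolation is not active. A value of 4096 would
# indicate that the isolation module is enforcing the 4 GiB limit.
```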

  3. In the same terminal, run env | grep ALIYUN_COM_GPU_MEM to verify the environment variables that ACK injects into the pod:

    ALIYUN_COM_GPU_MEM_CONTAINER=4   # GPU memory requested by this pod (GiB)
    ALIYUN_COM_GPU_MEM_DEV=16        # Total physical GPU memory (GiB)

    The application uses these variables to calculate its memory usage ratio and stay within the requested limit:

    percentage = ALIYUN_COM_GPU_MEM_CONTAINER / ALIYUN_COM_GPU_MEM_DEV = 4 / 16 = 0.25

What's next