
Container Service for Kubernetes:Configure GPU sharing without GPU memory isolation

Last Updated: Mar 26, 2026

Some AI/ML workloads manage GPU memory through framework-level APIs or custom allocation logic. For those workloads, ACK's GPU memory isolation layer is redundant and may interfere with the application's own memory management. This topic shows you how to enable GPU sharing on a node pool without installing the GPU memory isolation module.

Important

Use this mode only when your application already manages GPU memory limits internally.

If you need both GPU sharing and memory isolation, see Configure GPU sharing with memory isolation.

Prerequisites

Before you begin, ensure that the GPU sharing scheduling component is installed in your cluster.

How it works

When GPU sharing is enabled without memory isolation:

  • The node label ack.node.gpu.schedule=share activates GPU sharing on the node pool.

  • The aliyun.com/gpu-mem resource limit tells the scheduler how much GPU memory the pod requests. ACK uses this value for scheduling decisions and ratio calculations, but does not enforce it as a hard memory cap.

  • The pod sees the full physical GPU memory (for example, 16,384 MiB on a V100). ACK injects two environment variables — ALIYUN_COM_GPU_MEM_CONTAINER and ALIYUN_COM_GPU_MEM_DEV — that the application uses to calculate its share and stay within the requested limit.
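
To make this concrete, the following is a minimal sketch of how an application might derive its share from these variables. The TensorFlow call shown in the trailing comment is an assumption based on the TensorFlow 1.x API (which the sample image in this topic uses), not something ACK performs for you:

```python
import os

def gpu_memory_fraction(default: float = 1.0) -> float:
    """Compute this pod's share of the physical GPU memory from the
    environment variables that ACK injects when GPU sharing is enabled.

    Falls back to `default` when the variables are absent, for example
    when running outside a GPU-sharing node pool.
    """
    requested = os.environ.get("ALIYUN_COM_GPU_MEM_CONTAINER")  # e.g. "4" (GiB)
    total = os.environ.get("ALIYUN_COM_GPU_MEM_DEV")            # e.g. "16" (GiB)
    if not requested or not total:
        return default
    return float(requested) / float(total)

# With ALIYUN_COM_GPU_MEM_CONTAINER=4 and ALIYUN_COM_GPU_MEM_DEV=16,
# gpu_memory_fraction() returns 0.25. A TensorFlow 1.x application could
# then cap its allocator with, for example:
#   tf.ConfigProto(gpu_options=tf.GPUOptions(
#       per_process_gpu_memory_fraction=gpu_memory_fraction()))
```

Because the limit is not enforced in this mode, calling a framework-level cap such as this is what actually keeps the pod within its requested share.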

Step 1: Create a node pool

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.

  2. Click the name of the cluster, then choose Nodes > Node Pools in the left-side navigation pane.

  3. On the Node Pools page, click Create Node Pool.

  4. In the Create Node Pool dialog box, configure the following parameters, then click Confirm Order. For all other parameters, see Create and manage a node pool.

    Setting ack.node.gpu.schedule=share enables GPU sharing on the node pool without installing the GPU memory isolation module. For all supported GPU scheduling labels, see Labels for enabling GPU scheduling policies.
      • Instance Type: Set Architecture to GPU-accelerated and select one or more GPU instance types. This example uses V100 instances.
      • Expected Nodes: Set to 0 if you do not want to provision nodes immediately.
      • Node Labels: Click Add Label, set Key to ack.node.gpu.schedule, and set Value to share.

Step 2: Submit a job

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.

  2. Click the name of the cluster, then choose Workloads > Jobs in the left-side navigation pane.

  3. Click Create from YAML, paste the following YAML into the Template section, and click Create.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-share
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: tensorflow-mnist-share
        spec:
          containers:
          - name: tensorflow-mnist-share
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                aliyun.com/gpu-mem: 4  # Request 4 GiB of GPU memory
            workingDir: /root
          restartPolicy: Never

    The aliyun.com/gpu-mem: 4 limit requests 4 GiB of GPU memory for scheduling purposes. In this mode, it does not prevent other pods from using the remaining GPU memory on the same device.
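
An application that wants a hard cap must therefore enforce the requested amount itself, typically by converting the GiB value into bytes for its allocator. A hedged sketch follows; the PyTorch call in the comment is an assumption for illustration, not part of the sample job:

```python
import os

GIB = 1024 ** 3  # bytes per GiB

def requested_gpu_bytes() -> int:
    """Convert the GPU memory requested via aliyun.com/gpu-mem (exposed
    to the pod as ALIYUN_COM_GPU_MEM_CONTAINER, in GiB) into bytes."""
    return int(os.environ["ALIYUN_COM_GPU_MEM_CONTAINER"]) * GIB

# With aliyun.com/gpu-mem: 4, requested_gpu_bytes() is 4 * 1024**3
# = 4294967296 bytes. A PyTorch application could, for example, cap its
# caching allocator relative to the device total with:
#   torch.cuda.set_per_process_memory_fraction(
#       requested_gpu_bytes() / total_device_bytes)
```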

Verify the configuration

To confirm that GPU sharing is active without memory isolation, check that the pod sees the full physical GPU memory rather than only the requested amount.

  1. On the Clusters page, click the cluster name, then choose Workloads > Pods in the left-side navigation pane.

  2. In the Actions column of the pod (for example, tensorflow-mnist-share-***), click Terminal and run:

    nvidia-smi

    Expected output:

    Wed Jun 14 06:45:56 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
    | N/A   35C    P0    59W / 300W |    334MiB / 16384MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    What to check: The denominator in the Memory-Usage field must show the full physical GPU memory — 16384MiB for a V100. If it shows 4096MiB instead, GPU memory isolation is active on this node.
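
This check can also be scripted. The following sketch extracts the Memory-Usage denominator from `nvidia-smi` output; it assumes the standard table layout shown above (`334MiB / 16384MiB`):

```python
import re

def total_memory_mib(nvidia_smi_output: str) -> int:
    """Extract the Memory-Usage denominator (total GPU memory visible to
    the pod, in MiB) from nvidia-smi output."""
    match = re.search(r"\d+MiB\s*/\s*(\d+)MiB", nvidia_smi_output)
    if match is None:
        raise ValueError("no Memory-Usage field found in nvidia-smi output")
    return int(match.group(1))

sample = "| N/A   35C    P0    59W / 300W |    334MiB / 16384MiB |"
# total_memory_mib(sample) returns 16384: the pod sees the full V100
# memory, so GPU memory isolation is not active. A value of 4096 would
# indicate that the isolation module is enforcing the 4 GiB limit.
```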

  3. In the same terminal, run env | grep ALIYUN_COM_GPU_MEM to verify the environment variables that ACK injects into the pod:

    ALIYUN_COM_GPU_MEM_CONTAINER=4   # GPU memory requested by this pod (GiB)
    ALIYUN_COM_GPU_MEM_DEV=16        # Total physical GPU memory (GiB)

    The application uses these variables to calculate its memory usage ratio and stay within the requested limit:

    percentage = ALIYUN_COM_GPU_MEM_CONTAINER / ALIYUN_COM_GPU_MEM_DEV = 4 / 16 = 0.25

What's next