Configure GPU sharing without GPU memory isolation

Last Updated: Feb 18, 2024

You may require GPU sharing without GPU memory isolation in some scenarios. For example, some applications, such as Java applications, allow you to specify the maximum amount of GPU memory that they can use. If GPU memory isolation is enforced for such applications, exceptions may occur. To address this problem, you can disable GPU memory isolation for nodes that support GPU sharing. This topic describes how to configure GPU sharing without GPU memory isolation.

Prerequisites

Step 1: Create a node pool

Perform the following steps to create a node pool that has GPU memory isolation disabled.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage and choose Nodes > Node Pools in the left-side navigation pane.

  3. On the Node Pools page, click Create Node Pool. In the Create Node Pool dialog box, configure the parameters and click Confirm Order.

    For more information about the key parameters, see Create a node pool.
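
    After the node pool is created and its nodes are ready, you can optionally confirm from the kubectl CLI that GPU sharing is enabled on a node. The following is a minimal check, assuming that you have kubectl access to the cluster; replace <NODE_NAME> with the name of a node in the new node pool.

    # Check whether the node advertises the shared GPU memory resource.
    kubectl describe node <NODE_NAME> | grep aliyun.com/gpu-mem

    If GPU sharing is enabled on the node, the aliyun.com/gpu-mem extended resource appears under Capacity and Allocatable. This is the resource that the pod requests in Step 2.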

Step 2: Submit a job

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage and choose Workloads > Jobs in the left-side navigation pane.

  3. On the Jobs page, click Create from YAML. In the code editor on the Create page, paste the following content and click Create.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-share
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: tensorflow-mnist-share
        spec:
          containers:
          - name: tensorflow-mnist-share
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                aliyun.com/gpu-mem: 4 # Request 4 GiB of GPU memory. 
            workingDir: /root
          restartPolicy: Never

    Code description:

    • The YAML content defines a TensorFlow job. The job creates one pod and the pod requests 4 GiB of GPU memory.

    • Setting aliyun.com/gpu-mem: 4 under resources.limits requests 4 GiB of GPU memory for the pod.
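
    If you prefer the kubectl CLI to the console, you can save the preceding YAML content to a file and submit the job directly. The following is a minimal sketch; the file name tensorflow-mnist-share.yaml is only an example.

    # Submit the job from the saved manifest.
    kubectl apply -f tensorflow-mnist-share.yaml

    # Confirm that the pod created by the job is running.
    kubectl get pods -l app=tensorflow-mnist-share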

Step 3: Verify the configuration

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage and choose Workloads > Pods in the left-side navigation pane.

  3. On the Pods page, choose Terminal > tensorflow-mnist-share in the Actions column of the pod that you created in Step 2 to log on to the pod.

  4. Run the following command to query GPU memory information:

    nvidia-smi

    Expected output:

    Wed Jun 14 06:45:56 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
    | N/A   35C    P0    59W / 300W |    334MiB / 16384MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    The output shows that the pod can access the full 16,384 MiB of memory provided by the GPU, which is a V100 in this example. If GPU memory isolation were enabled, the value would equal the amount of memory requested by the pod, which is 4 GiB (4,096 MiB). Because the full GPU memory is displayed, GPU memory isolation is disabled, which indicates that the configuration takes effect.

    Because GPU memory is not isolated, the application must read its GPU memory allocation from the following environment variables and keep its usage within that amount.

    ALIYUN_COM_GPU_MEM_CONTAINER=4 # The GPU memory available for the pod. 
    ALIYUN_COM_GPU_MEM_DEV=16 # The total GPU memory provided by each GPU.

    If the application needs the ratio of the GPU memory that it can use to the total GPU memory provided by the GPU, you can calculate the ratio from the preceding environment variables:

    percentage = ALIYUN_COM_GPU_MEM_CONTAINER / ALIYUN_COM_GPU_MEM_DEV = 4 / 16 = 0.25
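
    You can also run the same checks with the kubectl CLI instead of the console terminal. The following is a minimal sketch, assuming that the sh shell is available in the container image; replace <POD_NAME> with the name of the pod created by the job in Step 2.

    # Find the pod created by the job.
    kubectl get pods -l app=tensorflow-mnist-share

    # Query GPU memory information from inside the pod.
    kubectl exec <POD_NAME> -- nvidia-smi

    # Print the GPU memory environment variables that the application reads.
    kubectl exec <POD_NAME> -- sh -c 'env | grep ALIYUN_COM_GPU_MEM'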
