You can partition an NVIDIA A100 GPU into up to seven GPU instances that are fully isolated from each other. This allows the GPU to serve multiple users with separate GPU resources and improves GPU utilization. This topic describes how to use a node pool to enable the Multi-Instance GPU (MIG) feature for nodes that are equipped with NVIDIA A100 GPUs.

Prerequisites

  • A Kubernetes cluster for heterogeneous computing is created. The cluster must use Kubernetes 1.20.4 or later. For more information, see Create a Kubernetes cluster for heterogeneous computing.
    Notice Only Elastic Compute Service (ECS) bare metal instances that are equipped with NVIDIA A100 GPUs support the MIG feature. These instances belong to the instance families whose names start with ecs.ebmgn7. ECS instances that are equipped with NVIDIA A100 GPUs and use GPU passthrough do not support the MIG feature. These instances belong to the instance families whose names start with ecs.gn7. For more information about ECS bare metal instances, see Overview.
  • A kubectl client is connected to the cluster. For more information, see Connect to ACK clusters by using kubectl.
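
You can run the following command to confirm that kubectl is connected and that the cluster meets the version requirement. This is a minimal check; the output columns can vary across kubectl versions:

  # Succeeds only if kubectl can reach the cluster. The VERSION column must show v1.20.4 or later.
  kubectl get nodes -o wide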

Background information

At GPU Technology Conference (GTC) 2020, NVIDIA released the A100 GPU, which is based on the Ampere architecture. Ampere inherits the benefits of the previous Volta and Turing architectures and adds new features, including MIG.

An NVIDIA A100 GPU consists of seven compute units and eight memory units. You can partition an NVIDIA A100 GPU into multiple GPU instances that have the same or different combinations of compute units and memory units.
Note Each memory unit contains 5 GB of GPU memory.
For more information about the features of NVIDIA A100 GPUs, see NVIDIA A100 Multi-Instance GPU User Guide.
The specification of a GPU instance is in the format <number of compute units>g.<memory size>gb. For example, a GPU instance with one compute unit and one memory unit (5 GB) has the specification 1g.5gb. The following table describes the specifications and maximum numbers of MIG instances that can be partitioned from an NVIDIA A100 GPU.
Specification   Specification ID   Maximum number of instances   Compute units per instance   Memory units per instance
1g.5gb          19                 7                             1                            1 (5 GB)
2g.10gb         14                 3                             2                            2 (10 GB)
3g.20gb         9                  2                             3                            4 (20 GB)
4g.20gb         5                  1                             4                            4 (20 GB)
7g.40gb         0                  1                             7                            8 (40 GB)
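
If you can log on to a GPU-accelerated node, you can also list the GPU instance profiles that the driver reports for the A100 GPUs. This is a sketch that assumes SSH access to the node and a MIG-capable NVIDIA driver:

  # List the GPU instance profiles that the GPUs support, including the maximum instance counts.
  sudo nvidia-smi mig -lgip
  # List the GPU instances that currently exist on the GPUs.
  sudo nvidia-smi mig -lgi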

Use node pool labels to partition a GPU

When you create a node pool, you can add a label to the node pool to partition a GPU into multiple MIG instances. You can use the following methods to partition a GPU into multiple MIG instances:
  • Specify a MIG instance specification or a MIG specification ID

    This method allows you to partition an NVIDIA A100 GPU into multiple MIG instances of the same specification.

  • Specify a sequence of MIG instance specifications

    This method allows you to partition an NVIDIA A100 GPU into multiple MIG instances of different specifications.

Method 1: Specify a MIG instance specification or a MIG specification ID

  • Add a label with the following key to the node pool: ack.aliyun.com/gpu-partition-size.
  • Set the value of the label to a MIG instance specification or a MIG specification ID, for example, 1g.5gb or 19.
  • Example: ack.aliyun.com/gpu-partition-size=1g.5gb or ack.aliyun.com/gpu-partition-size=19.
  • Description: Each NVIDIA A100 GPU is partitioned into as many MIG instances of the specified specification as possible. The preceding table shows that an NVIDIA A100 GPU can be partitioned into up to 7 MIG instances of the 1g.5gb specification.
Figure 1. Example of labels added in the ACK console
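
After the nodes in the node pool are ready, you can check the label on the nodes. The following command is a sketch that assumes the node pool label is propagated to each node as a node label:

  # The -L flag adds a column that shows the value of the specified label.
  kubectl get nodes -L ack.aliyun.com/gpu-partition-size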

Method 2: Specify a sequence of MIG instance specifications

  • Add a label with the following key to the node pool: ack.aliyun.com/gpu-partition-sequence.
  • Set the value of the label to a sequence of MIG instance specifications or MIG specification IDs, for example, 1g.5gb-2g.10gb-3g.20gb or 19-14-9. Note that only specific sequences are supported.

    NVIDIA A100 GPUs support 18 sequences. For more information, see the "A100 Supported Profiles" section of the NVIDIA A100 Multi-Instance GPU User Guide.

    Container Service for Kubernetes (ACK) supports only the following sequences.
    Sequence of MIG specification IDs   Sequence of MIG specifications
    19-19-19-19-19-19-19                1g.5gb-1g.5gb-1g.5gb-1g.5gb-1g.5gb-1g.5gb-1g.5gb
    14-14-14-19                         2g.10gb-2g.10gb-2g.10gb-1g.5gb
    9-9                                 3g.20gb-3g.20gb
    5-19-19-19                          4g.20gb-1g.5gb-1g.5gb-1g.5gb
  • Example: ack.aliyun.com/gpu-partition-sequence=3g.20gb-3g.20gb or ack.aliyun.com/gpu-partition-sequence=9-9.
  • Description: Each NVIDIA A100 GPU is partitioned into MIG instances of the specifications specified in the sequence. For example, a sequence of 3g.20gb-3g.20gb specifies that an NVIDIA A100 GPU is partitioned into two MIG instances of the 3g.20gb specification.
    Notice Commas (,) are not supported when you specify the value of a label in the ACK console. Separate multiple MIG specifications or MIG specification IDs with hyphens (-). For example, the ack.aliyun.com/gpu-partition-sequence=1g.5gb,2g.10gb,3g.20gb label is invalid. The valid format is ack.aliyun.com/gpu-partition-sequence=1g.5gb-2g.10gb-3g.20gb.
Figure 2. Example of labels added in the ACK console
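
Before you create the node pool, you can check a candidate sequence against the sequences that ACK supports. The following shell sketch hard-codes the supported sequences from the preceding table; the SEQ variable is a placeholder for your own value:

  SEQ="3g.20gb-3g.20gb"
  case "$SEQ" in
    19-19-19-19-19-19-19|1g.5gb-1g.5gb-1g.5gb-1g.5gb-1g.5gb-1g.5gb-1g.5gb|\
    14-14-14-19|2g.10gb-2g.10gb-2g.10gb-1g.5gb|\
    9-9|3g.20gb-3g.20gb|\
    5-19-19-19|4g.20gb-1g.5gb-1g.5gb-1g.5gb)
      echo "Sequence $SEQ is supported by ACK" ;;
    *)
      echo "Sequence $SEQ is not supported by ACK" ;;
  esac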

Create a node pool

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, click Clusters.
  3. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
  4. In the left-side navigation pane of the details page, choose Nodes > Node Pools.
  5. In the upper-right corner of the Node Pools page, click Create Node Pool.
  6. In the Create Node Pool dialog box, set the node pool parameters.
    For more information about the parameters, see Create a managed Kubernetes cluster with GPU-accelerated nodes. The following table describes some of the parameters.
    Parameter       Description
    Instance Type   Choose ECS Bare Metal Instance > GPU Type. Then, select an instance type whose name starts with ecs.ebmgn7.
    Node Label      Click the Add icon. Set Key to ack.aliyun.com/gpu-partition-size or ack.aliyun.com/gpu-partition-sequence. Then, set Value to the MIG instance specification, MIG specification ID, or specification sequence that you want to use.

    For more information about node pool labels, see Use node pool labels to partition a GPU. In this example, the ack.aliyun.com/gpu-partition-sequence=3g.20gb-3g.20gb label is added.

  7. After you set the parameters, click Confirm Order.

Verify that MIG is enabled for nodes in the node pool

After the node pool is created, perform the following steps to verify that the MIG feature is enabled for nodes in the node pool.

  1. Run the following command to query nodes that have MIG enabled:
    kubectl get nodes -l aliyun.accelerator/nvidia_mig_sequence
    Expected output:
    NAME                      STATUS   ROLES    AGE   VERSION
    cn-beijing.192.168.XX.XX   Ready    <none>   99s   v1.20.4-aliyun.1
  2. Run the following command to query the MIG instances that are partitioned from the GPUs on the node.
    The command queries the node resources whose names are prefixed with nvidia.com/mig-.
    kubectl get nodes cn-beijing.192.168.XX.XX -o yaml | grep " nvidia.com/mig-" | uniq

    Expected output:

    nvidia.com/mig-3g.20gb: "16"
    The output shows that the MIG resource name is nvidia.com/mig-3g.20gb and that a total of 16 MIG instances of the 3g.20gb specification are available. In this example, each of the node's eight A100 GPUs is partitioned into two 3g.20gb instances (8 × 2 = 16).
    Note The MIG resource name that is returned varies based on the specified MIG instance specification or MIG specification sequence. In this example, the MIG specification sequence is specified by the ack.aliyun.com/gpu-partition-sequence=3g.20gb-3g.20gb label.
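    Alternatively, you can print the MIG resources in the capacity and allocatable resources of the node. This is a sketch; the exact output format of kubectl describe can vary across kubectl versions:
    kubectl describe node cn-beijing.192.168.XX.XX | grep nvidia.com/mig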
  3. Apply for MIG resources.
    To request MIG resources for a pod, you must specify the MIG resource name and quantity that you want to request in the resources.limits field of the pod YAML template.
    # Other settings are omitted.
            resources:
              limits:
                nvidia.com/mig-<MIG instance specification, for example, 3g.20gb>: <quantity>
    1. Run the following command to create a mig-sample.yaml file in the /tmp directory.
      The following sample Job requests two MIG instances of the 3g.20gb specification.
      cat > /tmp/mig-sample.yaml <<- EOF
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: pytorch-mnist
      spec:
        parallelism: 1
        template:
          spec:
            containers:
            - name: pytorch-mnist
              image: registry.cn-beijing.aliyuncs.com/ai-samples/nvidia-pytorch-sample:20.11-py3
              command:
              - python
              - main.py
              resources:
                limits:
                  nvidia.com/mig-3g.20gb: 2 # Apply for two MIG instances of the 3g.20gb specification. 
            restartPolicy: Never
      EOF
    2. Run the following command to deploy the application:
      kubectl apply -f /tmp/mig-sample.yaml
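      The Job controller labels the pods that it creates with job-name=<job name>, so you can watch the pod of this job while it starts. Press Ctrl+C to stop watching:
      kubectl get pods -l job-name=pytorch-mnist -w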
  4. Check whether the application pod is in the Running state. Then, query the pod log.
    1. Run the following command to check whether the pod is in the Running state:
      kubectl get pod
      Expected output:
      NAME                  READY   STATUS    RESTARTS   AGE
      pytorch-mnist-t2ncm   1/1     Running   0          2m25s
    2. If the pod is in the Running state, run the following command to query the pod log:
      kubectl logs pytorch-mnist-t2ncm
      Expected output:
      /opt/conda/lib/python3.6/site-packages/torchvision/datasets/mnist.py:480: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:141.)
        return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
      cuda is available:  True
      current device:  0
      current device name:  A100-SXM4-40GB MIG 3g.20gb
      Using downloaded and verified file: ../data/MNIST/raw/train-images-idx3-ubyte.gz
      Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw
      Using downloaded and verified file: ../data/MNIST/raw/train-labels-idx1-ubyte.gz
      Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw
      Using downloaded and verified file: ../data/MNIST/raw/t10k-images-idx3-ubyte.gz
      Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw
      Using downloaded and verified file: ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz
      Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw
      Processing...
      Done!
      ......
  5. Connect to the pod and query the NVIDIA GPU resources that are allocated.
    1. Run the following command to connect to the pod:
      kubectl exec -ti pytorch-mnist-t2ncm -- bash
    2. Run the following command to view the NVIDIA GPU resources allocated to the pod:
      nvidia-smi -L

      Expected output:

      GPU 0: A100-SXM4-40GB (UUID: GPU-da5ae315-b486-77ea-9527-3cbbf5b9****)
        MIG 3g.20gb Device 0: (UUID: MIG-GPU-da5ae315-b486-77ea-9527-3cbbf5b9****/1/0)
      GPU 1: A100-SXM4-40GB (UUID: GPU-cc7ebb1f-902c-671c-3fcb-8daf49cc****)
        MIG 3g.20gb Device 0: (UUID: MIG-GPU-cc7ebb1f-902c-671c-3fcb-8daf49cc****/2/0)

      The output shows that two MIG instances of the 3g.20gb specification are allocated to the pod.
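      Because the sample image ships with PyTorch, you can also confirm from inside the pod that the current CUDA device is a MIG instance. This is a sketch that assumes the python interpreter and the torch package in the image:
      # The device name should match the "current device name" line in the pod log.
      python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
      # Expected output: True A100-SXM4-40GB MIG 3g.20gb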

  6. After the pod is terminated, run the following command to query the pod status:
    kubectl get pod

    Expected output:

    NAME                  READY   STATUS      RESTARTS   AGE
    pytorch-mnist-t2ncm   0/1     Completed   0          3m1s

    If the pod is in the Completed state, the sample job ran to completion on the allocated MIG instances. This confirms that the MIG feature is enabled for the nodes in the node pool.
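
After you verify the result, you can run the following command to delete the sample job and release the requested MIG resources:

  kubectl delete -f /tmp/mig-sample.yaml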