In scenarios such as cloud gaming, AI model inference, and deep learning teaching, a single task cannot make full use of the computing and storage resources of a GPU device. To enable small tasks to utilize GPU resources more efficiently, Alibaba Cloud provides lightweight GPU instances based on the virtualization of NVIDIA GPUs. This topic describes how to create a Kubernetes cluster to manage and schedule lightweight GPU workloads.


Prerequisites

You have activated Container Service and Resource Access Management (RAM).

Background information

A lightweight GPU instance is an Alibaba Cloud ECS virtual server equipped with a lightweight graphics card. The card provides only a fraction of the memory of a regular graphics card, and the exact size varies with the instance type. For more information, see Compute optimized instance families with GPU capabilities.

When you create or expand a Kubernetes cluster, you can select GPU instance types based on your needs. Existing lightweight GPU instances can be added to a Kubernetes cluster, either manually or automatically. The procedure for running tasks on lightweight GPU instances is the same as that for regular GPU instances.

This topic shows how to use lightweight GPU instances by creating a Kubernetes cluster and adding vgn5i instances to the cluster.

Container Service performs the following operations to create a Kubernetes cluster:
  • Creates ECS instances, configures a public key to enable SSH logon from master nodes to other nodes, and configures the Kubernetes cluster through CloudInit.
  • Creates a security group that allows access to the VPC network over ICMP.
  • If you do not specify an existing VPC network, a new VPC network and VSwitch are created, and SNAT rules are created for the VSwitch.
  • Creates VPC routing rules.
  • Creates a NAT gateway and an Elastic IP address.
  • Creates a RAM user and grants it permissions to query, create, and delete ECS instances, permissions to add and delete cloud disks, and all permissions on SLB, CloudMonitor, VPC, Log Service, and NAS. The Kubernetes cluster dynamically creates SLB instances, cloud disks, and VPC routing rules based on your settings.
  • Creates an internal SLB instance and opens port 6443.
  • Creates a public SLB instance and opens ports 6443 and 8443. Port 22 is also opened if you enable SSH logon when you create the cluster. Otherwise, port 22 stays closed.


Limits

  • SLB instances that are created along with the cluster support only the pay-as-you-go billing method.
  • Kubernetes clusters only support VPC networks.
  • By default, each account has specific quotas on the amount of cloud resources that can be created. You cannot create clusters if the quota limit is exceeded. Make sure that you have sufficient quotas before you create a cluster.

    To request a quota increase, submit a ticket.

    • An account can create up to 5 clusters in all regions. A cluster can contain up to 40 nodes. To create more clusters or nodes, submit a ticket.
      Note In a Kubernetes cluster, you can create up to 48 route entries per VPC, which caps the cluster at 48 nodes regardless of the node quota. To go beyond 48 nodes, first submit a ticket to increase the number of route entries.
    • An account can create up to 100 security groups.
    • An account can create up to 60 pay-as-you-go SLB instances.
    • An account can create up to 20 Elastic IP addresses.
  • The limits on ECS instances are as follows:
    • Only the CentOS operating system is supported.
    • The pay-as-you-go and subscription billing methods are supported.
    Note After an ECS instance is created, you can change its billing method from pay-as-you-go to subscription in the console. For more information, see Switch the billing method from pay-as-you-go to subscription.

Create a Kubernetes cluster with vgn5i instances

Both dedicated and managed Kubernetes clusters support vgn5i instances. The following example shows how to create a managed Kubernetes cluster with vgn5i instances.

  1. Log on to the Container Service console.
  2. In the left-side navigation pane, choose Clusters > Clusters to go to the Clusters page.
  3. On the Select Cluster Template page that appears, select Standard Managed Cluster and click Create to go to the Managed Kubernetes page.
    Note To create a GPU cluster, select ECS instance types with GPU capabilities to create worker nodes. For more information about other parameters, see Create an ACK cluster.
  4. Configure worker nodes. In this example, select Light-weight Compute Optimized Type with GPU vgn5i as the instance type of worker nodes.
    • To create new instances, you need to specify the instance family, instance type, and the number of worker nodes to be created. In this example, two GPU nodes are created and the instance type is ecs.vgn5i-m2.xlarge.
    • To add existing instances, you must create lightweight GPU instances in the target region in advance.
  5. Specify the other parameters and click Create Cluster to start the deployment.
    After the cluster is created, click Clusters > Nodes to go to the Nodes page.

    Select the target cluster from the Clusters drop-down list. Find the newly created node and click More > Details to view the GPU devices that are attached to the node.

    In the Overview section, you can view the labels that are attached to the lightweight GPU node, which indicate the number of GPU devices, the GPU memory, and the virtualized GPU device name. For example, aliyun.accelerator/nvidia_count: 1, aliyun.accelerator/nvidia_mem: 2048MiB, and aliyun.accelerator/nvidia_name: GRID-P4-2Q.
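
The same labels can also be read from the command line. The following is a minimal sketch, assuming kubectl is already configured for the cluster. The function name list_gpu_nodes is hypothetical, and the label keys are the ones shown in the Overview section.

```shell
# Hypothetical helper: list all nodes and show the three GPU labels as extra columns.
# -L (--label-columns) is a standard kubectl flag that takes a comma-separated list of label keys.
list_gpu_nodes() {
  kubectl get nodes \
    -L aliyun.accelerator/nvidia_name,aliyun.accelerator/nvidia_mem,aliyun.accelerator/nvidia_count
}
```

Calling list_gpu_nodes prints one row per node. Nodes without GPU labels show empty values in the extra columns.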

Run a CUDA container to test the lightweight GPU nodes

You can use lightweight GPU nodes in the same way that you use regular GPU nodes. In the Pod configuration file, specify that the nvidia.com/gpu resource is required to run the task. Because each lightweight GPU instance has only one lightweight GPU device, set nvidia.com/gpu to 1.
Note If your cluster contains both lightweight GPU instances and regular GPU instances, you must set nodeSelector and node labels to schedule Pods to different nodes.
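
As an illustration of the two settings above, the following sketch renders a minimal Pod manifest that requests one GPU and pins the Pod to nodes with a given GPU memory label. It assumes the GPU is exposed under the standard NVIDIA device plugin resource name nvidia.com/gpu. The function name render_gpu_pod and its three arguments are hypothetical.

```shell
# Hypothetical helper: print a Pod manifest to stdout.
# Usage: render_gpu_pod POD_NAME IMAGE GPU_MEM_LABEL
render_gpu_pod() {
  local name="$1" image="$2" gpu_mem="$3"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  # Schedule only to nodes whose GPU memory label matches (for example, 2048MiB).
  nodeSelector:
    aliyun.accelerator/nvidia_mem: ${gpu_mem}
  containers:
  - name: ${name}
    image: ${image}
    resources:
      limits:
        # Assumed resource name; a lightweight GPU instance has one GPU device.
        nvidia.com/gpu: 1
EOF
}
```

The output can be reviewed first and then piped to kubectl apply -f - to create the Pod.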

The following example uses a CUDA sample application to test GPU devices.

  1. In the left-side navigation pane, choose Applications > Deployments to go to the Deployments page.
  2. In the upper-right corner, click Create from Template.
  3. Select the target cluster and namespace. Set the Sample Template field to Custom and enter the following code in the Template field. Then click Create.

    In this example, the sample template defines a Deployment that runs a CUDA container.

    The aliyun.accelerator/nvidia_mem: 2048MiB label is used as a node selector to schedule the Pod to the lightweight GPU node created in the Create a Kubernetes cluster with vgn5i instances section.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cuda-sample
      labels:
        app: cuda-sample
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: cuda-sample
      template:
        metadata:
          labels:
            app: cuda-sample
        spec:
          nodeSelector:
            aliyun.accelerator/nvidia_mem: 2048MiB
          containers:
          - name: cuda-sample
            image: nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
            command: ["tail", "-f", "/dev/null"]
            resources:
              limits:
                nvidia.com/gpu: 1  # each lightweight GPU instance has one GPU device
  4. After cuda-sample is started, run the kubectl exec command to view the GPU device status and CUDA sample code in the container.
    1. Connect to Kubernetes clusters through kubectl.
    2. Run the following command to obtain the name of the Pod:
      $ kubectl get pod
      NAME                           READY   STATUS    RESTARTS   AGE
      cuda-sample-79f9fc9cc5-hlpp5   1/1     Running   0          4h27m
    3. Run the following commands to open a shell in the Pod and view the GPU device information:
      The output shows that the container has a GPU device named GRID P4-2Q with 2,048 MiB of memory.
      $ kubectl exec -it cuda-sample-79f9fc9cc5-hlpp5 -- bash
      root@cuda-sample-79f9fc9cc5-hlpp5:/# nvidia-smi
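      Thu Nov 21 14:45:45 2019
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 418.70       Driver Version: 418.70       CUDA Version: 10.1     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  GRID P4-2Q          On   | 00000000:00:07.0 Off |                  N/A |
      | N/A   N/A    P8    N/A /  N/A |    144MiB /  2048MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+

      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+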
    4. Run the following commands to view the CUDA version, install the CUDA sample application, and test the sample code:
      root@cuda-sample-79f9fc9cc5-hlpp5:/# nvcc -V
      nvcc: NVIDIA (R) Cuda compiler driver
      Copyright (c) 2005-2017 NVIDIA Corporation
      Built on Fri_Sep__1_21:08:03_CDT_2017
      Cuda compilation tools, release 9.0, V9.0.176
      root@cuda-sample-79f9fc9cc5-hlpp5:/# apt-get update && apt-get install -y cuda-samples-9-0
      root@cuda-sample-79f9fc9cc5-hlpp5:/# cd /usr/local/cuda-9.0/samples/0_Simple/simpleTexture
      root@cuda-sample-79f9fc9cc5-hlpp5:/usr/local/cuda-9.0/samples/0_Simple/simpleTexture# make
      root@cuda-sample-79f9fc9cc5-hlpp5:/usr/local/cuda-9.0/samples/0_Simple/simpleTexture# ./simpleTexture
      simpleTexture starting...
      GPU Device 0: "GRID P4-2Q" with compute capability 6.1
      Loaded 'lena_bw.pgm', 512 x 512 pixels
      Processing time: 0.038000 (ms)
      6898.53 Mpixels/sec
      Wrote './data/lena_bw_out.pgm'
      Comparing files
          output:    <./data/lena_bw_out.pgm>
          reference: <./data/ref_rotated.pgm>
      simpleTexture completed, returned OK
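
The GPU check in the preceding steps can also be run without an interactive shell. The following is a minimal sketch, assuming kubectl is configured for the cluster and the Pod carries the app: cuda-sample label from the sample template. The function name check_cuda_sample_gpu is hypothetical.

```shell
# Hypothetical helper: print the name and total memory of the GPU
# that is visible inside the first cuda-sample Pod.
check_cuda_sample_gpu() {
  local pod
  # Look up the Pod by label instead of copying its generated name.
  pod="$(kubectl get pod -l app=cuda-sample \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  [ -n "$pod" ] || return 1
  # --query-gpu and --format are standard nvidia-smi options.
  kubectl exec "$pod" -- nvidia-smi \
    --query-gpu=name,memory.total --format=csv,noheader
}
```

On the cluster built in this topic, the command should report the same device name and memory size as the nvidia-smi table in step 3.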