ECI GPU examples

Last Updated: Aug 29, 2019

GPU-enabled container groups are designed for the following purposes:

Each GPU-enabled container group provided by Alibaba Cloud Elastic Container Instance (ECI) comes with the GPU driver pre-installed. You do not need to install other software, such as TensorFlow or the CUDA Toolkit, in the container group itself. This software runs in the container group directly from a GPU-enabled Docker image, for example, tensorflow/tensorflow:1.13.1-gpu or nvidia/cuda.

Usage

API

  • CreateContainerGroup

This operation is described in the API reference. To create a GPU-enabled container group, specify one additional request parameter and one additional parameter in the container settings.

Additional request parameter

Parameter | Type | Required | Description
InstanceType | String | Yes | The instance type.

Additional parameter in container settings

Parameter | Type | Required | Description
Gpu | Integer | Yes | The number of GPUs in the container.

The following table lists the capacities of different InstanceType values.

vCPU | Memory (GiB) | GPU type | GPU count | InstanceType
2 | 8.0 | P4 | 1 | ecs.gn5i-c2g1.large
4 | 16.0 | P4 | 1 | ecs.gn5i-c4g1.xlarge
8 | 32.0 | P4 | 1 | ecs.gn5i-c8g1.2xlarge
16 | 64.0 | P4 | 1 | ecs.gn5i-c16g1.4xlarge
32 | 128.0 | P4 | 2 | ecs.gn5i-c16g1.8xlarge
56 | 224.0 | P4 | 2 | ecs.gn5i-c28g1.14xlarge
8 | 32.0 | V100 | 1 | ecs.gn6v-c8g1.2xlarge
32 | 128.0 | V100 | 4 | ecs.gn6v-c8g1.8xlarge
64 | 256.0 | V100 | 8 | ecs.gn6v-c8g1.16xlarge
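
The table can be turned into a small lookup when choosing an InstanceType programmatically. The following is a minimal sketch, not part of the ECI SDK: the gpuInstanceType struct and the smallestInstanceType helper are hypothetical names, and the data is transcribed by hand from the table above rather than fetched from any API.

```go
package main

import "fmt"

// gpuInstanceType mirrors one row of the capacity table (hypothetical type).
type gpuInstanceType struct {
	Name      string
	VCPU      int
	MemoryGiB float64
	GPUType   string
	GPUCount  int
}

// gpuInstanceTypes is transcribed from the table; it is not fetched from an API.
var gpuInstanceTypes = []gpuInstanceType{
	{"ecs.gn5i-c2g1.large", 2, 8, "P4", 1},
	{"ecs.gn5i-c4g1.xlarge", 4, 16, "P4", 1},
	{"ecs.gn5i-c8g1.2xlarge", 8, 32, "P4", 1},
	{"ecs.gn5i-c16g1.4xlarge", 16, 64, "P4", 1},
	{"ecs.gn5i-c16g1.8xlarge", 32, 128, "P4", 2},
	{"ecs.gn5i-c28g1.14xlarge", 56, 224, "P4", 2},
	{"ecs.gn6v-c8g1.2xlarge", 8, 32, "V100", 1},
	{"ecs.gn6v-c8g1.8xlarge", 32, 128, "V100", 4},
	{"ecs.gn6v-c8g1.16xlarge", 64, 256, "V100", 8},
}

// smallestInstanceType returns the entry with the fewest vCPUs that offers
// at least gpuCount GPUs of the given type. ok is false if nothing matches.
func smallestInstanceType(gpuType string, gpuCount int) (best gpuInstanceType, ok bool) {
	for _, it := range gpuInstanceTypes {
		if it.GPUType != gpuType || it.GPUCount < gpuCount {
			continue
		}
		if !ok || it.VCPU < best.VCPU {
			best, ok = it, true
		}
	}
	return best, ok
}

func main() {
	if it, ok := smallestInstanceType("V100", 4); ok {
		fmt.Println(it.Name, it.VCPU, it.MemoryGiB)
	}
}
```

Picking by fewest vCPUs is only one possible policy; a real selection might also weigh memory or price.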

When you call the CreateContainerGroup operation to create a GPU-enabled container group, you must specify the InstanceType parameter. If you specify Gpu for a container but omit InstanceType, the request fails and an error code is returned.

The sum of the Gpu values of all containers in a container group cannot exceed the GPU count of the specified InstanceType. Otherwise, the CreateContainerGroup request fails.
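
The two rules above can be checked on the client side before a request is sent. The following is a minimal sketch under stated assumptions: containerSpec, validateGPURequest, and the error values are hypothetical and not part of the ECI SDK, the GPU counts are copied from the capacity table, and the actual error codes returned by the service are not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// gpuCountByInstanceType is copied from the capacity table in this document.
var gpuCountByInstanceType = map[string]int{
	"ecs.gn5i-c2g1.large":     1,
	"ecs.gn5i-c4g1.xlarge":    1,
	"ecs.gn5i-c8g1.2xlarge":   1,
	"ecs.gn5i-c16g1.4xlarge":  1,
	"ecs.gn5i-c16g1.8xlarge":  2,
	"ecs.gn5i-c28g1.14xlarge": 2,
	"ecs.gn6v-c8g1.2xlarge":   1,
	"ecs.gn6v-c8g1.8xlarge":   4,
	"ecs.gn6v-c8g1.16xlarge":  8,
}

// containerSpec is a hypothetical stand-in for the container settings.
type containerSpec struct {
	Name string
	Gpu  int
}

var errMissingInstanceType = errors.New("InstanceType is required when a container requests GPUs")

// validateGPURequest applies the two documented rules:
//  1. InstanceType must be set if any container requests GPUs.
//  2. The sum of Gpu values must not exceed the instance type's GPU count.
func validateGPURequest(instanceType string, containers []containerSpec) error {
	total := 0
	for _, c := range containers {
		total += c.Gpu
	}
	if total == 0 {
		return nil // no GPUs requested, nothing to check
	}
	if instanceType == "" {
		return errMissingInstanceType
	}
	available, known := gpuCountByInstanceType[instanceType]
	if !known {
		return fmt.Errorf("unknown GPU instance type %q", instanceType)
	}
	if total > available {
		return fmt.Errorf("containers request %d GPUs but %s provides %d", total, instanceType, available)
	}
	return nil
}

func main() {
	containers := []containerSpec{{Name: "train", Gpu: 1}, {Name: "eval", Gpu: 2}}
	fmt.Println(validateGPURequest("ecs.gn5i-c16g1.8xlarge", containers))
	fmt.Println(validateGPURequest("ecs.gn6v-c8g1.8xlarge", containers))
}
```

In the usage above, the first call reports an error because three GPUs are requested from a two-GPU instance type, and the second call passes.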

  • UpdateContainerGroup

This operation is similar to CreateContainerGroup. To change the number of GPUs in a container, you must specify the Gpu parameter for the container in the UpdateContainerGroup operation.

Additional parameter in container settings

Parameter | Type | Required | Description
Gpu | Integer | Yes | The number of GPUs in the container.

  • Other operations, such as RestartContainerGroup and DeleteContainerGroup, take no additional parameters. Call them with their original parameters.

Example of API use

The following example uses the Go SDK. Creating a container group with the Python or Java SDK is similar. The steps below create, update, and delete a GPU-enabled container group.

  1. Create an ECI client

Parameter | Type | Required | Description
region | String | Yes | The ID of the region in which the container group resides.
accessKey | String | Yes | The AccessKey ID of the user.
secretKey | String | Yes | The AccessKey Secret of the user.

    eciClient, err := eci.NewClientWithAccessKey(region, accessKey, secretKey)

  2. Create a container group

Add the InstanceType parameter to the CreateContainerGroup operation and the Gpu field in the container settings.

    request := eci.CreateCreateContainerGroupRequest()
    request.SecurityGroupId = "sg-xxx"
    request.VSwitchId = "vsw-xxx"
    request.ContainerGroupName = "name-xxx"
    // Must be a GPU instance type with at least as many GPUs as the containers request (2 here).
    request.InstanceType = "ecs.xxx"
    // Create a container.
    // Length 0, capacity 1: append must not leave an empty first element.
    containers := make([]eci.CreateContainer, 0, 1)
    c := eci.CreateContainer{
        Name:   "name-xxx",
        Image:  "tensorflow/tensorflow:1.13.1-gpu-py3",
        Cpu:    2,
        Memory: 4,
        Gpu:    2,
    }
    containers = append(containers, c)
    // End
    request.Containers = containers
    response, err := eciClient.CreateContainerGroup(request)

  3. Update a container group

To update the GPU setting for a container, you only need to set the number of GPUs, as shown below:

    request := eci.CreateUpdateContainerGroupRequest()
    request.ContainerGroupId = "eci-xxx" // The container group ID must exist.
    // Create a container.
    // Length 0, capacity 1: append must not leave an empty first element.
    containers := make([]eci.UpdateContainer, 0, 1)
    c := eci.UpdateContainer{
        Name: "name-xxx", // The name must match an existing container.
        Gpu:  1,
    }
    containers = append(containers, c)
    // End
    request.Containers = containers
    response, err := eciClient.UpdateContainerGroup(request)

  4. Delete a container group

This operation follows the same pattern as the preceding operations and needs only the container group ID.

    request := eci.CreateDeleteContainerGroupRequest()
    request.ContainerGroupId = "eci-xxx"
    response, err := eciClient.DeleteContainerGroup(request)

Usage of Virtual Kubelet

To request GPUs through Virtual Kubelet, declare the GPU type on the pod and the GPU count on the container:

  • In the pod declaration, add the virtual-kubelet.io/gpu-type annotation under metadata.annotations. The valid values are P4 and V100.
  • In the container declaration, set nvidia.com/gpu under resources.limits. The valid values are 1, 2, 4, and 8.

    apiVersion: v1
    kind: Pod
    metadata:
      name: podname
      annotations:
        "virtual-kubelet.io/gpu-type": "P4"
    spec:
      containers:
      # Other container fields, such as name and image, are omitted here.
      - resources:
          limits:
            "nvidia.com/gpu": "1"
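
The valid values listed above can be checked before a pod is submitted. The following is a minimal sketch, assuming the annotations and resource limits are available as plain string maps; validateGPUPod is a hypothetical helper and not part of Virtual Kubelet or any Kubernetes client library.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	gpuTypeAnnotation = "virtual-kubelet.io/gpu-type"
	gpuResourceName   = "nvidia.com/gpu"
)

// Valid values as stated in this document.
var (
	validGPUTypes  = map[string]bool{"P4": true, "V100": true}
	validGPUCounts = map[int]bool{1: true, 2: true, 4: true, 8: true}
)

// validateGPUPod checks the pod annotation and the container's resource
// limit against the documented valid values.
func validateGPUPod(annotations, limits map[string]string) error {
	gpuType, ok := annotations[gpuTypeAnnotation]
	if !ok {
		return fmt.Errorf("missing annotation %s", gpuTypeAnnotation)
	}
	if !validGPUTypes[gpuType] {
		return fmt.Errorf("invalid %s %q: want P4 or V100", gpuTypeAnnotation, gpuType)
	}
	raw, ok := limits[gpuResourceName]
	if !ok {
		return fmt.Errorf("missing resource limit %s", gpuResourceName)
	}
	count, err := strconv.Atoi(raw)
	if err != nil {
		return fmt.Errorf("invalid %s %q: %v", gpuResourceName, raw, err)
	}
	if !validGPUCounts[count] {
		return fmt.Errorf("invalid %s %d: want 1, 2, 4, or 8", gpuResourceName, count)
	}
	return nil
}

func main() {
	annotations := map[string]string{gpuTypeAnnotation: "P4"}
	limits := map[string]string{gpuResourceName: "1"}
	fmt.Println(validateGPUPod(annotations, limits))
}
```

This sketch checks each value in isolation. It does not check that the requested combination, for example four P4 GPUs, corresponds to an instance type in the capacity table.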

Declare the ECS instance type directly

This method directly declares the ECS instance type that you want to use.

Add the k8s.aliyun.com/eci-instance-type annotation to the pod declaration.

This annotation takes precedence over the virtual-kubelet.io/gpu-type declaration described in the preceding section.

    apiVersion: v1
    kind: Pod
    metadata:
      name: podname
      annotations:
        "k8s.aliyun.com/eci-instance-type": "ecs.gn5i-c4g1.xlarge"
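
The precedence between the two annotations can be illustrated with a small resolver. This is a sketch of one reading of the rule, assuming that the instance type annotation wins whenever both annotations are present; resolveGPUSelection is a hypothetical helper and does not reflect how Virtual Kubelet is actually implemented.

```go
package main

import "fmt"

const (
	instanceTypeAnnotation = "k8s.aliyun.com/eci-instance-type"
	gpuTypeAnnotation      = "virtual-kubelet.io/gpu-type"
)

// resolveGPUSelection reports which annotation decides the GPU resources.
// The instance type annotation is checked first, so it wins if both are set.
// ok is false when neither annotation is present.
func resolveGPUSelection(annotations map[string]string) (key, value string, ok bool) {
	if v, found := annotations[instanceTypeAnnotation]; found {
		return instanceTypeAnnotation, v, true
	}
	if v, found := annotations[gpuTypeAnnotation]; found {
		return gpuTypeAnnotation, v, true
	}
	return "", "", false
}

func main() {
	annotations := map[string]string{
		gpuTypeAnnotation:      "P4",
		instanceTypeAnnotation: "ecs.gn5i-c4g1.xlarge",
	}
	key, value, _ := resolveGPUSelection(annotations)
	fmt.Println(key, value)
}
```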