Create a GPU-accelerated elastic container instance

Last Updated: Jun 25, 2021

This topic describes how to create and use a GPU-accelerated elastic container instance.

Background information

You can create a GPU-accelerated elastic container instance by specifying a GPU-accelerated Elastic Compute Service (ECS) instance type. GPU-accelerated Docker images run directly on these instances: you do not need to install software such as TensorFlow (for example, tensorflow-gpu 1.13.1) or the NVIDIA CUDA Toolkit on the instance yourself.

Note

The GPU driver version supported by GPU-accelerated elastic container instances is NVIDIA 460.73.01. The CUDA Toolkit version supported by GPU-accelerated elastic container instances is 11.2.

The following GPU-accelerated ECS instance families are supported:

  • gn6v, GPU-accelerated compute optimized instance family that uses NVIDIA V100 GPUs and includes a variety of instance types such as ecs.gn6v-c8g1.2xlarge

  • gn6i, GPU-accelerated compute optimized instance family that uses NVIDIA T4 GPUs and includes a variety of instance types such as ecs.gn6i-c4g1.xlarge

  • gn5i, GPU-accelerated compute optimized instance family that uses NVIDIA P4 GPUs and includes a variety of instance types such as ecs.gn5i-c2g1.large

  • gn5, GPU-accelerated compute optimized instance family that uses NVIDIA P100 GPUs and includes a variety of instance types such as ecs.gn5-c4g1.xlarge

    Note

    The gn5 instance family is equipped with local disks. You can attach and use local disks on elastic container instances. For more information, see Create an elastic container instance that has local disks attached.
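The instance types listed above all follow the pattern ecs.<family>-<spec>.<size>. As a minimal sketch, assuming that naming pattern holds for the types you use, the following Python helper extracts the family from an instance type name and checks it against the families listed in this topic. The names instance_family and gpu_model_for are hypothetical helpers, not part of any Alibaba Cloud SDK.

```python
# Families listed in this topic, mapped to the GPU model each one uses.
SUPPORTED_GPU_FAMILIES = {
    "gn6v": "NVIDIA V100",
    "gn6i": "NVIDIA T4",
    "gn5i": "NVIDIA P4",
    "gn5": "NVIDIA P100",
}


def instance_family(instance_type: str) -> str:
    """Return the family part of a type such as 'ecs.gn6v-c8g1.2xlarge'.

    Assumes the 'ecs.<family>-<spec>.<size>' pattern seen in this topic.
    """
    prefix, _, rest = instance_type.partition(".")
    if prefix != "ecs" or not rest:
        raise ValueError(f"not an ECS instance type: {instance_type!r}")
    # The family is the text between 'ecs.' and the first '-'.
    return rest.split("-", 1)[0]


def gpu_model_for(instance_type: str) -> str:
    """Return the GPU model for a supported type, or raise ValueError."""
    family = instance_family(instance_type)
    if family not in SUPPORTED_GPU_FAMILIES:
        raise ValueError(f"family {family!r} is not in the supported list")
    return SUPPORTED_GPU_FAMILIES[family]
```

For example, gpu_model_for("ecs.gn6i-c4g1.xlarge") returns "NVIDIA T4", while a type from a family that is not listed here raises ValueError.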

Kubernetes mode

You can add annotations to metadata in the configuration file of a pod to specify a GPU-accelerated ECS instance type. After you specify a GPU-accelerated ECS instance type, you must add the nvidia.com/gpu field to the containers.resources section to declare GPU resources.

Notice

The nvidia.com/gpu field indicates the number of GPUs to be allocated to a container. You must specify this field when you create a GPU-accelerated elastic container instance. If you do not specify this field, an error is returned when the pod is being started.

Sample code:

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-gpu-demo-1
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn5i-c4g1.xlarge  # Specify a GPU-accelerated ECS instance type.
    spec:
      containers:
      - name: nginx
        image: registry-vpc.cn-beijing.aliyuncs.com/eci_open/nginx:1.15.10  # Replace with your own <image_name:tag>.
        resources:
          limits:
            nvidia.com/gpu: '1'  # Required. The number of GPUs to allocate to the container. If omitted, an error is returned when the pod starts.
        ports:
        - containerPort: 80
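Because a missing nvidia.com/gpu field only surfaces as an error when the pod starts, it can help to check a pod template before you submit it. The following Python function is a minimal sketch of such a check, assuming the template has already been loaded into a plain dict (for example, from the spec.template section of the Deployment above). The function name gpu_template_problems is hypothetical; it only checks the two requirements described in this section and does not replace server-side validation.

```python
ECI_SPEC_ANNOTATION = "k8s.aliyun.com/eci-use-specs"
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_template_problems(template: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Requirement 1: the annotation names a GPU-accelerated ECS instance type.
    annotations = (template.get("metadata") or {}).get("annotations") or {}
    if not annotations.get(ECI_SPEC_ANNOTATION):
        problems.append(f"missing annotation {ECI_SPEC_ANNOTATION}")

    # Requirement 2: at least one container declares GPUs in resources.limits.
    containers = (template.get("spec") or {}).get("containers") or []
    gpu_total = 0
    for container in containers:
        limits = (container.get("resources") or {}).get("limits") or {}
        gpu_total += int(limits.get(GPU_RESOURCE, 0))
    if gpu_total < 1:
        problems.append(f"no container declares {GPU_RESOURCE} in resources.limits")

    return problems
```

Calling the function on the template from the sample returns an empty list. Removing the resources section from the container, or the annotation from the metadata, adds the corresponding message to the result.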

API mode

When you call the CreateContainerGroup operation to create an elastic container instance, you can use the InstanceType parameter to specify a GPU-accelerated ECS instance type and use the Container.N.Gpu parameter to specify the number of GPUs to be allocated to a container. The following table describes these parameters. For more information, see CreateContainerGroup.

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| InstanceType | String | No | ecs.gn6v-c8g1.2xlarge | The GPU-accelerated ECS instance types. The following GPU-accelerated ECS instance families are supported: gn6v, gn6i, gn5i, and gn5. You can specify up to five ECS instance types at a time. Separate multiple instance types with commas (,). Example: ecs.gn6v-c8g1.2xlarge,ecs.gn6i-c4g1.xlarge. |
| Container.N.Gpu | Integer | No | 1 | The number of GPUs to be allocated to the container. |

Notice

When you create a GPU-accelerated elastic container instance, you must specify both the InstanceType and Container.N.Gpu parameters. You must make sure that the sum of the Container.N.Gpu and InitContainer.N.Gpu values (the total number of GPUs to be allocated to all the containers) does not exceed the number of GPUs provided by each of the specified GPU-accelerated ECS instance types.
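The constraint above can be checked before the request is sent. The following Python function is a minimal sketch that builds only the GPU-related request parameters for CreateContainerGroup and enforces the limits described in this section. It makes several assumptions: build_gpu_params is a hypothetical helper rather than an SDK call, N is taken to be a 1-based index, and the number of GPUs per instance type is supplied by the caller (look these values up in the ECS instance family documentation). A real request needs additional parameters that are not shown here.

```python
def build_gpu_params(instance_types, container_gpus, init_container_gpus, gpus_per_type):
    """Build the GPU-related CreateContainerGroup parameters as a flat dict.

    instance_types:      list of one to five GPU-accelerated ECS instance types
    container_gpus:      GPUs requested by each container, in order
    init_container_gpus: GPUs requested by each init container, in order
    gpus_per_type:       caller-supplied mapping of instance type -> GPU count
    """
    if not 1 <= len(instance_types) <= 5:
        raise ValueError("specify between one and five instance types")
    if not container_gpus:
        raise ValueError("Container.N.Gpu must be specified for a GPU instance")

    # The total across containers and init containers must fit on EACH
    # candidate instance type, because any one of them may be selected.
    total = sum(container_gpus) + sum(init_container_gpus)
    for instance_type in instance_types:
        available = gpus_per_type[instance_type]
        if total > available:
            raise ValueError(
                f"{total} GPUs requested but {instance_type} provides {available}"
            )

    params = {"InstanceType": ",".join(instance_types)}
    for n, gpus in enumerate(container_gpus, start=1):  # N assumed 1-based
        params[f"Container.{n}.Gpu"] = gpus
    for n, gpus in enumerate(init_container_gpus, start=1):
        params[f"InitContainer.{n}.Gpu"] = gpus
    return params
```

For example, requesting one GPU for a single container across the two instance types from the table, with a caller-supplied count of one GPU for each type, produces the InstanceType value ecs.gn6v-c8g1.2xlarge,ecs.gn6i-c4g1.xlarge and a Container.1.Gpu value of 1. Adding an init container that also requests one GPU raises ValueError, because the total of two exceeds the supplied count.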

You can call the UpdateContainerGroup operation to change the number of GPUs allocated to each container in an existing GPU-accelerated elastic container instance. The following table describes the relevant parameters. For more information, see UpdateContainerGroup.

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Container.N.Gpu | Integer | No | 1 | The number of GPUs allocated to the container. |
| InitContainer.N.Gpu | Integer | No | 1 | The number of GPUs allocated to the init container. |