Create a GPU-accelerated elastic container instance

Last Updated: Apr 28, 2021

This topic describes how to create and use a GPU-accelerated elastic container instance.

Background information

You can create GPU-accelerated elastic container instances by specifying GPU-accelerated Elastic Compute Service (ECS) instance types. You can then run GPU-accelerated Docker images on these instances without installing TensorFlow software such as tensorflow-gpu 1.13.1 or CUDA Toolkit software such as NVIDIA CUDA.

Note

The GPU driver version supported by GPU-accelerated elastic container instances is NVIDIA 440.64. The CUDA Toolkit version supported by GPU-accelerated elastic container instances is V10.1.

The following GPU-accelerated ECS instance families are supported:

  • gn6v, GPU-accelerated and compute optimized instance family, which uses NVIDIA V100 GPUs and includes multiple instance types such as ecs.gn6v-c8g1.2xlarge

  • gn6i, GPU-accelerated and compute optimized instance family, which uses NVIDIA T4 GPUs and includes multiple instance types such as ecs.gn6i-c4g1.xlarge

  • gn5i, GPU-accelerated and compute optimized instance family, which uses NVIDIA P4 GPUs and includes multiple instance types such as ecs.gn5i-c2g1.large

  • gn5, GPU-accelerated and compute optimized instance family, which uses NVIDIA P100 GPUs and includes multiple instance types such as ecs.gn5-c4g1.xlarge

    Note

    The gn5 instance family is equipped with local disks. You can mount and use local disks on elastic container instances. For more information, see Create elastic container instances equipped with local disks.
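
As a quick illustration of the list above, the following sketch maps an instance type name to its family and GPU model. It assumes that instance type names follow the `ecs.<family>-<spec>.<size>` pattern seen in the examples; the helper names `gpu_family` and `gpu_model` are hypothetical and not part of any SDK.

```python
# GPU model per supported instance family, taken from the list above.
SUPPORTED_GPU_FAMILIES = {
    "gn6v": "NVIDIA V100",
    "gn6i": "NVIDIA T4",
    "gn5i": "NVIDIA P4",
    "gn5": "NVIDIA P100",
}


def gpu_family(instance_type: str) -> str:
    """Return the family part of a name such as 'ecs.gn6v-c8g1.2xlarge'.

    Assumption: names look like 'ecs.<family>-<spec>.<size>'.
    """
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ecs":
        raise ValueError(f"unexpected instance type format: {instance_type!r}")
    # The family is the text before the first hyphen in the middle segment.
    return parts[1].split("-", 1)[0]


def gpu_model(instance_type: str) -> str:
    """Return the GPU model for a supported type, or raise ValueError."""
    family = gpu_family(instance_type)
    if family not in SUPPORTED_GPU_FAMILIES:
        raise ValueError(f"{family!r} is not a supported GPU-accelerated family")
    return SUPPORTED_GPU_FAMILIES[family]


print(gpu_model("ecs.gn6i-c4g1.xlarge"))
```

A check like this can catch a mistyped or unsupported instance type before a request is sent.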

Kubernetes mode

You can specify a GPU-accelerated ECS instance type by adding annotations to the YAML file. Take note of the following items:

  • Add annotations to the spec.template.metadata section.

  • You must declare GPU resources by adding the nvidia.com/gpu field in the containers.resources section.

Notice

The nvidia.com/gpu field indicates the number of GPUs required by a container. You must specify this field when you create a GPU-accelerated elastic container instance. If you do not specify this field, an error is returned when the pod is being started.

Sample code:

apiVersion: apps/v1 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
  name: nginx-gpu-demo-1
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn5i-c4g1.xlarge  # Specify a GPU-accelerated ECS instance type.
    spec:
      containers:
      - name: nginx
        image: registry-vpc.cn-beijing.aliyuncs.com/eci_open/nginx:1.15.10 # Replace with your own <image_name:tag>.
        resources:
          limits:
            nvidia.com/gpu: '1'    # Required. The number of GPUs required by the container. If omitted, an error is returned when the pod starts.
        ports:
        - containerPort: 80
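
The two requirements above (the annotation in spec.template.metadata and the nvidia.com/gpu declaration) can be checked before a manifest is applied. The following is a minimal sketch that assumes the manifest has already been loaded into a Python dict and that the GPU count is declared under resources.limits, as in the sample. The function name `check_gpu_deployment` is hypothetical.

```python
ECI_SPEC_ANNOTATION = "k8s.aliyun.com/eci-use-specs"
GPU_RESOURCE = "nvidia.com/gpu"


def check_gpu_deployment(manifest: dict) -> list:
    """Return a list of problems; an empty list means both rules are met."""
    problems = []
    template = manifest.get("spec", {}).get("template", {})

    # Rule 1: the instance type annotation lives in spec.template.metadata.
    annotations = template.get("metadata", {}).get("annotations") or {}
    if not annotations.get(ECI_SPEC_ANNOTATION):
        problems.append(
            f"missing annotation {ECI_SPEC_ANNOTATION} in spec.template.metadata"
        )

    # Rule 2: at least one container declares nvidia.com/gpu.
    # Assumption: the declaration is under resources.limits, as in the sample.
    containers = template.get("spec", {}).get("containers") or []
    declared = [
        c.get("name", "<unnamed>")
        for c in containers
        if GPU_RESOURCE in ((c.get("resources") or {}).get("limits") or {})
    ]
    if not declared:
        problems.append(f"no container declares {GPU_RESOURCE} in resources.limits")
    return problems


# A dict that mirrors the relevant parts of the sample Deployment.
sample = {
    "spec": {
        "template": {
            "metadata": {
                "annotations": {ECI_SPEC_ANNOTATION: "ecs.gn5i-c4g1.xlarge"}
            },
            "spec": {
                "containers": [
                    {"name": "nginx", "resources": {"limits": {GPU_RESOURCE: "1"}}}
                ]
            },
        }
    }
}
print(check_gpu_deployment(sample))
```

The check only inspects the structure of the manifest. It does not verify that the annotated instance type exists or that it provides enough GPUs.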

API mode

When you call the CreateContainerGroup operation to create an elastic container instance, you can use the InstanceType parameter to specify a GPU-accelerated ECS instance type and use the Container.N.Gpu parameter to specify the number of GPUs required by a container. The following table describes the parameters. For more information, see CreateContainerGroup.

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| InstanceType | String | No | ecs.gn6v-c8g1.2xlarge | The ECS instance types. The following GPU-accelerated ECS instance families are supported: gn6v, gn6i, gn5i, and gn5. You can specify up to five ECS instance types at a time. Separate multiple instance types with commas (,). Example: ecs.gn6v-c8g1.2xlarge,ecs.gn6i-c4g1.xlarge. |
| Container.N.Gpu | Integer | No | 1 | The number of GPUs allocated to the container. |

Notice

When you create a GPU-accelerated elastic container instance, you must specify both the InstanceType and Container.N.Gpu parameters. You must also make sure that the sum of the Container.N.Gpu and InitContainer.N.Gpu values (the total number of GPUs to be allocated to all the containers) does not exceed the number of GPUs provided by each of the specified GPU-accelerated ECS instance types.
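
The constraints in this notice can be sketched as a small helper that builds only the GPU-related request parameters. The helper name `build_gpu_params` and the `gpus_per_type` lookup are hypothetical. The GPU counts in the usage example are illustrative assumptions, not values from this topic, and the sketch assumes that N in Container.N.Gpu starts at 1. It does not call any SDK, and a real CreateContainerGroup request needs further parameters that are not shown here.

```python
def build_gpu_params(instance_types, container_gpus, init_container_gpus=(),
                     gpus_per_type=None):
    """Build InstanceType and *.N.Gpu parameters after checking the limits.

    gpus_per_type maps each instance type to the number of GPUs it provides
    (hypothetical, caller-supplied lookup).
    """
    gpus_per_type = gpus_per_type or {}
    if not 1 <= len(instance_types) <= 5:
        raise ValueError("specify between one and five instance types")

    # Total GPUs requested by all containers and init containers.
    total = sum(container_gpus) + sum(init_container_gpus)
    for instance_type in instance_types:
        if instance_type not in gpus_per_type:
            raise ValueError(f"GPU count unknown for {instance_type}")
        # The total must fit on every candidate instance type.
        if total > gpus_per_type[instance_type]:
            raise ValueError(
                f"{total} GPUs requested, but {instance_type} provides "
                f"{gpus_per_type[instance_type]}"
            )

    params = {"InstanceType": ",".join(instance_types)}
    # Assumption: N is 1-based.
    for n, gpu in enumerate(container_gpus, start=1):
        params[f"Container.{n}.Gpu"] = gpu
    for n, gpu in enumerate(init_container_gpus, start=1):
        params[f"InitContainer.{n}.Gpu"] = gpu
    return params


# Illustrative GPU counts (assumed); check the instance type specifications.
assumed_gpus = {"ecs.gn6v-c8g1.2xlarge": 1, "ecs.gn6i-c4g1.xlarge": 1}
params = build_gpu_params(
    ["ecs.gn6v-c8g1.2xlarge", "ecs.gn6i-c4g1.xlarge"],
    container_gpus=[1],
    gpus_per_type=assumed_gpus,
)
print(params)
```

Validating against every listed instance type reflects the rule that the total must not exceed the GPU count of each specified type, because any one of them may be selected.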

You can call the UpdateContainerGroup operation to change the number of GPUs allocated to each container in an existing GPU-accelerated elastic container instance. The following table describes the related parameters. For more information, see UpdateContainerGroup.

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Container.N.Gpu | Integer | No | 1 | The number of GPUs allocated to the container. |
| InitContainer.N.Gpu | Integer | No | 1 | The number of GPUs allocated to the init container. |