Container Service for Kubernetes:Use GPU-accelerated elastic container instances

Last Updated:Mar 26, 2024

GPU-accelerated elastic container instances come with built-in GPUs and Compute Unified Device Architecture (CUDA) drivers. Therefore, to run a GPU-accelerated elastic container instance, you need only to use a base image that is preinstalled with software such as CUDA Toolkit. You do not need to manually install the GPU driver. This topic describes how to use a GPU-accelerated elastic container instance.

Supported instance type families

GPU-accelerated ECS instance types contain GPUs and are suitable for scenarios such as deep learning and image processing. GPU-related Docker images can be directly run on a GPU-accelerated elastic container instance. An NVIDIA GPU driver is pre-installed in the instance. The supported driver and CUDA versions vary based on the GPU type.

| Category | GPU-accelerated instance family | Driver and CUDA versions |
| --- | --- | --- |
| vGPU-accelerated instance families | sgn7i-vws, vgn7i-vws, vgn6i-vws | NVIDIA 470.141.03 and CUDA 11.4 |
| GPU-accelerated compute-optimized instance families | gn7e, gn7i, gn7s, gn7, gn6v, gn6e, gn6i, gn5i, gn5 | NVIDIA 470.82.01 and CUDA 11.4 (default); NVIDIA 525.85.12 and CUDA 12.0 |

For more information about ECS instance types, see the instance family documentation for ECS.

Configurations

You can add annotations to the metadata section in the configuration file of a pod to specify GPU-accelerated ECS instance types. After you specify GPU-accelerated ECS instance types, you must add the nvidia.com/gpu field to the resources.limits section of each container to specify the number of GPUs that you want to allocate to that container.

Important
  • The value of the nvidia.com/gpu field specifies the number of GPUs that you want to allocate to a container. You must specify the field when you create a GPU-accelerated pod. If you do not specify this field, an error is returned when the pod is started.

  • By default, the containers in an elastic container instance share its GPUs. Make sure that the number of GPUs that you allocate to a single container does not exceed the number of GPUs provided by the specified GPU-accelerated ECS instance type.

Example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true" 
      annotations:
        k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge,ecs.gn6i-c8g1.2xlarge" # Specify a maximum of five GPU-accelerated ECS instance types at a time. 
    spec:
      containers:
      - name: nginx
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2
        resources:
            limits:
              nvidia.com/gpu: "1" # The number of GPUs required by the Nginx container. The GPUs are shared. 
        ports:
        - containerPort: 80
      - name: busybox
        image: registry.cn-shanghai.aliyuncs.com/eci_open/busybox:1.30
        command: ["sleep"]
        args: ["999999"]
        resources:
            limits:
              nvidia.com/gpu: "1" # The number of GPUs required by the BusyBox container. The GPUs are shared.
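The rules described above can be checked before you submit a manifest. The following Python sketch is an illustrative helper (not part of any official SDK): it verifies that the k8s.aliyun.com/eci-use-specs annotation lists at most five instance types and that every container sets the nvidia.com/gpu limit.

```python
# Illustrative validation helper (not an official SDK). It enforces the two
# rules described above: at most five instance types per pod, and every
# container must set the nvidia.com/gpu resource limit.

def validate_gpu_pod_template(template: dict, max_specs: int = 5) -> list:
    """Return a list of problems found in an ECI pod template dict."""
    problems = []
    annotations = template.get("metadata", {}).get("annotations", {})
    specs = annotations.get("k8s.aliyun.com/eci-use-specs", "")
    instance_types = [s for s in specs.split(",") if s]
    if not instance_types:
        problems.append("k8s.aliyun.com/eci-use-specs annotation is missing")
    elif len(instance_types) > max_specs:
        problems.append(
            f"too many instance types: {len(instance_types)} > {max_specs}")
    for container in template.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if "nvidia.com/gpu" not in limits:
            problems.append(
                f"container {container.get('name')!r} does not set nvidia.com/gpu")
    return problems

# Pod template matching the example above, except that the busybox container
# is intentionally missing the nvidia.com/gpu limit.
template = {
    "metadata": {"annotations": {
        "k8s.aliyun.com/eci-use-specs":
            "ecs.gn6i-c4g1.xlarge,ecs.gn6i-c8g1.2xlarge"}},
    "spec": {"containers": [
        {"name": "nginx", "resources": {"limits": {"nvidia.com/gpu": "1"}}},
        {"name": "busybox", "resources": {}},
    ]},
}
print(validate_gpu_pod_template(template))
# → ["container 'busybox' does not set nvidia.com/gpu"]
```

Running such a check in CI catches the missing-field error before pod startup fails.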

By default, a GPU-accelerated elastic container instance automatically installs the supported driver and CUDA versions based on the specified GPU-accelerated ECS instance type. In some scenarios, you may need to use different driver and CUDA versions for different GPU-accelerated elastic container instances. In this case, you can add annotations to specify the driver and CUDA versions.

For example, if you specify ecs.gn6i-c4g1.xlarge as the GPU-accelerated ECS instance type, the default driver and CUDA versions installed are NVIDIA 470.82.01 and CUDA 11.4. After you add the k8s.aliyun.com/eci-gpu-driver-version: tesla=525.85.12 annotation, the driver and CUDA versions installed change to NVIDIA 525.85.12 and CUDA 12.0. The following code provides an example in YAML format.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true" 
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn6i-c4g1.xlarge # Specify a GPU-accelerated ECS instance type that supports changing the driver version. 
        k8s.aliyun.com/eci-gpu-driver-version: tesla=525.85.12 # Specify the GPU driver version. 
    spec:
      containers:
      - name: nginx
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2
        resources:
            limits:
              nvidia.com/gpu: "1" # The number of GPUs required by the container.
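The annotation value follows a `<driver type>=<driver version>` pattern (tesla=525.85.12 in the example above). The following Python sketch parses that value and looks up the matching CUDA version using the pairs from the instance family table earlier in this topic; the helper itself is illustrative, not an official API.

```python
import re

# Driver-to-CUDA pairs taken from the instance family table above.
CUDA_BY_DRIVER = {
    "470.141.03": "11.4",
    "470.82.01": "11.4",
    "525.85.12": "12.0",
}

def parse_gpu_driver_annotation(value: str) -> tuple:
    """Split a 'tesla=525.85.12'-style annotation value into
    (driver type, driver version)."""
    match = re.fullmatch(r"(\w+)=([\d.]+)", value)
    if match is None:
        raise ValueError(f"unexpected annotation format: {value!r}")
    return match.group(1), match.group(2)

driver_type, driver_version = parse_gpu_driver_annotation("tesla=525.85.12")
print(driver_type, driver_version, CUDA_BY_DRIVER[driver_version])
# → tesla 525.85.12 12.0
```

A check like this lets a deployment pipeline reject annotation values that do not correspond to a driver version listed for the chosen instance family.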