This topic describes how to create an Elastic Container Instance (ECI) pod with a specific Elastic Compute Service (ECS) GPU-accelerated instance type and change the GPU driver version.
Supported instance types
For more information about ECS instance types, see the following topics:
Configuration
Add the k8s.aliyun.com/eci-use-specs annotation to the pod metadata to specify a GPU instance type. You must also add the nvidia.com/gpu field to the resources section of a container to request GPU resources.
The nvidia.com/gpu field specifies the number of GPUs that the container requires. You must specify this field when you create a GPU pod. If this field is not specified, pod creation fails.
By default, multiple containers can share GPUs. Make sure that the number of GPUs configured for a single container does not exceed the total number of GPUs in the specified instance type.
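
The two rules above can be checked before a manifest is submitted. The following Python sketch is only an illustration and not part of any ECI tooling: the function name check_gpu_requests is hypothetical, and the total GPU count of the instance type is an assumption that the caller supplies, for example after looking it up in the ECS instance type documentation.

```python
GPU_RESOURCE = "nvidia.com/gpu"


def check_gpu_requests(containers, instance_gpu_count):
    """Return a list of problems found in the containers' GPU limits.

    containers: list of dicts shaped like the `containers` list of a pod spec.
    instance_gpu_count: total GPUs of the chosen instance type (caller-supplied assumption).
    """
    problems = []
    requested_any = False
    for container in containers:
        limits = container.get("resources", {}).get("limits", {})
        if GPU_RESOURCE not in limits:
            continue
        requested_any = True
        count = int(limits[GPU_RESOURCE])
        # GPUs are shared by default, so each container is checked on its own
        # against the instance total rather than summing across containers.
        if count > instance_gpu_count:
            problems.append(
                f"container {container['name']!r} requests {count} GPUs, "
                f"but the instance type has {instance_gpu_count}"
            )
    if not requested_any:
        problems.append(f"no container sets {GPU_RESOURCE}")
    return problems


containers = [
    {"name": "nginx", "resources": {"limits": {"nvidia.com/gpu": "1"}}},
    {"name": "busybox", "resources": {"limits": {"nvidia.com/gpu": "1"}}},
]
# Assumes the chosen instance type provides exactly one GPU.
print(check_gpu_requests(containers, instance_gpu_count=1))
```

An empty list means that neither rule is violated. The check compares each container separately because GPUs are shared between containers by default.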
The following is a sample YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true"
      annotations:
        # Specify up to five GPU-accelerated ECS instance types at a time, separated by commas.
        # The system selects the first available type from the list.
        k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge,ecs.gn6i-c8g1.2xlarge"
    spec:
      containers:
      - name: nginx
        image: registry-us-east-1.aliyuncs.com/eci_open/nginx:1.14.2
        resources:
          limits:
            nvidia.com/gpu: "1" # Request 1 GPU for this container (GPUs are shared by default).
        ports:
        - containerPort: 80
      - name: busybox
        image: registry-us-east-1.aliyuncs.com/eci_open/busybox:1.30
        command: ["sleep"]
        args: ["999999"]
        resources:
          limits:
            nvidia.com/gpu: "1" # Request 1 GPU for this container (GPUs are shared by default).

By default, an ECI GPU instance automatically installs a supported driver and CUDA version based on the specified GPU instance type. If your workload requires a different driver version, you can add the k8s.aliyun.com/eci-gpu-driver-version annotation to specify the driver version.
For example, if you specify the ecs.gn6i-c4g1.xlarge instance type, the default driver is Tesla 550 and the default CUDA version is 12.4. Adding the k8s.aliyun.com/eci-gpu-driver-version: tesla=535 annotation changes the driver to Tesla 535 and the CUDA version to 12.2. The following is a sample YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true"
      annotations:
        # Specify a supported GPU instance type that allows you to change the driver version.
        k8s.aliyun.com/eci-use-specs: ecs.gn6i-c4g1.xlarge
        # Override the default driver version (Tesla 550) with Tesla 535.
        k8s.aliyun.com/eci-gpu-driver-version: tesla=535
    spec:
      containers:
      - name: nginx
        image: registry-us-east-1.aliyuncs.com/eci_open/nginx:1.14.2
        resources:
          limits:
            nvidia.com/gpu: "1" # Request 1 GPU for this container.
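
If you generate manifests programmatically, the two annotations used in this topic can be assembled by a small helper. The following Python sketch is only an illustration: the function name build_eci_gpu_annotations is hypothetical, the limit of five instance types is taken from the comment in the first sample, and the driver value is passed through unchanged without checking whether the chosen instance type supports it.

```python
USE_SPECS_KEY = "k8s.aliyun.com/eci-use-specs"
DRIVER_KEY = "k8s.aliyun.com/eci-gpu-driver-version"
MAX_INSTANCE_TYPES = 5  # "up to five" instance types per annotation


def build_eci_gpu_annotations(instance_types, driver_version=None):
    """Build the pod annotations for a GPU instance type list and optional driver.

    instance_types: candidate ECS GPU instance types, in order of preference.
    driver_version: value such as "tesla=535", or None to keep the default driver.
    """
    types = [t.strip() for t in instance_types if t.strip()]
    if not types:
        raise ValueError("at least one instance type is required")
    if len(types) > MAX_INSTANCE_TYPES:
        raise ValueError(
            f"at most {MAX_INSTANCE_TYPES} instance types are allowed, got {len(types)}"
        )
    annotations = {USE_SPECS_KEY: ",".join(types)}
    if driver_version is not None:
        annotations[DRIVER_KEY] = driver_version
    return annotations


print(build_eci_gpu_annotations(["ecs.gn6i-c4g1.xlarge"], driver_version="tesla=535"))
```

The returned dictionary can be merged into the annotations of the pod template metadata. Omitting driver_version leaves the driver annotation out, so the default driver for the instance type is used.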