This topic describes how to create an Elastic Container Instance (ECI) pod with a specific Elastic Compute Service (ECS) GPU-accelerated instance type and change the GPU driver version.
Supported instance families
For more information about ECS instance types, see the following topics:
Configurations
To specify a GPU-accelerated instance type, add the k8s.aliyun.com/eci-use-specs annotation to the pod's metadata. To request GPUs, add the nvidia.com/gpu field to the container's resources.limits.
The value of nvidia.com/gpu specifies the number of GPUs that the container requires. If it is not set, the pod fails to start. By default, multiple containers can share GPUs. Make sure that the number of GPUs allocated to any single container does not exceed the number of GPUs provided by the specified instance type.
Example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true"
      annotations:
        k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge,ecs.gn6i-c8g1.2xlarge" # Specify a maximum of five GPU-accelerated ECS instance types at a time.
    spec:
      containers:
      - name: nginx
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2
        resources:
          limits:
            nvidia.com/gpu: "1" # The number of GPUs required by the Nginx container. The GPUs are shared.
        ports:
        - containerPort: 80
      - name: busybox
        image: registry.cn-shanghai.aliyuncs.com/eci_open/busybox:1.30
        command: ["sleep"]
        args: ["999999"]
        resources:
          limits:
            nvidia.com/gpu: "1" # The number of GPUs required by the BusyBox container. The GPUs are shared.
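The constraints described above can be checked before you submit a manifest. The following Python sketch inspects a pod template (the spec.template part of a Deployment, loaded as a dict) and reports a missing k8s.aliyun.com/eci-use-specs annotation, more than five instance types, a container without nvidia.com/gpu, or a container that requests more GPUs than an instance type provides. The function check_gpu_pod and the GPU-count map passed to it are hypothetical: ECI does not provide them, and the GPU counts used below are assumed values that you should confirm against the ECS instance type documentation.

```python
from typing import Dict, List

# Limit taken from the example manifest: at most five instance types per pod.
MAX_SPECS = 5
SPECS_KEY = "k8s.aliyun.com/eci-use-specs"
GPU_KEY = "nvidia.com/gpu"


def check_gpu_pod(pod: dict, gpus_per_spec: Dict[str, int]) -> List[str]:
    """Return human-readable problems found in a pod template; empty means OK.

    gpus_per_spec is a hypothetical, caller-supplied map from ECS instance
    type to the number of GPUs that the instance type provides.
    """
    problems: List[str] = []
    annotations = pod.get("metadata", {}).get("annotations", {})
    specs = [s.strip() for s in annotations.get(SPECS_KEY, "").split(",") if s.strip()]

    if not specs:
        problems.append(f"missing {SPECS_KEY} annotation")
    elif len(specs) > MAX_SPECS:
        problems.append(f"{len(specs)} instance types listed; at most {MAX_SPECS} allowed")

    for spec in specs:
        if spec not in gpus_per_spec:
            problems.append(f"GPU count unknown for {spec}")

    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        limits = container.get("resources", {}).get("limits", {})
        if GPU_KEY not in limits:
            problems.append(f"container {name}: {GPU_KEY} is not set")
            continue
        requested = int(limits[GPU_KEY])
        # A single container must not request more GPUs than any listed type has.
        for spec in specs:
            available = gpus_per_spec.get(spec)
            if available is not None and requested > available:
                problems.append(
                    f"container {name}: requests {requested} GPUs, {spec} has {available}"
                )
    return problems


if __name__ == "__main__":
    # Pod template from the example above, expressed as a dict.
    pod = {
        "metadata": {"annotations": {SPECS_KEY: "ecs.gn6i-c4g1.xlarge,ecs.gn6i-c8g1.2xlarge"}},
        "spec": {"containers": [
            {"name": "nginx", "resources": {"limits": {GPU_KEY: "1"}}},
            {"name": "busybox", "resources": {"limits": {GPU_KEY: "1"}}},
        ]},
    }
    # Assumed GPU counts for illustration; confirm them in the ECS docs.
    gpus = {"ecs.gn6i-c4g1.xlarge": 1, "ecs.gn6i-c8g1.2xlarge": 1}
    print(check_gpu_pod(pod, gpus))  # []
```

Because the check runs against every listed instance type, a request that fits only some of them is still reported, which matches the rule that a container's GPU count must not exceed what the specified instance type provides.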
By default, ECI automatically installs a supported driver and CUDA version based on the specified GPU-accelerated instance type. In some cases, you may need to use different driver and CUDA versions for different workloads. To do this, add the k8s.aliyun.com/eci-gpu-driver-version annotation to specify a driver version. For example, if you specify the ecs.gn6i-c4g1.xlarge instance type, the default installation includes the Tesla 470 driver and CUDA 11.4. By adding the annotation k8s.aliyun.com/eci-gpu-driver-version: tesla=535, you can switch to the Tesla 535 driver and CUDA 12.2.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true"
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn6i-c4g1.xlarge # Specify a GPU-accelerated ECS instance type that supports changing the driver version.
        k8s.aliyun.com/eci-gpu-driver-version: tesla=535 # Specify the GPU driver version.
    spec:
      containers:
      - name: nginx
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2
        resources:
          limits:
            nvidia.com/gpu: "1" # The number of GPUs required by the container.
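If you generate manifests from code, both annotations can be set in one place. The following Python sketch builds a single-container Deployment with the same structure as the example above and prints it as JSON, a format that kubectl apply -f also accepts. It only accepts driver values in the tesla=<major version> form shown in this topic; whether other forms are supported is not covered here, so treat that restriction as an assumption. The KNOWN_CUDA table holds only the two driver and CUDA pairs that this topic lists for ecs.gn6i-c4g1.xlarge. The helper names gpu_deployment and parse_driver_version are hypothetical.

```python
import json
import re
from typing import Optional

# Driver -> CUDA pairs quoted in this topic for ecs.gn6i-c4g1.xlarge only.
# Other instance types may ship different pairs.
KNOWN_CUDA = {470: "11.4", 535: "12.2"}

# Assumption: only the "tesla=<major>" form shown in this topic is accepted.
_DRIVER_RE = re.compile(r"^tesla=(\d+)$")


def parse_driver_version(value: str) -> int:
    """Return the major driver version from a value such as 'tesla=535'."""
    match = _DRIVER_RE.match(value)
    if match is None:
        raise ValueError(f"unexpected driver version format: {value!r}")
    return int(match.group(1))


def gpu_deployment(name: str, spec: str, image: str,
                   driver: Optional[str] = None, gpus: int = 1) -> dict:
    """Build a hypothetical one-container ECI GPU Deployment as a dict."""
    annotations = {"k8s.aliyun.com/eci-use-specs": spec}
    if driver is not None:
        parse_driver_version(driver)  # Fail early on a malformed value.
        annotations["k8s.aliyun.com/eci-gpu-driver-version"] = driver
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {
                    "labels": {"app": name, "alibabacloud.com/eci": "true"},
                    "annotations": annotations,
                },
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                }]},
            },
        },
    }


if __name__ == "__main__":
    deployment = gpu_deployment(
        "nginx", "ecs.gn6i-c4g1.xlarge",
        "registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2",
        driver="tesla=535",
    )
    print(json.dumps(deployment, indent=2))
```

Omitting driver leaves the k8s.aliyun.com/eci-gpu-driver-version annotation out, so ECI falls back to the default driver for the instance type, as described above.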