To deploy GPU-accelerated workloads, such as AI inference or high-performance computing jobs, in Knative, specify GPU resources in the Knative Service configuration. You can also enable GPU sharing so that multiple pods share a single physical GPU, which reduces costs when workloads do not need dedicated GPU access.
Prerequisites
Before you begin, ensure that you have:
- Knative deployed in your cluster. For more information, see Deploy Knative.
Configure GPU resources
Add the `k8s.aliyun.com/eci-use-specs` annotation to `spec.template.metadata.annotations` to specify a GPU-accelerated Elastic Compute Service (ECS) instance type, and add the `nvidia.com/gpu` field to `spec.template.spec.containers.resources.limits` to specify the number of GPUs required. If you omit `nvidia.com/gpu`, the pod fails to start.
The following example configures a Knative Service with one GPU:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      labels:
        app: helloworld-go
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn5i-c4g1.xlarge  # GPU-accelerated ECS instance type
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        ports:
        - containerPort: 8080
        resources:
          limits:
            nvidia.com/gpu: '1'  # Number of GPUs required. Required field.
```
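Because a pod fails to start when `nvidia.com/gpu` is omitted, it can help to sanity-check a manifest before applying it. The following is a minimal sketch; the helper name `check_gpu_spec` is hypothetical and not part of any Knative or ACK tooling:

```python
# Sketch: verify that a Knative Service manifest (as a parsed dict) declares
# both the ECI instance-type annotation and the nvidia.com/gpu limit.
# The helper name and messages are illustrative, not a real API.

def check_gpu_spec(service: dict) -> list[str]:
    """Return a list of problems found in the Service manifest."""
    problems = []
    tmpl = service.get("spec", {}).get("template", {})
    annotations = tmpl.get("metadata", {}).get("annotations", {})
    if "k8s.aliyun.com/eci-use-specs" not in annotations:
        problems.append("missing k8s.aliyun.com/eci-use-specs annotation")
    containers = tmpl.get("spec", {}).get("containers", [])
    if not any("nvidia.com/gpu" in c.get("resources", {}).get("limits", {})
               for c in containers):
        # Without this limit the pod fails to start.
        problems.append("missing nvidia.com/gpu in resources.limits")
    return problems

service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "helloworld-go"},
    "spec": {"template": {
        "metadata": {"annotations": {
            "k8s.aliyun.com/eci-use-specs": "ecs.gn5i-c4g1.xlarge"}},
        "spec": {"containers": [{
            "image": "registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56",
            "resources": {"limits": {"nvidia.com/gpu": "1"}}}]},
    }},
}
print(check_gpu_spec(service))  # []
```

A manifest missing either field would return the corresponding problem strings instead of an empty list.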
Supported GPU instance families
| Instance family | GPU model | Example instance type |
|---|---|---|
| gn7i | NVIDIA A10 | ecs.gn7i-c8g1.2xlarge |
| gn7 | — | ecs.gn7-c12g1.3xlarge |
| gn6v | NVIDIA V100 | ecs.gn6v-c8g1.2xlarge |
| gn6e | NVIDIA V100 | ecs.gn6e-c12g1.3xlarge |
| gn6i | NVIDIA T4 | ecs.gn6i-c4g1.xlarge |
| gn5i | NVIDIA P4 | ecs.gn5i-c2g1.large |
| gn5 | NVIDIA P100 | ecs.gn5-c4g1.xlarge |
The supported GPU driver version is NVIDIA 460.73.01 and the supported CUDA Toolkit version is 11.2. The gn5 instance family is equipped with local disks. For information on mounting local disks to elastic container instances (ECIs), see Create an elastic container instance that has local disks attached. For the full list of available instance types by region, see ECS instance types available for each region and Overview of instance families.
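The family-to-GPU mapping above can be captured in a small lookup table, for example to confirm which GPU model a given instance type provides. The data is copied from the table; the helper itself is only an illustration (gn7 is omitted because the table does not list its GPU model):

```python
# GPU model per supported instance family, copied from the table above.
GPU_BY_FAMILY = {
    "gn7i": "NVIDIA A10",
    "gn6v": "NVIDIA V100",
    "gn6e": "NVIDIA V100",
    "gn6i": "NVIDIA T4",
    "gn5i": "NVIDIA P4",
    "gn5": "NVIDIA P100",
}

def family_of(instance_type: str) -> str:
    """Extract the family from a type like 'ecs.gn6i-c4g1.xlarge'."""
    return instance_type.split(".")[1].split("-")[0]

print(GPU_BY_FAMILY[family_of("ecs.gn6i-c4g1.xlarge")])  # NVIDIA T4
```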
Enable GPU sharing
GPU sharing lets multiple pods share a single physical GPU, reducing costs when workloads do not require dedicated GPU access.
To enable GPU sharing:
- Enable GPU sharing for nodes. For more information, see Enable GPU sharing.
- Add the `aliyun.com/gpu-mem` field to `spec.template.spec.containers.resources.limits` in your Knative Service to specify the GPU memory size:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: "100"
        autoscaling.knative.dev/minScale: "0"
    spec:
      containerConcurrency: 1
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/hz-suoxing-test/test:helloworld-go
        name: user-container
        ports:
        - containerPort: 6666
          name: http1
          protocol: TCP
        resources:
          limits:
            aliyun.com/gpu-mem: "3"  # Specify the GPU memory size.
```
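With GPU sharing, a rough capacity estimate is the card's total memory divided by each pod's `aliyun.com/gpu-mem` request. A sketch, assuming the request is expressed in GiB and using a 24 GiB card such as an NVIDIA A10 as an illustrative example:

```python
# Rough estimate: how many pods, each requesting `per_pod_gib` of GPU memory
# via aliyun.com/gpu-mem, fit on one physical GPU with `total_gib` of memory.
# Assumes gpu-mem is in GiB; the numbers below are illustrative only.

def pods_per_gpu(total_gib: int, per_pod_gib: int) -> int:
    return total_gib // per_pod_gib

# Pods requesting aliyun.com/gpu-mem: "3" on a 24 GiB card:
print(pods_per_gpu(24, 3))  # 8
```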
What's next
- Best practices for deploying AI inference services in Knative: deploy AI models as inference services, configure autoscaling, and allocate GPU resources flexibly.