
Container Service for Kubernetes:Configure GPU resources for a Knative Service and enable GPU sharing

Last Updated: Mar 26, 2026

To deploy GPU-accelerated workloads in Knative, such as AI inference and high-performance computing, specify GPU resources in the Knative Service configuration. You can also enable GPU sharing so that multiple pods share a single physical GPU, which reduces costs when workloads do not need dedicated GPU access.

Prerequisites

Before you begin, ensure that you have:

  • Knative deployed in your cluster. For more information, see Deploy Knative.

Configure GPU resources

Add the k8s.aliyun.com/eci-use-specs annotation to spec.template.metadata.annotations to specify a GPU-accelerated Elastic Compute Service (ECS) instance type. Add the nvidia.com/gpu field to spec.template.spec.containers.resources.limits to specify the number of GPUs required. If you omit nvidia.com/gpu, the pod fails to start.

The following example configures a Knative Service with one GPU:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      labels:
        app: helloworld-go
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn5i-c4g1.xlarge  # GPU-accelerated ECS instance type
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
          ports:
          - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: '1'    # Number of GPUs required. Required field.
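Because both the eci-use-specs annotation and the nvidia.com/gpu limit are easy to drop when templating Services, a pre-submit check can catch the failure mode described above before the pod is rejected. The following Python sketch is an illustration, not part of the product: it builds the example manifest as a plain dict, and validate_gpu_service is a hypothetical helper that verifies both fields before you apply the manifest with kubectl.

```python
# Sketch: validate a Knative Service manifest for the two GPU fields described above.
# validate_gpu_service is a hypothetical helper, not an Alibaba Cloud or Knative API.

def validate_gpu_service(manifest):
    """Return a list of problems; an empty list means both GPU fields are present."""
    problems = []
    template = manifest["spec"]["template"]
    annotations = template["metadata"].get("annotations", {})
    if "k8s.aliyun.com/eci-use-specs" not in annotations:
        problems.append("missing k8s.aliyun.com/eci-use-specs annotation")
    for container in template["spec"]["containers"]:
        limits = container.get("resources", {}).get("limits", {})
        if "nvidia.com/gpu" not in limits:
            problems.append("container missing nvidia.com/gpu limit")
    return problems

# The same manifest as the YAML example, expressed as a dict.
manifest = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "helloworld-go"},
    "spec": {
        "template": {
            "metadata": {
                "labels": {"app": "helloworld-go"},
                "annotations": {"k8s.aliyun.com/eci-use-specs": "ecs.gn5i-c4g1.xlarge"},
            },
            "spec": {
                "containers": [
                    {
                        "image": "registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56",
                        "ports": [{"containerPort": 8080}],
                        "resources": {"limits": {"nvidia.com/gpu": "1"}},
                    }
                ]
            },
        }
    },
}

print(validate_gpu_service(manifest))  # [] when both fields are set
```

After the check passes, apply the manifest as usual, for example with kubectl apply -f.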

Supported GPU instance families

| Instance family | GPU model   | Example instance type  |
|-----------------|-------------|------------------------|
| gn7i            | NVIDIA A10  | ecs.gn7i-c8g1.2xlarge  |
| gn7             | NVIDIA A100 | ecs.gn7-c12g1.3xlarge  |
| gn6v            | NVIDIA V100 | ecs.gn6v-c8g1.2xlarge  |
| gn6e            | NVIDIA V100 | ecs.gn6e-c12g1.3xlarge |
| gn6i            | NVIDIA T4   | ecs.gn6i-c4g1.xlarge   |
| gn5i            | NVIDIA P4   | ecs.gn5i-c2g1.large    |
| gn5             | NVIDIA P100 | ecs.gn5-c4g1.xlarge    |
The supported GPU driver version is NVIDIA 460.73.01, and the supported CUDA Toolkit version is 11.2.

The gn5 instance family is equipped with local disks. For information on mounting local disks to elastic container instances (ECIs), see Create an elastic container instance that has local disks attached. For the full list of available instance types by region, see ECS instance types available for each region and Overview of instance families.

Enable GPU sharing

GPU sharing lets multiple pods share a single physical GPU, reducing costs when workloads do not require dedicated GPU access.

To enable GPU sharing:

  1. Enable GPU sharing for nodes. For more information, see Enable GPU sharing.

  2. Add the aliyun.com/gpu-mem field to spec.template.spec.containers.resources.limits in your Knative Service to specify the GPU memory size in GiB:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: helloworld-go
      namespace: default
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/maxScale: "100"
            autoscaling.knative.dev/minScale: "0"
        spec:
          containerConcurrency: 1
          containers:
          - image: registry-vpc.cn-hangzhou.aliyuncs.com/hz-suoxing-test/test:helloworld-go
            name: user-container
            ports:
            - containerPort: 6666
              name: http1
              protocol: TCP
            resources:
              limits:
                aliyun.com/gpu-mem: "3"    # GPU memory size in GiB.

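To estimate how densely shared pods can pack onto one card, you can divide the physical GPU memory by the per-pod request. The sketch below assumes, for illustration only, that the shared-GPU scheduler places pods purely by their aliyun.com/gpu-mem request in whole GiB; pods_per_gpu is a hypothetical helper, and 16 GiB is used as an example capacity (the size of a T4, for instance).

```python
# Sketch: how many pods requesting `per_pod_gib` of GPU memory fit on one
# physical GPU, assuming allocation by whole GiB as with aliyun.com/gpu-mem.

def pods_per_gpu(gpu_total_gib: int, per_pod_gib: int) -> int:
    """Integer number of pods whose memory requests fit on one physical GPU."""
    if per_pod_gib <= 0:
        raise ValueError("per_pod_gib must be positive")
    return gpu_total_gib // per_pod_gib

# Example: pods requesting aliyun.com/gpu-mem: "3" on a 16 GiB GPU.
print(pods_per_gpu(16, 3))  # 5
```

In practice the scheduler, driver, and CUDA context each reserve some memory, so treat this as an upper bound rather than a guarantee.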
What's next