
Container Service for Kubernetes:Use GPU resources and enable GPU sharing in Knative

Last Updated: Mar 26, 2026

To run AI inference, high-performance computing, or other GPU workloads in Knative, configure your Knative Service to request GPU resources. You can assign a dedicated GPU to a service or enable GPU sharing so multiple pods split a single physical GPU.

Prerequisites

Before you begin, ensure that you have:

  • Knative deployed in your ACK cluster. For more information, see Deploy Knative.

Configure a dedicated GPU

Add two fields to your Knative Service manifest:

  • k8s.aliyun.com/eci-use-specs annotation in spec.template.metadata.annotations — specifies the GPU-accelerated ECS instance type.

  • nvidia.com/gpu resource limit in spec.template.spec.containers.resources.limits — specifies the number of GPUs the container requires. This field is required; if you omit it, the pod fails to start.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      labels:
        app: helloworld-go
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn5i-c4g1.xlarge  # GPU-accelerated ECS instance type
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
          ports:
            - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: '1'  # Number of GPUs required. Required field — omitting it causes the pod to fail at startup.
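Because omitting the nvidia.com/gpu limit causes the pod to fail at startup, it can help to sanity-check a manifest before applying it. The following standalone sketch models the manifest as a plain Python dict and checks for the two fields described above; the validate_gpu_service helper is illustrative only and is not part of any Alibaba Cloud or Knative tooling.

```python
# Sanity-check sketch: verify that a Knative Service manifest (as a dict)
# declares both the ECI instance-type annotation and the nvidia.com/gpu limit.
# This helper is hypothetical, not part of any SDK or CLI.

def validate_gpu_service(manifest: dict) -> list:
    """Return a list of problems; an empty list means both GPU fields are set."""
    problems = []
    tmpl = manifest.get("spec", {}).get("template", {})
    annotations = tmpl.get("metadata", {}).get("annotations", {})
    if "k8s.aliyun.com/eci-use-specs" not in annotations:
        problems.append("missing k8s.aliyun.com/eci-use-specs annotation")
    containers = tmpl.get("spec", {}).get("containers", [])
    for i, c in enumerate(containers):
        limits = c.get("resources", {}).get("limits", {})
        if "nvidia.com/gpu" not in limits:
            problems.append("container %d: missing nvidia.com/gpu limit" % i)
    return problems

# The manifest above, expressed as a dict:
manifest = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "helloworld-go"},
    "spec": {"template": {
        "metadata": {"annotations": {
            "k8s.aliyun.com/eci-use-specs": "ecs.gn5i-c4g1.xlarge"}},
        "spec": {"containers": [{
            "image": "registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56",
            "resources": {"limits": {"nvidia.com/gpu": "1"}},
        }]},
    }},
}

print(validate_gpu_service(manifest))  # prints [] — both fields are present
```

Running the same check against a manifest that lacks the nvidia.com/gpu limit returns a non-empty problem list, which mirrors the startup failure described above.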

Supported GPU instance families

Instance family | GPU chip    | Example instance type
gn7i            | NVIDIA A10  | ecs.gn7i-c8g1.2xlarge
gn7             | NVIDIA A100 | ecs.gn7-c12g1.3xlarge
gn6v            | NVIDIA V100 | ecs.gn6v-c8g1.2xlarge
gn6e            | NVIDIA V100 | ecs.gn6e-c12g1.3xlarge
gn6i            | NVIDIA T4   | ecs.gn6i-c4g1.xlarge
gn5i            | NVIDIA P4   | ecs.gn5i-c2g1.large
gn5             | NVIDIA P100 | ecs.gn5-c4g1.xlarge
The gn5 instance family includes local disks. To mount local disks to elastic container instances, see Create an elastic container instance that has local disks attached.

For the full list of GPU-accelerated ECS instance types available in your region, see ECS instance types available for each region. For general information about instance families, see Overview of instance families.

GPU-accelerated elastic container instances support NVIDIA GPU driver version 460.73.01 and CUDA Toolkit version 11.2.

Enable GPU sharing

GPU sharing lets multiple pods share a single physical GPU by dividing its memory. Use GPU sharing for workloads such as lightweight inference services or development environments.

  1. Enable GPU sharing on the nodes. For instructions, see Enable GPU sharing.

  2. In your Knative Service manifest, set aliyun.com/gpu-mem under spec.template.spec.containers.resources.limits to specify the GPU memory size (in GB) that each container receives.

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: helloworld-go
      namespace: default
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/maxScale: "100"  # Maximum number of pod replicas
            autoscaling.knative.dev/minScale: "0"    # Scale to zero when idle
        spec:
          containerConcurrency: 1  # Maximum concurrent requests per pod replica
          containers:
            - image: registry-vpc.cn-hangzhou.aliyuncs.com/hz-suoxing-test/test:helloworld-go
              name: user-container
              ports:
                - containerPort: 6666
                  name: http1
                  protocol: TCP
              resources:
                limits:
                  aliyun.com/gpu-mem: "3"  # GPU memory allocated to this container, in GB
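With memory-based sharing, the number of pods that can land on one physical GPU is the card's total memory divided by the per-pod aliyun.com/gpu-mem value. A back-of-the-envelope sketch follows; the 16 GB card size is an assumption for illustration (it matches a V100-class card), not a value taken from this document.

```python
# Back-of-the-envelope: how many pods fit on one shared GPU?
# card_mem_gb is the physical card's memory (16 GB assumed here);
# per_pod_gb is the aliyun.com/gpu-mem value each pod requests.

def pods_per_gpu(card_mem_gb: int, per_pod_gb: int) -> int:
    """Whole pods that fit when GPU memory is divided among pods."""
    return card_mem_gb // per_pod_gb

# With aliyun.com/gpu-mem: "3" on an assumed 16 GB card:
print(pods_per_gpu(16, 3))  # prints 5
```

So with the manifest above, up to five replicas could share one 16 GB card, and Knative's minScale/maxScale annotations then control how many replicas exist at any time.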

What's next