Container Service for Kubernetes: Specify GPU models and driver versions for ACS GPU-accelerated pods

Last Updated: Mar 26, 2026

Container Compute Service (ACS) lets you declare a GPU model and NVIDIA driver version directly in your pod spec. Use these settings when your workload requires a specific GPU hardware type or a driver version different from the cluster default. You do not need to manage the underlying infrastructure.

Prerequisites

Before you begin, ensure that you have:

  • An ACK cluster with ACS enabled

  • kubectl configured to connect to your cluster

  • The GPU model you need (to view supported GPU models, submit a ticket)

GPU models

ACS supports two ways to request GPU resources:

  • On demand: GPU resources are allocated when the pod is scheduled.

  • Capacity reservation: Resources are pre-reserved and automatically deducted from the reservation when the pod is created. For details, see Capacity reservations for GPU-accelerated pods.

Specify a GPU model

Set the alibabacloud.com/gpu-model-series label in the pod metadata to request a specific GPU model.

Compute class      Label                                Example value
GPU-accelerated    alibabacloud.com/gpu-model-series    T4

The following example shows how to set the label in a pod spec:

metadata:
  labels:
    # Set the compute class to gpu.
    alibabacloud.com/compute-class: "gpu"
    # Set the GPU model. Replace example-model with the actual GPU model, such as T4.
    alibabacloud.com/gpu-model-series: "example-model"

Driver versions

ACS lets you specify an NVIDIA driver version at the pod level.

How to choose a driver version

GPU-heavy applications typically rely on Compute Unified Device Architecture (CUDA), a parallel computing platform and programming model released by NVIDIA in 2007. The CUDA runtime version in your container is determined by the CUDA base image you build from. For example, if your image is built from nvidia/cuda:12.2.0-base-ubuntu20.04, the CUDA runtime version is 12.2.0. Choose a driver version that is compatible with that CUDA runtime version.
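As a minimal sketch of the rule above, the CUDA runtime version can be read off the image tag with plain shell parameter expansion. The function name cuda_version_from_image is hypothetical, and the sketch assumes the tag follows the version-flavor-os layout used in the example; an image reference without a tag is not handled.

```shell
# Derive the CUDA runtime version from a CUDA base image reference.
# ASSUMPTION: the tag looks like <version>-<flavor>-<os>, as in the example above.
cuda_version_from_image() {
  image_ref=$1
  # Drop everything up to the last colon, leaving only the tag.
  tag=${image_ref##*:}
  # Keep the text before the first hyphen, which is the CUDA version.
  printf '%s\n' "${tag%%-*}"
}

cuda_version_from_image "nvidia/cuda:12.2.0-base-ubuntu20.04"
```

The result is the version to look up in the CUDA compatibility tables when you pick a driver version.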

Note

The CUDA software stack includes two APIs:

  • Driver API (shipped in the NVIDIA driver package): the lower-level API, which exposes finer-grained control such as explicit context and module management.

  • Runtime API (shipped in the CUDA Toolkit package): a higher-level API built on top of the Driver API that initializes the driver implicitly. For driver and toolkit compatibility requirements, see CUDA Toolkit Release Notes.
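The compatibility requirement can be checked mechanically once you know the minimum driver version for your CUDA runtime. The following is a sketch only: version_ge is a hypothetical helper, it relies on sort -V from GNU coreutils, and the minimum value 525.60.13 is an assumed figure for the CUDA 12.x runtime that you should confirm in the CUDA Toolkit Release Notes.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), which GNU coreutils provides.
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# ASSUMPTION: minimum driver for the CUDA 12.x runtime.
# Confirm the real value in the CUDA Toolkit Release Notes.
min_driver="525.60.13"

# Candidate driver versions taken from this document.
for driver in 535.161.08 525.105.17; do
  if version_ge "$driver" "$min_driver"; then
    echo "$driver meets the minimum $min_driver"
  else
    echo "$driver is older than the minimum $min_driver"
  fi
done
```

Passing the minimum as a variable keeps the comparison logic separate from the compatibility data, which changes with each CUDA release.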

Supported driver versions

Make sure the driver version you specify is supported by ACS. The following table lists supported driver versions by GPU model.

GPU model        Supported driver versions
8th-gen GPU A    550.90.07 (default)
8th-gen GPU B    550.90.07 (default), 535.161.08
T4               535.161.08 (default), 525.105.17
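To catch an unsupported combination before you deploy, the table above can be encoded as a small lookup. This is an illustrative sketch: the function names are hypothetical, the model names are the row names from the table (not necessarily the values used in the gpu-model-series label), and the list must be updated by hand if ACS changes its supported versions.

```shell
# Supported driver versions, copied from the table above (default listed first).
supported_drivers() {
  case "$1" in
    "8th-gen GPU A") echo "550.90.07" ;;
    "8th-gen GPU B") echo "550.90.07 535.161.08" ;;
    "T4")            echo "535.161.08 525.105.17" ;;
    *)               return 1 ;;  # unknown GPU model
  esac
}

# Succeeds only when the driver version is listed for the given GPU model.
is_supported_driver() {
  model=$1
  wanted=$2
  versions=$(supported_drivers "$model") || return 1
  for candidate in $versions; do
    [ "$candidate" = "$wanted" ] && return 0
  done
  return 1
}

if is_supported_driver "T4" "535.161.08"; then
  echo "supported"
else
  echo "not supported"
fi
```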

Pod label vs cluster default

ACS provides two ways to configure driver versions. Choose based on your scope of control:

Method                                             Scope                     When to use
Pod label (alibabacloud.com/gpu-driver-version)    Single pod or workload    A specific workload needs a driver version different from the cluster default
acs-profile cluster default                        Entire cluster            Standardize the driver version across all pods in the cluster

To change the cluster-level default, see Use acs-profile to automatically inject pod configurations.

Specify a driver version

Set the alibabacloud.com/gpu-driver-version label in the pod metadata.

Compute class      Label                                  Example value
GPU-accelerated    alibabacloud.com/gpu-driver-version    535.161.08

The following example shows how to set both GPU model and driver version labels together:

metadata:
  labels:
    # Set the compute class to gpu.
    alibabacloud.com/compute-class: "gpu"
    # Set the GPU model. Replace example-model with the actual GPU model, such as T4.
    alibabacloud.com/gpu-model-series: "example-model"
    # Set the driver version to 535.161.08.
    alibabacloud.com/gpu-driver-version: "535.161.08"
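If you template manifests for several workloads, the label block can be generated rather than hand-edited. The sketch below is one possible approach: render_gpu_labels is a hypothetical helper that only prints the three labels shown above for a given GPU model and driver version, and it does not validate either value.

```shell
# Print the metadata label block for a GPU model ($1) and driver version ($2).
render_gpu_labels() {
  model=$1
  driver=$2
  cat <<EOF
metadata:
  labels:
    alibabacloud.com/compute-class: "gpu"
    alibabacloud.com/gpu-model-series: "${model}"
    alibabacloud.com/gpu-driver-version: "${driver}"
EOF
}

render_gpu_labels "T4" "535.161.08"
```

The values are quoted in the output so that YAML parsers read the driver version as a string.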

Deploy a GPU pod with a specific model and driver version

The following steps create a Deployment that requests a GPU pod with a specified model and driver version, then verify the GPU configuration.

  1. Create a file named acs-pod-with-model-and-driver.yaml with the following content:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: acs-pod-with-model-and-driver
      namespace: default
      labels:
        app: acs-pod-with-model-and-driver
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: acs-pod-with-model-and-driver
      template:
        metadata:
          name: acs-pod-with-model-and-driver
          labels:
            app: acs-pod-with-model-and-driver
            # Use ACS compute for this pod.
            alibabacloud.com/acs: "true"
            # Set the compute class to gpu.
            alibabacloud.com/compute-class: "gpu"
            # Set the GPU model. Replace example-model with the actual GPU model, such as T4.
            alibabacloud.com/gpu-model-series: "example-model"
            # Set the driver version to 535.161.08.
            alibabacloud.com/gpu-driver-version: "535.161.08"
        spec:
          containers:
          - image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
            name: tensorflow-mnist
            command:
            - sleep
            - infinity
            resources:
              requests:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: 1
              limits:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: 1
  2. Create the Deployment:

    kubectl apply -f acs-pod-with-model-and-driver.yaml
  3. Verify the pod is running:

    kubectl get pod

    The expected output is similar to:

    NAME                                             READY   STATUS    RESTARTS   AGE
    acs-pod-with-model-and-driver-7b89cbf4cf-2w66p   1/1     Running   0          6m26s
  4. Query the GPU information of the pod:

    Note

    /usr/bin/nvidia-smi is the path of the nvidia-smi command packaged in the sample container image. Replace the pod name in the command with the name returned in the previous step.

    Important

    The actual output varies. The values shown here are representative.

    kubectl exec -it acs-pod-with-model-and-driver-7b89cbf4cf-2w66p -- /usr/bin/nvidia-smi

    The expected output is similar to:

    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI xxx.xxx.xx             Driver Version: 535.161.08   CUDA Version: xx.x     |
    |-----------------------------------------+----------------------+----------------------+
    ...
    |=========================================+======================+======================|
    |   x  NVIDIA example-model           xx  | xxxxxxxx:xx:xx.x xxx |                    x |
    | xxx   xxx    xx              xxx / xxxx |      xxxx /       xxx|      x%      xxxxxxxx|
    |                                         |                      |                  xxx |
    +-----------------------------------------+----------------------+----------------------+

    The output confirms the GPU model is example-model and the driver version is 535.161.08.
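The check in step 4 can also be scripted instead of read by eye. The following sketch parses the driver version out of the nvidia-smi banner and compares it with the version requested in the pod label. The function name driver_version_from_smi is hypothetical, and the sample line is copied from the expected output above rather than from a live pod.

```shell
# Extract the driver version from nvidia-smi output read on standard input.
driver_version_from_smi() {
  sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Sample banner line copied from the expected output above.
sample='| NVIDIA-SMI xxx.xxx.xx             Driver Version: 535.161.08   CUDA Version: xx.x     |'
actual=$(printf '%s\n' "$sample" | driver_version_from_smi)

# The version requested in the alibabacloud.com/gpu-driver-version label.
expected="535.161.08"

if [ "$actual" = "$expected" ]; then
  echo "driver version matches: $actual"
else
  echo "driver version mismatch: got '$actual', expected '$expected'"
fi
```

Against a running pod, pipe the output of the kubectl exec command from step 4 into driver_version_from_smi in place of the sample line.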

What's next