Container Service for Kubernetes:Specify an NVIDIA driver version for nodes by adding a label

Last Updated: Nov 18, 2025

By default, different types and versions of Alibaba Cloud Container Service for Kubernetes (ACK) clusters install different NVIDIA GPU driver versions. If your Compute Unified Device Architecture (CUDA) toolkit requires a newer driver for compatibility, you can install a custom version on your GPU nodes. This topic explains how to use a node pool label to customize the NVIDIA GPU driver version on GPU nodes.

Important notes

  • ACK does not guarantee compatibility between the NVIDIA driver version and the CUDA toolkit version. You must verify their compatibility yourself.

  • For detailed driver version requirements for different NVIDIA GPU models, see the official NVIDIA documentation.

  • For custom operating system images that already have GPU components installed, such as the GPU driver or NVIDIA Container Runtime, ACK cannot guarantee that the pre-installed driver is compatible with other ACK GPU components, such as monitoring components.

  • This method applies the custom driver only to newly added or scaled-out nodes. Installation is triggered when a node is added to the cluster and does not affect existing nodes. To apply a new driver to an existing node, remove the node from the cluster and then add it back (see the sketch after this list).

  • For instance types gn7 and ebmgn7, driver versions 510.xxx and 515.xxx have compatibility issues. We recommend using driver versions earlier than 510 (for example, 470.xxx.xxxx) with GSP disabled, or driver versions 525.125.06 or later.

  • Elastic Compute Service (ECS) instances of the ebmgn7 or ebmgn7e instance types support only NVIDIA driver versions 460.32.03 or later.

  • If the driver version that you specify during node pool creation is not in the list of NVIDIA driver versions supported by ACK, ACK automatically installs the default driver version. A driver version that is incompatible with the latest operating system may cause node addition to fail; in that case, select the latest supported driver version.
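
If you need to move an existing node to the new driver, the following is a minimal sketch of the removal flow with kubectl. The node name is a placeholder taken from the example output later in this topic; replace it with the name of the node that you want to re-add.

    # Cordon the node and evict its workloads before removing it.
    kubectl cordon cn-qingdao.10.117.XXX.XX
    kubectl drain cn-qingdao.10.117.XXX.XX --ignore-daemonsets --delete-emptydir-data
    # Remove the node in the ACK console, then add it back (or scale the node
    # pool out again); the custom driver is installed when the node is added.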

Step 1: Select the NVIDIA GPU driver version

Select the required NVIDIA GPU driver version from the list of driver versions supported by ACK. This topic uses driver version 550.144.03 as an example.
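
Before you select a version, you may want to check which driver-version labels your existing GPU nodes already carry. The following is a minimal sketch; it assumes that ACK labels GPU nodes with aliyun.accelerator/nvidia_name, so adjust the selector if your cluster uses a different label.

    # List GPU nodes and show the driver-version label, if one is applied.
    kubectl get nodes -l aliyun.accelerator/nvidia_name -L ack.aliyun.com/nvidia-driver-version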

Step 2: Create a node pool and specify the driver version

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, find the cluster to manage and click its name. In the left navigation pane, choose Nodes > Node Pools.

  3. In the upper-left corner, click Create Node Pool. For details about the configuration parameters, see Create and manage a node pool. Configure the key parameters as follows:

    In the Node Labels section under Advanced Options, click the add (+) icon to add a label. In the Key field, enter ack.aliyun.com/nvidia-driver-version. In the Value field, enter 550.144.03.
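
After the node pool scales out, you can confirm that the label propagated to the new node. A quick check follows; the node name is the placeholder from the verification output in Step 3, so substitute your own.

    # Print the labels of the new node and filter for the driver-version label.
    kubectl get node cn-qingdao.10.117.XXX.XX --show-labels | tr ',' '\n' | grep nvidia-driver-version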

Step 3: Verify the custom NVIDIA driver installation

  1. Run the following command to list the pods that have the component=nvidia-device-plugin label:

    kubectl get po -n kube-system -l component=nvidia-device-plugin -o wide

    Expected output:

    NAME                             READY   STATUS    RESTARTS   AGE     IP              NODE                       NOMINATED NODE   READINESS GATES
    ack-nvidia-device-plugin-fnctc   1/1     Running   0          2m33s   10.117.227.43   cn-qingdao.10.117.XXX.XX   <none>           <none>

    The output shows a pod named ack-nvidia-device-plugin-fnctc running on the newly added node.

  2. Run the following command to verify that the node uses the expected driver version:

    kubectl exec -ti ack-nvidia-device-plugin-fnctc -n kube-system -- nvidia-smi

    Expected output:

    Mon Mar 24 08:51:55 2025       
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.6     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  Tesla P4                       On  |   00000000:00:07.0 Off |                    0 |
    | N/A   33C    P8              7W /   75W |       0MiB /   7680MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+
                                                                                             
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+

    The output shows Driver Version: 550.144.03, which confirms that the node pool label applied the custom NVIDIA driver.
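
Because the node pool label is applied to each node it provisions, you can also list every node that runs the custom driver by filtering on the label:

    kubectl get nodes -l ack.aliyun.com/nvidia-driver-version=550.144.03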

Alternative methods

You can also set the custom driver label in the node pool configuration when you call the CreateClusterNodePool API operation. The following example shows the tags section:

{
    // Other fields are not shown.
    "tags": [
        {
            "key": "ack.aliyun.com/nvidia-driver-version",
            "value": "550.144.03"
        }
    ],
    // Other fields are not shown.
}
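
If you script node pool creation, the same request can be issued through the Alibaba Cloud CLI in its raw API mode. The following is a minimal sketch, not a definitive invocation: the cluster ID and the nodepool.json request body are placeholders, and the full body schema should be taken from the CreateClusterNodePool API reference.

# <cluster_id> and nodepool.json are placeholders; nodepool.json must contain
# the full node pool configuration, including the tags section shown above.
aliyun cs POST /clusters/<cluster_id>/nodepools \
  --header "Content-Type=application/json" \
  --body "$(cat nodepool.json)"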