
Container Service for Kubernetes:Specify a custom GPU driver version for a node

Last Updated: Mar 26, 2026

ACK installs a default NVIDIA driver version for each cluster type. If that version is incompatible with your CUDA library, set a node pool label to pin a specific driver version on new GPU nodes.

Prerequisites

Before you begin, ensure that you have:

Limitations

Driver compatibility

ACK does not guarantee compatibility between driver versions and CUDA library versions. Verify compatibility yourself using the NVIDIA driver download page.

Applies to new nodes only

The label triggers driver installation when a node joins the cluster — either during initial creation or scale-out. Existing nodes are not affected. To update the driver on an existing node, remove the node and add it again.
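Before removing an existing node for re-add, drain it so workloads are evicted cleanly. A minimal sketch of the Kubernetes-side steps, assuming a placeholder node name; the actual removal and re-add are done in the node pool (console or API), not through kubectl:

```shell
# Drain a GPU node before removing it from the node pool so it can be
# re-added and pick up the driver-version label on rejoin. The node name
# is a placeholder; removal and re-add happen in the ACK console or API.
drain_node() {
  node="$1"
  kubectl cordon "$node"                                            # stop new pods from scheduling here
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data  # evict running pods
  echo "drained $node; remove it from the node pool and add it back"
}
```

After the node rejoins, the node pool label triggers installation of the pinned driver version.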

Custom OS images

For custom OS images that already include GPU components such as the GPU driver and NVIDIA Container Runtime, ACK cannot guarantee that the custom GPU driver is compatible with other ACK GPU components, such as monitoring components.

Unsupported driver versions fall back to the default

If the version you specify is not in ACK's supported list, ACK installs the default driver version instead. A driver version that is incompatible with the latest OS can also cause node addition failures — always use the latest supported version.
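Because an unsupported value silently falls back to the default, it helps to sanity-check the label value before creating the node pool. A minimal sketch that only validates the format; the exact version must still be confirmed against ACK's supported-versions list:

```shell
# Reject obviously malformed driver-version label values before use.
# This checks format only (two or three dot-separated numeric parts);
# confirm the version is on ACK's supported list separately.
version="550.144.03"
if echo "$version" | grep -Eq '^[0-9]+\.[0-9]+(\.[0-9]+)?$'; then
  echo "label value looks valid: ack.aliyun.com/nvidia-driver-version=$version"
else
  echo "malformed driver version: $version" >&2
  exit 1
fi
```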

Instance type compatibility

Instance type      Compatible driver versions
gn7, ebmgn7        Earlier than 510.xxx (e.g., 470.xxx.xxxx with GSP disabled), or 525.125.06 and later. Versions 510.xxx and 515.xxx are not compatible.
ebmgn7, ebmgn7e    460.32.03 and later

Step 1: Select a driver version

Open ACK's supported NVIDIA driver versions and pick a version that matches your CUDA library requirements. The following steps use 550.144.03 as an example.

Step 2: Create a node pool with the driver version label

  1. Log on to the Container Service Management Console. In the left navigation pane, click Clusters.

  2. Click the name of your cluster. In the left navigation pane, choose Nodes > Node Pools.

  3. In the upper-left corner, click Create Node Pool. For a full description of configuration options, see Create and manage a node pool.

  4. In the Node Labels section, click the add icon to add a label:

    • Key: ack.aliyun.com/nvidia-driver-version

    • Value: 550.144.03

  5. Complete the remaining configuration and submit the node pool.

Step 3: Verify the driver installation

After the node pool is created and a node has joined, verify that the expected driver version is installed.

  1. List the pods that carry the component=nvidia-device-plugin label to find the pod running on your new node:

    kubectl get po -n kube-system -l component=nvidia-device-plugin -o wide

    Expected output:

    NAME                             READY   STATUS    RESTARTS   AGE     IP              NODE                       NOMINATED NODE   READINESS GATES
    ack-nvidia-device-plugin-fnctc   1/1     Running   0          2m33s   10.117.227.43   cn-qingdao.10.117.XXX.XX   <none>           <none>

    Note the pod name in the NAME column that corresponds to your new node (for example, ack-nvidia-device-plugin-fnctc).

  2. Run nvidia-smi inside that pod to confirm the driver version:

    kubectl exec -ti ack-nvidia-device-plugin-fnctc -n kube-system -- nvidia-smi

    Expected output:

    Mon Mar 24 08:51:55 2025
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.6     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  Tesla P4                       On  |   00000000:00:07.0 Off |                    0 |
    | N/A   33C    P8              7W /   75W |       0MiB /   7680MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+

    The Driver Version: 550.144.03 line confirms that the custom driver is installed.
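This check can also be scripted for automated verification. A minimal sketch that parses the version out of nvidia-smi output read from stdin (the pod name in the usage example is the hypothetical one from step 1):

```shell
# Compare the driver version reported by nvidia-smi (read from stdin)
# against an expected version passed as the first argument.
check_driver_version() {
  expected="$1"
  actual=$(grep -o 'Driver Version: [0-9.]*' | awk '{print $3}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: driver $actual is installed"
  else
    echo "MISMATCH: expected $expected, got ${actual:-nothing}" >&2
    return 1
  fi
}

# Example usage (pod name from step 1):
# kubectl exec ack-nvidia-device-plugin-fnctc -n kube-system -- nvidia-smi \
#   | check_driver_version 550.144.03
```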

Set the driver version using the API

To set the driver version label when creating a node pool with the CreateClusterNodePool API, include it in the tags field of the request body:

{
  "tags": [
    {
      "key": "ack.aliyun.com/nvidia-driver-version",
      "value": "550.144.03"
    }
  ]
}
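The same fragment can be generated rather than hand-edited, so the driver version lives in a single variable. A minimal sketch; the remaining CreateClusterNodePool request fields are omitted and must be merged in:

```shell
# Emit the tags fragment of a CreateClusterNodePool request body with
# the driver version parameterized. Merge this into the full request.
DRIVER_VERSION="550.144.03"
cat <<EOF
{
  "tags": [
    {
      "key": "ack.aliyun.com/nvidia-driver-version",
      "value": "${DRIVER_VERSION}"
    }
  ]
}
EOF
```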