ACK installs a default NVIDIA driver version for each cluster type. If that version is incompatible with your CUDA library, set a node pool label to pin a specific driver version on new GPU nodes.
Prerequisites
Before you begin, ensure that you have:
- An ACK cluster with GPU-capable instance types available
- The driver version you need, selected from ACK's supported NVIDIA driver versions
Limitations
Driver compatibility
ACK does not guarantee compatibility between driver versions and CUDA library versions. Verify compatibility yourself using the NVIDIA driver download page.
Applies to new nodes only
The label triggers driver installation only when a node joins the cluster, either during initial node pool creation or during scale-out. Existing nodes are not affected. To update the driver on an existing node, remove the node from the node pool and add it back.
Custom OS images
If a custom OS image already includes GPU components such as the GPU driver and the NVIDIA Container Runtime, ACK cannot guarantee that the bundled driver is compatible with other ACK GPU components, such as the monitoring components.
Unsupported driver versions fall back to the default
If the version you specify is not in ACK's supported list, ACK installs the default driver version instead. A driver version that is incompatible with the latest OS image can also cause node addition failures, so always use the latest supported driver version.
Instance type compatibility
| Instance type | Compatible driver versions |
|---|---|
| gn7, ebmgn7 | Earlier than 510.xxx (e.g., 470.xxx.xxxx with GSP disabled), or 525.125.06 and later. Versions 510.xxx and 515.xxx are not compatible. |
| ebmgn7, ebmgn7e | 460.32.03 and later |
Step 1: Select a driver version
Open ACK's supported NVIDIA driver versions and pick a version that matches your CUDA library requirements. The following steps use 550.144.03 as an example.
Step 2: Create a node pool with the driver version label
- Log on to the Container Service Management Console. In the left navigation pane, click Clusters.
- Click the name of your cluster. In the left navigation pane, choose Nodes > Node Pools.
- In the upper-left corner, click Create Node Pool. For a full description of configuration options, see Create and manage a node pool.
- In the Node Labels section, click the add icon and enter the label:
  - Key: `ack.aliyun.com/nvidia-driver-version`
  - Value: `550.144.03`
- Complete the remaining configuration and submit the node pool.
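Before submitting, it is worth confirming that the value you entered is a legal Kubernetes label value, since an invalid value is rejected when the node pool is created. A minimal local check in plain shell (no cluster connection needed; the variable name is illustrative):

```shell
# Sanity-check the driver-version string as a Kubernetes label value:
# at most 63 characters, alphanumerics plus '-', '_', '.', and it must
# begin and end with an alphanumeric character.
DRIVER_VERSION="550.144.03"
if echo "$DRIVER_VERSION" | grep -Eq '^[A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?$'; then
  echo "label value OK"
else
  echo "invalid label value" >&2
fi
```

Once a node from the pool has joined and your kubeconfig points at the cluster, `kubectl get nodes -l ack.aliyun.com/nvidia-driver-version=550.144.03` lists the nodes carrying the pinned version.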
Step 3: Verify the driver installation
After the node pool is created and a node has joined, verify that the expected driver version is installed.
- List the pods with the `component: nvidia-device-plugin` label to find the pod running on your new node:

  ```shell
  kubectl get po -n kube-system -l component=nvidia-device-plugin -o wide
  ```

  Expected output:

  ```
  NAME                             READY   STATUS    RESTARTS   AGE     IP              NODE                       NOMINATED NODE   READINESS GATES
  ack-nvidia-device-plugin-fnctc   1/1     Running   0          2m33s   10.117.227.43   cn-qingdao.10.117.XXX.XX   <none>           <none>
  ```

  Note the pod name in the NAME column that corresponds to your new node (for example, `ack-nvidia-device-plugin-fnctc`).

- Run `nvidia-smi` inside that pod to confirm the driver version:

  ```shell
  kubectl exec -ti ack-nvidia-device-plugin-fnctc -n kube-system -- nvidia-smi
  ```

  Expected output:

  ```
  Mon Mar 24 08:51:55 2025
  +-----------------------------------------------------------------------------------------+
  | NVIDIA-SMI 550.144.03              Driver Version: 550.144.03      CUDA Version: 12.6   |
  |-----------------------------------------+------------------------+----------------------+
  | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
  |                                         |                        |               MIG M. |
  |=========================================+========================+======================|
  |   0  Tesla P4                       On  |   00000000:00:07.0 Off |                    0 |
  | N/A   33C    P8              7W /  75W  |       0MiB /  7680MiB  |      0%      Default |
  |                                         |                        |                  N/A |
  +-----------------------------------------+------------------------+----------------------+

  +-----------------------------------------------------------------------------------------+
  | Processes:                                                                              |
  |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
  |        ID   ID                                                               Usage      |
  |=========================================================================================|
  |  No running processes found                                                             |
  +-----------------------------------------------------------------------------------------+
  ```

  The `Driver Version: 550.144.03` line confirms that the custom driver is installed.
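For automated checks, parsing the full nvidia-smi table is fragile; a small awk filter pulls just the version from the banner line. This is a sketch run against a captured sample line; on a live cluster you would pipe `kubectl exec <pod> -n kube-system -- nvidia-smi` into the same filter, using the pod name found in the previous step:

```shell
# Extract the driver version from the nvidia-smi banner line.
# SAMPLE is a line captured from the expected output above; replace the
# echo with the real kubectl exec pipeline on a live cluster.
SAMPLE='| NVIDIA-SMI 550.144.03              Driver Version: 550.144.03      CUDA Version: 12.6   |'
echo "$SAMPLE" | awk -F'Driver Version: ' '/Driver Version/ { split($2, a, " "); print a[1] }'
# prints: 550.144.03
```

Alternatively, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` emits the version directly in machine-readable form, with no parsing required.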
Set the driver version using the API
To set the driver version label when creating a node pool with the CreateClusterNodePool API, include it in the tags field of the request body:
```json
{
  "tags": [
    {
      "key": "ack.aliyun.com/nvidia-driver-version",
      "value": "550.144.03"
    }
  ]
}
```