NVIDIA has disclosed CVE-2021-1056, a vulnerability in NVIDIA GPU drivers that is related to device isolation. Elastic GPU Service (EGS) instances that are deployed in a Container Service for Kubernetes (ACK) cluster may also be exposed to this vulnerability. This topic describes the background, impact, and fixes for this vulnerability.
Background information
CVE-2021-1056 is a vulnerability in NVIDIA GPU drivers that is related to device isolation. It allows an attacker to gain access to all GPU devices on a node by creating character device files in a non-privileged container that runs on the node.
For more information about this vulnerability, see CVE-2021-1056.
Affected versions
- If you selected a custom NVIDIA driver or updated an NVIDIA driver, check whether the NVIDIA driver that you installed is affected by this vulnerability based on the preceding figure.
- If the NVIDIA driver was installed by default for the ACK cluster, check whether the ACK cluster is affected by this vulnerability. The following ACK versions are affected:
  - ACK 1.16.9-aliyun.1. NVIDIA driver version 418.87.01 is installed by default.
  - ACK 1.18.8-aliyun.1. NVIDIA driver version 418.87.01 is installed by default.
Verify the version of the NVIDIA driver on a GPU-accelerated node
Run the following command on the GPU-accelerated node to query the version of the NVIDIA driver:
nvidia-smi
Expected output:
Fri Apr 16 10:58:19 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:07.0 Off |                    0 |
| N/A   34C    P0    37W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The output indicates that the version of the NVIDIA driver is 418.87.01.
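When you check many nodes, it can help to extract only the version string instead of reading the full table. The following is a minimal sketch, not an official procedure. The function name driver_version_from_banner is hypothetical, and the pattern assumes that the banner line contains the text "Driver Version:" as in the preceding output.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print only the
# driver version taken from the "Driver Version:" field of the banner line.
driver_version_from_banner() {
  sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p' | head -n 1
}

# Run the helper only if nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi | driver_version_from_banner
fi
```

As an alternative, nvidia-smi can print the version directly if you run it with the --query-gpu=driver_version --format=csv,noheader options. It prints one line per GPU.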
Fixes
Upgrade the NVIDIA driver to the fixed version for its branch, based on the preceding figure:
- If your NVIDIA driver belongs to the R390 branch, upgrade it to version 390.141.
- If your NVIDIA driver belongs to the R418 branch, upgrade it to version 418.181.07.
- If your NVIDIA driver belongs to the R450 branch, upgrade it to version 450.102.04.
- If your NVIDIA driver belongs to the R460 branch, upgrade it to version 460.32.03.
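The branch rules above can be sketched as a small check that tells you whether an installed driver version is older than the fixed version of its branch. This is an illustrative sketch only. The function names fixed_version_for and needs_upgrade are hypothetical, the mapping covers only the four branches listed above, and the comparison assumes GNU sort with the -V (version sort) option.

```shell
# Print the fixed driver version for the branch of the given version.
# Returns 1 if the branch is not one of the four branches listed in this topic.
fixed_version_for() {
  case "${1%%.*}" in
    390) echo "390.141" ;;
    418) echo "418.181.07" ;;
    450) echo "450.102.04" ;;
    460) echo "460.32.03" ;;
    *)   return 1 ;;
  esac
}

# Exit status: 0 = installed version is older than the fixed version,
#              1 = installed version is the fixed version or later,
#              2 = branch is not covered by this sketch.
needs_upgrade() {
  fixed="$(fixed_version_for "$1")" || return 2
  if [ "$1" = "$fixed" ]; then
    return 1
  fi
  # sort -V orders version strings; the installed version is older
  # if it sorts before the fixed version.
  [ "$(printf '%s\n%s\n' "$1" "$fixed" | sort -V | head -n 1)" = "$1" ]
}
```

For example, needs_upgrade 418.87.01 succeeds because 418.87.01 is older than 418.181.07. A version from a branch that is not listed above returns status 2, and you must check it against the NVIDIA security bulletin instead.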
For more information about how to upgrade the NVIDIA driver, see Use a node pool to upgrade the NVIDIA driver for a node, Manually upgrade the NVIDIA driver for a node, and Use a node pool to create a node with a custom NVIDIA driver version.