For high-performance computing or graphics acceleration in workloads such as deep learning, AI, OpenGL, Direct3D, and cloud gaming, a GPU must have the Tesla driver installed to deliver its full performance and provide smooth graphics rendering. If you did not install the Tesla driver when you created your GPU-accelerated compute-optimized instance that runs Linux, you must manually install it afterward. This topic describes how to manually install the Tesla driver on a GPU-accelerated compute-optimized instance that runs Linux.
Procedure
This topic applies to all GPU-accelerated compute-optimized instances that run Linux. For more information, see GPU-accelerated compute-optimized instances (gn/ebm/scc series). You can install only a Tesla driver that is compatible with the operating system of the instance. For example, a GPU instance that runs Linux supports only the Tesla driver for Linux.
Step 1: Download the NVIDIA Tesla driver
Go to the NVIDIA driver download page.
Note: For more information about how to install and configure NVIDIA drivers, see the NVIDIA Driver Installation Quickstart Guide.
Set the search criteria and click Search.
The following table describes the search criteria.
| Criterion | Description | Example |
| --- | --- | --- |
| Product type, Product series, Product family | Select the product type, product series, and product family based on the GPU in your instance. | Product type: Data Center / Tesla; Product series: A-Series; Product family: NVIDIA A10 |
| Operating system | Select the Linux operating system version based on the image used by your instance. | Linux 64-bit |
| CUDA Toolkit | Select a CUDA Toolkit version. | 11.4 |
| Language | Select a language for the driver. | Chinese (Simplified) |
Note: To view the details of a GPU instance, such as the instance ID, instance type, and operating system, see View instance information.
On the search results page, click Beta, Older Drivers, and More.
Find the driver that you want to download and click View.
For example, select Data Center Driver for Linux x64 with driver version 470.161.03 and CUDA Toolkit version 11.4.
On the driver details page, right-click Download and select Copy Link Address.
Connect to the Linux GPU instance.
For more information, see Connect to a Linux instance by using a password or key.
Run the following command to download the driver installation package.
The driver download URL in the example command is the link that you copied from the driver details page.
wget https://us.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run
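The example URL contains the driver version twice, so it can be assembled from a single value. The following is a minimal sketch, assuming that the URL pattern of the example command also holds for the version you selected. This is not guaranteed, and the link copied from the driver details page remains authoritative. The function name tesla_driver_url is hypothetical.

```shell
#!/bin/sh
# Build a download URL from a driver version, following the pattern of the
# example wget command. Assumption: other versions use the same URL pattern.
tesla_driver_url() {
    version="$1"
    printf 'https://us.download.nvidia.com/tesla/%s/NVIDIA-Linux-x86_64-%s.run\n' \
        "$version" "$version"
}

# Example usage on the instance (uncomment to download):
# wget "$(tesla_driver_url 470.161.03)"
```

Keeping the version in one place makes it harder to end up with a URL whose directory and file name refer to different versions.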
Step 2: Install the NVIDIA Tesla driver
The installation method for the Tesla driver depends on the operating system.
CentOS
Run the following command to check whether the kernel-devel and kernel-headers packages are installed.
sudo rpm -qa | grep $(uname -r)
If the output includes version information for the kernel-devel and kernel-headers packages, they are already installed.
kernel-3.10.0-1062.18.1.el7.x86_64
kernel-devel-3.10.0-1062.18.1.el7.x86_64
kernel-headers-3.10.0-1062.18.1.el7.x86_64
If you do not find kernel-devel-* and kernel-headers-* in the output, download and install the matching kernel-devel and kernel-headers packages for your kernel version.
Important: If the kernel-devel version does not match the kernel version, the driver compilation fails during the driver RPM installation. Therefore, you must confirm the version number of kernel-* in the output, and then download the matching kernel-devel version. In the example output, the kernel version is 3.10.0-1062.18.1.el7.x86_64.
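The version check described above can also be scripted. The following is a minimal sketch, assuming that package names in the rpm output have the form kernel-devel-<kernel release> and kernel-headers-<kernel release>, as in the example output. The function name has_matching_kernel_pkgs is hypothetical.

```shell
#!/bin/sh
# Succeed only if kernel-devel and kernel-headers packages exist for the
# given kernel release.
#   $1: package list, one package per line (for example, output of `rpm -qa`)
#   $2: kernel release (for example, output of `uname -r`)
has_matching_kernel_pkgs() {
    for name in kernel-devel kernel-headers; do
        # -x matches whole lines only, -F treats the name as a fixed string.
        if ! printf '%s\n' "$1" | grep -qxF "${name}-$2"; then
            echo "missing: ${name}-$2"   # reports the first missing package
            return 1
        fi
    done
    echo "ok"
}

# Example usage on the instance:
# has_matching_kernel_pkgs "$(rpm -qa)" "$(uname -r)"
```

Matching whole lines avoids a false positive when a kernel-devel package for a different kernel release is installed.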
Grant permissions and install the Tesla driver.
For Linux 64-bit operating systems, we recommend using the Tesla driver in the .run format, such as NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant permissions and install the Tesla driver.
Note: If you are using a Tesla driver in another format, such as .deb or .rpm, see the NVIDIA CUDA Installation Guide for Linux for installation instructions.
sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
sudo sh NVIDIA-Linux-x86_64-xxxx.run
Run the following command to verify the installation.
nvidia-smi
If the output resembles the following, the Tesla driver is installed.
[ecs-use xxx 9sgg1tZ ~]$ nvidia-smi
Tue Sep 10 13:58:31 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          Off  | 00000000:00:07.0 Off |                    0 |
|  0%   34C    P0    62W / 150W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(Optional) Enable Persistence Mode by using the NVIDIA Persistence Daemon.
After the Tesla driver is installed, Persistence Mode is disabled (off) by default. The Tesla driver runs more stably when Persistence Mode is enabled. To ensure service stability, we recommend that you enable Persistence Mode by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon.
Note: Persistence Mode is a user-configurable driver property that keeps the target GPU initialized even when no clients are connected.
Enabling Persistence Mode by using nvidia-smi -pm 1 causes issues, such as the setting being lost after the instance is restarted. For more information, see Persistence Mode fails to persist, and ECC status or MIG feature settings also fail after a GPU instance is restarted.
Run the following command to start the NVIDIA Persistence Daemon.
sudo nvidia-persistenced --user username # Replace username with your username.
Run the following command to check the status of Persistence Mode.
nvidia-smi
The output shows that Persistence-M is in the enabled (On) state.
[ecs-usexxx2q9sgg1tZ ~]$ sudo nvidia-persistenced --user ecs-user
[ecs-usexxx2q9sgg1tZ ~]$ nvidia-smi
Tue Sep 10 14:02:16 2024
+-------------------------------+----------------------+----------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:00:07.0 Off |                    0 |
|  0%   33C    P8     8W / 150W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
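Instead of reading the Persistence-M column by eye, the state can be checked in a script. The following is a minimal sketch that relies on the persistence_mode field of nvidia-smi --query-gpu, which prints Enabled or Disabled once per GPU. The function name all_gpus_persistent is hypothetical, and the function only inspects text that is passed to it.

```shell
#!/bin/sh
# Succeed only if every GPU reports Persistence Mode as "Enabled".
#   $1: output of
#       nvidia-smi --query-gpu=persistence_mode --format=csv,noheader
#       (one line per GPU)
all_gpus_persistent() {
    # grep -v -x selects every line that is not exactly "Enabled". If such a
    # line exists (including an empty input), at least one GPU is not enabled.
    if printf '%s\n' "$1" | grep -qvx 'Enabled'; then
        return 1
    fi
    return 0
}

# Example usage on the instance:
# modes="$(nvidia-smi --query-gpu=persistence_mode --format=csv,noheader)"
# all_gpus_persistent "$modes" && echo "Persistence Mode is on for all GPUs"
```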
(Optional) Configure Persistence Mode to remain enabled after a system reboot.
If the system reboots, the enabled (on) state of Persistence Mode is lost. You can perform the following operations to have Persistence Mode re-enabled automatically.
The Tesla driver installation package places NVIDIA's scripts, such as example scripts and installer scripts, in /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2.
Run the following commands to decompress and install the NVIDIA scripts.
cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
sudo tar xf nvidia-persistenced-init.tar.bz2
cd nvidia-persistenced-init
sudo sh install.sh
Run the following command to check if the NVIDIA Persistence Daemon is running.
sudo systemctl status nvidia-persistenced
If the output resembles the following, the NVIDIA Persistence Daemon is running.
[ecs-user@xxx nvidia-persistenced-init]$ sudo systemctl status nvidia-persistenced
● nvidia-persistenced.service - NVIDIA Persistence Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2024-09-10 14:13:20 CST; 40s ago
  Process: 13882 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced (code=exited, status=0/SUCCESS)
 Main PID: 13883 (nvidia-persiste)
    Tasks: 1 (limit: 383833)
   Memory: 196.0K
   CGroup: /system.slice/nvidia-persistenced.service
           └─13883 /usr/bin/nvidia-persistenced --user nvidia-persistenced

Sep 10 14:13:19 iZbp13orbqqx6m2q9sgg1tZ systemd[1]: Starting NVIDIA Persistence Daemon...
Sep 10 14:13:19 iZbp13orbqqx6m2q9sgg1tZ nvidia-persistenced[13883]: Started (13883)
Sep 10 14:13:20 iZbp13orbqqx6m2q9sgg1tZ systemd[1]: Started NVIDIA Persistence Daemon.
Note: You can adapt the NVIDIA Persistence Daemon installation script for your operating system to ensure it works correctly.
Run the following command to verify that Persistence Mode is set to on.
nvidia-smi
(Optional) Run the following commands to stop the NVIDIA Persistence Daemon.
You can disable the NVIDIA Persistence Daemon if it is no longer needed.
sudo systemctl stop nvidia-persistenced
sudo systemctl disable nvidia-persistenced
(Conditionally required) If your instance belongs to the ebmgn8v, ebmgn7, or ebmgn7e instance family, install the nvidia-fabricmanager service that matches your driver version.
Important: If the instance belongs to the ebmgn8v, ebmgn7, or ebmgn7e instance family, you cannot use the GPU instance unless the nvidia-fabricmanager service that matches the driver version is installed.
If your GPU instance does not belong to the ebmgn8v, ebmgn7, or ebmgn7e instance family, skip this step.
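The decision whether this step applies can be sketched as a small check on the instance type. The following is a minimal sketch, assuming that instance type names have the form ecs.<family>.<size> (for example, ecs.ebmgn7e.32xlarge). That naming format and the function name needs_fabricmanager are assumptions for illustration; the three family names come from the step above.

```shell
#!/bin/sh
# Succeed if the instance type belongs to a family that requires the
# nvidia-fabricmanager service (ebmgn8v, ebmgn7, or ebmgn7e).
#   $1: instance type, assumed to look like ecs.<family>.<size>
needs_fabricmanager() {
    case "$1" in
        # The dot after the family name prevents other families that merely
        # start with the same letters from matching.
        ecs.ebmgn8v.*|ecs.ebmgn7.*|ecs.ebmgn7e.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage:
# needs_fabricmanager "ecs.ebmgn7e.32xlarge" && echo "install nvidia-fabricmanager"
```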
Install the nvidia-fabricmanager service.
You can install the nvidia-fabricmanager service from the source code or an installation package. The following sample commands use CentOS 7.x and CentOS 8.x as the operating systems and driver version 460.91.03 as an example. In the commands, replace driver_version with the version number of the driver that you downloaded in Step 1: Download the NVIDIA Tesla driver.
Source code
Installation package
Run the following commands to start the nvidia-fabricmanager service.
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
Run the following command to check the status of the nvidia-fabricmanager service.
systemctl status nvidia-fabricmanager
If the following output is returned, the nvidia-fabricmanager service is running.
nvidia-fabricmanager.service - NVIDIA fabric manager service
   Loaded: loaded (/lib/systemd/system/nvidia-fabricmanager.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-09-13 19:14:45 CST; 1 weeks 1 days ago
  Process: 1928 ExecStart=/usr/bin/nv-fabricmanager -c /usr/share/nvidia/nvswitch/fabricmanager.cfg (code=exited, status=0/SUCCESS)
 Main PID: 2140 (nv-fabricmanage)
    Tasks: 18 (limit: 19660)
   CGroup: /system.slice/nvidia-fabricmanager.service
           └─2140 /usr/bin/nv-fabricmanager -c /usr/share/nvidia/nvswitch/fabricmanager.cfg

Sep 13 19:14:26 xxx systemd[1]: Starting NVIDIA fabric manager service...
Sep 13 19:14:45 xxx nv-fabricmanager[2140]: Successfully configured all the available GPUs and NVSwitches.
Sep 13 19:14:45 xxx systemd[1]: Started NVIDIA fabric manager service.
Ubuntu and others
Grant permissions and install the Tesla driver.
For Linux 64-bit operating systems, we recommend using the Tesla driver in the .run format, such as NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant permissions and install the Tesla driver.
Note: If you are using a Tesla driver in another format, such as .deb or .rpm, see the NVIDIA CUDA Installation Guide for Linux for installation instructions.
sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
sudo sh NVIDIA-Linux-x86_64-xxxx.run
Run the following command to verify the installation.
nvidia-smi
If the output resembles the following, the Tesla driver is installed.
[ecs-use xxx 9sgg1tZ ~]$ nvidia-smi
Tue Sep 10 13:58:31 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          Off  | 00000000:00:07.0 Off |                    0 |
|  0%   34C    P0    62W / 150W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(Optional) Enable Persistence Mode by using the NVIDIA Persistence Daemon.
After the Tesla driver is installed, Persistence Mode is disabled (off) by default. The Tesla driver runs more stably when Persistence Mode is enabled. To ensure service stability, we recommend that you enable Persistence Mode by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon.
Note: Persistence Mode is a user-configurable driver property that keeps the target GPU initialized even when no clients are connected.
Enabling Persistence Mode by using nvidia-smi -pm 1 causes issues, such as the setting being lost after the instance is restarted. For more information, see Persistence Mode fails to persist, and ECC status or MIG feature settings also fail after a GPU instance is restarted.
Run the following command to start the NVIDIA Persistence Daemon.
sudo nvidia-persistenced --user username # Replace username with your username.
Run the following command to check the status of Persistence Mode.
nvidia-smi
The output shows that Persistence-M is in the enabled (On) state.
[ecs-usexxx2q9sgg1tZ ~]$ sudo nvidia-persistenced --user ecs-user
[ecs-usexxx2q9sgg1tZ ~]$ nvidia-smi
Tue Sep 10 14:02:16 2024
+-------------------------------+----------------------+----------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:00:07.0 Off |                    0 |
|  0%   33C    P8     8W / 150W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(Optional) Configure Persistence Mode to remain enabled after a system reboot.
If the system reboots, the enabled (on) state of Persistence Mode is lost. You can perform the following operations to have Persistence Mode re-enabled automatically.
The Tesla driver installation package places NVIDIA's scripts, such as example scripts and installer scripts, in /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2.
Run the following commands to decompress and install the NVIDIA scripts.
cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
sudo tar xf nvidia-persistenced-init.tar.bz2
cd nvidia-persistenced-init
sudo sh install.sh
Run the following command to check if the NVIDIA Persistence Daemon is running.
sudo systemctl status nvidia-persistenced
If the output resembles the following, the NVIDIA Persistence Daemon is running.
[ecs-user@xxx nvidia-persistenced-init]$ sudo systemctl status nvidia-persistenced
● nvidia-persistenced.service - NVIDIA Persistence Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2024-09-10 14:13:20 CST; 40s ago
  Process: 13882 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced (code=exited, status=0/SUCCESS)
 Main PID: 13883 (nvidia-persiste)
    Tasks: 1 (limit: 383833)
   Memory: 196.0K
   CGroup: /system.slice/nvidia-persistenced.service
           └─13883 /usr/bin/nvidia-persistenced --user nvidia-persistenced

Sep 10 14:13:19 iZbp13orbqqx6m2q9sgg1tZ systemd[1]: Starting NVIDIA Persistence Daemon...
Sep 10 14:13:19 iZbp13orbqqx6m2q9sgg1tZ nvidia-persistenced[13883]: Started (13883)
Sep 10 14:13:20 iZbp13orbqqx6m2q9sgg1tZ systemd[1]: Started NVIDIA Persistence Daemon.
Note: You can adapt the NVIDIA Persistence Daemon installation script for your operating system to ensure it works correctly.
Run the following command to verify that Persistence Mode is set to on.
nvidia-smi
(Optional) Run the following commands to stop the NVIDIA Persistence Daemon.
You can disable the NVIDIA Persistence Daemon if it is no longer needed.
sudo systemctl stop nvidia-persistenced
sudo systemctl disable nvidia-persistenced
(Conditionally required) If your instance belongs to the ebmgn8v, ebmgn7, or ebmgn7e instance family, install the nvidia-fabricmanager service that matches your driver version.
Important: If the instance belongs to the ebmgn8v, ebmgn7, or ebmgn7e instance family, you cannot use the GPU instance unless the nvidia-fabricmanager service that matches the driver version is installed.
If your GPU instance does not belong to the ebmgn8v, ebmgn7, or ebmgn7e instance family, skip this step.
Install the nvidia-fabricmanager service.
You can install the nvidia-fabricmanager service from source code or an installation package. The following command examples are for the Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, or Ubuntu 24.04 operating system. In the commands, replace driver_version with the version of the driver that you downloaded in Step 1: Download the NVIDIA Tesla driver.
Important: On Ubuntu 22.04, the nvidia-fabricmanager service requires a Tesla driver version later than 515.48.07. The following example for Ubuntu 22.04 uses driver version 535.154.05.
On Ubuntu 24.04, the nvidia-fabricmanager service requires a Tesla driver version later than 550.90.07. The following example for Ubuntu 24.04 uses driver version 570.133.20.
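The minimum driver versions stated for Ubuntu 22.04 and Ubuntu 24.04 can be checked before installation. The following is a minimal sketch that compares versions with sort -V (version sort from GNU coreutils). The function names version_gt and meets_fabricmanager_minimum are hypothetical, and the sketch treats other Ubuntu releases as having no stated minimum.

```shell
#!/bin/sh
# Succeed if version $1 is strictly later than version $2.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Succeed if the driver version satisfies the minimum stated for the release.
#   $1: Ubuntu release (for example, 22.04)
#   $2: Tesla driver version (for example, 535.154.05)
meets_fabricmanager_minimum() {
    case "$1" in
        22.04) version_gt "$2" 515.48.07 ;;
        24.04) version_gt "$2" 550.90.07 ;;
        *) return 0 ;;   # no minimum is stated for other releases
    esac
}

# Example usage:
# meets_fabricmanager_minimum 22.04 535.154.05 && echo "driver version is new enough"
```

A plain string comparison would not order versions such as 515.48.07 and 515.105.01 correctly, which is why the sketch uses version sort.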
Source code
Installation package
Run the following commands to start the nvidia-fabricmanager service.
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
Run the following command to check the status of the nvidia-fabricmanager service.
systemctl status nvidia-fabricmanager
If the following output is returned, the nvidia-fabricmanager service is running.
nvidia-fabricmanager.service - NVIDIA fabric manager service
   Loaded: loaded (/lib/systemd/system/nvidia-fabricmanager.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-09-13 19:14:45 CST; 1 weeks 1 days ago
  Process: 1928 ExecStart=/usr/bin/nv-fabricmanager -c /usr/share/nvidia/nvswitch/fabricmanager.cfg (code=exited, status=0/SUCCESS)
 Main PID: 2140 (nv-fabricmanage)
    Tasks: 18 (limit: 19660)
   CGroup: /system.slice/nvidia-fabricmanager.service
           └─2140 /usr/bin/nv-fabricmanager -c /usr/share/nvidia/nvswitch/fabricmanager.cfg

Sep 13 19:14:26 xxx systemd[1]: Starting NVIDIA fabric manager service...
Sep 13 19:14:45 xxx nv-fabricmanager[2140]: Successfully configured all the available GPUs and NVSwitches.
Sep 13 19:14:45 xxx systemd[1]: Started NVIDIA fabric manager service.
Note: The nvidia-fabricmanager package version must match the Tesla driver version to ensure the GPU works correctly. On Ubuntu, if you install the nvidia-fabricmanager service by using an installation package, the apt-daily service may automatically update the nvidia-fabricmanager package. This can cause a version mismatch with the Tesla driver, which prevents the nvidia-fabricmanager service from starting and makes the GPU unusable. To resolve this issue, see The GPU does not work as expected because the nvidia-fabricmanager version is different from the Tesla driver version.
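A version mismatch of the kind described in the note can be detected by comparing the two version strings. The following is a minimal sketch, assuming that the package version is the driver version optionally followed by a packaging suffix such as -1. That format, the function name fabricmanager_matches_driver, and the package name placeholder in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# Succeed if the nvidia-fabricmanager package version matches the driver.
#   $1: driver version, for example from
#       nvidia-smi --query-gpu=driver_version --format=csv,noheader
#   $2: package version, for example from
#       dpkg-query -W -f='${Version}' <fabricmanager package name>
fabricmanager_matches_driver() {
    pkg_version="${2%%-*}"   # drop an assumed packaging suffix such as "-1"
    [ "$1" = "$pkg_version" ]
}

# Example usage with literal values:
# fabricmanager_matches_driver 535.154.05 535.154.05-1 || echo "version mismatch"
```

Running a check like this after unattended package updates can surface a mismatch before the nvidia-fabricmanager service fails to start.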
References
If you purchased a GPU-accelerated compute-optimized instance that runs Windows, you must install the Tesla driver to use the instance for general-purpose computing workloads, such as deep learning and AI. For more information, see Manually install the Tesla driver on a GPU-accelerated compute-optimized instance (Windows).
If you want to install the Tesla driver when you create a GPU instance, see Automatically install or load the Tesla driver when creating a GPU instance.
If you need to uninstall the current Tesla driver for any reason, see Uninstall the Tesla driver.
If the installed driver version is not suitable for your workload, or if you installed an incorrect driver type or version and the GPU instance becomes unusable, you can uninstall the current driver and install a new one, or directly upgrade the driver. For information about how to upgrade the driver, see Upgrade the NVIDIA driver.