Install a Tesla driver on Linux

GPU-accelerated instances with NVIDIA Tesla drivers installed deliver high computing performance for general-purpose computing, such as deep learning and AI, and smoother display for graphics acceleration workloads, such as Open Graphics Library (OpenGL), Direct3D, and cloud gaming. If you did not install a Tesla driver when you created a GPU-accelerated compute-optimized Linux instance, you must install the driver after the instance is created. This topic describes how to manually install a Tesla driver on a GPU-accelerated compute-optimized Linux instance.

Procedure

This topic applies to all GPU-accelerated compute-optimized Linux instances. For more information, see GPU-accelerated compute-optimized instance families. The driver must match the OS of the instance: on a GPU-accelerated compute-optimized Linux instance, you can install only a Linux Tesla driver.

Step 1: Download a Tesla driver

Visit the NVIDIA Driver Downloads page on the NVIDIA official website.
Note
For more information about how to install and configure an NVIDIA driver, see Driver Installation.

Configure filters and click Search to search for a driver that is suitable for your instance.

The following table describes the filters.

Filter	Description	Example
Product Type, Product Series, Product	From the Product Type, Product Series, and Product drop-down lists, select values based on the GPU of your GPU-accelerated compute-optimized instance. Note: For more information about how to view the details of a GPU-accelerated instance, such as the instance ID, instance type, and OS, see View instance information.	Data Center / Tesla, A-Series, NVIDIA A10
Operating System	Select a Linux version based on the image of the instance.	Linux 64-bit
CUDA Toolkit	Select a CUDA Toolkit version.	11.4
Language	Select a language for the driver.	Chinese (Simplified)
Recommended/Beta	By default, All is selected. You can use the default setting.	All

The following table lists the GPU information, Tesla driver versions, and CUDA Toolkit versions that are supported by specific GPU-accelerated compute-optimized instance families.

Instance family	gn5	gn5i	gn6v	gn6i	gn6e	gn7	gn7i	gn7e
Product Type	Data Center / Tesla	Data Center / Tesla	Data Center / Tesla	Data Center / Tesla	Data Center / Tesla	Data Center / Tesla	Data Center / Tesla	Data Center / Tesla
Product Series	P-Series	P-Series	V-Series	T-Series	V-Series	A-Series	A-Series	A-Series
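Recommended Tesla driver version	Version 410.79 or later	Version 410.79 or later	Version 410.79 or later	Version 410.79 or later	Version 410.79 or later	Version 450.80.02 or later	Version 460.73.01 or later	Version 450.80.02 or later
Recommended CUDA Toolkit version	CUDA Toolkit 10.1 Update 2	CUDA Toolkit 10.1 Update 2	CUDA Toolkit 10.1 Update 2	CUDA Toolkit 10.1 Update 2	CUDA Toolkit 10.1 Update 2	CUDA Toolkit 11.0 Update 1	CUDA Toolkit 11.2	CUDA Toolkit 11.0 Update 1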

Note

The preceding table lists only the GPU information about specific popular GPU-accelerated compute-optimized instance families. Instances that use the same GPU have the same GPU information, such as the same product type, product series, and product family. For example, instances of the ebmgn7i and gn7i instance families use NVIDIA A10 GPUs. Therefore, the product type, product series, and product family of the instances are the same.
When you manually install the Tesla driver and CUDA Toolkit, you must make sure that the driver version is compatible with the CUDA Toolkit version. For more information, see CUDA Compatibility.
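The recommended minimum versions in the preceding table can be checked mechanically before you download a driver. The following is a minimal sketch, not an official tool. It assumes that GNU `sort -V` is available and that the first driver value in the table applies to the gn5 through gn6e families. The function names `version_ge` and `min_driver_for_family` are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the recommended minimum driver version for an instance family,
# restating the table above. Unknown families print nothing and fail.
min_driver_for_family() {
    case "$1" in
        gn5|gn5i|gn6v|gn6i|gn6e) echo "410.79" ;;
        gn7|gn7e)                echo "450.80.02" ;;
        gn7i)                    echo "460.73.01" ;;
        *)                       return 1 ;;
    esac
}

# Example: is driver 470.161.03 new enough for a gn7i instance?
if version_ge "470.161.03" "$(min_driver_for_family gn7i)"; then
    echo "driver version meets the recommended minimum"
else
    echo "driver version is older than the recommended minimum"
fi
```

This check covers only the minimum driver version. Compatibility between the driver and the CUDA Toolkit still has to be confirmed against CUDA Compatibility.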

In the search result, find the driver version that you want to download, such as version 470.161.03, and click the driver name.
On the driver details page, click Download.
On the Download page, right-click Agree & Download and select Copy URL.
Use one of the following methods to connect to your GPU-accelerated compute-optimized Linux instance.
Method	References
Workbench	Connect to a Linux instance by using a password or key
Virtual Network Computing (VNC)	Connect to an instance by using VNC
Append the download address that you copied in Substep 5 to the wget command and run the command to download the installation package of the driver.
Sample command:
```
wget https://us.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run
```
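The sample command follows a recognizable naming pattern. The following sketch builds a URL from a version number and recovers the version from a file name, which is useful later when a matching NVIDIA Fabric Manager version is required. It assumes that other driver versions follow the same URL layout as the sample, which is not guaranteed, so prefer the URL that you copied from the Download page. The function names `driver_url` and `driver_version_from_file` are hypothetical.

```shell
#!/bin/sh
# Build a download URL from a driver version, mirroring the layout of the
# sample command above. Assumption: other versions use the same layout.
driver_url() {
    echo "https://us.download.nvidia.com/tesla/$1/NVIDIA-Linux-x86_64-$1.run"
}

# Extract the driver version from a .run file name or URL.
driver_version_from_file() {
    basename "$1" .run | sed 's/^NVIDIA-Linux-x86_64-//'
}

url=$(driver_url "470.161.03")
echo "$url"
echo "version: $(driver_version_from_file "$url")"
# To download the package, run: wget "$url"
```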

Step 2: Install the Tesla driver

The method of installing a Tesla driver on an instance varies based on the OS of the instance. The following section describes how to install a Tesla driver on different OSs.

CentOS

Run the following command to check whether the kernel-devel and kernel-headers packages are installed on the GPU-accelerated instance:
```
rpm  -qa | grep $(uname -r)
```
- If the command output includes the version information about the kernel-devel and kernel-headers packages, the packages are installed. Sample command output:
```
kernel-3.10.0-1062.18.1.el7.x86_64
kernel-devel-3.10.0-1062.18.1.el7.x86_64
kernel-headers-3.10.0-1062.18.1.el7.x86_64
```
- If the command output does not include the version information about the kernel-devel (kernel-devel-*) and kernel-headers (kernel-headers-*) packages, you must download and install the packages of the required version. For more information, see kernel-devel and kernel-headers.
  Important
  Make sure that the kernel-devel version is the same as the kernel version. Otherwise, a compilation error occurs when you install the driver. Check the kernel version in the command output before you download kernel-devel. In the preceding command output, the kernel version is 3.10.0-1062.18.1.el7.x86_64.
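The check described above can be scripted. The following is a minimal sketch that assumes package names have the form shown in the sample output, that is, the package name followed by the kernel release. The function name `has_kernel_build_packages` is hypothetical. The part that calls `rpm` runs only when `rpm` is available.

```shell
#!/bin/sh
# Read a package list (one name per line, as printed by `rpm -qa`) on stdin
# and succeed only if both kernel-devel and kernel-headers are present for
# the kernel release given as $1.
has_kernel_build_packages() {
    kver=$1
    pkgs=$(cat)
    for name in kernel-devel kernel-headers; do
        printf '%s\n' "$pkgs" | grep -qxF "${name}-${kver}" || return 1
    done
}

# On a CentOS instance (guarded so that the sketch is harmless elsewhere):
if command -v rpm >/dev/null 2>&1; then
    if rpm -qa | has_kernel_build_packages "$(uname -r)"; then
        echo "kernel-devel and kernel-headers match $(uname -r)"
    else
        echo "install kernel-devel and kernel-headers for $(uname -r)"
    fi
fi
```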
Grant execute permissions on the installation package of your Tesla driver and install the driver.
In this example, a Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package for your Tesla driver, such as the NVIDIA-Linux-x86_64-xxxx.run package. Run the following commands to grant execute permissions on the installation package and install the Tesla driver:
Note
If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.
```
chmod +x NVIDIA-Linux-x86_64-xxxx.run
```
```
sh NVIDIA-Linux-x86_64-xxxx.run
```
Run the following command to check whether the Tesla driver is installed:
```
nvidia-smi
```
If the command output lists the driver version and the GPUs of the instance, the Tesla driver is installed.
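For scripted verification, `nvidia-smi` can print only the driver version by using its `--query-gpu` and `--format` options. The following sketch compares that output with the version that you installed. The function name `all_gpus_report_version` is hypothetical, and 470.161.03 is only the sample version used earlier in this topic. The `nvidia-smi` call runs only when the command exists.

```shell
#!/bin/sh
# Succeed if every line on stdin (one driver version per GPU) equals $1.
# Fails on empty input, which indicates that no GPU was reported.
all_gpus_report_version() {
    expected=$1
    count=0
    while IFS= read -r line; do
        [ "$line" = "$expected" ] || return 1
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

# On an instance with the driver installed (guarded for other hosts):
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=driver_version --format=csv,noheader |
        all_gpus_report_version "470.161.03"; then
        echo "driver 470.161.03 is loaded on all GPUs"
    else
        echo "driver version differs from 470.161.03"
    fi
fi
```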
(Optional) Enable the persistence mode (Persistence-M) by using the NVIDIA Persistence Daemon.
After the Tesla driver is installed, Persistence-M is in the disabled (off) state by default. A Tesla driver can achieve more stable performance when Persistence-M is enabled. We recommend that you enable Persistence-M by using the NVIDIA Persistence Daemon to ensure business continuity. For more information, see Persistence Daemon.
Note
- Persistence-M is a term for a user-settable driver property that keeps a GPU in the initialized state.
- NVIDIA provides the Persistence Mode (Legacy) method to enable Persistence-M by using the nvidia-smi -pm 1 command. The Persistence Mode (Legacy) method is near end-of-life and will be deprecated and replaced by the NVIDIA Persistence Daemon method.
1. Run the following command to run the NVIDIA Persistence Daemon:
```
sudo nvidia-persistenced --user username 
# Replace username with your username.
```
2. Run the following command to view the status of Persistence-M:
```
nvidia-smi
```
  If the Persistence-M column in the command output shows On, Persistence-M is in the enabled (on) state.
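The state can also be read without inspecting the full table. The following sketch queries the `persistence_mode` field of `nvidia-smi` and counts the GPUs that do not report `Enabled`. It assumes that the query prints one value per GPU with no surrounding whitespace. The function name `count_persistence_disabled` is hypothetical.

```shell
#!/bin/sh
# Count the input lines that are not exactly "Enabled".
# Input: one persistence mode value per line, one line per GPU.
count_persistence_disabled() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -cv '^Enabled$' || true
}

# On an instance with the driver installed (guarded for other hosts):
if command -v nvidia-smi >/dev/null 2>&1; then
    disabled=$(nvidia-smi --query-gpu=persistence_mode --format=csv,noheader |
        count_persistence_disabled)
    echo "GPUs without Persistence-M: $disabled"
fi
```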
(Optional) Enable Persistence-M after you restart the system.
If you restart the system, Persistence-M returns to the disabled (off) state. You can perform the following operations to enable Persistence-M automatically after a restart:
The Tesla driver installation package places the scripts provided by NVIDIA, including the sample scripts and the installer script, at /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2.
1. Run the following command to decompress and install the installation script provided by NVIDIA:
```
cd  /usr/share/doc/NVIDIA_GLX-1.0/samples/
tar xf nvidia-persistenced-init.tar.bz2
cd  nvidia-persistenced-init
sh install.sh
```
2. Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:
```
systemctl status nvidia-persistenced
```
  If the command output shows that the service is in the active (running) state, the NVIDIA Persistence Daemon runs as expected.
  Note
  You can adapt the NVIDIA Persistence Daemon installation script based on your OS to ensure that the NVIDIA Persistence Daemon works as expected.
3. Run the following command to verify that Persistence-M is in the enabled (on) state:
```
nvidia-smi
```
4. (Optional) Run the following command to disable the NVIDIA Persistence Daemon.
  You can disable the NVIDIA Persistence Daemon based on your business requirements.
```
systemctl stop nvidia-persistenced
systemctl disable nvidia-persistenced
```
(Conditionally required) Install NVIDIA Fabric Manager that matches the driver version. This operation is required when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.
Important
- If your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install NVIDIA Fabric Manager that matches the driver version. Otherwise, you cannot use the instance as expected.
- You can skip this operation if your GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.
1. Install NVIDIA Fabric Manager.
  You can install NVIDIA Fabric Manager from the NVIDIA package repository or from a downloaded installation package. The commands that are required to install NVIDIA Fabric Manager vary based on your OS. In the following examples, the driver version is 460.91.03, and CentOS 7.x and CentOS 8.x are used. Replace the value of driver_version with the version of the driver that you downloaded in Step 1: Download a Tesla driver.
  - Repository
    - CentOS 7.x
      driver_version=460.91.03
      yum -y install yum-utils
      yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
      yum install -y nvidia-fabric-manager-${driver_version}-1
    - CentOS 8.x
      driver_version=460.91.03
      driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
      distribution=rhel8
      ARCH=$( /bin/arch )
      dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/$distribution/${ARCH}/cuda-$distribution.repo
      dnf module enable -y nvidia-driver:${driver_version_main}
      dnf install -y nvidia-fabric-manager-0:${driver_version}-1
  - Installation package
    - CentOS 7.x
      driver_version=460.91.03
      wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/rhel7/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
      rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
    - CentOS 8.x
      driver_version=460.91.03
      wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/rhel8/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
      rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
2. Run the following commands to start NVIDIA Fabric Manager:
```
systemctl enable nvidia-fabricmanager
systemctl start nvidia-fabricmanager
```
3. Run the following command to check whether NVIDIA Fabric Manager is installed:
```
systemctl status nvidia-fabricmanager
```
  If the command output shows that the service is in the active (running) state, NVIDIA Fabric Manager is installed and running.
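The decision of whether this step applies, and which package version to request, can be expressed as a short script. The following sketch only restates the rules above: NVIDIA Fabric Manager is required for the ebmgn7 and ebmgn7e instance families, and the package version must equal the driver version. The function names and the `family` variable are hypothetical, and the package naming follows the CentOS 7.x example.

```shell
#!/bin/sh
# Succeed if the instance family requires NVIDIA Fabric Manager.
needs_fabric_manager() {
    case "$1" in
        ebmgn7|ebmgn7e) return 0 ;;
        *)              return 1 ;;
    esac
}

# Print the yum package spec that pins Fabric Manager to a driver version,
# following the CentOS 7.x example above.
fabric_manager_rpm_spec() {
    echo "nvidia-fabric-manager-$1-1"
}

family="ebmgn7e"            # hypothetical value; set to your instance family
driver_version="460.91.03"  # the driver version installed in Step 2
if needs_fabric_manager "$family"; then
    echo "install: $(fabric_manager_rpm_spec "$driver_version")"
else
    echo "Fabric Manager is not required for $family"
fi
```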

Other Linux distributions such as Ubuntu

Grant execute permissions on the installation package of your Tesla driver and install the driver.
In this example, a Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package for your Tesla driver, such as the NVIDIA-Linux-x86_64-xxxx.run package. Run the following commands to grant execute permissions on the installation package and install the Tesla driver:
Note
If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.
```
chmod +x NVIDIA-Linux-x86_64-xxxx.run
```
```
sh NVIDIA-Linux-x86_64-xxxx.run
```
Run the following command to check whether the Tesla driver is installed:
```
nvidia-smi
```
If the command output lists the driver version and the GPUs of the instance, the Tesla driver is installed.
(Optional) Enable the persistence mode (Persistence-M) by using the NVIDIA Persistence Daemon.
After the Tesla driver is installed, Persistence-M is in the disabled (off) state by default. A Tesla driver can achieve more stable performance when Persistence-M is enabled. We recommend that you enable Persistence-M by using the NVIDIA Persistence Daemon to ensure business continuity. For more information, see Persistence Daemon.
Note
- Persistence-M is a term for a user-settable driver property that keeps a GPU in the initialized state.
- NVIDIA provides the Persistence Mode (Legacy) method to enable Persistence-M by using the nvidia-smi -pm 1 command. The Persistence Mode (Legacy) method is near end-of-life and will be deprecated and replaced by the NVIDIA Persistence Daemon method.
1. Run the following command to run the NVIDIA Persistence Daemon:
```
sudo nvidia-persistenced --user username 
# Replace username with your username.
```
2. Run the following command to view the status of Persistence-M:
```
nvidia-smi
```
  If the Persistence-M column in the command output shows On, Persistence-M is in the enabled (on) state.
(Optional) Enable Persistence-M after you restart the system.
If you restart the system, Persistence-M returns to the disabled (off) state. You can perform the following operations to enable Persistence-M automatically after a restart:
The Tesla driver installation package places the scripts provided by NVIDIA, including the sample scripts and the installer script, at /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2.
1. Run the following command to decompress and install the installation script provided by NVIDIA:
```
cd  /usr/share/doc/NVIDIA_GLX-1.0/samples/
tar xf nvidia-persistenced-init.tar.bz2
cd  nvidia-persistenced-init
sh install.sh
```
2. Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:
```
systemctl status nvidia-persistenced
```
  If the command output shows that the service is in the active (running) state, the NVIDIA Persistence Daemon runs as expected.
  Note
  You can adapt the NVIDIA Persistence Daemon installation script based on your OS to ensure that the NVIDIA Persistence Daemon works as expected.
3. Run the following command to verify that Persistence-M is in the enabled (on) state:
```
nvidia-smi
```
4. (Optional) Run the following command to disable the NVIDIA Persistence Daemon.
  You can disable the NVIDIA Persistence Daemon based on your business requirements.
```
systemctl stop nvidia-persistenced
systemctl disable nvidia-persistenced
```

(Conditionally required) Install NVIDIA Fabric Manager that matches the driver version. This operation is required when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.

Important

- If your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install NVIDIA Fabric Manager that matches the driver version. Otherwise, you cannot use the instance as expected.
- You can skip this operation if your GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.

Install NVIDIA Fabric Manager.

You can install NVIDIA Fabric Manager from the NVIDIA package repository or from a downloaded installation package. The commands that are required to install NVIDIA Fabric Manager vary based on your OS. In the following examples, the driver version is 460.91.03, and Ubuntu 16.04, Ubuntu 18.04, and Ubuntu 20.04 are used. Replace the value of driver_version with the version of the driver that you downloaded in Step 1: Download a Tesla driver.

Repository

Ubuntu 16.04, Ubuntu 18.04, or Ubuntu 20.04

driver_version=460.91.03
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
apt-key add 3bf863cc.pub
rm 3bf863cc.pub
echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
apt-get update
apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
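
The two derived variables in the commands above are easy to misread. The following sketch isolates them so that you can see what they evaluate to before you add the repository. It reads a sample os-release file instead of /etc/os-release so that it can run on any host. The function names `repo_distribution` and `driver_major` are hypothetical.

```shell
#!/bin/sh
# Derive the repository name (for example ubuntu2004) from an os-release
# file in the same way as the commands above. $1 is the file path.
repo_distribution() {
    ( . "$1"; echo "$ID$VERSION_ID" | sed -e 's/\.//g' )
}

# Print the major component of a driver version (460 for 460.91.03).
driver_major() {
    echo "$1" | awk -F '.' '{print $1}'
}

# Demonstrate with a sample os-release file instead of /etc/os-release.
sample=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="20.04"\n' > "$sample"
echo "distribution: $(repo_distribution "$sample")"
echo "major: $(driver_major 460.91.03)"
rm -f "$sample"
```

On an instance, pass /etc/os-release to `repo_distribution` to obtain the value that the commands above substitute for `$distribution`.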

Installation package

Ubuntu 16.04

driver_version=460.91.03
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu1604/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb

Ubuntu 18.04

driver_version=460.91.03
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu1804/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb

Ubuntu 20.04

driver_version=460.91.03
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
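
The three examples above differ only in the distribution directory of the mirror URL. The following sketch parameterizes that part. It assumes that the mirror keeps the naming shown in the examples. The function names `fabric_manager_deb` and `fabric_manager_deb_url` are hypothetical.

```shell
#!/bin/sh
# Print the .deb file name for a given driver version, following the naming
# used in the three examples above.
fabric_manager_deb() {
    major=${1%%.*}
    echo "nvidia-fabricmanager-${major}_$1-1_amd64.deb"
}

# Print the mirror URL. $1 is the distribution directory (ubuntu1604,
# ubuntu1804, or ubuntu2004) and $2 is the driver version.
fabric_manager_deb_url() {
    echo "http://mirrors.cloud.aliyuncs.com/nvidia-cuda/$1/x86_64/$(fabric_manager_deb "$2")"
}

fabric_manager_deb_url ubuntu2004 460.91.03
# To install: wget the printed URL, then run dpkg -i on the downloaded file.
```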

Run the following commands to start NVIDIA Fabric Manager:

systemctl enable nvidia-fabricmanager
systemctl start nvidia-fabricmanager

Run the following command to check whether NVIDIA Fabric Manager is installed:
```
systemctl status nvidia-fabricmanager
```
If the command output shows that the service is in the active (running) state, NVIDIA Fabric Manager is installed and running.

References

If you use a GPU-accelerated compute-optimized Windows instance only for general-purpose computing, such as deep learning and AI, a Tesla driver is the only driver that you need to install. For more information, see Install a Tesla driver on a GPU-accelerated compute-optimized Windows instance.
You can install a Tesla driver when you create a GPU-accelerated instance. For more information, see Create a GPU-accelerated instance.
If you no longer need a Tesla driver due to a specific reason, you can uninstall the driver. For more information, see Uninstall an NVIDIA Tesla driver.
If the driver version of your GPU-accelerated instance cannot meet your business requirements, or your GPU-accelerated instance becomes unavailable due to an invalid driver type or version, you can uninstall the driver and install a new driver. You can also upgrade the driver. For more information, see Upgrade an NVIDIA Tesla or GRID driver.
