Elastic GPU Service: Install a Tesla driver on a GPU-accelerated compute-optimized Linux instance

Last Updated: Mar 05, 2024

GPU-accelerated instances that have an NVIDIA Tesla driver installed deliver higher computing performance in general computing scenarios such as deep learning and AI, and smoother graphics in graphics acceleration scenarios such as Open Graphics Library (OpenGL), Direct3D, and cloud gaming. If you did not install a Tesla driver when you created a GPU-accelerated compute-optimized Linux instance, you must install the driver after the instance is created. This topic describes how to manually install a Tesla driver on a GPU-accelerated compute-optimized Linux instance.

Procedure

This topic applies to all GPU-accelerated compute-optimized Linux instances. For more information, see GPU-accelerated compute-optimized instance families. The Tesla driver that you install must be built for the OS of the instance. For example, you can install only a Linux Tesla driver on a GPU-accelerated compute-optimized Linux instance.

Step 1: Download a Tesla driver

  1. Visit the NVIDIA Driver Downloads page on the NVIDIA official website.

    Note

    For more information about how to install and configure an NVIDIA driver, see NVIDIA Driver Installation Quickstart Guide.

  2. Configure filters and click Search to search for a driver that is suitable for your instance.

    [Figure: NVIDIA driver download page]

    The following table describes the filters.

    | Filter | Description | Example |
    | --- | --- | --- |
    | Product Type, Product Series, Product | From the Product Type, Product Series, and Product drop-down lists, select values based on the GPU with which your GPU-accelerated compute-optimized instance is configured. Note: For more information about how to view the details of a GPU-accelerated instance, such as the instance ID, instance type, and OS, see View instance information. | Data Center / Tesla, A-Series, NVIDIA A10 |
    | Operating System | Select a Linux version based on the image of the instance. | Linux 64-bit |
    | CUDA Toolkit | Select a CUDA Toolkit version. | 11.4 |
    | Language | Select a language for the driver. | Chinese (Simplified) |
    | Recommended/Beta | By default, All is selected. You can use the default setting. | All |

    The following table describes the GPU information about specific GPU-accelerated compute-optimized instance families.

    | Instance family | gn5 | gn5i | gn6v | gn6i | gn6e | gn7 | gn7i | gn7e |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Product Type | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla |
    | Product Series | P-Series | P-Series | V-Series | T-Series | V-Series | A-Series | A-Series | A-Series |

    Note

    The preceding table lists the GPU information for only some commonly used GPU-accelerated compute-optimized instance families. Instances that use the same GPU share the same product type, product series, and product. For example, instances of the ebmgn7i and gn7i instance families both use NVIDIA A10 GPUs, so you select the same product type, product series, and product for them.
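The family-to-series mapping in the preceding table can be captured as a small lookup. The following is a minimal sketch that covers only the families listed in the table; the function name gpu_product_series is hypothetical and not part of any Alibaba Cloud or NVIDIA tool.

```shell
# Map an instance family to the Product Series filter value.
# Covers only the families listed in the table; other families
# are reported as unknown instead of being guessed.
# $1: instance family, for example gn7i
gpu_product_series() {
    case "$1" in
        gn5|gn5i)      echo "P-Series" ;;
        gn6v|gn6e)     echo "V-Series" ;;
        gn6i)          echo "T-Series" ;;
        gn7|gn7i|gn7e) echo "A-Series" ;;
        *)
            echo "unknown family: $1" >&2
            return 1
            ;;
    esac
}
```

For example, gpu_product_series gn6i prints T-Series. For all listed families, the Product Type filter value is Data Center / Tesla.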

  3. In the search results, find the driver version that you want to download, such as version 470.161.03, and click the driver name.

  4. On the driver details page, click Download.

    [Figure: driver download page]

  5. In the Download section, right-click Agree & Download and select Copy URL.

    [Figure: copying the download URL]

  6. Use one of the following methods to connect to your GPU-accelerated compute-optimized Linux instance.

    | Method | References |
    | --- | --- |
    | Workbench | Connect to a Linux instance by using a password or key |
    | Virtual Network Computing (VNC) | Connect to an instance by using VNC |

  7. Run the wget command with the download URL that you copied in Substep 5 to download the installation package of the driver.

    Sample command:

    wget https://us.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run
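Later steps need the driver version as a separate value, for example to select a matching NVIDIA Fabric Manager package. The following is a minimal sketch that extracts the version from the download URL or file name. It assumes the NVIDIA-Linux-x86_64-<version>.run naming used in the sample command; the function name driver_version_from_run is hypothetical.

```shell
# Extract the driver version from a .run package URL or file name.
# Assumes the NVIDIA-Linux-x86_64-<version>.run naming shown in the
# sample wget command; any other name is rejected.
# $1: URL or file name of the installation package
driver_version_from_run() {
    file=${1##*/}    # strip any URL or directory prefix
    case "$file" in
        NVIDIA-Linux-x86_64-*.run)
            version=${file#NVIDIA-Linux-x86_64-}
            printf '%s\n' "${version%.run}"
            ;;
        *)
            echo "unexpected package name: $file" >&2
            return 1
            ;;
    esac
}
```

With the sample URL, driver_version=$(driver_version_from_run "$url") sets driver_version to 470.161.03.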

Step 2: Install the Tesla driver

The installation procedure varies based on the OS of the instance. The following sections describe the procedure for CentOS and for other Linux distributions such as Ubuntu.

CentOS

  1. Run the following command to check whether the kernel-devel and kernel-headers packages are installed on the GPU-accelerated instance:

    rpm -qa | grep $(uname -r)
    • If the command output includes the version information about the kernel-devel and kernel-headers packages, the packages are installed. Sample command output:

      kernel-3.10.0-1062.18.1.el7.x86_64
      kernel-devel-3.10.0-1062.18.1.el7.x86_64
      kernel-headers-3.10.0-1062.18.1.el7.x86_64
    • If the command output does not include the version information about the kernel-devel (kernel-devel-*) and kernel-headers (kernel-headers-*) packages, you must download and install the packages of the required version. For more information, see kernel-devel and kernel-headers.

      Important

      Make sure that the kernel-devel version is the same as the kernel version. Otherwise, a compilation error occurs when you install the driver, because the installer compiles the driver's kernel module against the kernel-devel files. Check the kernel version in the command output before you download kernel-devel. In the preceding command output, the kernel version is 3.10.0-1062.18.1.el7.x86_64.
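The check in this step can be scripted so that a version mismatch is reported before the driver installation starts. The following is a minimal sketch that takes the kernel release and the installed package list as text and prints the kernel-devel and kernel-headers packages that are missing for that release. The function name missing_kernel_packages is hypothetical.

```shell
# Print the kernel-devel and kernel-headers packages that are missing
# for a given kernel release. Prints nothing if both are installed.
# $1: kernel release, for example the output of `uname -r`
# $2: installed package names, one per line, for example `rpm -qa`
missing_kernel_packages() {
    release=$1
    installed=$2
    for pkg in kernel-devel kernel-headers; do
        # -x matches whole lines, -F treats the name as a fixed string
        if ! printf '%s\n' "$installed" | grep -qxF "${pkg}-${release}"; then
            printf '%s\n' "${pkg}-${release}"
        fi
    done
}
```

On an instance, you can call it as missing_kernel_packages "$(uname -r)" "$(rpm -qa)". An empty result means that both packages match the running kernel.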

  2. Grant execute permissions on the installation package of your Tesla driver and install the driver.

    In this example, a Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package for your Tesla driver, such as the NVIDIA-Linux-x86_64-xxxx.run package. Run the following commands to grant the execution permissions on the installation package and install the Tesla driver:

    Note

    If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.

    chmod +x NVIDIA-Linux-x86_64-xxxx.run
    sh NVIDIA-Linux-x86_64-xxxx.run
  3. Run the following command to check whether the Tesla driver is installed:

    nvidia-smi

    If the following command output is displayed, the Tesla driver is installed.

    [Figure: nvidia-smi output with Persistence-M in the off state]
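In addition to reading the nvidia-smi table, you can compare the installed driver version with the version that you downloaded. The following is a minimal sketch. It assumes that nvidia-smi --query-gpu=driver_version --format=csv,noheader prints one version per GPU without extra whitespace; the function name driver_version_matches is hypothetical.

```shell
# Succeed only if every reported driver version equals the expected one.
# $1: expected version, for example 470.161.03
# $2: reported versions, one line per GPU
driver_version_matches() {
    # No output at all means that no GPU reported a version.
    [ -n "$2" ] || return 1
    # grep -v selects lines that differ from the expected version,
    # so a successful grep means that at least one GPU does not match.
    ! printf '%s\n' "$2" | grep -qvxF -- "$1"
}
```

On an instance, you can call it as driver_version_matches 470.161.03 "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)" and check the exit status.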

  4. (Optional) Enable the persistence mode (Persistence-M) by using the NVIDIA Persistence Daemon.

    After the Tesla driver is installed, Persistence-M is in the disabled (off) state by default. A Tesla driver can achieve more stable performance when Persistence-M is enabled. We recommend that you enable Persistence-M by using the NVIDIA Persistence Daemon to ensure business continuity. For more information, see Persistence Daemon.

    Note
    • Persistence-M is a term for a user-settable driver property that keeps a GPU in the initialized state.

    • NVIDIA provides the Persistence Mode (Legacy) method to enable Persistence-M by using the nvidia-smi -pm 1 command. The Persistence Mode (Legacy) method is near end-of-life and will be deprecated and replaced by the NVIDIA Persistence Daemon method.

    1. Run the following command to run the NVIDIA Persistence Daemon:

      sudo nvidia-persistenced --user username 
      # Replace username with your username.

    2. Run the following command to view the status of Persistence-M:

      nvidia-smi

      If the following command output is displayed, Persistence-M is in the enabled (on) state.

      [Figure: nvidia-smi output with Persistence-M in the on state]
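The Persistence-M state can also be read in a script-friendly form. The following is a minimal sketch that counts the GPUs on which persistence mode is not enabled. It assumes that nvidia-smi --query-gpu=persistence_mode --format=csv,noheader prints Enabled or Disabled on one line per GPU; the function name persistence_off_count is hypothetical.

```shell
# Count the GPUs whose persistence mode is not reported as "Enabled".
# $1: persistence mode values, one line per GPU
persistence_off_count() {
    # grep -c prints the count but exits with 1 when the count is 0,
    # so "|| true" keeps the function status at 0 in that case.
    printf '%s\n' "$1" | grep -cvxF 'Enabled' || true
}
```

On an instance, you can call it as persistence_off_count "$(nvidia-smi --query-gpu=persistence_mode --format=csv,noheader)". A result of 0 means that Persistence-M is enabled on all GPUs.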

  5. (Optional) Enable Persistence-M after you restart the system.

    The enabled (on) state of Persistence-M does not persist across system restarts. To have Persistence-M enabled again after each restart, perform the following operations:

    When you install the Tesla driver installation package, the scripts provided by NVIDIA, such as the sample scripts and the installer script, are installed as the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.

    1. Run the following commands to decompress the archive and run the installer script provided by NVIDIA:

      cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
      tar xf nvidia-persistenced-init.tar.bz2
      cd nvidia-persistenced-init
      sh install.sh
    2. Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:

      systemctl status nvidia-persistenced

      If the following command output is displayed, the NVIDIA Persistence Daemon runs as expected.

      [Figure: systemctl status output with the NVIDIA Persistence Daemon installed]

      Note

      You can adapt the NVIDIA Persistence Daemon installation script based on your OS to ensure that the NVIDIA Persistence Daemon works as expected.

    3. Run the following command to verify that Persistence-M is in the enabled (on) state:

      nvidia-smi
    4. (Optional) Run the following commands to disable the NVIDIA Persistence Daemon.

      You can disable the NVIDIA Persistence Daemon based on your business requirements.

      systemctl stop nvidia-persistenced
      systemctl disable nvidia-persistenced
  6. (Conditional) Install the NVIDIA Fabric Manager version that matches the driver version. This operation is required only when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.

    Important
    • If your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install NVIDIA Fabric Manager that matches the driver version. Otherwise, you cannot use the instance as expected.

    • You can skip this operation if your GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.

    1. Install NVIDIA Fabric Manager.

      You can install NVIDIA Fabric Manager from the NVIDIA package repository or from a downloaded installation package. The required commands vary based on your OS. In the following examples, the driver version is 460.91.03, and CentOS 7.x and CentOS 8.x are used. Replace the value of driver_version with the version of the driver that you downloaded in Step 1: Download a Tesla driver.

      • Package repository

        • CentOS 7.x

          driver_version=460.91.03
          yum -y install yum-utils
          yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
          yum install -y nvidia-fabric-manager-${driver_version}-1
        • CentOS 8.x

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          distribution=rhel8
          ARCH=$( /bin/arch )
          dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/$distribution/${ARCH}/cuda-$distribution.repo
          dnf module enable -y nvidia-driver:${driver_version_main}
          dnf install -y nvidia-fabric-manager-0:${driver_version}-1
      • Installation package

        • CentOS 7.x

          driver_version=460.91.03
          wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/rhel7/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
        • CentOS 8.x

          driver_version=460.91.03
          wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/rhel8/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
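The two installation package variants differ only in the rhel7 or rhel8 part of the URL. The following is a minimal sketch that builds the URL from the driver version and the CentOS major version. It reuses the mirror URL pattern from the preceding commands and does not check whether the package for a given version exists on the mirror. The function name fabric_manager_rpm_url is hypothetical.

```shell
# Build the NVIDIA Fabric Manager RPM URL on the mirror used above.
# $1: driver version, for example 460.91.03
# $2: CentOS major version, 7 or 8
fabric_manager_rpm_url() {
    case "$2" in
        7|8) ;;
        *)
            echo "unsupported CentOS major version: $2" >&2
            return 1
            ;;
    esac
    printf 'http://mirrors.cloud.aliyuncs.com/nvidia-cuda/rhel%s/x86_64/nvidia-fabric-manager-%s-1.x86_64.rpm\n' "$2" "$1"
}
```

On an instance, you can obtain the major version from /etc/os-release, for example by running . /etc/os-release and then passing "${VERSION_ID%%.*}" as the second argument.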
    2. Run the following commands to start NVIDIA Fabric Manager:

      systemctl enable nvidia-fabricmanager
      systemctl start nvidia-fabricmanager
    3. Run the following command to check whether NVIDIA Fabric Manager is installed:

      systemctl status nvidia-fabricmanager

      If the following command output is displayed, NVIDIA Fabric Manager is installed.

      [Figure: systemctl status nvidia-fabricmanager output]

Other Linux distributions such as Ubuntu

  1. Grant execute permissions on the installation package of your Tesla driver and install the driver.

    In this example, a Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package for your Tesla driver, such as the NVIDIA-Linux-x86_64-xxxx.run package. Run the following commands to grant the execution permissions on the installation package and install the Tesla driver:

    Note

    If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.

    chmod +x NVIDIA-Linux-x86_64-xxxx.run
    sh NVIDIA-Linux-x86_64-xxxx.run
  2. Run the following command to check whether the Tesla driver is installed:

    nvidia-smi

    If the following command output is displayed, the Tesla driver is installed.

    [Figure: nvidia-smi output with Persistence-M in the off state]

  3. (Optional) Enable the persistence mode (Persistence-M) by using the NVIDIA Persistence Daemon.

    After the Tesla driver is installed, Persistence-M is in the disabled (off) state by default. A Tesla driver can achieve more stable performance when Persistence-M is enabled. We recommend that you enable Persistence-M by using the NVIDIA Persistence Daemon to ensure business continuity. For more information, see Persistence Daemon.

    Note
    • Persistence-M is a term for a user-settable driver property that keeps a GPU in the initialized state.

    • NVIDIA provides the Persistence Mode (Legacy) method to enable Persistence-M by using the nvidia-smi -pm 1 command. The Persistence Mode (Legacy) method is near end-of-life and will be deprecated and replaced by the NVIDIA Persistence Daemon method.

    1. Run the following command to run the NVIDIA Persistence Daemon:

      sudo nvidia-persistenced --user username 
      # Replace username with your username.

    2. Run the following command to view the status of Persistence-M:

      nvidia-smi

      If the following command output is displayed, Persistence-M is in the enabled (on) state.

      [Figure: nvidia-smi output with Persistence-M in the on state]

  4. (Optional) Enable Persistence-M after you restart the system.

    The enabled (on) state of Persistence-M does not persist across system restarts. To have Persistence-M enabled again after each restart, perform the following operations:

    When you install the Tesla driver installation package, the scripts provided by NVIDIA, such as the sample scripts and the installer script, are installed as the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.

    1. Run the following commands to decompress the archive and run the installer script provided by NVIDIA:

      cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
      tar xf nvidia-persistenced-init.tar.bz2
      cd nvidia-persistenced-init
      sh install.sh
    2. Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:

      systemctl status nvidia-persistenced

      If the following command output is displayed, the NVIDIA Persistence Daemon runs as expected.

      [Figure: systemctl status output with the NVIDIA Persistence Daemon installed]

      Note

      You can adapt the NVIDIA Persistence Daemon installation script based on your OS to ensure that the NVIDIA Persistence Daemon works as expected.

    3. Run the following command to verify that Persistence-M is in the enabled (on) state:

      nvidia-smi
    4. (Optional) Run the following commands to disable the NVIDIA Persistence Daemon.

      You can disable the NVIDIA Persistence Daemon based on your business requirements.

      systemctl stop nvidia-persistenced
      systemctl disable nvidia-persistenced
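Whether Persistence-M is enabled again after a restart depends on two separate states of the nvidia-persistenced service: whether it is running now and whether it is enabled at boot. The following is a minimal sketch that combines the two states into one summary line. It takes the text printed by systemctl is-enabled and systemctl is-active as arguments; the function name persistenced_summary is hypothetical.

```shell
# Summarize the state of the nvidia-persistenced service.
# $1: output of `systemctl is-enabled nvidia-persistenced`
# $2: output of `systemctl is-active nvidia-persistenced`
persistenced_summary() {
    case "$1:$2" in
        enabled:active) echo "running and enabled at boot" ;;
        enabled:*)      echo "enabled at boot but not running now" ;;
        *:active)       echo "running now but not enabled at boot" ;;
        *)              echo "not running and not enabled at boot" ;;
    esac
}
```

On an instance, you can call it as persistenced_summary "$(systemctl is-enabled nvidia-persistenced)" "$(systemctl is-active nvidia-persistenced)". After the stop and disable commands in the preceding substep, the expected summary is "not running and not enabled at boot".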
  5. (Conditional) Install the NVIDIA Fabric Manager version that matches the driver version. This operation is required only when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.

    Important
    • If your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install NVIDIA Fabric Manager that matches the driver version. Otherwise, you cannot use the instance as expected.

    • You can skip this operation if your GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.

    1. Install NVIDIA Fabric Manager.

      You can install NVIDIA Fabric Manager from the NVIDIA package repository or from a downloaded installation package. The required commands vary based on your OS. In the following examples, the driver version is 460.91.03, and Ubuntu 16.04, Ubuntu 18.04, and Ubuntu 20.04 are used. Replace the value of driver_version with the version of the driver that you downloaded in Step 1: Download a Tesla driver.

      • Package repository

        Ubuntu 16.04, Ubuntu 18.04, or Ubuntu 20.04

        driver_version=460.91.03
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
        wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
        mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
        wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
        apt-key add 3bf863cc.pub
        rm 3bf863cc.pub
        echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
        apt-get update
        apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
      • Installation package

        • Ubuntu 16.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu1604/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 18.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu1804/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 20.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
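The preceding commands derive two strings from system values: the repository directory name, such as ubuntu2004, and the .deb file name, which contains both the major version and the full driver version. The following is a minimal sketch that isolates both derivations so that you can check them before you download anything. The function names are hypothetical.

```shell
# Build the repository directory name used in the NVIDIA CUDA repo URLs
# by concatenating ID and VERSION_ID and removing the dots.
# $1: ID from /etc/os-release, for example ubuntu
# $2: VERSION_ID from /etc/os-release, for example 20.04
cuda_repo_distribution() {
    printf '%s%s\n' "$1" "$2" | sed -e 's/\.//g'
}

# Build the NVIDIA Fabric Manager .deb file name for a driver version.
# $1: driver version, for example 460.91.03
fabric_manager_deb_name() {
    major=${1%%.*}    # part before the first dot, for example 460
    printf 'nvidia-fabricmanager-%s_%s-1_amd64.deb\n' "$major" "$1"
}
```

For driver version 460.91.03 on Ubuntu 20.04, the results are ubuntu2004 and nvidia-fabricmanager-460_460.91.03-1_amd64.deb, which match the names used in the preceding commands.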
    2. Run the following commands to start NVIDIA Fabric Manager:

      systemctl enable nvidia-fabricmanager
      systemctl start nvidia-fabricmanager
    3. Run the following command to check whether NVIDIA Fabric Manager is installed:

      systemctl status nvidia-fabricmanager

      If the following command output is displayed, NVIDIA Fabric Manager is installed.

      [Figure: systemctl status nvidia-fabricmanager output]
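The NVIDIA Fabric Manager step applies only to the ebmgn7 and ebmgn7e instance families. The following is a minimal sketch that makes this decision from an instance type string. It assumes that instance types follow the ecs.<family>.<size> format, for example ecs.ebmgn7e.32xlarge; the function name needs_fabric_manager is hypothetical.

```shell
# Succeed if the instance type belongs to a family for which this topic
# requires NVIDIA Fabric Manager (ebmgn7 or ebmgn7e).
# $1: instance type, assumed to look like ecs.<family>.<size>
needs_fabric_manager() {
    family=${1#ecs.}        # drop the leading "ecs."
    family=${family%%.*}    # keep the part before the next dot
    case "$family" in
        ebmgn7|ebmgn7e) return 0 ;;
        *)              return 1 ;;
    esac
}
```

For example, if needs_fabric_manager ecs.ebmgn7e.32xlarge; then ...; fi runs the installation only on an instance of a matching family.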

References

  • On a GPU-accelerated compute-optimized Windows instance, you can install only a Tesla driver, which lets you use the instance in general computing scenarios such as deep learning and AI. For more information, see Install a Tesla driver on a GPU-accelerated compute-optimized Windows instance.

  • You can install a Tesla driver when you create a GPU-accelerated instance. For more information, see Create a GPU-accelerated instance.

  • If you no longer need a Tesla driver, you can uninstall the driver. For more information, see Uninstall an NVIDIA Tesla driver.

  • If the driver version of your GPU-accelerated instance cannot meet your business requirements, or your GPU-accelerated instance becomes unavailable due to an invalid driver type or version, you can uninstall the driver and install a new driver. You can also upgrade the driver. For more information, see Upgrade an NVIDIA Tesla or GRID driver.