If you do not select automatic GPU driver installation, or if you cannot find a public image of the required OS type and version, when you create a GPU-accelerated compute-optimized instance, you must install a driver on the instance after the instance is created to ensure the performance of the instance. This topic describes how to install a GPU driver on a Linux GPU-accelerated compute-optimized instance after the instance is created.

Background information

The GPU driver that you install must be built for the same OS as the GPU-accelerated instance. This topic covers only Linux GPU-accelerated compute-optimized instances. For more information about how to install a GPU driver on a Windows GPU-accelerated compute-optimized instance, see Install a GPU driver on a Windows GPU-accelerated compute-optimized instance.

Procedure

  1. Visit the DOWNLOAD DRIVERS page on the NVIDIA official website.
  2. Set the search conditions and click SEARCH to search for the driver that you want to install.

    The following list describes the search conditions.

    • Search condition: Product Type, Product Series, and Product
      Description: From the Product Type, Product Series, and Product drop-down lists, select values based on the GPU with which your GPU-accelerated compute-optimized instance is configured. For more information about how to view instance specifications, see View instance information.
      The following table describes the driver specifications that you can select for each instance family.
      Example: Data Center / Tesla (Product Type), P-Series (Product Series), Tesla P100 (Product)
    • Search condition: Operating System
      Description: Select a Linux version based on the image of the instance. If you cannot find the Linux version that matches the image of the instance, select Linux 64-bit.
      Example: Linux 64-bit
    • Search condition: CUDA Toolkit
      Description: Select a CUDA Toolkit version.
      Example: 11.2
    • Search condition: Language
      Description: Select a language for the driver that you want to install.
      Example: Chinese (Simplified)
    • Search condition: Recommended/Beta
      Description: By default, All is selected. You can use the default setting.
      Example: All
  3. In the search result, find the driver version that you want to download and click the driver name.
  4. On the driver details page, click DOWNLOAD. On the Download page, right-click AGREE & DOWNLOAD and select Copy link address to copy the download address.
  5. Connect to the GPU-accelerated compute-optimized instance.
    Use one of the following methods to connect to the instance.
    • Workbench: see Connect to a Linux instance by using a password or key
    • VNC: see Connect to a Linux instance by using a password
  6. Paste the download address that you copied in Step 4 to the wget command and run the command to download the installation package. Sample command:
    wget https://cn.download.nvidia.com/tesla/460.73.01/NVIDIA-Linux-x86_64-460.73.01.run
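    The file name and driver version in the copied link are needed again in Step 7, where they replace the xxxx placeholder. A minimal sketch of extracting both, assuming the link ends in a file name of the form NVIDIA-Linux-x86_64-<version>.run as in the sample command. The variable names url, installer, and driver_version are illustrative and not part of any NVIDIA tooling.

    ```shell
    # Download address copied in Step 4 (sample value from this topic).
    url="https://cn.download.nvidia.com/tesla/460.73.01/NVIDIA-Linux-x86_64-460.73.01.run"

    # File name: everything after the last slash.
    installer="${url##*/}"

    # Driver version: strip the assumed prefix and the .run suffix.
    driver_version="${installer#NVIDIA-Linux-x86_64-}"
    driver_version="${driver_version%.run}"

    echo "installer=${installer} driver_version=${driver_version}"
    # To download the package: wget "$url"
    ```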
  7. Install the GPU driver.
    1. If your instance runs CentOS, run the following command to check whether the kernel-devel and kernel-headers packages are installed on the instance. If your instance runs another OS type such as Ubuntu in which the kernel-devel and kernel-headers packages are pre-installed, skip this step.
      rpm -qa | grep "$(uname -r)"
      • If the command output contains the version information of the kernel-devel and kernel-headers packages, the packages are installed. Sample command output:
        kernel-3.10.0-1062.18.1.el7.x86_64
        kernel-devel-3.10.0-1062.18.1.el7.x86_64
        kernel-headers-3.10.0-1062.18.1.el7.x86_64
      • If the command output does not contain the version information in the formats of kernel-devel-* and kernel-headers-*, you must download and install the kernel-devel and kernel-headers packages of the kernel version that you want to use.
        Note If the kernel-devel version is different from the kernel version, a compilation error occurs when you install the driver. Therefore, before you download the kernel-devel package, check the kernel version, which is shown in the kernel-* line of the command output. In the preceding command output, the version number of the kernel is 3.10.0-1062.18.1.el7.x86_64.
    2. Install the GPU driver.

      In this example, a Linux 64-bit driver package in the .run format is downloaded, such as NVIDIA-Linux-x86_64-xxxx.run, where xxxx is the driver version. Run the following commands to make the installation package executable and install the GPU driver:

      chmod +x NVIDIA-Linux-x86_64-xxxx.run
      sh NVIDIA-Linux-x86_64-xxxx.run
    3. Run the following command to check whether the driver is installed:
      nvidia-smi
      If the command output is similar to the information displayed in the following figure, the GPU driver is installed.
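    The kernel package check in substep 1 can be sketched as a small function that reads a package list and confirms that both packages match the running kernel. This is a minimal sketch, not part of the official procedure. The function name has_kernel_build_packages is hypothetical, and the commented usage assumes that the configured yum repository still provides packages for the running kernel version.

    ```shell
    # Succeeds only if both kernel-devel and kernel-headers for the given
    # kernel release appear in the package list read from stdin.
    # $1: kernel release, for example the output of uname -r.
    has_kernel_build_packages() {
        local release="$1" packages name
        packages="$(cat)"
        for name in kernel-devel kernel-headers; do
            # -F: fixed string, -x: match the whole line, -q: no output.
            if ! printf '%s\n' "$packages" | grep -Fxq -- "${name}-${release}"; then
                echo "missing: ${name}-${release}" >&2
                return 1
            fi
        done
    }

    # Possible usage on CentOS (not run here):
    #   rpm -qa | has_kernel_build_packages "$(uname -r)" ||
    #       yum install -y "kernel-devel-$(uname -r)" "kernel-headers-$(uname -r)"
    ```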
  8. If the GPU-accelerated instance that you created belongs to the instance family ebmgn7, ebmgn7e, or sccgn7ex, perform the following operations to install NVIDIA Fabric Manager of the version that matches your driver version. Otherwise, you cannot use the instance as expected.
    1. Install NVIDIA Fabric Manager.
      You can install NVIDIA Fabric Manager from the NVIDIA software repository or from a downloaded installation package. The commands that you can run to install NVIDIA Fabric Manager vary based on the instance OS. In this example, a driver of the 460.91.03 version is used. Change the version number after driver_version= to match the driver version that is installed on your instance. Sample commands:
      • Install NVIDIA Fabric Manager from the NVIDIA software repository.
        • CentOS 7.x
          driver_version=460.91.03
          yum -y install yum-utils
          yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
          yum install -y nvidia-fabric-manager-${driver_version}-1
        • CentOS 8.x
          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          distribution=rhel8
          ARCH=$( /bin/arch )
          dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/$distribution/${ARCH}/cuda-$distribution.repo
          dnf module enable -y nvidia-driver:${driver_version_main}
          dnf install -y nvidia-fabric-manager-0:${driver_version}-1
        • Ubuntu 16.04, Ubuntu 18.04, or Ubuntu 20.04
          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
          wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
          mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
          wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/7fa2af80.pub
          apt-key add 7fa2af80.pub
          rm 7fa2af80.pub
          echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
          apt-get update
          apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
      • Install NVIDIA Fabric Manager by using an installation package.
        • CentOS 7.x
          driver_version=460.91.03
          wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/rhel7/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
        • CentOS 8.x
          driver_version=460.91.03
          wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/rhel8/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
        • Ubuntu 16.04
          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu1604/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 18.04
          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu1804/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 20.04
          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
    2. Run the following commands to start NVIDIA Fabric Manager:
      systemctl enable nvidia-fabricmanager
      systemctl start nvidia-fabricmanager
    3. Run the following command to view the status of NVIDIA Fabric Manager:
      systemctl status nvidia-fabricmanager

      If the command output is similar to the information displayed in the following figure, NVIDIA Fabric Manager is installed.

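    Because the Fabric Manager version must match the driver version, the package name can be derived from the version that is actually installed instead of being typed by hand. The following is a minimal sketch that builds the package names used in the CentOS 7.x and Ubuntu repository commands above. The function name fabric_manager_package is hypothetical, and the commented usage assumes that nvidia-smi is available because the driver from Step 7 is already installed.

    ```shell
    # Print the Fabric Manager package spec for a driver version.
    # $1: package format (rpm for CentOS 7.x, deb for Ubuntu).
    # $2: full driver version, such as 460.91.03.
    fabric_manager_package() {
        case "$1" in
            rpm) printf 'nvidia-fabric-manager-%s-1\n' "$2" ;;
            # ${2%%.*} keeps the text before the first dot (the major version).
            deb) printf 'nvidia-fabricmanager-%s=%s-*\n' "${2%%.*}" "$2" ;;
            *)   echo "unknown package format: $1" >&2; return 1 ;;
        esac
    }

    # Possible usage on CentOS 7.x after the repository is added (not run here):
    #   driver_version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    #   yum install -y "$(fabric_manager_package rpm "$driver_version")"
    ```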
  9. If the GPU-accelerated instance that you created belongs to the instance family gn7, you can install only a GPU driver of a version earlier than 510.47.03. To use NVIDIA A100 GPUs with CUDA Toolkit 11.6 and a GPU driver of the 510.47.03 version or later, you must create a GPU-accelerated instance that belongs to the instance family gn7e.
    After the GPU-accelerated instance is created, you must manually install the GPU driver. For more information, see Step 7.
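    The gn7 limit is a comparison between dotted version numbers, which a plain string comparison does not handle reliably. A minimal sketch of the check, assuming GNU sort with the -V (version sort) option, which is standard on the Linux distributions covered in this topic. The helper name version_lt is hypothetical.

    ```shell
    # Succeeds if version $1 is strictly earlier than version $2.
    version_lt() {
        [ "$1" != "$2" ] &&
            [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
    }

    # Driver version that you plan to install on a gn7 instance.
    driver_version=460.91.03

    if version_lt "$driver_version" 510.47.03; then
        echo "driver ${driver_version} can be installed on gn7"
    else
        echo "driver ${driver_version} requires a gn7e instance"
    fi
    ```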