If your GPU-accelerated compute-optimized instance does not have a GPU driver, you must install one so that the instance can deliver its expected performance. This applies, for example, when you create an instance without a GPU driver that is installed automatically, or when no public image matches the OS type and version that your driver requires. This topic describes how to install a GPU driver on a GPU-accelerated compute-optimized instance that runs Linux.

Background information

The GPU driver that you install must be built for the OS that the GPU-accelerated instance runs. This topic covers Linux instances only. For information about Windows instances, see Install a Windows GPU driver on a GPU-accelerated compute-optimized instance.

Procedure

  1. Visit the DOWNLOAD DRIVERS page on the NVIDIA official website.
  2. Search for the driver that you want to install.
    1. From the Product Type, Product Series, and Product drop-down lists, select values based on the specifications of your GPU-accelerated compute-optimized instance. For more information about how to view instance specifications, see View instance information.
      The following table describes the specifications of each instance family.
      Instance family  Product Type         Product Series  Product
      gn4              Data Center / Tesla  M-Class         M40
      gn5              Data Center / Tesla  P-Series        Tesla P100
      gn5i             Data Center / Tesla  P-Series        Tesla P4
      gn6v             Data Center / Tesla  V-Series        Tesla V100
      gn6i             Data Center / Tesla  T-Series        Tesla T4
      gn6e             Data Center / Tesla  V-Series        Tesla V100
      gn7              Data Center / Tesla  A-Series        NVIDIA A100
      gn7i             Data Center / Tesla  A-Series        NVIDIA A10
    2. From the Operating System drop-down list, select a Linux OS based on the image of your instance.
      If your instance runs Debian, select Linux 64-bit from the drop-down list. If the OS that you need is not listed, select Show all Operating Systems at the bottom of the drop-down list.
    3. From the CUDA Toolkit drop-down list, select a version for CUDA Toolkit.
    4. From the Language drop-down list, select a language.
    5. Click SEARCH, find the required driver version, and then click the driver name.
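The instance-family table in step 2 can also be kept as a small lookup, which is handy when you script driver selection for several instances. The following is a minimal sketch that only restates the Product row of that table; the function name nvidia_product_for_family is hypothetical.

```shell
# Map an instance family to the NVIDIA product listed in the table in step 2.
# nvidia_product_for_family is a hypothetical helper name, not a vendor tool.
nvidia_product_for_family() {
  case "$1" in
    gn4)       echo "M40" ;;
    gn5)       echo "Tesla P100" ;;
    gn5i)      echo "Tesla P4" ;;
    gn6v|gn6e) echo "Tesla V100" ;;
    gn6i)      echo "Tesla T4" ;;
    gn7)       echo "NVIDIA A100" ;;
    gn7i)      echo "NVIDIA A10" ;;
    *)         echo "unknown instance family: $1" >&2; return 1 ;;
  esac
}

# Example: look up the product to select for a gn6i instance.
nvidia_product_for_family gn6i
```

An unknown family returns a non-zero exit code, so a calling script can stop instead of downloading the wrong driver.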
  3. On the driver details page, click DOWNLOAD. On the Download page, right-click DOWNLOAD and select Copy link address to copy the download address.
  4. Connect to the instance.
    Use one of the following methods to remotely connect to the instance.
    Connection method  Reference
    VNC                Connect to a Linux instance by using password authentication
  5. Run the wget command with the download address that you copied in Step 3 to download the installation package. Example:
    wget https://cn.download.nvidia.com/tesla/460.73.01/NVIDIA-Linux-x86_64-460.73.01.run
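The later steps need the installer file name and the driver version, and both can be derived from the copied address. The following is a minimal sketch that uses the example URL from this topic and assumes the NVIDIA-Linux-x86_64-<version>.run naming shown there; the variable names are illustrative.

```shell
# URL copied in step 3 (the example URL from this topic).
driver_url="https://cn.download.nvidia.com/tesla/460.73.01/NVIDIA-Linux-x86_64-460.73.01.run"

# File name that wget saves: the last path component of the URL.
driver_file="$(basename "$driver_url")"

# Driver version: strip the fixed prefix and the .run suffix.
# Assumes the NVIDIA-Linux-x86_64-<version>.run naming used in this topic.
driver_version="${driver_file#NVIDIA-Linux-x86_64-}"
driver_version="${driver_version%.run}"

echo "file:    $driver_file"
echo "version: $driver_version"

# After the download, confirm that the file exists and is not empty.
if [ -s "$driver_file" ]; then
  echo "download OK"
else
  echo "not downloaded yet: $driver_file"
fi
```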
  6. Install the GPU driver.
    1. Run the following command to check whether the kernel-devel and kernel-headers packages are installed on the instance.

      An instance that runs CentOS is used in this example.

      rpm -qa | grep "$(uname -r)"
      • If the command output contains the version information about the kernel-devel and kernel-headers packages, the packages are installed.
        kernel-3.10.0-1062.18.1.el7.x86_64
        kernel-devel-3.10.0-1062.18.1.el7.x86_64
        kernel-headers-3.10.0-1062.18.1.el7.x86_64
      • If the command output does not contain information about kernel-devel-* and kernel-headers-*, you must download and install the kernel-devel and kernel-headers packages of the kernel version that you want to use.
        Note If the kernel-devel version does not match the kernel version, the driver fails to compile when you install it. Therefore, check the version number of kernel-* in the command output before you download kernel-devel. In the preceding command output, the kernel version is 3.10.0-1062.18.1.el7.x86_64.
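The package check above can be scripted so that it reports exactly which packages are missing for the running kernel. The following is a minimal sketch for RPM-based systems such as CentOS; missing_kernel_pkgs is a hypothetical helper name, and the sketch assumes that package names follow the kernel-devel-<release> pattern shown in the example output.

```shell
# Print every required kernel package that is missing for a kernel release.
# Reads a package list (for example, the output of `rpm -qa`) from stdin.
# missing_kernel_pkgs is a hypothetical helper name.
missing_kernel_pkgs() {
  local release="$1"
  local installed pkg
  installed="$(cat)"
  for pkg in kernel-devel kernel-headers; do
    # -x matches the whole line; -F treats the dots in the version literally.
    if ! printf '%s\n' "$installed" | grep -qxF "${pkg}-${release}"; then
      echo "${pkg}-${release}"
    fi
  done
}

# On the instance: list what is missing for the running kernel.
if command -v rpm >/dev/null 2>&1; then
  rpm -qa | missing_kernel_pkgs "$(uname -r)"
fi
```

If the function prints nothing, both packages match the running kernel. Each printed name can be passed to yum install, which succeeds only if a configured repository still carries that exact version.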
    2. Install the GPU driver.

      A Linux 64-bit driver in the .run format, such as NVIDIA-Linux-x86_64-xxxx.run, is used in this example. Run the following commands to make the installer executable and install the GPU driver:

      chmod +x NVIDIA-Linux-x86_64-xxxx.run
      sh NVIDIA-Linux-x86_64-xxxx.run
    3. Run the following command to check whether the driver is installed:
      nvidia-smi
      If the command output lists the driver version and the GPUs of the instance, the GPU driver is installed.
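The check can also be done without reading the nvidia-smi table by eye. The following is a minimal sketch that compares the version reported by the driver with the version that you intended to install; it relies on the nvidia-smi --query-gpu and --format options, and driver_status is a hypothetical helper name. The expected version is taken from the example download address in this topic.

```shell
# Classify the installed driver against the expected version.
# driver_status is a hypothetical helper name.
driver_status() {
  local expected="$1" actual="$2"
  if [ -z "$actual" ]; then
    echo "not installed"
  elif [ "$actual" = "$expected" ]; then
    echo "ok"
  else
    echo "mismatch"
  fi
}

expected_version="460.73.01"   # version from the example .run file name
installed_version=""

if command -v nvidia-smi >/dev/null 2>&1; then
  # nvidia-smi prints one line per GPU; keep the first line.
  installed_version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
fi

echo "driver check: $(driver_status "$expected_version" "$installed_version")"
```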
  7. If your GPU-accelerated instance belongs to the ebmgn7 instance family, perform the following operations to install the NVIDIA Fabric Manager version that matches your driver version. Otherwise, you cannot use the instance as expected.
    1. Install NVIDIA Fabric Manager.
      The commands that you run to install NVIDIA Fabric Manager vary based on the OS.

      A driver of the 460.91.03 version is used in this example. Set driver_version to the version of the GPU driver that is installed on your instance.

      • CentOS 7.x
        driver_version=460.91.03
        yum -y install yum-utils
        yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
        yum install -y nvidia-fabric-manager-${driver_version}-1
      • CentOS 8.x
        driver_version=460.91.03
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        distribution=rhel8
        ARCH=$( /bin/arch )
        dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/$distribution/${ARCH}/cuda-$distribution.repo
        dnf module enable -y nvidia-driver:${driver_version_main}
        dnf install -y nvidia-fabric-manager-0:${driver_version}-1
      • Ubuntu 16.04 or Ubuntu 18.04
        driver_version=460.91.03
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
        wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
        mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
        wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/7fa2af80.pub
        apt-key add 7fa2af80.pub
        rm 7fa2af80.pub
        echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
        apt-get update
        apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
      • Ubuntu 20.04
        driver_version=460.91.03
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        apt-get update
        apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
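The per-OS commands above differ mainly in the package name and version syntax. The following is a minimal sketch that derives the package specification from the OS and the driver version, so that a script can pick the right one; it only restates the patterns used in the preceding commands, and fabric_manager_pkg is a hypothetical helper name.

```shell
# Print the Fabric Manager package spec used by the commands in this step.
# Arguments: OS ID and VERSION_ID as found in /etc/os-release, driver version.
# fabric_manager_pkg is a hypothetical helper name.
fabric_manager_pkg() {
  local os_id="$1" os_version="$2" driver_version="$3"
  local driver_main="${driver_version%%.*}"   # 460.91.03 -> 460
  case "${os_id}:${os_version%%.*}" in
    centos:7)
      echo "nvidia-fabric-manager-${driver_version}-1" ;;
    centos:8)
      echo "nvidia-fabric-manager-0:${driver_version}-1" ;;
    ubuntu:16|ubuntu:18|ubuntu:20)
      echo "nvidia-fabricmanager-${driver_main}=${driver_version}-*" ;;
    *)
      echo "unsupported OS: ${os_id} ${os_version}" >&2
      return 1 ;;
  esac
}

# Read the OS in a subshell so that the os-release variables do not leak.
if [ -r /etc/os-release ]; then
  ( . /etc/os-release; fabric_manager_pkg "$ID" "$VERSION_ID" "460.91.03" ) || true
fi
```

The sketch prints only the package specification. The repository setup for each OS still has to be done as shown in the commands above.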
    2. Run the following commands to enable NVIDIA Fabric Manager at startup and start it:
      systemctl enable nvidia-fabricmanager
      systemctl start nvidia-fabricmanager
    3. Run the following command to view the status of NVIDIA Fabric Manager:
      systemctl status nvidia-fabricmanager

      If the command output shows that the service is active (running), NVIDIA Fabric Manager is installed and started.
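For scripted checks, systemctl is-active returns a one-word state that is easier to test than the full status output. The following is a minimal sketch; fabric_manager_message is a hypothetical helper name, and the service name is the one used in the preceding commands.

```shell
# Turn the one-word state from `systemctl is-active` into a message.
# fabric_manager_message is a hypothetical helper name.
fabric_manager_message() {
  case "$1" in
    active) echo "NVIDIA Fabric Manager is running" ;;
    *)      echo "NVIDIA Fabric Manager is not running (state: ${1:-unknown})" ;;
  esac
}

if command -v systemctl >/dev/null 2>&1; then
  # is-active prints the state and exits non-zero unless the unit is active.
  state="$(systemctl is-active nvidia-fabricmanager 2>/dev/null || true)"
  fabric_manager_message "$state"
fi
```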