To install a GPU driver on a GPU-accelerated compute-optimized Linux instance, you can configure automatic installation for the GPU driver when you create an instance, or you can manually install the GPU driver after the instance is created. This topic describes how to manually install a GPU driver for a GPU-accelerated compute-optimized Linux instance.

Background information

GPU drivers are OS-specific. For example, you cannot install a Windows driver on a Linux instance and vice versa.
Note For more information about installing a GPU driver on a GPU-accelerated compute-optimized Windows instance, see Install a GPU driver on a GPU-accelerated compute-optimized Windows instance.

Scenario

This topic applies to the scenarios that meet the following requirements.

  • You did not enable Automatically Install GPU Driver when you created the instance, or you cannot find the public images of the required OS types or versions in Alibaba Cloud Marketplace.
    Note For more information about how to automatically install the GPU driver when you create a GPU-accelerated compute-optimized Linux instance, see Select an instance type.
  • You want to use GPU-accelerated compute-optimized instances in general computing scenarios, such as deep learning and AI tasks, or for graphics-intensive workloads such as OpenGL and Direct3D tasks. We recommend that you install GPU drivers in these scenarios.
  • This topic is applicable to GPU-accelerated compute-optimized Linux instance families. For more information about GPU-accelerated compute-optimized instance families, see GPU-accelerated compute-optimized instance families.
Note For more information about scenarios and installation methods of GPU drivers, see Driver installation method.

Procedure

Make sure that you understand the following information before you download a GPU driver:
  • If the GPU-accelerated instance that you created belongs to the instance family gn7, you can only install a GPU driver whose version is earlier than 510.47.03.
  • If the GPU-accelerated instance that you created belongs to the instance family gn7e, which uses NVIDIA A100 GPUs, you can install the CUDA 11.6 toolkit, or a driver whose version is 510.47.03 or later.
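The version constraints above can be checked before you download a package. The following is a minimal sketch and not part of the original procedure: the helper name version_lt is hypothetical, and it assumes that GNU sort with the -V (version sort) option is available on the instance.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if version $1 is strictly earlier than $2.
# Assumes GNU coreutils `sort -V` for version-aware ordering.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example: a gn7 instance requires a driver earlier than 510.47.03.
candidate=460.73.01
if version_lt "$candidate" 510.47.03; then
    echo "$candidate is allowed on gn7"
else
    echo "$candidate is not allowed on gn7"
fi
```

Version sort compares each dot-separated field numerically, so 510.47.03 is ordered before 510.108.03, which a plain string comparison would get wrong.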
  1. Download the GPU driver.
    1. Visit the NVIDIA Driver Downloads page on the NVIDIA official website.
      Note For more information about installing and configuring the NVIDIA driver, see NVIDIA Driver Installation Quickstart Guide.
    2. Set the filters and click SEARCH to search for the driver that you want to install.

      The following table describes the filters.

      Filter | Description | Example
      Product Type, Product Series, Product | From the Product Type, Product Series, and Product drop-down lists, select values based on the GPU with which your GPU-accelerated compute-optimized instance is configured. | Data Center / Tesla, P-Series, Tesla P100
      Operating System | Select a Linux version based on the image of the instance. | Linux 64-bit
      CUDA Toolkit | Select a CUDA Toolkit version. | 11.2
      Language | Select a language for the driver that you want to install. | Chinese (Simplified)
      Recommended/Beta | By default, All is selected. You can use the default setting. | All
      Note For more information about how to view the details of a GPU-accelerated instance, such as the instance ID, instance type, and operating system, see View instance information.
      The following table describes the GPU information of each instance family.
      Instance family | Product Type | Product Series | Product
      gn4 | Data Center / Tesla | M-Class | M40
      gn5 | Data Center / Tesla | P-Series | Tesla P100
      gn5i | Data Center / Tesla | P-Series | Tesla P4
      gn6v | Data Center / Tesla | V-Series | Tesla V100
      gn6i | Data Center / Tesla | T-Series | Tesla T4
      gn6e | Data Center / Tesla | V-Series | Tesla V100
      gn7 | Data Center / Tesla | A-Series | NVIDIA A100
      gn7i | Data Center / Tesla | A-Series | NVIDIA A10
      gn7e | Data Center / Tesla | A-Series | NVIDIA A100
      Note The preceding table lists only the GPU information of some popular GPU-accelerated compute-optimized instance families. Instances that use the same GPU model have the same GPU information, such as product type, product series, and product. For example, instances of the ebmgn7i and gn7i instance families use NVIDIA A10 GPUs. Therefore, the product type, product series, and product of these instances are the same.
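The mapping from instance family to search filters can also be kept as a small lookup. The following is a sketch only: the function name gpu_filters_for_family and the pipe-separated output format are hypothetical, and the values are taken solely from the table and note above.

```shell
#!/bin/sh
# Hypothetical helper: prints "Product Type|Product Series|Product" for an
# instance family, using only the values listed in this topic.
gpu_filters_for_family() {
    case "$1" in
        gn4)          echo "Data Center / Tesla|M-Class|M40" ;;
        gn5)          echo "Data Center / Tesla|P-Series|Tesla P100" ;;
        gn5i)         echo "Data Center / Tesla|P-Series|Tesla P4" ;;
        gn6v|gn6e)    echo "Data Center / Tesla|V-Series|Tesla V100" ;;
        gn6i)         echo "Data Center / Tesla|T-Series|Tesla T4" ;;
        gn7|gn7e)     echo "Data Center / Tesla|A-Series|NVIDIA A100" ;;
        gn7i|ebmgn7i) echo "Data Center / Tesla|A-Series|NVIDIA A10" ;;
        *)            echo "unknown instance family: $1" >&2; return 1 ;;
    esac
}

gpu_filters_for_family gn6i   # prints "Data Center / Tesla|T-Series|Tesla T4"
```

Families that are not listed in this topic fall through to the error branch, so check the instance family documentation for them instead of guessing.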
    3. In the search result, find the driver version that you want to download and click the driver name.
    4. On the driver details page, click Download.
    5. On the Download page, right-click Agree & Download and select Copy URL.
    6. Use one of the following methods to connect to the instance.
    7. Append the download URL that you copied earlier in this step to the wget command and run the command to download the installation package. Sample command:
      wget https://cn.download.nvidia.com/tesla/460.73.01/NVIDIA-Linux-x86_64-460.73.01.run
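If you script the download, the URL can be assembled from the driver version. The following sketch assumes that the URL layout matches the sample command above, which may not hold for every driver version. The helper name driver_url is hypothetical. Always confirm the real link on the NVIDIA download page.

```shell
#!/bin/sh
# Hypothetical helper: builds a download URL from a driver version.
# Assumption: the URL layout is the same as in the sample wget command.
driver_url() {
    echo "https://cn.download.nvidia.com/tesla/$1/NVIDIA-Linux-x86_64-$1.run"
}

driver_version=460.73.01
url=$(driver_url "$driver_version")
echo "Would download: $url"
# To download the package on the instance, run: wget "$url"
```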
  2. Install the GPU driver.
    The method of installing the GPU driver varies with the operating system of the instance.
    CentOS
    1. Run the following command to query whether the kernel-devel and kernel-headers packages are installed on the GPU-accelerated instance.
      rpm -qa | grep $(uname -r)
      • If the command output contains the version information of the kernel-devel (kernel-devel-*) and kernel-headers (kernel-headers-*) packages, the packages are installed. Sample command output:
        kernel-3.10.0-1062.18.1.el7.x86_64
        kernel-devel-3.10.0-1062.18.1.el7.x86_64
        kernel-headers-3.10.0-1062.18.1.el7.x86_64
      • If the command output does not contain the version information of the kernel-devel and kernel-headers packages, install the packages.
        Important Make sure that the version of the packages is the same as that of the kernel. Otherwise, compilation errors occur when you install the driver package. You can determine the version of the kernel (kernel-*) from the command output. In the preceding command output, the version number of the kernel is 3.10.0-1062.18.1.el7.x86_64.
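The check described in this substep can be sketched as a small filter. The helper name missing_kernel_packages is hypothetical. It reads a package list on standard input, so it can be tried with the sample output before you run it against rpm -qa on the instance.

```shell
#!/bin/sh
# Hypothetical helper: reads package names on stdin (for example the output of
# `rpm -qa`) and prints each required package that is missing for kernel release $1.
missing_kernel_packages() {
    release=$1
    pkgs=$(cat)
    for name in kernel-devel kernel-headers; do
        # -x matches the whole line, -F treats the pattern as a fixed string.
        if ! printf '%s\n' "$pkgs" | grep -qxF "$name-$release"; then
            echo "$name-$release"
        fi
    done
}

# Example with the sample output shown above: both packages are present,
# so nothing is printed.
printf '%s\n' \
    "kernel-3.10.0-1062.18.1.el7.x86_64" \
    "kernel-devel-3.10.0-1062.18.1.el7.x86_64" \
    "kernel-headers-3.10.0-1062.18.1.el7.x86_64" |
    missing_kernel_packages "3.10.0-1062.18.1.el7.x86_64"
```

On the instance, the equivalent call would be rpm -qa | missing_kernel_packages "$(uname -r)". Any name that is printed is a package whose version must match the running kernel.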
    2. Authorize and install the GPU driver.

      In this example, a Linux 64-bit driver package in the .run format is downloaded. Example: NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant the execute permissions on the GPU driver to all users and install the GPU driver:

      chmod +x NVIDIA-Linux-x86_64-xxxx.run
      sh NVIDIA-Linux-x86_64-xxxx.run
    3. Run the following command to check whether the driver is installed:
      nvidia-smi
      If the command output is similar to the information displayed in the following figure, the GPU driver is installed.
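To compare the installed driver with the version that you downloaded, the version can be read from the nvidia-smi banner. The following is a sketch under the assumption that the banner contains the text "Driver Version: " followed by the version number. The helper name parse_driver_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extracts the first driver version found in nvidia-smi
# banner text supplied on stdin.
# Assumption: the banner contains "Driver Version: <number>".
parse_driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Run against the real tool only if it is present on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | parse_driver_version
else
    echo "nvidia-smi not found; the driver may not be installed"
fi
```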
    4. Install NVIDIA Fabric Manager. Make sure that the version of NVIDIA Fabric Manager matches your driver version.
      This step is required only if the GPU-accelerated instance that you created belongs to the ebmgn7, ebmgn7e, ebmgn7ex, or sccgn7ex instance family.
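In a provisioning script, this condition can be expressed as a simple family check. The following sketch uses a hypothetical function name, needs_fabric_manager, and lists only the four instance families named above.

```shell
#!/bin/sh
# Hypothetical helper: prints "yes" if the instance family is one of the four
# families named in this step, otherwise "no".
needs_fabric_manager() {
    case "$1" in
        ebmgn7|ebmgn7e|ebmgn7ex|sccgn7ex) echo yes ;;
        *) echo no ;;
    esac
}

needs_fabric_manager ebmgn7e   # prints "yes"
needs_fabric_manager gn6i      # prints "no"
```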
      1. Install NVIDIA Fabric Manager.

        You can install NVIDIA Fabric Manager by using the source code or an installation package. The commands required to install NVIDIA Fabric Manager vary based on the instance OS. This example shows how to install NVIDIA Fabric Manager for driver version 460.91.03 on CentOS 7.x and CentOS 8.x. Replace the value of driver_version with the version number of the driver that you downloaded in step 1.

        • Source code
          • CentOS 7.x
            driver_version=460.91.03
            yum -y install yum-utils
            yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
            yum install -y nvidia-fabric-manager-${driver_version}-1
          • CentOS 8.x
            driver_version=460.91.03
            driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
            distribution=rhel8
            ARCH=$( /bin/arch )
            dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/$distribution/${ARCH}/cuda-$distribution.repo
            dnf module enable -y nvidia-driver:${driver_version_main}
            dnf install -y nvidia-fabric-manager-0:${driver_version}-1
        • Installation package
          • CentOS 7.x
            driver_version=460.91.03
            wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/rhel7/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
            rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          • CentOS 8.x
            driver_version=460.91.03
            wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/rhel8/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
            rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
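The variables in the preceding commands can be inspected before anything is installed. The following sketch prints the major version that the awk expression produces and the RPM file name that the installation-package commands request. The helper names major_version and fabric_manager_rpm are hypothetical.

```shell
#!/bin/sh
driver_version=460.91.03

# Major version via awk, exactly as in the commands above.
driver_version_main=$(echo "$driver_version" | awk -F '.' '{print $1}')

# Hypothetical alternative: the same result with POSIX parameter expansion only.
major_version() {
    echo "${1%%.*}"
}

# Hypothetical helper: RPM file name used by the installation-package commands.
fabric_manager_rpm() {
    echo "nvidia-fabric-manager-$1-1.x86_64.rpm"
}

echo "$driver_version_main"             # prints "460"
major_version "$driver_version"         # prints "460"
fabric_manager_rpm "$driver_version"    # prints "nvidia-fabric-manager-460.91.03-1.x86_64.rpm"
```

Printing the file name first makes a mismatch between the Fabric Manager version and the driver version visible before wget or rpm runs.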
      2. Run the following commands to start NVIDIA Fabric Manager:
        systemctl enable nvidia-fabricmanager
        systemctl start nvidia-fabricmanager
      3. Run the following command to check whether the NVIDIA Fabric Manager is installed:
        systemctl status nvidia-fabricmanager
        If the command output is similar to the information displayed in the following figure, NVIDIA Fabric Manager is installed.
    Ubuntu and other operating systems
    1. Authorize and install the GPU driver.

      In this example, a Linux 64-bit driver package in the .run format is downloaded. Example: NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant the execute permissions on the GPU driver to all users and install the GPU driver:

      chmod +x NVIDIA-Linux-x86_64-xxxx.run
      sh NVIDIA-Linux-x86_64-xxxx.run
    2. Run the following command to check whether the driver is installed:
      nvidia-smi
      If the command output is similar to the information displayed in the following figure, the GPU driver is installed.
    3. Optional: Install NVIDIA Fabric Manager. Make sure that the version of NVIDIA Fabric Manager matches your driver version.
      This step is required only if the GPU-accelerated instance that you created belongs to the ebmgn7, ebmgn7e, ebmgn7ex, or sccgn7ex instance family.
      1. Install NVIDIA Fabric Manager.

        You can install NVIDIA Fabric Manager by using the source code or an installation package. This example shows how to install NVIDIA Fabric Manager for driver version 460.91.03 on Ubuntu 16.04, Ubuntu 18.04, or Ubuntu 20.04. Replace the value of driver_version with the version number of the driver that you downloaded in step 1.

        • Source code
          Ubuntu 16.04, Ubuntu 18.04, or Ubuntu 20.04
          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
          wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
          mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
          wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/7fa2af80.pub
          apt-key add 7fa2af80.pub
          rm 7fa2af80.pub
          echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
          apt-get update
          apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
        • Installation package
          • Ubuntu 16.04
            driver_version=460.91.03
            driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
            wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu1604/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
            dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          • Ubuntu 18.04
            driver_version=460.91.03
            driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
            wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu1804/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
            dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          • Ubuntu 20.04
            driver_version=460.91.03
            driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
            wget http://mirrors.cloud.aliyuncs.com/nvidia-cuda/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
            dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
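The distribution string and the .deb file name in the preceding commands are derived values, so it helps to see what they expand to. The following sketch uses two hypothetical helpers, cuda_distribution and fabric_manager_deb. The ID and VERSION_ID values are passed in as arguments instead of being read from /etc/os-release, so the example can be tried on any machine.

```shell
#!/bin/sh
# Hypothetical helper: same transformation as the `distribution=` line above,
# applied to an ID ($1) and a VERSION_ID ($2) passed as arguments.
cuda_distribution() {
    echo "$1$2" | sed -e 's/\.//g'
}

# Hypothetical helper: .deb file name used by the installation-package commands.
fabric_manager_deb() {
    main=$(echo "$1" | awk -F '.' '{print $1}')
    echo "nvidia-fabricmanager-${main}_$1-1_amd64.deb"
}

cuda_distribution ubuntu 20.04   # prints "ubuntu2004"
fabric_manager_deb 460.91.03     # prints "nvidia-fabricmanager-460_460.91.03-1_amd64.deb"
```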
      2. Run the following commands to start NVIDIA Fabric Manager:
        systemctl enable nvidia-fabricmanager
        systemctl start nvidia-fabricmanager
      3. Run the following command to check whether the NVIDIA Fabric Manager is installed:
        systemctl status nvidia-fabricmanager
        If the command output is similar to the information displayed in the following figure, NVIDIA Fabric Manager is installed.