You must install the GPU driver to use a compute optimized instance with GPU capabilities. You can choose whether to install the GPU driver when you create an instance, or manually install the driver after the instance is created. This topic describes how to create a compute optimized instance with GPU capabilities and install the driver during the creation process.

Background information

If you choose to install the driver when you create the instance, take note of the following items:
  • Automatic installation of the GPU driver is supported only on instances that run Linux public images.
  • The automatic installation script has been updated to V2.1. It can automatically install different versions of the GPU driver, CUDA, and the CUDA Deep Neural Network (cuDNN) library.
  • Depending on the internal bandwidth and the number of CPUs of the specific instance type, the automatic installation process requires 10 to 15 minutes. The GPU cannot be used during the installation process. Do not perform any operation on the instance or install other GPU-related software. Otherwise, the automatic installation may fail, causing the instance to become unavailable.
  • If you change the operating system after you create the instance, ensure that you use the same image or an image that can automatically install the CUDA and GPU drivers to prevent failure in automatic installation. For more information, see Images supporting automatic installation of CUDA and GPU drivers.
  • You can connect to the instance and view the installation progress and result in the installation log:
    • If the installation is in progress, you can see the installation progress bar.
    • If the installation succeeds, NVIDIA INSTALL OK is displayed.
    • If the installation fails, NVIDIA INSTALL FAIL is displayed.
    • The storage path for detailed installation logs is /root/nvidia/nvidia_install.log.
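
The log check described above can be scripted. The following is a minimal sketch, assuming that the installation log contains the literal strings NVIDIA INSTALL OK and NVIDIA INSTALL FAIL named in this topic. The function name install_status and its output labels are hypothetical and are not part of the installation script.

```shell
#!/bin/sh
# Report the driver installation state based on the installation log.
# Assumption: the log contains the marker strings named in this topic.
install_status() {
    log_file="${1:-/root/nvidia/nvidia_install.log}"
    if [ ! -f "$log_file" ]; then
        echo "NO_LOG"          # the log has not been created yet
    elif grep -q "NVIDIA INSTALL OK" "$log_file"; then
        echo "OK"
    elif grep -q "NVIDIA INSTALL FAIL" "$log_file"; then
        echo "FAIL"
    else
        echo "IN_PROGRESS"     # the log exists but has no result marker
    fi
}
```

After you connect to the instance, calling install_status without arguments checks the default log path. To follow the detailed log while the installation runs, tail -f /root/nvidia/nvidia_install.log can be used instead.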

Procedure

This procedure focuses on the configurations related to compute optimized instances with GPU capabilities. For more information about general configurations, see Create an instance by using the provided wizard.

  1. Go to the ECS buy page.
  2. Perform the following steps to complete Basic Configurations. When you configure parameters, note that:
    • Region: Select a region and zone that provide the instance family you need.
      The following table lists the regions and zones that provide compute optimized instance families with GPU capabilities. The table is for reference only. If it differs from the buy page, the information on the buy page prevails.
      Instance family, followed by the zones that provide it:
      gn4
      • Beijing Zone A and Shanghai Zone B
      • Shenzhen Zone B and Shenzhen Zone C
      gn5
      • Beijing Zone C, Beijing Zone D, Beijing Zone G, Zhangjiakou (Beijing Winter Olympics) Zone A, Zhangjiakou Zone B, and Hohhot Zone A
      • Hangzhou Zone F, Hangzhou Zone G, Hangzhou Zone I, Shanghai Zone B, Shanghai Zone D, Shanghai Zone E, and Shanghai Zone F
      • Shenzhen Zone A, Shenzhen Zone B, Shenzhen Zone C, Shenzhen Zone D, and Shenzhen Zone E
      • Hong Kong Zone B and Hong Kong Zone C
      • Singapore Zone A, Singapore Zone B, Sydney Zone A, Sydney Zone B, Kuala Lumpur Zone A, Kuala Lumpur Zone B, and Jakarta Zone A
      • Tokyo Zone A and Tokyo Zone B
      • Silicon Valley Zone A, Silicon Valley Zone B, Virginia Zone A, and Virginia Zone B
      • Frankfurt Zone A and Frankfurt Zone B
      • Mumbai Zone A
      gn5i
      • Beijing Zone C, Beijing Zone E, Beijing Zone G, Zhangjiakou (Beijing Winter Olympics) Zone A, and Zhangjiakou Zone B
      • Hangzhou Zone B, Hangzhou Zone G, Shanghai Zone D, and Shanghai Zone E
      • Shenzhen Zone A, Shenzhen Zone C, Shenzhen Zone D, and Shenzhen Zone E
      gn6v
      • Beijing Zone G, Beijing Zone H, Zhangjiakou (Beijing Winter Olympics) Zone A, and Zhangjiakou Zone B
      • Hangzhou Zone H, Hangzhou Zone I, and Shanghai Zone F
      • Shenzhen Zone E
      • Silicon Valley Zone B
      gn6i
      • Hangzhou Zone H, Shanghai Zone F, and Shanghai Zone G
      • Shenzhen Zone E
      • Singapore Zone C
    • Instance Type: Choose Heterogeneous Computing > Compute Optimized Type with GPU and select the instance type as needed.
    • Image: Some public images support the automatic installation of the GPU driver. You can also select an image pre-installed with the GPU driver and related software by clicking Marketplace Image.
      Note If you select Shared Image or Custom Image, make sure that the selected image is pre-installed with the required GPU driver and software.
      • Public images are basic system images provided by Alibaba Cloud or its third-party partners. The following public images support the automatic installation of GPU drivers:
        • CentOS 64-bit images that are applied and tested by Alibaba Cloud
        • Ubuntu 16.04 64-bit images
        • Ubuntu 18.04 64-bit images
        • SUSE Linux Enterprise Server 12 SP2 64-bit images

        If you select an image that supports the automatic installation of the GPU driver, select Auto-install GPU Driver, and select the versions of GPU driver, CUDA, and cuDNN library. If you want to create an instance for a new business system, we recommend that you select the latest version.

        For images that support the automatic installation of the GPU driver, if you do not select Auto-install GPU Driver, you can configure the installation script in the User Data module. For more information about the installation script, see Automatic installation script V2.1.

        If you do not select Auto-install GPU Driver or the image does not support the automatic installation of the GPU driver, you can manually install the GPU driver after you create an instance. For more information, see Install the GPU driver.

        Note If you call the RunInstances operation to create a compute optimized instance with GPU capabilities, you must use the UserData parameter to upload the installation script. The script content must be Base64-encoded.
      • Alibaba Cloud Marketplace provides images with operating systems, application environments, and various software pre-installed. Marketplace images are reviewed by Alibaba Cloud to ensure quality and stability. You can deploy ECS instances with one click without any configuration for the instance. Alibaba Cloud Marketplace provides images that support deep learning and machine learning:
        • If you plan to use the compute optimized instance with GPU capabilities for deep learning, you can select an image with deep learning software pre-installed. To find available images, search Alibaba Cloud Marketplace for the keywords deep learning. Only CentOS 7.3 is supported.
        • The NVIDIA GPU Cloud VM Image is an optimized environment for running the deep learning software, HPC applications, and HPC visualization tools available from the NVIDIA GPU Cloud (NGC) container registry. Instance families gn5, gn5i, gn6v, and gn6i support NGC. For more information, see Deploy an NGC on gn5 instances.
  3. Complete the Networking configurations. When you configure parameters, note that:
    • Network Type: Select VPC.
    • Public IP Address: Select a bandwidth based on your business needs.
      Notice If you select an image of Windows 2008 R2 or earlier in Basic Configurations, you cannot connect to the compute optimized instance with GPU capabilities through the management terminal after the GPU driver is installed and takes effect. When you connect to the instance, a black screen or the startup interface persists. To connect to the instance over other protocols, such as RDP (Remote Desktop in Windows), PCoIP, and XenDesktop HDX 3D, select Assign Public IP Address in the Public IP Address section, or bind an Elastic IP Address after you create the instance. RDP does not support applications such as DirectX and OpenGL. To use such applications, install the VNC service and client.
  4. Complete the System Configurations. When you configure parameters, note that:
    • Logon Credentials: We recommend that you select Key Pair or Password. If you select Set Later, you must bind an SSH key pair or reset the password before you can log on to the instance through the management terminal, and then restart the instance for the change to take effect. If the GPU driver installation has not finished, restarting the instance causes the installation to fail.
    • User Data:
      • If you select Auto-install GPU Driver in the Image section on the Basic Configurations page, the precautions and shell script content for automatically installing CUDA and the GPU driver will be displayed in the User Data section.
      • If you do not select Auto-install GPU Driver, you can configure the installation script in User Data. For information about a script example, see Automatic installation script V2.1.
  5. Complete the Grouping configurations and confirm your order on the Preview page to complete the creation of a compute optimized instance with GPU capabilities.
    Note
    • If you configured the automatic installation script, the instance automatically installs the GPU driver after the instance is started. After the GPU driver is installed, the instance will automatically restart and the GPU will become operational.
    • The GPU driver runs more stably in Persistence Mode. The installation script enables Persistence Mode by default and registers this setting as a Linux service, so that Persistence Mode is enabled again each time the instance starts.
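
After the instance restarts, you may want to confirm that Persistence Mode is enabled on every GPU. The following is a minimal sketch that reads one persistence-mode value per line from standard input and succeeds only if every GPU reports Enabled. The function name all_persistence_enabled is hypothetical. The sketch assumes that the values come from nvidia-smi --query-gpu=persistence_mode --format=csv,noheader, which prints Enabled or Disabled for each GPU.

```shell
#!/bin/sh
# Read one persistence-mode value per line (one line per GPU) from stdin.
# Succeeds only if at least one GPU is listed and every value is Enabled.
all_persistence_enabled() {
    count=0
    while IFS= read -r line; do
        # Skip empty lines so that trailing newlines do not count as GPUs.
        [ -z "$line" ] && continue
        count=$((count + 1))
        case "$line" in
            *Enabled*) ;;
            *) return 1 ;;
        esac
    done
    [ "$count" -gt 0 ]
}
```

On the instance, the check could be run as nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_persistence_enabled && echo "Persistence Mode is enabled". Parsing input instead of calling nvidia-smi inside the function keeps the sketch usable on machines without a GPU.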

Script for automatically installing the GPU driver

When the instance is started for the first time, cloud-init automatically runs the shell script to install the GPU driver, CUDA, and cuDNN library.

  • If you select Auto-install GPU Driver, the following versions of GPU driver, CUDA, and cuDNN library are available.
    Only public images that are applied and tested by Alibaba Cloud are supported.
    • CUDA 10.1.168
      • GPU driver: 418.67
      • cuDNN: 7.5.0
      • Supported instance families: gn5, gn5i, gn6v, gn4, gn6i
      • Supported public image versions: Ubuntu 18.04, Ubuntu 16.04, CentOS 7.x, CentOS 6.x
    • CUDA 10.0.130
      • GPU driver: 418.67, 410.104
      • cuDNN: 7.5.0, 7.4.2, 7.3.1
      • Supported instance families: gn5, gn5i, gn6v, gn4, gn6i
      • Supported public image versions: Ubuntu 18.04, Ubuntu 16.04, CentOS 7.x, CentOS 6.x
    • CUDA 9.2.148
      • GPU driver: 418.67, 410.104
      • cuDNN: 7.5.0, 7.4.2, 7.3.1, 7.1.4
      • Supported instance families: gn5, gn5i, gn4, gn6v
      • Supported public image versions: Ubuntu 16.04, CentOS 7.x, CentOS 6.x
    • CUDA 9.0.176
      • GPU driver: 418.67, 410.104, 390.116
      • cuDNN: 7.5.0, 7.4.2, 7.3.1, 7.1.4, 7.0.5
      • Supported instance families: gn5, gn5i, gn4, gn6v
      • Supported public image versions: Ubuntu 16.04, CentOS 7.x, CentOS 6.x, SUSE 12SP2
    • CUDA 8.0.61
      • GPU driver: 418.67, 410.104, 390.116
      • cuDNN: 7.1.3, 7.0.5
      • Supported instance families: gn5, gn5i, gn4
      • Supported public image versions: Ubuntu 16.04, CentOS 7.x, CentOS 6.x
  • If you configure the installation script in the User Data section, see Automatic installation script V2.1 for the script content.
    The automatic installation script V2.1 has the following benefits:
    • Provides the latest CUDA, GPU driver, and cuDNN library.
    • Shows the installation status after you log on to the instance. If the GPU driver is being installed, you can see the installation progress bar. If the installation succeeds, NVIDIA INSTALL OK is displayed. If the installation fails, NVIDIA INSTALL FAIL is displayed.
    When you use the automatic installation script V2.1, you need to modify the following parameters of the installation script to specify the versions of GPU driver, CUDA, and cuDNN, for example:
    DRIVER_VERSION="410.104"
    CUDA_VERSION="10.0.130"
    CUDNN_VERSION="7.5.0"
    Note If the image runs CentOS or SUSE, the installation script uses the .run installation package. If the image runs Ubuntu, the installation script uses the .deb installation package.
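
The parameter change described above can be applied to a local copy of the script before you paste it into User Data. The following is a minimal sketch, assuming that the script has been saved to a local file and that each version variable starts at the beginning of its own line, as in automatic installation script V2.1. The function name set_versions is hypothetical.

```shell
#!/bin/sh
# Fill in the three version variables in a local copy of the script.
# Usage: set_versions <script-file> <driver> <cuda> <cudnn>
set_versions() {
    script_file="$1"
    tmp_file="${script_file}.tmp"
    # Write to a temporary file first, because sed -i is not portable.
    sed -e "s/^DRIVER_VERSION=.*/DRIVER_VERSION=\"$2\"/" \
        -e "s/^CUDA_VERSION=.*/CUDA_VERSION=\"$3\"/" \
        -e "s/^CUDNN_VERSION=.*/CUDNN_VERSION=\"$4\"/" \
        "$script_file" > "$tmp_file" && mv "$tmp_file" "$script_file"
}
```

For example, set_versions nvidia_userdata.sh 410.104 10.0.130 7.5.0 would set the versions shown above, where nvidia_userdata.sh is a placeholder file name. The function does not check whether the combination is supported, so choose versions that appear together in the list of available versions.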

Automatic installation script V2.1

#!/bin/sh

# Specify the versions to install.
DRIVER_VERSION=""
CUDA_VERSION=""
CUDNN_VERSION=""

INSTALL_DIR="/root/nvidia"
log=${INSTALL_DIR}/nvidia_install.log

# On Ubuntu, the driver and CUDA are installed from .deb packages.
# On CentOS and SUSE, the driver and CUDA are installed from .run packages.
nvidia_script="nvidia_install.sh"
# Build the download URL from the source address in the instance metadata.
script_download_url=$(curl http://100.100.100.200/latest/meta-data/source-address | head -1)"/opsx/ecs/linux/binary/nvidia/script/${nvidia_script}"
echo "$script_download_url"

# Create the installation directory, then download and run the installer.
mkdir -p "$INSTALL_DIR" && cd "$INSTALL_DIR"
wget -t 10 --timeout=10 "$script_download_url" && sh "${INSTALL_DIR}/${nvidia_script}" $DRIVER_VERSION $CUDA_VERSION $CUDNN_VERSION
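
If you create the instance by calling the RunInstances operation, the script content must be Base64-encoded before it is passed in the UserData parameter. The following is a minimal sketch of the encoding step, assuming that the script has been saved to a local file. The function name encode_user_data is hypothetical.

```shell
#!/bin/sh
# Print the Base64 encoding of a file as a single line.
# Some base64 implementations wrap long output, so tr removes the newlines.
encode_user_data() {
    base64 < "$1" | tr -d '\n'
}
```

The resulting string can then be supplied as the value of the UserData parameter. Removing line breaks with tr avoids relying on implementation-specific options of base64 to disable wrapping.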