GPU-accelerated instances are a type of heterogeneous computing Elastic Compute Service (ECS) instance. GPU-accelerated instances can be created in the same manner as common ECS instances, but you must manually install the required drivers on GPU-accelerated instances. This topic describes how to create an NVIDIA GPU-accelerated instance by using the instance creation wizard in the ECS console.

Background information

GPU-accelerated instances support the following drivers:
  • GPU drivers: used to drive physical GPUs
  • GRID drivers: used to provide instances with graphics acceleration capabilities

Preparations

  1. Create an Alibaba Cloud account and complete the account information.
  2. Go to the Custom Launch tab in the ECS console.

Procedure

Step 1: Complete the settings in the Basic Configurations step

The settings in the Basic Configurations step include the basic purchase requirements (billing method, region, and zone) for the instance and the basic resources (instance type, image, and storage) required by the instance. After you complete the settings in the Basic Configurations step, click Next.

  1. Select a billing method.
    Different billing and charging rules apply to the instance based on the selected billing method. The state changes of instance resources also vary based on the billing method.
    The following billing methods are available:
    • Subscription: A billing method in which you pay for resources upfront to use over a period of time. For more information, see Subscription.
    • Pay-As-You-Go: A billing method in which you use resources first and pay for them afterward. The billing cycles of pay-as-you-go resources are accurate to the second. You can purchase and release pay-as-you-go resources on demand.
      Note We recommend that you use this billing method together with savings plans to reduce costs.
    • Preemptible Instance: A billing method in which you place a bid for available instance resources to create preemptible instances at a discount compared with pay-as-you-go instance pricing. Preemptible instances may be automatically released due to fluctuations in market price or insufficient resources for instance types. For more information, see Preemptible instances.
  2. Select a region and a zone.
    Select a region that is close to your geographical location to reduce latency. After an instance is created, the region and the zone of the instance cannot be changed. For more information, see Regions and zones.
  3. In the Instance Type section, specify parameters and select an instance type.
    1. Set Architecture to Heterogeneous Computing and set Category to Virtualization Compute Optimized Type with GPU or Compute Optimized Type with GPU. Then, select an instance type.
      Note
      • The available instance types vary based on the selected region. You can go to the ECS Instance Types Available for Each Region page to view the instance types available in each region.
      • If you have specific configuration requirements for the instance, for example, if you want the instance to have multiple bound elastic network interfaces (ENIs) or to use enhanced SSDs (ESSDs) or local disks, make sure that the selected instance type meets the requirements. For information about the features, use scenarios, and specifications of instance types, see Instance families.
      • If you want to create an instance for a specific scenario, click the Scenario-based Selection tab to view the instance types recommended for different scenarios. For example, you can set Business Scenario to AI Machine Learning to view the GPU-accelerated instance types appropriate for AI machine learning scenarios.
    2. Confirm the selected instance type next to Selected Instance Type.
    3. If you set Billing Method to Preemptible Instance, configure the Use Duration and Maximum Price for Instance Type parameters.
      The usage duration is the protection period of a preemptible instance. After the protection period ends, the instance may be released due to insufficient resources or lower bids than the market price. The Use Duration parameter has the following valid values:
      • One Hour: After the preemptible instance is created, it enters a 1-hour protection period during which it cannot be automatically released.
      • None: The preemptible instance is created without a protection period. Preemptible instances without a protection period are less expensive than preemptible instances with a protection period.
      The Maximum Price for Instance Type parameter has the following valid values:
      • Use Automatic Bid: Uses the real-time market price of the instance type. This price never exceeds the price of the corresponding pay-as-you-go instance. Automatic bidding can prevent the instance from being released due to lower bids than the market price, but cannot prevent the instance from being released due to insufficient resources.
      • Set Maximum Price: Sets a maximum price. If the real-time market price exceeds the maximum price or if resources are insufficient, the preemptible instance is released.
    4. Specify the number of instances to create.
      You can create a maximum of 100 instances each time by using the wizard. In addition, the number of instances within your account cannot exceed the quota. The quota displayed on the buy page prevails. For more information, see View and increase instance quotas.
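The release rules for preemptible instances described in this step can be sketched as a small shell function. This is only an illustration of the documented rules, not how ECS implements bidding. The function name may_be_released and all of its arguments are hypothetical.

```shell
# Hypothetical helper: decide whether a preemptible instance may be released.
# Arguments:
#   $1 age of the instance in seconds
#   $2 protection period in seconds (3600 models One Hour, 0 models None)
#   $3 real-time market price
#   $4 maximum price ("auto" models Use Automatic Bid)
#   $5 "yes" if resources are sufficient, "no" otherwise
may_be_released() {
    age=$1; protection=$2; market=$3; max_price=$4; resources=$5

    # During the protection period the instance is not automatically released.
    if [ "$age" -lt "$protection" ]; then
        echo "no"
        return 0
    fi

    # Insufficient resources can trigger a release under either bidding mode.
    if [ "$resources" = "no" ]; then
        echo "yes"
        return 0
    fi

    # Automatic bidding follows the market price, so price alone never triggers a release.
    if [ "$max_price" = "auto" ]; then
        echo "no"
        return 0
    fi

    # Prices are decimals, so compare them with awk instead of test.
    if awk -v m="$market" -v x="$max_price" 'BEGIN { exit !((m + 0) > (x + 0)) }'; then
        echo "yes"
    else
        echo "no"
    fi
}
```

For example, may_be_released 4000 3600 0.90 0.50 yes models an instance that is past its 1-hour protection period while the market price is above the maximum price.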
  4. Select an image.
    Images contain information necessary to run instances. Alibaba Cloud provides you with a variety of images. The following image types are available:
    • Public image: Public images are the licensed base images provided by Alibaba Cloud. Public images for Windows Server operating systems (OSs) and mainstream Linux OSs are provided. For more information, see Overview.
    • Custom image: You can create or import your own custom images. Custom images can contain initial OS environments and customizations such as application environments and software configurations to save time in making repeated configurations. For more information, see Overview.
    • Shared image: Shared images are the custom images shared by other Alibaba Cloud accounts. You can use the images shared to you to create instances. For more information, see Share or unshare custom images.
    • Alibaba Cloud Marketplace image: An extensive range of images are provided in Alibaba Cloud Marketplace. Alibaba Cloud Marketplace images are thoroughly reviewed by Alibaba Cloud and can be used to create instances for website building and application development purposes without additional configurations. For more information, see Alibaba Cloud Marketplace images.
    If you select a GPU-accelerated compute-optimized instance type in the Instance Type section, GPU and GRID drivers can be installed. If you select a vGPU-accelerated instance type in the Instance Type section, only GRID drivers can be installed because the instance type is equipped with vGPUs that are generated from GPU virtualization with mediated pass-through. Meanwhile, selected images also affect how drivers are installed. Different methods are used to install different drivers.
    • GPU driver
      You can use one of the following methods to install the GPU driver:
      • Select Auto-install GPU Driver.
        Note Only some Linux public images allow the GPU driver to be automatically installed when you create instances. If you select Shared Image or Custom Image, you can install the GPU driver only after you create the instance.
        In the Image section, you can select Auto-install GPU Driver, AIACC-Training, and AIACC-Inference only after you select an image and an image version. If you select Auto-install GPU Driver, you must select a GPU driver version, a Compute Unified Device Architecture (CUDA) library version, and an NVIDIA CUDA Deep Neural Network (cuDNN) library version to install at the same time.
        The following section describes GPU drivers, AIACC-Training, and AIACC-Inference and introduces the available versions of GPU drivers, CUDA libraries, and cuDNN libraries:
        • GPU drivers are used to drive physical GPUs and can work in an efficient manner when used together with the CUDA and cuDNN libraries. For a new business system, we recommend that you select the latest versions of the GPU driver, CUDA library, and cuDNN library. The following entries list, for each CUDA version, the available GPU driver and cuDNN versions, the supported versions of the public image (only images supplied and tested by Alibaba Cloud), and the supported instance families.
          CUDA 11.0.2
          • GPU driver: 450.80.02
          • cuDNN: 8.0.4
          • Supported public images: Alibaba Cloud Linux 2; Ubuntu 20.04, Ubuntu 18.04, and Ubuntu 16.04; CentOS 8.x and CentOS 7.x
          • Supported instance families: gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i
          CUDA 10.2.89
          • GPU driver: 450.80.02 and 440.64.00
          • cuDNN: 8.0.4 and 7.6.5
          • Supported public images: Alibaba Cloud Linux 2; Ubuntu 18.04 and Ubuntu 16.04; CentOS 8.x, CentOS 7.x, and CentOS 6.x
          • Supported instance families: gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i
          CUDA 10.1.168
          • GPU driver: 450.80.02 and 440.64.00
          • cuDNN: 8.0.4, 7.6.5, and 7.5.0
          • Supported public images: Ubuntu 18.04 and Ubuntu 16.04; CentOS 7.x and CentOS 6.x
          • Supported instance families: gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i
          CUDA 10.0.130
          • GPU driver: 450.80.02 and 440.64.00
          • cuDNN: 7.6.5, 7.5.0, 7.4.2, and 7.3.1
          • Supported public images: Ubuntu 18.04 and Ubuntu 16.04; CentOS 7.x and CentOS 6.x
          • Supported instance families: gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i
          CUDA 9.2.148
          • GPU driver: 450.80.02, 440.64.00, and 390.116
          • cuDNN: 7.6.5, 7.5.0, 7.4.2, 7.3.1, and 7.1.4
          • Supported public images: Ubuntu 16.04; CentOS 7.x and CentOS 6.x
          • Supported instance families: gn6v, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6e, and ebmgn5i
          CUDA 9.0.176
          • GPU driver: 450.80.02, 440.64.00, and 390.116
          • cuDNN: 7.6.5, 7.5.0, 7.4.2, 7.3.1, 7.1.4, and 7.0.5
          • Supported public images: Ubuntu 16.04; CentOS 7.x and CentOS 6.x; SUSE 12sp2
          • Supported instance families: gn6v, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6e, and ebmgn5i
          CUDA 8.0.61
          • GPU driver: 450.80.02, 440.64.00, and 390.116
          • cuDNN: 7.1.3 and 7.0.5
          • Supported public images: Ubuntu 16.04; CentOS 7.x and CentOS 6.x
          • Supported instance families: gn5 and gn5i; ebmgn5i
          Note If you want to replace the operating system after the instance is created, make sure that your selected image allows GPU drivers to be automatically installed.
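Before you fill in version numbers by hand, it can help to check them against the versions listed above. The following sketch encodes the listed GPU driver and cuDNN versions for each CUDA version. The function name is_listed_combo is hypothetical, and the sketch assumes that any GPU driver and cuDNN version listed for the same CUDA version can be combined, which the list itself does not state explicitly.

```shell
# Hypothetical helper: succeed if the CUDA, GPU driver, and cuDNN versions
# are listed together in this topic.
# $1 = CUDA version, $2 = GPU driver version, $3 = cuDNN version
is_listed_combo() {
    cuda=$1; driver=$2; cudnn=$3
    case "$cuda" in
        11.0.2)   drivers="450.80.02"; cudnns="8.0.4" ;;
        10.2.89)  drivers="450.80.02 440.64.00"; cudnns="8.0.4 7.6.5" ;;
        10.1.168) drivers="450.80.02 440.64.00"; cudnns="8.0.4 7.6.5 7.5.0" ;;
        10.0.130) drivers="450.80.02 440.64.00"; cudnns="7.6.5 7.5.0 7.4.2 7.3.1" ;;
        9.2.148)  drivers="450.80.02 440.64.00 390.116"; cudnns="7.6.5 7.5.0 7.4.2 7.3.1 7.1.4" ;;
        9.0.176)  drivers="450.80.02 440.64.00 390.116"; cudnns="7.6.5 7.5.0 7.4.2 7.3.1 7.1.4 7.0.5" ;;
        8.0.61)   drivers="450.80.02 440.64.00 390.116"; cudnns="7.1.3 7.0.5" ;;
        *) return 1 ;;
    esac
    # Pad with spaces so that only whole version strings match.
    case " $drivers " in *" $driver "*) ;; *) return 1 ;; esac
    case " $cudnns " in *" $cudnn "*) ;; *) return 1 ;; esac
    return 0
}
```

For example, is_listed_combo 10.2.89 440.64.00 8.0.4 succeeds, whereas is_listed_combo 11.0.2 440.64.00 8.0.4 fails because only driver 450.80.02 is listed for CUDA 11.0.2.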
        • AIACC-Training is an AI accelerator developed by Alibaba Cloud. AIACC-Training can accelerate major AI computing frameworks such as TensorFlow, PyTorch, MxNet, and Caffe to achieve significant gains in training performance. For more information, see AIACC-Training.
          Note AIACC-Training is not supported in CentOS 8, CentOS 6, SUSE Linux, or Alibaba Cloud Linux.
        • AIACC-Inference is an AI accelerator developed by Alibaba Cloud. AIACC-Inference can accelerate the major AI computing framework TensorFlow and exportable frameworks in the Open Neural Network Exchange (ONNX) format to achieve significant gains in inference performance. For more information, see AIACC-Inference.
          Note AIACC-Inference is not supported in CentOS 8, CentOS 6, SUSE Linux, or Alibaba Cloud Linux.
      • Select an Alibaba Cloud Marketplace image that is pre-installed with a GPU driver and relevant software. Alibaba Cloud Marketplace provides images that are pre-installed with operating systems, application environments, and various software. Alibaba Cloud Marketplace images are thoroughly reviewed by Alibaba Cloud to ensure quality and stability. You can use these images to create instances without additional configurations.

        Example: the NVIDIA GPU Cloud Virtual Machine Image deep learning image. The image is pre-installed with an NVIDIA GPU-specific optimized deep learning framework and an optimized environment for High Performance Computing (HPC) application containers. For more information, see Deploy an NGC environment on instances with GPU capabilities.

      • Manually install the GPU driver after you create the instance. For more information, see Manually install a GPU driver.
    • GRID driver

      No images that are pre-installed with GRID drivers are provided. You must purchase a GRID license from NVIDIA. Then, you must manually install a GRID driver and activate the license after the instance is created.

  5. Complete the storage and related settings.
    Instances obtain storage capabilities by attaching system disks, data disks, and Apsara File Storage NAS file systems. ECS provides disks and local disks to meet the requirements of different scenarios.
    Disks can be used as system disks or data disks and include ESSDs, standard SSDs, and ultra disks. For more information, see Cloud disks.
    Note The billing method of a disk that is created along with an instance is the same as that of the instance.
    Local disks can be used only as data disks. If an instance family (such as instance families with local SSDs and big data instance families) is equipped with local disks, the information of the local disks is displayed. For more information, see Local disks.
    Note You cannot attach local disks to instances yourself.
    1. Select a system disk.
      System disks are used to install operating systems. The default capacity of a system disk is 40 GiB. However, the actual minimum capacity is related to the image. The valid system disk capacity range (in GiB) for each image is as follows:
      • Linux (excluding CoreOS and Red Hat): [max{20, image size}, 500]
      • FreeBSD: [max{30, image size}, 500]
      • CoreOS: [max{30, image size}, 500]
      • Red Hat: [max{40, image size}, 500]
      • Windows: [max{40, image size}, 500]
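The capacity ranges above follow one rule: the lower bound is the larger of a per-image floor and the image size, and the upper bound is 500 GiB. A minimal sketch of that rule follows. The function name system_disk_range and the image family keywords are hypothetical.

```shell
# Hypothetical helper: print the valid system disk capacity range in GiB
# as "<minimum> <maximum>".
# $1 = image family: linux, freebsd, coreos, redhat, or windows
# $2 = image size in GiB
system_disk_range() {
    case "$1" in
        linux)          floor=20 ;;   # Linux excluding CoreOS and Red Hat
        freebsd|coreos) floor=30 ;;
        redhat|windows) floor=40 ;;
        *) echo "unknown image family: $1" >&2; return 1 ;;
    esac
    # The lower bound is max{floor, image size}.
    if [ "$2" -gt "$floor" ]; then
        floor=$2
    fi
    echo "$floor 500"
}
```

For example, a 60 GiB Windows image needs a system disk of at least 60 GiB, because the image size exceeds the 40 GiB floor.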
    2. Optional: Select a data disk.
      You can create an empty disk or create a disk from a snapshot. A snapshot is a point-in-time backup of a disk. You can import data in a quick manner by creating a disk from a snapshot. When you choose a data disk, you can encrypt the disk to meet the requirements of scenarios such as data security and regulatory compliance. For more information about data encryption, see Encryption overview.
      Note The number of data disks that can be attached to a single instance is limited. For more information, see the "Elastic Block Storage (EBS) limits" section in Limits.
    3. Optional: Select a NAS file system.
      If you have a large amount of data for sharing by multiple instances, we recommend that you use a NAS file system to reduce costs in data transmission and synchronization.

      Select an existing NAS file system or click Create a General Purpose NAS File System in the NAS console. For more information about how to create a NAS file system, see Create a General-purpose NAS file system in the NAS console. After a NAS file system is created, go back to the ECS instance creation wizard and click the refresh icon to query the most recent NAS file system list. For more information about how to mount NAS file systems, see Mount NAS file systems when you purchase an ECS instance.

  6. Optional: Configure the snapshot service.
    You can use automatic snapshot policies to periodically back up disks to prevent risks such as accidental data deletion.

    Select an existing snapshot policy or click Create Automatic Snapshot Policy to create an automatic snapshot policy on the Snapshots page. For more information about how to create an automatic snapshot policy, see Create an automatic snapshot policy. After an automatic snapshot policy is created, go back to the ECS instance creation wizard and click the refresh icon to query the most recent automatic snapshot policy list.

Step 2: Complete the settings in the Networking step

You can make network and security group configurations to allow the instance to communicate with the Internet and other Alibaba Cloud resources and safeguard the instance in the network. After you complete the settings in the Networking step, click Next.

  1. Specify parameters in the Network Type and Public IP Address sections.
    Parameter Description References
    Network Type Select VPC.

    A virtual private cloud (VPC) is an isolated network exclusively dedicated for your use. You have full control over your VPC. For example, you can specify a private CIDR block and configure route tables and gateways for the VPC.

    If you do not want to use custom VPCs or vSwitches in the specified region, you can skip this operation. Then, the system creates a default VPC and a default vSwitch in that region.
    Note You can skip this operation only if you have not created VPCs in the region.

    Select an existing VPC and vSwitch. Alternatively, click go to the VPC console to create a VPC and a vSwitch in the VPC console. After the VPC and the vSwitch are created, go back to the ECS instance creation wizard and click the refresh icon to obtain the most recent VPC and vSwitch lists.

    Public IP Address If you selected an image of Windows 2008 R2 or earlier in the Basic Configurations step, you must select Assign Public IPv4 Address now or associate an elastic IP address (EIP) after the instance is created. This way, you can connect to the instance over other protocols such as the Remote Desktop Protocol (RDP) built into Windows, PC over IP (PCoIP), and XenDesktop HDX 3D. If you did not select Assign Public IPv4 Address or associate an EIP, you cannot connect to the instance by using a virtual network console (VNC) management terminal after the GPU driver is installed. A black screen or the startup interface persists when you attempt to connect to the instance.
    Note RDP does not support applications such as DirectX and OpenGL. You must install the VNC service and client on your own.
    Perform the following operations:
    1. Select Assign Public IPv4 Address.
    2. Set Bandwidth Billing to specify a billing method for network usage.
      • Pay-By-Bandwidth: You are charged based on the specified bandwidth. This billing method for network usage is suitable for scenarios that require stable network bandwidth.
      • Pay-By-Traffic: You are charged based on the actual data transfers. You can configure a maximum bandwidth for inbound and outbound traffic to avoid unmanageable fees caused by traffic bursts. This billing method for network usage is suitable for scenarios that require highly variable bandwidths, such as scenarios where traffic is usually low but spikes occur occasionally.
    3. Set Bandwidth or Peak Bandwidth.
    For more information, see What is Elastic IP Address?
  2. Select a security group.
    A security group is a virtual firewall that is used to control the inbound and outbound traffic of instances in the security group. For more information, see Overview.

    If you do not want to configure a security group when you create an instance, you can skip this step. The system then creates a default security group. The default security group allows inbound traffic over Secure Shell Protocol (SSH) port 22, Remote Desktop Protocol (RDP) port 3389, and Internet Control Message Protocol (ICMP). You can modify the security group configurations after the group is created.

    1. To create a security group, click Create Security Group.
      For more information about how to configure the security group, see Create a security group.
    2. Click Reselect Security Group.
    3. In the Select Security Group dialog box, select one or more security groups and click Select.
  3. Configure an ENI.
    ENIs include primary ENIs and secondary ENIs. Primary ENIs cannot be unbound from instances. They can be created and released only along with instances. Secondary ENIs can be bound to or be unbound from instances to allow traffic to be switched between instances. To create a secondary ENI when you create an instance, click the add-nic icon and select a vSwitch to which the secondary ENI belongs.
    Note You can bind only one secondary ENI when you create an instance. You can also create secondary ENIs and bind them to the instance after the instance is created. For more information about the number of ENIs that can be bound to instances of different instance types, see Instance families.

Step 3: Complete the settings in the System Configurations step

System configurations are used to define what instance information to display in the ECS console and in the operating system or how to use the instance. System configurations include logon credentials, a hostname, and user data. After you complete the settings in the System Configurations step, click Next.

  1. Select and configure logon credentials.

    We recommend that you select Key Pair or Password. If you select Set Later, you must bind a Secure Shell (SSH) key pair or set a password by using the password reset feature before you can connect to the instance by using a VNC management terminal. You must restart the instance for the modification to take effect. If you restart the instance while the GPU driver is being installed, the GPU driver cannot be installed.

  2. Specify the instance name that you want to display in the ECS console and the hostname that can be obtained from within the operating system.
    If you want to create multiple instances, you can set sequential instance names and hostnames to facilitate management. For more information about how to configure sequential instance names and hostnames, see Batch configure sequential names or hostnames for multiple instances.
  3. Configure advanced options.
    1. Select an instance Resource Access Management (RAM) role.
      Instance RAM roles enable ECS instances to assume roles with specific access permissions. The instance can access the APIs of specific Alibaba Cloud services and manage specific Alibaba Cloud resources based on a Security Token Service (STS) temporary credential. This ensures security.

      Select an existing instance RAM role or click Create Instance RAM Role to create an instance RAM role. After an instance RAM role is created, go back to the ECS instance creation wizard and click the refresh icon to query the most recent instance RAM role list. For more information, see Bind an instance RAM role.

    2. Select an access mode of instance metadata.
      ECS instance metadata contains the information of instances in Alibaba Cloud. You can view the metadata of running instances and configure or manage the instances based on their metadata. You can access instance metadata in normal or security-hardening mode. For more information, see View instance metadata.
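As an illustration of normal mode, the following sketch reads a metadata item with curl from inside an instance. The base URL and the source-address item are taken from the automatic installation script in this topic. The function names metadata_url and get_metadata are hypothetical. Security-hardening mode requires an additional token and is not covered by this sketch.

```shell
# Base URL of the ECS metadata service, as used by the automatic installation script.
METADATA_BASE="http://100.100.100.200/latest/meta-data"

# Build the URL of one metadata item, for example source-address.
metadata_url() {
    echo "${METADATA_BASE}/$1"
}

# Fetch one metadata item in normal mode.
# This works only from inside an ECS instance; elsewhere the request times out.
get_metadata() {
    curl -s --connect-timeout 5 "$(metadata_url "$1")"
}
```

From inside an instance, get_metadata source-address | head -n 1 prints the first source address, which is the value that the automatic installation script uses to build its download URL.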
    3. Configure user data.
      User data can be run as scripts on instance startup to automate instance configurations, or be used as common data and passed into instances. For more information, see Manage the user data of Linux instances and Manage the user data of Windows instances.
      • If you selected Auto-install GPU Driver, AIACC-Training, or AIACC-Inference in the Basic Configurations step, the automatic installation script is displayed in the User Data field. The first time the new instance is started, cloud-init automatically runs the automatic installation script.
      • If you did not select Auto-install GPU Driver, AIACC-Training, or AIACC-Inference in the Basic Configurations step, you can manually enter the automatic installation script in the User Data field. The GPU driver, AIACC-Training, or AIACC-Inference is automatically installed when the instance is started. After the GPU driver is installed, the instance is automatically restarted for the GPU driver to run. For information about how to prepare the automatic installation script, see the Automatic installation script section in this topic.
        Note The GPU driver is more stable in persistence mode. The automatic installation script automatically enables persistence mode for the GPU driver. The script also adds the corresponding commands as a Linux system service to ensure that persistence mode is automatically enabled for the GPU driver on instance startup.
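After the driver is installed, you may want to confirm that persistence mode is enabled on every GPU. The following sketch parses the per-GPU output of nvidia-smi. The function name all_persistent is hypothetical, and the sketch assumes that nvidia-smi prints one line per GPU that reads Enabled or Disabled.

```shell
# Read one persistence-mode value per line on stdin and succeed only if
# at least one GPU is reported and every GPU reports "Enabled".
all_persistent() {
    awk '
        { seen = 1 }
        $1 != "Enabled" { bad = 1 }
        END { exit !(seen && !bad) }
    '
}

# On a GPU-accelerated instance with the driver installed:
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_persistent
# To enable persistence mode manually (requires root):
#   nvidia-smi -pm 1
```

The function reads from standard input so that the check can be tried on a machine without a GPU by piping sample text into it.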

Step 4: (Optional) Complete the settings in the Grouping (Optional) step

Grouping configurations provide methods, such as tags and resource groups, to manage instances in batches. Complete the settings in the Grouping (Optional) step and click Next.

  1. Add tags.
    Each tag consists of a tag key and a tag value. You can add tags to resources that have identical characteristics, such as resources that belong to the same organization and resources that serve the same purpose. You can use tags to search for and manage resources in an efficient manner. For more information, see Overview.

    Select an existing tag, or enter a key and a value to create a tag.

  2. Select a resource group.
    Resource groups allow you to manage cross-region and cross-service resources based on your business requirements and manage the permissions of resource groups. For more information, see Resource groups.

    Select an existing resource group, or click click here to create a resource group on the Resource Group page. After a resource group is created, go back to the ECS instance creation wizard and click the refresh icon to query the most recent resource group list. For more information, see Create a resource group.

  3. Select a deployment set.
    Deployment sets support the high-availability policy. After you apply a high-availability policy to a deployment set, all the instances within the deployment set are distributed across different physical servers to ensure business availability and disaster recovery at the underlying layer.

    Select an existing deployment set or click manage the deployment set to create a deployment set. After the deployment set is created, go back to the ECS instance creation wizard and click the refresh icon to query the most recent deployment set list. For more information, see Create a deployment set.

  4. Select a dedicated host.
    A dedicated host is a cloud host whose physical resources are exclusively reserved for a single tenant. Dedicated hosts meet strict regulatory compliance requirements and support bring your own license (BYOL) when you migrate services to Alibaba Cloud.

    Select an existing dedicated host or click create a DDH to create a dedicated host. After the dedicated host is created, go back to the ECS instance creation wizard and click the refresh icon to query the most recent dedicated host list. For more information, see Create a dedicated host.

  5. Select a private pool.
    After an elasticity assurance or a capacity reservation is created, the system generates a private pool to reserve resources for a specific number of instances that have specific attributes. During the validity period of the elasticity assurance or the capacity reservation, you always have access to the resources reserved in the private pool when you want to create instances. For more information, see Overview of Resource Assurance.
    Note Only pay-as-you-go instances can be created from the resources reserved by elasticity assurances or capacity reservations.
    The following private pool options are available:
    • Open: The capacity in open private pools has a priority over public pool resources. If no capacity is available in private pools, the system attempts to use public pool resources.
    • None: The capacity in private pools is not used.
    • Targeted: The capacity in a specified or open private pool is used to create instances. If no capacity is available in the specified private pool, the instances cannot be created.
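The three private pool options differ only in what happens when private pool capacity runs out. The following sketch models that decision. The function name capacity_source and its arguments are hypothetical, and the sketch does not call any Alibaba Cloud API.

```shell
# Hypothetical helper: print which pool would supply capacity for a new instance.
# $1 = private pool option: open, none, or targeted
# $2 = number of instances that the matching private pool can still supply
capacity_source() {
    case "$1" in
        open)
            # Private pool capacity takes priority; the public pool is the fallback.
            if [ "$2" -gt 0 ]; then echo "private"; else echo "public"; fi ;;
        none)
            # Private pool capacity is never used.
            echo "public" ;;
        targeted)
            # No fallback: without private capacity the instance cannot be created.
            if [ "$2" -gt 0 ]; then echo "private"; else echo "fail"; return 1; fi ;;
        *)
            echo "unknown option: $1" >&2; return 1 ;;
    esac
}
```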

Step 5: Confirm the order

Before you create the instance, check the selected configurations such as the usage duration to ensure that they meet your requirements.

  1. Check the selected configurations.
    To modify a configuration, click the edit icon to go to the corresponding page. You can save the selected configurations as a template and then use the template to create instances that have identical configurations. The following buttons can be used to save the configurations as a template:
    • Save as Launch Template: Saves the configurations as a launch template. Then, you can create instances from this launch template without the need to make these configurations again. For more information, see Create an instance by using a launch template.
    • View Open API: Generates the API best-practice workflow and SDK examples for your reference.
    • Save as ROS Template: Saves the configurations as a Resource Orchestration Service (ROS) template. Then, you can create stacks from this template in the ROS console to deliver resources with a single click. For more information, see Create a stack.
  2. Configure the usage duration of the instance.
    • Pay-as-you-go instance: Set an automatic release time for the instance. You can also manually release the instance or set an automatic release time for the instance after it is created. For more information, see Release an instance.
    • Subscription instance: Set Duration and optionally select Auto-renewal. You can also manually renew the instance or enable auto-renewal for the instance after it is created. For more information, see Renewal overview.
  3. Read the ECS Terms of Service and Product Terms of Service. If you agree to them, select ECS Terms of Service and Product Terms of Service.
  4. View the total fees of the instance in the lower part of the page. If the fees are correct, confirm the order to create the instance and complete the payment.
    If you selected Auto-install GPU Driver, the GPU driver, CUDA library, and cuDNN library of the selected versions are automatically installed after the instance is created. The automatic installation process may take 10 to 20 minutes based on the internal bandwidth and the number of CPU cores of different instance types. You can connect to the instance to view the installation process. You can also view the auto_install.log file in the /root/auto_install directory after the GPU driver and the CUDA and cuDNN libraries are installed. The installation status is displayed as follows:
    • In progress: The installation progress bar is displayed.
    • Successful: ALL INSTALL OK appears as the installation result.
    • Failed: INSTALL FAIL appears as the installation result.
    Notice When the GPU driver and the CUDA and cuDNN libraries are being installed, the GPUs are unavailable. To prevent installation failures and keep the instance available, do not perform operations or install other GPU-related software on the instance until the GPU driver and the CUDA and cuDNN libraries are installed.
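If you prefer to check the installation result from a script instead of watching the terminal, you can search the log file for the result strings. The following sketch assumes that the ALL INSTALL OK and INSTALL FAIL strings are also written to /root/auto_install/auto_install.log, which this topic does not state explicitly. The function name install_status is hypothetical.

```shell
# Print the installation status recorded in the log file given as $1
# (default: /root/auto_install/auto_install.log).
install_status() {
    log=${1:-/root/auto_install/auto_install.log}
    if [ ! -r "$log" ]; then
        # The log does not exist yet or cannot be read.
        echo "unknown"
    elif grep -q "ALL INSTALL OK" "$log"; then
        echo "successful"
    elif grep -q "INSTALL FAIL" "$log"; then
        echo "failed"
    else
        echo "in progress"
    fi
}
```

Run install_status as the root user on the instance. Because the installation may take 10 to 20 minutes, an in progress result shortly after the instance starts is expected.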

Automatic installation script

You can use the automatic installation script in the following scenarios:
  • If you did not select Auto-install GPU Driver, AIACC-Training, or AIACC-Inference in the Basic Configurations step, you can manually enter the automatic installation script in the User Data field in the System Configurations step.
  • If you call the RunInstances operation to create an instance, you can upload the automatic installation script only by setting the UserData parameter.
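When you pass the script through the UserData parameter of RunInstances, the value is expected to be Base64-encoded. Confirm this against the API reference for your SDK version. The following sketch encodes a script file into a single-line Base64 string. The function name encode_user_data is hypothetical.

```shell
# Encode a script file as a single-line Base64 string for the UserData parameter.
# $1 = path of the script file
encode_user_data() {
    # tr removes the line wrapping that some base64 implementations add.
    base64 < "$1" | tr -d '\n'
}
```

For example, encode_user_data auto_install.sh prints the encoded script, where auto_install.sh is a placeholder name for the file in which you saved the automatic installation script.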
The latest version of the automatic installation script is v3.3. The latest version has the following benefits:
  • Provides the latest versions of the GPU driver, CUDA library, and cuDNN library.
  • Shows the installation progress after you connect to the instance.
The automatic installation script contains the following content:
#!/bin/sh

# Set the versions to install and the optional components (TRUE or FALSE).
IS_INSTALL_AIACC_TRAIN=""
IS_INSTALL_AIACC_INFERENCE=""
DRIVER_VERSION=""
CUDA_VERSION=""
CUDNN_VERSION=""
IS_INSTALL_RAPIDS="FALSE"

INSTALL_DIR="/root/auto_install"

# The driver and CUDA are installed from .run packages.
auto_install_script="auto_install_v3.3.sh"

# Read the download source from the instance metadata and build the script URL.
script_download_url=$(curl http://100.100.100.200/latest/meta-data/source-address | head -n 1)"/opsx/ecs/linux/binary/script/${auto_install_script}"
echo "$script_download_url"

# Create the working directory (no error if it already exists), then download and run the script.
mkdir -p "$INSTALL_DIR" && cd "$INSTALL_DIR" || exit 1
wget -t 10 --timeout=10 "$script_download_url" && sh "${INSTALL_DIR}/${auto_install_script}" "$DRIVER_VERSION" "$CUDA_VERSION" "$CUDNN_VERSION" "$IS_INSTALL_AIACC_TRAIN" "$IS_INSTALL_AIACC_INFERENCE" "$IS_INSTALL_RAPIDS"
Note The automatic installation script uses the .run installation package to install modules, including the GPU driver.
When you use the automatic installation script, you must set parameters to specify the versions of the GPU driver, CUDA library, and cuDNN library and specify whether to install AIACC-Training and AIACC-Inference.
  • If you want to install AIACC-Training, set IS_INSTALL_AIACC_TRAIN to TRUE. Otherwise, set IS_INSTALL_AIACC_TRAIN to FALSE.
  • If you want to install AIACC-Inference, set IS_INSTALL_AIACC_INFERENCE to TRUE. Otherwise, set IS_INSTALL_AIACC_INFERENCE to FALSE.
Example:
IS_INSTALL_AIACC_TRAIN="FALSE"
IS_INSTALL_AIACC_INFERENCE="FALSE"
DRIVER_VERSION="440.64.00"
CUDA_VERSION="10.2.89"
CUDNN_VERSION="8.0.4"
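Because the script passes these values as positional arguments, an empty or misspelled value can cause the installation to fail. The following sketch is a hypothetical pre-flight check that you could run after setting the variables. The function name check_install_params is not part of the automatic installation script.

```shell
# Hypothetical pre-flight check for the variables at the top of the script.
# Succeeds only if every version is set and both AIACC switches are TRUE or FALSE.
check_install_params() {
    for v in "$DRIVER_VERSION" "$CUDA_VERSION" "$CUDNN_VERSION"; do
        if [ -z "$v" ]; then
            echo "a version parameter is empty" >&2
            return 1
        fi
    done
    for flag in "$IS_INSTALL_AIACC_TRAIN" "$IS_INSTALL_AIACC_INFERENCE"; do
        case "$flag" in
            TRUE|FALSE) ;;
            *) echo "AIACC switches must be TRUE or FALSE" >&2; return 1 ;;
        esac
    done
    return 0
}
```

With the example values above, check_install_params succeeds. It fails if, for instance, CUDA_VERSION is left empty or IS_INSTALL_AIACC_TRAIN is set to a value other than TRUE or FALSE.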