This topic describes how to create a GPU-accelerated compute-optimized instance that runs Linux and on which the graphics processing unit (GPU) driver is installed automatically. We recommend this type of instance for scenarios that do not require graphics computing, such as deep learning and AI.

Background information

Elastic GPU Service can install GPU drivers automatically only when you create an instance of a specific type from a Linux public image. The instance must belong to a GPU-accelerated compute-optimized instance family or a GPU-accelerated compute-optimized ECS Bare Metal Instance family. Elastic GPU Service does not install GPU drivers automatically in the following scenarios:

The driver installation methods may vary based on individual use cases or the types of drivers that you want to install. For more information, see Install NVIDIA drivers.

Preparations

  1. Create an Alibaba Cloud account and complete the account information.
  2. Go to the Custom Launch tab in the ECS console.

Procedure

Step 1: Configure the parameters in the Basic Configurations step

In the Basic Configurations step, you can configure basic parameters for purchasing an instance, such as the billing method, region, and zone, and basic resources required by the instance, such as the instance type, image, and storage size. After you configure the parameters in the Basic Configurations step, click Next.

  1. Select a billing method.
    The billing method determines how the instance is billed and charged, and how the status of the resources deployed on the instance changes at different points in the resource lifecycle. The following billing methods are available:
    • Subscription: You pay for resources before you use them. For more information, see Subscription.
    • Pay-As-You-Go: You pay for resources after you use them. Pay-as-you-go instances are billed by the second, so you can purchase and release instances as your business requires.
      Note We recommend that you use this billing method together with savings plans to reduce costs.
    • Preemptible Instance: You pay for resources after you use them. A preemptible instance costs less than a pay-as-you-go instance, but the system may release it when the market price fluctuates or when resources of the instance type are insufficient. For more information, see Preemptible instances.
  2. Select a region and a zone.
    Select a region that is close to your geographical location to reduce latency. After an instance is created, the region and the zone of the instance cannot be changed. For more information, see Regions and zones.
  3. In the Instance Type section, specify parameters and select an instance type.
    1. Set Architecture to Heterogeneous Computing and set Category to Compute Optimized Type with GPU. You can also set Architecture to ECS Bare Metal Instance and set Category to GPU Type. Then, select an instance type.
      Note
      • The available instance types vary based on the selected region. To view the instance types that are available in each region, go to the ECS Instance Types Available for Each Region page.
      • If you have specific configuration requirements for the instance, for example, if you want to bind multiple elastic network interfaces (ENIs) or use enhanced SSDs (ESSDs) or local disks, make sure that the instance type that you select meets them. For information about the features, scenarios, and specifications of each instance type, see Instance families.
      • If you want to create instances that are used for specific scenarios, click the Scenario-based Selection tab to view the instance types that are recommended for different scenarios. For example, you can set Business Scenario to AI Machine Learning to view the GPU-accelerated instance types that are available for AI machine learning scenarios.
    2. Confirm the selected instance type next to Selected Instance Type.
    3. If you set Billing Method to Preemptible Instance, configure the Use Duration and Maximum Price for Instance Type parameters.
      The use duration is the protection period of a preemptible instance. After the protection period ends, the instance may be released because resources are insufficient or because your bid is lower than the market price. The Use Duration parameter has the following valid values:
      • One Hour: After the preemptible instance is created, it enters a 1-hour protection period during which it cannot be automatically released.
      • None: The preemptible instance is created without a protection period. Preemptible instances without a protection period are less expensive than those with one.
      The Maximum Price for Instance Type parameter has the following valid values:
      • Use Automatic Bid: The real-time market price of the instance type is used, and the price never exceeds the price of the corresponding pay-as-you-go instance. Automatic bidding prevents the preemptible instance from being released because your bid is lower than the market price, but it cannot prevent release due to insufficient resources.
      • Set Maximum Price: You set a maximum price. If the real-time market price exceeds the maximum price or resources are insufficient, the preemptible instance is released.
    4. Specify the number of instances to create.
      You can create up to 100 instances at a time in the wizard. The total number of instances in your account also cannot exceed your instance quota, and the quota displayed on the buy page prevails. For more information, see View and increase instance quotas.
  4. Select an image.
    1. In the Image section, click Public Image, and select the Linux OS and version that you want to use.
    2. Select Auto-install GPU Driver, and determine whether to select AIACC-Training and AIACC-Inference based on your business requirements. Then, select the versions for the NVIDIA Compute Unified Device Architecture (CUDA) library, GPU driver, and NVIDIA CUDA Deep Neural Network (cuDNN) library that you want to use.
      The following information describes GPU drivers, AIACC-Training, and AIACC-Inference:
      • A GPU driver drives the physical GPUs and works efficiently together with the CUDA and cuDNN libraries. If you select Auto-install GPU Driver, a GPU driver, a CUDA library, and a cuDNN library are installed together. Automatic installation is supported only on instances created from specific versions of Linux public images. The available versions are listed below.
        Note For a new business system, we recommend that you select the latest versions of the GPU driver, CUDA library, and cuDNN library.
        • CUDA 11.2.2
          • GPU driver: 460.91.03
          • cuDNN: 8.1.1
          • Public images: Alibaba Cloud Linux 2 and Alibaba Cloud Linux 3; Ubuntu 20.04, Ubuntu 18.04, and Ubuntu 16.04; CentOS 8.x and CentOS 7.x
          • Instance families: gn7, gn7i, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7, ebmgn7i, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i
        • CUDA 11.0.2
          • GPU driver: 460.91.03
          • cuDNN: 8.1.1 or 8.0.4
          • Public images: Alibaba Cloud Linux 2; Ubuntu 20.04, Ubuntu 18.04, and Ubuntu 16.04; CentOS 8.x and CentOS 7.x
          • Instance families: gn7, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i
        • CUDA 10.2.89
          • GPU driver: 460.91.03
          • cuDNN: 8.1.1, 8.0.4, or 7.6.5
          • Public images: Alibaba Cloud Linux 2; Ubuntu 18.04 and Ubuntu 16.04; CentOS 8.x and CentOS 7.x
          • Instance families: gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i
        • CUDA 10.1.168
          • GPU driver: 450.80.02 or 440.64.00
          • cuDNN: 8.0.4, 7.6.5, or 7.5.0
          • Public images: Ubuntu 18.04 and Ubuntu 16.04; CentOS 7.x
          • Instance families: gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i
        • CUDA 10.0.130
          • GPU driver: 450.80.02 or 440.64.00
          • cuDNN: 7.6.5, 7.5.0, 7.4.2, or 7.3.1
          • Public images: Ubuntu 18.04 and Ubuntu 16.04; CentOS 7.x
          • Instance families: gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i
        • CUDA 9.2.148
          • GPU driver: 450.80.02, 440.64.00, or 390.116
          • cuDNN: 7.6.5, 7.5.0, 7.4.2, 7.3.1, or 7.1.4
          • Public images: Ubuntu 16.04; CentOS 7.x
          • Instance families: gn6v, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6e, and ebmgn5i
        • CUDA 9.0.176
          • GPU driver: 450.80.02, 440.64.00, or 390.116
          • cuDNN: 7.6.5, 7.5.0, 7.4.2, 7.3.1, 7.1.4, or 7.0.5
          • Public images: Ubuntu 16.04; CentOS 7.x; SUSE 12 SP2
          • Instance families: gn6v, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6e, and ebmgn5i
        • CUDA 8.0.61
          • GPU driver: 450.80.02, 440.64.00, or 390.116
          • cuDNN: 7.1.3 or 7.0.5
          • Public images: Ubuntu 16.04; CentOS 7.x
          • Instance families: gn5 and gn5i; ebmgn5i
        Note If you want to change the OS of an instance after the instance is created, make sure that the image that you use allows the system to install the GPU driver.
      • AIACC-Training is an AI accelerator developed by Alibaba Cloud. It accelerates mainstream AI computing frameworks, such as TensorFlow, PyTorch, MXNet, and Caffe, to significantly improve training performance. For more information, see AIACC-Training.
        Note AIACC-Training is not supported on CentOS 8, CentOS 6, SUSE Linux, or Alibaba Cloud Linux.
      • AIACC-Inference is an AI accelerator developed by Alibaba Cloud. It accelerates the mainstream AI computing framework TensorFlow and frameworks whose models can be converted to the Open Neural Network Exchange (ONNX) format, to significantly improve inference performance. For more information, see AIACC-Inference.
        Note AIACC-Inference is not supported on CentOS 8, CentOS 6, SUSE Linux, or Alibaba Cloud Linux.
  5. Complete the storage and related settings.
    Instances obtain storage capabilities by attaching system disks, data disks, and Apsara File Storage NAS file systems. ECS provides cloud and local disks to meet the requirements of different scenarios.
    Cloud disks can be used as system disks or data disks and include ESSDs, standard SSDs, and ultra disks. For more information, see Disks.
    Note The billing method of a cloud disk that is created along with an instance is the same as that of the instance.
    Local disks can be used only as data disks. If the selected instance family is equipped with local disks, such as instance families with local SSDs and big data instance families, information about the local disks is displayed. For more information, see Local disks.
    Note You cannot attach local disks to instances yourself.
    1. Select a system disk.
      System disks are used to install operating systems. The default capacity of a system disk is 40 GiB, but the actual minimum capacity depends on the image. The valid system disk capacity ranges, in GiB, are as follows:
      • Linux (excluding CoreOS and Red Hat): max{20, image size} to 500
      • FreeBSD: max{30, image size} to 500
      • CoreOS: max{30, image size} to 500
      • Red Hat: max{40, image size} to 500
      • Windows: max{40, image size} to 500
    2. Optional: Select a data disk.
      You can create an empty disk or create a disk from a snapshot. A snapshot is a point-in-time backup of a disk, so creating a disk from a snapshot lets you import data quickly. You can also encrypt the data disk to meet requirements such as data security and regulatory compliance. For more information about data encryption, see Encryption overview.
      Note The number of data disks that can be attached to a single instance is limited. For more information, see the "Elastic Block Storage (EBS) limits" section in Limits.
    3. Optional: Select a NAS file system.
      If multiple instances need to share a large amount of data, we recommend that you use a NAS file system to reduce the cost of data transmission and synchronization.

      Select an existing NAS file system, or click Create a file system to go to the NAS console and create one. For more information, see Create a General-purpose NAS file system in the NAS console. After the NAS file system is created, go back to the ECS instance creation wizard and click the refresh icon to refresh the NAS file system list. For more information about how to mount NAS file systems, see Mount NAS file systems when you purchase an ECS instance.

  6. Optional: Configure the snapshot service.
    You can use automatic snapshot policies to back up disks periodically and protect against risks such as accidental data deletion.

    Select an existing snapshot policy or click Create Automatic Snapshot Policy to create an automatic snapshot policy on the Snapshots page. For more information about how to create an automatic snapshot policy, see Create an automatic snapshot policy. After an automatic snapshot policy is created, go back to the ECS instance creation wizard and click the refresh icon to query the most recent automatic snapshot policy list.
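If you script instance creation or plan to use a custom installation script, it can help to check a CUDA, GPU driver, and cuDNN combination against the supported versions above before you create the instance. The following is a minimal sketch in the same shell style as the installation script in this topic. The function name check_gpu_stack is hypothetical, the version data is copied from the version information in this step, and the sketch assumes that every driver listed for a CUDA version can be paired with every cuDNN version listed for it. It does not check image versions or instance families.

```shell
#!/bin/sh
# Hypothetical helper: check_gpu_stack CUDA_VERSION DRIVER_VERSION CUDNN_VERSION
# Returns 0 if the combination appears in the supported version information, 1 otherwise.
# Assumption: within one CUDA version, any listed driver pairs with any listed cuDNN version.
check_gpu_stack() {
    cuda="$1"; driver="$2"; cudnn="$3"
    case "$cuda" in
        11.2.2)   drivers="460.91.03"; cudnns="8.1.1" ;;
        11.0.2)   drivers="460.91.03"; cudnns="8.1.1 8.0.4" ;;
        10.2.89)  drivers="460.91.03"; cudnns="8.1.1 8.0.4 7.6.5" ;;
        10.1.168) drivers="450.80.02 440.64.00"; cudnns="8.0.4 7.6.5 7.5.0" ;;
        10.0.130) drivers="450.80.02 440.64.00"; cudnns="7.6.5 7.5.0 7.4.2 7.3.1" ;;
        9.2.148)  drivers="450.80.02 440.64.00 390.116"; cudnns="7.6.5 7.5.0 7.4.2 7.3.1 7.1.4" ;;
        9.0.176)  drivers="450.80.02 440.64.00 390.116"; cudnns="7.6.5 7.5.0 7.4.2 7.3.1 7.1.4 7.0.5" ;;
        8.0.61)   drivers="450.80.02 440.64.00 390.116"; cudnns="7.1.3 7.0.5" ;;
        *) return 1 ;;   # CUDA version not listed
    esac
    # Pad both sides with spaces so that a version only matches a whole list entry.
    case " $drivers " in *" $driver "*) ;; *) return 1 ;; esac
    case " $cudnns " in *" $cudnn "*) ;; *) return 1 ;; esac
    return 0
}
```

For example, check_gpu_stack 11.2.2 460.91.03 8.1.1 succeeds, while check_gpu_stack 11.2.2 460.91.03 8.0.4 fails because cuDNN 8.0.4 is not listed for CUDA 11.2.2. In a custom script, you could call it as check_gpu_stack "$CUDA_VERSION" "$DRIVER_VERSION" "$CUDNN_VERSION" || exit 1 before starting the download.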
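The system disk capacity rule in this step is a simple maximum calculation. The following sketch, with a hypothetical function name system_disk_range and hypothetical lowercase image type names, prints the valid range in GiB for an image type and image size. It assumes that the image size is a whole number of GiB.

```shell
#!/bin/sh
# Hypothetical helper: system_disk_range IMAGE_TYPE IMAGE_SIZE_GIB
# Prints "MIN MAX" in GiB. IMAGE_TYPE: linux, freebsd, coreos, redhat, or windows.
system_disk_range() {
    image_type="$1"; image_size="$2"
    case "$image_type" in
        linux)          floor=20 ;;   # Linux other than CoreOS and Red Hat
        freebsd|coreos) floor=30 ;;
        redhat|windows) floor=40 ;;
        *) echo "unknown image type: $image_type" >&2; return 1 ;;
    esac
    # Minimum is max{floor, image size}; the maximum is always 500 GiB.
    if [ "$image_size" -gt "$floor" ]; then min="$image_size"; else min="$floor"; fi
    echo "$min 500"
}
```

For example, system_disk_range linux 8 prints 20 500, so the default 40 GiB system disk is within range for that image.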

Step 2: Configure the parameters in the Networking step

In the Networking step, you can configure how instances access the Internet and other Alibaba Cloud resources, and select the security groups that protect your instances. After you configure the parameters in the Networking step, click Next.

  1. Specify parameters in the Network Type and Public IP Address sections.
    • Network Type: Select VPC.
      A virtual private cloud (VPC) is a logically isolated virtual network in Alibaba Cloud. You have full control over your VPCs. For example, you can specify a CIDR block and configure route tables and gateways for a VPC.
      Select an existing VPC and vSwitch, or click go to the VPC console to create a VPC and a vSwitch in the VPC console. After the VPC and the vSwitch are created, go back to the ECS instance creation wizard and click the refresh icon to view them.
      If you do not want to use a custom VPC or vSwitch in the region, you can skip this operation, and the system creates a default VPC and a default vSwitch.
      Note You can skip this operation only if no VPC is available in the region where the instance is deployed.
    • Public IP Address: If you selected an image of Windows Server 2008 R2 or earlier in the Basic Configurations step, you can select Assign Public IPv4 Address, or associate an elastic IP address (EIP) with the instance after it is created. After the GPU driver is installed, you cannot connect to such an instance from a Virtual Network Console (VNC) client: a persistent black screen or startup screen appears. A public IP address lets you connect over other protocols instead, such as the Remote Desktop Protocol (RDP) built into Windows, PC over IP (PCoIP), and XenDesktop HDX 3D.
      Note RDP does not support some applications, such as DirectX and OpenGL applications. To use these applications, you must manually install the VNC service and client.
      To assign a public IP address, perform the following operations:
      1. Select Assign Public IPv4 Address.
      2. Specify the Bandwidth Billing parameter.
        • Pay-By-Bandwidth: You are charged based on the specified bandwidth. This billing method is suitable for scenarios that require stable network bandwidth.
        • Pay-By-Traffic: You are charged based on the traffic that you use. You can set a peak bandwidth to avoid excessive fees caused by sudden traffic spikes. This billing method is suitable for scenarios in which bandwidth demand varies widely, for example, when traffic is usually low but occasionally spikes.
      3. Set Bandwidth or Peak Bandwidth based on your requirements.
      For more information, see What is an EIP?
  2. Select a security group.
    A security group is a virtual firewall that is used to control the inbound and outbound traffic of instances in the security group. For more information, see Overview.

    If you do not select a security group when you create the instance, you can skip this step, and the system creates a default security group. The default security group allows inbound traffic over Secure Shell (SSH) port 22, Remote Desktop Protocol (RDP) port 3389, and Internet Control Message Protocol (ICMP). You can modify the security group rules after the security group is created.

    1. To create a security group, click create a security group.
      For more information about how to configure the security group, see Create a security group.
    2. Click Reselect Security Group.
    3. In the Select Security Group dialog box, select one or more security groups and click Select.
  3. Configure an ENI.
    ENIs include primary ENIs and secondary ENIs. A primary ENI cannot be unbound from its instance; it is created and released together with the instance. Secondary ENIs can be bound to and unbound from instances, which lets you switch traffic between instances. To create a secondary ENI when you create the instance, click the add-nic icon and select the vSwitch to which the secondary ENI belongs.
    Note You can bind only one secondary ENI when you create an instance. You can also create secondary ENIs and bind them to an instance after the instance is created. For more information about the number of ENIs that can be bound to an instance of each instance type, see Instance families.
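Security group rules can also be changed after the instance is created, for example, to allow SSH access from a trusted CIDR block. The following minimal sketch assumes that the Alibaba Cloud CLI (aliyun) is installed and configured with credentials, and calls the ECS AuthorizeSecurityGroup operation. The region ID, security group ID, and CIDR blocks are placeholders, and the run and allow_ssh_from helper functions are hypothetical. Set DRY_RUN=1 to print the command instead of running it.

```shell
#!/bin/sh
# Minimal sketch, assuming the Alibaba Cloud CLI (aliyun) is installed and configured.
# REGION_ID and SG_ID are placeholders; replace them with your own values.
REGION_ID="${REGION_ID:-cn-hangzhou}"
SG_ID="${SG_ID:-sg-xxxxxxxx}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: allow inbound SSH (TCP port 22) from one CIDR block.
allow_ssh_from() {
    run aliyun ecs AuthorizeSecurityGroup \
        --RegionId "$REGION_ID" \
        --SecurityGroupId "$SG_ID" \
        --IpProtocol tcp \
        --PortRange 22/22 \
        --SourceCidrIp "$1"
}
```

For example, after setting DRY_RUN=1, allow_ssh_from 203.0.113.0/24 only prints the aliyun command. Adding an allow rule does not remove any existing rule that already allows port 22, so review the existing rules of the security group as well.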

Step 3: Configure the parameters in the System Configurations step

In the System Configurations step, you can customize how the instance is identified in the ECS console and in the OS, and how the instance is used. For example, you can configure the Logon Credentials, Host, and User Data parameters. After you configure the parameters in the System Configurations step, click Next.

  1. Configure logon credentials.

    We recommend that you select Key Pair or Password as Logon Credentials. If you select Set Later, you must bind an SSH key pair or reset the password before you connect to the instance from a Virtual Network Console (VNC) client, and then restart the instance for the logon credentials to take effect. If you restart the instance while the GPU driver is being installed, the GPU driver installation fails.

  2. Specify the instance name that you want to display in the ECS console and the hostname that can be obtained from within the operating system.
    If you want to create multiple instances, you can set sequential instance names and hostnames to facilitate management. For more information about how to configure sequential instance names and hostnames, see Batch configure sequential names or hostnames for multiple instances.
  3. Configure advanced options.
    1. Select an instance Resource Access Management (RAM) role.
      Instance RAM roles enable ECS instances to assume roles that have specific access permissions. The instance can then call the APIs of specific Alibaba Cloud services and manage specific Alibaba Cloud resources by using temporary credentials issued by Security Token Service (STS), so you do not need to store long-term AccessKey pairs on the instance.

      Select an existing instance RAM role or click Create Instance RAM Role to go to the RAM console to create an instance RAM role. After an instance RAM role is created, go back to the ECS instance creation wizard and click the refresh icon to query the most recent instance RAM role list. For more information, see Bind an instance RAM role.

    2. Select an instance metadata access mode.
      ECS instance metadata contains the information of instances in Alibaba Cloud. You can view the metadata of running instances and configure or manage the instances based on their metadata. You can view instance metadata in normal or security hardening mode. For more information, see View instance metadata.
      The following instance metadata access modes are available:
      • Normal Mode (Compatible with Security Hardening Mode): After the instance is created, you can view its metadata in normal mode or in security hardening mode.
      • Security Hardening Mode: After the instance is created, you can view its metadata only in security hardening mode.
      Warning If you select Security Hardening Mode, the components of cloud-init may fail to be initialized, which affects the configurations of instances, such as metadata and user data. Proceed with caution.
    3. Configure user data.
      User data can run as a script during instance startup to automate instance configurations, or can be imported to the instance as regular data. For more information, see Manage the user data of Linux instances and Manage the user data of Windows instances.
      If you selected Auto-install GPU Driver, AIACC-Training, and AIACC-Inference in the Basic Configurations step, an automatic installation script appears in the lower part of the Advanced section. cloud-init runs this script the first time the instance starts after it is created.
      Note You can also customize an automatic installation script and import the script to install a GPU driver, AIACC-Training, and AIACC-Inference. For more information, see Customize an automatic installation script.
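If you select Security Hardening Mode, plain requests to the metadata service, such as the curl call to http://100.100.100.200 in the automatic installation script later in this topic, no longer work. The following sketch reads a metadata item by first requesting a token, which works in both modes. It assumes the token endpoint /latest/api/token and the X-aliyun-ecs-metadata-token-ttl-seconds and X-aliyun-ecs-metadata-token headers described in View instance metadata; the get_metadata function name, the METADATA_URL variable, and the 60-second token lifetime are illustrative choices.

```shell
#!/bin/sh
# Minimal sketch: read one instance metadata item from inside an ECS instance.
# Assumption: token-based access as described in "View instance metadata".
METADATA_URL="${METADATA_URL:-http://100.100.100.200}"

# Hypothetical helper: get_metadata ITEM, for example: get_metadata instance-id
get_metadata() {
    # Request a token that is valid for 60 seconds (-f: treat HTTP errors as failures).
    token=$(curl -sf -X PUT "${METADATA_URL}/latest/api/token" \
        -H "X-aliyun-ecs-metadata-token-ttl-seconds: 60")
    if [ -n "$token" ]; then
        # Security hardening mode: send the token with the request.
        curl -sf -H "X-aliyun-ecs-metadata-token: ${token}" \
            "${METADATA_URL}/latest/meta-data/$1"
    else
        # No token: fall back to normal mode, which fails if only hardening mode is allowed.
        curl -sf "${METADATA_URL}/latest/meta-data/$1"
    fi
}
```

The fallback keeps the helper usable on instances that still allow normal mode, but on an instance in Security Hardening Mode only the token path succeeds.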

Step 4: (Optional) Complete the settings in the Grouping (Optional) step

Grouping settings, such as tags and resource groups, let you manage instances in batches. After you complete the settings in the Grouping (Optional) step, click Next.

  1. Add tags.
    Each tag consists of a tag key and a tag value. You can add tags to resources that have identical characteristics, such as resources that belong to the same organization and resources that serve the same purpose. You can use tags to search for and manage resources in an efficient manner. For more information, see Overview.

    Select an existing tag, or enter a key and a value to create a tag.

  2. Select a resource group.
    Resource groups let you group resources across regions and services based on your business requirements, and manage permissions at the resource group level. For more information, see Resource groups.

    Select an existing resource group, or use the click here link to go to the Resource Group page and create a resource group. After the resource group is created, go back to the ECS instance creation wizard and click the refresh icon to refresh the resource group list. For more information, see Create a resource group.

  3. Select a deployment set.
    Deployment sets support the high-availability policy. After you apply a high-availability policy to a deployment set, all the instances within the deployment set are distributed across different physical servers to ensure business availability and disaster recovery at the underlying layer.

    Select an existing deployment set or click manage the deployment set to create a deployment set. After a deployment set is created, go back to the ECS instance creation wizard and click the refresh icon to query the most recent deployment set list. For more information, see Create a deployment set.

  4. Select a dedicated host.
    A dedicated host is a cloud host whose physical resources are exclusively reserved for a single tenant. Dedicated hosts meet strict security compliance requirements and support bring your own license (BYOL) when you migrate services to Alibaba Cloud.

    Select an existing dedicated host or click create a DDH to create a dedicated host. After the dedicated host is created, go back to the ECS instance creation wizard and click the refresh icon to query the most recent dedicated host list. For more information, see Create a dedicated host.

  5. Select a private pool.
    After an elasticity assurance or a capacity reservation is created, the system generates a private pool to reserve resources for a specific number of instances that have specific attributes. During the validity period of the elasticity assurance or capacity reservation, you always have access to the resources reserved in the private pool when you want to create instances. For more information, see Overview.
    Note Only pay-as-you-go instances can be created from the resources reserved by elasticity assurances or capacity reservations.
    The Private Pool parameter has the following valid values:
    • Open: Capacity in open private pools takes precedence over public pool resources. If no capacity is available in private pools, the system attempts to use public pool resources.
    • None: Capacity in private pools is not used.
    • Targeted: Capacity in a specified or open private pool is used to create instances. If no capacity is available in the specified private pool, the instances cannot be created.
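The three private pool options differ only in where capacity comes from and what happens when private capacity runs out. The following hypothetical function capacity_source restates those rules as code for illustration; it does not call any Alibaba Cloud API, and it simplifies "available private capacity" to a yes or no input.

```shell
#!/bin/sh
# Illustration only: restates the private pool options as a decision function.
# Hypothetical helper: capacity_source MODE PRIVATE_CAPACITY
#   MODE: Open, None, or Targeted
#   PRIVATE_CAPACITY: yes if matching private pool capacity is available, otherwise no
# Prints private, public, or fail.
capacity_source() {
    mode="$1"; private_capacity="$2"
    case "$mode" in
        Open)
            # Private pool capacity first, then fall back to the public pool.
            if [ "$private_capacity" = "yes" ]; then echo "private"; else echo "public"; fi ;;
        None)
            # Private pool capacity is never used.
            echo "public" ;;
        Targeted)
            # Only private pool capacity is used; without it, instance creation fails.
            if [ "$private_capacity" = "yes" ]; then echo "private"; else echo "fail"; return 1; fi ;;
        *)
            echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}
```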

Step 5: Configure the parameters in the Preview step

Before you confirm the operations to create the instance, check the configured parameters such as usage duration to ensure that the parameters meet your business requirements.

  1. Check the configurations.
    To modify the configurations of a step, click the edit icon to go to that step. You can also save the configurations as a template and use it to create instances that have similar configurations. The following buttons are available:
    • Save as Launch Template: Saves the configurations as a launch template, which you can then use to create instances without repeating the configuration operations. For more information, see Create an instance by using a launch template.
    • View Open API: Generates the API workflow and the SDK sample code for your reference.
    • Save as ROS Template: Saves the configurations as a Resource Orchestration Service (ROS) template, which you can then use to create stacks in the ROS console and deliver resources in a simplified manner. For more information, see Create a stack.
  2. Configure the usage duration for the instance.
    • Pay-as-you-go instance: You can set an automatic release time for the instance now, or release the instance after it is created. For more information, see Release an instance.
    • Subscription instance: Set the usage duration and specify whether to enable auto-renewal. You can enable auto-renewal now or renew the instance after it is created. For more information, see Renewal overview.
  3. Read the ECS Terms of Service and Product Terms of Service. If you agree to them, select ECS Terms of Service and Product Terms of Service.
  4. In the lower part of the page, view the total fees of the instance, confirm the order, and then complete the payment.
    If you select Auto-install GPU Driver, the system installs the GPU driver, CUDA library, and cuDNN library of the selected versions after the instance is created. The installation takes 10 to 20 minutes, depending on the internal network bandwidth and the number of vCPUs of the instance type. You can establish a remote connection to the instance to view the installation progress. After the installation is complete, you can also view the auto_install.log file in the /root/auto_install directory. The installation status is displayed as follows:
    • In progress: An installation progress bar appears.
    • Successful: The installation result ALL INSTALL OK appears.
    • Failed: The installation result INSTALL FAIL appears.
    Notice While the GPU driver, CUDA library, and cuDNN library are being installed, do not use the GPU-accelerated instance, perform operations on it, or install other GPU-related software. Otherwise, the installation may fail and the instance may become unavailable.
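To check the result from a remote session without reading the whole log, you can search the log for the two result strings. The following minimal sketch takes the log path and the ALL INSTALL OK and INSTALL FAIL strings from this topic; the install_status and wait_for_install function names, the status names, the LOG_FILE override, and the 60-second polling interval are hypothetical. It only reads the log file.

```shell
#!/bin/sh
# Minimal sketch: report the status of the automatic installation.
# The log path and the result strings come from this topic; the status names are our own.
LOG_FILE="${LOG_FILE:-/root/auto_install/auto_install.log}"

# Hypothetical helper: prints successful, failed, in-progress, or no-log.
install_status() {
    if [ ! -f "$LOG_FILE" ]; then
        echo "no-log"
    elif grep -q "INSTALL FAIL" "$LOG_FILE"; then
        echo "failed"
    elif grep -q "ALL INSTALL OK" "$LOG_FILE"; then
        echo "successful"
    else
        echo "in-progress"
    fi
}

# Hypothetical helper: poll until the installation succeeds (return 0) or fails (return 1).
wait_for_install() {
    while :; do
        case "$(install_status)" in
            successful) return 0 ;;
            failed) return 1 ;;
        esac
        sleep "${POLL_SECONDS:-60}"
    done
}
```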

Customize an automatic installation script

Alibaba Cloud supports automatic installation scripts in the following scenarios:
  • You did not select Auto-install GPU Driver, AIACC-Training, or AIACC-Inference in the Basic Configurations step, and you want to configure an automatic installation script in the System Configurations step.
  • You call the RunInstances operation to create a GPU-accelerated instance. In this case, you must pass the automatic installation script in the UserData parameter.
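When you call RunInstances, the UserData parameter takes the script as Base64-encoded text. The following sketch encodes a local script file, such as the customized script described below saved as gpu_install.sh (a hypothetical file name). The encode_user_data function name is hypothetical, and the commented aliyun command only illustrates how the value could be passed with the Alibaba Cloud CLI; the values in angle brackets are placeholders, and your environment may need other parameters.

```shell
#!/bin/sh
# Minimal sketch: Base64-encode a user-data script for the UserData parameter of RunInstances.
# Hypothetical helper: encode_user_data FILE
encode_user_data() {
    # Some base64 implementations wrap lines; remove the line breaks to get one string.
    base64 < "$1" | tr -d '\n'
}

# Hypothetical usage with the Alibaba Cloud CLI (placeholders in angle brackets):
# aliyun ecs RunInstances --RegionId cn-hangzhou --ImageId <image-id> \
#     --InstanceType <instance-type> --SecurityGroupId <security-group-id> \
#     --VSwitchId <vswitch-id> --UserData "$(encode_user_data gpu_install.sh)"
```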

To use a custom automatic installation script to install a GPU driver when the instance is created, perform the following operations:

  1. Customize an automatic installation script.
    The automatic installation script contains the following content:
    #!/bin/sh
    
    # Set the versions and options to install.
    IS_INSTALL_AIACC_TRAIN=""
    IS_INSTALL_AIACC_INFERENCE=""
    DRIVER_VERSION=""
    CUDA_VERSION=""
    CUDNN_VERSION=""
    IS_INSTALL_RAPIDS="FALSE"
    
    INSTALL_DIR="/root/auto_install"
    
    # The downloaded auto_install.sh script installs the GPU driver and CUDA by using .run packages.
    auto_install_script="auto_install.sh"
    
    # Build the download URL from the source address in the instance metadata.
    script_download_url=$(curl -s http://100.100.100.200/latest/meta-data/source-address | head -1)"/opsx/ecs/linux/binary/script/${auto_install_script}"
    echo "$script_download_url"
    
    # -p: do not fail if the directory already exists; stop if the directory cannot be entered.
    mkdir -p "$INSTALL_DIR" && cd "$INSTALL_DIR" || exit 1
    # Quote each argument so that an empty value does not shift the positions of the others.
    wget -t 10 --timeout=10 "$script_download_url" && sh "${INSTALL_DIR}/${auto_install_script}" "$DRIVER_VERSION" "$CUDA_VERSION" "$CUDNN_VERSION" "$IS_INSTALL_AIACC_TRAIN" "$IS_INSTALL_AIACC_INFERENCE" "$IS_INSTALL_RAPIDS"
    Note The automatic installation script uses the .run installation package to install modules, including the GPU driver.
    Set the following variables in the script based on your business requirements:
    • Specify the versions of the GPU driver, CUDA library, and cuDNN library based on the selected instance type and image version. For more information about available versions, see Available image versions and instance families. For example:
      DRIVER_VERSION="460.91.03"
      CUDA_VERSION="11.2.2"
      CUDNN_VERSION="8.1.1"
    • Specify whether to install AIACC-Training and AIACC-Inference.
      • If you want to install AIACC-Training, set IS_INSTALL_AIACC_TRAIN to TRUE. If you do not want to install AIACC-Training, set IS_INSTALL_AIACC_TRAIN to FALSE.
      • If you want to install AIACC-Inference, set IS_INSTALL_AIACC_INFERENCE to TRUE. If you do not want to install AIACC-Inference, set IS_INSTALL_AIACC_INFERENCE to FALSE.
      For example:
      IS_INSTALL_AIACC_TRAIN="TRUE"
      IS_INSTALL_AIACC_INFERENCE="FALSE"
  2. In the System Configurations step, enter the script in the field below User Data in the Advanced section.

    After the instance starts, the system installs the GPU driver, CUDA library, and cuDNN library, and installs AIACC-Training and AIACC-Inference if your script specifies them. After the GPU driver is installed, the system restarts the instance so that the GPU driver takes effect.

    Note A GPU driver is more stable in persistence mode. The automatic installation script enables persistence mode for the GPU driver and also configures the Linux OS so that persistence mode is enabled for the GPU driver each time the instance starts.
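To confirm that persistence mode is enabled after the instance restarts, you can query it with nvidia-smi, which is installed together with the GPU driver. The following sketch uses the persistence_mode field of nvidia-smi --query-gpu. The function name all_gpus_persistent and the NVIDIA_SMI override variable, which lets you substitute a stub for testing, are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: check that every GPU reports persistence mode as Enabled.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# Hypothetical helper: returns 0 if all GPUs are in persistence mode,
# 1 if at least one is not, and 2 if the query fails or returns nothing.
all_gpus_persistent() {
    out=$("$NVIDIA_SMI" --query-gpu=persistence_mode --format=csv,noheader) || return 2
    [ -n "$out" ] || return 2
    # Any line other than "Enabled" means at least one GPU is not in persistence mode.
    if printf '%s\n' "$out" | grep -qv '^Enabled$'; then
        return 1
    fi
    return 0
}
```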