Elastic High Performance Computing: Submit a job in a serverless cluster

Last Updated: Oct 09, 2023

After you create a serverless cluster, you can submit jobs in the cluster. The system automatically creates elastic container instances to run the jobs. This topic describes how to submit a job in a serverless cluster.

Background information

After you submit a job in a serverless cluster, the system automatically creates elastic container instances to run the job. An elastic container instance requires the following configurations:

  • Specification: You can specify the number of vCPUs and the memory size for elastic container instances. You can also specify ECS instance types to meet additional requirements such as GPU or enhanced network capabilities. The billing method varies based on the specifications that you choose. For more information, see Elastic container instances.

  • Container: An elastic container instance in a serverless cluster runs one container. Before you deploy the container, you must package the required environment and data into a container image and upload the image to Alibaba Cloud Container Registry (ACR).

  • Network: An elastic container instance occupies one elastic network interface (ENI) of the vSwitch in the VPC to which the elastic container instance belongs. By default, the elastic container instance has only an internal IP address. If you want the elastic container instance to connect to the Internet, you must associate a NAT gateway with the VPC.

  • Storage: By default, 30 GiB of temporary storage is provided free of charge for each elastic container instance. You can add more temporary storage as needed. If you need persistent storage, you can mount a NAS file system or Object Storage Service (OSS) buckets.

Prerequisites

Procedure

  1. Go to the Job page.

    1. Log on to the E-HPC console.

    2. In the upper-left corner of the top navigation bar, select a region.

    3. In the left-side navigation pane, choose Job and Performance Management > Job.

  2. In the upper part of the Job page, select the serverless cluster from the Cluster drop-down list.

  3. Click the Submit Job tab.

  4. Configure the required parameters and click Submit Job.

    When you submit a job, you must configure parameters related to the elastic container instances and the job. The system automatically creates elastic container instances to run the job based on the configurations.

    Note

    If you want to submit another job that has similar parameters, click Export to save the current configurations to a local file. When you submit the next job, click Import to load the saved configurations.

    Parameter

    Description

    Job Name

    The name of the job.

    vSwitch

    The vSwitch with which you want to associate the elastic container instance.

    Image URL

    The URL of the container image that is uploaded to ACR. The URL is used to deploy the container.

    Enable Job Array

    Specifies whether to enable the job array feature of the scheduler.

    The job array feature is used to submit and manage similar jobs in batches. After you enable the job array feature, you must configure the minimum value, maximum value, and step size of the job array. The minimum value is the first index, the maximum value is the upper bound of the indexes, and the step size is the interval between consecutive indexes. The default step size is 1. For example, if the minimum value is 2, the maximum value is 7, and the step size is 2, the generated job array contains three sub-jobs that are numbered 2, 4, and 6.

    Priority

    The priority of the job. Valid values: 0 to 9. A larger value indicates a higher priority.

    Temporary Storage

    The temporary storage space added to the container. Unit: GiB.

    By default, each elastic container instance provides 30 GiB of temporary storage free of charge. If you need to store more than 30 GiB of data, use this parameter to increase the temporary storage space. The added storage is billed based on its size. For more information, see Temporary storage space.

    Timeout

    The validity period of the job. After the validity period expires, the job is forcibly terminated. Unit: seconds.

    Bid Strategy

    Specifies whether to create an elastic container instance of the preemptible instance type.

    • Not preemptible instances: The instances are created as pay-as-you-go instances. This is the default value.

    • Preemptible instance with a maximum bid price: The instances are created as preemptible instances that have a specified maximum price per hour.

    • Auto bidding until meeting the pay-as-you-go price: The instances are created as preemptible instances for which the market price is automatically used as the bid price. The market price can be as high as the pay-as-you-go price.

    For more information, see Create a preemptible instance.

    CPU

    Memory

    The number of vCPUs and the memory size of the elastic container instances. If you do not specify these parameters, the system creates elastic container instances that have 2 vCPUs and 4 GiB of memory. For more information, see Specify the number of vCPUs and memory size to create an elastic container instance.

    GPU

    The number of GPUs used by the container. Configure this parameter if you want to create GPU-accelerated elastic container instances.

    Working Path

    The working directory of the container. The system runs commands in the specified directory.

    Instance Type

    The ECS instance type that is used to create the elastic container instances. For more information, see Specify ECS instance types to create an elastic container instance.

    Startup Command

    The startup command of the container. The command must meet the following requirements:

    • Single command without parameters: Enter the command directly, such as ls.

    • Single command with parameters: Separate the command and parameters with commas (,), for example, ls,-l.

    • Multiple commands: You must run the commands through a shell. Separate the shell, its parameters, and the command string with commas (,), and separate the individual commands with semicolons (;). Example: /bin/sh,-c,ls -l;hostname. The commands are run in sequence.

    RAM Role

    The RAM role that is associated with the elastic container instance. For more information, see Use an instance RAM role by calling API operations.

    Variables

    The environment variables of the container.

    Mount Volume

    The data volume that is mounted to the container. OSS and NAS are supported.

    • OSS

      • Volume Mount Path: the directory that is mounted to the container.

      • OSS Bucket Name: the name of the OSS bucket. Only OSS buckets can be mounted to elastic container instances. Subdirectories or files in OSS buckets cannot be mounted to elastic container instances.

      • OSS Endpoint: the endpoint of OSS. If the bucket and cluster reside in the same region, specify the internal endpoint. If the bucket and cluster reside in different regions, specify the public endpoint.

      • OSS Path: the OSS directory that you want to mount.

      • RAM Role: the RAM role that is used to grant permissions. When you create a RAM role, set the trusted entity to Alibaba Cloud Service, the role type to Normal Service Role, and the trusted service to ECS. When you grant permissions to the RAM role, select the AliyunOSSFullAccess policy.

    • NAS

      • Volume Mount Path: the directory that is mounted to the container.

      • NAS Mount Target: the mount target of the NAS file system.

      • NAS Path: the NAS directory that you want to mount.

      • Mount Options: the mount options. We recommend that you use the default value nolock,tcp,noresvport.

    Job Dependency

    Specifies whether the job depends on other jobs. If you want to add job dependencies, enter the job ID and select a dependency.
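The Enable Job Array and Startup Command parameters both follow mechanical rules, so a short sketch may help you check values before you enter them in the console. The Python helpers below are purely illustrative: the function names are hypothetical, they are not part of E-HPC or any Alibaba Cloud SDK, and they only reproduce the rules and examples described in the table. The check that rejects commas inside a single parameter is an assumption, because this topic does not describe how such values are handled.

```python
def job_array_indices(first, last, step=1):
    """Return the sub-job indexes for a job array.

    first: minimum value (the first index)
    last:  maximum value (upper bound of the indexes, inclusive)
    step:  interval between consecutive indexes (assumed default: 1)
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    if first > last:
        raise ValueError("first must not be greater than last")
    return list(range(first, last + 1, step))


def format_startup_command(argv):
    """Join one command and its parameters with commas, e.g. ['ls', '-l'] -> 'ls,-l'."""
    if not argv:
        raise ValueError("argv must contain at least the command")
    for part in argv:
        # Assumption: a comma inside a parameter would be read as a separator.
        if "," in part:
            raise ValueError("commas inside a parameter are not handled: %r" % part)
    return ",".join(argv)


def format_shell_commands(commands, shell="/bin/sh"):
    """Wrap several commands in one shell invocation, separated by semicolons."""
    if not commands:
        raise ValueError("commands must not be empty")
    return format_startup_command([shell, "-c", ";".join(commands)])


if __name__ == "__main__":
    print(job_array_indices(2, 7, 2))
    print(format_startup_command(["ls", "-l"]))
    print(format_shell_commands(["ls -l", "hostname"]))
```

With the values used in the table, the first call yields the indexes 2, 4, and 6, and the last call yields the string /bin/sh,-c,ls -l;hostname.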

What to do next

After you submit a job, you can view the job details and the elastic container instances that the system creates.

  • On the Job page of the E-HPC console, select a cluster and click the Jobs tab. You can query jobs by setting filters, such as job status or time range.

  • On the Container Group page of the Elastic Container Instance console, you can view the elastic container instances that are created for the jobs.