A job is the basic computing unit of Elastic High Performance Computing (E-HPC). A job consists of a shell script and executable files. Before you can perform high-performance computing, you must submit a job in the E-HPC console. The order in which jobs run is determined by the specified queue and the scheduler. In the E-HPC console, you can submit a job, stop a job, or view the status of a job. This topic describes how to use the E-HPC console to submit a job.
Make sure that the following requirements are met:
The cluster and cluster nodes are in the Running state.
A user is created. For more information, see Create a user.
Job files are ready to be imported. E-HPC supports the following methods for importing job files:
Before you submit a job, log on to the cluster and import the job files by using a file transfer tool such as rsync or the secure copy protocol (SCP).
When you submit a job, import the job files stored in an Object Storage Service (OSS) bucket.
When you submit a job, import the job files stored in your local directory or select newly created job files.
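For the first method, the transfer is an ordinary rsync or scp call from your local machine to the cluster. The following Python sketch only builds the command line; the function name, user name, host name, and paths are hypothetical placeholders, so replace them with the values for your cluster and check the options against your installed rsync and scp versions.

```python
import shlex


def upload_command(local_path: str, user: str, host: str, remote_dir: str,
                   tool: str = "rsync") -> list[str]:
    """Build an argv list that copies local_path to remote_dir on host (hypothetical helper)."""
    destination = f"{user}@{host}:{remote_dir}"
    if tool == "rsync":
        # -a: archive mode (recursive, keeps permissions and timestamps)
        # -v: verbose output, -z: compress data during the transfer
        return ["rsync", "-avz", local_path, destination]
    if tool == "scp":
        # -r: copy directories recursively
        return ["scp", "-r", local_path, destination]
    raise ValueError(f"unsupported tool: {tool}")


if __name__ == "__main__":
    # Placeholder values: use your cluster user and the address of a cluster node.
    cmd = upload_command("myjob", "testuser", "login.example.com", "/home/testuser/")
    print(shlex.join(cmd))
    # To perform the transfer: import subprocess; subprocess.run(cmd, check=True)
```

With rsync, a trailing slash on the source path matters: myjob/ copies only the contents of the directory, while myjob copies the directory itself.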
Log on to the E-HPC console.
In the top navigation bar, select a region.
In the left-side navigation pane, choose.
On the Job page, select a cluster from the Cluster drop-down list.
Click the Submit Job tab.
On the Submit Job tab, set the parameters as needed.
The configured job template on which the job is based. For more information, see Manage a job template.
The name of the job. If you want the job files to be automatically downloaded and decompressed, give the job files the same name as the job.
The job execution command that you want to submit to the scheduler. You can enter a command or the path of the script file, for example, /home/test/job.pbs. How you set this parameter depends on whether the script file is executable:
If the script file is executable, enter its relative path, for example,
If the script file is not executable, enter the execution command, for example, /opt/mpi/bin/mpirun /home/test/job.pbs. If your scheduler is PBS, add a hyphen (-) before the command, for example,
If you added compute nodes to a queue when you created the cluster, submit the job to that queue. Otherwise, the job cannot run. If you did not add compute nodes to any queue, the job is submitted to the default queue of the scheduler.
Number of Compute Nodes
The number of compute nodes that are used to run the job.
Number of Tasks
The number of tasks, that is, processes, that each compute node runs for the job.
The maximum memory that can be used when a compute node runs the job. If you do not specify this parameter, the memory is unlimited.
Maximum Running Time
The maximum running time of the job. If the actual running time exceeds the maximum running time, the job fails. If you do not specify this parameter, the running time is unlimited.
The number of threads that are used by a task. If you do not specify this parameter, the number of threads is 1.
The number of GPUs that are used when a compute node runs the job. If you specify this parameter, make sure that the compute node is a GPU-accelerated instance.
The priority of the job. Valid values: 0 to 9. A greater value indicates a higher priority. If you specified that jobs are scheduled by job priority when you set the cluster scheduling policy, jobs with a higher priority are scheduled and run first.
You can set a high priority for the jobs that you want to run first.
Enable Job Array
Specifies whether to enable the job array feature of the scheduler. A job array is a collection of similar, independent jobs. You can use a job array to customize how these jobs are run.
Format: X-Y:Z. X is the minimum index value. Y is the maximum index value. Z is the step size. For example, 2-7:2 indicates that three jobs need to be run and their index values are 2, 4, and 6. Default value of Z: 1.
The command that is run to process the results of the job after it finishes, for example, to package the generated job data or upload it to an OSS bucket.
Stdout Redirect Path and Stderr Redirect Path
The paths, including the output file names, to which the Linux shell redirects the standard output (stdout) and standard error (stderr) of the job.
Cluster users must have write permissions on these paths. If you do not specify these parameters, output files are generated based on the scheduler settings.
The runtime variables that are passed to the job. The executable file can read them as environment variables.
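The X-Y:Z job array format can be checked with a short Python sketch that expands a specification into its index values, following the rules described for the job array parameter. The function name is hypothetical, and the sketch only mirrors the documented format; the scheduler performs the actual expansion.

```python
def expand_job_array(spec: str) -> list[int]:
    """Expand a job array spec of the form X-Y:Z into its index values.

    X is the minimum index, Y is the maximum index, and Z is the step size.
    Z defaults to 1 when ":Z" is omitted.
    """
    range_part, _, step_part = spec.partition(":")
    low_text, _, high_text = range_part.partition("-")
    low, high = int(low_text), int(high_text)
    step = int(step_part) if step_part else 1
    if step <= 0 or low > high:
        raise ValueError(f"invalid job array spec: {spec!r}")
    # range() excludes its stop value, so use high + 1 to include Y when the step reaches it
    return list(range(low, high + 1, step))


if __name__ == "__main__":
    print(expand_job_array("2-7:2"))  # [2, 4, 6]
    print(expand_job_array("1-3"))    # [1, 2, 3]
```

The first call reproduces the documented example: 2-7:2 yields three jobs with index values 2, 4, and 6.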
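Inside the job's executable file, runtime variables appear as ordinary environment variables. A minimal Python sketch, assuming the job was submitted with two hypothetical variables named INPUT_FILE and OUTPUT_DIR; substitute the names you actually set.

```python
import os


def read_job_settings() -> dict[str, str]:
    """Read runtime variables that were passed to the job as environment variables."""
    # INPUT_FILE and OUTPUT_DIR are hypothetical names used for illustration.
    # The second argument of get() is the fallback used when a variable is not set.
    return {
        "input_file": os.environ.get("INPUT_FILE", "input.dat"),
        "output_dir": os.environ.get("OUTPUT_DIR", "."),
    }


if __name__ == "__main__":
    settings = read_job_settings()
    print(f"Reading {settings['input_file']}, writing results to {settings['output_dir']}")
```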
Upload the job files to the cluster.
Use the job files that are stored in an OSS bucket
E-HPC allows you to import job files from an OSS bucket to a cluster before you submit a job. You can also specify job files that are stored in an OSS bucket when you submit a job in the E-HPC console. For more information about importing files before submission, see Import job files from an OSS bucket to a cluster. To specify job files that are stored in an OSS bucket when you submit a job in the E-HPC console, perform the following steps:
On the Use OSS Job file tab, click Select File. In the Select File dialog box, select job files and click OK.
If you specify job files in ZIP, TAR, or GZIP format, you must turn on Decompression and specify a command to decompress them.
Note: After you select job files from an OSS bucket, a folder that has the same name as the job files (for example, JobName) is automatically created in the /home/user directory. The job files are then downloaded to the /home/user/JobName directory and, if necessary, decompressed there.
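The Decompression command is an ordinary shell command. As a sketch, the following Python helper picks a standard Linux tool (unzip, tar, or gzip) by file extension and targets the /home/user/JobName layout described in the note. The helper names and the user name are hypothetical, and this topic does not specify the exact command format that E-HPC expects, so treat the output as a starting point rather than a required value.

```python
import posixpath
import shlex


def job_directory(user: str, job_name: str) -> str:
    """Return the download directory for the job files: /home/<user>/<job_name>."""
    return posixpath.join("/home", user, job_name)


def decompress_command(archive: str, target_dir: str) -> str:
    """Return a standard Linux command that unpacks archive into target_dir."""
    src, dst = shlex.quote(archive), shlex.quote(target_dir)
    name = archive.lower()
    if name.endswith(".zip"):
        # -o: overwrite existing files without prompting, -d: extraction directory
        return f"unzip -o {src} -d {dst}"
    if name.endswith((".tar.gz", ".tgz")):
        # -z: gunzip the archive first, -C: extract into the given directory
        return f"tar -xzf {src} -C {dst}"
    if name.endswith(".tar"):
        return f"tar -xf {src} -C {dst}"
    if name.endswith(".gz"):
        # A plain .gz file holds one compressed file; write it out without the suffix.
        out = posixpath.join(target_dir, posixpath.basename(archive)[:-3])
        return f"gzip -dc {src} > {shlex.quote(out)}"
    raise ValueError(f"unsupported archive type: {archive}")


if __name__ == "__main__":
    target = job_directory("testuser", "JobName")
    print(decompress_command("JobName.tar.gz", target))
```

The .tar.gz check runs before the plain .gz check so that compressed tarballs are unpacked with tar instead of being written out as a single file. The tar -C option requires the target directory to exist, which matches the note that the folder is created automatically.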
Edit job files
Click the Edit Job Files tab.
On the Edit Job Files tab, click Cluster File Browser. In the dialog box that appears, enter the cluster username and password to log on to the cluster by using Workbench. You can then create, edit, or delete job files as needed.
Click Submit Job in the upper-right corner of the Submit Job tab. In the dialog box that appears, enter the cluster username and password. The job is then submitted to the cluster, and E-HPC runs it.
After you submit a job, you can view it on the Job page.
Find the job and click Details in the Actions column. In the Job Details panel, you can view the job details, including the job name, job ID, start time, last update time, and job running information.