
Elastic High Performance Computing: SubmitJob

Last Updated: Apr 09, 2024

Submits a job in a cluster.

Operation description

Description

Before you submit a job in a cluster, you must upload a job file (for example, job.sh) to the cluster. For more information, see CreateJobFile.

Debugging

OpenAPI Explorer automatically calculates the signature value. For your convenience, we recommend that you call this operation in OpenAPI Explorer.
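If you send HTTP requests yourself instead of using OpenAPI Explorer or an SDK, you must compute the signature. The following Python sketch assumes the Alibaba Cloud RPC-style signature (SignatureVersion 1.0, HMAC-SHA1): sort the parameters, percent-encode them into a canonical query string, sign GET&%2F&<encoded query> with the key "<AccessKeySecret>&", and Base64-encode the digest. The helper names are hypothetical, the API Version value 2018-04-12 is an assumption, and Alibaba Cloud also documents newer signature schemes, so confirm which one applies before relying on this.

```python
import base64
import datetime
import hashlib
import hmac
import urllib.parse
import uuid


def percent_encode(value) -> str:
    """Percent-encode a value; only A-Z, a-z, 0-9, '-', '_', '.', and '~' stay literal."""
    # quote() (unlike quote_plus) turns a space into %20 and '*' into %2A and keeps '~'.
    return urllib.parse.quote(str(value), safe="-_.~")


def sign_rpc_request(params: dict, access_key_secret: str, method: str = "GET") -> str:
    """Return the Base64-encoded HMAC-SHA1 signature for an RPC-style request (hypothetical helper)."""
    # 1. Sort the parameters by name and join the encoded name=value pairs.
    canonical_query = "&".join(
        f"{percent_encode(name)}={percent_encode(value)}"
        for name, value in sorted(params.items())
    )
    # 2. StringToSign = HTTP method & encoded "/" & encoded canonical query string.
    string_to_sign = "&".join(
        [method, percent_encode("/"), percent_encode(canonical_query)]
    )
    # 3. HMAC-SHA1 keyed with "<AccessKeySecret>&", then Base64.
    key = (access_key_secret + "&").encode("utf-8")
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Placeholder credentials; "Version" is an assumed value to confirm in OpenAPI Explorer.
    request = {
        "Action": "SubmitJob",
        "Version": "2018-04-12",
        "Format": "JSON",
        "AccessKeyId": "<your-access-key-id>",
        "SignatureMethod": "HMAC-SHA1",
        "SignatureVersion": "1.0",
        "SignatureNonce": str(uuid.uuid4()),
        "Timestamp": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "ClusterId": "ehpc-hz-FYUr32****",
        "CommandLine": "./LammpsTest/lammps.pbs",
        "RunasUser": "testuser",
    }
    # The Signature parameter is added after signing and is not itself signed.
    request["Signature"] = sign_rpc_request(request, "<your-access-key-secret>")
    print(request["Signature"])
```

The Alibaba Cloud SDKs perform this signing for you, which is usually the simpler option.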

Authorization information

The following table lists the authorization information for this API operation. You can use this information in the Action element of a RAM policy to grant a RAM user or RAM role the permissions to call the operation. The fields are described as follows:

  • Operation: the value that you can use in the Action element to specify the operation on a resource.
  • Access level: the access level of each operation. The levels are read, write, and list.
  • Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
    • The required resource types are displayed in bold characters.
    • If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
  • Condition Key: the condition key that is defined by the cloud service.
  • Associated operation: other operations that the RAM user or RAM role must also have permission to perform in order to complete this operation.
Operation | Access level | Resource type | Condition key | Associated operation
ehpc:SubmitJob | WRITE | All Resources (*) | none | none
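To illustrate how the authorization information maps onto a policy, the following Python snippet builds a custom RAM policy document that grants only ehpc:SubmitJob. It assumes the standard RAM policy structure (Version "1" with a Statement list of Effect, Action, and Resource entries); the helper name is hypothetical. Because the resource type is All Resources, Resource is set to "*".

```python
import json


def submit_job_policy() -> dict:
    """Build a RAM policy document that allows only ehpc:SubmitJob (hypothetical helper)."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                # Action name taken from the authorization information for SubmitJob.
                "Action": "ehpc:SubmitJob",
                # "*" because the operation cannot be scoped below All Resources.
                "Resource": "*",
            }
        ],
    }


print(json.dumps(submit_job_policy(), indent=2))
```

The printed JSON can be pasted into a custom policy and attached to the RAM user or RAM role that submits jobs.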

Request parameters

Each parameter below is listed with its name, type, and whether it is required, followed by a description and an example value.
ClusterId (string, required)

The cluster ID.

You can call the ListClusters operation to query the cluster ID.

Example: ehpc-hz-FYUr32****
CommandLine (string, required)

The command that is used to run the job.

Example: ./LammpsTest/lammps.pbs
RunasUser (string, required)

The name of the user that runs the job.

You can call the ListUsers operation to query the users of the cluster.

Example: testuser
RunasUserPassword (string, optional)

The password of the user that is specified by RunasUser.

Example: 12****
Name (string, optional)

The name of the job. The name must be 6 to 30 characters in length and must start with a letter. It can contain letters, digits, and periods (.).

Example: job1
Priority (integer, optional)

The priority of the job. Valid values: 0 to 9. A larger value indicates a higher priority.

Default value: 0.

Example: 0
PackagePath (string, optional)

The path that is used to run the job.

Example: ./Tem
StdoutRedirectPath (string, optional)

The path to which the standard output (stdout) of the job is redirected.

Example: ./LammpsTest
StderrRedirectPath (string, optional)

The path to which the standard error (stderr) of the job is redirected.

Example: ./LammpsTest
ReRunable (boolean, optional)

Specifies whether the job can be rerun. Valid values:

  • true: The job can be rerun.
  • false: The job cannot be rerun.

Example: false
ArrayRequest (string, optional)

The job array specification.

Format: X-Y:Z, where X is the first (minimum) index, Y is the last (maximum) index, and Z is the step size. For example, 2-7:2 specifies three jobs with the indexes 2, 4, and 6.

Example: 1-10:2
Variables (string, optional)

The runtime variables that are passed to the job. The executable file can read them as environment variables.

Example: [{Name:test1,Value:value1},{Name:test2,Value:value2}]
InputFileUrl (string, optional)

The URL of the job file that is uploaded to an Object Storage Service (OSS) bucket.

Example: https://ehpc-hangzhou.oss-cn-hangzhou.aliyuncs.com/test-u4****/testlist_ehpc.sh
UnzipCmd (string, optional)

The command that is used to decompress the job files downloaded from the OSS bucket. Valid values:

  • tar xzf: extracts gzip-compressed TAR archives (.tar.gz).
  • tar xf: extracts TAR archives.
  • unzip: extracts ZIP files.

Example: tar xzf
PostCmdLine (string, optional)

The command that is run after the job is submitted.

Example: example.sh
ContainerId (string, optional)

The ID of the containerized application. If you want to run the job in a containerized application, you must specify this ID.

You can call the ListContainerApps operation to query the ID of the containerized application.

Example: ehpc-container-uerfrfffff****
JobQueue (string, optional)

The name of the queue in which the job runs.

You can call the ListQueues operation to query the queue names.

Example: workq
Node (integer, optional)

The number of compute nodes required to run the job.

Note: If this parameter is not specified, the Cpu, Task, Thread, Mem, and Gpu parameters do not take effect.

Example: 2
Cpu (integer, optional)

The number of CPU cores required by a single compute node.

Example: 2
Task (integer, optional)

The number of processes created on a single compute node.

This parameter applies to Message Passing Interface (MPI) jobs.

Example: 2
Thread (integer, optional)

The number of threads created on a single compute node.

This parameter applies to OpenMP jobs.

Example: 1
Mem (string, optional)

The maximum memory required by a single compute node. Unit: GB, MB, or KB. The unit is case-insensitive.

Example: 1GB
Gpu (integer, optional)

The maximum GPU usage required by a single compute node.

This parameter takes effect only when the cluster uses PBS and the compute node is a GPU-accelerated instance.

Example: 1
ClockTime (string, optional)

The maximum running time of the job. Valid formats:

  • hh:mm:ss
  • mm:ss
  • ss

We recommend that you use the hh:mm:ss format. For example, to set a maximum running time of 12 hours, set the value to 12:00:00.

Example: 12:00:00
JobRetry.Count (integer, optional)

The number of times the job is retried. Valid values: 1 to 10. Only jobs that run on PBS clusters can be retried.

Note: If this parameter is left empty, the JobRetry.Priority and JobRetry.OnExitCode parameters do not take effect.

Example: 5
JobRetry.Priority (integer, optional)

The priority of the retried job. Valid values: 0 to 9. A larger value indicates a higher priority.

Note: If this parameter is left empty, the priority of the retried job is min(Priority of the original job + 1, 9).

Example: 1
JobRetry.OnExitCode (integer, optional)

The exit code that triggers a retry. If the job exits with this code, the job is retried.

Note: If this parameter is left empty, the job is retried whenever its exit code is not 0.

Example: 1
Async (boolean, optional)

Specifies whether to submit the job asynchronously.

Default value: false.

Example: false
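Several request parameters encode small rules: the X-Y:Z range of ArrayRequest, the three ClockTime formats, the default JobRetry.Priority, and the JobRetry.OnExitCode trigger. The following Python sketch restates those rules as hypothetical client-side helpers (expand_array_request, clock_time_to_seconds, effective_retry_priority, should_retry). They are not part of any Alibaba Cloud SDK, and the service remains the authority on validation; should_retry models only the exit-code condition, so JobRetry.Count and the PBS-only restriction still apply.

```python
import re
from typing import List, Optional

# Hypothetical client-side helpers that restate the documented parameter rules.

_ARRAY_SPEC = re.compile(r"^(\d+)-(\d+):(\d+)$")


def expand_array_request(spec: str) -> List[int]:
    """Return the job indexes described by an ArrayRequest value in X-Y:Z form."""
    match = _ARRAY_SPEC.match(spec)
    if match is None:
        raise ValueError(f"expected X-Y:Z, got {spec!r}")
    first, last, step = (int(group) for group in match.groups())
    if step == 0 or first > last:
        raise ValueError(f"invalid range or step in {spec!r}")
    # Y is the last index and is inclusive, so the range stops at last + 1.
    return list(range(first, last + 1, step))


def clock_time_to_seconds(value: str) -> int:
    """Convert a ClockTime value (hh:mm:ss, mm:ss, or ss) to seconds."""
    parts = value.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported ClockTime format: {value!r}")
    seconds = 0
    for part in parts:
        # Each field to the left is worth 60 times the field to its right.
        seconds = seconds * 60 + int(part)
    return seconds


def effective_retry_priority(job_priority: int, retry_priority: Optional[int] = None) -> int:
    """JobRetry.Priority, or min(Priority + 1, 9) when it is left empty."""
    if retry_priority is not None:
        return retry_priority
    return min(job_priority + 1, 9)


def should_retry(exit_code: int, on_exit_code: Optional[int] = None) -> bool:
    """JobRetry.OnExitCode rule: match a specific code, or any non-zero code by default."""
    if on_exit_code is None:
        return exit_code != 0
    return exit_code == on_exit_code


print(expand_array_request("2-7:2"))      # [2, 4, 6]
print(clock_time_to_seconds("12:00:00"))  # 43200
print(effective_retry_priority(9))        # 9
```

The range check in expand_array_request reproduces the documented example: 2-7:2 yields the indexes 2, 4, and 6.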

Response parameters

The response is an object that contains the following parameters.

JobId (string)

The ID of the job.

Example: 1.manager

RequestId (string)

The ID of the request.

Example: 04F0F334-1335-436C-A1D7-6C044FE7****

Examples

Sample success responses

JSON format

{
  "JobId": "1.manager",
  "RequestId": "04F0F334-1335-436C-A1D7-6C044FE7****"
}
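A client usually needs the JobId from this response and keeps the RequestId for reference. The following Python sketch parses the sample response with the standard json module; the function name parse_submit_job_response is hypothetical.

```python
import json
from typing import Tuple

# The sample success response shown above.
SAMPLE_RESPONSE = """
{
  "JobId": "1.manager",
  "RequestId": "04F0F334-1335-436C-A1D7-6C044FE7****"
}
"""


def parse_submit_job_response(body: str) -> Tuple[str, str]:
    """Return (JobId, RequestId) from a SubmitJob JSON response body."""
    data = json.loads(body)
    # Both fields are documented as strings in the response parameters.
    return data["JobId"], data["RequestId"]


job_id, request_id = parse_submit_job_response(SAMPLE_RESPONSE)
print(f"Submitted job {job_id} (request {request_id})")
```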

Error codes

HTTP status code | Error code | Error message | Description
400 | InvalidParams | The specified parameter %s is invalid. | The specified parameter %s is invalid.
400 | NotEnabled | You have not enabled this service | You have not enabled this service
400 | InDebt | Your account has overdue payments. | Your account has overdue payments.
403 | InvalidClusterStatus | The operation failed due to invalid cluster status. | The cluster status does not support the operation.
403 | ConflictOpt | A conflicting operation is running. | A conflicting operation is running. Please try again later.
403 | UsernameExist | The username already exists. | The username already exists.
403 | IncorrectCredential | The username or password is incorrect. | The username or password is incorrect.
403 | AgentError.Account.ValidateCredentialFailure | Username or password verification failed. | Username or password verification failed.
404 | ClusterNotFound | The specified cluster does not exist. | The specified instance does not exist.
404 | ContainerNotFound | The specified container does not exist. | The specified container does not exist.
404 | ManagerNotFound | The manager nodes do not exist or their status is abnormal. | The manager nodes do not exist or their status is abnormal.
406 | AgentError.Job.SubmitFailure | Failed to submit jobs: %s | Failed to submit the jobs.
406 | AgentError.Job.InvalidContainerType | Unsupported container type: %s. | The container type is not supported: %s.
406 | AliyunError | An Alibaba Cloud product error occurred. | An Alibaba Cloud product error occurred.
406 | AgentError.Account.AccountValidateCredentialFailure | Cannot get user info | -
406 | AgentResponseTimeout | Agent response timeout: %s | -
406 | AgentError | The agent service request failed: %s | The agent request failed.
407 | NotAuthorized | You are not authorized by RAM for this request. | The request is not authorized by RAM.
409 | PartFailure | Part of the batch operation failed. | Part of the batch operation failed.
500 | UnknownError | An unknown error occurred. | An unknown error occurred.
503 | ServiceUnavailable | The request has failed due to a temporary failure of the server | The request has failed due to a temporary failure of the server.

For a complete list of error codes, see Service error codes.
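Two codes in the error table describe transient conditions: ConflictOpt asks you to try again later, and ServiceUnavailable reports a temporary server failure. The following Python sketch retries only those two codes with exponential backoff and fails fast on everything else, because retrying a submission after an ambiguous failure (such as a timeout) could submit the same job twice. ApiError and call_with_retry are hypothetical names; in real code, you would map your SDK's exception type to the error code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Codes whose descriptions in the error table point to a transient condition.
RETRYABLE_CODES = {"ConflictOpt", "ServiceUnavailable"}


class ApiError(Exception):
    """Hypothetical exception that carries the error code returned by the API."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable error codes with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ApiError as err:
            # Fail fast on non-transient errors and on the final attempt.
            if err.code not in RETRYABLE_CODES or attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("not reachable: every iteration returns or raises")
```

A caller would wrap its own submission function, for example call_with_retry(lambda: submit_job(...)), where submit_job is your hypothetical wrapper around the SDK or HTTP call.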

Change history

2023-03-07: The error codes and the request parameters of the API changed.
  • Error codes: deleted error codes 400, 403, 404, 406, 407, 409, 500, and 503.
  • Input parameters: added JobRetry.Count, JobRetry.Priority, and JobRetry.OnExitCode.

2022-11-15: The error codes changed.
  • Error codes: deleted error codes 400, 403, 404, 406, 407, 409, 500, and 503.