Submits a job in a cluster.
Description
Before you submit a job in a cluster, you must upload a job data file, for example, job.sh, to the cluster. For more information, see CreateJobFile.
Request parameters
Parameter | Type | Required | Example | Description |
---|---|---|---|---|
Action | String | Yes | SubmitJob | The operation that you want to perform. Set the value to SubmitJob. |
ClusterId | String | Yes | ehpc-hz-FYUr32**** | The ID of the cluster. You can call the ListClusters operation to query the cluster ID. |
CommandLine | String | Yes | ./LammpsTest/lammps.pbs | The command that is used to run the job. |
RunasUser | String | Yes | root | The name of the user that runs the job. You can call the ListUsers operation to query the users of the cluster. |
RunasUserPassword | String | Yes | 12**** | The password of the user. |
Name | String | No | job1 | The name of the job. The name must be 6 to 30 characters in length and start with a letter. It can contain letters, digits, and periods (.). |
Priority | Integer | No | 0 | The priority of the job. Valid values: 0 to 9. A larger value indicates a higher priority. Default value: 0. |
PackagePath | String | No | ./Tem | The path that is used to run the job. |
StdoutRedirectPath | String | No | ./LammpsTest | The output file path of stdout. |
StderrRedirectPath | String | No | ./LammpsTest | The output file path of stderr. |
ReRunable | Boolean | No | false | Specifies whether the job can be rerun. Valid values: true (the job can be rerun) and false (the job cannot be rerun). |
ArrayRequest | String | No | 1-10:2 | The job array. Format: X-Y:Z, where X is the minimum index value, Y is the maximum index value, and Z is the step size. For example, 2-7:2 indicates that three jobs are run, with index values 2, 4, and 6. |
Variables | String | No | [{Name:,Value:},{Name:,Value:}] | The runtime variables passed to the job. The job can read them as environment variables in the executable file. |
InputFileUrl | String | No | https://ehpc-hangzhou.oss-cn-hangzhou.aliyuncs.com/test-u4****/testlist_ehpc.sh | The URL of the job files that are uploaded to an Object Storage Service (OSS) bucket. |
UnzipCmd | String | No | tar xzf | The command that is used to decompress the job files downloaded from an OSS bucket. Valid values: |
PostCmdLine | String | No | example.sh | The command that is used to perform subsequent operations on the job after the job is submitted. |
ContainerId | String | No | ehpc-container-uerfrfffff**** | The ID of the container application. If you want to use a container application, you must specify its ID. You can call the ListContainerApps operation to query the container application ID. |
JobQueue | String | No | workq | The name of the queue. You can call the ListQueues operation to query the queue name. |
Node | Integer | No | 2 | The number of compute nodes required to run the job. Note: If this parameter is not specified, the Task, Thread, Mem, and Gpu parameters do not take effect. |
Task | Integer | No | 2 | The number of tasks required by a single compute node. |
Thread | Integer | No | 1 | The number of threads required by a single compute node. |
Mem | String | No | 1GB | The maximum memory usage required by a single compute node. Unit: GB, MB, or KB. The unit is case-insensitive. |
Gpu | Integer | No | 1 | The maximum GPU usage required by a single compute node. This parameter takes effect only when the cluster uses PBS and the compute node is a GPU-accelerated instance. |
ClockTime | String | No | 12:00:00 | The maximum running time of the job. Valid formats: We recommend that you use the hh:mm:ss format. If the maximum running time is 12 hours, the value is shown as 12:00:00. |
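Several request parameters follow fixed formats that are easy to get wrong. The following is a minimal sketch that encodes some of the rules from the table as Python helpers. The function names are hypothetical and are not part of any Alibaba Cloud SDK. The Mem check assumes whole-number amounts with no space before the unit, the ArrayRequest parser handles only the full X-Y:Z form, and the ClockTime converter handles only the recommended hh:mm:ss form.

```python
import re
from typing import List

# Hypothetical helpers that encode rules from the request parameter table.
# They are not part of any Alibaba Cloud SDK.

# Name: 6 to 30 characters, starts with a letter, then letters, digits, or periods.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.]{5,29}")
# ArrayRequest: X-Y:Z (minimum index value, maximum index value, step size).
ARRAY_PATTERN = re.compile(r"(\d+)-(\d+):(\d+)")
# Mem: assumed to be a whole number followed by GB, MB, or KB in any letter case.
MEM_PATTERN = re.compile(r"(\d+)(gb|mb|kb)", re.IGNORECASE)


def is_valid_job_name(name: str) -> bool:
    """Check a job name against the stated Name rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def expand_array_request(spec: str) -> List[int]:
    """Return the index values described by an X-Y:Z ArrayRequest value."""
    match = ARRAY_PATTERN.fullmatch(spec)
    if match is None:
        raise ValueError(f"ArrayRequest must use the X-Y:Z format: {spec!r}")
    low, high, step = (int(group) for group in match.groups())
    if step == 0 or low > high:
        raise ValueError(f"invalid ArrayRequest range: {spec!r}")
    # Y is inclusive: it is included only when the step lands on it exactly.
    return list(range(low, high + 1, step))


def normalize_mem(value: str) -> str:
    """Validate a Mem value and upper-case its unit, for example 1gb -> 1GB."""
    match = MEM_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"Mem must be a number followed by GB, MB, or KB: {value!r}")
    return match.group(1) + match.group(2).upper()


def clock_time_to_seconds(value: str) -> int:
    """Convert an hh:mm:ss ClockTime value to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def hours_to_clock_time(hours: int) -> str:
    """Format a whole number of hours as hh:mm:ss, for example 12 -> 12:00:00."""
    return f"{hours:02d}:00:00"
```

With these helpers, expand_array_request("2-7:2") returns [2, 4, 6], which matches the example in the ArrayRequest description, and hours_to_clock_time(12) returns 12:00:00, which matches the ClockTime example.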
Response parameters
Parameter | Type | Example | Description |
---|---|---|---|
JobId | String | 1.manager | The ID of the job. |
RequestId | String | 04F0F334-1335-436C-A1D7-6C044FE7**** | The ID of the request. |
Examples
Sample requests
https://ehpc.cn-hangzhou.aliyuncs.com/?Action=SubmitJob
&ClusterId=ehpc-hz-FYUr32****
&CommandLine=./LammpsTest/lammps.pbs
&RunasUser=root
&RunasUserPassword=12****
&<Common request parameters>
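The sample request above can be assembled with the Python standard library. This is a minimal sketch under several assumptions: the helper name build_submit_job_url is hypothetical; only the action-specific parameters are added, so the common request parameters (including the signature) are omitted and must still be appended; and Variables is assumed to be sent as a JSON array of Name/Value objects, inferred from the example value in the table.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlencode

# Endpoint copied from the sample request.
ENDPOINT = "https://ehpc.cn-hangzhou.aliyuncs.com/"
# Per-node resource parameters that take effect only when Node is set.
RESOURCE_PARAMS = ("Task", "Thread", "Mem", "Gpu")


def build_submit_job_url(cluster_id: str, command_line: str,
                         runas_user: str, runas_password: str,
                         node: Optional[int] = None,
                         resources: Optional[Dict[str, object]] = None,
                         variables: Optional[Dict[str, str]] = None) -> str:
    """Build an unsigned SubmitJob request URL (hypothetical helper)."""
    params: Dict[str, object] = {
        "Action": "SubmitJob",
        "ClusterId": cluster_id,
        "CommandLine": command_line,
        "RunasUser": runas_user,
        "RunasUserPassword": runas_password,
    }
    resources = resources or {}
    unknown = set(resources) - set(RESOURCE_PARAMS)
    if unknown:
        raise ValueError(f"unsupported resource parameters: {sorted(unknown)}")
    if resources and node is None:
        # Task, Thread, Mem, and Gpu do not take effect without Node.
        raise ValueError("Task, Thread, Mem, and Gpu require Node to be set")
    if node is not None:
        params["Node"] = node
        params.update(resources)
    if variables:
        # Assumption: Variables is a JSON array of Name/Value objects.
        params["Variables"] = json.dumps(
            [{"Name": name, "Value": value} for name, value in variables.items()],
            separators=(",", ":"))
    # Common request parameters (including the signature) are NOT added here.
    return ENDPOINT + "?" + urlencode(params)
```

The helper raises an error when Task, Thread, Mem, or Gpu is passed without Node. This is a design choice: it is preferable to sending parameters that the table says do not take effect.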
Sample success responses
XML format
HTTP/1.1 200 OK
Content-Type:application/xml
<SubmitJobResponse>
<RequestId>04F0F334-1335-436C-A1D7-6C044FE7****</RequestId>
<JobId>1.manager</JobId>
</SubmitJobResponse>
JSON format
HTTP/1.1 200 OK
Content-Type:application/json
{
"RequestId" : "04F0F334-1335-436C-A1D7-6C044FE7****",
"JobId" : "1.manager"
}
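Both sample responses carry the same two fields. As a hedged sketch, a client that uses only the Python standard library can extract them by dispatching on the Content-Type header. The function name parse_submit_job_response is hypothetical, and the function handles only the success shape shown in the samples.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict


def parse_submit_job_response(content_type: str, body: str) -> Dict[str, str]:
    """Extract JobId and RequestId from a SubmitJob success response body."""
    # Ignore parameters such as "; charset=utf-8" in the header value.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        data = json.loads(body)
        return {"JobId": data["JobId"], "RequestId": data["RequestId"]}
    if media_type == "application/xml":
        root = ET.fromstring(body)
        if root.tag != "SubmitJobResponse":
            raise ValueError(f"unexpected root element: {root.tag}")
        job_id = root.findtext("JobId")
        request_id = root.findtext("RequestId")
        if job_id is None or request_id is None:
            raise ValueError("JobId or RequestId is missing from the XML response")
        return {"JobId": job_id, "RequestId": request_id}
    raise ValueError(f"unsupported Content-Type: {content_type!r}")
```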
Error codes
HttpCode | Error code | Error message | Description |
---|---|---|---|
400 | InvalidParams | The specified parameter %s is invalid. | The error message returned because the following parameter is invalid: %s. |
400 | NotEnabled | You have not enabled this service | The error message returned because the service is not activated for your account. |
400 | InDebt | Your account has overdue payments. | The error message returned because your account has overdue payments. |
403 | InvalidClusterStatus | The operation failed due to invalid cluster status. | The error message returned because the operation is not supported while the cluster is in the current state. |
403 | ConflictOpt | A conflicting operation is running. | The error message returned because an operation that conflicts with the current operation is in progress. Try again later. |
403 | UsernameExist | The username already exists. | The error message returned because the username already exists. |
403 | IncorrectCredential | The username or password is incorrect. | The error message returned because the username or password is invalid. |
403 | AgentError.Account.ValidateCredentialFailure | Username or password verification failed. | The error message returned because the username or password has failed to be verified. |
404 | ClusterNotFound | The specified cluster does not exist. | The error message returned because the specified cluster does not exist. |
404 | ContainerNotFound | The specified container does not exist. | The error message returned because the specified container application does not exist. |
404 | ManagerNotFound | The manager nodes do not exist or their status is abnormal. | The error message returned because the management node does not exist or is not running as expected. |
406 | AgentError | The agent service request failed. | The error message returned because the request to the agent service has failed. |
406 | AgentError.Job.SubmitFailure | Failed to submit jobs: %s | The error message returned because the following jobs have failed to be submitted: %s. |
406 | AgentError.Job.InvalidContainerType | Unsupported container type: %s. | The error message returned because the type of the specified container application is invalid: %s. |
406 | AliyunError | An Alibaba Cloud product error occurred. | The error message returned because the operation has failed to call another Alibaba Cloud service. |
406 | AgentError.Account.AccountValidateCredentialFailure | Cannot get user info | The error message returned because user information has failed to be queried. |
407 | NotAuthorized | You are not authorized by RAM for this request. | The error message returned because you are not authorized by RAM for this request. |
409 | PartFailure | Part of the batch operation failed. | The error message returned because part of the batch operation has failed. |
500 | UnknownError | An unknown error occurred. | The error message returned because an unknown error has occurred. Try again later. If the error persists, submit a ticket. |
503 | ServiceUnavailable | The request has failed due to a temporary failure of the server | The error message returned because the request has failed. The service is temporarily unavailable. |
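Some errors in the table are transient. ConflictOpt and UnknownError ask you to try again later, and ServiceUnavailable indicates a temporary server failure. The following is a minimal retry sketch with exponential backoff. ApiError and call_with_retry are hypothetical names, and the sketch leaves it to the caller to extract the error code from a real response.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Errors that the table describes as temporary or says to retry later.
RETRYABLE_CODES = frozenset({"ConflictOpt", "UnknownError", "ServiceUnavailable"})


class ApiError(Exception):
    """Hypothetical exception that carries the HttpCode and Error code of a failed call."""

    def __init__(self, http_code: int, code: str, message: str = "") -> None:
        super().__init__(f"{http_code} {code}: {message}")
        self.http_code = http_code
        self.code = code


def call_with_retry(call: Callable[[], T], attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying retryable errors with delays of 1x, 2x, 4x, ... base_delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return call()
        except ApiError as err:
            if err.code not in RETRYABLE_CODES:
                raise  # For example, ClusterNotFound will not succeed on retry.
            sleep(base_delay * 2 ** attempt)
    # Final attempt: any error now propagates to the caller.
    return call()
```

Errors such as InvalidParams or IncorrectCredential are raised immediately, because retrying the same request cannot fix them. If UnknownError persists after the final attempt, the table recommends submitting a ticket.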
For a list of error codes, visit the API Error Center.