Elastic High Performance Computing:RecoverCluster

Last Updated: Apr 15, 2024

Resets and restores a cluster.

Operation description

You can call this operation to reset and restore a cluster only when the cluster is in the Exception state. To query the ID and status of a cluster, call the ListClusters operation. We recommend that you export all job data before you restore a cluster. Resetting and restoring a cluster has the following impacts:

  • The system disks of all nodes are changed. By default, new system disks are configured based on the settings that you specified when the cluster was created.
  • The data on the system disks and data disks of all cluster nodes is lost. The data includes user information, job information, scheduler queue information, and configuration data of auto-scaling queues. However, the data on Apsara File Storage NAS file systems is retained.
  • The self-managed queues in the cluster are deleted. All nodes are retained and migrated to the default queue of the cluster.

Debugging

OpenAPI Explorer automatically calculates the signature value. For your convenience, we recommend that you call this operation in OpenAPI Explorer.

Authorization information

The following table shows the authorization information for this API operation. You can use the authorization information in the Action element of a policy to grant a RAM user or RAM role the permissions to call this API operation. The columns of the table are described as follows:

  • Operation: the value that you can use in the Action element to specify the operation on a resource.
  • Access level: the access level of each operation. The levels are read, write, and list.
  • Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
    • The required resource types are displayed in bold characters.
    • If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
  • Condition Key: the condition key that is defined by the cloud service.
  • Associated operation: other operations that the RAM user or the RAM role must have permissions to perform to complete the operation. To complete the operation, the RAM user or the RAM role must have the permissions to perform the associated operations.

| Operation | Access level | Resource type | Condition key | Associated operation |
| --- | --- | --- | --- | --- |
| ehpc:RecoverCluster | WRITE | All Resources (*) | none | none |
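
As an illustration of how the Operation value fits into the Action element, the following Python sketch builds a custom policy document that allows only this operation. The `Version`, `Statement`, `Effect`, and `Resource` fields follow the general RAM policy layout and are not taken from this page; the helper name `build_recover_cluster_policy` is hypothetical.

```python
import json


def build_recover_cluster_policy() -> dict:
    """Return a policy document that allows ehpc:RecoverCluster (illustrative only)."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                # Operation value from the authorization table above.
                "Action": "ehpc:RecoverCluster",
                # The table lists All Resources (*), so no narrower scope is used.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be reviewed before use.
    print(json.dumps(build_recover_cluster_policy(), indent=2))
```

Review the generated JSON against your own access requirements before you attach it to a RAM user or RAM role.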

Request parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| ClusterId | string | Yes | The cluster ID. The cluster must be in the Exception state. You can call the ListClusters operation to query the ID and status of a cluster. | ehpc-hz-FYUr32**** |
| OsTag | string | No | The tag of the system image. You can call the ListImages and ListCustomImages operations to query the image tags supported by Elastic High Performance Computing (E-HPC). | CentOS_7.2_64 |
| AccountType | string | No | The service type of the domain account. Valid values: nis, ldap. Default value: nis. | nis |
| SchedulerType | string | No | The type of the scheduler. Valid values: pbs, slurm, opengridscheduler, deadline. Default value: pbs. | pbs |
| ImageOwnerAlias | string | No | The type of the image. Valid values: system (public image), self (custom image), others (shared image). Default value: system. | system |
| ImageId | string | No | The image ID. You can call the ListImages and ListCustomImages operations to query the images that are supported by E-HPC. | m-bp18133n0335yq**** |
| ClientVersion | string | No | The version of the E-HPC client. The default value is the latest version of the client. You can call the ListCurrentClientVersion operation to query the latest version of the E-HPC client. | 1.0.76 |
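
The request parameters can be checked on the client side before a call is sent. The following Python sketch assembles the parameters into a dictionary and rejects values that are outside the valid values listed above. It does not send any request; the helper name `build_recover_cluster_params` and the dictionary layout are assumptions made for illustration, not part of any SDK.

```python
# Valid values copied from the request parameter descriptions above.
VALID_VALUES = {
    "AccountType": {"nis", "ldap"},
    "SchedulerType": {"pbs", "slurm", "opengridscheduler", "deadline"},
    "ImageOwnerAlias": {"system", "self", "others"},
}

# Optional parameters documented for RecoverCluster.
OPTIONAL_PARAMS = (
    "OsTag",
    "AccountType",
    "SchedulerType",
    "ImageOwnerAlias",
    "ImageId",
    "ClientVersion",
)


def build_recover_cluster_params(cluster_id, **optional):
    """Return a dict of RecoverCluster parameters (hypothetical helper)."""
    if not cluster_id:
        raise ValueError("ClusterId is required")
    params = {"ClusterId": cluster_id}
    for name, value in optional.items():
        if name not in OPTIONAL_PARAMS:
            raise ValueError(f"Unknown parameter: {name}")
        if value is None:
            # Omit unset parameters so that the service default applies.
            continue
        allowed = VALID_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
        params[name] = value
    return params


if __name__ == "__main__":
    print(build_recover_cluster_params("ehpc-hz-FYUr32****", SchedulerType="slurm"))
```

Omitted optional parameters are left out of the dictionary rather than filled with defaults, so the defaults documented above are applied by the service.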

Response parameters

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| | object | | |
| TaskId | string | The task ID. | 18FB21E3-F423-4B84-BB63-D8887A29**** |
| RequestId | string | The request ID. | 18FB21E3-F423-4B84-BB63-D8887A29**** |

Examples

Sample success responses

JSON format

{
  "TaskId": "18FB21E3-F423-4B84-BB63-D8887A29****",
  "RequestId": "18FB21E3-F423-4B84-BB63-D8887A29****"
}
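
A response of this shape can be read with any JSON parser. The following Python sketch, which uses only the standard library, extracts the two documented fields from the sample above. The function name `parse_recover_cluster_response` is hypothetical.

```python
import json
from typing import Tuple

# Sample success response copied from this page.
SAMPLE_RESPONSE = """
{
  "TaskId": "18FB21E3-F423-4B84-BB63-D8887A29****",
  "RequestId": "18FB21E3-F423-4B84-BB63-D8887A29****"
}
"""


def parse_recover_cluster_response(body: str) -> Tuple[str, str]:
    """Return (TaskId, RequestId) from a RecoverCluster success response."""
    data = json.loads(body)
    # Both fields are documented in the response parameters table.
    return data["TaskId"], data["RequestId"]


if __name__ == "__main__":
    task_id, request_id = parse_recover_cluster_response(SAMPLE_RESPONSE)
    print("TaskId:", task_id)
    print("RequestId:", request_id)
```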

Error codes

| HTTP status code | Error code | Error message | Description |
| --- | --- | --- | --- |
| 400 | InvalidParams | The specified parameter %s is invalid. | The specified parameter %s is invalid. |
| 400 | InDebt | Your account has overdue payments. | Your account has overdue payments. |
| 400 | OrderError.InsufficientBalance | The account balance is insufficient. Please add funds first and try again. | Your account has overdue payments. Add funds to your account and try again. |
| 400 | OrderError.InstHasUnpaidOrder | Your account has an unpaid order. | Your account has an unpaid order. Please pay the order and try again. |
| 400 | OrderError.Arrearage | Your account balance is less than CNY 100. Please add funds to your account and try again. | Your account balance is less than CNY 100. Add funds to your account and try again. |
| 400 | OrderError.NoCard | No credit card is bound to your account. | You have not bound a card. Please perform binding first. |
| 400 | OrderError.InvalidPayMethod | No valid default payment method is specified for your account. | No valid payment method is found. Please check again. |
| 400 | OrderError.NoRealNameAuthentication | You have not completed the real name authentication. | You must complete the real-name verification first. |
| 400 | OrderError.NoRealNameRegistration | Real name registration is required for instances launched in mainland China. | To purchase cloud services in mainland China regions on the international site, the user must first complete real-name registration. |
| 400 | OrderError.UserProfileIncomplete | You have not completed your user profile. | The user has not completed personal information on the international site. |
| 400 | InvalidVpc | The specified VPC is invalid. | The VPC information is invalid. |
| 400 | InvalidVolume | The specified volume is invalid. | The specified volume is invalid. |
| 400 | InvalidSoftware | The specified software is not supported. | The requested software is not supported. |
| 400 | InvalidVolumeProtocal | The specified volume protocol is invalid. | The storage protocol is invalid. |
| 400 | InvalidVolumeMountpoint | The specified volume mount point is invalid. | The specified volume mount point is invalid. |
| 403 | TooManyClusters | The number of user clusters exceeds the quota. | The number of user clusters exceeds the quota. By default, the number of user clusters cannot exceed three. |
| 403 | TooManyComputes | The number of computing nodes exceeds the quota. | The number of computing nodes exceeds the quota. |
| 403 | TooManyLogins | The maximum number of logged on nodes is exceeded. | The maximum number of logged on nodes is exceeded. The default maximum value is 2. |
| 403 | TooManyScc | The maximum number of SCC instances is exceeded. | The maximum number of SCC instances is exceeded. The default maximum value is 15. |
| 403 | QuotaExceeded.PrivateIpAddress | Insufficient private IP addresses in vSwitch: %s. | Insufficient private IP addresses in vSwitch: %s. |
| 403 | ConflictOpt | A conflicting operation is running. | A conflicting operation is running. Please try again later. |
| 403 | ImageNotSupported | The specified image is not supported. | The specified image does not exist. Change the image and try again. |
| 404 | ImageNotFound | The specified image does not exist. | The specified image does not exist. Please verify the parameter. |
| 404 | VolumeNotFound | The specified volume does not exist. | The specified storage does not exist. Please verify the parameter. |
| 404 | VpcNotFound | The specified VPC does not exist. | The specified VPC does not exist. |
| 404 | ClusterNotFound | The specified cluster does not exist. | The specified instance does not exist. |
| 406 | EcsError | An error occurred while calling the ECS API operation. | An error occurred while calling the ECS API operation. |
| 406 | NasError | NAS API request failed. | Failed to request the NAS interface. |
| 406 | EipError | The EIP API request failed. | EIP API request failed. |
| 406 | OrderError | An order request error occurred. | An order request error occurred. |
| 406 | FailToGenId | Generating cluster ID failed. | Failed to generate the cluster ID. Please try again. |
| 406 | DbError | A database service error occurred. | Database request failed. |
| 406 | AliyunError | An Alibaba Cloud product error occurred. | An Alibaba Cloud product error occurred. |
| 407 | NotAuthorized | You are not authorized by RAM for this request. | The request is not authorized by RAM. |
| 500 | UnknownError | An unknown error occurred. | An unknown error occurred. |
| 503 | ServiceUnavailable | The request has failed due to a temporary failure of the server | The request has failed due to a temporary failure of the server. |
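
A caller may want to group these error codes by the kind of follow-up they suggest. The following Python sketch is one possible grouping and is not an official classification. It treats as retryable the codes whose descriptions mention trying again or a temporary failure, and treats InDebt and the `OrderError.*` codes as billing issues. The function name `classify_error` and the category labels are hypothetical.

```python
# Codes whose descriptions above suggest trying again or a temporary failure.
RETRYABLE_CODES = {"ConflictOpt", "FailToGenId", "ServiceUnavailable"}


def classify_error(code: str) -> str:
    """Map an error code from the table to a coarse category (illustrative)."""
    if code in RETRYABLE_CODES:
        return "retry"
    # InDebt and the dotted OrderError.* codes all concern payments or account setup.
    if code == "InDebt" or code.startswith("OrderError."):
        return "billing"
    if code == "NotAuthorized":
        return "permission"
    # Everything else needs case-by-case handling.
    return "other"


if __name__ == "__main__":
    for sample in ("ConflictOpt", "OrderError.NoCard", "NotAuthorized", "InvalidVpc"):
        print(sample, "->", classify_error(sample))
```

The plain `OrderError` code (HTTP 406) is deliberately left in the `other` category because its description does not state a payment-related cause.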

For a complete list of error codes, see Service error codes.