Elastic High Performance Computing (E-HPC) provides an auto scaling feature that dynamically allocates compute nodes based on the auto scaling policy that you configure. The system automatically adds or removes compute nodes based on real-time workloads to improve cluster availability and reduce costs. This topic describes how to configure an auto scaling policy.
Prerequisites
Before you use the auto scaling feature, make sure that the following requirements are met:
The operating system of all nodes in the cluster is Linux.
The scheduler is PBS, Slurm, or Deadline.
Benefits
The auto scaling feature provides the following benefits:
Adds compute nodes based on the real-time workloads of your cluster to improve cluster availability.
Reduces the number of compute nodes to save costs without compromising cluster availability.
Stops faulty nodes and creates new nodes to improve fault tolerance.
Procedure
Log on to the E-HPC console.
In the top navigation bar, select a region.
In the left-side navigation pane, choose Elasticity > Auto Scale.
From the Cluster drop-down list on the Auto Scale page, select the cluster for which you want to configure the auto scaling policy.
In the Global Configurations section, set the parameters.
Parameter
Description
Enable Autoscale
Specifies whether to enable Auto Grow and Auto Shrink for all queues in the cluster.
Note: If the settings in the Queue Configuration section are different from the settings in the Global Configurations section, the settings in the Queue Configuration section prevail.
Compute Nodes
The range for the number of compute nodes that can be added to scale out the cluster. The upper limit is the sum of the maximum number of compute nodes configured for each queue in the cluster. The lower limit is the sum of the minimum number of compute nodes configured for each queue in the cluster.
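As an illustration of how the cluster-level range follows from the queue-level settings, the following Python sketch sums the per-queue limits. The queue names and the min_nodes and max_nodes keys are hypothetical placeholders and are not E-HPC API fields.

```python
def cluster_node_range(queues):
    """Return (lower, upper) limits for the cluster from per-queue limits.

    `queues` maps a queue name to a dict with hypothetical keys
    "min_nodes" and "max_nodes".
    """
    lower = sum(q["min_nodes"] for q in queues.values())
    upper = sum(q["max_nodes"] for q in queues.values())
    return lower, upper


# Hypothetical queues used only for illustration.
example_queues = {
    "workq": {"min_nodes": 2, "max_nodes": 100},
    "gpuq": {"min_nodes": 0, "max_nodes": 20},
}
print(cluster_node_range(example_queues))
```

With these example values, the lower limit is 2 and the upper limit is 120.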
Scale-in Time (Minute)
If the continuous idle duration of a compute node exceeds the scale-in time, the node is released.
The continuous idle duration is the scale-in check interval multiplied by the number of consecutive checks in which the node is found idle. By default, the scale-in check interval is 2 minutes.
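The release condition described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the function and parameter names are hypothetical, and the comparison is strict because the description says the idle duration must exceed the scale-in time.

```python
def should_release(consecutive_idle_checks, scale_in_time_min, check_interval_min=2):
    """Return True if a node has been idle longer than the scale-in time.

    continuous idle duration = check interval x consecutive idle checks
    The default interval of 2 minutes matches the documented default.
    """
    continuous_idle_min = check_interval_min * consecutive_idle_checks
    return continuous_idle_min > scale_in_time_min


# A node found idle in 3 consecutive checks has been idle for 6 minutes.
print(should_release(consecutive_idle_checks=3, scale_in_time_min=4))
```

In this model, a node that is idle for exactly the scale-in time is not yet released. It is released at the next check if it is still idle.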
Image Type
The image type of the compute nodes that you want to add to the cluster. Only the images that are compatible with the image of the original compute nodes in the cluster are supported.
Exceptional Nodes
Select the nodes that you want to exclude from auto scaling.
If you want to retain a compute node, you can set the node as an exceptional node. Then, the node is not released even if it is idle.
In the Queue Configuration section, click Edit to set the parameters.
Parameter
Description
Auto Grow and Auto Shrink
Specifies whether to enable Auto Grow and Auto Shrink. By default, both switches are turned off.
Note: If the settings in the Queue Configuration section are different from the settings in the Global Configurations section, the settings in the Queue Configuration section prevail.
Queue Compute Nodes
The range of the number of compute nodes in the queue. Valid values:
Maximum Nodes: the maximum number of compute nodes that can be added. Valid values: 0 to 500.
Minimal Nodes: the minimum number of compute nodes that must be retained. Valid values: 0 to 50.
Prefix of Hostnames
The hostname prefix of the compute nodes. The prefix is used to distinguish between the nodes of different queues.
Maximum Scale-out Nodes in Each Round
The maximum number of compute nodes that can be added in each round of scale-out. The default value 0 indicates that the maximum number is not limited.
We recommend that you set this parameter to control the cost of compute nodes.
If you set the parameter to A and you want to add B nodes, nodes are added based on the following rules:
If B is less than or equal to A, B nodes are added.
If B is greater than A, A nodes are added.
Note: In addition to this parameter, the number of nodes in a cluster is also limited by the maximum number of nodes that can be added to a single queue and the maximum number of nodes that can be added to the cluster.
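The per-round cap for A and B can be expressed as a short Python sketch. The function name is hypothetical, and the sketch models only this parameter. It does not apply the queue-level and cluster-level limits mentioned in the note.

```python
def capped_scale_out(requested, max_per_round=0):
    """Return how many nodes are added in one round.

    requested     -- B, the number of nodes that the workload calls for
    max_per_round -- A, the per-round cap; 0 (the default) means no limit
    """
    if max_per_round == 0:
        return requested
    return min(requested, max_per_round)


print(capped_scale_out(3, 5))  # B <= A
print(capped_scale_out(8, 5))  # B > A
print(capped_scale_out(8))     # cap not set
```

The three calls return 3, 5, and 8, which correspond to the two rules above and to the unlimited default.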
Minimum Scale-out Nodes in Each Round
The minimum number of compute nodes that must be added in each round of scale-out. The default value 1 indicates that at least one node must be added.
In some scenarios, your business can run as expected only if at least a certain number of nodes are added. In this case, you can set the minimum number of nodes that must be added in each round. If the number of available ECS instances is less than both the specified minimum number of nodes and the number of required nodes, the cluster is not scaled out, which avoids wasting resources.
If you set the parameter to A and you want to add B nodes, nodes are added in the following scenarios:
Assume that B is less than or equal to A. If the number of available ECS instances is greater than or equal to B, B nodes are added. If the number of available ECS instances is less than B, the cluster is not scaled out.
Assume that B is greater than A. If the number of available ECS instances is greater than or equal to B, B nodes are added. If the number of available ECS instances is less than B and greater than or equal to A, A nodes are added. If the number of available ECS instances is less than A, the cluster is not scaled out.
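The two scenarios above can be summarized in a Python sketch. The function and parameter names are hypothetical, and the sketch follows only the rules stated in this section. It is not a description of how the E-HPC scheduler is implemented.

```python
def scale_out_with_minimum(requested, min_per_round, available):
    """Return how many nodes are added in one round.

    requested     -- B, the number of nodes that you want to add
    min_per_round -- A, the minimum number of nodes per round
    available     -- the number of ECS instances that are available
    """
    if available >= requested:
        # Enough inventory for the full request, in either scenario.
        return requested
    if requested > min_per_round and available >= min_per_round:
        # B > A and inventory covers at least the minimum: add A nodes.
        return min_per_round
    # Inventory is insufficient: no scale-out.
    return 0


print(scale_out_with_minimum(requested=10, min_per_round=5, available=7))
```

In the example call, 10 nodes are requested but only 7 instances are available, so the minimum of 5 nodes is added.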
Hostname Suffix
The suffix of the hostname. The suffix is used to distinguish between the nodes of different queues.
Image Type
The image type of the nodes that you want to add in a single queue. You can specify different image types for different queues.
Image ID
The ID of the image used by the nodes that you want to add in a single queue. You can specify different image IDs for different queues.
Note: This parameter is valid only for the current queue. If the image type or image ID is unspecified, the image type of the nodes that you want to add is the same as that specified in the global configurations. If the image type is unspecified in the global configurations, the image type of the nodes that you want to add is the same as the default image type of the cluster.
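The fallback order in the note (queue setting, then global setting, then cluster default) can be sketched in Python as follows. The function name and the image labels are hypothetical, and None stands for an unspecified setting.

```python
from typing import Optional


def resolve_image(queue_image: Optional[str],
                  global_image: Optional[str],
                  cluster_default: str) -> str:
    """Return the image setting that applies to nodes added to a queue."""
    if queue_image:
        return queue_image      # queue-level setting prevails
    if global_image:
        return global_image     # fall back to the global configurations
    return cluster_default      # fall back to the cluster default


# Hypothetical labels used only for illustration.
print(resolve_image(None, "global-image", "cluster-default-image"))
```

In the example call, the queue setting is unspecified, so the global setting is used.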
Configuration List
Each configuration list includes the configurations of the compute nodes that you want to add. The following configurations are displayed in this section:
Zone: a zone in the region where the cluster resides.
vSwitch ID: the vSwitch that is bound to the VPC of the cluster in the selected zone.
Instance Type: the instance type of the compute nodes that you want to add in a single queue.
Note: If multiple instance types are configured in the queue, the cluster is scaled out based on the available instance types, task quantity, and GPU quantity in order. For example, assume that each node in a queue must have at least 16 cores to meet your business requirements, and the queue is configured with instance types that have 8 cores, 16 cores, and 32 cores. In this case, ECS instances with 16 cores are automatically added to the queue. If no ECS instance with 16 cores is available, instances with 32 cores are automatically added to the queue.
Bid Strategy: the bidding method configured for the nodes that you want to add.
Maximum Price per Hour: You must set a maximum hourly price only when Bid Strategy is set to Preemptible instance with maximum bid price.
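The 16-core example in the Instance Type note can be modeled with a small Python sketch. This is an assumption-laden simplification: it considers only core counts and stock availability, it ignores task quantity and GPU quantity, and the instance type labels and function name are hypothetical.

```python
from typing import Dict, Optional, Set


def pick_instance_type(configured: Dict[str, int],
                       in_stock: Set[str],
                       min_cores: int) -> Optional[str]:
    """Pick the smallest in-stock instance type that has enough cores.

    configured -- maps an instance type label to its core count
    in_stock   -- labels of the instance types that are currently available
    Returns None if no configured type qualifies.
    """
    candidates = [
        (cores, name)
        for name, cores in configured.items()
        if cores >= min_cores and name in in_stock
    ]
    if not candidates:
        return None
    return min(candidates)[1]


# Hypothetical labels; real ECS instance type names differ.
configured_types = {"type-8c": 8, "type-16c": 16, "type-32c": 32}
print(pick_instance_type(configured_types, {"type-8c", "type-32c"}, 16))
```

In the example call, the 16-core type is out of stock, so the 32-core type is selected, which matches the behavior described in the note.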
Read and select Alibaba Cloud International Website Product Terms of Service, and then click OK.
Optional. View the auto scaling diagram of the cluster.
The auto scaling diagram shows the changes in the number of nodes over time during the auto scaling process based on the auto scaling policy that you configured. The diagram also shows the time consumed by node scale-in and scale-out at key points in time.
Note: You can set the number of simulated concurrent nodes in the auto scaling diagram to simulate the changes in the number of compute nodes during auto scaling.