Elastic High Performance Computing (E-HPC) allows you to manage compute nodes that run different jobs or perform different tasks by adding them to different queues. By grouping compute nodes in queues, you can filter and schedule compute nodes more flexibly to optimize job execution efficiency. This topic describes how to use queues to manage compute nodes by group, including creating queues, deleting queues, and editing queue configurations.
Queues are an important dimension in resource monitoring. You can view the overall load and performance of compute nodes by queue on the Monitoring page. For more information, see View the monitoring information.
Prerequisites
The cluster is in the Running state.
Before you delete a queue, make sure that the queue contains no compute nodes.
Create a queue
Go to the Cluster Details page.
Log on to the E-HPC console.
In the left part of the top navigation bar, select a region.
In the left-side navigation pane, click Cluster.
On the Cluster List page, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose .
Click Create Queue. On the Create Queue page, configure the parameters as required.
The following tables describe the parameters.
Basic Settings
Parameter
Description
Queue Name
Enter a queue name. The name must meet the following requirements:
The name is 1 to 15 characters in length.
The name contains one or more of the following character types: uppercase letters (A to Z), lowercase letters (a to z), digits (0 to 9), and underscores (_).
Automatic queue scaling
Specify whether to enable automatic scaling for the queue. After you turn on Automatic queue scaling, you can select Auto Grow and Auto Shrink based on your business requirements.
After you enable automatic scaling for the queue, the system automatically adds or removes compute nodes based on the configurations and real-time workloads.
Queue Compute Nodes
Specify the number of compute nodes in the queue.
If you do not enable automatic scaling for the queue, configure the initial number of compute nodes in the queue.
If you enable automatic scaling for the queue, configure the minimum and maximum number of compute nodes in the queue.
Important: If you set the Minimal Nodes parameter to a non-zero value, the queue retains that number of nodes during cluster scale-in, and idle nodes are not released. Set the Minimal Nodes parameter with caution to avoid wasting resources and incurring unnecessary costs for idle nodes in the queue.
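The queue name rules in the Basic Settings table can be checked before you open the console. The following is a minimal sketch, assuming that the rules mean a name may contain only letters, digits, and underscores and must be 1 to 15 characters long. The function name is_valid_queue_name is hypothetical and is not part of E-HPC.

```python
import re

# Assumption: a valid name is 1 to 15 characters drawn only from
# A-Z, a-z, 0-9, and the underscore.
QUEUE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,15}")


def is_valid_queue_name(name: str) -> bool:
    """Return True if the name satisfies the length and character rules."""
    return QUEUE_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_queue_name("gpu_queue_01"))  # letters, digits, underscores
print(is_valid_queue_name("my-queue"))      # contains a hyphen
print(is_valid_queue_name("a" * 16))        # 16 characters
```

fullmatch is used instead of match so that trailing characters outside the allowed set cause the check to fail.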
Select Queue Node Configuration
If you enable Automatic queue scaling or set Initial Number of Nodes to a value larger than 0, you must configure the following parameters to enable the system to create compute nodes for the queue:
Configuration item
Description
Inter-node interconnection
Select a mode to interconnect nodes. Valid values:
VPCNetwork: The compute nodes communicate with each other over virtual private clouds (VPCs).
eRDMANetwork: If the instance types of compute nodes support eRDMA interfaces (ERIs), the compute nodes communicate with each other over eRDMA networks.
Note: Only compute nodes of specific instance types support ERIs. For more information, see Overview and Enable eRDMA on an enterprise-level instance.
Virtual Switch
Specify a vSwitch for the nodes to use. The system automatically assigns IP addresses to the compute nodes from the available vSwitch CIDR block.
Instance type Group
Click Add Instance and select an instance type in the panel that appears.
If you do not enable Automatic queue scaling, you can add only one instance type. If you enable Automatic queue scaling, you can add multiple instance types.
Important: You can select multiple vSwitches and instance types as alternatives in case instances fail to be created due to insufficient inventory. When the system creates a compute node, it tries the specified instance types and zones in sequence. For example, the system first attempts to create a node by trying the instance types in the order that you specified them, in the zone where the first vSwitch resides. The specifications of a created instance may vary based on the inventory.
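The fallback behavior described above can be pictured as an ordered search over vSwitch and instance type pairs. The following is a minimal sketch of one reading of that description, assuming that the system tries every instance type in the zone of the first vSwitch before it moves to the next vSwitch. The function create_with_fallback, the try_create callback, and the vSwitch and instance type names are hypothetical and only illustrate the ordering. This is not the E-HPC implementation.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Attempt = Tuple[str, str]  # (vSwitch ID, instance type)


def create_with_fallback(
    vswitches: Sequence[str],
    instance_types: Sequence[str],
    try_create: Callable[[str, str], bool],
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Try each (vSwitch, instance type) pair in order until one succeeds.

    Returns the successful pair (or None) and the list of pairs tried.
    """
    attempts: List[Attempt] = []
    # product() varies the last argument fastest, so all instance types
    # are tried for the first vSwitch before the second vSwitch is used.
    for vswitch, instance_type in product(vswitches, instance_types):
        attempts.append((vswitch, instance_type))
        if try_create(vswitch, instance_type):
            return (vswitch, instance_type), attempts
    return None, attempts


# Hypothetical inventory: only one combination is in stock.
in_stock = {("vsw-zone-b", "type-small")}
chosen, attempts = create_with_fallback(
    ["vsw-zone-a", "vsw-zone-b"],
    ["type-large", "type-small"],
    lambda vswitch, instance_type: (vswitch, instance_type) in in_stock,
)
print(chosen)
print(attempts)
```

In this example, the node ends up with the second instance type in the second zone, which shows why the specifications of a created instance can differ from your first choice.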
Auto Scale
Configuration item
Description
Scaling Policy
Select a scaling policy. Currently, only Supply Priority Strategy is supported. With this policy, compute nodes that meet the specification requirements are created in the specified zones in the order of the configured vSwitches.
Maximum number of single expansion nodes
Specify the maximum number of nodes that can be added or removed in each scale-out or scale-in cycle. The default value 0 specifies that the number is unlimited.
We recommend that you configure this parameter to control your costs on compute nodes.
Prefix of Hostnames
Specify the hostname prefix for the compute nodes. The prefix is used to distinguish between the nodes of different queues.
Hostname Suffix
Specify the hostname suffix for the compute nodes. The suffix is used to distinguish between the nodes of different queues.
Instance RAM role
Bind a Resource Access Management (RAM) role to the nodes to enable the nodes to access Alibaba Cloud services.
We recommend that you select the default role AliyunECSInstanceForEHPCRole.
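The interaction between the per-cycle limit in the Auto Scale table and the minimum and maximum node counts can be sketched as simple arithmetic. The following example is an illustration under stated assumptions: the source does not describe how E-HPC measures demand or idle nodes, so the demand and idle arguments and the functions plan_scale_out and plan_scale_in are hypothetical.

```python
def plan_scale_out(current: int, demand: int, max_nodes: int, step_limit: int) -> int:
    """Return how many nodes to add in one scale-out cycle.

    demand is the total number of nodes that the workload would like to have.
    A step_limit of 0 means that the per-cycle number is unlimited.
    """
    wanted = max(0, min(demand, max_nodes) - current)
    if step_limit > 0:
        wanted = min(wanted, step_limit)
    return wanted


def plan_scale_in(current: int, idle: int, min_nodes: int, step_limit: int) -> int:
    """Return how many idle nodes to remove in one scale-in cycle.

    The queue never drops below min_nodes, even if more nodes are idle.
    """
    removable = max(0, min(idle, current - min_nodes))
    if step_limit > 0:
        removable = min(removable, step_limit)
    return removable


# With a per-cycle limit of 3, a large burst is spread over several cycles.
print(plan_scale_out(current=2, demand=20, max_nodes=10, step_limit=3))
# With the default limit of 0, the queue grows to its maximum in one cycle.
print(plan_scale_out(current=2, demand=20, max_nodes=10, step_limit=0))
# A non-zero minimum keeps idle nodes in the queue.
print(plan_scale_in(current=5, idle=5, min_nodes=2, step_limit=0))
```

The last call shows the cost effect of a non-zero minimum: all five nodes are idle, but only three can be released.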
Click Save.
Click the refresh icon to refresh the queue list. If the queue appears in the list, you have successfully created the queue.
Configure a queue
To prevent impacts on ongoing business, we recommend that you configure queues when your business is idle.
Go to the Cluster Details page.
Log on to the E-HPC console.
In the left part of the top navigation bar, select a region.
In the left-side navigation pane, click Cluster.
On the Cluster List page, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose .
Find the queue that you want to manage and click Edit in the Actions column.
On the Edit Queue page, configure the parameters that are described in the following tables.
Basic Settings
Parameter
Description
Automatic queue scaling
Automatic queue scaling is turned off by default. After you turn on the switch, you can select Auto Grow and Auto Shrink based on your business requirements.
Note: If the configurations of a queue are different from the global configurations of a cluster, the configurations of the queue take precedence.
Queue Compute Nodes
Specify the range for the number of compute nodes in the queue.
Minimum Nodes: The minimum number of compute nodes. Valid values: 0 to 1000. This value may affect scale-in results.
Maximum Nodes: The maximum number of compute nodes. Valid values: 0 to 5000. This value may affect scale-out results.
Important: If you set the Minimal Nodes parameter to a non-zero value, the queue retains that number of nodes during cluster scale-in, and idle nodes are not released. Set the Minimal Nodes parameter with caution to avoid wasting resources and incurring unnecessary costs for idle nodes in the queue.
The maximum number of nodes in the queue cannot exceed the maximum number of nodes in the cluster.
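The limits on Minimum Nodes and Maximum Nodes can be checked together before you save the settings. The following is a minimal sketch. The function validate_node_range is hypothetical, and the check that the minimum does not exceed the maximum is an assumption that the source does not state explicitly.

```python
from typing import List


def validate_node_range(min_nodes: int, max_nodes: int, cluster_max_nodes: int) -> List[str]:
    """Return a list of problems found. An empty list means the range is acceptable."""
    errors: List[str] = []
    if not 0 <= min_nodes <= 1000:
        errors.append("Minimum Nodes must be in the range 0 to 1000.")
    if not 0 <= max_nodes <= 5000:
        errors.append("Maximum Nodes must be in the range 0 to 5000.")
    # Assumption: a minimum larger than the maximum is treated as invalid.
    if min_nodes > max_nodes:
        errors.append("Minimum Nodes must not exceed Maximum Nodes.")
    if max_nodes > cluster_max_nodes:
        errors.append("Maximum Nodes must not exceed the cluster maximum.")
    return errors


print(validate_node_range(min_nodes=0, max_nodes=50, cluster_max_nodes=100))
print(validate_node_range(min_nodes=10, max_nodes=200, cluster_max_nodes=100))
```

Returning every problem at once, rather than stopping at the first, makes it easier to correct all fields in a single pass.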
Select Queue Node Configuration
If you enable Automatic queue scaling or set Initial Number of Nodes to a value larger than 0, you must configure the following parameters to enable the system to create compute nodes for the queue:
Configuration item
Description
Inter-node interconnection
Select a mode to interconnect nodes. Valid values:
VPCNetwork: The compute nodes communicate with each other over virtual private clouds (VPCs).
eRDMANetwork: If the instance types of compute nodes support eRDMA interfaces (ERIs), the compute nodes communicate with each other over eRDMA networks.
Note: Only compute nodes of specific instance types support ERIs. For more information, see Overview and Enable eRDMA on an enterprise-level instance.
Virtual Switch
Specify a vSwitch for the nodes to use. The system automatically assigns IP addresses to the compute nodes from the available vSwitch CIDR block.
Instance type Group
Click Add Instance and select an instance type in the panel that appears.
If you do not enable Automatic queue scaling, you can add only one instance type. If you enable Automatic queue scaling, you can add multiple instance types.
Important: You can select multiple vSwitches and instance types as alternatives in case instances fail to be created due to insufficient inventory. When the system creates a compute node, it tries the specified instance types and zones in sequence. For example, the system first attempts to create a node by trying the instance types in the order that you specified them, in the zone where the first vSwitch resides. The specifications of a created instance may vary based on the inventory.
Auto Scale
Configuration item
Description
Scaling Policy
Select a scaling policy. Currently, only Supply Priority Strategy is supported. With this policy, compute nodes that meet the specification requirements are created in the specified zones in the order of the configured vSwitches.
Maximum number of single expansion nodes
Specify the maximum number of nodes that can be added or removed in each scale-out or scale-in cycle. The default value 0 specifies that the number is unlimited.
We recommend that you configure this parameter to control your costs on compute nodes.
Prefix of Hostnames
Specify the hostname prefix for the compute nodes. The prefix is used to distinguish between the nodes of different queues.
Hostname Suffix
Specify the hostname suffix for the compute nodes. The suffix is used to distinguish between the nodes of different queues.
Instance RAM role
Bind a Resource Access Management (RAM) role to the nodes to enable the nodes to access Alibaba Cloud services.
We recommend that you select the default role AliyunECSInstanceForEHPCRole.
Click Save.
Click the refresh icon to refresh the queue list and view the information in the Auto Scaling Configurations column. If the information is updated, you have successfully edited the scaling configurations.
Delete queues
Before you delete a queue, make sure that the queue does not have compute nodes. Otherwise, you cannot delete the queue.
To prevent impacts on ongoing business, we recommend that you delete queues when your business is idle.
Go to the Cluster Details page.
Log on to the E-HPC console.
In the left part of the top navigation bar, select a region.
In the left-side navigation pane, click Cluster.
On the Cluster List page, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose .
Use one of the following methods to delete queues:
To delete a single queue, find the queue and click Delete in the Actions column.
To delete multiple queues at a time, select the queues and click Batch Delete in the lower part of the page.
In the message that appears, confirm the queue information and click OK.
Click the refresh icon to refresh the queue list. If the queue no longer appears in the list, you have successfully deleted the queue.
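Before a batch deletion, it can help to separate the queues that are empty from the queues that still contain compute nodes, because only empty queues can be deleted. The following is a minimal sketch, assuming that you have already collected the node count of each queue, for example by reading the queue list in the console. The function split_deletable and the queue names are hypothetical.

```python
from typing import Dict, List, Tuple


def split_deletable(node_counts: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Split queue names into those that can be deleted and those that cannot.

    A queue can be deleted only if it contains no compute nodes.
    """
    deletable = [name for name, count in node_counts.items() if count == 0]
    blocked = [name for name, count in node_counts.items() if count > 0]
    return deletable, blocked


# Hypothetical node counts per queue.
deletable, blocked = split_deletable({"test_q": 0, "gpu_q": 4, "old_q": 0})
print("Can be deleted:", deletable)
print("Remove the compute nodes first:", blocked)
```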