This topic describes how to create and manage a Standard Edition public cloud cluster in the Elastic High Performance Computing (E-HPC) console to help you get started with E-HPC.
Prerequisites
A service-linked role for E-HPC is created. The first time you log on to the E-HPC console, you are prompted to create a service-linked role for E-HPC.
A virtual private cloud (VPC) and a vSwitch are created. For more information, see Create and manage a VPC and Create a vSwitch.
Apsara File Storage NAS (NAS) is activated. A NAS file system and a mount target are created. For more information, see Create a file system and Manage mount targets.
Create a cluster
Go to the Create Cluster page.
On the Create Cluster page, configure the parameters in the following steps:
Cluster Configuration
Basic Settings
Parameter
Example
Description
Region
China (Hangzhou)
The region where you want to create a cluster.
Network and Availability Zone
VPC: vpc-bp1opxu1zkhn00g****
vSwitch: vsw-bp1ljgg5tjrs62n64****
The VPC in which your cluster is deployed and the vSwitch to which the cluster belongs.
Note: The nodes in the cluster use IP addresses from the vSwitch CIDR block. Make sure that the number of available IP addresses in the vSwitch is greater than the number of cluster nodes. A quick way to check the available IP addresses is sketched after this table.
Security Group
Select Automatically create a normal security group.
A security group is used to manage the inbound and outbound traffic of the nodes in a cluster. If you let the system create the security group, rules that enable communication between the nodes in the cluster are added automatically.
Select the type of the security group that is automatically created based on your business requirements. For more information about the differences between basic and advanced security groups, see Basic security groups and advanced security groups.
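The following sketch shows one way to check the number of available IP addresses in the vSwitch before you create the cluster. It assumes that the Alibaba Cloud CLI (aliyun) is installed and configured; the vSwitch ID and region are the example values used in this topic.

```bash
# Query the vSwitch and read the AvailableIpAddressCount field in the
# response. The value must be greater than the number of cluster nodes.
aliyun vpc DescribeVSwitchAttributes \
  --RegionId cn-hangzhou \
  --VSwitchId vsw-bp1ljgg5tjrs62n64****
```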
Select a cluster type
This section describes how to create a public cloud cluster of Standard Edition. A cluster of this type consists of one management node and multiple compute nodes. You must select the type of the scheduler and configure the management node.
Parameter
Example
Description
Series
Select Standard Edition.
The series of the cluster.
Deployment Mode
Select Public cloud cluster.
The deployment mode of the cluster.
Cluster Type
Select SLURM.
The scheduler type of the cluster. Common schedulers in HPC scenarios are supported. Examples: Slurm and OpenPBS.
Management node
Instance Family: General-purpose Type g6
Instance Type: ecs.g6.large
Image: CentOS 7.6 64 bit
Storage: System Disk 40 GiB ESSD PL0
Hyper-Threading: Enable
The ECS instance in which the scheduler and domain account service are deployed. Select appropriate configurations for the management node based on your business scenario and cluster size.
Payment Details
The billing method of the management node. For more information, see Instance types.
Pay-as-you-go: You are charged based on the actual usage duration. Preemptible instances are not supported.
Subscription: You are charged on a monthly or yearly basis.
Instance Type
The instance specifications of the management node. We recommend that you choose the specifications based on the number of compute nodes in the cluster:
If the number of compute nodes in the cluster is less than or equal to 100, we recommend that you select 16 or more vCPUs and 64 GiB or more of memory.
If the number of compute nodes in the cluster is less than or equal to 500, we recommend that you select 32 or more vCPUs and 128 GiB or more of memory.
If the number of compute nodes in the cluster is greater than 500, we recommend that you select 64 or more vCPUs and 256 GiB or more of memory.
Image
The image used to deploy the management node. Different images support different schedulers. The images that are actually displayed in the console prevail.
Storage
The system disk specification of the management node and whether to attach a data disk to the management node. For more information about the disk type and performance level, see Disks.
Hyper-Threading
By default, Hyper-Threading is enabled. If your business requires better performance, you can disable Hyper-Threading.
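If you are unsure whether the Hyper-Threading setting took effect, you can log on to the management node after it is created and check the vCPU topology with standard Linux tools. A minimal check:

```bash
# "Thread(s) per core: 2" indicates that Hyper-Threading is enabled;
# "Thread(s) per core: 1" indicates that it is disabled.
lscpu | grep -i "thread(s) per core"
```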
Compute Node and Queue
Basic Settings
Parameter
Example
Description
Automatic queue scaling
Off
Specifies whether to enable Automatic queue scaling. If you turn on Automatic queue scaling, you can select Auto Grow and Auto Shrink based on your business requirements. The system then automatically adds or removes compute nodes based on the configurations and the real-time load.
Queue Compute Nodes
5
The number of nodes in the queue.
If you do not enable Automatic queue scaling, configure the initial number of compute nodes in the queue.
If you enable Automatic queue scaling, configure the minimum and maximum number of compute nodes in the queue.
Important: If you set the Minimal Nodes parameter to a non-zero value, the queue retains at least the specified number of nodes during cluster scale-in, and idle nodes among them are not released. Specify the Minimal Nodes parameter with caution to prevent resource waste and unnecessary costs caused by idle nodes in the queue.
Select Queue Node Configuration
Parameter
Example
Description
Inter-node interconnection
Select VPCNetwork.
The network connection mode between compute nodes.
VPCNetwork: The compute nodes communicate with each other over VPCs.
eRDMANetwork: If the instance types of compute nodes support Elastic RDMA interfaces (ERIs), the compute nodes communicate with each other over elastic Remote Direct Memory Access (eRDMA) networks.
Note: Only compute nodes of specific instance types support ERIs. For more information, see Overview and Configure eRDMA on an enterprise-level instance. A quick way to verify eRDMA on a running node is sketched after this table.
Virtual Switch
vsw-bp1ljgg5tjrs62n64****
The vSwitch to which the node belongs. The system automatically assigns an IP address to the compute node from the available vSwitch CIDR block.
Instance type Group
Instance Family: General-purpose Type g6
Instance Type: ecs.g6.large
Image: CentOS 7.6 64 bit
Storage: System Disk 40 GiB ESSD PL0
Hyper-Threading: Enable
Click Add Instance and select Instance Type.
If you do not enable Automatic queue scaling, you can add only one instance type. If you enable Automatic queue scaling, you can add multiple instance types.
Low Latency Deployment Set
Select Disable.
A deployment set provides a deployment strategy for deploying instances on physical servers. For more information, see Deployment set.
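If you select eRDMANetwork, you can verify on a running compute node that the eRDMA driver is loaded and an RDMA device is present. A minimal check, assuming a CentOS or Alibaba Cloud Linux image with the eRDMA driver and the libibverbs utilities installed:

```bash
# Check that the erdma kernel module is loaded.
lsmod | grep erdma

# List RDMA devices. An erdma device indicates that the ERI is available.
ibv_devices
```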
Shared File Storage
Parameter
Example
Description
Type
Select General-purpose NAS.
The type of the file system that you want to mount.
General-purpose NAS
Extreme NAS
Parallel file CPFS
File System
0e9104**** (Capacity NFS)
The ID and mount point of the file system that you want to mount. Make sure that the file system has an available mount target.
File System Directory
0e9104****-tpd33.cn.hangzhou.nas.aliyuncs.com
The directory of the file system that you want to mount.
Mount Options
Select Mount over NFSv3.
The mount protocol.
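The cluster mounts the shared file system for you, but you can also mount the same NAS file system manually, for example on a separate ECS instance for testing. The following is a minimal sketch that matches the Mount over NFSv3 option in this example; the mount target domain name is the example value from this topic, and /mnt is an assumed local directory:

```bash
# Install the NFS client (CentOS).
sudo yum install -y nfs-utils

# Mount the file system over NFSv3 with the mount options recommended for NAS.
sudo mount -t nfs -o vers=3,nolock,proto=tcp,noresvport \
  0e9104****-tpd33.cn.hangzhou.nas.aliyuncs.com:/ /mnt

# Verify the mount.
df -h /mnt
```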
Software and Service Component
You do not need to specify this parameter. By default, a logon node is configured.
Confirm configuration
Confirm the configurations and configure the cluster name and logon credentials.
Parameter
Example
Description
Cluster Name
E-HPC-test
The name of the cluster. The cluster name is displayed on the Cluster page to facilitate identification.
Login Credentials
Select Custom Password.
The credentials used to log on to the cluster. Only Custom Password is supported.
Set Password and Repeat Password
Ehpc12****
The password of the cluster. By default, the root user uses this password to log on to all nodes in the cluster.
Check the billing information, read and select Services and Agreements, and then click Create Cluster.
If a cluster named E-HPC-test appears on the Cluster page and is in the Running status, the cluster is created.
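After the cluster is running, you can log on to the management node and confirm that the scheduler is up. A minimal check for the Slurm example in this topic:

```bash
# List Slurm partitions (queues) and node states on the management node.
sinfo

# Show each compute node and its state in detail.
sinfo -N -l
```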
Create a user
After you create the cluster, you must create a user to submit jobs in the cluster.
On the User Management page, click Add User.
In the Add User dialog box, configure the parameters and click Confirm. The following table describes the parameters.
Parameter
Example
Description
Username
test.user
The user name.
The name can contain 6 to 30 characters.
The name must start with a letter.
The name can contain letters, digits, and periods (.).
Role Permissions
Sudo Permissions Group
Regular Permissions Group: suitable for regular users who only need to submit and debug jobs.
Sudo Permissions Group: suitable for administrators who need to manage clusters. In addition to submitting and debugging jobs, users who have sudo permissions can run sudo commands to install software and restart nodes.
Important: Exercise caution when you grant sudo permissions to users. A cluster may not run as expected if a user who has sudo permissions performs a misoperation, such as deleting an E-HPC software stack module by mistake.
Password and Repeat Password
Ehpc12****
The password that the user uses to log on to the cluster. Follow the on-screen instructions to specify the parameters.
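After the user is created, you can verify it from the management node. A quick check using standard Linux commands; test.user is the example username in this topic:

```bash
# Confirm that the domain account resolves on the node.
id test.user

# Switch to the user and confirm that the home directory on the shared
# file system is accessible.
su - test.user -c pwd   # expected output: /home/test.user
```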
Scale out a cluster
On the Cluster List page, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, go to the Node page.
Click Add Node. On the Add Node page, configure the following parameters.
Basic Settings
Parameter
Example
Description
Destination Queue
comp
Select a queue that you created in the cluster.
Nodes
10
Specify the number of nodes that you want to add to the cluster.
Node Configurations
Parameter
Example
Description
Select Node Type
Create Node
Valid value: Create Node.
Inter-node interconnection
VPCNetwork
The network connection mode between nodes.
VPCNetwork: The compute nodes communicate with each other over VPCs.
eRDMANetwork: If the instance types of compute nodes support ERIs, the compute nodes communicate with each other over eRDMA networks.
Note: Only compute nodes of specific instance types support ERIs. For more information, see Overview and Configure eRDMA on an enterprise-level instance.
Virtual Switch
vsw-bp1ljgg5tjrs62n64****
The vSwitch to which the node belongs. The system automatically assigns an IP address to the compute node from the available vSwitch CIDR block.
Instance type Group
Instance Family: General-purpose Type g6
Instance Type: ecs.g6.large
Image: CentOS 7.6 64 bit
Storage: System Disk 40 GiB ESSD PL0
Hyper-Threading: Enable
Click Add Instance and select Instance Type.
If you do not enable Automatic queue scaling, you can add only one instance type. If you enable Automatic queue scaling, you can add multiple instance types.
Select I have learned that "deletion protection" is enabled by default for added nodes to prevent the nodes from being affected by queue scaling activities. I understand that I can disable deletion protection for the nodes or manually delete the nodes to avoid unnecessary costs. and click Confirm Add.
You can view the status of the scaled-out nodes in the node list on the Node page. If the nodes are in the Running status, the cluster is scaled out.
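You can also confirm the scale-out from the scheduler side. For the Slurm example in this topic, comp is the example destination queue:

```bash
# The comp partition should now report the added nodes.
sinfo -p comp

# Count the nodes that the scheduler knows about in the queue.
sinfo -N -p comp --noheader | wc -l
```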
Submit a job
On the details page of the cluster, click Job Management in the left-side navigation pane.
Click Create Job.
On the Create Job page, configure the parameters and click Confirm Create.
Note: Specify the parameters in the following table and retain the default settings for the other parameters. For more information, see Submit a job.
Parameter
Required
Example
Description
Job Name
Yes
testjob
The name of the job.
Scheduler Queue
Yes
comp
The name of the queue in which the job is run.
Run Command
Yes
/home/test.user/testjob.slurm
The job execution command that you want to submit to the scheduler. You can enter a command or the path of a script file. A sample script is sketched after this table.
If the script file is executable, enter its path. Example: /home/test.user/testjob.slurm.
If the script file is not executable, enter the execution command. Example: /opt/mpi/bin/mpirun /home/test/job.slurm.
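The Run Command in this example points to a Slurm batch script. The following is a minimal sketch of what /home/test.user/testjob.slurm might contain; the job name and queue mirror the examples in this topic, and the resource numbers are assumptions that you should adjust to your workload:

```bash
#!/bin/bash
#SBATCH --job-name=testjob        # job name shown by squeue
#SBATCH --partition=comp          # the queue (partition) used in this topic
#SBATCH --nodes=2                 # assumed node count; adjust as needed
#SBATCH --ntasks-per-node=2       # assumed tasks per node; adjust as needed
#SBATCH --output=testjob_%j.log   # stdout and stderr; %j is the job ID

# Print the hostname of each allocated task; replace with your workload.
srun hostname
```

If you prefer the command line over the console, you can also submit the script directly with sbatch /home/test.user/testjob.slurm and track it with squeue -u test.user.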
Delete a compute node
You can delete the compute nodes that you no longer require in a cluster.
Select one or more compute nodes that you want to delete from the node list.
Click Delete in the lower part of the node list.
Read the displayed message and then click Confirm.
Release a cluster
If you no longer need a cluster, you can release the cluster.
On the Cluster Details page, click More in the upper-right corner, and then select Release the cluster.
In the message that appears, click OK.
References
You can use a cluster template to quickly create a cluster in which GROMACS is pre-installed and submit jobs by using the E-HPC Portal. For more information, see Use GROMACS to analyze jobs.