This article describes how to create and configure an E-HPC cluster, how to check the configuration list, and how to use the advanced configuration.
Log on to the E-HPC console. To access this console, you must have a registered account. If you do not have one, click Free Account and complete real-name registration in accordance with the latest rules and regulations in mainland China.
Select E-HPC > Cluster, select the region (for example, China East 1 (Hangzhou)), and click Create cluster.
When you create, manage, or use E-HPC clusters, do not operate individual cluster nodes from the ECS console unless the operation is explicitly required. Perform all related operations in the E-HPC console.
In the hardware configuration, set the zone, Pricing Model, Deploy Mode, Compute Node, Management Node, and Login Node.
To ensure efficient network communication between the E-HPC nodes, all the nodes of a cluster must be in the same zone. For more information, see Regions and zones. If the target region is unavailable when you create an E-HPC cluster, see Why can the E-HPC cluster not be activated in specified regions?
The pricing model is the billing method for ECS instances in the cluster; it does not cover elastic IP addresses or Network Attached Storage (NAS). Available pricing models are Subscription, Pay-As-You-Go, and Spot Instance.
Standard: Deploy the login node, the management node, and the compute node separately. Assign two or four high-availability instances to the management node.
Tiny: Deploy the login and management services on one instance, and deploy compute nodes on separate instances.
One-box: Deploy all services on the same instance in a cluster, and select local storage or NAS. NAS supports cluster expansion.
The E-HPC cluster mainly consists of the following nodes:
Compute node: Executes high-performance computing.
Management node: This node contains two independent sub-nodes:
Job scheduling sub-node: Runs the scheduler that is used to dispatch jobs.
Account management sub-node: Runs the domain account management software for the cluster.
Login node: Supports public IP addresses. You can remotely log on to this node and operate the HPC cluster using commands.
The job scheduling sub-node handles jobs only, and the account management sub-node handles account information only. Therefore, we recommend that you select common enterprise-level instances, such as the sn1ne instance with a maximum of four CPUs, to ensure high availability.
The cluster performance depends on the hardware configuration of the compute node.
The login node is configured as the development environment, and provides all the necessary resources and shared cluster testing environment for software development and debugging by all users of this cluster. Therefore, we recommend that you configure the login node using an equal or higher ratio of CPU to memory, compared with the compute node. For more information about instance types, see Configurations.
Click Next to start configuring the software.
Specify the image type, operating system, scheduler, and applications.
The operating system options vary, depending on the image type. The operating system is deployed on all nodes in this cluster.
The scheduler is the job scheduling software that is deployed in the HPC cluster. The job scripts and parameters that are used to submit jobs in the cluster vary, depending on the scheduler.
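For example, if the cluster uses a PBS-style scheduler (one of the scheduler options; the exact directives, environment variables, and submit command are assumptions that depend on the scheduler you select), a minimal job script might look like this sketch:

```shell
# Minimal PBS-style job script (hypothetical example; directives vary by scheduler).
cat > hello.pbs <<'EOF'
#!/bin/sh
#PBS -N hello            # job name
#PBS -l nodes=1:ppn=4    # request one node with four processes
#PBS -j oe               # merge stdout and stderr into one output file
cd "$PBS_O_WORKDIR"
echo "Hello from $(hostname)"
EOF
# On the login node, you would submit the job with: qsub hello.pbs
```

With a different scheduler, such as Slurm, both the directives and the submit command change, which is why the article notes that job scripts and parameters vary by scheduler.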
The application is the HPC software that is deployed in the HPC cluster. HPC provides a variety of applications, such as GROMACS, OpenFOAM, and LAMMPS, including the corresponding software and operation dependencies. The specified applications are pre-installed in the cluster after you create this cluster.
Specify the cluster name and logon password.
The name in the basic configuration is the cluster name. The system displays this name in the list of clusters. You can search this list for a required cluster.
In the login settings, enter the password to be used to log on to the cluster. You can use this password to remotely log on to the login node over Secure Shell (SSH) as the root user.
Select the Alibaba Cloud International Website Product Terms of Service check box and click OK.
You can view the configuration list next to the steps for creating the cluster. The configuration list only displays the common configuration by default. Select Advanced configuration to display the advanced configuration.
Click Show Topology in the upper part of the configuration page to show or hide the topological relationship for the current configuration.
The topological relationship includes the virtual private cloud (VPC) name, VSwitch name, NAS instance name, and the configuration and the number of instances of the login node, management node, and compute node.
To check the status of this cluster, return to the Cluster page about 20 minutes after you create the cluster. The cluster has been created if all the nodes in this cluster are in the Running status. You can then log on to this cluster to perform related operations. For more information, see Use a cluster.
Follow the preceding steps to create a common E-HPC cluster. You can also perform an advanced configuration to specify more parameters. Click Advanced configuration at the bottom of the Hardware Configuration and Software Configuration pages.
Choose Create cluster > Hardware Configuration, and click Advanced configuration at the bottom of the Hardware Configuration page to specify the parameters.
You can create the VPC and VSwitch in the Alibaba Cloud VPC console, and the security group in the Alibaba Cloud ECS console. Then, select the required VPC, VSwitch, and security group in the network configuration. You can also click Create VPC or Create VSwitch (for subnet) to go to the related console and create the corresponding component.
If you have not created a VPC and VSwitch, the system sets the VPC CIDR block to 192.168.0.0/16 and the VSwitch CIDR block to 192.168.0.0/20 by default.
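As a quick sanity check on capacity, the usable address count of each default block follows directly from its prefix length, 2^(32 − prefix) addresses (a sketch using shell arithmetic; the prefix lengths are the defaults stated above):

```shell
# Address capacity of the default CIDR blocks: 2^(32 - prefix) addresses each.
vpc_prefix=16       # default VPC CIDR block: 192.168.0.0/16
vswitch_prefix=20   # default VSwitch CIDR block: 192.168.0.0/20
echo "VPC addresses:     $(( 1 << (32 - vpc_prefix) ))"      # prints 65536
echo "VSwitch addresses: $(( 1 << (32 - vswitch_prefix) ))"  # prints 4096
```

A /20 VSwitch therefore leaves ample room for the nodes of a typical cluster; a few addresses in each block are reserved by the platform, so the usable count is slightly lower.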
If you have created the VPC, create the VSwitch in the required zone to continue with the steps for creating the cluster.
If you have created several VPCs and VSwitches, the system selects the first VPC and the first VSwitch to create the cluster. Make sure that the number of available IP addresses under the VSwitches is more than the number of nodes in the cluster. You can also select any VPC and VSwitch that you have created in Advanced configuration.
E-HPC stores all user data, user management data, job sharing data, and other information on the storage instance. All nodes in the cluster can access this information. Currently, E-HPC uses Network Attached Storage (NAS) to store the information. To use NAS, specify the mount point and remote directory. For more information, see Terms.
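E-HPC attaches the NAS file system to every node automatically; the following command template is only an illustration of what the mount point and remote directory mean. The mount-point domain and the local directory are placeholders, and the `vers=4.0` option assumes an NFSv4 mount.

```shell
# Illustration only: how a NAS mount point attaches a remote directory to a node.
# <mount-point-domain> is a placeholder, e.g. an address ending in .nas.aliyuncs.com.
sudo mount -t nfs -o vers=4.0 <mount-point-domain>:/ /ehpcdata
```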
If you have not created a NAS instance and mount point in the current zone, the system uses the default NAS instance and mount point in the zone when you create the cluster.
If you have created several NAS instances and mount points in the current zone, the system selects the first NAS instance and the first mount point in the zone when you create the cluster. If no mount point is available for this NAS instance in the zone, the system creates a mount point when you create the cluster. Before the mount point is created, make sure that the total number of mount points created for this NAS instance across all zones has not reached the upper limit.
Select Create cluster > Software Configuration, and click Advanced configuration at the bottom of the Software Configuration page to specify the advanced software parameters.
Specify the script to be executed automatically after you deploy the cluster. The script URL is the address where the specified script is located. The script is stored in Object Storage Service (OSS). You can enter the URL of the OSS file that contains this script. Arguments are the command-line parameters that are required to execute the script.
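As an illustration, a post-install script can be as simple as the following sketch. The file name, log path, and argument are hypothetical; in practice you would upload the script to an OSS bucket and enter its URL and arguments in the console.

```shell
# Hypothetical post-install script; "$1" arrives via the "Arguments" field.
cat > postinstall.sh <<'EOF'
#!/bin/sh
echo "post-install ran with argument: $1" >> /tmp/postinstall.log
EOF
# E-HPC downloads the script from its OSS URL and runs it after deployment;
# running it locally with a sample argument shows the effect:
sh postinstall.sh demo-arg
cat /tmp/postinstall.log
```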
You can select a domain service and the target software from the list.
You can select a pre-installed E-HPC application based on its software dependencies, such as mpich or openmpi; the suffix of the application name indicates the dependency. If you select software with the "-gpu" suffix, make sure that the compute node uses a GPU instance. Otherwise, the cluster cannot be created, or the application may not run correctly.