
Elastic High Performance Computing: Create and manage clusters in the E-HPC console

Last Updated: May 27, 2025

This topic describes how to create and manage a public cloud cluster of Standard Edition in the Elastic High Performance Computing (E-HPC) console to help you get started with E-HPC.

Prerequisites

  • A service-linked role for E-HPC is created. The first time you log on to the E-HPC console, you are prompted to create a service-linked role for E-HPC.

  • A virtual private cloud (VPC) and a vSwitch are created. For more information, see Create and manage a VPC and Create a vSwitch.

  • Apsara File Storage NAS (NAS) is activated. A NAS file system and a mount target are created. For more information, see Create a file system and Manage mount targets.
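
If you prefer to prepare these prerequisites from the command line instead of the console, the following sketch uses the Alibaba Cloud CLI. It is a minimal sketch, assuming that the aliyun CLI is installed and configured; the region, zone, CIDR blocks, and resource names are illustrative assumptions.

  # Create a VPC and a vSwitch (illustrative region, zone, and CIDR blocks).
  aliyun vpc CreateVpc --RegionId cn-hangzhou --CidrBlock 192.168.0.0/16 --VpcName ehpc-vpc
  aliyun vpc CreateVSwitch --RegionId cn-hangzhou --ZoneId cn-hangzhou-i \
    --VpcId <your-vpc-id> --CidrBlock 192.168.0.0/24

  # Create a General-purpose NAS file system that uses the NFS protocol.
  # Afterwards, create a mount target for the file system in the VPC
  # (CreateMountTarget), or do so in the NAS console.
  aliyun nas CreateFileSystem --RegionId cn-hangzhou --ProtocolType NFS \
    --StorageType Capacity --ZoneId cn-hangzhou-i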

Create a cluster

  1. Go to the Create Cluster page.

  2. On the Create Cluster page, configure the parameters in the following steps:

    1. Cluster Configuration

      • Basic Settings

        • Region
          Example: China (Hangzhou)
          The region where you want to create a cluster.

        • Network and Availability Zone
          Example: VPC vpc-bp1opxu1zkhn00g**** and vSwitch vsw-bp1ljgg5tjrs62n64****
          The VPC in which your cluster is deployed and the vSwitch to which the cluster belongs.

          Note
          The nodes in the cluster use IP addresses from the vSwitch. Make sure that the number of available IP addresses in the vSwitch is greater than the number of cluster nodes. You can query the available IP address count as shown in the sketch after this list.

        • Security Group
          Example: Select Automatically create a normal security group.
          A security group is used to manage the inbound and outbound traffic of the nodes in a cluster. For a security group that is created automatically, the system also creates rules that enable communication between the nodes in the cluster.
          Select the type of the automatically created security group based on your business requirements. For more information about the differences between basic and advanced security groups, see Basic security groups and advanced security groups.
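
        To confirm that the vSwitch has enough free IP addresses before you create the cluster, you can query it with the Alibaba Cloud CLI. This is a minimal sketch, assuming that the aliyun CLI is installed and configured; it uses the masked vSwitch ID from the example above.

          # Query the vSwitch; the response includes an AvailableIpAddressCount field.
          aliyun vpc DescribeVSwitchAttributes \
            --RegionId cn-hangzhou \
            --VSwitchId vsw-bp1ljgg5tjrs62n64****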

      • Select a cluster type

        This section describes how to create a public cloud cluster of Standard Edition. A cluster of this type consists of one management node and multiple compute nodes. You must select the type of the scheduler and configure the management node.

        • Series
          Example: Select Standard Edition.
          The series of the cluster.

        • Deployment Mode
          Example: Select Public cloud cluster.
          The deployment mode of the cluster.

        • Cluster Type
          Example: Select SLURM.
          The scheduler type of the cluster. Common schedulers in HPC scenarios, such as Slurm and OpenPBS, are supported.

        • Management node
          Example:
          • Instance Family: General-purpose Type g6
          • Instance Type: ecs.g6.large
          • Image: CentOS 7.6 64-bit
          • Storage: System Disk: 40 GiB ESSD PL0
          • Hyper-Threading: Enable

          The ECS instance on which the scheduler and the domain account service are deployed. Select appropriate configurations for the management node based on your business scenario and cluster size.

          • Payment Details
            The billing method of the management node. For more information, see Instance types.
            • Pay-as-you-go: You are charged based on the actual usage duration. Preemptible instances are not supported.
            • Subscription: You are charged on a monthly or yearly basis.

          • Instance Type
            The instance specifications of the management node. We recommend that you choose the specifications based on the number of compute nodes in the cluster:
            • 100 or fewer compute nodes: 16 or more vCPUs and 64 GiB or more of memory.
            • 101 to 500 compute nodes: 32 or more vCPUs and 128 GiB or more of memory.
            • More than 500 compute nodes: 64 or more vCPUs and 256 GiB or more of memory.

          • Image
            The image used to deploy the management node. Different images support different schedulers. The images that are actually displayed in the console prevail.

          • Storage
            The system disk specifications of the management node and whether to attach a data disk to the management node. For more information about disk types and performance levels, see Disks.

          • Hyper-Threading
            Hyper-Threading is enabled by default. If your business requires higher per-core performance, you can disable it. You can verify the setting on a running node as shown in the sketch after this list.
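
        One way to confirm whether Hyper-Threading is in effect is to inspect the CPU topology after you log on to the node. A minimal sketch using standard Linux tools:

          # "Thread(s) per core: 2" indicates that Hyper-Threading is enabled;
          # "Thread(s) per core: 1" indicates that it is disabled.
          lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket)'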

    2. Compute Node and Queue

      • Basic Settings

        • Automatic queue scaling
          Example: Off
          Specifies whether to enable automatic queue scaling. After you turn on Automatic queue scaling, you can select Auto Grow and Auto Shrink based on your business requirements.
          After you enable automatic queue scaling, the system automatically adds or removes compute nodes based on the configurations or the real-time load.

        • Queue Compute Nodes
          Example: 5
          The number of nodes in the queue.
          • If you do not enable Automatic queue scaling, configure the initial number of compute nodes in the queue.
          • If you enable Automatic queue scaling, configure the minimum and maximum numbers of compute nodes in the queue.

            Important
            If you set the Minimal Nodes parameter to a non-zero value, the queue retains that number of nodes during cluster scale-in, and idle nodes are not released. Specify the Minimal Nodes parameter with caution to prevent resource waste and unnecessary costs caused by idle nodes in the queue.

      • Select Queue Node Configuration

        • Inter-node Interconnection
          Example: Select VPCNetwork.
          The network connection mode between compute nodes.
          • VPCNetwork: The compute nodes communicate with each other over the VPC.
          • eRDMANetwork: If the instance types of the compute nodes support elastic RDMA interfaces (ERIs), the compute nodes communicate with each other over elastic Remote Direct Memory Access (eRDMA) networks.

            Note
            Only compute nodes of specific instance types support ERIs. For more information, see Overview and Configure eRDMA on an enterprise-level instance. To verify eRDMA support on a running node, see the sketch after this list.

        • Virtual Switch
          Example: vsw-bp1ljgg5tjrs62n64****
          The vSwitch to which the nodes belong. The system automatically assigns IP addresses to the compute nodes from the available CIDR block of the vSwitch.

        • Instance Type Group
          Example:
          • Instance Family: General-purpose Type g6
          • Instance Type: ecs.g6.large
          • Image: CentOS 7.6 64-bit
          • Storage: System Disk: 40 GiB ESSD PL0
          • Hyper-Threading: Enable

          Click Add Instance and select an instance type. If you do not enable Automatic queue scaling, you can add only one instance type. If you enable Automatic queue scaling, you can add multiple instance types.

        • Low Latency Deployment Set
          Example: Select Disable.
          A deployment set provides a deployment strategy for deploying instances on physical servers. For more information, see Deployment set.
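
        To check whether an ERI is active on a node, you can look for the erdma kernel module and the RDMA device after you log on. A minimal sketch, assuming the rdma-core utilities are installed:

          # The erdma driver is loaded on instances with an active ERI.
          lsmod | grep erdma

          # List RDMA devices; an entry such as erdma_0 indicates an available ERI.
          ibv_devices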

    3. Shared File Storage

      • Type
        Example: Select General-purpose NAS.
        The type of the file system that you want to mount.
        • General-purpose NAS
        • Extreme NAS
        • Parallel file system (CPFS)

      • File System
        Example: 0e9104**** (Capacity NFS)
        The ID and mount target of the file system that you want to mount. Make sure that the file system has a sufficient number of mount targets.

      • File System Directory
        Example: 0e9104****-tpd33.cn.hangzhou.nas.aliyuncs.com
        The directory of the file system that you want to mount.

      • Mount Options
        Example: Select Mount over NFSv3.
        The mount protocol. For what the resulting NFSv3 mount looks like, see the sketch after this list.
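
      E-HPC mounts the shared file system on the cluster nodes for you. For reference, an equivalent manual NFSv3 mount on a Linux machine would look like the following sketch; the mount target is the masked example above, and the local directory /ehpc is an assumption:

        # Mount the NAS file system over NFSv3 with the options commonly used for NAS.
        sudo mount -t nfs -o vers=3,nolock,proto=tcp,noresvport \
          0e9104****-tpd33.cn.hangzhou.nas.aliyuncs.com:/ /ehpc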

    4. Software and Service Component

      You do not need to configure this step. By default, a logon node is configured.

    5. Confirm configuration

      Confirm the configurations and configure the cluster name and logon credentials.

      • Cluster Name
        Example: E-HPC-test
        The name of the cluster. The cluster name is displayed on the Cluster page to facilitate identification.

      • Login Credentials
        Example: Select Custom Password.
        The credentials that are used to log on to the cluster. Only Custom Password is supported.

      • Set Password and Repeat Password
        Example: Ehpc12****
        The password of the cluster. By default, the root user uses this password to log on to all nodes in the cluster.

  3. Check the billing information, read and select Services and Agreements, and then click Create Cluster.

    If a cluster named E-HPC-test appears on the Cluster page and is in the Running status, the cluster is created.
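
    After the cluster is running, you can log on to it, for example over SSH to the logon node by using the root password that you configured. A minimal sketch; the IP address is a placeholder:

      # Log on to the cluster's logon node as root (replace the placeholder).
      ssh root@<logon-node-public-IP>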

Create a user

After you create the cluster, you must create a user to submit jobs in the cluster.

  1. On the User Management page, click Add User.

  2. In the Add User dialog box, configure the parameters and click Confirm. The parameters are described below.

    • Username
      Example: test.user
      The username.
      • The name must be 6 to 30 characters in length.
      • The name must start with a letter.
      • The name can contain letters, digits, and periods (.).

    • Role Permissions
      Example: Sudo Permissions Group
      • Regular Permissions Group: suitable for regular users who only submit and debug jobs.
      • Sudo Permissions Group: suitable for administrators who need to manage clusters. In addition to submitting and debugging jobs, users who have sudo permissions can run sudo commands to install software and restart nodes.

        Important
        Exercise caution when you grant sudo permissions to users. A cluster may not run as expected if a user who has sudo permissions performs a misoperation, such as deleting an E-HPC software-stack module by mistake.

    • Password and Repeat Password
      Example: Ehpc12****
      The password that the user uses to log on to the cluster. Follow the on-screen instructions to specify the password.
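
    After the user is created, you can verify the account from any cluster node. A minimal sketch, using the username from the example above:

      # Confirm that the domain account resolves on the node.
      id test.user

      # Switch to the user, for example to submit jobs under that account.
      su - test.user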

Scale out a cluster

  1. On the Cluster List page, find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, choose Nodes and Queues > Node.

  3. Click Add Node. On the Add Node page, configure the following parameters.

    • Basic Settings

      • Destination Queue
        Example: comp
        Select a queue that you created in the cluster.

      • Nodes
        Example: 10
        The number of nodes that you want to add to the cluster.

    • Node Configurations

      • Select Node Type
        Example: Create Node
        Valid value: Create Node.

      • Inter-node Interconnection
        Example: VPCNetwork
        The network connection mode between nodes.
        • VPCNetwork: The compute nodes communicate with each other over the VPC.
        • eRDMANetwork: If the instance types of the compute nodes support ERIs, the compute nodes communicate with each other over eRDMA networks.

          Note
          Only compute nodes of specific instance types support ERIs. For more information, see Overview and Configure eRDMA on an enterprise-level instance.

      • Virtual Switch
        Example: vsw-bp1ljgg5tjrs62n64****
        The vSwitch to which the nodes belong. The system automatically assigns IP addresses to the compute nodes from the available CIDR block of the vSwitch.

      • Instance Type Group
        Example:
        • Instance Family: General-purpose Type g6
        • Instance Type: ecs.g6.large
        • Image: CentOS 7.6 64-bit
        • Storage: System Disk: 40 GiB ESSD PL0
        • Hyper-Threading: Enable

        Click Add Instance and select an instance type. If you do not enable Automatic queue scaling, you can add only one instance type. If you enable Automatic queue scaling, you can add multiple instance types.

  4. Select "I have learned that 'deletion protection' is enabled by default for added nodes to prevent the nodes from being affected by queue scaling activities. I understand that I can disable deletion protection for the nodes or manually delete the nodes to avoid unnecessary costs." and click Confirm Add.

    You can view the status of the scaled-out nodes in the node list on the Node page. If the nodes are in the Running status, the cluster is scaled out.
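
    You can also verify the scale-out from the scheduler's perspective. A minimal sketch for the Slurm cluster in this example:

      # List the nodes in the comp partition; the new nodes should appear as idle.
      sinfo -p comp

      # Show per-node state in long format.
      sinfo -N -l -p comp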

Submit a job

  1. On the details page of the cluster, click Job Management in the left-side navigation pane.

  2. Click Create Job.

  3. On the Create Job page, configure the parameters and click Confirm Create.

    Note

    Specify the parameters described below and retain the default settings for the other parameters. For more information, see Submit a job.

    • Job Name (required)
      Example: testjob
      The name of the job.

    • Scheduler Queue (required)
      Example: comp
      The name of the queue in which the job runs.

    • Run Command (required)
      Example: /home/test.user/testjob.slurm
      The job execution command that you want to submit to the scheduler. You can enter a command or the path of the script file.
      • If the script file is executable, enter its path. Example: /home/test.user/testjob.slurm.
      • If the script file is not executable, enter the execution command. Example: /opt/mpi/bin/mpirun /home/test/job.slurm.
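
    For reference, testjob.slurm could be a simple Slurm batch script such as the following sketch; the resource values and the workload are illustrative assumptions:

      #!/bin/bash
      #SBATCH --job-name=testjob        # Job name shown by squeue
      #SBATCH --partition=comp          # Queue (partition) created with the cluster
      #SBATCH --nodes=2                 # Number of compute nodes (illustrative)
      #SBATCH --ntasks-per-node=2       # Tasks per node (illustrative)
      #SBATCH --output=testjob.%j.log   # Log file; %j expands to the job ID

      # Print the name of each allocated node as a placeholder workload.
      srun hostname

    Equivalently, the user can submit the same script from the command line by running sbatch /home/test.user/testjob.slurm and check its status by running squeue.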

Delete a compute node

You can delete the compute nodes that you no longer require in a cluster.

  1. Select one or more compute nodes that you want to delete from the node list.

  2. Click Delete in the lower part of the node list.

  3. Read the displayed message and then click Confirm.
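
  Before you delete a node, you may want to confirm that no jobs are running on it. A minimal sketch for Slurm; the node name is a placeholder:

    # List the jobs that are running on the node (replace the placeholder).
    squeue -w <node-name>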

Release a cluster

If you no longer need a cluster, you can release the cluster.

  1. On the Cluster Details page, click More in the upper-right corner, and then select Release the cluster.

  2. In the message that appears, click OK.

References

You can use a cluster template to quickly create a cluster in which GROMACS is pre-installed and submit jobs by using the E-HPC Portal. For more information, see Use GROMACS to analyze jobs.