Elastic High Performance Computing: Scale out a cluster

Last Updated: Mar 21, 2025

If the compute nodes of an Elastic High Performance Computing (E-HPC) cluster cannot meet your business requirements, you can scale out the cluster by adding compute nodes.

Prerequisites

  • The following table lists the recommended management node specifications and job quantities for different cluster sizes. For more information, see Overview.

    | Number of compute nodes | Specifications of management nodes | Job quantity |
    | --- | --- | --- |
    | 100 or fewer | 16 or more vCPUs; 64 GiB or more of memory | Fewer than 5,000 queued jobs; fewer than 10,000 uncompleted jobs |
    | 101 to 500 | 32 or more vCPUs; 128 GiB or more of memory | Fewer than 10,000 queued jobs; fewer than 20,000 uncompleted jobs |
    | More than 500 | 64 or more vCPUs; 256 GiB or more of memory | Fewer than 10,000 queued jobs; fewer than 20,000 uncompleted jobs |

  • A vSwitch is available in the region where the compute nodes that you want to add reside. For more information, see Work with vSwitches.

  • Sufficient Elastic Compute Service (ECS) instance quotas exist in the region where the compute nodes that you want to add reside. For more information, see Manage ECS quotas.
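The sizing recommendations above can be expressed as a small lookup, which may be handy when you check a planned scale-out against the current management node. The following is a minimal sketch, not part of E-HPC: the names `ManagementNodeSpec`, `recommended_spec`, and `meets_recommendation` are hypothetical, and the values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagementNodeSpec:
    """Recommended minimums and job limits for a management node (hypothetical type)."""
    min_vcpus: int
    min_memory_gib: int
    max_queued_jobs: int        # recommended count stays below this value
    max_uncompleted_jobs: int   # recommended count stays below this value


# (inclusive upper bound on compute nodes, recommendation); None means no upper bound.
_TIERS = [
    (100, ManagementNodeSpec(16, 64, 5_000, 10_000)),
    (500, ManagementNodeSpec(32, 128, 10_000, 20_000)),
    (None, ManagementNodeSpec(64, 256, 10_000, 20_000)),
]


def recommended_spec(compute_nodes: int) -> ManagementNodeSpec:
    """Return the recommendation for a cluster with the given number of compute nodes."""
    if compute_nodes < 0:
        raise ValueError("compute_nodes must not be negative")
    for limit, spec in _TIERS:
        if limit is None or compute_nodes <= limit:
            return spec
    raise AssertionError("unreachable: the last tier has no upper bound")


def meets_recommendation(compute_nodes: int, vcpus: int, memory_gib: int) -> bool:
    """Check whether a management node satisfies the recommended vCPU and memory minimums."""
    spec = recommended_spec(compute_nodes)
    return vcpus >= spec.min_vcpus and memory_gib >= spec.min_memory_gib
```

For example, `meets_recommendation(300, 16, 64)` evaluates to `False`, because a cluster of 300 compute nodes falls in the tier that recommends at least 32 vCPUs and 128 GiB of memory.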

Procedure

  1. Log on to the E-HPC console.

  2. In the top navigation bar, select a region.

  3. In the left-side navigation pane, click Cluster.

  4. On the Cluster page, find the cluster that you want to scale out, and click Resize.

  5. In the Resize panel, complete the basic configurations.

    During scale-out, you can add new compute nodes, existing compute nodes, or on-premises compute nodes. Select the corresponding tab to add compute nodes.

    • Create Node

      Configure the parameters listed in the following table and click Buy Now.

      Parameter

      Description

      Zone

      The zone where the compute nodes that you want to add reside. The compute nodes that you want to add and the cluster can reside in different zones.

      vSwitch ID

      The vSwitch of the compute nodes that you want to add.

      Pricing Model

      Specify the billing method of the compute nodes that you want to add. Valid values:

      • Subscription: You can purchase or renew compute nodes by week, month, or year.

      • Pay-As-You-Go: You are charged for compute nodes on an hourly basis.

      • Preemptible Instance: Preemptible instances are on-demand instances that are offered at a lower price than pay-as-you-go instances.

      For more information, see Overview.

      Quantity And Type of Instances to be Added

      Select the number and specifications of the compute nodes that you want to add.

      You can add a maximum of 500 compute nodes. To add more than 500 compute nodes, submit a ticket.

      Image Type

      The following image types are supported: public image, custom image, shared image, Alibaba Cloud Marketplace image, and community image. The image types that you can select depend on the specified region and whether the current Alibaba Cloud account has available image resources.

      Image

      Select the image that you want to deploy on the compute nodes based on the image type. Take note of the following information:

      • The OS of the compute nodes that you want to add must be the same as that of the existing compute nodes in the cluster. For example, if the existing compute nodes run CentOS, you can select only a CentOS image.

      • The major OS version of the compute nodes that you want to add must be the same as that of the existing compute nodes in the cluster. For example, if the existing compute nodes run CentOS 7.x, the compute nodes that you want to add must also run CentOS 7.x.

      • If you set Image Type to Custom Image, the custom image that you select must be created from compute nodes, not from logon nodes or management nodes. Otherwise, exceptions may occur when you scale out the cluster.

      Assign Public IP Address

      If the nodes that you want to add need to access the Internet, turn on the switch, and then configure the bandwidth billing method and the maximum outbound bandwidth.

      Queue

      Specify the queue to which the compute nodes are added.

      Hostname Prefix and Hostname Suffix

      Configure a prefix and suffix for the hostnames of the compute nodes based on your needs. This facilitates the management of multiple compute nodes.

      System Disk

      Specify the type and size of the system disk used by the compute nodes that you want to add.

      Add Data Disk

      If you want to attach additional data disks to the compute nodes, click Add Data Disk and configure the type, size, and quantity for the new data disks.

    • Existing Node

      Select one or more existing compute nodes from the instance list and click OK.

      Note

      Before you add existing compute nodes, you must stop them. If a node that you want to add does not appear in the instance list, go to the ECS console to check the status of the node.

    • On-premises Node

      Select a queue, enter the information about the on-premises compute nodes one by one or import the information in a batch, and then click OK.

      Note

      On-premises compute nodes cannot be moved among queues. If you do not use the default queue, make sure that a queue is created to manage the on-premises compute nodes.

      Enter or import the following information:

      • Hostname: the hostname of the compute node.

      • Node ID/IP Address: the IP address of the compute node.

      • CPU: the number of CPUs of the compute node. You can run the lscpu command to view this value.

      • Memory: the memory size of the compute node.
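When you collect the information for many on-premises nodes, it can help to derive the CPU count from saved `lscpu` output and to validate each entry before you enter or import it. The following is a minimal sketch under stated assumptions: `OnPremNode`, `parse_cpu_count`, and `make_node` are hypothetical helpers, the memory unit (GiB) is an assumption, and the sketch does not produce the console's import file format, which this topic does not describe.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class OnPremNode:
    """The four fields requested for an on-premises compute node (hypothetical type)."""
    hostname: str
    ip_address: str
    cpus: int
    memory_gib: int  # assumption: memory expressed in GiB


def parse_cpu_count(lscpu_output: str) -> int:
    """Extract the value of the 'CPU(s):' line from the text printed by lscpu."""
    for line in lscpu_output.splitlines():
        key, sep, value = line.partition(":")
        # Match the exact key so that lines such as 'On-line CPU(s) list:' are skipped.
        if sep and key.strip() == "CPU(s)":
            return int(value.strip())
    raise ValueError("no 'CPU(s):' line found in lscpu output")


def make_node(hostname: str, ip_address: str, lscpu_output: str, memory_gib: int) -> OnPremNode:
    """Validate the inputs and build one node record."""
    if not hostname:
        raise ValueError("hostname must not be empty")
    ipaddress.ip_address(ip_address)  # raises ValueError for a malformed address
    if memory_gib <= 0:
        raise ValueError("memory_gib must be positive")
    return OnPremNode(hostname, ip_address, parse_cpu_count(lscpu_output), memory_gib)
```

A record built this way carries the same Hostname, IP Address, CPU, and Memory values that the On-premises Node tab asks for, so malformed entries can be caught before they reach the console.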

Verify the result

After you scale out the cluster, the added compute nodes automatically install and initialize the applications in the cluster. The original compute nodes are not affected. To query the status of the added compute nodes, choose Resource Management > Nodes in the left-side navigation pane. Select the cluster that you scaled out from the Cluster drop-down list, and then select Compute Node from the Node Type drop-down list. If the added compute nodes are in the Running state, the scale-out is complete.
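If you script this check instead of using the console, the logic amounts to polling until every added node reports the Running state. The following is a minimal sketch that makes no claim about the E-HPC API: `fetch_nodes` is a hypothetical callable that you supply, and the `hostname` and `status` keys are assumed field names. Only the Running state name comes from this topic.

```python
import time
from typing import Callable, Iterable, Mapping


def not_running(nodes: Iterable[Mapping[str, str]], expected: Iterable[str]) -> list[str]:
    """Return the expected hostnames that are missing or not in the Running state."""
    status_by_host = {node["hostname"]: node["status"] for node in nodes}
    return [host for host in expected if status_by_host.get(host) != "Running"]


def wait_until_running(
    fetch_nodes: Callable[[], Iterable[Mapping[str, str]]],  # hypothetical data source
    expected: list[str],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_nodes until all expected nodes are Running or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not not_running(fetch_nodes(), expected):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Passing the data source and the sleep function as parameters keeps the sketch independent of any particular SDK and makes the polling logic easy to test with canned responses.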