Elastic High Performance Computing: Scale out a hybrid cloud cluster

Last Updated: Jun 17, 2024

As your workloads grow, the compute nodes in your hybrid cloud cluster may become fully used. In this case, you can scale out the cluster by adding compute nodes to increase its computing power.

Prerequisites

  • A vSwitch is configured in the region of the compute nodes that you want to add. For more information, see Create and manage a vSwitch.

  • Sufficient unused Elastic Compute Service (ECS) instance quotas are available in the region of the compute nodes that you want to add. For more information, see View and increase instance quotas.

Procedure

  1. Log on to the E-HPC console.

  2. In the top navigation bar, select a region.

  3. In the left-side navigation pane, click Cluster.

  4. On the Cluster page, find the cluster that you want to scale out and click Resize.

  5. Select a scale-out method and configure the parameters based on your business requirements.

    Important

    The scale-out methods that are displayed vary based on the cluster type. You can scale out by creating nodes or by adding existing or offline nodes. Click the corresponding tab to scale out your cluster based on your requirements. When you scale out by creating nodes, a hybrid cloud cluster that uses a custom or deadline scheduler supports Ordinary Scale-out and Multi-Zone Scale-Out. Hybrid cloud clusters that use other types of schedulers support only Create Node.

    Create Node

    Configure the parameters listed in the following table and click Buy Now.

    Parameter

    Description

    Zone

    The zone of the compute nodes that you want to add. The compute nodes that you want to add and the cluster can reside in different zones.

    vSwitch ID

    The vSwitch of the compute nodes that you want to add.

    Pricing Model

    The billing method of the compute nodes that you want to add. Valid values:

    • Subscription: You can purchase or renew compute nodes by week, month, or year.

    • Pay-As-You-Go: You are charged for compute nodes on an hourly basis.

    • Preemptible Instance: A preemptible instance is a type of on-demand instance that is offered at a discounted price compared with a pay-as-you-go instance.

    For more information, see Overview of instances.

    Quantity And Type of Instances to be Added

    Select the number and instance type of the compute nodes that you want to add.

    You can add a maximum of 500 compute nodes. If you want to add more than 500 compute nodes, submit a ticket.

    Image Type and Image

    The following image types are supported: public image, custom image, shared image, Alibaba Cloud Marketplace image, and community image. The image types that you can select depend on the specified region and zone and whether the current Alibaba Cloud account has available image resources. The image types that are displayed on the console take precedence.

    Select the image that you want to deploy on the compute nodes based on the image type. Take note of the following items:

    • The operating system of the image must be the same as that of the existing compute nodes in the cluster. For example, if the operating system of the compute nodes in the cluster is CentOS, only a CentOS image can be selected.

    • The major OS version of the compute nodes that you want to add must be the same as that of the existing compute nodes in the cluster. For example, if the OS version of the existing compute nodes in the cluster is CentOS 7.x, the version of the compute nodes that you want to add must be CentOS 7.x.

    • If you set Image Type to Custom Image, the custom image that you select must be created from compute nodes, not from logon nodes or management nodes. Otherwise, exceptions may occur when you scale out the cluster.

    Assign Public IP Address

    If the nodes that you want to add need to access the Internet, turn on the switch. Then, configure the billing method for the bandwidth and the maximum outbound bandwidth.

    Whether to use DNS

    Specify whether to use Alibaba Cloud DNS to resolve the domain names of the compute nodes. Only Alibaba Cloud DNS PrivateZone is supported.

    Queue

    The queue to which the compute nodes are added.

    eRDMA

    After you turn on eRDMA, the compute nodes are bound to remote direct memory access (RDMA) elastic network interfaces (ENIs) and can use high-performance RDMA network services. For more information, see Overview of eRDMA.

    Hostname Prefix and Hostname Suffix

    Configure a prefix and suffix for the hostnames of the compute nodes based on your needs. This facilitates the management of multiple compute nodes.

    System Disk

    Select the type and size of the system disk used by the compute nodes that you want to add, and configure whether to enable Hyper-Threading (HT).

    Note

    By default, HT is enabled for all ECS instances. For specific ECS instance types, you can disable HT for better performance. For more information, see Instance type limits and Disable HT for compute nodes.

    Data Disk

    If you want to attach additional data disks to the compute nodes, click Add Data Disk and configure the type, size, and quantity for the new data disks.
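The image rules in the Create Node table can be checked before you submit the request. The following Python sketch is one possible way to express them. `NodeImage`, `image_problems`, and the `source_role` field are hypothetical names used only for illustration and are not part of the E-HPC console or API. The sketch also assumes that the major version is the part of the version string before the first dot.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeImage:
    """Hypothetical description of an image; not an E-HPC API type."""

    os_name: str                  # for example "CentOS"
    os_version: str               # for example "7.9"
    source_role: str = "compute"  # role of the node a custom image was created from


def major_version(version: str) -> str:
    """Return the part of the version string before the first dot (assumption)."""
    return version.split(".")[0]


def image_problems(cluster: NodeImage, candidate: NodeImage, custom: bool = False) -> list[str]:
    """Return the rules from the table that the candidate image violates."""
    problems = []
    if candidate.os_name.lower() != cluster.os_name.lower():
        problems.append("operating system differs from the cluster")
    elif major_version(candidate.os_version) != major_version(cluster.os_version):
        problems.append("major OS version differs from the cluster")
    # Custom images must be created from compute nodes, not logon or management nodes.
    if custom and candidate.source_role != "compute":
        problems.append("custom image was not created from a compute node")
    return problems
```

An empty list means that the candidate image satisfies the three rules. For example, a CentOS 7.9 image passes for a CentOS 7.6 cluster, and a CentOS 8.2 image does not.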

    Ordinary Scale-out

    Configure the parameters listed in the following table and click Buy Now.

    Parameter

    Description

    Zone

    The zone where the scale-out nodes reside. The compute nodes that you want to add and the cluster can reside in different zones.

    vSwitch ID

    The ID of an available vSwitch in the specified zone. Unavailable vSwitches are not displayed in the drop-down list.

    The scale-out nodes are deployed in the vSwitch that you select.

    Pricing Model

    • Subscription: You can purchase or renew compute nodes by week, month, or year.

    • Pay-As-You-Go: You are charged for compute nodes on an hourly basis.

    • Preemptible Instance: A preemptible instance is a type of on-demand instance that is offered at a discounted price compared with a pay-as-you-go instance.

    For more information, see Overview of instances.

    Quantity And Type of Instances to be Added

    The number of compute nodes to be added to the cluster. You can add a maximum of 200 compute nodes.

    To increase the quota, submit a ticket.

    Image Type

    The image type of the scale-out nodes.

    Valid values: Public Image, Custom Image, and Shared Image.

    Important

    If you set Image Type to Custom Image, the custom image that you select must be created from compute nodes instead of logon nodes or management nodes. Otherwise, exceptions may occur when you scale out the cluster.

    Image

    Select the image of the scale-out nodes. The image must meet the following conditions:

    • The operating system of the scale-out node image is the same as that of the nodes in the cluster. For example, if the operating system of the compute nodes in the cluster is CentOS, only a CentOS image can be selected.

    • The major OS version of the compute nodes that you want to add is the same as that of the existing compute nodes in the cluster. For example, if the OS version of the existing compute nodes in the cluster is CentOS 7.x, the version of the compute nodes that you want to add must be CentOS 7.x.

    Assign Public IP Address

    If the scale-out nodes need to access the Internet, you can turn on the switch. If you do not turn on the switch, the scale-out nodes can access only the resources in the VPC.

    Whether to use DNS

    Specify whether to use Alibaba Cloud DNS to resolve the domain names of the compute nodes. Only Alibaba Cloud DNS PrivateZone is supported.

    Queue

    The queue of the scale-out nodes. If you do not specify this parameter, the default queue is selected.

    eRDMA

    After you turn on eRDMA, the compute nodes are bound to remote direct memory access (RDMA) elastic network interfaces (ENIs) and can use high-performance RDMA network services. For more information, see Overview of eRDMA.

    Hostname Prefix and Hostname Suffix

    Specify a prefix and suffix to facilitate compute node management.

    System Disk

    The system disk size of the scale-out nodes. By default, an ultra disk is used. If no ultra disk is available in the current zone, an SSD disk is used.

    Data Disk

    If you want to attach additional data disks to the compute nodes, click Add Data Disk and configure the type, size, and quantity for the new data disks.
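Two defaults in the Ordinary Scale-out table follow simple fallback rules: the default queue is used when no queue is specified, and an SSD disk is used when no ultra disk is available in the zone. The following Python sketch illustrates that logic only. The function name, the `"ultra"` and `"ssd"` labels, and the `default_queue` argument are assumptions made for this example. Raising an error when neither disk type is available is also an assumption, because this topic does not describe that case.

```python
from __future__ import annotations

from typing import Iterable, Optional


def resolve_defaults(
    queue: Optional[str],
    disk_types_in_zone: Iterable[str],
    default_queue: str,
) -> dict[str, str]:
    """Apply the queue and system disk fallbacks described in the table."""
    available = set(disk_types_in_zone)
    if "ultra" in available:
        disk = "ultra"   # an ultra disk is the default
    elif "ssd" in available:
        disk = "ssd"     # fallback when no ultra disk is available in the zone
    else:
        # Not covered by this topic; treated as an error in this sketch.
        raise ValueError("neither an ultra disk nor an SSD disk is available in the zone")
    # An unspecified queue falls back to the default queue.
    return {"queue": queue or default_queue, "system_disk": disk}
```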

    Multi-Zone Scale-Out

    Configure the parameters listed in the following table and click Buy Now.

    Parameter

    Description

    Quantity And Type of Instances to be Added

    The number of compute nodes to be added to the cluster. You can add a maximum of 200 compute nodes.

    To increase the quota, submit a ticket.

    Zone and vSwitch ID

    The zones and vSwitches in which the scale-out nodes can be deployed. The E-HPC console displays all zones in which the instance types that you specified are available. The scale-out policy varies based on the billing method of the scale-out nodes. Select a policy based on your business requirements. The following items describe the policies:

    • Priority-oriented: The policy applies when the billing method of scale-out nodes is pay-as-you-go. The order in which zones and vSwitches are specified determines their priorities. E-HPC preferentially attempts to select compute nodes from the zone where the vSwitch with the highest priority resides. If the attempt fails, E-HPC attempts to select compute nodes from the zone where the vSwitch with the next highest priority resides.

    • Cost-oriented: The policy applies when scale-out nodes are preemptible instances. Scale-out favors lower cost and a lower instance release rate.

    Scale-out Method

    A hybrid cloud cluster supports the following scale-out methods. You can select one based on the number and storage performance of scale-out nodes.

    • Single Scale-out: If you need to add no more than 50 compute nodes, we recommend that you select this method.

    • Batch Scale-out: If you need to add more than 50 compute nodes, we recommend that you select this method to prevent your business from being affected.

    Total Batches

    The number of batches in which the nodes are created. For example, if you scale out 100 nodes in two batches, 50 nodes are created in each batch.

    If you set Scale-out Method to Batch Scale-out, you must configure Total Batches. Valid values: 1 to 10.

    Interval

    The interval between two consecutive batches. Valid values: 60 to 600. Unit: seconds.

    If you set Scale-out Method to Batch Scale-out, you must configure Interval.

    Pricing Model

    • Pay-As-You-Go: You are charged for compute nodes on an hourly basis.

    • Preemptible Instance: A preemptible instance is a type of on-demand instance that is offered at a discounted price compared with a pay-as-you-go instance.

    For more information, see Overview of instances.

    Image Type

    The image type of the scale-out nodes.

    Important

    If you set Image Type to Custom Image, the custom image that you select must be created from compute nodes instead of logon nodes or management nodes. Otherwise, exceptions may occur when you scale out the cluster.

    Image

    Select the image of the scale-out nodes. The image must meet the following conditions:

    • The operating system of the scale-out node image is the same as that of the nodes in the cluster. For example, if the operating system of the compute nodes in the cluster is CentOS, only a CentOS image can be selected.

    • The major OS version of the compute nodes that you want to add is the same as that of the existing compute nodes in the cluster. For example, if the OS version of the existing compute nodes in the cluster is CentOS 7.x, the version of the compute nodes that you want to add must be CentOS 7.x.

    Assign Public IP Address

    If the scale-out nodes need to access the Internet, you can turn on the switch. If you do not turn on the switch, the scale-out nodes can access only the resources in the VPC.

    Queue

    The queue of the scale-out nodes. If you do not specify this parameter, the default queue is selected.

    Hostname Prefix and Hostname Suffix

    Specify a prefix and suffix to facilitate compute node management.

    System Disk

    The system disk size of the scale-out nodes. By default, an ultra disk is used. If no ultra disk is available in the current zone, an SSD disk is used.
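The Multi-Zone Scale-Out parameters combine three rules: the recommended method depends on the node count, batch settings have fixed ranges, and the priority-oriented policy tries zones in the order in which they are specified. The following Python sketch shows one way to reason about these rules before you fill in the form. All function names, the `has_capacity` callback, and the zone and vSwitch identifiers are hypothetical. How E-HPC distributes a remainder when the node count is not divisible by the number of batches is not stated in this topic, so the even split used here is an assumption.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence, Tuple

MAX_NODES = 200     # maximum number of nodes per request, from the table
SINGLE_LIMIT = 50   # up to 50 nodes, Single Scale-out is recommended


def recommend_method(count: int) -> str:
    """Return the scale-out method that the table recommends for a node count."""
    if not 1 <= count <= MAX_NODES:
        raise ValueError(f"node count must be between 1 and {MAX_NODES}")
    return "Single Scale-out" if count <= SINGLE_LIMIT else "Batch Scale-out"


def plan_batches(count: int, total_batches: int, interval_s: int) -> list[int]:
    """Validate the batch settings and split the nodes as evenly as possible."""
    if not 1 <= total_batches <= 10:
        raise ValueError("Total Batches must be between 1 and 10")
    if not 60 <= interval_s <= 600:
        raise ValueError("Interval must be between 60 and 600 seconds")
    base, extra = divmod(count, total_batches)
    # Assumption: earlier batches absorb the remainder, one extra node each.
    return [base + (1 if i < extra else 0) for i in range(total_batches)]


def pick_vswitch(
    ordered: Sequence[Tuple[str, str]],
    has_capacity: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Priority-oriented policy: return the first (zone, vSwitch) pair with capacity."""
    for zone, vswitch in ordered:
        if has_capacity(zone):
            return zone, vswitch
    return None  # no zone in the list can currently provide the nodes
```

For example, `plan_batches(100, 2, 120)` returns `[50, 50]`, which matches the example in the Total Batches description.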

Result

After you scale out the cluster, the added compute nodes automatically install and initialize the applications in the cluster. The original compute nodes are not affected.

To query the status of the added compute nodes, choose Resource Management > Nodes in the left-side navigation pane. Select the cluster that you scaled out from the Cluster drop-down list. Select Compute Node from the Node Type drop-down list. If the compute nodes are in the Running state, the cluster is scaled out.
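If you track node status in a script instead of on the Nodes page, the check above amounts to waiting until every added node reports the Running state. The following Python sketch shows that polling pattern under stated assumptions: `fetch_states` is a hypothetical callable that you supply and that returns a mapping of node name to state, and `Initializing` is a placeholder for any state other than Running. The sketch does not call any E-HPC API.

```python
import time
from typing import Callable, Dict


def wait_until_running(
    fetch_states: Callable[[], Dict[str, str]],
    timeout_s: float = 600,
    poll_s: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until all nodes are in the Running state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        states = fetch_states()
        # The scale-out is complete only when every added node is Running.
        if states and all(state == "Running" for state in states.values()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

The `sleep` parameter exists only so that the wait can be replaced in tests.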