Elastic High Performance Computing: Create a cluster by using the wizard

Last Updated: Mar 18, 2024

When you create an Elastic High Performance Computing (E-HPC) cluster, you must configure its hardware, software, and basic settings. This topic describes how to create a cluster by using the wizard in the E-HPC console.

Prerequisites

Background information

A cluster provides computing resources and storage resources. You can submit jobs, debug jobs, store results, and view results in the cluster. Before you create and use an E-HPC cluster, take note of the following information:

  • You can create up to three clusters in a region. To create more clusters, submit a ticket.

  • You are charged E-HPC service fees and other resource fees when you create a cluster. For more information, see Billable items.

  • Manage the nodes in a cluster in the E-HPC console. Do not use the Elastic Compute Service (ECS) console to manage them.

Step 1: Configure hardware settings

When you create a cluster, you must configure the hardware settings of the cluster. The hardware settings include the region, deployment mode, number of nodes, network type, and storage. These settings determine the performance of the cluster.

You can configure the hardware settings based on your business requirements.

  1. Log on to the E-HPC console.

  2. In the left part of the top navigation bar, select a region.

  3. In the left-side navigation pane, click Cluster.

  4. On the Cluster page, click Create Cluster.

  5. In the Hardware Configurations step, configure the hardware settings. The following table describes the parameters that you can configure.

    Parameter

    Description

    Availability Zone

    The zone to which the cluster belongs.

    Note

    To ensure efficient communication between E-HPC nodes, make sure that all nodes reside in the same region and zone. For more information, see Regions and zones.

    Pricing Model

    The billing method of nodes in the cluster. The billing method does not apply to elastic IP addresses and NAS file systems.

    • Subscription: You can purchase or renew a node by week, month, or year.

    • Pay-As-You-Go: You are charged for nodes on an hourly basis.

    • Preemptible Instance: Only compute nodes support preemptible instances. Management nodes and logon nodes support only the pay-as-you-go billing method.

    For more information, see ECS billing method overview.

    Deploy Mode

    The deployment mode of the cluster. Valid values:

    • Standard: The logon node, management nodes, and compute nodes are deployed separately.

    • Tiny: The logon node and management nodes are deployed on the same instance. Compute nodes are deployed separately.

    Important

    If you want to use the Open Grid Scheduler (SGE), you must deploy the cluster in Tiny mode.

    Node type and quantity

    Specify the instance type and the number of nodes based on the deployment mode.

    • Specify instance types based on your business requirements. If you want to use the cluster to perform molecular dynamics computing, you can select the GPU type to accelerate analysis. For more information, see Specifications and Best practices for instance type selection.

      Note

      To create a cluster that is equipped with YiTian processors, select an instance type that is equipped with YiTian processors, such as ecs.g8m.large. The g8m instance family is in invitational preview. To apply for a free trial, go to the g8m Instance Free Trial Application Form page.

    • We recommend that you specify the instance specifications of management nodes based on the number of compute nodes.

      • If the number of compute nodes in the cluster is less than or equal to 100, we recommend that you select 16 or more vCPUs and 64 GiB or more of memory.

      • If the number of compute nodes in the cluster is more than 100 and less than or equal to 500, we recommend that you select 32 or more vCPUs and 128 GiB or more of memory.

      • If the number of compute nodes in the cluster is more than 500, we recommend that you select 64 or more vCPUs and 256 GiB or more of memory.

    • A logon node is configured as the development environment. A logon node provides the required resources and a testing environment to cluster users for software development and debugging. We recommend that you configure a logon node by using a CPU-to-memory ratio that is higher than or equal to the CPU-to-memory ratio of compute nodes.

    System Disk

    The cloud disk type and capacity of the system disk on each node. Valid capacity values: 40 to 2000. Unit: GB.

    Note

    To configure a system disk with a capacity of more than 500 GB, submit a ticket.

  6. Expand the Advanced Configurations section. In the Advanced Configurations section, configure the network and storage settings.

    Parameter

    Description

    Authorized Instance Configurations

    Enabled

    Binds a RAM role to a node so that you can access Alibaba Cloud services from the node.

    Important

    By default, the feature is disabled. To enable the feature, submit a ticket.

    After the ticket is approved, perform the following operations based on your user type:

    • Alibaba Cloud account: Click Switch to RAM for authorization to authorize the current user to use the default RAM role.

    • RAM user: Log on to the RAM console by using an Alibaba Cloud account and select one of the following methods to grant permissions to the RAM user.

      • Add the following custom policy and attach the policy to the RAM user. For more information, see Create custom policies and Grant permissions to a RAM user.

        {
            "Version": "1",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": [
                        "ram:PassRole",
                        "ram:ListRoles"
                    ],
                    "Resource": "*"
                },
                {
                    "Effect": "Allow",
                    "Action": "ecs:AttachInstanceRamRole",
                    "Resource": "*"
                }
            ]
        }
      • Grant the AliyunRAMFullAccess permission to the RAM user.

        The AliyunRAMFullAccess permission is used to manage RAM users and permissions. It grants more privileges than the custom policy. For more information, see Create a RAM user and authorize the RAM user to access Log Service.

    Role Name

    The RAM role that you want to bind to the node. We recommend that you select the default role AliyunECSInstanceForEHPCRole.

    Node Type

    The type of the node to which you want to bind the RAM role. Valid values:

    • Scheduling Node

    • Domain Account Node

    • Logon Node

    • Compute Node

    Note

    If you select Compute Node, compute nodes that are added during scale-out activities are automatically bound to the specified RAM role.

    Resource Group

    Resource Group

    The resource group to which the cluster nodes belong. You can use the resource group to manage multiple cluster nodes that belong to your account in a centralized manner.

    Networking

    EIP

    An elastic IP address (EIP) is a public IP address that you can separately purchase and own. If you want to access the cluster through a static public IP address, you can purchase an EIP and bind it to the logon node of the cluster.

    • Use: An EIP is automatically created and bound to the logon node. You can access the cluster over the Internet.

    • Do Not Use: You can access the cluster only over a VPC.

    Note

    You are charged for using EIP resources. For more information, see Billing overview.

    VPC and vSwitch

    The VPC in which the cluster resides. Different VPCs are logically isolated from each other. You can create and manage E-HPC clusters in a VPC.

    By default, the first VPC and vSwitch in the VPC and vSwitch drop-down lists are selected. Make sure that the number of available IP addresses is greater than the number of cluster nodes.

    You can click Create VPC and Create vSwitch (for subnet) to create a VPC and a vSwitch. For more information, see Create and manage a VPC and Create and manage a vSwitch.

    Create Security Group

    You can configure security group rules to manage the inbound and outbound traffic of nodes in the security group.

    • If you turn on the switch, you must enter a new security group name in the Security Group Name field.

    • If you turn off the switch, you must select an existing security group from the Select Security Group drop-down list.

    Storage

    Configure by Directory

    • If you turn off Configure by Directory, only one file system is configured for the cluster.

    • If you turn on Configure by Directory, file systems are mounted to the directories of all nodes. This improves the shared storage capacity of the cluster.

    Type

    The type of the file system. Valid values:

    • General-purpose NAS

    • Extreme NAS

    File System ID and Mount Point

    By default, the first file system and mount point in the File System ID and Mount Point drop-down lists are selected. Make sure that the file system has sufficient mount points.

    You can click Create a file system and Create mount point to create a file system and a mount point.

    Mount Configurations

    If you mount a General-purpose NAS file system, you can select a mount protocol. Valid values: Mount over NFSv3 and Mount over NFSv4.

    Remote Directory

    The remote directory to which the file system is mounted.
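
The sizing guidance in this step can be checked before you open the wizard. The following Python snippet is a minimal sketch, not part of the E-HPC console or any Alibaba Cloud SDK. The names ManagementSpec, recommend_management_spec, check_system_disk, and has_enough_ips are hypothetical helpers introduced only for illustration. The thresholds are taken from the Node type and quantity, System Disk, and VPC and vSwitch descriptions in this step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagementSpec:
    """Minimum recommended size of a management node (hypothetical helper type)."""
    min_vcpus: int
    min_memory_gib: int


def recommend_management_spec(compute_nodes: int) -> ManagementSpec:
    """Map a compute node count to the minimum spec recommended in this topic."""
    if compute_nodes < 0:
        raise ValueError("compute_nodes must be non-negative")
    if compute_nodes <= 100:
        return ManagementSpec(16, 64)
    if compute_nodes <= 500:
        return ManagementSpec(32, 128)
    return ManagementSpec(64, 256)


def check_system_disk(size_gb: int) -> str:
    """Classify a system disk capacity against the documented 40 to 2000 GB range."""
    if not 40 <= size_gb <= 2000:
        return "invalid"
    # Capacities above 500 GB are in range, but a ticket must be submitted first.
    return "ticket_required" if size_gb > 500 else "ok"


def has_enough_ips(available_ips: int, total_nodes: int) -> bool:
    """The vSwitch needs more available IP addresses than the cluster has nodes."""
    return available_ips > total_nodes


if __name__ == "__main__":
    print(recommend_management_spec(250))
    print(check_system_disk(800))
    print(has_enough_ips(available_ips=252, total_nodes=260))
```

The helpers only restate the recommendations as code. The wizard itself remains the authority on which instance types and disk sizes are available in the selected zone.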

Step 2: Configure software settings

Software settings include the image and scheduler that are installed on the nodes and the domain account service that manages the cluster and cluster users.

  1. After you configure the hardware settings, click Next.

  2. In the Software Configurations step, configure the software settings. The following table describes the parameters that you can configure.

    Parameter

    Description

    Image Type and Image

    Select an image type based on your business requirements. Valid values:

    • Public Image

    • Custom Image

    • Shared Images

    • Alibaba Cloud Marketplace Image

    • Community Image

    If you set Image Type to Custom Image, take note of the following limits:

    • E-HPC supports CentOS images and custom images that are created based on Alibaba Cloud images. When you import an image, make sure that Check After Import is selected. Otherwise, the image cannot be identified in the E-HPC console.

    • You cannot use an existing image that was generated for another cluster. Otherwise, compute nodes may not run as expected after the current cluster is created.

    • You cannot modify the yum repository configurations of the operating system in a custom image. Otherwise, the cluster cannot be created or scaled out.

    • The mount directory of the custom image cannot be the /home directory or /opt directory.

    After you select an image type, you can select the image that you want to use. Different images apply to different operating systems. The system deploys cluster nodes based on the image that you select.

    Important

    The system automatically displays available images based on the region that you select, the available image resources, and the images that are supported by the node instance type.

    Scheduler

    The scheduler that is deployed on the cluster to manage jobs.

    E-HPC supports multiple schedulers. However, different schedulers apply to different image types. The E-HPC console displays the schedulers that are supported by the specified image type.

    Domain Service

    The domain account service based on which the cluster and cluster users are managed. nis and ldap are supported.

    VNC

    If you turn on VNC, the system automatically enables the Virtual Network Computing (VNC) service. You can access the E-HPC console on another computer by using VNC.

  3. Configure the queue and post-installation script settings.

    Parameter

    Description

    Queue Config

    Create New Queue

    E-HPC allows you to categorize compute nodes that run different jobs or perform different tasks by adding the nodes to different queues. Jobs are run in a sequence that is determined by the specified queues and scheduler.

    • Default Queue: The compute nodes of the cluster are automatically added to the default queue of the specified scheduler. For example, the default queue of PBS is workq, and the default queue of Slurm is comp.

    • New Queue: You must enter a queue name in the Queue Name field. The queue is automatically created, and the specified compute nodes are added to the queue.

    Post-Install Script

    Script URL

    The URL that is used to download the script after the cluster is created.

    Note

    You can download the script over HTTP or HTTPS. We recommend that you save the script in a public Object Storage Service (OSS) bucket.

    Arguments

    The runtime parameters of the script. For more information, see Configure an installation script.
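
The queue and script URL rules in this step can be expressed as two small checks. The following Python snippet is a sketch under stated assumptions and is not an E-HPC API. DEFAULT_QUEUES, resolve_queue, and is_downloadable_script_url are hypothetical names, and the mapping lists only the two default queues that this topic names (workq for PBS and comp for Slurm). The URL in the usage lines is a placeholder.

```python
from typing import Optional
from urllib.parse import urlparse

# Default queues named in this topic. Other schedulers are deliberately omitted
# because their default queue names are not stated here.
DEFAULT_QUEUES = {"pbs": "workq", "slurm": "comp"}


def resolve_queue(scheduler: str, new_queue_name: Optional[str] = None) -> str:
    """Return the queue that compute nodes join.

    A non-empty new_queue_name models the New Queue option.
    Otherwise the default queue of the scheduler is used.
    """
    if new_queue_name:
        return new_queue_name
    try:
        return DEFAULT_QUEUES[scheduler.lower()]
    except KeyError:
        raise ValueError(f"no default queue known for scheduler {scheduler!r}") from None


def is_downloadable_script_url(url: str) -> bool:
    """Accept only HTTP or HTTPS URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    print(resolve_queue("slurm"))
    print(resolve_queue("pbs", new_queue_name="gpu-jobs"))
    # Placeholder URL for illustration only.
    print(is_downloadable_script_url("https://example.com/post-install.sh"))
```

The URL check covers only the protocol requirement. It does not confirm that the script exists or that the bucket is publicly readable.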

Step 3: Configure basic settings

  1. After you configure the software settings, click Next.

  2. In the Basic Configurations step, configure the basic settings. The following table describes the parameters that you can configure.

    Parameter

    Description

    Cluster Name

    The name of the cluster. The cluster name is displayed on the Cluster page.

    Logon Password and Repeat Password

    The password of the cluster. The password is required when you use SSH to remotely access the logon node of the cluster. The username is root.

  3. In the Configuration List section, check the parameters that you configured. Read and select Alibaba Cloud International Website Product Terms of Service and click OK.

Check the results

After you create the cluster, you can check the status of the cluster on the Cluster page. If the cluster and all cluster nodes are in the Running state, the cluster is created.
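
The success condition above can be stated as a single predicate. The following Python snippet is a minimal sketch that assumes you have already collected the states shown on the Cluster page into plain values. The function names and the node names in the usage lines are hypothetical, and the snippet does not call any E-HPC API.

```python
from typing import List, Mapping

RUNNING = "Running"


def nodes_not_running(node_states: Mapping[str, str]) -> List[str]:
    """Return the names of nodes whose state is not Running, sorted by name."""
    return sorted(name for name, state in node_states.items() if state != RUNNING)


def is_cluster_created(cluster_state: str, node_states: Mapping[str, str]) -> bool:
    """The cluster counts as created only if it and every node are Running."""
    return cluster_state == RUNNING and not nodes_not_running(node_states)


if __name__ == "__main__":
    # Hypothetical node names and states for illustration.
    states = {"login0": "Running", "manager": "Running", "compute000": "Installing"}
    print(is_cluster_created("Running", states))
    print(nodes_not_running(states))
```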
