Create a cluster by using the wizard

Last Updated: May 19, 2022

When you create a cluster, you need to configure the hardware settings, software settings, and basic settings of the cluster. This topic describes how to create a cluster by using the wizard in the Elastic High Performance Computing (E-HPC) console.

Prerequisites

    Background information

    A cluster provides computing resources and storage resources for later job submission, debugging, result storage, and result check. Before you create and use an E-HPC cluster, take note of the following information:

    • You can create up to three clusters in a region. To create more than three clusters in a region, submit a ticket.

    • You are charged for creating a cluster. Fees include E-HPC service fees and other resource fees. For more information, see Billing overview.

    • After you create an E-HPC cluster, do not use the Elastic Compute Service (ECS) console to manage nodes. We recommend that you use the E-HPC console to manage nodes.

    Step 1: Configure hardware settings

    When you create a cluster, you must configure the hardware settings of the cluster. Hardware settings include the zone, deployment mode, number of nodes, network type, and storage, and they determine the performance of the cluster.

    You can specify hardware parameters based on your needs. For example, if you plan to use the cluster for molecular dynamics simulations, you can select a GPU instance type to accelerate the computation.

    1. Log on to the E-HPC console.

    2. In the upper-left corner of the top navigation bar, select a region.

    3. In the left-side navigation pane, click Clusters.

    4. On the Cluster page, click Create Cluster.

    5. In the Hardware Configurations step, configure the hardware settings.

      Parameter

      Description

      Availability Zone

      The zone to which the cluster belongs.

      Note

      To ensure efficient communications between E-HPC nodes, all nodes reside in the same region and the same zone. For more information, see Regions and zones.

      Pricing Model

      The billing method of nodes in the cluster. The billing method does not apply to elastic IP addresses and NAS file systems.

      • Subscription: You can purchase or renew a compute node by week, month, or year.

      • Pay-As-You-Go: You are charged for compute nodes on an hourly basis.

      • Preemptible Instance: Only compute nodes are preemptible instances. Management nodes and logon nodes are pay-as-you-go instances.

      For more information, see Billing method overview.

      Deploy Mode

      The deployment mode of the cluster. Valid values:

      • Standard: The logon node, management nodes, and compute nodes are separately deployed.

      • Tiny: The logon node and management nodes are deployed on one node. Compute nodes are separately deployed.

      Node type and quantity

      The instance type and number of nodes. For more information, see Specifications and Best practices for instance type selection.

      • Compute Node: Compute nodes are used to run high-performance computing jobs. The overall performance of an E-HPC cluster depends on the compute node settings.

      • Management Node: Management nodes are used to schedule jobs and manage domain accounts.

      • Login Node: By default, an elastic IP address is attached to a logon node. You can remotely log on to a logon node and manage the cluster by running commands.

      Notice

      A logon node is configured as the development environment. It provides necessary resources and a testing environment for cluster users to facilitate software development and debugging. Therefore, we recommend that you configure a logon node by using an equal or higher ratio of CPU to memory than compute nodes.

      System Disk

      The cloud disk type and capacity of all node system disks. Valid values: 40 to 2000 GB.

      Note

      If you need to configure a system disk of more than 500 GB, submit a ticket.

    6. Click Advanced Configurations. In the Advanced Configurations section, specify parameters based on your needs.

      Parameter

      Description

      Ram role instance settings

      Enabled

      If you bind RAM roles to the logon node or management nodes, you can access Alibaba Cloud services on these nodes.

      By default, the feature is disabled. To enable it, submit a ticket.

      After the ticket is approved, perform the following operations based on your user type:

      • Alibaba Cloud account: Click Switch to RAM for authorization to authorize the current user to use the default RAM role.

      • RAM user: Log on to the RAM console by using an Alibaba Cloud account and select one of the following methods to grant permissions to the RAM user.

        • Add the following custom policy and attach the policy to the RAM user. For more information, see Authorize a RAM user to use an instance RAM role.

          {
              "Version": "1",
              "Statement": [
                  {
                      "Effect": "Allow",
                      "Action": "ram:ListRoles",
                      "Resource": "*"
                  }
              ]
          }
        • Grant the AliyunRAMReadOnlyAccess permission to the RAM user. For more information, see Authorize a RAM user to use an instance RAM role.

          Note

          AliyunRAMReadOnlyAccess grants read-only permissions on all RAM resources, including the permissions to view users, groups, and authorization information. The authorization scope of this policy is broader than that of the custom policy.

          {
              "Version": "1",
              "Statement": [
                  {
                      "Action": [
                          "ram:Get*",
                          "ram:List*",
                          "ram:GenerateCredentialReport"
                      ],
                      "Resource": "*",
                      "Effect": "Allow"
                  }
              ]
          }
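To illustrate why AliyunRAMReadOnlyAccess has a broader scope than the custom policy, the following Python sketch checks which actions each of the two policy documents above would allow. It is an illustration only: the `allows` helper is hypothetical, it considers only `Allow` statements whose `Resource` is `*`, and it approximates wildcard matching with `fnmatchcase`, which is an assumption and not the actual RAM evaluation logic.

```python
import json
from fnmatch import fnmatchcase

# The custom policy from this topic: a single action.
CUSTOM_POLICY = json.loads("""
{
    "Version": "1",
    "Statement": [
        {"Effect": "Allow", "Action": "ram:ListRoles", "Resource": "*"}
    ]
}
""")

# The AliyunRAMReadOnlyAccess policy document shown in this topic.
READ_ONLY_POLICY = json.loads("""
{
    "Version": "1",
    "Statement": [
        {
            "Action": ["ram:Get*", "ram:List*", "ram:GenerateCredentialReport"],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
""")


def allows(policy: dict, action: str) -> bool:
    """Return True if any Allow statement in the policy matches the action.

    Hypothetical helper: wildcard handling is approximated with fnmatchcase.
    """
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        patterns = statement.get("Action", [])
        if isinstance(patterns, str):  # "Action" may be a single string
            patterns = [patterns]
        if any(fnmatchcase(action, pattern) for pattern in patterns):
            return True
    return False


if __name__ == "__main__":
    for action in ("ram:ListRoles", "ram:GetUser", "ram:CreateUser"):
        print(action, allows(CUSTOM_POLICY, action), allows(READ_ONLY_POLICY, action))
```

Under these assumptions, both policies permit `ram:ListRoles`, but only the read-only policy also permits other `Get*` and `List*` actions, and neither permits write actions such as `ram:CreateUser`.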

      Role Name

      The RAM role that you want to bind to the nodes.

      Note

      We recommend that you select the default role (AliyunECSInstanceForEHPCRole).

      Node Type

      You can bind RAM roles to the following types of nodes:

      • Scheduling node

      • Account node

      • Logon node

      Resources Group

      Resource Group

      The resource group to which the cluster nodes belong. You can use the resource group to manage multiple cluster nodes that belong to your account in a centralized manner.

      Network Configuration

      EIP

      An EIP is a public IP address that can be separately purchased and owned. If you want to use an EIP to log on to the cluster for a long time, you can bind it to the logon node of the cluster.

      • Enable: An EIP is automatically created and bound to the logon node. You can access the cluster over the Internet.

      • Disable: You can access the cluster only over a virtual private cloud (VPC).

      Fees are incurred when you use EIP resources. For more information, see Billing overview.

      VPC

      The VPC where the cluster resides. Different VPCs are logically isolated from each other. You can create and manage E-HPC clusters in a VPC.

      If you do not specify these parameters, the first VPC and vSwitch in the VPC and vSwitch drop-down lists are selected by default. Make sure that the number of available IP addresses is greater than that of cluster nodes.

      You can also click Create VPC and Create vSwitch (for subnet) to create a VPC and a vSwitch. For more information, see Work with VPCs and Work with vSwitches.

      vSwitch

      Create Security Group

      You can configure security group rules to control the inbound and outbound traffic of nodes in the security group.

      • If you turn on the switch, you need to enter a new security group name in the Security Group Name field.

      • If you turn off the switch, you need to select an existing security group in the Select Security Group drop-down list.

      Storage

      Configure By Directory

      E-HPC stores all user data, scheduler data, and job sharing data on a NAS file system. All nodes in the cluster can access the data.

      • If you turn off the switch, only one file system is configured for the cluster.

      • If you turn on the switch, file systems are mounted for the directories of all nodes to improve the shared storage capability of the cluster.

      File System Type

      The type of the NAS file system. Valid values:

      • General Purpose

      • Extreme

      File System ID

      If you do not specify these parameters, the first file system and mount target in the File System ID and Mount Point drop-down lists are selected by default. Make sure that the file system has sufficient mount targets.

      You can also click Create a file system and Create mount point to create a file system and a mount target. For more information, see Create a file system and Manage mount targets.

      Mount Point

      Remote Directory

      The remote directory to which the file system is mounted.
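Two of the numeric limits stated in this step can be checked before you open the wizard: the system disk range (40 to 2000 GB, with a ticket required above 500 GB) and the requirement that the vSwitch has more available IP addresses than the cluster has nodes. The following Python sketch is one way to do that. `HardwarePlan` and `check_hardware` are hypothetical names, the values are entered by hand, and the sketch does not call any Alibaba Cloud API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HardwarePlan:
    """Hypothetical container for values you intend to enter in the wizard."""
    system_disk_gb: int   # capacity of each node's system disk
    node_count: int       # total nodes: compute + management + logon
    available_ips: int    # available IP addresses in the selected vSwitch


def check_hardware(plan: HardwarePlan) -> List[str]:
    """Return a list of findings; an empty list means no issue was found."""
    findings = []
    if not 40 <= plan.system_disk_gb <= 2000:
        findings.append("System disk must be between 40 GB and 2000 GB.")
    elif plan.system_disk_gb > 500:
        findings.append("System disks larger than 500 GB require a ticket.")
    # The topic requires strictly more available IP addresses than nodes.
    if plan.available_ips <= plan.node_count:
        findings.append("The vSwitch needs more available IP addresses than cluster nodes.")
    return findings


if __name__ == "__main__":
    for finding in check_hardware(HardwarePlan(system_disk_gb=600, node_count=8, available_ips=8)):
        print(finding)
```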

    Step 2: Configure software settings

    Software settings include the image, scheduler, and mainstream E-HPC software that are installed on the nodes. You can select software based on your business requirements. For example, if you need to run molecular dynamics simulations, we recommend that you install gromacs-gpu 2018.1, openmpi 3.0.0, cuda-toolkit 9.0, and vmd 1.9.3.

    1. After you configure hardware settings, click Next. In the Software Configurations step, configure software settings.

      Parameter

      Description

      Image type and image

      Different image types apply to different operating systems. The operating systems of all the nodes in a cluster are the same.

      You can select Public Image, Custom Image, or Shared Image.

      If you set Image Type to Custom Image, take note of the following limits:

      • E-HPC allows you to modify only an image provided by Alibaba Cloud.

      • You cannot use an existing image that was generated for another cluster. Otherwise, compute nodes may not run as expected after the current cluster is created.

      • You cannot modify the yum source configurations of the operating system in the custom image. Otherwise, the cluster cannot be created or scaled out.

      • The mount directory of the custom image cannot be the /home directory or /opt directory.

      Scheduler

      A scheduler is software that manages and dispatches multiple jobs. A scheduler is deployed on each E-HPC cluster. E-HPC supports multiple schedulers, but different schedulers apply to different image types. The console displays the schedulers supported by the specified image type.

      Domain Service

      The management service of the domain account. The cluster and users who use the cluster are managed based on this service. Valid values: nis and ldap.

    2. Click Advanced Configurations. In the Advanced Configurations section, specify parameters based on your needs.

      Parameter

      Description

      Queue Config

      Create New Queue

      E-HPC allows you to classify the compute nodes that run different jobs or perform different tasks by adding them to different queues. Jobs are run in a sequence that is determined by the specified queues and the scheduler.

      • Default Queue: The compute nodes of the cluster are automatically added to the default queue of the specified scheduler. For example, the default queue of PBS is workq, and that of Slurm is comp.

      • New Queue: You must enter a queue name in the Queue Name field. The queue is automatically created and the specified compute nodes are added to it.
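The two queue options can be summarized in a few lines of Python. This is a sketch for illustration: `resolve_queue` is a hypothetical helper, and the default queue names are limited to the two examples given above (workq for PBS and comp for Slurm). Defaults for other schedulers are not covered here.

```python
from typing import Optional

# Default queue names taken from the examples in this topic.
DEFAULT_QUEUES = {"pbs": "workq", "slurm": "comp"}


def resolve_queue(scheduler: str, new_queue_name: Optional[str] = None) -> str:
    """Return the queue that compute nodes would join.

    If a new queue name is given, it is used; otherwise the scheduler's
    default queue is used.
    """
    if new_queue_name:
        return new_queue_name
    try:
        return DEFAULT_QUEUES[scheduler.lower()]
    except KeyError:
        raise ValueError(f"No default queue known for scheduler {scheduler!r}") from None


if __name__ == "__main__":
    print(resolve_queue("PBS"))
    print(resolve_queue("slurm", "gpu_jobs"))
```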

      Post-Install Script

      VNC

      If you turn on the VNC switch, the system automatically deploys a visualization service. You can directly access the E-HPC console on another computer by using Virtual Network Computing (VNC).

      Script URL

      The URL that is used to download the script after the cluster is created.

      Note

      You can download the script over HTTP or HTTPS. We recommend that you save the script in a public Object Storage Service (OSS) bucket.

      Arguments

      The runtime parameters of the script. For more information, see Configure the installation script.

      Software Version

      EMR Version

      The version of E-HPC.

      Other Software

      The high-performance computing (HPC) software that you need to install on the E-HPC cluster. After the cluster is created, the selected HPC software is installed in the specified NAS file system. For more information, see Overview.

      Notice

      You can select only the dependent software of the specified cluster specifications, for example, mpich or openmpi. The suffixes of the available software indicate the identities of the software. If you want to select a piece of software that is suffixed with gpu, make sure that your compute nodes are GPU instances. Otherwise, the cluster cannot be created, or the software may not run as expected.
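The GPU rule in the preceding notice can be expressed as a small pre-check. The following Python sketch assumes that GPU builds are identified by a name ending in `-gpu` (as in gromacs-gpu) and that you already know whether your compute nodes are GPU instances. The function name and input format are hypothetical.

```python
from typing import Iterable, List


def gpu_software_conflicts(
    software_names: Iterable[str], compute_nodes_have_gpu: bool
) -> List[str]:
    """Return the selected packages that need GPU compute nodes but would not get them.

    Assumption: a package name ending in "-gpu" marks a GPU build.
    """
    if compute_nodes_have_gpu:
        return []
    return [name for name in software_names if name.lower().endswith("-gpu")]


if __name__ == "__main__":
    selected = ["gromacs-gpu", "openmpi", "cuda-toolkit", "vmd"]
    print(gpu_software_conflicts(selected, compute_nodes_have_gpu=False))
```

An empty result means that the selection does not conflict with the compute node type under the naming assumption above.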

    Step 3: Configure basic settings

    1. After you configure software settings, click Next. In the Basic Configurations step, configure basic settings.

      Parameter

      Description

      Cluster Name

      The name of the cluster. The cluster name is displayed on the Cluster page.

      Logon Password

      The password of the cluster. This password is used when you remotely use SSH to access the logon node of the cluster. The username is root.

      Repeat Password

      Enter the password again.

    2. In the Configuration List section, check the settings that you configured. Read and select Alibaba Cloud International Website Product Terms of Service, and click OK.

    Result

    After you create the cluster, you can check its status on the Cluster page. If the cluster and all cluster nodes are in the Running state, the cluster is created.
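If you automate this check instead of watching the console, the rule above (the cluster and all of its nodes are in the Running state) maps onto a simple polling loop. The following Python sketch is an assumption-laden outline: `fetch_states` is a hypothetical callable that you supply, for example a wrapper around whichever API or CLI you use, and the timeout and interval values are arbitrary.

```python
import time
from typing import Callable, List, Tuple


def wait_until_running(
    fetch_states: Callable[[], Tuple[str, List[str]]],
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the cluster and every node report "Running".

    fetch_states is a hypothetical, caller-supplied function that returns
    (cluster_state, [node_state, ...]). Returns False if the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        cluster_state, node_states = fetch_states()
        # Require at least one node so that an empty node list is not treated as ready.
        if (
            cluster_state == "Running"
            and node_states
            and all(state == "Running" for state in node_states)
        ):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` parameter exists only so that the loop can be exercised in tests without real delays.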

    What to do next