Elastic High Performance Computing: Create a Standard Edition cluster for the public cloud

Last Updated: Mar 13, 2026

Standard Edition clusters for the public cloud are deployed in a cloud environment. They consist of components such as ECS (Elastic Compute Service) instances and shared file systems. You are responsible for maintaining the availability of cluster services. This topic describes how to create a cluster for the public cloud in the console.

Background information

A Standard Edition E-HPC cluster for the public cloud consists of the following components:

  • Control plane node: An ECS instance on which the scheduler and domain account service are deployed to manage job scheduling and user information.

  • Compute nodes: Multiple ECS instances that can be managed in queues. These nodes support scaling and are used to run jobs.

  • Logon node: An ECS instance on which the Login component is deployed and to which an EIP (Elastic IP address) is bound. It is used for remote connections to the cluster.

  • Shared storage: NAS and CPFS file systems that are mounted to share data, such as job data and software data.

Important
  • When you create an E-HPC cluster, the system automatically creates resources such as ECS instances. This may incur fees. For more information, see Billing overview.

  • After you create an E-HPC cluster, do not adjust individual cluster nodes in the ECS console except in special cases. Perform operations in the E-HPC console.

For more information about E-HPC clusters, see Cluster overview.

Prerequisites

  • A service-linked role is created. The first time you log on to the E-HPC console, the system prompts you to create a service-linked role for E-HPC.

  • A VPC (Virtual Private Cloud) and a vSwitch are created. For more information, see Create a VPC and Create a vSwitch.

  • The NAS service is activated, and a NAS file system and a mount target are created. For more information, see Create a file system and Add a mount target.

Manual creation

Step 1: Go to the Create Cluster page

Go to the Create Cluster page.

Step 2: Configure the cluster

On the Cluster Configuration page, configure the cluster network, type, scheduler, and other settings.

  • Basic settings

    Parameter

    Description

    Region

    Select the region where the cluster resides.

    Network and Zone

    Select the VPC and vSwitch for the cluster.

    Note

    The nodes in the cluster use IP addresses from the selected vSwitch. Make sure that the number of available IP addresses in the vSwitch is greater than the number of required nodes.

    Security Group

    A security group controls the inbound and outbound traffic of the cluster and its nodes. When the system automatically creates a security group, it adds rules that allow the nodes in the cluster to communicate with each other.

    Select the type of security group for the system to create. For information about the differences between basic and advanced security groups, see Basic security groups and advanced security groups.

  • Cluster type

    This type of cluster consists of one control plane node and multiple compute nodes. You can select the scheduler type for the cluster deployment and configure the control plane node.

    Configuration Item

    Description

    Series

    Select Standard Edition.

    Deployment mode

    Select Public Cloud Cluster.

    Cluster Type

    Select the scheduler type for the cluster. Schedulers commonly used in HPC scenarios are supported, including Slurm and OpenPBS.

    Control Plane Node

    The control plane node is an ECS instance that has the scheduler and domain account service deployed. Select the appropriate configurations for the control plane node based on your business scenario and cluster size.

    • Billing method

      Select how to pay for the control plane node. For more information about billing, see Instance type billing.

      • Pay-as-you-go: This is a post-paid method. You are billed based on the actual usage duration. Spot instances are not supported.

      • Subscription: This is a pre-paid method. You are billed on a monthly or yearly basis.

    • Instance type

      Select an appropriate instance type for the control plane node. The recommended instance types for control plane nodes vary based on the cluster size:

      • If the number of compute nodes is less than or equal to 100, we recommend an instance type with at least 16 vCPUs and 64 GiB of memory.

      • If the number of compute nodes is greater than 100 and less than or equal to 500, we recommend an instance type with at least 32 vCPUs and 128 GiB of memory.

      • If the number of compute nodes is greater than 500, we recommend an instance type with at least 64 vCPUs and 256 GiB of memory.

    • Image

      After you select an image type, you can select an image. Different images correspond to different operating systems. The system deploys cluster nodes based on the image you select.

      Note

      Custom images have the following limits:

      • Custom images created from official Alibaba Cloud images and imported CentOS images are supported. When you import an image, select Run Detection After Import. Otherwise, the image cannot be detected in the E-HPC console.

      • You cannot use custom images created from existing E-HPC cluster nodes. Otherwise, an error occurs when you create compute nodes for the cluster.

      • Do not modify the yum source configuration of the operating system in a custom image. Otherwise, you cannot create or scale out a cluster.

      • The mount paths of a custom image (the paths where NAS file systems are mounted using the mount command) cannot include the /home and /opt directories.

    • Storage

      Select the system disk specifications for the control plane node and whether to attach a data disk. For more information about disk types and performance, see Cloud disk overview.

    • Hyper-threading

      CPU hyper-threading is enabled by default. If your business scenario requires better performance, you can disable CPU hyper-threading.

    Note

    After the cluster is created, the instance RAM role AliyunECSInstanceForEHPCRole is automatically attached to the control plane node. This role supports core features such as automatic scaling. Do not detach or replace this role in the ECS console. To grant more API call permissions, see E-HPC service role.

  • Custom options

    Parameter

    Description

    Scheduler

    Select the scheduler software to deploy based on the selected cluster type and the image configured for the control plane node.

    Domain account

    Select the domain account service to deploy for the cluster.

    Domain name resolution

    Keep the default value.

    Cluster post-processing script

    This script is used to process result data or perform other subsequent operations after a cluster compute job is complete.

    Maximum number of nodes

    The maximum number of nodes that the cluster can contain. This parameter and the maximum number of cores control the cluster size.

    Maximum number of cores

    The maximum number of cores that the cluster can contain. This parameter and the maximum number of nodes control the cluster size.

    Cluster deletion protection

    Set whether to enable deletion protection for the cluster. If you enable this feature, you must disable it before you can release the cluster. This prevents accidental cluster releases.

  • Resource group

    Resource groups are used to manage resources in groups. For more information, see Resource groups. By default, the cluster belongs to the default resource group. You can change this as needed.
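The control plane sizing guidance and the vSwitch address note in this step lend themselves to a quick pre-flight check. The following Python sketch is illustrative only and is not part of E-HPC. The function names are hypothetical, and the `reserved` and `in_use` parameters are assumptions: this topic does not state how many addresses a vSwitch reserves, so confirm the actual number of available IP addresses in the VPC console.

```python
import ipaddress


def recommended_control_plane_spec(compute_nodes: int) -> tuple[int, int]:
    """Return the minimum (vCPUs, memory in GiB) recommended in this topic."""
    if compute_nodes < 0:
        raise ValueError("compute_nodes must be non-negative")
    if compute_nodes <= 100:
        return (16, 64)
    if compute_nodes <= 500:
        return (32, 128)
    return (64, 256)


def vswitch_has_capacity(cidr: str, nodes_needed: int,
                         reserved: int = 4, in_use: int = 0) -> bool:
    """Check that a vSwitch CIDR block leaves more free addresses than nodes.

    reserved: addresses assumed to be held back by the platform (assumption).
    in_use:   addresses already taken by other resources (assumption).
    """
    total = ipaddress.ip_network(cidr).num_addresses
    available = total - reserved - in_use
    # The note in Basic settings asks for strictly more addresses than nodes.
    return available > nodes_needed


print(recommended_control_plane_spec(80))            # (16, 64)
print(recommended_control_plane_spec(300))           # (32, 128)
print(vswitch_has_capacity("192.168.0.0/24", 200))   # True
```

The thresholds mirror the three cluster-size bands listed under Instance type. The capacity check is deliberately conservative and ignores addresses that other services may claim later.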

Step 3: Configure compute nodes and queues

On the Compute Nodes and Queues page, configure the queues.

Queues are used to manage compute nodes in groups. You can specify a queue when you run a job. By default, a cluster has one queue (the comp queue). You can click Add More Queues to add more queues. Configure the following information for a single queue:

  • Basic settings

    Configuration Item

    Description

    Queue auto scaling

    Select whether to enable Auto Scaling. If you enable it, you can then choose whether to enable Auto Scale-out and Auto Scale-in as needed.

    After you enable automatic scaling, the system automatically adds or removes compute nodes based on the configuration and real-time workload.

    Number of nodes in queue

    Set the number of nodes in the queue.

    • If queue auto scaling is disabled, configure the initial number of compute nodes for the queue.

    • If queue auto scaling is enabled, configure the minimum and maximum number of nodes allowed in the queue.

      Important

      If you change the minimum number of nodes to a non-zero value, the queue retains that minimum number of nodes during a scale-in, even if the nodes are idle. Set the minimum number of nodes with caution to avoid resource waste and unnecessary costs from idle nodes remaining after a scale-in.

  • Select queue node configuration

    If queue auto scaling is enabled, or if it is disabled but the initial number of nodes is not 0, you must configure the following information so that the system can create compute nodes.

    Configuration item

    Description

    Node interconnect

    Select the network connection method between nodes.

    • VPC Network: Nodes communicate with each other over the VPC network.

    • eRDMA Network: If the nodes use instance types that support Elastic RDMA Interface (ERI), they can communicate over the elastic Remote Direct Memory Access (eRDMA) network.

      Note

      Only some node instance types support ERI. For more information, see eRDMA overview and Enable eRDMA on an enterprise-level instance.

    Use preset node pool

    Select a created preset node pool. The system automatically selects IP addresses and hostnames from the unassigned preset nodes in the pool to create compute nodes.

    Note

    Using a preset node pool for scale-out allows for the rapid reuse of pre-allocated resources. For more information, see Use a preset node pool in a cluster.

    vSwitch

    Select the vSwitch to which the nodes belong. The system automatically assigns IP addresses to the nodes from the available vSwitch CIDR blocks.

    Instance type group

    Click Add Instance Type to select the instance types for the nodes.

    If automatic scaling is disabled, you can add only one instance type. If automatic scaling is enabled, you can add multiple instance types.

    Important

    You can select multiple vSwitches and multiple instance types as backups to avoid instance creation failures due to inventory issues. When creating compute nodes, the system starts from the zone of the first vSwitch and tries to create instances in the order of the specified instance types until the required number of nodes is met. The instance types of the successfully created instances may vary with inventory changes.

  • Auto scaling

    Configuration item

    Description

    Scaling policy

    Select a scaling policy. Currently, only the Supply-prioritized Policy is supported. Under this policy, the system tries the zones in the order of the configured vSwitches and creates compute nodes that meet the specification requirements.

    Maximum number of nodes per scaling activity

    The maximum number of nodes to add or remove in each scale-out or scale-in cycle. The default value is 0, which means there is no limit.

    If you have cost requirements, you can set this value to ensure that the number of scaled-out nodes does not exceed your expectations.

    Hostname prefix

    The starting characters of the node hostname, used to mark and distinguish nodes.

    Hostname suffix

    The ending characters of the node hostname, used to mark and distinguish nodes.

    Host RAM role

    Attach a RAM role to the nodes so they can get permissions to access Alibaba Cloud services.

    We recommend that you select the default role AliyunECSInstanceForEHPCRole created by the system.
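Two of the queue rules in this step can be modeled in a few lines: the order in which vSwitches and instance types are tried during node creation, and how the minimum node count and the per-activity limit cap a scale-in. The following Python sketch is a simplified model under stated assumptions, not the E-HPC implementation. The function names, the `stock` inventory table, and the vSwitch IDs are hypothetical, and this topic does not say what the service does when inventory is exhausted (the sketch raises an error).

```python
def plan_node_creation(required, vswitch_ids, instance_types, stock):
    """Return a list of (vswitch_id, instance_type, count) allocations.

    stock maps (vswitch_id, instance_type) to the number of instances that
    can currently be created. It is a hypothetical stand-in for inventory.
    """
    plan = []
    remaining = required
    for vsw in vswitch_ids:             # zones are tried in vSwitch order
        for itype in instance_types:    # then instance types in listed order
            if remaining == 0:
                return plan
            take = min(remaining, stock.get((vsw, itype), 0))
            if take > 0:
                plan.append((vsw, itype, take))
                remaining -= take
    if remaining > 0:
        raise RuntimeError(f"{remaining} node(s) could not be placed")
    return plan


def nodes_to_remove(current, idle, min_count, max_per_activity=0):
    """Number of idle nodes one scale-in activity may remove.

    The queue never drops below min_count, and max_per_activity == 0
    means there is no per-activity limit.
    """
    removable = max(0, min(idle, current - min_count))
    if max_per_activity > 0:
        removable = min(removable, max_per_activity)
    return removable


stock = {
    ("vsw-a", "ecs.c7.xlarge"): 2,
    ("vsw-a", "ecs.g7.xlarge"): 1,
    ("vsw-b", "ecs.c7.xlarge"): 5,
}
print(plan_node_creation(4, ["vsw-a", "vsw-b"],
                         ["ecs.c7.xlarge", "ecs.g7.xlarge"], stock))
print(nodes_to_remove(current=10, idle=10, min_count=2))  # 8
```

In the example, the first vSwitch can supply only three of the four nodes, so the last node falls back to the second vSwitch. This is why the instance types of the created nodes can differ from run to run.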

Step 4: Configure shared file storage

On the Shared File Storage page, complete the storage configuration.

By default, a file system is mounted to the /home and /opt directories of the control plane node as shared storage. To mount a file system to other directories, click Add More Storage and complete the relevant configurations. Configure the following file system information for each directory:

Note

The /home and /opt directories do not currently support mounting different file system directories.

Parameter

Description

Type

Select the type of file system to mount.

  • General-purpose NAS: Mount a General-purpose NAS file system.

  • Extreme NAS: Mount an Extreme NAS file system.

  • Parallel File CPFS: Mount a CPFS file system using the NFS protocol.

File system

Select the file system ID and mount target to mount. Make sure the file system has available mount targets.

File system directory

Enter the file system directory to mount.

Mount option

Select the mount protocol.

Step 5: Configure software and service components

On the Software and Service Components page, configure the software and service components.

  • Click Add Software. In the dialog box that appears, select the software to install. E-HPC provides software commonly used in the HPC industry; select the packages that you need.

  • Click Add Service Component. In the dialog box that appears, select a service component and configure its parameters.

    Note

    Currently, only the Login component is supported.

    Public cloud clusters are configured with the Login component by default for remote connection to the cluster over the public network. The component parameters are described as follows:

    • Custom parameters for the Login component

      • SSH: Set the port number, protocol, and allowed IP CIDR block for connecting to the cluster through Secure Shell (SSH).

      • VNC: Set the port number, protocol, and allowed IP CIDR block for connecting to the cluster through VNC.

      • CLIENT: Set the port number, protocol, and allowed IP CIDR block for connecting to the cluster through a client.

    • Component deployment resources

      • EIP Instance: Bind an EIP to the ECS instance where the Login component is deployed so you can connect to the cluster over the public network. You can automatically create or select an existing EIP.

      • ECS Instance: Set the instance type for the ECS instance used to deploy the Login component.

    Note

    After the logon node is created, the instance RAM role AliyunECSInstanceForEHPCRole is automatically attached to it. This role allows features such as the Web Portal to function correctly. Do not detach or replace this role in the ECS console. To grant more API call permissions, see E-HPC service role.
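Before you enter an allowed IP CIDR block for SSH, VNC, or CLIENT access, it can help to confirm that the addresses you connect from actually fall inside it. The following Python sketch uses only the standard `ipaddress` module. The function names and the example rule set are hypothetical, and the addresses come from documentation-reserved ranges rather than from any real cluster.

```python
import ipaddress


def is_client_allowed(client_ip: str, allowed_cidr: str) -> bool:
    """Return True if client_ip falls inside allowed_cidr."""
    network = ipaddress.ip_network(allowed_cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network


def allowed_services(client_ip: str, rules: dict) -> list:
    """List the services whose CIDR block admits client_ip."""
    return [name for name, cidr in rules.items()
            if is_client_allowed(client_ip, cidr)]


# Hypothetical rule set: one allowed CIDR block per connection method.
rules = {
    "SSH": "203.0.113.0/24",
    "VNC": "203.0.113.0/28",
    "CLIENT": "0.0.0.0/0",
}
print(allowed_services("203.0.113.25", rules))  # ['SSH', 'CLIENT']
```

A block such as 0.0.0.0/0 admits every IPv4 address, so a narrower block that covers only your office or VPN range is generally the safer choice for a node that is reachable over the public network.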

Step 6: Confirm configuration

On the Confirm Configuration page, confirm the configuration information and set the cluster name and logon credential.

Configuration

Description

Cluster name

Enter a name. This name is displayed in the cluster list to help you find and identify the cluster.

Passwordless logon

Set whether the root user can log on without a password when switching from the control plane node to a compute node.

Important

Enabling this feature configures a one-way passwordless logon from the control plane node to all compute nodes for the root user. It does not support passwordless logon from compute nodes to the control plane node. Proceed with caution.

Logon credential

Select the credential for logging on to the cluster. Currently, only Custom Password is supported.

Set password, Confirm password

Enter the password for logging on to the cluster. All nodes in the cluster use this password as the logon password for the root user by default.

After completing the configuration, read the Terms of Service, confirm the fee information, and then click Create Cluster.

Template creation

E-HPC supports creating clusters quickly and in batches using templates. A template defines the basic parameters required to create a cluster. You can choose a cluster template provided by E-HPC or write your own custom template.

Use a public template to create a cluster

  1. Go to the Cluster List page.

    1. Log on to the E-HPC console.

    2. In the left part of the top navigation bar, select a region.

    3. In the left-side navigation pane, click Cluster.

  2. On the Cluster List page, click Cluster Template.

  3. In the dialog box that appears, select the template to use and click Create Cluster for that template.

  4. Confirm the configuration information and enter the cluster name and other details.

    • In the Configuration Summary section, the default configuration provided by the template is displayed. If you want to modify the configuration, click Edit and modify the corresponding configuration items.

    • In the Management Settings section, complete the configuration as prompted on the page.

  5. Read the terms of service, confirm the fee information, and then click Create Cluster.

Use a custom template to create a cluster

  1. Write a custom template locally.

    This topic uses the following template as an example. Modify the parameters as needed.

    ### Basic cluster settings
    Region: cn-hangzhou                            # The region of the cluster. Optional. If left empty, the console automatically fills in the current region.
    ClusterName: "TestClusterName"                 # The name of the cluster. Optional. If left empty, a name is automatically generated based on the cluster type, such as SLURM-Region-DATESTAMP.
    ClusterDescription: "XXXXX"                    # The description of the cluster. Optional.
    ClusterCategory: "Standard"                    # The cluster edition. Required. Valid values: ['Standard', 'Serverless', 'SuperComputing'].
    ClusterVpcId: ""                               # The VPC ID of the cluster. Optional. If left empty, the console automatically selects a valid value in the region.
    ClusterVSwitchId: ""                           # The vSwitch ID of the cluster's head node. Optional. If left empty, the console automatically selects a valid value for the VPC ID.
    IsEnterpriseSecurityGroup: true                # Specifies whether to use an advanced security group. This parameter takes effect only when SecurityGroupId is empty. Optional. Default value: false, which indicates that a basic security group is used.
    SecurityGroupId: sg-bp1gje9ip78z7v6zy203       # If left empty, a security group is automatically created. Optional. Default value: empty, which indicates that a security group is automatically created.
    ClusterCustomConfiguration:                    # The custom post-installation script for the cluster. Optional.
      Script: oss://                               # The OSS path of the script file.
      Args: arg1 arg2                              # The parameters to pass to the script.
    MaxCount: 1000                                 # The maximum number of nodes in the cluster. Optional. Default value: 1000.
    MaxCoreCount: 100000                           # The maximum number of vCPUs in the cluster. Optional. Default value: 10000.
    DeletionProtection: true                       # Deletion protection for the cluster. Optional. Default value: true, which enables deletion protection.
    ResourceGroupId: rg-acfm2xumdifd3ri            # The resource group of the cluster. Optional. If left empty, the console automatically selects a valid value under the account.
    Tags:                                          # The tags of the cluster. Optional.
      - Key: String
        Value: String
    
    ### Cluster control service settings
    Manager:                                     # Head node module
      Scheduler:                                 # Scheduler service module
        Type: "SLURM"                            # Optional. The scheduler type. Default value: SLURM.
        Version: "22"                            # Optional. The scheduler version. Default value: 22.
      DirectoryService:                          # Account service module
        Type: "NIS"                              # Optional. The account module type. Default value: NIS.
        Version: "x.x.x"                         # Optional. The domain account version.
      DNS:                                       # DNS service module
        Type: "NIS"                              # Optional. The DNS service type. Default value: NIS.
        Version: "x.x.x"                         # Optional. The DNS service version.
      ManagerNode:                               # Head node instance
        InstanceType: "ecs.c7.xlarge"            # The instance type. Optional. Required for unmanaged clusters.
        ImageId: "m-xxxxxx"                      # The instance image. Optional. Required for unmanaged clusters.
        InstanceChargeType: "PostPaid"           # The billing method for the instance. Optional. The head node supports only PostPaid and Subscription. Default value: PostPaid.
        PeriodUnit: "Month"                      # The unit of the subscription duration. Optional. Required only if the billing method is Subscription.
        Period: 1                                # The subscription duration. Optional. Required only if the billing method is Subscription.
        AutoRenew: false                         # Specifies whether to enable auto-renewal. Optional. Required only if the billing method is Subscription.
        AutoRenewPeriod: 1                       # The auto-renewal duration. Optional. Required only if the billing method is Subscription.
        SpotStrategy: "SpotWithPriceGo"          # The bidding policy for the instance. Optional. The spot instance type. This parameter is invalid for the head node.
        SpotPriceLimit: 0.5                      # The maximum price for the spot instance. This parameter is invalid for the head node.
        Duration: 1                              # The retention period of the spot instance. This parameter is invalid for the head node.
        SystemDisk:                              # Optional. System disk parameters.
          Category: "cloud_essd"                 # Optional. Default value: cloud_essd.
          Size: 40                               # Optional. Default value: 40.
          Level: "PL0"                           # Optional. Default value: PL0.
        DataDisks:                               # Optional. Data disk parameters.
          - Category: "cloud_essd"               # Optional. Default value: cloud_essd.
            Size: 40                             # Optional. Default value: 40.
            Level: "PL0"                         # Optional. Default value: PL0.
            DeleteWithInstance: false            # Optional. Default value: false.
        EnableHT: false                          # Optional. Default value: true.
    
    ### Cluster compute queue and node configurations
    Queues:                                      # Optional. Queue configurations.
      - Name: workq                              # Optional. Queue 1.
        EnableScaleOut: false                    # Optional. Specifies whether to enable scale-out. Default value: false.
        EnableScaleIn: false                     # Optional. Specifies whether to enable scale-in. Default value: false.
        MinCount: 0                              # Optional. The minimum number of nodes in the queue.
        MaxCount: 500                            # Optional. The maximum number of nodes in the queue.
        InitialCount: 0                          # Optional. The initial number of nodes in the queue.
        InterConnect: erdma                      # Optional. The network interconnect type for nodes in the queue. VPC and eRDMA are supported.
        VSwitchIds:                              # Optional. The list of vSwitches for the queue.
          - "vsw-xxxxxxx"
          - "vsw-xxxxxxx"        
        ComputeNodes:                            # Optional. The configurations of compute nodes in the queue.
          - InstanceType: "ecs.c7.xlarge"            # The instance type. Optional. Required for unmanaged clusters.
            ImageId: "m-xxxxxx"                      # The instance image. Optional. Required for unmanaged clusters.
            InstanceChargeType: "PostPaid"           # The billing method for the instance. Optional. Default value: PostPaid.
            PeriodUnit: "Month"                      # The unit of the subscription duration. Optional. Required only if the billing method is Subscription.
            Period: 1                                # The subscription duration. Optional. Required only if the billing method is Subscription.
            AutoRenew: false                         # Specifies whether to enable auto-renewal. Optional. Required only if the billing method is Subscription.
            AutoRenewPeriod: 1                       # The auto-renewal duration. Optional. Required only if the billing method is Subscription.
            SpotStrategy: "SpotWithPriceGo"          # The bidding policy for the spot instance. Optional.
            SpotPriceLimit: 0.5                      # The maximum price for the spot instance.
            Duration: 1                              # The retention period of the spot instance.
            SystemDisk:                              # Optional. System disk parameters.
              Category: "cloud_essd"                 # Optional. Default value: cloud_essd.
              Size: 40                               # Optional. Default value: 40.
              Level: "PL0"                           # Optional. Default value: PL0.
            DataDisks:                               # Optional. Data disk parameters.
              - Category: "cloud_essd"               # Optional. Default value: cloud_essd.
                Size: 40                             # Optional. Default value: 40.
                Level: "PL0"                         # Optional. Default value: PL0.
                DeleteWithInstance: false            # Optional. Default value: false.
            EnableHT: false                          # Optional. Default value: true.   
        AllocationStrategy: "PriorityInstanceType"   # Optional. The automatic scaling policy. Can be supply-prioritized or cost-prioritized.
        RamRole: "xxxxxx"                            # Optional. The name of the RAM role used by the nodes.
        HostNamePrefix: "xxxxx"                        # Optional. The hostname prefix.
        HostNameSuffix: "xxxxx"                        # Optional. The hostname suffix.
        KeepAliveNodes:                                # Optional. The list of exception nodes.
          - compute000
          - compute001
          - compute002
    ### Cluster shared storage
    SharedStorage:
      - MountDirectory: "/home"                    # Optional. The mount directory in the cluster.
        FileSystemId: "xxxx"                       # Optional. The file system ID of the shared storage.
        NASDirectory: "/"                          # Optional. The directory of the shared file system.
        MountTargetDomain: "xxxxxx"                # Optional. The mount target.
        ProtocolType: "NFS"                        # Optional. The protocol.
        MountOptions: "xxxxx"                      # Optional. The mount options.
    ### Cluster software
    AdditionalPackages:                            # Optional. The list of software for the cluster.
      - Name: "LAMMPS"                             # Software name
        Version: "xxxx"                            # Software version
      - Name: "Gromacs"
        Version: "xxx"
    ### Cluster components
    Addons:
      - Name: "LoginNode"                          #  Component name    
        Version: "xxxxxx"                          #  Component version
        ServicesSpec: "JSON String"                #  Custom parameters for the component service.
        ResourcesSpec: "JSON String"               #  Custom resources for the component service.
  2. Go to the Cluster List page.

    1. Log on to the E-HPC console.

    2. In the left part of the top navigation bar, select a region.

    3. In the left-side navigation pane, click Cluster.

  3. On the Cluster List page, click Cluster Template.

  4. In the dialog box that appears, click Import Local Template to upload the template file you edited locally.

  5. In the Cluster Template Edit dialog box that appears, confirm that the custom template information is correct, and then click Confirm Template and Create.

  6. On the Create Cluster page, confirm that the configuration information is correct, and then click Create Cluster.
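Because a custom template is edited by hand, a quick local sanity check before import can catch simple mistakes. The following Python sketch applies a few constraints taken from the comments in the example template to a dictionary, as it might be produced by any YAML parser. The function name `check_template` is hypothetical, the rule that MinCount must not exceed MaxCount is an assumption, and the console may apply additional or different validation.

```python
VALID_CATEGORIES = {"Standard", "Serverless", "SuperComputing"}
VALID_INTERCONNECTS = {"vpc", "erdma"}
SUBSCRIPTION_KEYS = ("PeriodUnit", "Period", "AutoRenew", "AutoRenewPeriod")


def check_template(template: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # ClusterCategory is the only field marked Required in the example.
    category = template.get("ClusterCategory")
    if category not in VALID_CATEGORIES:
        problems.append(f"ClusterCategory must be one of "
                        f"{sorted(VALID_CATEGORIES)}, got {category!r}")

    # Subscription billing makes the period and renewal fields mandatory.
    node = (template.get("Manager") or {}).get("ManagerNode") or {}
    if node.get("InstanceChargeType", "PostPaid") == "Subscription":
        for key in SUBSCRIPTION_KEYS:
            if key not in node:
                problems.append(f"ManagerNode.{key} is required "
                                f"for Subscription billing")

    for queue in template.get("Queues") or []:
        name = queue.get("Name", "<unnamed>")
        low, high = queue.get("MinCount", 0), queue.get("MaxCount")
        if high is not None and low > high:  # assumption: min <= max
            problems.append(f"Queue {name}: MinCount {low} exceeds "
                            f"MaxCount {high}")
        interconnect = str(queue.get("InterConnect", "vpc")).lower()
        if interconnect not in VALID_INTERCONNECTS:
            problems.append(f"Queue {name}: unsupported InterConnect "
                            f"{interconnect!r}")
    return problems


template = {
    "ClusterCategory": "Standard",
    "Queues": [{"Name": "workq", "MinCount": 0, "MaxCount": 500,
                "InterConnect": "erdma"}],
}
print(check_template(template))  # []
```

Returning a list of problems rather than raising on the first one lets you fix every issue in a single editing pass before you upload the file.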

References

After you create a cluster, create users to submit jobs. For more information, see User management and Job overview.