Elastic High Performance Computing: Create a public cloud managed cluster

Last Updated: Feb 28, 2026

Create an Elastic High Performance Computing (E-HPC) managed cluster to run HPC workloads on Alibaba Cloud. In a managed cluster, E-HPC provisions and maintains the management node. You only manage compute nodes and job queues.

Important

Creating an E-HPC cluster automatically provisions resources such as ECS instances, which incur fees. For details, see Billing overview.

Cluster architecture

A managed cluster consists of three components:

  • Compute nodes: ECS instances that run jobs. Compute nodes belong to scalable queues. The number of compute nodes can grow or shrink based on workload demand.

  • Logon node: A single ECS instance with the Login addon deployed and an elastic IP address (EIP) bound for remote access.

  • Shared file system: A NAS or Cloud Parallel File Storage (CPFS) file system shared across all nodes for job and application data.

Important

Do not use the ECS console to manage nodes in an E-HPC cluster unless necessary. Use the E-HPC console instead.

For more information, see Cluster overview.

Prerequisites

Before you begin, make sure that you have:

Procedure

Step 1: Open the Create Cluster page

Go to the Create Cluster page in the E-HPC console.

Step 2: Configure the cluster

On the Cluster Configuration step, configure network, cluster type, and scheduler settings.

Basic settings

| Parameter | Description |
| --- | --- |
| Region | Region where the cluster is created. |
| Network and Availability Zone | VPC and vSwitch for the cluster. Nodes use IP addresses from the vSwitch. Make sure the vSwitch has more available IP addresses than the number of cluster nodes. |
| Security group | Controls inbound and outbound traffic for cluster nodes. Select one of the following options: Automatically create a normal security group, Automatically create enterprise security groups, or Select Existing Security Group. The system automatically creates rules for inter-node communication. A single basic (normal) security group can contain up to 2,000 nodes. For larger clusters, use advanced (enterprise) security groups. See Basic security groups and advanced security groups. |
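
The vSwitch capacity requirement above can be checked before you create the cluster. The following is a minimal sketch, not part of the E-HPC tooling. The function names and the CIDR block are hypothetical, and the reserved-address count of 4 per vSwitch is an assumption that you should confirm against the VPC documentation.

```python
import ipaddress

# Assumption: number of addresses the platform reserves in each vSwitch
# CIDR block. Confirm this value in the VPC documentation.
RESERVED_ADDRESSES = 4


def usable_addresses(cidr: str, reserved: int = RESERVED_ADDRESSES) -> int:
    """Return the size of the CIDR block minus the reserved addresses."""
    network = ipaddress.ip_network(cidr, strict=True)
    return max(network.num_addresses - reserved, 0)


def vswitch_can_hold(cidr: str, addresses_in_use: int, planned_nodes: int) -> bool:
    """True if the free addresses exceed the planned number of cluster nodes."""
    free = usable_addresses(cidr) - addresses_in_use
    # The requirement is "more available addresses than nodes", so use ">".
    return free > planned_nodes


# Hypothetical example: a /24 vSwitch with 50 addresses already in use.
print(usable_addresses("192.168.0.0/24"))            # 252
print(vswitch_can_hold("192.168.0.0/24", 50, 100))   # True
```

Count every node that draws an address from the vSwitch when you choose `planned_nodes`, including the logon node.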

Cluster type

A managed cluster separates the management node from compute nodes. E-HPC creates and maintains the management node.

| Parameter | Description |
| --- | --- |
| Series | Select Managed Edition. |
| Deployment Mode | Select Public cloud cluster. |
| Cluster Type | Select Slurm (only supported option). |

Custom options

| Parameter | Description |
| --- | --- |
| Scheduler | Scheduler software to deploy. Only Slurm 22 is supported. |
| Domain Account | Domain account service for the cluster. Only NIS (Network Information Service) is supported for managed clusters. |
| Domain name resolution | Use the default value. |
| Maximum number of cluster nodes | Maximum number of nodes the cluster can contain. Works with Maximum number of cores in the cluster to control cluster size. |
| Maximum number of cores in the cluster | Maximum number of vCPUs available to compute nodes. Works with Maximum number of cluster nodes to control cluster size. |
| Cluster Deletion Protection | Prevents accidental cluster deletion. When enabled, the cluster cannot be released until you disable this setting. |
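
To illustrate how the two size limits might interact, the sketch below computes how many requested nodes fit under both caps at once. It assumes that both limits must hold simultaneously and that the core limit counts vCPUs of compute nodes. The class and function names are hypothetical, and this is not E-HPC's actual admission logic.

```python
from dataclasses import dataclass


@dataclass
class ClusterLimits:
    max_nodes: int  # "Maximum number of cluster nodes"
    max_cores: int  # "Maximum number of cores in the cluster" (vCPUs)


def nodes_allowed(limits: ClusterLimits, current_nodes: int, current_cores: int,
                  vcpus_per_node: int, requested: int) -> int:
    """Return how many of the requested nodes fit under both limits."""
    room_by_nodes = limits.max_nodes - current_nodes
    room_by_cores = (limits.max_cores - current_cores) // vcpus_per_node
    # The tighter of the two limits decides; never return a negative count.
    return max(0, min(requested, room_by_nodes, room_by_cores))


# Hypothetical cluster: 10 nodes with 16 vCPUs each are already running.
limits = ClusterLimits(max_nodes=100, max_cores=1000)
print(nodes_allowed(limits, current_nodes=10, current_cores=160,
                    vcpus_per_node=16, requested=80))  # 52
```

In this example the core limit is the binding one: 90 more nodes would fit under the node cap, but only 52 fit under the remaining 840 vCPUs.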

Resource group

Assign the cluster to a resource group. By default, clusters belong to the default resource group. For more information, see Resource groups.

Step 3: Configure compute nodes and queues

On the Compute Node and Queue step, set up queues and compute nodes.

Compute nodes are organized into queues. When you submit a job, specify the target queue. Each cluster has a default queue named comp. To add queues, click Add more queues.

Configure the following parameters for each queue:

Basic settings

| Parameter | Description |
| --- | --- |
| Automatic queue scaling | Enable or disable automatic scaling. After you enable this feature, select Auto Grow and/or Auto Shrink to automatically add or remove compute nodes based on workload. |
| Queue Compute Nodes | Set the initial, maximum, and minimum node counts. Without auto-scaling: set the initial number. With auto-scaling: set the minimum and maximum. |

Important

Setting Minimal Nodes to a non-zero value retains that number of nodes during scale-in, even when idle. Set this value carefully to avoid unnecessary costs.

Queue node configuration

Configure node specifications if auto-scaling is enabled or the initial node count is greater than 0.

| Parameter | Description |
| --- | --- |
| Inter-node interconnection | Communication mode between compute nodes. Options: VPC Network (standard VPC networking) or eRDMA Network (elastic Remote Direct Memory Access networking, available for instance types that support Elastic RDMA Interfaces (ERIs)). See eRDMA overview and Configure eRDMA on an enterprise-level instance. |
| Use Preset Node Pool | Select a reserved node pool to reuse pre-allocated resources during scale-out. See Use reserved node pools in clusters. |
| Virtual Switch | vSwitch for compute nodes. The system assigns IP addresses from the vSwitch CIDR block. |
| Instance type Group | Click Add Instance to select instance types. Without auto-scaling: one instance type. With auto-scaling: multiple instance types. |

Important

Specify multiple vSwitches and instance types as fallbacks for inventory shortages. The system attempts to create nodes in the order of specified instance types and zones. The first vSwitch determines the initial zone.
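
The fallback behavior described above can be pictured as an ordered list of candidates that is tried until one has inventory. The sketch below is illustrative only. It assumes that every instance type is tried in the first vSwitch's zone before the next vSwitch is considered, which the text does not confirm, and the vSwitch IDs and the `has_stock` callback are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[str, str]  # (instance type, vSwitch ID)


def candidate_order(instance_types: List[str], vswitches: List[str]) -> List[Candidate]:
    """List (instance type, vSwitch) pairs in the assumed attempt order."""
    # Assumption: the vSwitch (zone) is the outer loop, so the first vSwitch
    # determines the initial zone and instance types are tried within it.
    return [(itype, vsw) for vsw in vswitches for itype in instance_types]


def first_available(candidates: List[Candidate],
                    has_stock: Callable[[str, str], bool]) -> Optional[Candidate]:
    """Return the first candidate with inventory, or None if all are sold out."""
    for itype, vsw in candidates:
        if has_stock(itype, vsw):
            return (itype, vsw)
    return None


# Hypothetical inventory: only one combination currently has stock.
in_stock = {("ecs.g7.4xlarge", "vsw-zone-b")}
order = candidate_order(["ecs.c7.4xlarge", "ecs.g7.4xlarge"],
                        ["vsw-zone-a", "vsw-zone-b"])
print(first_available(order, lambda t, v: (t, v) in in_stock))
```

With a single instance type and a single vSwitch, the candidate list has one entry and a stock shortage leaves no fallback, which is why listing several of each is recommended.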

Auto scale

Configure the following parameters when automatic scaling is enabled.

| Parameter | Description |
| --- | --- |
| Scaling Policy | Only Supply Priority Strategy is supported. Nodes are created in specified zones in the order of configured vSwitches. |
| Maximum number of single expansion nodes | Maximum number of nodes to add or remove per scaling cycle. Default: 99. Set this parameter to control compute node costs. |
| Prefix of Hostnames | Hostname prefix that distinguishes nodes in different queues. |
| Hostname Suffix | Hostname suffix that distinguishes nodes in different queues. |
| Instance RAM role | RAM role that grants nodes access to Alibaba Cloud services. Select a role from the dropdown. The default AliyunECSInstanceForEHPCRole role is recommended. |
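
The queue settings (minimum and maximum node counts, Auto Grow, Auto Shrink, and the per-cycle limit) together bound how a queue changes size. The following sketch shows one way these bounds could combine in a single scaling cycle. It is a simplified model under stated assumptions, not the E-HPC scaling algorithm, and the function and parameter names are hypothetical.

```python
def next_node_count(current: int, demanded: int, min_nodes: int, max_nodes: int,
                    max_step: int = 99, auto_grow: bool = True,
                    auto_shrink: bool = True) -> int:
    """Return the node count after one scaling cycle in this simplified model."""
    # Clamp the demand to the queue's configured minimum and maximum.
    target = max(min_nodes, min(demanded, max_nodes))
    if target > current:
        # Scale out only if Auto Grow is on, by at most max_step nodes.
        return min(target, current + max_step) if auto_grow else current
    if target < current:
        # Scale in only if Auto Shrink is on, never below min_nodes.
        return max(target, current - max_step) if auto_shrink else current
    return current


print(next_node_count(current=0, demanded=250, min_nodes=0, max_nodes=200))  # 99
print(next_node_count(current=5, demanded=0, min_nodes=2, max_nodes=200))    # 2
```

The second call shows the cost effect of a non-zero minimum: two nodes stay running even though no work is queued.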

Step 4: Configure shared file storage

On the Shared File Storage step, configure the file system shared across cluster nodes.

By default, the file system is mounted to the /home and /opt directories of the management node as shared storage. To mount a file system to another directory, click Add more storage.

Note

You cannot mount different file system directories to /home and /opt.

| Parameter | Description |
| --- | --- |
| Type | File system type: General-purpose NAS, Extreme NAS, or Parallel file CPFS. |
| File System | ID and mount point of the file system. Make sure the file system has sufficient mount points. |
| File System Directory | Directory of the file system to mount. |
| Mount Options | Mount protocol settings. |
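
After the cluster is running, you may want to confirm that the shared directories are actually mounted on a node you can log on to. The sketch below parses text in the Linux `/proc/mounts` format and reports which expected directories are missing. The helper names and the sample mount source are hypothetical, and the sketch assumes that `/home` and `/opt` are mounted on the node being checked.

```python
from typing import Dict, List, Sequence, Tuple


def mounted_paths(mounts_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each mount point to (source, file system type)."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: source, mount point, type, options, dump, pass.
        if len(fields) >= 3:
            source, mount_point, fstype = fields[:3]
            result[mount_point] = (source, fstype)
    return result


def missing_shared_dirs(mounts_text: str,
                        required: Sequence[str] = ("/home", "/opt")) -> List[str]:
    """Return the required directories that are not mount points."""
    mounted = mounted_paths(mounts_text)
    return [path for path in required if path not in mounted]


# Hypothetical sample in /proc/mounts format; the NAS address is made up.
sample = (
    "/dev/vda1 / ext4 rw,relatime 0 0\n"
    "nas.example.invalid:/home /home nfs rw,vers=3 0 0\n"
)
print(missing_shared_dirs(sample))  # ['/opt']
```

On a real node, pass the contents of `/proc/mounts` instead of the sample string.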

Step 5: Configure software and addons

On the Software and Service Component step, install software and configure addons.

  1. Click Add software. In the dialog box, select the HPC applications to install.

  2. Click Add Service Component. In the dialog box, select and configure an addon.

Note

Only the Login addon is supported. It is enabled by default for public cloud clusters to allow remote access over the internet.

The Login addon has the following parameters:

| Category | Parameter | Description |
| --- | --- | --- |
| Custom parameters | SSH | Port number, protocol, and allowed CIDR blocks for SSH connections. |
| Custom parameters | VNC | Port number, protocol, and allowed CIDR blocks for VNC connections. |
| Custom parameters | Web Portal | Port number, protocol, and allowed CIDR blocks for client connections. |
| Addon deployment resources | EIP | EIP bound to the Login addon ECS instance for internet access. Select an existing EIP or create a new one. |
| Addon deployment resources | ECS Instance | Instance type for the ECS instance that runs the Login addon. |
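
When you choose the allowed CIDR blocks for SSH, VNC, or the web portal, it can help to confirm that the addresses you plan to connect from fall inside them. The following is a minimal sketch using Python's standard `ipaddress` module. The function name is hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """True if the client address falls inside any allowed CIDR block."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr, strict=False)
               for cidr in allowed_cidrs)


# Hypothetical office range taken from a documentation-only address block.
office = ["203.0.113.0/24"]
print(is_allowed("203.0.113.10", office))   # True
print(is_allowed("198.51.100.7", office))   # False
```

A block such as `0.0.0.0/0` matches every IPv4 address, so restricting the list to known ranges narrows who can reach the logon node.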

Step 6: Confirm and create

On the Confirm configuration step, verify the cluster settings and specify a name and credentials.

| Parameter | Description |
| --- | --- |
| Cluster Name | Name displayed on the Cluster page for identification. |
| Login Credentials | Authentication method. Only Custom Password is supported. |
| Set Password and Repeat Password | Password for the root user to log on to all nodes in the cluster. |

Read the service agreement, confirm the fees, and click Create Cluster.

What's next

After the cluster is created, create a cluster user to submit jobs. See Manage users and Job overview.
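
Because the cluster runs Slurm, a job is typically submitted with `sbatch` once a cluster user exists. The sketch below only builds the command line and does not run it. It assumes that an E-HPC queue such as the default `comp` queue corresponds to a Slurm partition of the same name, and the script name `job.sh` and the helper function are hypothetical.

```python
import shlex
from typing import List, Optional


def build_sbatch_command(script_path: str, queue: str = "comp", nodes: int = 1,
                         job_name: Optional[str] = None) -> List[str]:
    """Build an sbatch argument list that targets the given queue."""
    # Assumption: the E-HPC queue name is used as the Slurm partition name.
    cmd = ["sbatch", f"--partition={queue}", f"--nodes={nodes}"]
    if job_name:
        cmd.append(f"--job-name={job_name}")
    cmd.append(script_path)
    return cmd


# Print the command so it can be reviewed before running it on the logon node.
print(shlex.join(build_sbatch_command("job.sh", job_name="hello")))
# sbatch --partition=comp --nodes=1 --job-name=hello job.sh
```

To run the command from Python on the logon node, the returned list could be passed to `subprocess.run`.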