All Products
Search
Document Center

Elastic High Performance Computing:FAQ

Last Updated:Jul 31, 2025

This topic answers frequently asked questions (FAQs) about Elastic High Performance Computing (E-HPC).

Cluster-related questions

Why am I unable to create clusters in some regions?

In some cases, you cannot create an E-HPC cluster in a region or zone where E-HPC is supported. A possible reason is that the resources required to create a cluster are not supported in the region or the resource inventory is insufficient. For example:

  • In the current region, File Storage NAS (NAS) file systems cannot be created or no NAS file system is available. In this case, you cannot mount a NAS file system on your cluster.

  • The region or zone does not have Elastic Compute Service (ECS) instance types that match the compute nodes in your E-HPC cluster, or the ECS instances are insufficient.

In this case, we recommend that you select another region to create an E-HPC cluster.

Can I manage nodes in the ECS console?

No, you cannot manage nodes in the ECS console.

Each node is an ECS instance. However, the E-HPC console provides additional deployment processes, including but not limited to the following processes:

  • E-HPC allows you to specify the number of cluster nodes for different instance types. In this case, E-HPC creates ECS instances in batches for various types of nodes.

  • After the ECS instance that corresponds to each node is created, E-HPC deploys the management system.

  • E-HPC pre-installs the selected software and dependencies on your ECS instances by using the management system.

  • E-HPC configures a job scheduler on a management node.

All the preceding processes depend on E-HPC. If you manage nodes in the ECS console, exceptions may occur in the clusters or nodes. In addition, the cluster resources may be unavailable. Therefore, you cannot manage nodes in the ECS console.

How do nodes communicate with ECS instances over the internal network?

The communication between a node and the ECS instance over the internal network depends on whether the node and the ECS instance reside in the same virtual private cloud (VPC).

  • If your node and the ECS instance that you purchase are in the same VPC, the node and the instance can communicate with each other in the VPC.

  • If your node and the ECS instance that you purchase are in different VPCs, you must connect the VPCs to allow the node and the ECS instance to communicate. You can use the Cloud Enterprise Network (CEN) or VPC to configure instance communication within or between private networks.

Why am I unable to log on to an E-HPC cluster by using SSH?

You may be unable to log on to an E-HPC cluster by using SSH due to various reasons. Troubleshoot the issue based on the actual scenario.

  1. Check whether the username or password is valid.

  2. Check whether the on-premises network of the client or carrier network is connected.

  3. Check whether the security group rule of the logon node allows access to the required ports, such as the default port 22 of SSH.

  4. Run the iptables -nvL --line-number command to check whether a firewall is enabled or firewall rules are configured for the logon node.

If the issue persists, you can use Virtual Network Computing (VNC) to connect to the cluster. For more information, see FAQ about connecting to ECS instances.

Why does it take so long to log on to an E-HPC cluster equipped with NIS by using SSH?

Problem description

  • It requires a long period of time to log on to a node by using SSH or switch between nodes. In some cases, the logon fails.

  • I cannot restart sshd and the error message Failed to activate service 'org.freedesktop.systemd1': timed out is returned.

Cause

This is a bug of systemd that may occur when Network Information Service (NIS) is used.

Solution

  1. Log on to the node as a root user.

  2. Check the content of the /etc/nsswitch.conf file:

    cat /etc/nsswitch.conf

    If [NOTFOUND=return] is not displayed in the passwd, shadow, and group parameters, perform the following steps. Sample code:

    passwd:     files sss nis
    shadow:     files sss nis
    group:      files sss nis
  3. Optional. Upgrade glibc:

    yum update glibc
  4. Update the nsswitch configuration file.

    1. Open the nsswitch.conf file:

      vim /etc/nsswitch.conf
    2. Modify the following content in the nsswitch.conf file and save the file:

      passwd:      files sss nis [NOTFOUND=return]
      shadow:      files sss nis [NOTFOUND=return]
      group:       files sss nis [NOTFOUND=return]

How do I perform account verification to purchase Alibaba Cloud services in the Chinese mainland?

If you want to purchase and use Alibaba Cloud services in the Chinese mainland, you must complete account verification. Then, you can use existing resources, purchase resources, and renew resources. If you select a region in the Chinese mainland when you purchase Alibaba Cloud services, the system checks whether you completed account verification. If you did not complete account verification, an error message appears on the buy page. For more information, see When is account verification required on the Alibaba Cloud international site?.

Image-related questions

Why is the image for the custom service node of the cluster CentOS 7.6?

When you create an E-HPC cluster, the default operating system for the custom service node is CentOS 7.6. You can select a different image based on your needs. For an existing E-HPC cluster, you can change the image by uninstalling and then reinstalling the custom service of the management cluster.

  1. Go to the E-HPC console, find the cluster that you want to modify, and then click the cluster name to go to the cluster details page.

  2. On the cluster details page, click Custom Service, find the node that you want to delete, and then click Delete on the right.image

  3. Click Add Custom Component, select Login (Version 1.0), and click Edit Instance Specifications in the ECS Instance section.image

  4. After selecting the instance specifications, click OK to add the custom component.

What types of images are supported?

An image includes the operating system and configuration data for your business. It is used to create the Elastic Compute Service (ECS) instances that make up an E-HPC cluster. E-HPC supports the following types of images: public images, custom images, and community images. For more information, see Overview.

Important
  • Alibaba Cloud Linux, CentOS, and Ubuntu public and custom images are supported. The image types that you can select vary based on the region, instance type, and whether the current Alibaba Cloud account has available image resources. All available image types are displayed in the console.

    You can call the ListAvailableImages operation to learn the supported images.

    OS type

    Supported versions

    Alibaba Cloud Linux

    Alibaba Cloud Linux 3 Pro

    Alibaba Cloud Linux 3

    Alibaba Cloud Linux 2

    CentOS

    CentOS 7.2 to 7.9

    Ubuntu

    Ubuntu 22.04

    Ubuntu 20.04

  • The schedulers, domain account services, and supported shared storage and software vary based on images.

  • Public images: images provided by Alibaba Cloud.

  • Custom images: images created from ECS instances or snapshots, or images imported from your on-premises computer.

  • Community images: images that are released on the image platform of Alibaba Cloud Community.

Why am I unable to select a custom image?

When you create or scale out an E-HPC cluster, or configure an auto scaling policy, you may be unable to select a custom image due to the following reasons:

  • Your Alibaba Cloud account does not have a custom image in the current region. For more information, see Overview.

  • The operating system of the custom image is not supported by E-HPC.

  • The instance type of the node does not support the custom image.

  • When you configure the auto scaling policy, you must specify the same image in the global settings and the queue settings.

Why am I unable to scale out or create an E-HPC cluster by using a custom image?

When you scale out or create an E-HPC cluster by using a custom image, the cluster may fail to be scaled out or created due to the following limits:

  • You cannot modify the yum source configurations of the operating system in the custom image.

  • The /home directory or /opt directory cannot be the mount directory of the custom image or a symbolic link.

  • If the /etc/fstab file of the custom image contains the mounting information about a file system, such as nfs, make sure that the cluster can access the file system or reside in the same VPC as the file system. Otherwise, you must delete the mounting information from the /etc/fstab file before you scale out or create the cluster.

  • You must keep the group whose account group ID is 1000 in the custom image.

  • The size of the system disk must be greater than or equal to that of the custom image.

Can I import custom images?

You can import only CentOS images to E-HPC. For information about how to import an image, see Import an image.

When you import an image, make sure that Check After Import is selected. Otherwise, the image may not be recognized in the E-HPC console.

导入镜像..png

Software-related questions

How do I manually install business software in an E-HPC cluster?

E-HPC clusters use File Storage NAS to share data between compute nodes. Therefore, you can manually install business software by using one of the following methods:

  • Install business software in the /opt directory. In this case, all cluster users can access and use the business software.

  • Install business software in the home directory of a cluster user. Generally, only the cluster user can access and use the business software.

Important

When you install some software, you must also install drivers or specific runtime environments, such as GPU drivers and YUM packages, on each compute node. After you install software on a compute node, you can use the custom image that is created based on the compute node to add more compute nodes. This way, the software can be automatically installed on all compute nodes.

Auto-scaling node-related questions

The instance has been released, but I cannot delete the node in the console.

Problem description: For using spot instances for auto scaling, if an instance is upon expiration while it still has unfinished tasks, the scheduler might fail to delete the instance. This results in a node deletion failure.

Solution: In an auto-scaling scenario, nodes are cleaned up after a while. Wait for the nodes to exit the BusyNodes status and can delete them.

Storage-related questions

How do I configure a remote mount directory for a NAS file system?

When you create an E-HPC cluster, you can specify a mount target and remote mount directory for a NAS file system. For example, your cluster has the following settings:

ClusterId=ehpc-mrZSoWf**** # The ID of the cluster.
VolumeMountpoint=045324****-m****. cn-hangzhou.nas.aliyuncs.com# The mount target of the NAS file system.
RemotePath=/ # # The remote directory of the NAS file system. 

To mount the NAS file system on the nodes when you create an E-HPC cluster, perform the following steps.

Note

You can specify a remote mount directory based on your business requirements. Before you mount a NAS file system, you must create a mount target and a mount directory.

  1. Create two subdirectories in the root directory:

    /ehpc-mrZSoWf****/opt
    /ehpc-mrZSoWf****/home
  2. When or after you create an E-HPC cluster, configure the remote directory based on your business requirements.

    The following sample code provides an example on how to mount different directories. For more information, see Create a cluster by using the wizard and Manage storage resources.

    / # Mount the NAS file system on the /ehpcdata directory.
    /ehpc-mrZSoWf****/home    # Mount the NAS file system on the /home directory.
    /ehpc-mrZSoWf****/opt # # Mount the NAS file system on the /opt directory.

Quota-related questions

What is the maximum number of clusters that I can create?

You can create up to three clusters in a region. To increase the quota, submit a ticket.

What is the maximum number of nodes that I can create in an E-HPC cluster?

You can create up to 500 nodes in an E-HPC cluster or add 500 compute nodes at the same time. To increase the quota, submit a ticket.

Permission-related questions

What is role-based authorization?

Resource Access Management (RAM) provides a service-linked role named AliyunServiceRoleForEHPC for E-HPC. This role is used to authorize E-HPC to access associated cloud resources. E-HPC can assume the AliyunServiceRoleForEHPC role to access ECS, VPC, and NAS.

If the AliyunServiceRoleForEHPC role is not attached to your account, you must first complete role authorization. For more information, see Manage a service-linked role.

Why am I unable to log on to the console to view cluster information as a RAM user?

If the RAM user is not granted the AliyunEHPCReadOnlyAccess permission, the Switch to RAM for authorization message appears. You must grant the AliyunEHPCReadOnlyAccess permission to the RAM user. Then, you can view cluster information as a RAM user.

To create an E-HPC cluster, user, or job, you must grant the AliyunEHPCFullAccess and AliyunNASFullAccess permissions to the RAM user. For more information, see Grant permissions to a RAM user.