A cluster is a set of nodes capable of high-performance computing. Compared with a single node, a cluster provides higher performance, auto scaling, and greater stability and reliability. A cluster contains resources such as an elastic IP address, a scheduler, cluster nodes, domain accounts, cluster users, and runtime software. This topic describes the terms and features related to clusters.
Elastic High Performance Computing (E-HPC) provides two deployment modes for clusters: Standard and Tiny. Each cluster contains management nodes, compute nodes, and a logon node. Each node is an Elastic Compute Service (ECS) instance.
Standard: The logon node, management nodes, and compute nodes are deployed separately.
Tiny: The logon node and management nodes are deployed on the same node. Compute nodes are deployed separately.
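To make the difference between the two modes concrete, the following Python sketch lists the roles hosted by each ECS instance. It is a minimal illustration, not an E-HPC API: the function name `node_layout` and the role strings are hypothetical, and it assumes that a Tiny cluster places all management roles on the single node it shares with the logon role.

```python
def node_layout(mode: str, management_nodes: int, compute_nodes: int) -> list[list[str]]:
    """Return one entry per ECS instance, listing the roles that instance hosts.

    Hypothetical helper for illustration only. In Tiny mode the
    management_nodes argument is ignored, because this sketch assumes all
    management roles share one instance with the logon role.
    """
    if compute_nodes < 1:
        raise ValueError("a cluster needs at least one compute node")
    if mode == "Standard":
        if management_nodes < 1:
            raise ValueError("Standard mode needs at least one management node")
        # Logon and management roles each get their own ECS instances.
        layout = [["logon"]]
        layout += [["management"] for _ in range(management_nodes)]
    elif mode == "Tiny":
        # Logon and management roles are co-located on one ECS instance.
        layout = [["logon", "management"]]
    else:
        raise ValueError(f"unknown deployment mode: {mode!r}")
    # Compute nodes are always deployed on separate ECS instances.
    layout += [["compute"] for _ in range(compute_nodes)]
    return layout
```

For example, under these assumptions a Standard cluster with 2 management nodes and 4 compute nodes uses 7 ECS instances, while a Tiny cluster with 4 compute nodes uses 5.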
The following list describes the features of each node type.
Logon node: The only node on which ordinary cluster users can perform operations. You can debug, compile, and install software, and submit jobs on the logon node.
Management nodes: Include scheduler nodes and domain account nodes.
Compute nodes: Run high-performance computing jobs.
A cluster can be in one of the following states:
Creating: The initial state of the cluster. ECS instances are being created.
Uninitialized: The cluster is being installed.
Initializing: The cluster is being initialized. The root user is also being initialized.
Running: The cluster has been created and is running. You can use the cluster only when it is in this state.
Exception: The cluster enters this state when management nodes are deleted or stopped, or when the scheduler goes offline. You can try to restore the cluster. If the restoration fails, submit a ticket.
Releasing: The cluster is being shut down and released.
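The cluster states can be modeled as a small state machine. The following Python sketch is illustrative only: the only rule taken directly from this topic is that a cluster is usable only in the Running state. The transition table (including restoring an Exception cluster to Running and releasing a cluster from any non-Releasing state) is an assumption, not an official E-HPC specification.

```python
from enum import Enum


class ClusterState(Enum):
    CREATING = "Creating"
    UNINITIALIZED = "Uninitialized"
    INITIALIZING = "Initializing"
    RUNNING = "Running"
    EXCEPTION = "Exception"
    RELEASING = "Releasing"


S = ClusterState
# ASSUMED transitions for illustration; the real service may differ.
TRANSITIONS = {
    S.CREATING: {S.UNINITIALIZED, S.RELEASING},
    S.UNINITIALIZED: {S.INITIALIZING, S.RELEASING},
    S.INITIALIZING: {S.RUNNING, S.RELEASING},
    S.RUNNING: {S.EXCEPTION, S.RELEASING},
    S.EXCEPTION: {S.RUNNING, S.RELEASING},  # Running again after a successful restore
    S.RELEASING: set(),  # terminal: the cluster is being released
}


def can_transition(src: ClusterState, dst: ClusterState) -> bool:
    """Return True if the assumed transition table allows src -> dst."""
    return dst in TRANSITIONS[src]


def is_usable(state: ClusterState) -> bool:
    """A cluster can be used only when it is in the Running state."""
    return state is ClusterState.RUNNING
```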
An image contains the information that all cluster nodes require. E-HPC allows you to create a cluster from a public image, a custom image, or a shared image. Different images support different schedulers and software, and some images do not support the installation of software. For more information, see Image overview.
Schedulers are used to schedule jobs. E-HPC supports multiple schedulers, but each scheduler is available only for certain image types. The console displays only the schedulers that the selected image type supports.
Linux: PBS, PBS 19, Slurm, Slurm 19, Slurm 20, Open Grid Scheduler, and Deadline
Note: CentOS 6 has reached its end of life (EOL) and is no longer maintained by the CentOS community. We recommend that you upgrade your operating system to CentOS 7 or later. For more information, see Change the CentOS 6 source address.
Windows (Windows Server 2019, Windows Server 2016, Windows Server 2012, and Windows Server 2008 64-bit): PBS, Grid Engine, Cube, and Deadline
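The console's filtering of schedulers by image type amounts to a lookup table. The following Python sketch encodes the scheduler lists above; the "Linux" and "Windows" keys and the helper names are hypothetical, and the lists reflect only this topic, so the set of supported schedulers may differ in your console.

```python
# Scheduler lists copied from the table above; keys are hypothetical labels.
SUPPORTED_SCHEDULERS = {
    "Linux": ("PBS", "PBS 19", "Slurm", "Slurm 19", "Slurm 20",
              "Open Grid Scheduler", "Deadline"),
    "Windows": ("PBS", "Grid Engine", "Cube", "Deadline"),
}


def supported_schedulers(os_type: str) -> tuple[str, ...]:
    """Return the schedulers available for an OS type, as the console would list them."""
    try:
        return SUPPORTED_SCHEDULERS[os_type]
    except KeyError:
        raise ValueError(f"unknown OS type: {os_type!r}") from None


def is_compatible(os_type: str, scheduler: str) -> bool:
    """Check whether a scheduler can be selected for the given OS type."""
    return scheduler in supported_schedulers(os_type)
```

Under this sketch, `is_compatible("Linux", "Slurm 20")` is true, while `is_compatible("Windows", "Slurm")` is false because Slurm does not appear in the Windows list.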
To submit, debug, and run jobs on a cluster, you must create a cluster user. When you create a user, you can grant the user one of two types of permissions. For more information, see Create a user.
E-HPC provides the following types of permissions for users:
Ordinary permissions: applicable to ordinary users who only need to submit and debug jobs.
Sudo permissions: applicable to administrators who need to manage the cluster. In addition to submitting and debugging jobs, users who have sudo permissions can run sudo commands to install software and restart nodes.
Note: You can create a root user only when you create a cluster. We recommend that you do not use the root user to submit jobs, because errors in job scripts may damage cluster data.
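The difference between the two permission types can be expressed as a capability check. The following Python sketch is an illustration under stated assumptions: the action names (`submit_job`, `debug_job`, `install_software`, `restart_node`) are hypothetical labels for the operations described above, not identifiers used by E-HPC.

```python
from enum import Enum


class Permission(Enum):
    ORDINARY = "ordinary"
    SUDO = "sudo"


# Hypothetical action names derived from the permission descriptions above.
_ORDINARY_ACTIONS = frozenset({"submit_job", "debug_job"})
ALLOWED_ACTIONS = {
    Permission.ORDINARY: _ORDINARY_ACTIONS,
    # Sudo users can do everything ordinary users can, plus admin tasks.
    Permission.SUDO: _ORDINARY_ACTIONS | {"install_software", "restart_node"},
}


def is_allowed(permission: Permission, action: str) -> bool:
    """Return True if a user with this permission type may perform the action."""
    return action in ALLOWED_ACTIONS[permission]
```

Modeling sudo permissions as a superset of ordinary permissions mirrors the description above, in which sudo users keep the ability to submit and debug jobs.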
Domain account service
E-HPC supports two domain account services: NIS and LDAP.
Network Information Service (NIS) provides centralized identity management. You create users once on the NIS server. After a new node joins NIS, these users can log on to the node without being created on each node individually.
Lightweight Directory Access Protocol (LDAP) is used to authenticate E-HPC users. You can use LDAP to group users and grant different permissions to different groups.
E-HPC provides mainstream computing applications, runtime libraries, and MPI libraries. You can install the software based on your business requirements. For more information, see Manage cluster software.