To meet increasing demand for HPC capability, Alibaba Cloud offers Elastic High Performance Computing (E-HPC) as an HPCaaS public cloud service. Built on existing Alibaba Cloud infrastructure, E-HPC provides users with an all-in-one high-performance computing platform in the cloud.
Alibaba Cloud E-HPC supports:
- Infrastructure as a Service (IaaS) with high-performance CPU and heterogeneous computing GPU instances,
- Platform as a Service (PaaS) with a high-performance computing software stack, and
- Software as a Service (SaaS) with application template customization.
E-HPC suits organizations in education and scientific research that need large-scale computing capability. E-HPC supports applications such as HPC, AI, and large-scale data analysis.
Compared with traditional supercomputing centers and self-built HPC clusters, E-HPC provides an accessible, elastic, secure, interconnectable high-performance computing service on the public cloud.
E-HPC allows you to create ECS/EGS computing clusters and cluster managers, and to deploy high-performance computing environments and application programs with one click. This lets you quickly develop applications with superior computing capability and eases your computing burden.
E-HPC adds or removes computing nodes managed by the cluster manager based on requirements or task queue usage. It can automatically identify load and performance requirements at runtime and scale cluster nodes elastically.
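The idea of scaling on task queue usage can be illustrated with a minimal sketch. This is not E-HPC's actual scaling policy, which the text does not describe; the function name, the `jobs_per_node` parameter, and the node limits are all assumptions made for illustration.

```python
import math


def target_node_count(busy_nodes, queued_jobs, jobs_per_node=1,
                      min_nodes=1, max_nodes=100):
    """Return the node count a simple queue-driven policy would aim for.

    Hypothetical policy: every parameter here is an illustrative assumption.
    """
    if jobs_per_node < 1:
        raise ValueError("jobs_per_node must be at least 1")
    if busy_nodes < 0 or queued_jobs < 0:
        raise ValueError("node and job counts cannot be negative")
    # Nodes already running work, plus enough extra nodes to drain the queue.
    needed = busy_nodes + math.ceil(queued_jobs / jobs_per_node)
    # Clamp to the configured bounds so scaling stays within its limits.
    return max(min_nodes, min(max_nodes, needed))


print(target_node_count(busy_nodes=4, queued_jobs=6, jobs_per_node=2))  # 7
```

With an empty queue and no busy nodes, the sketch falls back to `min_nodes`, which models a cluster shrinking when idle.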
E-HPC is protected by the highest level of multi-tenant security isolation, which is provided by ECS, EGS, and VPC.
E-HPC seamlessly integrates with other products and services of Alibaba Cloud via the user console.
E-HPC compared with traditional supercomputing centers
You can purchase E-HPC resources as they are required. E-HPC is ready for use upon purchase.
E-HPC is available in a range of options. You can select the most appropriate computing resource ratio (for example, EGS instance) according to your application needs.
E-HPC fully integrates with Alibaba Cloud products, greatly improving data security and ensuring high availability.
E-HPC compared with self-built HPC clusters
E-HPC helps you save substantially on capital expenditure, including but not limited to the cost of hardware systems, software licenses, server room construction, electric power, cooling, and daily O&M.
E-HPC frees you from the need for hardware upgrades.
With robust scalability, E-HPC can integrate with all Alibaba Cloud products.
E-HPC compared with self-built cloud server clusters
Provides a user-friendly software management service covering all supporting software stack functions of HPC applications. E-HPC can be upgraded along with the HPC community (OEM/OSV/ISV/user/developer/open-source community).
Optimizes software license scheduling to help you save costs.
Provides a service for automatic scaling of cluster nodes.
The following table lists the instance configurations recommended for E-HPC. NOTE: The instances listed below are the highest-specification configurations for each model. When you create a cluster, you can choose from the available configurations.
| Instance type | CPU | Memory | GPU | Network |
|---|---|---|---|---|
| ecs.sccgn5.16xlarge | 64 vcores E5-2682 v4 2.5 GHz | 512 GB | Nvidia Tesla P100 x 8 | 25 Gbps RoCE + DPDK |
| ecs.gn5-c8g1.4xlarge | 16 vcores E5-2682 v4 2.5 GHz | 120 GB | Nvidia Tesla P100 x 8 | 25 Gbps |
| ecs.scch5.16xlarge | 64 vcores Skylake Xeon Platinum 3.1 GHz | 192 GB | N/A | 25 Gbps RoCE + DPDK |
| ecs.c5.16xlarge | 64 vcores Skylake Xeon Platinum 8163 2.5 GHz | 256 GB | N/A | 20 Gbps |
| ecs.sn1ne.8xlarge | 32 vcores E5-2682 v4 2.5 GHz | 64 GB | N/A | 10 Gbps + DPDK |
E-HPC is billed for the following items: ECS instances, E-HPC service, Network Attached Storage (NAS), and external Internet traffic of login nodes.
NOTE: If you have a large cluster, you can contact us by submitting a ticket in order to purchase at a discounted rate.
Cost of ECS Instances
Each node in an E-HPC cluster is an ECS instance, and its price depends on the selected ECS instance hardware configuration. When you create an E-HPC cluster, the required ECS instances are purchased automatically, so you don't need to prepare them in advance. If you are entitled to an ECS discount, the same discount applies to the ECS instances bought for E-HPC. For details, see: ECS Pricing
Cost of E-HPC Service
E-HPC provides multi-dimensional management services for clusters, including elastic adjustment of node quantity, OpenAPI support, monitoring and alerting, O&M tools, and automatic server O&M in the background. E-HPC is free of charge during the present beta release phase.
Cost of NAS Space
NAS is used as the shared storage space for all nodes in an E-HPC cluster. For pricing details, please refer to NAS product documents.
Cost for External Network Traffic of the Login Nodes
After an E-HPC cluster is created, login nodes are bound with EIPs and allocated public bandwidth by default. Traffic is billed by usage per hour. Fees are only charged for outbound traffic. Inbound traffic is free.
For example, if you use 10 GB of outbound public traffic in an hour, the charge for that hour is 10 GB × the price per GB. Traffic fees vary from region to region. For details, see: EIP Billing
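The hourly traffic charge described above can be worked through as a small calculation. The price of 0.08 per GB below is a made-up placeholder, not an Alibaba Cloud rate, and the function name is likewise hypothetical; actual prices depend on the region.

```python
from decimal import Decimal


def outbound_traffic_charge(outbound_gb, price_per_gb):
    """Charge for one hour: outbound GB used in that hour times the price per GB.

    Inbound traffic is free, so it does not appear in the calculation.
    """
    outbound_gb = Decimal(str(outbound_gb))
    price_per_gb = Decimal(str(price_per_gb))
    if outbound_gb < 0 or price_per_gb < 0:
        raise ValueError("traffic and price cannot be negative")
    return outbound_gb * price_per_gb


# Placeholder price for illustration only; look up the real regional rate.
HYPOTHETICAL_PRICE_PER_GB = "0.08"
print(outbound_traffic_charge(10, HYPOTHETICAL_PRICE_PER_GB))  # 0.80
```

`Decimal` is used instead of floats so that currency amounts are not affected by binary rounding.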
Total Cost for E-HPC cluster
The total cost of an E-HPC cluster is determined by the total price displayed on the price calculator when you create the cluster.
HPCaaS Cloud Computing Platform
Flexible Creation and Agile Scaling
Because E-HPC is built on Alibaba Cloud's ECS, EGS, VPC, and NAS products, you can easily create parallel computing clusters of any scale. You can also dynamically adjust the cluster's nodes and storage according to your needs.
No queueing: Once you create the E-HPC cluster, you will receive the corresponding amount of ECS/EGS/NAS and related RDS/OSS services immediately.
Elastic scaling: After the initial creation of the E-HPC cluster, users can dynamically scale the cluster via scheduling, without limits imposed by physical clusters.
Secure isolation: O&M space and user space are separated. Data isolation between multiple users is guaranteed by ECS/EGS/VPC native isolation features.
Using E-HPC Via the Management Console
The E-HPC management console provides a web-based user interface for creating and accessing E-HPC clusters. From the console, you can provision, create, and configure E-HPC components such as nodes, NAS, and users. See: E-HPC Console
For more details about how to create and use an E-HPC cluster via the management console, refer to: E-HPC Quick Start
For more details about how E-HPC operates and how to adjust and manage E-HPC components, refer to the E-HPC documentation: E-HPC User Manual
E-HPC API Reference
You can use the E-HPC APIs to provision and manage E-HPC clusters efficiently. See: E-HPC API Overview
The following links for the E-HPC documentation, API, and related resources will help you to leverage all E-HPC features and its enormous computational power.
How do I log in to the E-HPC cluster?
After an E-HPC cluster is created, an EIP is allocated to each login node.
Click the 'node' tag in the left panel of the E-HPC management console. Select the cluster name from the 'cluster' drop-down menu. Then select the 'Login node' from the node type drop-down menu. The corresponding EIP address of the login nodes will be displayed in the 'IP addr/ID' section of the table.
With the preset root password, you can access the login node from any SSH client. Bash is the default login shell.
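As a small illustration of the login step, the sketch below builds the command line for the standard OpenSSH client from the EIP copied out of the console. The helper name is hypothetical, and the address shown is a documentation-only placeholder, not a real login node.

```python
import ipaddress


def ssh_command(eip, user="root"):
    """Build the argument list for an SSH login to a login node's EIP."""
    # Raises ValueError if the value copied from the console is not a valid IP.
    address = ipaddress.ip_address(eip)
    return ["ssh", f"{user}@{address}"]


# 203.0.113.10 is a reserved documentation address (RFC 5737), used as a placeholder.
print(" ".join(ssh_command("203.0.113.10")))  # ssh root@203.0.113.10
```

Running the printed command in a terminal prompts for the preset root password and then opens a Bash session on the login node.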
Why can I not create an E-HPC cluster in certain regions?
Generally, failure of E-HPC cluster activation in certain regions and zones is caused by the following factors:
NAS is not deployed in the region; therefore, shared storage cannot be created for the E-HPC cluster.
The region and zone do not have the ECS instance type in line with the computing node configuration of the E-HPC cluster (for example, no GPU instance).
The region and zone do not have sufficient resources for creating the nodes required by the E-HPC cluster.
Unless you have special requirements, select another region to create the cluster.
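The three causes listed above amount to a simple pre-check. The sketch below expresses that check over a hand-written description of a region; the dictionary layout, the stock numbers, and the function name are assumptions for illustration and do not reflect any real E-HPC or ECS API.

```python
def creation_blockers(region, instance_type, node_count):
    """Return the reasons a cluster could not be created in the given region."""
    blockers = []
    # Cause 1: without NAS, shared storage cannot be created for the cluster.
    if not region.get("nas_available", False):
        blockers.append("NAS is not deployed in the region")
    available = region.get("instance_stock", {}).get(instance_type)
    if available is None:
        # Cause 2: the requested instance type is not offered here.
        blockers.append(f"instance type {instance_type} is not offered")
    elif available < node_count:
        # Cause 3: the type exists, but there is not enough capacity.
        blockers.append(
            f"only {available} of {node_count} requested nodes are available"
        )
    return blockers


# Hypothetical region description, invented for this example.
example_region = {"nas_available": True, "instance_stock": {"ecs.c5.16xlarge": 8}}
print(creation_blockers(example_region, "ecs.c5.16xlarge", 4))   # []
print(creation_blockers(example_region, "ecs.c5.16xlarge", 16))
```

An empty list means none of the three causes applies; any other result suggests trying another region or zone.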
Why shouldn't I manage E-HPC cluster nodes via the ECS management console?
Though E-HPC clusters are created based on ECS, E-HPC has additional deployment procedures, including but not limited to:
E-HPC activates ECS in batches for all node types based on the predefined node ratio and the specified instance configuration. For example, the ratio of control nodes to login nodes to computing nodes is 4:1:n when high availability is activated.
E-HPC deploys a control system after ECS is activated for all nodes.
E-HPC uses the control system to pre-install the selected software and dependent packages.
E-HPC configures the server and client of the job scheduler based on node types.
The preceding operations and related services depend on the E-HPC cluster deployment procedures. If you operate nodes in the ECS console, exceptions such as cluster creation failure, offline nodes, and scheduling system failure may occur. Therefore, you are advised not to use the ECS console to operate E-HPC cluster nodes, except for troubleshooting.
For special requirements, you can utilize the ECS console for node operation only under the guidance of E-HPC development and maintenance engineers.
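To make the predefined node ratio concrete, the sketch below counts the ECS instances a cluster would need. The defaults follow the 4:1:n ratio of control, login, and computing nodes that the text gives for high availability; the text does not state the ratio without high availability, so other values must be supplied by the caller. The function name is hypothetical.

```python
def cluster_node_counts(compute_nodes, control_nodes=4, login_nodes=1):
    """Count ECS instances per node type for a control:login:compute ratio.

    Defaults model the 4:1:n ratio quoted for high-availability clusters.
    """
    if compute_nodes < 1:
        raise ValueError("a cluster needs at least one computing node")
    counts = {
        "control": control_nodes,
        "login": login_nodes,
        "compute": compute_nodes,
    }
    # Every node is an ECS instance, so the total is the sum of all types.
    counts["total"] = sum(counts.values())
    return counts


print(cluster_node_counts(16))
# {'control': 4, 'login': 1, 'compute': 16, 'total': 21}
```

The count shows why the node types are provisioned together in batches: with high availability, a 16-node computing cluster actually involves 21 ECS instances.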
What is high availability?
High availability (HA) is an E-HPC feature that ensures a single-node failure does not cause the entire cluster to fail. Both the cluster master node (scheduling node) and the domain account management node support high availability configuration. For example, the PBS Pro scheduling node and the NIS server node are each duplicated as a master node plus a slave node. When any master node in an E-HPC cluster fails, the cluster automatically switches to the corresponding slave node.
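The master/slave switch can be sketched as a selection rule over node health. This only illustrates the behavior described above and is not how E-HPC implements failover; the `ServicePair` class, its fields, and the health flags are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServicePair:
    """A duplicated service, such as the scheduler or the account server."""
    name: str
    master_healthy: bool
    slave_healthy: bool


def active_node(pair):
    """Pick the node that should serve requests for this service."""
    if pair.master_healthy:
        return "master"
    if pair.slave_healthy:
        # Single-node failure: the slave takes over and the cluster keeps running.
        return "slave"
    # Both copies are down, which HA for a single-node failure does not cover.
    raise RuntimeError(f"{pair.name}: master and slave are both unavailable")


scheduler = ServicePair("scheduler", master_healthy=False, slave_healthy=True)
print(active_node(scheduler))  # slave
```

Each duplicated service is evaluated on its own, so a failed scheduling master does not affect which account management node is active.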