Can HPC Trigger the Next Technology Boom?

Expanding demand drives technological change, and technological change in turn enriches our lives. Big data and cloud computing are no longer novelties, yet cross-border competition and technological convergence force us to keep renewing ourselves and recharging to adapt to change after change. After the booms in artificial intelligence, virtual reality, and the Internet of Things, what is the next predictable technology hotspot? High Performance Computing (HPC)? Given how technology and industry are converging today, HPC is well positioned. I also prefer to see HPC as a technology that integrates closely with a wide range of industries.

The HPC discussed here is not just a simple pile of compute, network, storage, and HPC software; the focus is on how HPC has developed, who the mainstream players are, where the technology is heading, and what HPC's future looks like. Looking back at history, traditional HPC mainly serves six scenarios: simulation, physics and chemistry, life sciences, rendering, exploration, and meteorology. The application environments deployed on top of HPC clusters are also relatively fixed.

With the development of big data and cloud computing, and the convergence of technology and industry, the way HPC is categorized has also changed at various levels. By target market, HPC applications can be divided into the HPC commercial market and the HPC scientific research market.

However, I personally find this division too broad. The industry more commonly uses an easier-to-understand split: traditional HPC (mainly the six scenarios above), HPDA (high-performance data analysis), HPC Anywhere, and HyperScale.

HPC Anywhere integrates HPC with the cloud. Cluster file gateway storage vendors such as Panzura, Ctera, Avere, and Nasuni provide a high-speed local distributed NAS that fronts public cloud object storage such as AWS and Azure, as well as some low-speed NAS products, with the gateways acting as a buffer layer. Policies can be set so that data flows between the gateways and other storage systems, and NAS or object storage can be connected directly to the cluster gateway or even to the public cloud.

HPC systems involve storage, compute nodes, networks, HPC software, and layer-1 facilities such as cooling, the data center, and power supply. From a technical point of view, however, server and network standards are relatively unified: apart from noticeable differences in management, the products from different manufacturers are basically at the same level. Storage, with its diverse standards, is where a solution's competitiveness is most easily improved.

HPC storage is a dedicated storage solution that addresses the performance bottleneck of traditional serial storage in HPC application environments. The capacity, performance, and IOPS of the HPC storage side are only loosely coupled to the size and performance of the HPC compute side, and HPC storage puts a premium on cost-effectiveness, low cost, and floor space. Typical HPC application scenarios each have their own characteristic business I/O models.

The IOPS here is essentially the same as the OPS reported by the IOR testing tool. IOR is widely used for HPC benchmark testing, mainly because it covers both bandwidth and OPS and offers a variety of parameters to simulate different business I/O models.
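To make that concrete, here is a minimal sketch of how IOR parameters might be set to approximate a bandwidth-bound model versus an OPS-bound (small-I/O) model. The flags follow common IOR usage (-a, -b, -t, -F, -w, -r, -o) and the mount path is an assumption; check them against your IOR build before running.

```python
# Hypothetical sketch: build IOR command lines for a bandwidth test and an
# OPS (small-I/O) test. Verify the flags against your IOR version.
import subprocess

def ior_cmd(nprocs, block, transfer):
    return [
        "mpirun", "-np", str(nprocs),
        "ior", "-a", "POSIX", "-w", "-r",
        "-b", block,                     # data written per task
        "-t", transfer,                  # size of each I/O request
        "-F",                            # one file per process
        "-o", "/mnt/pfs/ior_testfile",   # path on the parallel FS (assumed)
    ]

# Large sequential transfers approximate a bandwidth-bound business model.
bandwidth_test = ior_cmd(nprocs=64, block="4g", transfer="1m")
# Small transfers approximate an OPS/IOPS-bound business model.
ops_test = ior_cmd(nprocs=64, block="256m", transfer="4k")

for cmd in (bandwidth_test, ops_test):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)   # uncomment on a real cluster
```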

Looking back at the development of HPC storage technology, regardless of how it is classified, the architecture a few years ago was a typical three-layer one: compute node memory, the parallel file system, and archive storage. The parallel file system has the greatest impact on HPC performance; in a sense, the pFS determines the storage performance of the entire HPC system. For ultra-large HPC clusters, when thousands of compute nodes need to checkpoint simultaneously, a parallel file system built on NL-SAS disks is generally a little overwhelmed, so a layer of high-speed, large-capacity (relative to memory) cache needs to be added in front of the pFS.
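To see why an NL-SAS-based pFS struggles here, a rough back-of-the-envelope sketch helps; the node count, memory size, checkpoint window, and per-disk throughput below are illustrative assumptions, not figures from this article.

```python
# Rough checkpoint sizing sketch; every input number is an illustrative assumption.
nodes = 2000                 # compute nodes checkpointing at the same time
mem_per_node_gb = 128        # memory flushed per node during a checkpoint
checkpoint_window_s = 600    # acceptable time to complete the checkpoint

total_gb = nodes * mem_per_node_gb
required_gb_per_s = total_gb / checkpoint_window_s
print(f"Checkpoint size: {total_gb / 1024:.0f} TB, "
      f"required bandwidth: {required_gb_per_s:.0f} GB/s")

# An NL-SAS disk sustains roughly 0.1-0.2 GB/s of streaming writes, so a pFS
# built only on such disks needs thousands of spindles for this burst alone,
# which is exactly why a faster cache tier in front of the pFS is attractive.
per_disk_gb_per_s = 0.15
print(f"NL-SAS disks needed (ideal scaling): "
      f"{required_gb_per_s / per_disk_gb_per_s:.0f}")
```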

The emergence of Burst Buffer technologies and products has changed the HPC computing architecture. Campaign storage, which works like a backup tier for hot data, offers more options for data lifecycle management. Personally, I see Burst Buffer as a transitional technology: hybrid storage can also improve system performance, but SSDs are still quite expensive today, and in ultra-large HPC scenarios a Burst Buffer is the better way to meet extreme performance requirements. When the required ratio of performance to capacity falls between roughly 20MB/s and 200MB/s per TB, a Burst Buffer is a very good fit: with simple configuration adjustments there is essentially no over-provisioning of capacity or performance, and the value of the SSDs is fully exploited.
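As a quick illustration of that 20-200MB/s-per-TB rule of thumb, the sketch below classifies a few made-up cluster requirements; the figures and the tier suggestions are assumptions for illustration only.

```python
# Sketch of the performance-to-capacity rule of thumb mentioned above;
# all cluster figures are made up for illustration.
def suggested_tier(required_mb_per_s, capacity_tb):
    ratio = required_mb_per_s / capacity_tb     # MB/s per TB of capacity
    if ratio < 20:
        return ratio, "capacity-oriented HDD pFS is usually enough"
    if ratio <= 200:
        return ratio, "a Burst Buffer in front of the pFS fits well"
    return ratio, "an all-flash / SSD-heavy design is likely needed"

for perf_mb_s, cap_tb in [(50_000, 5_000), (400_000, 4_000), (1_000_000, 4_000)]:
    ratio, advice = suggested_tier(perf_mb_s, cap_tb)
    print(f"{perf_mb_s / 1000:.0f} GB/s over {cap_tb} TB -> "
          f"{ratio:.0f} MB/s per TB: {advice}")
```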

Without a Burst Buffer, the pFS has to carry the entire performance tier, including absorbing memory data during checkpoints. Another role of the Burst Buffer is that under bursty I/O patterns it serves as a high-performance tier alongside the pFS, and small I/Os can be merged and optimized. At present DDN, Cray, and EMC already support it, and IBM will soon. For the Burst Buffer solutions of DDN, Cray, and EMC, please refer to my earlier article on why Burst Buffer technologies are so popular in HPC.

Now let's look at the main players in the HPC industry, focusing on servers and storage. Server manufacturers' market shares in HPC are basically the same as in the overall server market. Technically there is not much to say about servers: adding memory, CPUs, and interface cards raises performance, and density is easily improved with high-density nodes.

HPC storage vendors fall into three types: server vendors, traditional storage vendors, and storage vendors focused on the HPC industry. Among them, IBM, whether counted as a server vendor or a traditional storage vendor, holds a relatively large market share thanks to GPFS. As Lustre's Enterprise Edition is abandoned and customers face uncertainty, GPFS's share is likely to rise further. DDN, meanwhile, has delivered its Burst Buffer product IME and leads the HPC industry in many scenarios with high performance, NVMe SSDs, and high density.

Now let's look at DDN's Burst Buffer product line, IME, which comes in three forms.

IME240 is based on a 2U commodity server. A single unit delivers 20GB/s of bandwidth and holds up to 48 NVMe SSDs of 800GB or 1.8TB each. Five IME240 units in a full 1.8TB-disk configuration provide 100GB/s of bandwidth and about 300TB of capacity at roughly 80% capacity utilization.

IME14KX is based on DDN's dedicated SFA14KX platform in a 4U enclosure. It supports flexible NVMe disk configurations of up to 48 drives, with performance ranging from 10GB/s up to 50GB/s at full configuration. Its scalability matches the IME240's: it can be expanded to 32 nodes for 1.6TB/s of performance.
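The quoted scale-out figures are consistent with simple linear scaling of the per-node numbers, as this small check shows (the linear-scaling assumption is mine, not a vendor guarantee).

```python
# Cross-check the quoted IME scale-out figures under a linear-scaling assumption.
ime240_per_node_gb_s = 20        # per-unit bandwidth quoted above
print(f"5 x IME240:   {5 * ime240_per_node_gb_s} GB/s")                    # ~100 GB/s

ime14kx_per_node_gb_s = 50       # full-configuration bandwidth quoted above
nodes = 32
print(f"32 x IME14KX: {nodes * ime14kx_per_node_gb_s / 1000:.1f} TB/s")    # ~1.6 TB/s
```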

IME Software-Only is a pure software product that can be deployed flexibly on existing hardware.

In fact, in terms of HPC market share and project distribution, I think ultra-large-scale and small-to-medium systems are roughly half and half. In the enterprise market, however, small and medium customers are usually reached through integrators or agents, while manufacturers deal directly with, and keep their eyes on, the ultra-large HPC market. Keeping pace with HPC technology is therefore especially important for manufacturers, and the vendors mentioned above that already have, or will soon have, a Burst Buffer, such as DDN and IBM, are all leaders in HPC industry and technology.

Panasas and Seagate are two professional storage vendors focused on the HPC industry. Neither has so far pushed hard into Burst Buffer, but both have worked persistently on NL-SAS disks and HPC-specific storage. Panasas's technology attracted me from the moment I first encountered HPC: ActiveStor 8, 9, 11, and 12 separate the data and control paths, and the controller built into the disk enclosure squeezes the maximum performance out of HDDs. Unfortunately, for various reasons, Panasas has faded from view in recent years.

Seagate, for its part, is a good example of combining density with HDDs. ClusterStor can deliver 84 disks of capacity and 16GB/s of performance in 5U of space, and according to the latest publicity that performance has doubled again. Even without SSDs, it shows that an HDD-based system can still hold its own in HPC.

As for where storage development is heading, one practice widely praised in the industry is worth borrowing: openness and open source. From Linux, OpenStack, and Ceph to Lustre and BeeGFS, their success shows the pull of openness: the more participants there are, the more stable the product becomes and the more customers trust it. Embrace open source, customize it based on your own strengths and understanding, and walk your own path; DDN, Seagate, and many other HPC solution providers have benefited from exactly this. Looking at HPC parallel file systems, there are only a handful: the mainstream ones are Lustre, GPFS, and BeeGFS, although GlusterFS, Ceph, and enterprise NAS also show up in HPC.

The Lustre parallel file system dominates ultra-large-scale deployments (especially supercomputing centers) and cost-sensitive fields such as science and education. Let's briefly go over Lustre's architecture.

No matter how large the Lustre file system grows, the metadata and management unit (MMU) is basically fixed; you only need to size the metadata storage (MGT and MDT) according to the system's capacity. The scalable storage unit (SSU) is the basic data building block and can be scaled out on demand to expand capacity and performance.
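A minimal sketch of what that scale-out sizing looks like follows; the per-SSU capacity and bandwidth are placeholder assumptions, not figures from any particular product.

```python
# Minimal Lustre sizing sketch: the MMU stays fixed while capacity and
# bandwidth grow linearly with the SSU count. Per-SSU figures are placeholders.
ssu_capacity_tb = 500        # usable capacity per SSU (assumed)
ssu_bandwidth_gb_s = 10      # aggregate OST bandwidth per SSU (assumed)

def lustre_scale(num_ssus):
    return num_ssus * ssu_capacity_tb, num_ssus * ssu_bandwidth_gb_s

for n in (4, 16, 64):
    cap_tb, bw_gb_s = lustre_scale(n)
    print(f"{n:3d} SSUs -> {cap_tb / 1000:.1f} PB, {bw_gb_s} GB/s "
          f"(plus one fixed MMU hosting the MGT/MDT)")
```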

GPFS holds a huge share in industries and systems that are insensitive to cost but demand high stability. Personally, I think IBM's strategy of OEMing GPFS out to partners makes up for the drawbacks of its being closed source. BeeGFS is mostly built on commodity servers and appears mainly in European scientific research, universities, and small-to-medium supercomputing scenarios.

Looking ahead, where does HPC go from here? For HPC technology trends, see my earlier articles on HPDA, deep learning, and software-defined storage. Combining HPC with big data is one clear direction; HPDA already integrates the two well, for example Lustre supports docking with Hadoop and deploying HBase and Hive on top of it. Big data's applications are broad and cross-domain integration is obvious: smart cities and massive video analysis need HPC to work with Hadoop for data sharing and shared computing resources; IoT sensor data analysis is another case; and so are small-file scratch scenarios such as machine learning, deep learning, gene analysis, financial analysis, and energy analysis.
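As a hedged illustration of that kind of data sharing, the sketch below reads a dataset straight off a shared parallel-file-system mount from Spark instead of copying it into HDFS first. The mount point and file layout are assumptions, and real Lustre-Hadoop integration typically goes through a dedicated connector rather than plain file:// paths.

```python
# Illustrative only: read data that already sits on a shared parallel-FS
# mount (e.g. Lustre) from Spark, avoiding a copy into HDFS. The path is an
# assumption and must be visible at the same location on every executor node.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hpda-on-shared-fs").getOrCreate()

# 'file://' tells Spark to use the local/POSIX file system (here, the shared
# parallel-FS mount) instead of HDFS.
df = spark.read.parquet("file:///mnt/lustre/projects/sensors/2024/")
df.groupBy("sensor_id").count().show()

spark.stop()
```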

HPC Anywhere combines HPC with cloud computing, and HPC on the cloud turns HPC resources into a service. HyperScale, applying HPC to Internet-style distributed applications, will bring HPC unlimited business opportunities. The Burst Buffer discussed earlier, the focus on HDDs and professional HPC storage, and the embrace of open source are all offered as references for HPC manufacturers; what they urgently need to do is lay out their HPC product planning to meet the new era.

Author: Architect Technical Alliance

Source: 51CTO
