Alibaba Cloud Big Computing Accelerates the Integration of HPC and AI

Throughout human history, the discovery of broad-spectrum drugs has been a long process that owed much to luck. Discovering and manufacturing a new drug often takes more than ten years, and only a few new drugs are approved by the FDA each year.

The outbreak of COVID-19 brought an important opportunity. By pooling cloud computing resources, we helped scientific research institutions begin research on COVID-19 right away, and in doing so discovered the unique advantages of cloud computing. It is reported that 70% of the computing power and research and development of the world's top 20 pharmaceutical companies is migrating to the cloud.

At the beginning of the COVID-19 outbreak, Alibaba Cloud immediately opened its AI computing power free of charge to support anti-epidemic research and development, and helped researchers conduct drug research on COVID-19. Second, research on public health policy through big data supported big data systems, tracking systems, and decision-making systems. In addition, Alibaba's scientific research and anti-epidemic platform was opened to the world and has met 33 needs from 50 countries and regions.

This opportunity also made us realize that AI is a new IT technology and a new computing platform that is about to take off.

In the past, high-performance computing supported the development of life sciences mainly through scientific research and the cultivation of research teams, but it was not clear which drugs and results that work would lead to. Now the demand is very clear: we face a large number of databases and compounds, and need to analyze diseases and samples and perform gene sequencing.

Past applications fall into two categories. The first is molecular dynamics and quantum chemistry based on first principles, such as analyzing the forces between the molecules that make up a cell and the interactions between compounds. The second is gene sequencing for precision treatment, which also requires a lot of computing power. Scientists need to solve the mechanism and algorithm problems, but large-scale implementation, such as high-throughput sequencing, requires engineers.

The fastest-growing class of algorithms in recent years is AI, which is used to screen large amounts of data. The problems to be solved in this process include: how can AI algorithms and technologies be put on the cloud supercomputing platform, and how can a large amount of data be transferred on the cloud?

In summary, the following pain points of on-premises supercomputing need to be solved when high-performance computing services are provided on the cloud:

① Elastic scaling is difficult: in actual business it is hard to predict the demand created by emergencies, so elastic scaling is essential.

② Low reliability: once a computing center or physical cluster grows in scale, 100% stability cannot be guaranteed, so some recalculation is inevitable. To address this, in addition to the stable SLA of cloud computing, breakpoint continuation (checkpoint and resume) is also implemented.

③ Performance bottleneck: cloud computing has broken through the GPU bottleneck in processing massive data for machine learning or screening. Computations that used to take weeks or months can now be completed in a few days.

④ Cost challenge: in the past, it was difficult to balance cost and computing power. Self-built supercomputing centers often started from a low CAPEX, but the subsequent operation and maintenance cost (OPEX) was higher, which made that balance difficult to achieve.
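
The breakpoint continuation mentioned in point ② can be illustrated with a minimal checkpoint-and-resume sketch. This is not Alibaba Cloud's implementation: the function name `run_with_checkpoints`, the file name `state.json`, the `fail_at` knob, and the toy workload (a running sum of squares) are all assumptions chosen for illustration.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, ckpt_path, fail_at=None):
    """Sum i*i for i in range(total_steps), saving progress after each step.

    fail_at simulates a node failure at the given step (hypothetical knob).
    """
    state = {"next_step": 0, "acc": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # resume from the last saved step

    for step in range(state["next_step"], total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        state["acc"] += step * step
        state["next_step"] = step + 1
        # Write to a temp file, then rename, so a crash never leaves
        # a half-written checkpoint behind.
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, ckpt_path)
    return state["acc"]


def demo():
    ckpt = os.path.join(tempfile.mkdtemp(), "state.json")
    try:
        run_with_checkpoints(10, ckpt, fail_at=6)
    except RuntimeError:
        pass  # the job died; steps 0..5 are already checkpointed
    return run_with_checkpoints(10, ckpt)  # restart continues from step 6
```

Calling `demo()` runs the job once with a simulated failure and once more to completion; the second run only redoes the steps that were not yet saved, which is the point of the technique.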

With the help of the cloud platform, scientists and researchers can concentrate on their own professional work and focus on applications. Researchers can also publish this application layer on the cloud as software, so that more researchers can cooperate on research and share services.

Alibaba Cloud's most basic capability is elastic computing power. On top of this, the core of high-performance computing is parallel job scheduling coupled with that computing power, and it also needs to support AI frameworks. Users who have their own computing resources can use them together with cloud resources through hybrid scheduling. Most researchers are most familiar with their local environment and need to migrate it to the cloud. In addition, the life science field relies heavily on NH databases around the world and needs high-speed interconnection, which can also be achieved through Alibaba Cloud's high-speed network.
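
To make the coupling between elastic computing power and job scheduling concrete, here is a small sketch of how a scaling decision could be derived from the job queue. It is an assumption-laden illustration, not the E-HPC algorithm: the function names, the parameters, and the policy of sizing the cluster so that all queued jobs can run at once are hypothetical.

```python
import math


def target_nodes(queued_jobs, cores_per_job, cores_per_node,
                 min_nodes=0, max_nodes=100):
    """Nodes needed to run every queued job at once, clamped to a range."""
    needed = math.ceil(queued_jobs * cores_per_job / cores_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, desired_nodes):
    """Translate the gap between current and desired size into an action."""
    delta = desired_nodes - current_nodes
    if delta > 0:
        return ("add", delta)
    if delta < 0:
        return ("remove", -delta)
    return ("hold", 0)
```

For example, ten queued 16-core jobs on 64-core nodes need three nodes, so a cluster that currently has one node would add two. The `max_nodes` cap stands in for a budget or quota limit.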

HPC applications follow a "data goes to compute" model, whereas AI is a distributed, massive "compute goes to data" model with its own ecosystem. How can the two be combined? China's software companies are still underdeveloped, and because of piracy and the difficulty of promotion, new products and discoveries are hard to make known in a short time. On the cloud, however, SaaS can be realized, and products can be turned into services through OpenAPI.

Two major areas in high-performance computing require virtually unlimited computing power: geophysics and meteorology, and life science. This calls for the elastic bare metal supercomputing cluster (SCC), based on DPCA, which provides high-performance clusters with low-latency networks and parallel file systems.

Driven by high-performance computing, Alibaba Cloud has implemented the CPFS parallel file system. It is offered alongside the HDFS distributed storage used for big data on the cloud, and it can meet the requirements of massively parallel throughput.

Through Wuying ("Shadowless"), provided by Alibaba, you can access cloud computing resources from any device, including but not limited to PCs, mobile phones, and screens. It integrates the operation of the public cloud, the application portal, and the cluster resource management behind it. It can be used both as a virtual desktop and as an application portal.

We have connected on-premises clusters with the cloud. An on-premises cluster can be connected to the cloud through a private line, and the head node can stay on premises. An E-HPC agent is then installed on the cloud so that cloud resources can be scheduled through the job scheduler. In most cases the task data needs to be transmitted in both directions, and this setup makes full use of the peaks and valleys in load on the cloud and on premises. In addition, data stored asynchronously on NAS can be pulled from on-premises storage during job execution, which is essential in high-throughput computing scenarios.

In addition, you can put the management of computing in the head node on the cloud, that is, use E-HPC as the controller while an agent installed off the cloud receives jobs and runs them on your own resources.
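
The hybrid mode described above, in which your own cluster absorbs the baseline load and the cloud absorbs the peaks, can be sketched as a simple placement rule. This is a toy model under stated assumptions, not the E-HPC scheduler: the function `place_jobs`, the job tuples, and the greedy first-fit policy are hypothetical.

```python
def place_jobs(jobs, onprem_free_cores):
    """Assign each job to on-premises capacity first, overflow to the cloud.

    jobs is a list of (name, cores) in submission order.
    Returns a dict mapping job name to "on-premises" or "cloud".
    """
    placement = {}
    free = onprem_free_cores
    for name, cores in jobs:
        if cores <= free:
            placement[name] = "on-premises"
            free -= cores  # reserve local cores for this job
        else:
            placement[name] = "cloud"  # burst: local capacity is exhausted
    return placement
```

With 64 free local cores and jobs of 32, 48, and 16 cores, the first and third jobs stay local and the second bursts to the cloud. A production scheduler would also weigh data locality and transfer cost over the private line, which this sketch ignores.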

E-HPC+AI is the current hot trend. Most AI frameworks were originally designed not to solve problems of scientific mechanism but to solve problems of collective behavior, such as search, promotion, and advertising, and they lack mechanism models. Now we package high-performance computing applications as container images and scale them out rapidly during deployment and computation, so that they can also be used for scientific research. For example, when the amount of data is large, we inject human experience into AI as a model, and then let the machine reduce the problem space.

In addition, to make the platform easier for researchers to use, we have added E-HPC user access. The entire development and business process can be viewed from the user's perspective.

The platform integrates many visualization applications, and researchers can access it directly through the client (Wuying on Windows and Mac). The bottom layer provides all services.

Wuying is a software-defined, cloud-native computer that serves as an entry point. It can be any device or screen. Cloud data centers are far fewer than end devices, and an end device is often limited in what it can do by the capacity of its CPU. However, if the visualization runs on the cloud and is delivered through VID or your own protocol, the device gains access to far more capability.

In the past, interaction with a traditional computer meant a display, keyboard, mouse, and printer, plus compute, storage, and network. In the future, you will be able to access all the visualization software and computing resources on the cloud through Wuying alone, which may be a box or an application portal on the computer. Wuying is likely to become the entrance to the future metaverse, because all GPUs, DPUs, and XPUs will enter the digital world as services.

In addition, users can fully prevent information from being leaked. In the past, we accessed the Internet through a full-featured machine, and viruses could invade the computer through it. Wuying can be configured for one-way access to avoid virus intrusion.

Wuying can be used as a cloud product on any machine, even an outdated mobile phone, so you can work on a cloud computer anytime and anywhere.

Nowadays, many scientific research products are software and need to serve more researchers. However, when the software is installed and used on researchers' own machines, the O&M effort and OPEX are very high, and it is difficult to call on more resources.

Therefore, we have launched Compute Nest, which quickly and transparently opens the resource management of cloud computing itself, such as operation and maintenance, resource scheduling, and resource billing, to users. Users only need to take care of installation, and Compute Nest handles the rest.

Alibaba Cloud today released a white paper on cloud solutions and best practices in the life science industry. It has three parts: the problems the cloud can solve in the life science field, five solutions, and three best practices. High-performance computing essentially aims to help researchers focus on their professional fields without spending energy on areas outside them, such as processor architecture.

The integrated solution of E-HPC and MemVerge mainly helps HPC workloads such as gene sequencing and chip design optimize performance on large-memory instances. It can virtualize conventional memory and persistent memory into one large pool and scale according to specific needs.

GHDDI's R&D work increased significantly during the COVID-19 pandemic. Facing an urgent demand for resources, it needed to quickly launch a batch of computing resources to support COVID-19 virus analysis, pathological analysis, and other work. At the same time, GHDDI is a global research institution that needs to connect domestic and overseas data for global cooperative research. For example, some web services need to pull data through OSS. In addition, it needs asynchronous data pulling and asynchronous caching.
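
Asynchronous data pulling with caching, as required here, can be sketched with Python's `asyncio`. The sketch is illustrative only: `PrefetchCache` and `fake_remote_fetch` are hypothetical names, and the fake fetch stands in for a real read from OSS or NAS, whose client APIs are deliberately not shown.

```python
import asyncio


class PrefetchCache:
    """Starts remote pulls in the background and reuses finished results."""

    def __init__(self, fetch):
        self._fetch = fetch  # coroutine function: key -> bytes
        self._tasks = {}     # key -> asyncio.Task holding the pull

    def prefetch(self, key):
        """Begin pulling key if no pull for it has been started yet."""
        if key not in self._tasks:
            self._tasks[key] = asyncio.ensure_future(self._fetch(key))

    async def get(self, key):
        """Return the data for key, waiting only if the pull is unfinished."""
        self.prefetch(key)
        return await self._tasks[key]


async def demo():
    calls = []

    async def fake_remote_fetch(key):  # stand-in for an OSS/NAS read
        calls.append(key)
        await asyncio.sleep(0.01)      # simulated network latency
        return f"data:{key}".encode()

    cache = PrefetchCache(fake_remote_fetch)
    for key in ("s1", "s2", "s3"):
        cache.prefetch(key)            # the three pulls overlap in time
    first = await cache.get("s1")
    again = await cache.get("s1")      # no second remote call is made
    return first, again, calls
```

Running `asyncio.run(demo())` shows that each key is fetched once even when it is requested twice, and that the pulls for later samples are already in flight while the first one is being consumed.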

Our solutions are as follows:

◾ Use AutoDock Vina, NAMD, and AI technology to screen drugs through docking and molecular dynamics simulation, and publish and share the computing results directly through Alibaba Cloud;

◾ E-HPC: create HPC application running environment;

◾ NAS: provide data storage;

◾ ECS/EGS: provide computing power and wiki services;

◾ Eight 8-GPU A100 servers for computation;

◾ OSS+EIP: storage and external sharing of computing results.
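
The screening step in the list above is a high-throughput pattern: score many candidate molecules independently, then keep the best few. The sketch below shows only that pattern. It does not call AutoDock Vina or NAMD; `screen`, `fake_dock`, and the scores in `FAKE_SCORES` are made up for illustration, and in a real run the scoring function would launch a docking job instead.

```python
from concurrent.futures import ThreadPoolExecutor


def screen(ligands, score_fn, top_k=3, workers=4):
    """Score every ligand in parallel and return the top_k (ligand, score)
    pairs, best first. Lower (more negative) scores rank higher, following
    the convention of docking energies reported in kcal/mol."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score_fn, ligands))  # order matches ligands
    ranked = sorted(zip(ligands, scores), key=lambda pair: pair[1])
    return ranked[:top_k]


# Made-up scores standing in for real docking runs.
FAKE_SCORES = {"lig_a": -6.1, "lig_b": -9.4, "lig_c": -7.8, "lig_d": -5.0}


def fake_dock(ligand):
    """Hypothetical scorer: looks up a canned score instead of docking."""
    return FAKE_SCORES[ligand]
```

Because each ligand is scored independently, the same structure scales from a thread pool on one machine to a job array on a cluster.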

Pharmaceutical research enterprises typically want low cost, flexibility, and the ability to track every workload clearly. We developed a preemptible instance based on the needs of one pharmaceutical enterprise. A preemptible instance is held for a limited time; if it has not been cleaned up when the timeout expires, the resource is released, which greatly reduces cost.
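
One reading of the time-limited behavior described above is a lease: each instance has a maximum holding time, and anything still held after that is released automatically. The sketch below models only that idea. `InstancePool` and its methods are hypothetical, times are plain seconds passed in by the caller, and no real cloud API is involved.

```python
class InstancePool:
    """Tracks held instances and releases those past their time limit."""

    def __init__(self, max_lifetime_s):
        self.max_lifetime_s = max_lifetime_s
        self._started = {}  # instance id -> start time in seconds

    def acquire(self, instance_id, now):
        self._started[instance_id] = now

    def release(self, instance_id):
        self._started.pop(instance_id, None)

    def reap(self, now):
        """Release every instance held for max_lifetime_s or longer.

        Returns the released instance ids in sorted order.
        """
        expired = sorted(
            i for i, t in self._started.items()
            if now - t >= self.max_lifetime_s
        )
        for instance_id in expired:
            self.release(instance_id)
        return expired

    def active(self):
        return sorted(self._started)
```

With a one-hour limit, an instance acquired at time 0 is reclaimed by a `reap` call at 3600 seconds, while one acquired at 1800 seconds is kept. Forgotten resources therefore stop accruing cost without manual cleanup.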

Research on reverse transcription needs to connect its database with overseas databases through Alibaba's high-speed network to achieve asynchronous replication and high-throughput computing.

Single-chain structure templates can be predicted by running AlphaFold2 in parallel on multiple CPUs. We hope to open an AlphaFold2 service on the cloud to better support courses and training at colleges and universities.

The workloads of scientific research institutions and pharmaceutical enterprises are highly unpredictable, so resource utilization needs more fine-grained management.

Alibaba Cloud's goal for high-performance computing is to provide more computing power and higher resource utilization for the scientific research industry, serve more researchers, and let scientists devote more energy to their professional fields.
