Demystify the technology behind the ECS Yitian instance

01 Background: Demand for computing power is soaring while Moore's Law slows down

Enterprises are moving business to the cloud at a rapid pace, and the demand for computing power is growing explosively.

In the live streaming and short video industry, hundreds of millions of users produce UGC video content and publish it on different platforms every day, creating demand for millions of cores of video encoding capacity and driving up business costs. In the gene and pharmaceutical industry, molecular dynamics simulation, gene sequence alignment, and protein structure analysis used to rely on experiments. Today they rely mainly on computer simulation of behavior at the atomic and molecular level, which consumes a great deal of computing power. The e-commerce industry has shifted from chasing traffic to lean operations built on AI inference and big-data user profiling, so the computing power needed for intelligent, accurate recommendation is also growing. The computing power demanded by AI workloads doubles roughly every three and a half months.

However, while the demand for computing power is soaring, Moore's Law is slowing down, and the gains from advances in hardware technology are tapering off.

Today, power consumption and cost rise with each new generation of CPU, server, and data center. Each kilowatt of chip power draw costs tens of thousands of dollars over the life cycle, and the cost of hardware and chips also rises generation by generation.

In multi-tenant scenarios such as the cloud, the problems of the Hyper-Threading (HT) architecture are gradually being exposed, and it struggles to meet business requirements for some compute-dense tasks. Because memory resources and physical cores are shared, tasks from different tenants may have to queue behind one another, causing a significant drop in performance, or they may interfere with one another and cause performance fluctuation.

How can we achieve low power consumption and low cost while also achieving high performance with less interference? We believe that earlier technical solutions cannot address these business needs and pain points. We need a cloud-native chip, designed together with the existing Alibaba Cloud software and hardware architecture, to better serve customer applications.

Let's look at the results first. In the video encoding and decoding scenario, the ECS Yitian instance achieves an 80% price-performance improvement over traditional instances, and the database, AI inference, and big data scenarios achieve 30%, 70%, and 50% improvements respectively. The ECS Yitian instance is already used in the core business of Alibaba Group and serves scientific research, the smartphone industry, and many well-known Internet companies. During Double 11 in 2021, the core trading system of Tmall was smoothly migrated to Yitian 710 cloud instances, improving the price-performance of computing power by 30%. The advertising inference business of Huiliang Technology uses Yitian 710 cloud instances, which improve both performance and network bandwidth, for a price-performance gain of more than 40%.

02 The "Yitian" sword is unsheathed: a cloud-native ECS architecture with integrated software and hardware

How does the ECS Yitian instance achieve such a significant price-performance improvement? We explain it at three levels of the ECS product architecture: the cloud-native processor Yitian 710, the cloud infrastructure processing unit (CIPU), and the cloud-native application optimization scheme ECS Booster.

Yitian 710: high performance, low power consumption

Let's first look at how the design of the Yitian 710 processor solves the above problems.

At the chip level, four major factors affect application performance: the ALU (arithmetic logic unit), the cache, the clock frequency, and acceleration instructions.

First, Yitian 710 packs 128 cores into a single CPU, and performance grows linearly up to the largest instance specifications. The processor also has no hyper-threading, which avoids contention: each vCPU has an exclusive physical core for stronger performance and an exclusive cache that makes application caching more efficient.

On x86 instances, two vCPUs (hyper-threads) share one physical core and one ALU (arithmetic logic unit). The ECS Yitian instance gives each vCPU an exclusive physical core, so compute-intensive instructions run faster, without queuing or contention.

As for cache, two vCPUs (hyper-threads) used to share the L1 and L2 caches and compete with each other, resulting in severe performance fluctuation. The Yitian CPU uses an exclusive cache design, so vCPUs do not affect one another and heavy computing loads get higher performance.

Computing performance depends not only on core resources but also on clock frequency.

Do you know why the safe CPU utilization threshold for most Web, App, and DB production workloads is 50%, and typical daily utilization stays below 30%?

Take the video encoding case in the following figure as an example. Once more than four streams are encoded concurrently, performance drops by 40%. In addition to the core contention mentioned above, if the customer's actual load exceeds 50-60% utilization, key production applications respond more slowly, and the customer experiences stalls or even timeouts. As a result, the safe CPU utilization threshold has to be kept low, trading cost for safety and leaving roughly 50% of resources idle.

The underlying reason is that x86 has high power consumption, and a heavy computing load easily causes excessive power draw and rising temperature. The processor then throttles its frequency to protect itself, which hurts performance. Yitian 710 consumes one-sixth of the power of mainstream x86 processors and has no frequency throttling problem. We therefore recommend raising the safe utilization threshold on Yitian to 70-80% to reduce wasted resources.
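
As a rough illustration of what a higher safe utilization threshold means for capacity planning, the following sketch computes how many instances a workload needs at a 50% threshold and at a 75% threshold. The workload size (1,200 vCPUs of peak demand) and the 16-vCPU instance size are hypothetical values chosen for the arithmetic, not figures from this article, and the function name is illustrative.

```python
import math


def instances_needed(peak_vcpu_demand: float,
                     vcpus_per_instance: int,
                     safe_utilization: float) -> int:
    """Instances required so peak demand stays at or below the safe threshold."""
    if not 0 < safe_utilization <= 1:
        raise ValueError("safe_utilization must be in (0, 1]")
    usable_vcpus = vcpus_per_instance * safe_utilization
    return math.ceil(peak_vcpu_demand / usable_vcpus)


# Hypothetical workload: 1,200 vCPUs of peak demand on 16-vCPU instances.
at_50 = instances_needed(1200, 16, 0.50)  # 1200 / 8  = 150 instances
at_75 = instances_needed(1200, 16, 0.75)  # 1200 / 12 = 100 instances
saving = 1 - at_75 / at_50                # one third fewer instances

print(f"50% threshold: {at_50} instances")
print(f"75% threshold: {at_75} instances ({saving:.0%} fewer)")
```

Under these assumptions, moving the threshold from 50% to 75% cuts the instance count by a third before any per-core performance gain is counted.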

On a shared multi-tenant cloud platform, tenants can affect one another even when they run low-load applications. Yitian solves this problem completely. As the case in the figure below shows, with more than four concurrent video encoding and decoding streams, the orange series representing Yitian stays essentially constant, while x86 performance drops by 40%.

It is worth mentioning that Yitian 710 is also accelerated and optimized for specific algorithm scenarios. For example, vector computing technologies such as NEON and SVE let a single instruction process longer data, which significantly improves performance in scenarios such as machine learning, video encoding, and high-performance computing. In addition, the Yitian instance supports BF16 and INT8, which greatly improves computing efficiency in machine learning scenarios and gives customers more choices.

CIPU-centric architecture: high density, stability and strength

Beyond the capabilities of the chip itself, the Yitian ECS instance is built on a cloud-native hardware architecture to reduce costs and increase efficiency.

Traditional servers are often designed as 2-socket or 4-socket machines. Interconnecting multiple NUMA nodes raises the CPU density of the whole machine and lets one OS schedule more CPU power, but it also adds complexity. In this architecture, network and storage I/O grow rapidly as the number of cores increases, and cache coherence must be maintained across NUMA nodes, which degrades application performance. It also makes the blast radius too large: in cloud computing, a multi-socket design widens the impact of a local hardware failure.

Alibaba Cloud redesigned the server with a cloud-native approach. A single Yitian 710 CPU provides a high-density design of 128 cores, and the CIPU-centric hardware architecture connects two or more Yitian chips through the CIPU. With this NUMA scheme, overall core density is higher while the performance degradation caused by cross-NUMA access is avoided. The higher density of the overall unit also lowers cost, making the Yitian instance more competitive. Finally, the hardware design of multiple single-socket nodes halves the blast radius, so the product is more stable.

The CIPU hardware itself is also an innovative design. By offloading data-plane functions such as virtualization and I/O forwarding to dedicated hardware for acceleration, it eliminates the original virtualization overhead and performance contention and greatly accelerates I/O, which also improves overall performance. The VPC environment supports elastic RDMA (eRDMA) acceleration, which reduces the TCP latency by more than 70%.

ECS Booster

ECS Booster is a software performance optimization scheme that Alibaba Cloud provides on the Yitian instance. Through network interrupt optimization, operating system optimization, application-layer optimization, and other techniques, it is tuned for mainstream scenarios such as Web, App, and database. Alibaba Cloud PaaS products running on the Yitian instance have achieved significant performance gains, and we expect it to bring significant benefits to customer business scenarios as well.

03 Performance is greatly improved across all scenarios

On November 15, ECS G8y (the Yitian instance) will be officially launched. Product specifications range from 1 to 128 cores. Every specification comes with eRDMA acceleration, which can greatly improve software performance.

ECS G8y (the Yitian instance) delivers business value in three directions: excellent performance, a rich ecosystem, and green cost reduction. Of these, the performance gains are backed by strong benchmark data.

What performance benefits do the cloud-native processor and innovative hardware architecture described above deliver? We look at product performance in the seven most widely used scenarios: Web, App, Media, DB, big data, scientific computing, and AI inference.

Web scenario: comprehensive performance improved by 30%

Web serving is the most common workload on the Internet and the one that consumes the most server resources. To reduce traffic and improve the experience on mobile devices, servers often compress web pages to save bandwidth. However, compression algorithms consume a lot of CPU power and time, so requests from many clients wait a long time in the queue.

With an exclusive physical CPU core per vCPU, combined with SVE instruction acceleration, the data compression performance of a single vCPU is doubled, and the impact on user experience described above is halved. Web scenarios include widely used applications such as Nginx, Apache, NodeJS, and PHP. Compared with Alibaba Cloud G7 series instances, the overall performance of the Yitian instance is about 30% higher.
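
To make the CPU cost of page compression concrete, here is a minimal sketch that measures compression ratio and CPU time at several zlib levels using only the Python standard library. The HTML payload is synthetic and the function name is illustrative. The sketch says nothing about SVE acceleration or Yitian-specific results; it only shows that higher compression levels trade CPU time for bandwidth on whatever machine runs it.

```python
import time
import zlib


def compress_cost(payload: bytes, level: int) -> tuple[float, float]:
    """Return (compression ratio, CPU seconds) for one zlib compression level."""
    start = time.process_time()
    compressed = zlib.compress(payload, level)
    elapsed = time.process_time() - start
    # Sanity check: the compressed data must decompress to the original page.
    assert zlib.decompress(compressed) == payload
    return len(payload) / len(compressed), elapsed


# Synthetic, repetitive HTML standing in for a real web page.
page = b"<div class='item'><a href='/product/123'>Example</a></div>\n" * 20000

for level in (1, 6, 9):
    ratio, cpu_seconds = compress_cost(page, level)
    print(f"level={level} ratio={ratio:.1f} cpu_seconds={cpu_seconds:.4f}")
```

Running the same script on two instance types gives a quick, if crude, per-vCPU comparison for this kind of workload.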

Alibaba Cloud Firewall (CFW) has been migrated to Yitian ECS instances. CFW provides security protection for customers' business. It has to scan a large number of regular expression rules, which consumes a lot of computing power and affects business performance. Using ECS Yitian instances with exclusive physical cores improves both security and user experience. Across different regular expression matching workloads, the performance advantages are 23%, 40%, and 28% respectively.

App programming language performance: most see a 40% improvement

Compiled languages such as C and Go, as well as languages such as Java and Python that run on a virtual machine or interpreter, are smoothly compatible with the ARM architecture. In tests with open source benchmarks, most applications running on the ECS Yitian instance show a performance improvement of about 40%.

Encoding and decoding scenario: 20-40% performance improvement

Short video and live streaming are the hottest applications today. In the UGC era, content grows exponentially, and so does the computing power consumed by encoding and decoding. H.264, the most popular codec today, is fast and economical with computing power, but the encoded files are larger and consume more storage and bandwidth. H.265 solves this problem well and its share of use is growing rapidly, but it requires about twice the computing power.

In any scenario, the video codec performance of Yitian is higher than that of x86 instances, and the cost is lower.

As shown in the figure, x264 and x265, the encoders for these two video coding standards, both achieve a 20-40% performance improvement when running on the ECS Yitian instance. To get there, we carried out extensive vector instruction optimization, and the optimized software can be provided to customers.

Database scenario: 10-30% performance improvement

In the database scenario, the open source software Redis and Memcached show a 30% performance advantage on the ECS Yitian instance compared with the G7 instance, while MySQL and PostgreSQL show a 10-20% advantage.

The Tair database used by Alibaba Group's e-commerce business has also been migrated to the Yitian platform. Its protocol is compatible with Redis, and its performance is three times that of the open source Redis database. Tair has supported Tmall's Double 11 promotion for many years and has strong caching capability, but caching data in memory consumes memory resources. We use the computing power of Yitian to compress the data, reducing memory cost by 60%. With the eRDMA acceleration of the Yitian instance, throughput improves by 80% and latency falls by more than 15%. The Tair product based on the ECS Yitian instance has been launched and is available on the cloud.

Big data scenario: 20-60% performance improvement

Big data scenarios require heavy I/O, heavy computation, and high memory bandwidth. The ECS Yitian instance offers exclusive physical cores, larger caches, and lower network latency. Spark applications gain more than 20% in performance, and the search (Elasticsearch) and stream computing (Flink) scenarios gain 40% and 60% respectively. All of this open source big data software can be compiled and run directly on the Yitian instance. You are welcome to try it.

Scientific computing: more than 20% performance improvement

In scientific computing scenarios, algorithms from the gene, pharmaceutical, and automotive fields all gain about 20% in performance on the Yitian instance compared with x86 cloud instances of the same specification. Two partners recently tested molecular dynamics and EDA applications and even saw performance double. The main reason is that scientific computing mostly uses physical core resources: compared with x86 instances of the same specification, the Yitian instance has twice as many physical cores and therefore higher computing performance.

Alibaba Cloud tools such as the elastic high-performance computing platform E-HPC and Auto Scaling already support the ARM platform, and the main scientific computing algorithms are also smoothly compatible.

AI inference: double the performance

The computing power consumed by AI scenarios has grown rapidly, and its share of cost has risen sharply. In inference scenarios, typical search and promotion customers cannot accept cost reductions that come from lower numerical precision, because they affect model accuracy. Alibaba Cloud's elastic computing team, in cooperation with DAMO Academy, has launched the HIE-Engine dynamic quantization scheme, which uses the INT8 acceleration capability of the Yitian instance to double performance in ResNet and BERT scenarios with no loss of accuracy.
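
For readers unfamiliar with dynamic quantization, the sketch below shows the general idea in plain Python: the scale that maps floating-point values to INT8 is computed at run time from the data itself, the multiply-accumulate runs on integers, and the result is rescaled afterwards. This is a generic symmetric per-tensor scheme written for illustration, not the HIE-Engine implementation, whose internals the article does not describe. The function names and sample values are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization; the scale is derived from the data at run time."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(a, b):
    """Dot product with integer multiply-accumulate, rescaled back to floating point."""
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer arithmetic only
    return acc * scale_a * scale_b


# Hypothetical activation and weight vectors.
activations = [0.12, -0.5, 0.33, 0.9]
weights = [0.7, 0.1, -0.25, 0.4]

exact = sum(x * y for x, y in zip(activations, weights))
approx = int8_dot(activations, weights)
print(f"float result: {exact:.4f}, int8 result: {approx:.4f}")
```

The naive scheme above leaves a small rounding error in each dot product. Production engines typically add calibration and finer-grained scales so that end-to-end model accuracy is preserved, which is the property the article claims for HIE-Engine.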

04 Reduce carbon emissions and costs

For enterprise customers, IT carbon emissions and costs matter as much as business performance. Compared with x86, the Yitian processor consumes one-sixth of the power per vCPU, and at a CPU load of 30% it reduces overall power consumption by more than 60%. Carbon emissions fall in the same proportion. Data centers (IDCs) in China consume more than 200 billion kilowatt-hours of electricity a year, equivalent to the output of two Three Gorges dams. Replacing them with Yitian instances would save the output of an entire Three Gorges dam.

Beyond being green and low-carbon, the Yitian instance also lets enterprises reduce IT costs. The pricing is shown in the figure. Compared with the latest generation of mainstream instances, Yitian instances are priced 30%, 23%, and 22% lower, passing the technical benefits on to customers.

Using this product gives better price-performance. Across the seven major application scenarios described above, the average gain in price-performance (performance ÷ price) reaches 50-80%.
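
The price-performance figure combines a performance gain with a price cut, and the two compound rather than add. The sketch below works through the arithmetic. Pairing a 30% performance gain with a 23% price cut is a hypothetical combination chosen for illustration; the article does not state which price tier applies to which scenario.

```python
def price_performance_gain(perf_gain: float, price_cut: float) -> float:
    """Relative gain in performance per unit price.

    perf_gain: 0.30 means 30% more performance.
    price_cut: 0.23 means the price is 23% lower.
    """
    if not 0 <= price_cut < 1:
        raise ValueError("price_cut must be in [0, 1)")
    return (1 + perf_gain) / (1 - price_cut) - 1


# Hypothetical pairing: 30% faster at a 23% lower price.
gain = price_performance_gain(0.30, 0.23)  # 1.30 / 0.77 - 1
print(f"price-performance gain: {gain:.1%}")
```

With these inputs the combined gain is roughly 69%, higher than the 53% a simple sum of the two percentages would suggest.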

05 Rich software ecosystem and application cases

The following figure shows the ARM server software ecosystem, including mainstream operating systems, programming languages, libraries, and open source applications. We have used and tested this software, and it runs without code changes. Migration documents, migration tools, and other solutions are also available.

Among Alibaba Cloud products, PaaS products such as RDS, containers, PAI, and video cloud already run on Yitian ECS. Alibaba Group's e-commerce business also uses Yitian ECS to support major campaigns such as 618, 99, and Double 11. External customers in short video, Web, gaming, and advertising have adopted the new product ahead of launch.
