PolarDB: Taking Cloud Native All the Way
This year marks the fifth anniversary of PolarDB. After five years of rapid growth, PolarDB now runs on more than 500,000 cores online, serves tens of thousands of customers, and has become a commercially successful database.
Like AWS Aurora and Google AlloyDB, PolarDB is a cloud native database that Alibaba Cloud built on top of MySQL and PostgreSQL. It is fully compatible with both and supports seamless migration to and from MySQL/PG. At the same time, PolarDB has always focused on evolving the cloud native architecture itself and has made many improvements at the architectural level. It offers the most efficient and stable physical replication in the industry, ensuring fast data synchronization between nodes within a cluster as well as second-level replication across globally deployed clusters under heavy load. PolarDB also takes full advantage of new hardware to build an efficient compute-storage separation architecture; it was the first cloud database in the industry to use Intel Optane and RDMA at scale. In recent years PolarDB has continued to push cloud native forward. For example, it was the first in the industry to propose a three-tier disaggregated architecture: building on compute-storage separation, it added a distributed memory pool so that compute, memory, and storage are all decoupled, maximizing the elasticity and resource utilization of a shared-resource database. Many of PolarDB's capabilities and research results lead the industry, and every year we publish papers at top database conferences to share our research and commercialization results with academic and industrial peers.
The figure above shows the latest PolarDB architecture. Going forward, PolarDB will continue to adhere to an integrated, modular design.
PolarDB's compute-storage separation and physical replication are now very mature, so it is moving up a level. Previously we supported read-write nodes and read-only nodes; after several years of continued effort, this year we released several new types of compute node, including HTAP nodes, X-Engine nodes, multi-write nodes, and AI nodes. These compute nodes can be freely combined and converted, and nodes can be upgraded or downgraded automatically to fit each customer's workload. With these capabilities, PolarDB truly achieves both scale-out and scale-up, in the horizontal and vertical directions. Beyond freely combining compute nodes, the three-tier disaggregated architecture also provides a distributed memory pool and a distributed storage pool, delivering extreme resource sharing and elasticity. Like Lego bricks, these nodes can be assembled freely, into anything from a simple cabin to a luxury mansion, to meet the different needs of different customers.
As a cloud native database built on shared resources, PolarDB pays close attention to the capability and progress of hardware. We make full use of new hardware to integrate software and hardware, passing the hardware dividend on to users.
This year PolarDB shipped two hardware-level upgrades. The first is SmartSSD, which places FPGA chips on the SSD storage device. These dedicated chips transparently compress data, finely schedule and reclaim space on the SSD, and achieve a full-stack compression ratio of 2.0-3.0x. As shown in the lower-left corner of the figure above, thanks to the powerful hardware compression, SmartSSD compression has very little impact on overall read/write latency and maintains a clear performance advantage over ordinary and high-performance cloud disks. With this compression capability, PolarDB has cut its storage price by 50%, passing the savings on to users. In the next generation, SmartSSD will also support efficient encryption and decryption, which PolarDB will use to provide full encryption of on-SSD data.
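To make the compression-ratio figure concrete, here is a minimal software stand-in for what the SmartSSD does in hardware: measuring how much smaller data becomes after block compression. The use of `zlib` and the `level` parameter are our illustrative choices, not SmartSSD internals.

```python
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Ratio of original size to compressed size: a software stand-in
    for the transparent FPGA compression a SmartSSD performs on-device."""
    return len(data) / len(zlib.compress(data, level))
```

Repetitive row-oriented data (timestamps, enums, padded fields) is exactly where ratios in the 2-3x range and beyond come from.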
The other hardware upgrade is PolarDB's adoption of a 100G RDMA network. For PolarDB, the RDMA network is what couples nodes together: it is the connector between the "Lego bricks". We previously used 25G RDMA; after upgrading to 100G RDMA, the larger bandwidth lets us push much more information directly between nodes over the high-speed network. A concrete example is PolarDB's "high-performance global strong consistency" feature. Most cloud databases, including AWS Aurora, support only eventual consistency for cross-node queries, not read-after-write strong consistency. That is, if a user writes to the read-write primary node (RW) and immediately reads from a read-only node (RO), the RO node may not yet see the latest data because of replication lag. To achieve global consistency, we first need fast data replication, and then the primary and replicas must synchronize additional transaction and timestamp information.
PolarDB used to ship logs through PolarStore shared storage. With the higher-bandwidth network, it can now ship log records directly over RDMA, greatly reducing replication lag. In addition, PolarDB uses a fine-grained transaction tracking system together with RDMA to quickly synchronize large volumes of transaction and timestamp information between nodes. By synchronizing only the minimum necessary information, PolarDB guarantees read-after-write consistency on replicas, which greatly improves RO node utilization and the performance of globally consistent reads, and gives users richer and more reliable usage scenarios.
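The read-after-write guarantee described above can be sketched with a toy model: a replica tracks the log position (LSN) it has applied, and a read on behalf of a session blocks until the replica has caught up to that session's last write. All names here are invented for illustration; PolarDB's actual tracking is far finer-grained.

```python
import threading

class Replica:
    """Toy read-only node that applies a log stream and supports
    read-after-write waits keyed on a log sequence number (LSN)."""

    def __init__(self):
        self.applied_lsn = 0
        self._cv = threading.Condition()

    def apply(self, lsn: int) -> None:
        """Replication thread: record that the log up to `lsn` is applied."""
        with self._cv:
            self.applied_lsn = max(self.applied_lsn, lsn)
            self._cv.notify_all()

    def read_after(self, session_lsn: int, timeout: float = 1.0) -> bool:
        """Block until this replica has applied the session's last write.
        Returns False if replication lag exceeds the timeout."""
        with self._cv:
            return self._cv.wait_for(
                lambda: self.applied_lsn >= session_lsn, timeout)
```

The faster replication is (e.g. over 100G RDMA instead of shared storage), the shorter these waits become, which is why low lag directly translates into usable strongly consistent reads on replicas.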
One of PolarDB's most important releases this year is HTAP. With the In-Memory Column Index (IMCI), PolarDB now supports a columnar data format and lightweight analytics. While generating row-store logs, PolarDB also generates column-store logs for the selected columns, and rebuilds the columnar data on the HTAP node through efficient physical replication, so the column-store execution engine built into the HTAP node can run complex analytical queries. Unlike products such as MySQL HeatWave, PolarDB's highly efficient physical replication preserves rich transaction information and keeps inter-node lag very low, enabling truly real-time queries. Competing products often rely on binlog for synchronization, which directly hurts primary-node performance (by as much as 20-40%), whereas PolarDB's physical replication has no direct impact on the primary. The columnar data also carries transaction information, so column-store queries are consistent: users enjoy the speedup of fast columnar queries while using PolarDB just like ordinary MySQL. This system genuinely meets customers' OLTP and OLAP needs in one stop.
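The core idea behind a column index is worth a small sketch: keep each column's values contiguous so analytical scans touch only the data they need, while the row store continues to serve point lookups. This toy class (names invented, no relation to IMCI internals) shows both representations maintained side by side.

```python
class HybridTable:
    """Toy hybrid row/column store: every insert updates the row store
    and a per-column array, so a scan over one column reads contiguous
    data instead of whole rows."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []                        # row store (OLTP side)
        self.col = {c: [] for c in columns}   # column store (OLAP side)

    def insert(self, row):
        self.rows.append(tuple(row))
        for name, value in zip(self.columns, row):
            self.col[name].append(value)

    def column_sum(self, name):
        # Analytical scan over a single column; no full-row reads.
        return sum(self.col[name])
```

In PolarDB the columnar copy is not maintained synchronously like this, but rebuilt on the HTAP node from physically replicated logs; the sketch only shows why the columnar layout accelerates analytics.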
It is worth noting that PolarDB IMCI's performance is also excellent: on TPC-H 100G it is nearly 100x faster than competitor A (Aurora), 25x faster than competitor B (OB), and nearly 3x faster than competitor C (TiDB/TiFlash).
PolarDB has made great progress not only in column-store analytics but also in row-store query acceleration. Its single-node parallel query has long been far ahead of competing products, including AWS Aurora. This year PolarDB went a step further and released cross-node elastic parallel query (ePQ). ePQ further improves the horizontal scalability of query performance and widens the gap in analytical capability between PolarDB and other MySQL-family products.
The left figure above shows ePQ running parallel queries across four 32-core nodes: overall execution is more than 60x faster than MySQL, and the best single query is 150x faster. The right figure shows that ePQ achieves linear speedup on multi-node TPC-H 1TB queries. Another example is a group-by aggregation over a large table with more than 6 billion rows: ePQ cut the execution time from 8 hours single-threaded to under 60 seconds (16 nodes × 16 threads = 256-way parallelism).
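The group-by speedup comes from a classic partial-aggregate/merge plan: each worker aggregates its own slice of the rows, then the partial results are merged. A minimal sketch (the slicing and scheduling here are invented; ePQ's planner is of course far more sophisticated):

```python
from collections import Counter

def parallel_group_count(rows, workers=4):
    """Sketch of a parallel GROUP BY ... COUNT(*): partial aggregation
    per worker slice, followed by a single merge step."""
    chunks = [rows[i::workers] for i in range(workers)]
    partials = [Counter(chunk) for chunk in chunks]   # per-worker partials
    total = Counter()
    for partial in partials:
        total.update(partial)                         # final merge
    return dict(total)
```

Because the partials are independent, the first phase scales nearly linearly with workers, which matches the linear speedup shown for multi-node TPC-H.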
Another major commercial release of PolarDB for MySQL this year is database/table-level multi-write.
PolarDB has always been a single-writer, multi-reader architecture: a cluster has only one read-write primary node (RW) but can have 1 to 15 read-only nodes, so read capacity scales out well. In some scenarios, however, users also want PolarDB to scale write capacity horizontally, with multiple write nodes providing a degree of resource isolation. To meet this need, we now support multi-write at the database/table level. A PolarDB cluster can have multiple read-write primary nodes; each primary manages several databases and tables, and each table is "bound" to one primary. All writes to a table are directed to its bound primary by the PolarDB proxy. Compared with share-nothing distributed databases, the advantage of database/table-level multi-write is the shared-storage architecture: every node can see all of the data, so adding or removing nodes requires no cross-node data migration, giving excellent elasticity. Even as nodes come and go, data can quickly "flow" to different nodes without repeated migration, achieving second-level horizontal scaling.
It also enables active-active mutual standby: each node serves as a standby for the others, acting as both primary and standby at once, which greatly improves node utilization. Upgrading a conventional primary/standby deployment to database/table-level multi-write can substantially reduce database costs.
Compared with RAC, database/table-level multi-write is still a coarse-grained solution. Achieving row-level multi-write while preserving horizontal scalability has been a huge engineering challenge, because a large number of multi-node data consistency problems must be solved. Through the three-tier disaggregated architecture, with the help of the global memory pool, PolarFusion, and other components, PolarDB globally coordinates transactions, locks, and cache information for the row store under multi-write, enabling concurrent row-level writes across multiple write nodes and breaking through the single-point write bottleneck.
The figure above compares Alibaba Cloud PolarDB row-level multi-write with AWS Aurora Multi-Master. Aurora MM began crashing frequently at 4 nodes and could not complete the multi-node conflicting-write test, while PolarDB still scales well with 4 nodes of conflicting writes. In the conflict-free write scenario, PolarDB row-level multi-write scales to 16 nodes, whereas Aurora MM supports at most 4 nodes and scales poorly. The upper-right figure shows 8-node performance, where PolarDB is far ahead of Aurora MM.
Since PolarDB Global Database launched two years ago, it has become the preferred cross-region disaster recovery solution for many businesses. Thanks to its unique parallel physical replication, its replication stability and speed are world-leading: even under heavy load, data written to PolarDB in any region can be read within 2 seconds from PolarDB deployments anywhere in the world. Users therefore often rely on PolarDB for global data replication and nearby reads. Today PolarDB supports only limited global writes: write SQL is forwarded by the proxy to the primary node in the primary region for execution, incurring some cross-region latency. Many users want stronger nearby-write capability, with data synchronized in multiple directions for true cross-region multi-read and multi-write. Our table-level multi-write technology based on physical replication is nearing completion, and in the coming months we will launch this nearby-write feature to serve more customers' global deployments.
In January this year, PolarDB launched an engine based on the LSM-tree (Log-Structured Merge Tree) architecture: X-Engine. Compared with traditional B-tree engines such as MySQL InnoDB, an LSM tree packs compressed data blocks into SSTable files more efficiently and achieves a higher compression ratio. X-Engine therefore serves as PolarDB's high-compression engine, helping users store relatively "cold" data and save on storage costs.
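The LSM write path behind that compression advantage can be sketched minimally: writes land in an in-memory memtable, and when it fills, the entries are sorted and packed into an immutable, compressed block, analogous to an SSTable. This is an illustrative toy, not X-Engine's on-disk format.

```python
import zlib

class TinyLSM:
    """Minimal LSM-tree sketch: a memtable absorbs writes; on overflow,
    keys are sorted and frozen into a compressed immutable 'SSTable'."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.sstables = []            # newest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.flush()

    def flush(self):
        # Sort by key, serialize, and compress the whole block at once:
        # large sorted runs are what make LSM compression ratios high.
        rows = sorted(self.memtable.items())
        blob = zlib.compress(repr(rows).encode())
        self.sstables.insert(0, (dict(rows), blob))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:      # newest data first
            return self.memtable[key]
        for index, _ in self.sstables:
            if key in index:
                return index[key]
        return None
```

Because SSTables are written once and never updated in place, whole blocks can be compressed aggressively, which is the structural reason an LSM engine out-compresses an update-in-place B-tree.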
The figure above shows that in Alibaba Group's own business scenarios (Taobao), users achieved compression ratios of roughly 3.5-7x after moving data to X-Engine. Compared with PolarDB's native engine, X-Engine's performance is slightly lower but comparable. X-Engine is a powerful cost-cutting tool: PolarDB supports running "dual engines" online at the same time, so users can keep hot data in the efficient default engine while migrating relatively cold, infrequently accessed data to X-Engine.
Another important characteristic of a cloud database is resource pooling: users want elastic resource allocation and pay-as-you-go billing. Before PolarDB Serverless, PolarDB's storage layer, PolarStore, already allocated storage on demand with pay-as-you-go billing, but each compute node had a fixed number of CPU cores and a fixed memory size; users could upgrade when business pressure grew, but had to trigger the upgrade manually. PolarDB Serverless adds automatic, intelligent detection and much faster CPU and memory reconfiguration, achieving second-level detection and non-disruptive second-level scale-up and scale-down on a single machine. If a single machine's resources are exhausted or insufficient, PolarDB can also scale horizontally across nodes, using techniques such as connection preservation and hot buffer pools to add or remove nodes almost imperceptibly as user demand changes.
As the figure above shows, PolarDB Serverless automatically increases the number of PCUs as load (QPS) rises, then gradually releases PCUs after the load subsides. The lower-left figure shows that once a single node reaches its maximum specification, read-only nodes can be added automatically to absorb sudden bursts and raise cluster performance.
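A scale-with-load policy of this shape can be sketched as a tiny decision function. The thresholds, the doubling/halving step, and the PCU bounds below are all invented for illustration; PolarDB Serverless's real controller works on second-level signals and finer steps.

```python
def plan_pcu(current_pcu, cpu_util, min_pcu=1, max_pcu=32,
             high=0.8, low=0.3):
    """Toy serverless scaling policy: grow capacity under high CPU
    utilization, shrink it when load falls off, within [min, max]."""
    if cpu_util > high and current_pcu < max_pcu:
        return min(current_pcu * 2, max_pcu)      # scale up
    if cpu_util < low and current_pcu > min_pcu:
        return max(current_pcu // 2, min_pcu)     # scale down
    return current_pcu                            # hold steady
```

Once `current_pcu` hits `max_pcu`, a real system would fall back to the cross-node path described above: adding read-only nodes instead of growing the single machine further.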
PolarDB has optimized performance continuously since its birth. Over the past year, deep full-path optimization for the cloud native architecture, high-performance storage engine optimization, and the high-performance PolarIndex have greatly improved single-machine performance and widened PolarDB's lead over its competitors.
PolarDB keeps evolving basic functionality as well. Parallel DDL, for example, was fully rolled out online after a year of grayscale release. Under high concurrency, parallel DDL speeds up index creation by 15-20x, meeting the urgent needs of customers with very large tables. For one cross-border finance customer, a 3TB table with nearly 500 million rows had gone unindexed because of business pressure; with parallel DDL, building the index took less than 20 minutes. We are also implementing a Multi-Version Dictionary that enables various Instant DDL operations (such as changing a column's type or adding/dropping columns) to take effect in seconds by changing only the dictionary, without rebuilding the whole table.
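The multi-version dictionary idea can be illustrated with a toy: each schema change appends a new dictionary version instead of rewriting rows, and a stored row is decoded using the version it was written under, with newer columns filled from defaults. All names and structures here are assumptions for illustration, not PolarDB's actual dictionary.

```python
class VersionedDictionary:
    """Toy multi-version schema dictionary for instant DDL:
    DDL appends a version; no existing row is ever rewritten."""

    def __init__(self, columns):
        # columns: list of (name, default) pairs for version 0
        self.versions = [list(columns)]

    def add_column(self, name, default=None):
        # Instant DDL: a metadata-only append, effective immediately.
        self.versions.append(self.versions[-1] + [(name, default)])
        return len(self.versions) - 1             # new version number

    def decode(self, values, written_version):
        # Interpret stored values with their original schema version,
        # then fill columns added later from their defaults.
        row = {name: default for name, default in self.versions[-1]}
        for (name, _), v in zip(self.versions[written_version], values):
            row[name] = v
        return row
```

This is why an `ADD COLUMN` can take effect in seconds on a multi-terabyte table: old rows simply keep their old physical format and are reinterpreted on read.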
This has been a year of cutting costs and improving efficiency, and many users hope PolarDB can help them reduce spend. To that end, PolarDB now supports OSS storage. To move data to OSS and manage it well, PolarDB has invested heavily in partitioned tables: interval range partitioning, for example, automatically creates new partitions over time, and old partitions can automatically age out to cheaper storage such as OSS.
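The aging policy amounts to a simple rule over partition boundaries: any partition whose upper time bound is older than the hot-data window moves to cheap object storage. The partition naming, the `hot_days` window, and the return shape below are illustrative assumptions, not PolarDB's actual interface.

```python
from datetime import date

def age_partitions(partitions, today, hot_days=90):
    """Toy interval-partition lifecycle: partitions whose upper bound
    is older than `hot_days` are scheduled to move to object storage."""
    moves = []
    for name, upper_bound in partitions.items():
        if (today - upper_bound).days > hot_days:
            moves.append((name, "OSS"))   # destination tier
    return moves
```

New partitions are created at the head of the timeline as data arrives, while this rule continuously drains the tail to OSS, so hot storage holds only the recent window.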
This year marks PolarDB's fifth anniversary and a bumper harvest: many important features, years in the making, have now landed. We hope these capabilities give PolarDB users more choices across a wide range of scenarios. PolarDB will keep full MySQL/PG compatibility, continue modularizing features into the multiple forms of one database, and refine each scenario to serve customers across industries. We will keep ensuring PolarDB's high performance and stability, keep cutting costs and improving efficiency, meet new challenges together with our customers, and strive to be their first choice for a cloud native relational database!
Knowledge Base Team