When it comes to managing databases, everything depends on the input-output ratio, that is, the cost performance of the database. For PolarDB, a database developed in-house by Alibaba, the frequently asked questions are: How is the performance? Can it support my business? Is it expensive? In the early research stage, when quantitative indicators for stability and reliability are hard to obtain, performance quickly becomes a critical decision factor.
When PolarDB was first designed, performance was written into the product requirement specification as a key requirement. From architecture design to new hardware selection to code implementation, and from drivers to distributed block storage to the distributed file system and database engine, the entire technology stack was opened up for collaborative optimization, so that an order-of-magnitude improvement in performance could be guaranteed.
The architecture diagram shared at the 2018 Hangzhou Cloud Habitat Conference shows the internal details of PolarDB. From the bottom up, PolarDB consists of four parts: the shared distributed storage PolarStore, the distributed file system PolarFS, the multi-node database cluster PolarDB, and the proxy PolarProxy, which provides a unified portal.
PolarFS is designed with the following technologies to maximize I/O performance:
In comparative tests on the same hardware, the write latency of three-replica block storage in PolarFS is close to that of a single-replica local SSD. Data reliability is therefore guaranteed while the single-instance TPS of PolarDB is greatly improved.
In PolarDB, the physical log (redo log) replaces the traditional logical log, which not only greatly improves the efficiency and accuracy of replication, but also saves 50% of I/O operations. For databases with frequent writes or updates, performance can improve by more than 50%.
The significance of PolarProxy is that it pools the resources of the underlying computing nodes and gives applications a unified portal to access them, which greatly reduces the cost of using the database and eases migration and switchover from an old system to PolarDB. In essence, PolarProxy is a distributed, stateless database proxy cluster with self-adaptive capacity. Its dynamic scale-out capability makes the most of PolarDB's ability to add and remove read nodes quickly, which improves the throughput of the entire database cluster. The more ECS instances that access it, the higher the concurrency and the more obvious the advantage.
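The idea of a stateless proxy that spreads reads over a changing set of read nodes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not PolarProxy's actual routing logic: the class `ReadWriteRouter`, the node names, and the round-robin policy are all hypothetical, and the sketch ignores transactions, session consistency, and statements such as `SELECT ... FOR UPDATE`.

```python
from itertools import cycle


class ReadWriteRouter:
    """Toy router: writes go to the primary, reads rotate over read nodes.

    Hypothetical illustration only; not PolarProxy's real algorithm.
    """

    WRITE_PREFIXES = ("insert", "update", "delete", "replace",
                      "create", "alter", "drop")

    def __init__(self, primary, read_nodes):
        self.primary = primary
        self.read_nodes = list(read_nodes)
        self._rotation = cycle(self.read_nodes) if self.read_nodes else None

    def add_read_node(self, node):
        # Scaling out: a new read node joins the rotation immediately.
        self.read_nodes.append(node)
        self._rotation = cycle(self.read_nodes)

    def route(self, sql):
        statement = sql.lstrip().lower()
        # Writes, or any statement when no read node exists, use the primary.
        if statement.startswith(self.WRITE_PREFIXES) or self._rotation is None:
            return self.primary
        return next(self._rotation)


router = ReadWriteRouter("primary", ["ro-1", "ro-2"])
write_target = router.route("UPDATE orders SET status = 'paid' WHERE id = 1")
read_targets = [router.route("SELECT * FROM orders") for _ in range(3)]
```

Because the router keeps no per-connection state beyond the rotation, any number of such routers could sit side by side, which is the property that lets a proxy tier scale out independently of the database nodes.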
Cost and performance are inseparable.
First, PolarDB's architecture, which separates storage from computation, frees CPU, memory, and disk from mutual restriction. Computation and storage can be managed and allocated as separate resource pools, which greatly reduces resource fragmentation and improves overall resource utilization. If the computation and storage machine models differ, each can also be customized and optimized more specifically to reduce the cost per unit of resource.
More generally, marginal cost keeps falling as scale grows. Building on Alibaba's ultra-large infrastructure, we can continuously reduce costs across global supply chains, low-energy data centers, server R&D, and other dimensions.
No matter how advanced the technology is or how low the cost is, the user's approval is ultimately required.
Therefore, from the perspective of the user, we are most concerned about the cost performance, that is, whether the same cost can obtain better performance.
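As a minimal sketch of what "cost performance" means here, the ratio can be computed as performance delivered per unit of price. The function names and all numbers below are hypothetical placeholders, not measured results or actual list prices.

```python
def cost_performance(throughput, monthly_price):
    """Return performance per unit of cost (for example, QPS per currency unit)."""
    if monthly_price <= 0:
        raise ValueError("monthly_price must be positive")
    return throughput / monthly_price


def relative_advantage(candidate_ratio, baseline_ratio):
    """How many times better the candidate's cost performance is than the baseline's."""
    return candidate_ratio / baseline_ratio


# Placeholder inputs for illustration only.
baseline = cost_performance(throughput=10_000, monthly_price=500)
candidate = cost_performance(throughput=30_000, monthly_price=600)
advantage = relative_advantage(candidate, baseline)
```

A ratio above 1 from `relative_advantage` means the candidate delivers more performance for the same spend, which is the comparison the rest of this section makes.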
Let's take a quick look at the cost performance of PolarDB and RDS MySQL.
To be fair, we use the same database configuration, test data sets and test methods, and then calculate the price and performance of the two separately.
The details are as follows:
In addition, both RDS MySQL and PolarDB have the ability to scale out reads by adding read-only nodes. The difference is that PolarDB does not require additional storage costs as the number of nodes increases. Therefore, we need to compare several architectures, from 1 to 3 read nodes, as follows:
Some of the base prices in the table (prices as of November 8, 2018, are used as an example):
The figure below makes this clearer; the gray "standby database" does not provide services externally. PolarDB's cost performance is therefore very high, and because all of its nodes provide services, its resource utilization is also higher than that of RDS.
In practical applications, the customer's business is complex. Business-related access is often mixed with a large number of complex statistical-analysis SQL statements (ad hoc queries), and MySQL's single-threaded query execution model often gets bogged down processing them.
To cope with this scenario, PolarDB has a built-in parallel query engine. For complex queries on large tables, such as those in the TPC-H benchmark, performance can improve eightfold. This is especially true for slow SQL (such as report queries) that takes longer than 1 minute to execute. The engine also supports advanced syntax such as set operations, WITH, and window functions (OVER). This feature has been beta tested and is now available for free use.
The following chart shows a comparative test that we ran. In the SQL acceleration scenario, query efficiency is more than eight times that of direct queries without acceleration. The specific test cases include:
Today, the SQL acceleration feature provides an additional connection address and a non-transactional complex query service. The underlying computation nodes and storage reuse the resources that PolarDB already has, so data is accessed in a single, unified way, with no data migration and no additional cost.
The technical implementation includes the following points:
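The storage side of this comparison can be sketched as a simple cost model: with per-node storage, every added node brings its own copy of the data, while with shared storage the data is paid for once. The function names and prices below are hypothetical placeholders, and the model deliberately ignores standby nodes, discounts, and traffic charges.

```python
def replicated_storage_cost(nodes, compute_price, storage_price):
    """Each node pays for compute and for its own full copy of the storage."""
    return nodes * (compute_price + storage_price)


def shared_storage_cost(nodes, compute_price, storage_price):
    """Each node pays for compute; the shared storage is paid for once."""
    return nodes * compute_price + storage_price


# Placeholder prices for illustration only.
COMPUTE_PRICE = 100
STORAGE_PRICE = 50

for read_nodes in range(1, 4):
    total_nodes = 1 + read_nodes  # one primary plus the read nodes
    replicated = replicated_storage_cost(total_nodes, COMPUTE_PRICE, STORAGE_PRICE)
    shared = shared_storage_cost(total_nodes, COMPUTE_PRICE, STORAGE_PRICE)
    print(f"{read_nodes} read node(s): replicated={replicated}, shared={shared}")
```

Under these assumptions the gap between the two models grows by one storage price for every node added, which is why the comparison is repeated for each node count.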
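The general idea behind parallel query execution, splitting a large scan into slices, aggregating each slice independently, and merging the partial results, can be sketched as follows. This is a generic illustration, not PolarDB's engine: the function names and the `amount` column are hypothetical, and Python threads are used only to show the structure (they do not speed up CPU-bound work in CPython).

```python
from concurrent.futures import ThreadPoolExecutor


def split(rows, parts):
    """Split rows into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(rows) // parts))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_aggregate(chunk):
    """Per-worker step: partial sum and row count for one chunk."""
    return sum(row["amount"] for row in chunk), len(chunk)


def parallel_average(rows, workers=4):
    """Compute AVG(amount) by merging the partial aggregates of each chunk."""
    chunks = split(rows, workers)
    if not chunks:
        return None  # like SQL AVG over an empty set, which yields NULL
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    total = sum(part_sum for part_sum, _ in partials)
    count = sum(part_count for _, part_count in partials)
    return total / count


rows = [{"amount": value} for value in range(1, 101)]
average = parallel_average(rows, workers=4)
```

The merge step works because a sum and a count can be combined across chunks; an average cannot be merged directly, so each worker returns the two components instead.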