In many ways, MySQL has represented the rapid development of open-source databases around the world. Quickly following this, AliSQL was developed at Alibaba Group based on the history of relational database development, from the emergence of lightweight web 2.0 applications such as Wiki and WordPress around 2004, to the popularization of MySQL databases on a large scale for e-commerce and payment scenarios by Alibaba in 2010 in China, and then to the beginning of Alibaba Cloud ApsaraDB RDS for MySQL in 2012. AliSQL provided reliable relational database services for tens of millions of customers and has accumulated many requirements for internal and external functions, performance, and security. With these several years of experience, we at Alibaba have built a strong MySQL research and development team and created an enterprise-oriented AliSQL branch, with a quickly evolving and stable service for internal and external key business scenarios.
Let's take a look at AliSQL's enterprise scenario-oriented design and functions.
MySQL 8.0 offers five outstanding enterprise-level features. The following describes each of them:
In 2019, we were able to quickly launch an AliSQL edition that was based on MySQL 8.0, coming equipped with these features.
In enterprise-level scenarios, including both Internet and non-Internet scenarios, the digital transformation of businesses across the board is a booming and growing area, therefore application scenarios are becoming more diverse than ever before. So, as such, RDS databases face the following challenges to provide high-quality services while using limited resources, specifically of the 4c/8c specification for elastically reduced enterprise costs in the cloud:
The Listener and Worker mechanism is used to move all requests into the Worker queue. Then, Shared Worker threads are used to execute the requests of different users. This makes it unnecessary for each client session to create a background thread, such as the non-Dynamic Thread Pool, to execute requests. For example, if 1,000 client connections initiate requests, the server only needs dozens of Worker threads to handle this without needing to create 1,000 threads. As such, you can create more connections with limited resources.
Since each connection does not correspond to a fixed thread, and no thread operation is performed when a new connection is created, the cost of creating a new connection has been greatly reduced, providing greater and more stable connection creation capabilities.
By controlling the number of threads under the condition of limited resources, the number of thread switchover attempts can be reduced, the SYS CPU usage is dramatically reduced, and the CPU resource usage is improved. In the sysbench point select test with high concurrency, Dynamic Thread Pool can improve performance by about 10%. In the case of low concurrency, Dynamic Thread Pool can dynamically switch back to the dedicated mode. This means that each connection is mapped to a dedicated Worker thread to ensure better performance under high concurrency.
A database involves read and write operations. Meanwhile, SQL operations can be divided into simple operations and complex operations. To prevent interference, a Dynamic Thread Pool has multiple queues in design. In this design, SQL operations are classified by SQL operation cost, and each of these operations is sorted into a dedicated queue. In the meantime, concurrent control is implemented in the queues to prevent different types of operations from intervening with each other. For example, in the case that the core business logic involves simple queries, and the preceding feature ensures that complex SQL statements without the review of new applications do not affect the core logic.
Dynamic Thread Pool can be dynamically started and stopped without restarting database instances. This is a major technological innovation and breakthrough of AliSQL Dynamic Thread Pool.
Dynamic Thread Pool is available in RDS 5.6, 5.7, and 8.0, and is enabled by default. It is applicable to all users and customers. The TPC-C model is closest to the core business scenarios of large enterprises. In TPC-C tests with 1,024 or 2,048 concurrent clients, Dynamic Thread Pool can increase TpmC by 50% to 100%. It also reduces CPU resource consumption by 10% to 20%, creating tangible technological benefits for all Alibaba Cloud customers.
The TPC-C model approximates the core business scenarios of an enterprise. Therefore, optimizations in these scenarios can benefit enterprise applications. During TPC-C stress tests, it is found that the Index lock is very stubborn, and all other operations are blocked when an index node is split. AliSQL is optimized to effectively reduce the cost of splitting index nodes during stress tests, and to greatly improve the processing capability in TPC-C scenarios. In the full memory test, the single machine reaches 390,000 TpmC, which is 35% to 50% higher than the native version. Specifically, in large-capacity tests, the increase is 35% for version 5.6, 40% for version 5.7, and 45% for version 8.0.
This optimization has been simultaneously applied to RDS 5.6, 5.7, and 8.0.
RDS 5.6, 5.7, and 8.0 fully support Transparent Data Encryption (TDE), and support the SM4 Chinese national encryption algorithm to ensure the security of customer data.
SQL optimization on the application side is critical to system and business optimization. Native MySQL lacks effective monitoring data. To address this, AliSQL combines the application experience of different databases and the requirements of internal and external customers to extend SQL performance in the following aspects:
table_stats: records the number of rows that are added, deleted, changed, and queried for each table. This allows you to quickly understand the data access behavior in the existing system, determine the core tables of the business, and identify which tables are suitable for caching.
index_stats: records the used index entries and their utilization frequencies to help you quickly understand index usage, determine redundant index entries, and identify core index entries.
IO_STATISTICS: records the total amount and time of different I/O operations at the instance level. By default, a set of data, which is configurable, is generated every 2 seconds. The data can be directly presented by using Grafana or other tools.
After a careful analysis of the native Performance Schema, these additional performance-related data items are added to AliSQL to accurately reflect the health of the system and facilitate the troubleshooting of performance problems. This also helps you perform SQL performance analysis, and allows third parties to provide better performance optimization services based on the performance data.
Deleting a particularly large file, that is, performing Drop Table/Partition or Truncate Table/Partition, in the file system can lock the entire file system. This can cause other operations on the same file system to fail, blocking the entire instance. To address this potential issue, AliSQL introduces an optimization as follows. When a table or shard is deleted, an asynchronous file cleanup task is scheduled. Then, after the table or shard operation is completed, the backend thread gently runs the cleanup task. Here, the task is designed to shrink large files gradually and delete them after they reach a certain safe size.
In some business scenarios, a large number of slow SQL statements are generated and recorded in a slow log. Native MySQL uses a single-file mechanism and keeps the slow log file open. As a result, the log file can grow to an extremely large size, up to hundreds of GB. In this case, to maintain the slow log file, you must close the MySQL instance, which is very user-unfriendly. AliSQL improved this by designing a slow log switching mechanism so that switching can be performed at a scheduled time or after a specified file size is reached, the switching is triggered externally. This eliminates the need to restart database instances to maintain slow log files. For example, to release the space occupied by slow log files. In addition, the damage caused by storing slow log files in CSV format, with the SQL execution time range exceeding 35 days, which is usually due to the Binlog Dump command, is also fixed.
SQL Hint is a very effective optimization method. AliSQL ensures the stability of the execution plan by extending the features of SQL Hint, which can be used to persistently store path information in SQL computing when running. This feature can effectively handle SQL execution plan changes in the following scenarios:
SQL statements for core businesses need to be solidified by using SQL Outline to avoid the deterioration of SQL execution plans.
AliSQL also provides APIs for SQL Hint editing. That is, you can manually add a Hint to an SQL statement for transparent SQL optimization, allowing you to optimize SQL statements without modifying application code. Additionally, this feature is integrated with other optimization functions of AliSQL, such as Statement Queue and Query Cache, which is currently under development. You can specify a Hint in SQL Outline to enable the concurrent control of Statement Queue. Alternatively, you can enable Query Cache to implement SQL optimization that is transparent to applications.
MySQL's logical log (Binlog) makes data subscription so convenient, but it also affects performance. In the process of serving customers, we found that enterprise scenarios impose increasingly demanding performance requirements. After Binlog was enabled, we made major technical changes to the transaction commitment mechanism, which removes all negative performance impacts brought by the XA mechanism without affecting Binlog. Compared with native MySQL, the transaction processing performance has been nearly doubled. In addition, the Pipeline mechanism for transaction submission was implemented. In this mechanism, threads in Dynamic Thread Pool respond to the user's transaction submission results, further reducing the number of threads in the thread pool and improving the processing efficiency of the system.
In MySQL, DDL statements cannot be rolled back. Therefore, drop or truncate operations on tables or shards can be extremely dangerous, which is also true for commercial databases. Oracle 10g introduces the recycle bin feature. This feature can move the dropped or truncated table or shard to the recycle bin, and then, based on the usage of system space, automatically reclaim them according to the first-in-first-out (FIFO) policy. This allows you to quickly recover data from the recycle bin if you accidentally performed a drop or truncate operation, without needing to perform point-in-time-based instance recovery to restore the mis-operated table. AliSQL 8.0 also implements this recycle bin function. Moreover, AliSQL offers other enterprise-level functions, such as Sequence indexing and Statement Queue, which are not explained in detail here. We collect requirements from businesses, technologies, O&M practices, and our partners. For us, all parties with requirements are our clients. We have been committed to making AliSQL more open and oriented towards enterprise-level applications.
Alibaba Clouder - February 4, 2019
Alibaba Clouder - January 7, 2021
Alibaba Clouder - November 29, 2019
Alibaba Clouder - May 9, 2018
Alibaba Clouder - October 10, 2018
ApsaraDB - February 19, 2020
An on-demand database hosting service for SQL Server with automated monitoring, backup and disaster recovery capabilitiesLearn More
An on-demand database hosting service for MySQL with automated monitoring, backup and disaster recovery capabilitiesLearn More
Fully managed and less trouble database servicesLearn More
An on-demand database hosting service for MySQL, SQL Server and PostgreSQL with automated monitoring, backup and disaster recovery capabilitiesLearn More
More Posts by ApsaraDB