By Nanlong Yu (Gushen)
Mixed transaction processing (OLTP) and analytical processing (OLAP) workloads are becoming common in today's business systems. Due to the requirements of real-time and easy operation and maintenance, some business systems use the HTAP database to replace the original OLTP-ETL-OLAP architecture. The HTAP database can simultaneously process TP requests and AP requests efficiently in a database system.
However, since the access mode of TP requests and AP requests are different, efficient processing of the two requests depends on different storage formats and computing modes. Therefore, one type of HTAP database would store data in two different formats and choose different modes to compute the corresponding data when processing different requests.
PolarDB (the cloud-native database of Alibaba Cloud ApsaraDB) uses this method to process mixed workloads. Read-only nodes (RO) that enable the In-Memory Column Index index (IMCI) function to maintain an additional column index and use the vectorized method to compute column data when processing AP requests (column computing). However, when processing TP requests, MySQL's original one-tuple-at-a-time method is still used to compute the InnoDB data of the row-based index (row-based computing). Both column index and row-based index playback the writes of the read and write nodes (RW) through physical replication links. In this system, the storage, executor, and optimizer that process the two requests are independent of each other. TP requests and AP requests are separated on the execution path. An SQL statement either selects column or row-based computing.
From the users' workload, we observed that for most of the requests in the mixed load, the row-column separation computing method has been able to execute with the optimal plan, but neither column nor row-based computing is optimal for a small number of requests.
To handle this part of long-tail requests in mixed loads more efficiently, we designed and implemented a row-column fusion execution architecture based on two different indexes in the optimizer and executor of IMCI. This article describes the design of the basic components (the cost model of the optimizer, the multi-engine access of executors, and the log playback and transaction processing of storage engines) of the row-column fusion execution architecture of PolarDB and the HybridProject operator designed for the first type of the requests above. Through the basic components, we can easily implement HybridIndexScan and HybridIndexJoin to handle the remaining two types of requests mentioned above.
The main challenges of implementing row-column fusion in a row-column separation system come from the following three aspects:
The three types of long-tail requests in the row-column separation execution architecture above would be considered based on the cost, in which AP requests are executed by IMCI. Therefore, we choose an IMCI-based optimizer and introduce cost estimation for row-based execution fragments so these long-tail requests can choose the row-column fusion execution plan.
When introducing row-based execution cost estimation, we use the following two principles:
Based on the principles above, we adopt the following design:
With the comparable cost constants above, we can compute the comparable cost of the row-column fusion execution plan based on the respective algorithms and statistics of MySQL and IMCI.
After the IMCI optimizer selects the row-column fusion execution plan for long-tail requests, we introduce a new Hybrid operator in the IMCI executor to compute row-type execution fragments.
The new Hybrid operator needs to access InnoDB in the IMCI executor. We refer to the implementation of the server layer of MySQL and interact with the storage engine through the handler in TABLE objects. From the execution process shown in the preceding figure, although the table has been opened in the preparation stage, the related objects and interfaces are all implemented for MySQL single-threaded execution. Therefore, in the workers of IMCI executor, we need to open the table again and clone or reference related objects. The main difficulty of this part lies in the compatibility with the engineering complexity of MySQL logic.
The asynchronous playback of two different indexes is shown in yellow in the preceding figure. InnoDB updates the latest read view after the playback, while the column index updates its last commit seq. The playback runs every other redo (including several redo log entries). The commit status of InnoDB transactions and IMCI transactions in a redo is the same. Therefore, after a redo segment is played back on two different indexes, we call the read view and commit seq above corresponding because the data visibility based on the updated latest read view and last commit seq on the two different indexes is the same. If the query uses the corresponding read view and commit seq to determine data visibility, strong, consistent reads can be achieved.
However, the playback of row and column indexes is asynchronous. Due to the difference in playback speed, the latest read view and the last commit seq may not be corresponding at any time. As such, we need to find the corresponding read view and commit seq updated recently.
To this end, we introduce the process shown in the blue part of the figure above in the asynchronous playback process.
When a dummy log object is processed, the previous regular log objects may not be played back completely. Therefore, the commit seq set to the read view may not take effect on the column index. After the playback is completed, the read view and the corresponding commit seq can be used.
After this process is introduced, the row-column fusion query will find a read view in which the corresponding commit seq has taken effect in the read views that have not yet been purged to determine data visibility.
We used the preceding basic components to introduce HybridProject to handle the first long-tail case. HybridProject uses the primary key to obtain complete rows from the primary index of InnoDB and outputs the rows after computing relevant expressions. In addition to the capabilities provided by the basic components, HybridProject needs its child operators in the execution plan to output the primary key, which can be implemented in the following two ways:
As the IMCI optimizer does not support column pruning, we adopt the second way.
On the OnTime dataset provided by ClickHouse, we verified the effect of the row-column fusion execution architecture and HybridProject.
https://clickhouse.com/docs/en/getting-started/example-datasets/ontime
The table of the OnTime dataset contains 110 columns, which is a typical large wide table. We use the following SQL to simulate the first type of long-tail requests.
select * from ontime order by ArrTime limit 1000;
The three execution plans are all cold-start queries. The results are listed below:
The results show PolarDB can achieve optimal performance for long-tail requests in hybrid workloads using the row-column fusion execution architecture and hybrid operators. The performance is improved by orders of magnitude compared with pure column execution or pure row execution.
The row-column fusion execution architecture can select a better execution plan to handle long-tail requests in mixed workloads. In the future, we will introduce HybridIndexScan and HybridIndexJoin operators based on this architecture to comprehensively improve the performance of long-tail requests.
If you have any questions related to the technology and practice mentioned in this article, please email nanlong.ynl@alibaba-inc.com.
Nanlong Yu (Gushen), from Alibaba Cloud Apsara Database - PolarDB New Storage Engine Group, is engaged in the research and development of PolarDB in memory column index (IMCI).
About Database Kernel | How Does PolarDB HTAP Implement IMCI Query Optimization?
ApsaraDB - June 7, 2022
ApsaraDB - January 4, 2024
ApsaraDB - November 21, 2023
ApsaraDB - April 27, 2023
ApsaraDB - April 10, 2024
ApsaraDB - August 8, 2023
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAlibaba Cloud PolarDB for Xscale (PolarDB-X) is a cloud-native high-performance distributed database service independently developed by Alibaba Cloud.
Learn MoreAlibaba Cloud PolarDB for MySQL is a cloud-native relational database service 100% compatible with MySQL.
Learn MoreLeverage cloud-native database solutions dedicated for FinTech.
Learn MoreMore Posts by ApsaraDB