This topic describes the technical background, introduction, and technical architecture of the In-Memory Column Index (IMCI) feature of PolarDB.
Technical background
HTAP database solutions for the MySQL ecosystem
MySQL is an open source database that is mainly designed for online transaction processing (OLTP) scenarios. Research and development in the open source community focuses on strengthening its transaction processing capabilities, including single-core performance, multi-core scalability, and the cluster capabilities required for high availability. However, the community places a low priority on the capabilities required to process complex queries on large data volumes, such as subquery optimization, high-performance join operators such as hash joins, and parallel SQL execution. As a result, the data analysis capabilities of MySQL have improved slowly.
As MySQL has become the most popular open source database system in the world, users store large amounts of data and run key business logic in it, and real-time analysis of that data has become a growing demand. When a standalone MySQL instance cannot meet this demand, users look for a better solution. The following solutions are commonly used: the building block solution that combines MySQL with a dedicated AP database, the divergent design method based on multiple replicas, and the integrated hybrid row-column store solution.
Building block solution of MySQL + dedicated AP database
In this solution, two systems are deployed to meet OLTP and online analytical processing (OLAP) requirements, and a data synchronization tool synchronizes data between the two systems in real time. Users can even add a proxy layer to automatically route transaction processing (TP) loads to MySQL and analytical processing (AP) loads to the OLAP database. This hides the deployment topology of the underlying databases from the application layer. The following figure shows the architecture of the solution.
The architecture is flexible. For example, the most suitable engine can be chosen separately for the TP database and the AP database, and TP and AP loads are isolated from each other. However, the shortcomings are also obvious. First, two database systems built on different technology stacks must be maintained. Second, because the two systems process data differently, it is very challenging to keep the upstream and downstream data consistent in real time. Moreover, due to synchronization latency, the data stored in the downstream AP system is often stale, which fails to meet the requirement for real-time analysis.
Divergent design method based on multiple replicas
Many of the database products that emerged with the rise of the Internet are compatible with the MySQL protocol, and most of these distributed databases adopt a shared-nothing architecture. One of the core features of this architecture is the use of a distributed consensus protocol to keep multiple replicas of a single partition consistent. Because each replica holds a complete and independent copy of the data, different replicas can easily store the data in different formats to serve different query loads. A typical example is TiDB. Starting from version 4.0, TiDB uses TiFlash, a column store, on one replica in each Raft group to serve AP loads, and uses its intelligent routing capability to automatically select the data source. This allows one database system to serve both OLTP and OLAP loads.
This method has been widely adopted in both research and industry and is increasingly becoming the de facto standard for integrated HTAP in distributed databases. However, it requires users to migrate their data to the corresponding NewSQL database system, which in most cases brings various compatibility issues.
Integrated hybrid row-column store solution
A solution with a higher degree of integration than the multi-replica divergent design is to use a hybrid row-column store within a single database instance to serve both TP and AP loads. This approach is adopted by traditional commercial databases such as Oracle, SQL Server, and Db2.
In 2013, Oracle released the Database In-Memory suite for Oracle Database 12c. Its core capability is an in-memory column store, which improves OLAP performance by combining a hybrid row-column store with advanced query optimizations such as materialized expressions and join groups.
Microsoft began to provide the column store index feature in SQL Server 2016 SP1. Based on load characteristics, users can choose among row-oriented tables, column-oriented tables, combinations of row-oriented and column-oriented tables, and column-oriented tables combined with row store indexes.
In Db2 10.5 (Kepler), released in 2013, IBM added the BLU Acceleration component, which greatly improves performance in analysis scenarios by using a column store, in-memory computing, and data skipping.

All three leading commercial database vendors have converged on the same technical route: a hybrid row-column store combined with in-memory computing. A column store offers better I/O efficiency (through compression, data skipping, and column pruning) and better CPU efficiency (cache friendliness), so it is required to reach the highest analysis performance. However, because columnar indexes are sparse, they cannot locate individual rows precisely, which means a column store cannot serve as the storage format for TP scenarios. A hybrid row-column store therefore becomes necessary. In such an architecture, a performance gap remains between row store indexes and column store indexes when handling random data updates, and the low read/write latency of dynamic random access memory (DRAM) must be used to compensate for the lower update efficiency of the column store. Therefore, a hybrid row-column store combined with in-memory computing is the optimal choice for providing both low-latency OLTP and high-performance real-time data analysis.
From the building block solution to the divergent design method and then to the integrated hybrid row-column store solution, the degree of integration increases and the user experience improves, but so does the difficulty of implementing the kernel. The role of foundational software is to keep complexity to itself and leave simplicity to users, so the integrated hybrid row-column store solution is in line with the technological trend.
Evolution of AP capabilities of PolarDB for MySQL
The capability stack of PolarDB for MySQL is similar to that of open source MySQL: it handles TP scenarios efficiently but is weak in AP scenarios. However, a PolarDB cluster can store up to 500 TB of data, and its transaction processing capability far exceeds that of a self-managed MySQL database. Therefore, PolarDB users tend to store more data in a single cluster and run complex aggregate queries on that data. With the write-once-read-many architecture of PolarDB, users can add multiple read-only (RO) nodes to run complex queries based on business requirements, which prevents analytical queries from interfering with TP loads.
Defects of the MySQL architecture in AP scenarios
The poor performance of the MySQL architecture on complex queries has many causes. Compared with a dedicated OLAP system, the performance bottlenecks of MySQL lie in the following aspects:
The SQL execution engine of MySQL is implemented based on the volcano iterator model. In practice, this architecture relies on deeply nested function calls and virtual function calls. When a large amount of data is processed, these calls hurt the efficiency of the CPU pipeline and cache. In addition, the row-at-a-time iterator model cannot make full use of the single instruction, multiple data (SIMD) instructions provided by CPUs to accelerate execution.
The execution engine can only execute serially and cannot make full use of the parallel processing capability of multi-core CPUs. Since MySQL 8.0, parallel execution has been added for some basic queries such as count(*), but parallel execution of complex SQL statements still has a long way to go.
The most frequently used storage engines of MySQL store data in the row format. When a query analyzes a large amount of data by column, reading data from disks row by row wastes a large amount of I/O bandwidth. In addition, the row format copies a large amount of unnecessary column data during processing, which adversely impacts memory read and write efficiency.
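As a rough illustration of the storage-format point, the following C++ sketch contrasts scanning one column from a row-oriented layout with scanning the same column stored contiguously; the struct layout and sizes are made up for illustration.

```cpp
#include <cstdint>
#include <vector>

// Illustrative only: a row-oriented record drags every column through the
// CPU cache and I/O path even when a query touches a single column.
struct RowRecord {
    int64_t id;
    int64_t amount;
    char    payload[112];   // other columns the query does not need
};

int64_t sum_amount_rows(const std::vector<RowRecord>& rows) {
    int64_t sum = 0;
    for (const RowRecord& r : rows) sum += r.amount;  // reads 128 bytes per 8 useful bytes
    return sum;
}

// Columnar layout: the scanned column is dense in memory, so cache lines and
// I/O bandwidth carry only the data the query needs, and the tight loop is
// easy for the compiler to vectorize.
int64_t sum_amount_columnar(const std::vector<int64_t>& amount_column) {
    int64_t sum = 0;
    for (int64_t v : amount_column) sum += v;
    return sum;
}
```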
Parallel query framework of PolarDB breaks through the CPU bottleneck
The parallel query framework developed by the PolarDB team can automatically start parallel execution when the amount of queried data reaches a specific threshold. The data is distributed to different threads at the storage layer for parallel computing, and the results are aggregated to the main thread. Then, the main thread summarizes and returns the results to users. This framework helps improve the query efficiency. 
The parallel query feature allows PolarDB to break through the limitation of single-core execution performance. By using the parallel processing capability of multi-core CPUs, the execution time of many SQL queries on PolarDB decreases dramatically.
PolarDB column store
The parallel execution framework breaks through the limitation on CPU scalability and significantly improves performance. However, single-core execution performance still has a ceiling because of the efficiency limits of the row store format and the row-oriented executor, so peak performance still falls behind that of a dedicated OLAP system. To further improve the analysis performance of PolarDB for MySQL, a column store must be introduced.
Analytical queries typically need to access a large number of records from a few columns. Because a column store stores each column separately, it avoids reading unnecessary columns. In addition, values of the same column, which share the same type and similar properties, are stored contiguously, so the compression ratio of a column store is much higher than that of a row store, in most cases more than 10 times higher. Moreover, combined with coarse-grained index information such as MIN and MAX values, the large-block storage structure of a column store can filter out large ranges of data. All of these practices greatly improve I/O efficiency. In a compute-storage separation architecture, reducing the amount of data read over the network directly improves query response time.
A column store also improves CPU efficiency during data processing. First, the compact arrangement of columnar data improves CPU memory access efficiency and reduces the execution stalls caused by L1 and L2 cache misses. Second, SIMD instructions can be applied to columnar data to further improve single-core throughput.
Introduction
The IMCI feature provides column store and in-memory computing capabilities for PolarDB. This allows users to run hybrid TP and AP loads on a single set of PolarDB databases: the existing excellent OLTP performance of PolarDB is preserved, while its performance on complex queries over large data volumes is greatly improved. The following figure shows the schematic diagram of the IMCI feature.

The IMCI feature uses the hybrid row-column store solution and combines it with the shared storage-based write-once-read-many architecture of PolarDB. It includes the following key technical innovations:
Support for columnar indexes is added to the InnoDB storage engine of PolarDB. Users can use DDL statements to create a columnar index on all or some columns of a table. Columnar indexes store data in compressed columnar format, so they consume much less storage space than a row store. By default, columnar indexes reside in memory to maximize analysis performance, but they can also be persisted to the shared storage when memory capacity is insufficient.
In the SQL executor layer of PolarDB, the PolarDB team implemented a new column-oriented executor engine framework that makes full use of the advantages of a column store. For example, the framework accesses data in the storage layer in batches of 4,096 rows and uses SIMD instructions to improve single-core data processing throughput, and all key operators support parallel execution. On a column store, the performance of the new executor is several orders of magnitude better than that of the original MySQL row store executor.
An optimizer framework that supports hybrid row-column execution is provided. The optimizer framework checks whether an issued SQL statement is fully covered by the columnar index and whether all functions and operators that the statement depends on are supported by the column-oriented executor, and decides whether to route the statement to the columnar index. The optimizer framework estimates the cost of the row store execution plan and the column store execution plan and selects the one with the lower cost.
Users can use an RO node in a PolarDB cluster as an analytical node and configure a columnar index on that node. Complex queries run on the columnar index and use the computing power of all available CPUs. This achieves maximum execution performance without affecting the memory and CPU resources available to the TP loads in the cluster.
The combination of these key technologies makes PolarDB a true HTAP database system.
Technical architecture
Hybrid row-column optimizer
The native optimizer of PolarDB is row-oriented. After the engine layer adds support for a column store, the optimizer must be enhanced so that it can determine whether to schedule a query to the row store or the column store for execution. The IMCI feature achieves this by using a whitelist mechanism and an execution cost calculation framework: supported SQL statements are accelerated, while unsupported SQL statements remain compatible.
How to achieve 100% MySQL compatibility
The PolarDB team uses a whitelist mechanism to achieve this compatibility goal. The whitelist mechanism is adopted for the following reasons:
Available system resources, mainly memory, are limited.
In most cases, columnar indexes are not created on every column of every table in a database. When a query references a column that does not exist in the column store, the query cannot be executed on the column store.
Performance considerations.
The PolarDB team rewrote a column-oriented SQL execution engine that covers all physical execution operators and expression computing, but its coverage is still narrower than that of the native MySQL row store. When an issued SQL statement contains operator fragments or column types that are not supported by the IMCI execution engine, the engine must be able to identify and intercept the statement and switch back to the row store for execution.

The whitelist mechanism checks the data types, operators, and expressions in SQL statements, as well as other unsupported scenarios such as multi-statement execution.
MySQL has been evolving for decades and supports a wide range of column types and rich SQL syntax. The initial focus of IMCI is on resolving the most common performance issues of analytical query statements. Even though the applicable scenarios of IMCI are limited, the SQL syntax it supports is already more compatible with MySQL than that of most OLAP systems. When an issued SQL statement cannot be executed on the column store, it directly falls back to the native MySQL execution engine. This way, 100% compatibility with MySQL is achieved.
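The following C++ sketch outlines how such a whitelist check and fallback could look; the structures, type names, and supported sets are illustrative assumptions, not the actual IMCI implementation.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of a whitelist check: the real IMCI code inspects the
// parsed MySQL plan; the names and sets below are illustrative.
struct QueryFeature {
    std::vector<std::string> column_types;   // e.g. "INT", "DECIMAL", "GEOMETRY"
    std::vector<std::string> operators;      // e.g. "HASH_JOIN", "GROUP_BY"
    bool is_multi_statement = false;
};

bool imci_whitelist_accepts(const QueryFeature& q) {
    static const std::unordered_set<std::string> supported_types = {
        "INT", "BIGINT", "DECIMAL", "DOUBLE", "DATETIME", "VARCHAR"};
    static const std::unordered_set<std::string> supported_ops = {
        "TABLE_SCAN", "HASH_JOIN", "NESTED_LOOP_JOIN", "GROUP_BY", "SORT"};

    if (q.is_multi_statement) return false;              // unsupported scenario
    for (const auto& t : q.column_types)
        if (!supported_types.count(t)) return false;     // unsupported column type
    for (const auto& op : q.operators)
        if (!supported_ops.count(op)) return false;      // unsupported operator
    return true;   // the statement may be routed to the column store
}
// If the check fails, the statement simply falls back to the native MySQL
// row-oriented executor, which is how 100% compatibility is preserved.
```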
Query plan conversion
The purpose of plan conversion is to convert the abstract syntax tree (AST) representation of a native logical execution plan in MySQL into a logical plan of IMCI. After a logical plan of IMCI is generated, a physical plan is generated after a series of optimizations.
Plan conversion is simple and direct: the system only needs to traverse the execution plan tree and convert the AST that MySQL has already optimized into an IMCI tree structure whose nodes are relational operators. During this process, implicit type conversion is performed to remain compatible with the flexible type system of MySQL.
Plan conversion produces an equivalent logical plan, which cannot be executed directly and must be turned into a physical plan. The IMCI optimizer uses relatively simple logic: in addition to basic optimizations such as choosing between Hash Join and Nested Loop Join, it is mainly used to convert subqueries that the IMCI executor does not support into equivalent join operations.
Optimizer for hybrid row-column execution
Based on the two execution engines for a row store and a column store, the optimizer provides more choices when an execution plan is selected. The optimizer can compare the cost of a row store execution plan with that of a column store execution plan and use the execution plan that has a lower cost.
In addition to the native row store-based serial execution of MySQL, PolarDB also supports the row store-based parallel query feature, which makes full use of multi-core computing power. Therefore, the PolarDB optimizer chooses among row store-based serial execution, row store-based parallel query, and IMCI execution. In the current iteration phase, the PolarDB optimizer follows this process:
The PolarDB optimizer parses an SQL statement and generates a logical plan. Then, the PolarDB optimizer calls the native optimizer of MySQL to perform optimizations such as join order adjustment. At the same time, the logical plan that is obtained is passed to the compilation module of IMCI execution plans. The module attempts to generate a column store execution plan. The SQL statement may be blocked by the whitelist and fall back to the row store for execution.
The PolarDB optimizer calculates the row-oriented execution cost based on the row store execution plan. If the cost exceeds a specific threshold, the PolarDB optimizer attempts to push the SQL statement down to the IMCI executor for execution based on an IMCI plan.
If the IMCI executor cannot execute the SQL statement, PolarDB attempts to compile a parallel query execution plan and execute the SQL statement. If the parallel query execution plan cannot be generated, the IMCI and parallel query features cannot execute the SQL statement, and the SQL statement falls back to the row store for execution.
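The following C++ sketch condenses the routing steps above into a single decision function; the names, thresholds, and interfaces are assumptions for illustration only.

```cpp
// Illustrative routing logic for the dispatch steps above; thresholds and
// function names are assumptions, not the actual PolarDB interfaces.
enum class ExecPath { RowSerial, RowParallelQuery, IMCI };

struct PlanContext {
    bool   imci_plan_available;        // whitelist passed, columnar plan compiled
    bool   parallel_plan_available;    // a parallel query plan could be generated
    double row_plan_cost;              // cost estimated from the row-store plan
};

ExecPath choose_execution_path(const PlanContext& ctx, double imci_cost_threshold) {
    // Cheap queries stay on the row store: index lookups beat a columnar scan.
    if (ctx.row_plan_cost <= imci_cost_threshold) return ExecPath::RowSerial;

    // Expensive queries are pushed down to the IMCI executor when possible.
    if (ctx.imci_plan_available) return ExecPath::IMCI;

    // Otherwise try the row-store parallel query framework.
    if (ctx.parallel_plan_available) return ExecPath::RowParallelQuery;

    // Last resort: fall back to row-store serial execution.
    return ExecPath::RowSerial;
}
```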
The preceding strategy is based on the following assumptions: in terms of execution performance, row store serial execution is slower than row store parallel execution, which is slower than IMCI; in terms of SQL compatibility, IMCI covers fewer statements than row store parallel execution, which covers fewer than row store serial execution. The actual situation can be more complicated. For example, in some cases a parallel index join based on row store indexes has a lower cost than a sort merge join based on the column store, yet the current strategy may still select IMCI to execute the statement.
Column-oriented execution engine
The IMCI execution engine is a column-oriented implementation that is completely independent of the existing MySQL row executor. The purpose of rewriting the executor is to eliminate the two key bottlenecks that make the existing row store execution engine inefficient for analytical SQL statements: the virtual function call overhead caused by row-at-a-time access, and the inability to execute SQL statements in parallel.
Vectorized parallel executor
The IMCI execution engine uses the classic volcano model and improves execution performance based on a column store and vectorization execution.
In the volcano model, each operation in the relational algebra that corresponds to the SQL syntax tree is abstracted into an operator, and the execution engine builds the entire SQL statement into an operator tree. Query execution calls the Next() interface from top to bottom, and data is pulled from bottom to top. The advantage of this method is that the computing model is simple and straightforward: each physical operator is abstracted as an iterator that only cares about its own internal logic, which reduces coupling between operators and makes it easier to write a logically correct execution engine.
In the IMCI execution engine, each operator also uses an iterator function to access data. The difference is that each call to the iterator returns a batch of data instead of a single row. Therefore, the execution engine can be regarded as a vectorized volcano model.
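A minimal C++ sketch of such a batch-at-a-time operator interface is shown below; apart from the 4,096-row batch size mentioned in this topic, all names and structures are illustrative.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

constexpr size_t kBatchSize = 4096;   // batch size used in this topic

struct ColumnBatch {
    std::vector<std::vector<int64_t>> columns;  // one vector per projected column
    size_t row_count = 0;                       // 0 means end of stream
};

class Operator {
public:
    virtual ~Operator() = default;
    // Unlike the row-oriented volcano model, Next() returns a whole batch,
    // so per-row virtual-call overhead is amortized over kBatchSize rows.
    virtual ColumnBatch Next() = 0;
};

class FilterOperator : public Operator {
public:
    explicit FilterOperator(std::unique_ptr<Operator> child) : child_(std::move(child)) {}
    ColumnBatch Next() override {
        ColumnBatch in = child_->Next();
        ColumnBatch out;
        out.columns.resize(in.columns.size());
        for (size_t r = 0; r < in.row_count; ++r) {
            if (in.columns[0][r] > 0) {                    // example predicate: col0 > 0
                for (size_t c = 0; c < in.columns.size(); ++c)
                    out.columns[c].push_back(in.columns[c][r]);
                ++out.row_count;
            }
        }
        return out;
    }
private:
    std::unique_ptr<Operator> child_;
};
```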

Serial execution performance is limited by factors such as single-core computing efficiency, memory access latency, and I/O latency. Therefore, key physical operators in the IMCI executor, such as Scan, Join, and Agg, support parallel execution. In addition to parallel operators, the IMCI optimizer must be able to generate a parallel execution plan. When the optimizer determines the access mode of a table, it decides whether to enable parallel execution based on the amount of data to be accessed. If parallel execution is enabled, the optimizer refers to a series of state data to determine the degree of parallelism (DOP), including the available CPU, memory, and I/O resources in the system, information about scheduled and queued tasks, statistics, query complexity, and user-configurable parameters. Based on these data, a recommended DOP value is calculated for the operators, and each operator internally uses the same DOP value. Users can also specify a DOP value by using hints.

Based on the preceding two optimization ideas, all physical execution operators are reimplemented, including Table Scan, Hash Join, Nested Loop Join, and Group By. In the following example, Hash Join is used to demonstrate the parallelization and vectorization acceleration of the executor. The following figure shows the execution process of Hash Join in IMCI.
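Complementing that description, the following condensed C++ sketch shows the general shape of a parallelized, batch-wise hash join (build once, probe in parallel); it omits partitioning, spilling, and the actual IMCI operator interfaces, and all names are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

using Key = int64_t;
using Value = int64_t;
using JoinResult = std::vector<std::pair<Value, Value>>;  // (build payload, probe payload)

JoinResult parallel_hash_join(const std::vector<Key>& build_keys,
                              const std::vector<Value>& build_vals,
                              const std::vector<Key>& probe_keys,
                              const std::vector<Value>& probe_vals,
                              unsigned dop) {
    if (dop == 0) dop = 1;

    // Build phase: construct the hash table (a real implementation would build
    // per-partition tables in parallel).
    std::unordered_multimap<Key, Value> table;
    table.reserve(build_keys.size());
    for (size_t i = 0; i < build_keys.size(); ++i)
        table.emplace(build_keys[i], build_vals[i]);

    // Probe phase: split the probe side into ranges, one worker per range.
    JoinResult result;
    std::mutex result_mu;
    std::vector<std::thread> workers;
    size_t chunk = (probe_keys.size() + dop - 1) / dop;
    for (unsigned w = 0; w < dop; ++w) {
        size_t begin = w * chunk;
        size_t end = std::min<size_t>(probe_keys.size(), begin + chunk);
        workers.emplace_back([&, begin, end] {
            JoinResult local;
            for (size_t i = begin; i < end; ++i) {
                auto range = table.equal_range(probe_keys[i]);
                for (auto it = range.first; it != range.second; ++it)
                    local.emplace_back(it->second, probe_vals[i]);
            }
            std::lock_guard<std::mutex> lock(result_mu);   // merge worker output
            result.insert(result.end(), local.begin(), local.end());
        });
    }
    for (auto& t : workers) t.join();
    return result;
}
```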

Vectorization execution solves the problem of low single-core execution efficiency, while parallel execution breaks through the single-core computing bottleneck. The combination of the two makes the IMCI execution speed several orders of magnitude faster than the traditional row execution of MySQL.
Vectorized expression system
In AP scenarios, SQL statements often involve extensive computation over values, operators, and functions, which falls into the category of expression evaluation. Expression evaluation is computing-intensive, so its efficiency is a key factor that affects overall performance.
The traditional expression evaluation system of MySQL operates row by row, which is generally called an iterator model implementation. The iterator abstracts the entire table, and expressions are implemented as tree structures. This abstraction causes performance loss: fetching each row during iteration triggers multiple layers of function calls, and row-by-row access generates excessive I/O and is unfriendly to the cache. MySQL is constrained to this tree iterator model by the access methods of its storage engines, which makes complex computing logic difficult to optimize.
For a column store, since the data in each column is separately and sequentially stored, the expression computing in a specific column can be performed in batches. For each expression, its unit of input and output is batch. In the batch processing model, the computing process can be accelerated by using SIMD instructions.
The vectorized expression system has two key optimizations:
The vectorized expression system makes full use of the advantages of a column store and replaces the iterator model with the batch processing model. The PolarDB team rewrote the expression kernels of the most common data types with SIMD instructions. For example, the basic mathematical operations (+, -, ×, /, and abs) of the numeric types INT, DECIMAL, and DOUBLE are implemented with corresponding SIMD instructions. With the AVX-512 instruction set, single-core performance of these operations is improved several times over.
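As a hedged illustration of such a kernel, the following C++ snippet adds two INT32 columns with AVX-512 intrinsics and a scalar tail; it assumes a CPU and compiler flags with AVX-512 support and is not the actual IMCI implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <immintrin.h>   // requires compilation with AVX-512 support, e.g. -mavx512f

// Illustrative SIMD kernel in the spirit described above: element-wise addition
// of two INT32 columns, 16 values per instruction, plus a scalar tail.
void add_int32_columns(const int32_t* a, const int32_t* b, int32_t* out, size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512i va = _mm512_loadu_si512(a + i);
        __m512i vb = _mm512_loadu_si512(b + i);
        _mm512_storeu_si512(out + i, _mm512_add_epi32(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];   // scalar tail for the remainder
}
```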

The expression implementation is similar to that of PostgreSQL: during the SQL compilation and optimization stages, IMCI expressions are stored as a tree structure, similar to the existing row iterator model. Before execution, however, the expression tree is traversed in post-order and flattened into a one-dimensional array. Subsequent computation only needs to walk this one-dimensional array, which is more efficient because it eliminates the recursion of the tree iterator model. This method also provides a concise abstraction of the computing process and separates data from computation, which makes it well suited to parallel computing.
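The following C++ sketch illustrates the idea of flattening an expression tree into a post-order array and evaluating it without recursion; the node layout and operators are illustrative assumptions.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Minimal sketch of the "flatten the expression tree" idea; not the IMCI structures.
struct ExprNode {
    enum class Kind { Constant, Add, Mul } kind;
    int64_t value = 0;                        // used when kind == Constant
    std::unique_ptr<ExprNode> left, right;
};

struct FlatStep { ExprNode::Kind kind; int64_t value; };

// Post-order traversal: children first, then the operator, so the flat array
// can later be evaluated with a simple stack and no recursion.
void flatten(const ExprNode* node, std::vector<FlatStep>& out) {
    if (!node) return;
    flatten(node->left.get(), out);
    flatten(node->right.get(), out);
    out.push_back({node->kind, node->value});
}

int64_t evaluate(const std::vector<FlatStep>& steps) {
    std::vector<int64_t> stack;
    for (const FlatStep& s : steps) {
        if (s.kind == ExprNode::Kind::Constant) { stack.push_back(s.value); continue; }
        int64_t rhs = stack.back(); stack.pop_back();
        int64_t lhs = stack.back(); stack.pop_back();
        stack.push_back(s.kind == ExprNode::Kind::Add ? lhs + rhs : lhs * rhs);
    }
    return stack.back();   // in batch execution this loop would run per column batch
}
```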
Hybrid row-column store
Transactional and analytical applications place completely different requirements on a storage engine. The former requires indexes that can precisely locate each row and efficiently support inserts, deletes, and updates; the latter requires efficient batch scanning and processing. These design requirements are completely different and sometimes even contradictory, so designing an integrated storage engine that serves both OLTP and OLAP loads is very challenging. Only a few large vendors with decades of research and development experience do this well: Oracle has the In-Memory Column Store, SQL Server has column store indexes, and Db2 has BLU. TiDB meets HTAP requirements only by converting one replica in a multi-replica cluster to a column store.
The integrated HTAP storage engine generally uses the hybrid row-column store solution. In other words, both a row store and a column store exist in the engine at the same time. The row store serves TP, and the column store serves AP. Compared with the separate deployment of an OLTP database and an OLAP database to meet business requirements, a single HTAP engine has the following advantages:
Row-oriented data and column-oriented data are consistent in real time, which satisfies stringent business requirements: data is visible to analytical queries as soon as it is written.
The HTAP engine is cost-effective. Users can easily specify which columns or even which range of a table is stored in the column store format for analysis. Full data continues to be stored in a row store.
Management and O&M are convenient. Users do not need to worry about data synchronization and data consistency between the two systems.
PolarDB uses a hybrid row-column store technology similar to that of commercial databases such as Oracle and SQL Server. The PolarDB team calls the technology IMCI.
When users create a table, they can specify that part of the table or some columns use the column store format. They can also use the ALTER TABLE statement to add column store attributes to existing tables. Analytical queries automatically use the column store format to accelerate queries.
By default, column-oriented data is compressed and stored on disks, and the In-Memory Column Store Area can be used for caching and query acceleration. The traditional row-oriented data is still cached in the buffer pool for OLTP loads.
All additions, deletions, and modifications of transactions are reflected in the column store in real time. This ensures transaction-level data consistency.

Implementing a hybrid row-column store engine is technically difficult. However, adding column store support to a mature storage engine that is optimized for OLTP loads, such as InnoDB, involves a different set of conditions:
Meeting OLTP requirements remains the top priority, so adding column store support cannot have too much impact on TP performance. This requires that maintenance of the column store be sufficiently lightweight, and if necessary, AP performance must be sacrificed to preserve TP performance.
The column store design does not need to independently handle concurrent transactional modification of data or uniqueness checks. These problems are already solved by the row store system, but they are very difficult for standalone column store engines such as ClickHouse.
With a proven row store system, a problem with the column store system can be dealt with by switching back to the row store system to respond to query requests.
These conditions, some of them constraints and some of them advantages, shaped the design of the hybrid row-column store solution of PolarDB.
Column store implemented as an index
In the pluggable storage engine framework of MySQL, the simplest way to add column store support is to implement a separate storage engine, as the column store engines of Infobright and MariaDB do. Instead, PolarDB implements the column store as a secondary index of InnoDB, mainly for the following reasons:
InnoDB natively supports multiple indexes. INSERT, UPDATE, and DELETE operations are applied to the primary index and all secondary indexes at row granularity with transactional guarantees. Implementing the column store as a secondary index can reuse this transaction processing framework.
A column store implemented as a secondary index can use the same data encoding format as the other row store indexes, so data can be converted with a simple memory copy, without considering details such as character sets and collations.
A secondary index can be flexibly managed. Users can specify the indexed columns when they create a table, and they can later use DDL statements to add or remove columns from the secondary index. For example, INT, FLOAT, and DOUBLE columns that need to be analyzed can be added to a columnar index, while TEXT and BLOB fields, which are usually involved only in point queries but take up a lot of space, can be kept only in the row store.
The crash recovery process can reuse the redo transaction log module of InnoDB to ensure seamless compatibility with existing implementations. This also facilitates the physical replication process of PolarDB. Column store indexes can be generated on independent RO nodes or standby nodes to provide analysis services.
A secondary index has the same lifecycle as the primary index. Users can conveniently manage the secondary index.

As shown in the preceding figure, each primary and secondary index in PolarDB is implemented as a B+ tree. A columnar index is defined as an index, but it is actually a virtual index that captures the insert, delete, and update operations on the columns it covers.
In the preceding table, the primary index contains five columns of data (C1, C2, C3, C4, and C5), and the secondary index contains two columns of data (C2 and C1). In an ordinary secondary index, C2 and C1 are encoded into a single row and stored in the B+ tree. The columnar index contains three columns of data (C2, C3, and C4). In the actual physical storage, the three columns are split and stored independently, and each column is converted to the column store format in write order.
Another advantage of implementing the column store as a secondary index is that the engineering implementation in the executor is very simple. MySQL already has the concept of a covering index: when all columns required by a query are stored in a secondary index, the data in that index can directly satisfy the query, which greatly reduces the amount of data read compared with the primary index and improves query performance. Similarly, when all columns required by a query are covered by a columnar index, the column store can improve query performance by dozens or even hundreds of times.
Organization of column-oriented data
Each column in a columnar index is stored unordered in append-only mode, and space is reclaimed through label deletion combined with asynchronous compaction in the background. The implementation has the following key points:
Records in a columnar index are organized into row groups, and each row group contains 64,000 rows. Within a row group, the data of each column is packed into its own data pack.
Row groups and the data packs that belong to each column are append-only. For a columnar index, only the active row group accepts new writes. When the row group is full, it is frozen: all the data packs it contains are compressed and stored on disk, and the statistics of each data block are recorded to facilitate filtering.
Each new row written to a row group is assigned a row ID (RowID) for positioning, and the position of every column value of the row can be calculated from the RowID. The system also maintains a mapping from primary keys to RowIDs to support subsequent delete and update operations.
Update operations are implemented based on label deletion. A delete operation simply sets the corresponding bit in the delete bitmap. An update operation locates the original position through the RowID, sets the deletion label, and then appends the new data to the active row group.
When the number of invalid records in a row group exceeds a specific threshold, asynchronous compaction is triggered in the background. Compaction reclaims space and also makes the valid data more compact, which improves the efficiency of analytical queries.

This data organization meets the requirement of batch scanning and filtering by column during analytical queries, while its impact on TP transaction operations remains very small: a write operation only appends data to memory by column, a delete operation only sets a deletion label, and an update operation is a label deletion followed by an append write. In this way, the column store supports transaction-level updates with almost no reduction in OLTP performance.
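The following C++ sketch models this organization (row groups, per-column data packs, a primary-key-to-RowID map, and a delete bitmap) under simplified assumptions; the sizes and member names are illustrative, not the actual IMCI structures.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr uint32_t kRowsPerGroup = 64000;   // each row group holds 64,000 rows

struct DataPack {                           // one column within one row group
    std::vector<int64_t> values;            // append-only column values
    int64_t min = 0, max = 0, sum = 0;      // statistics computed when the pack freezes
};

struct RowGroup {
    std::vector<DataPack> packs;            // one pack per indexed column
    std::vector<bool> delete_bitmap;        // label deletion: true marks an invalid row
    bool frozen = false;                    // frozen groups are compressed to disk
};

struct ColumnarIndex {
    std::vector<RowGroup> groups;
    std::unordered_map<int64_t, uint64_t> pk_to_rowid;   // locate rows for UPDATE/DELETE

    // INSERT: append to the active row group and record the primary-key mapping.
    uint64_t Insert(int64_t pk, const std::vector<int64_t>& row) {
        if (groups.empty() || groups.back().delete_bitmap.size() == kRowsPerGroup) {
            // The previous group would be frozen and compressed at this point.
            groups.emplace_back();
            groups.back().packs.resize(row.size());
        }
        RowGroup& g = groups.back();
        for (size_t c = 0; c < row.size(); ++c) g.packs[c].values.push_back(row[c]);
        g.delete_bitmap.push_back(false);
        uint64_t row_id = (groups.size() - 1) * kRowsPerGroup + (g.delete_bitmap.size() - 1);
        pk_to_rowid[pk] = row_id;
        return row_id;
    }

    // DELETE: only set the deletion label; space is reclaimed by background compaction.
    void Delete(int64_t pk) {
        uint64_t row_id = pk_to_rowid.at(pk);
        groups[row_id / kRowsPerGroup].delete_bitmap[row_id % kRowsPerGroup] = true;
    }

    // UPDATE: label the old version deleted, then append the new version.
    void Update(int64_t pk, const std::vector<int64_t>& new_row) {
        Delete(pk);
        Insert(pk, new_row);
    }
};
```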
Full and incremental row-to-column conversion
Row-to-column operations are performed in the following cases:
Case 1: Users run DDL statements to create a columnar index on some columns of an existing table to meet analytical requirements. In this case, the data of the entire table must be scanned to build the columnar index.
Case 2: Incremental row-to-column conversion is performed on the columns involved in a transactional write.
For full-table row-to-column conversion, the system scans the InnoDB primary index in parallel and converts the involved columns to the column store format in turn. This operation is very fast and is limited only by the I/O throughput and the available CPU resources of the server. It is an online DDL process and does not block running services.

After a columnar index is created on a table, all update transactions simultaneously update the row-oriented and column-oriented data to ensure the transactional consistency of the row-oriented and column-oriented data. The following figure demonstrates the difference when the IMCI feature is disabled and enabled.
When the IMCI feature is disabled, a transaction first locks all rows that it updates and then modifies the data pages. All locked records are released in one batch when the transaction is committed.
When the IMCI feature is enabled, the transaction system additionally creates an update cache for the column store. While the data pages are being modified, the corresponding column store modifications are recorded in this cache. Before the transaction ends and the locked records are released, the update cache is applied to the column store system.
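A simplified C++ model of this per-transaction columnar update cache is sketched below; the interfaces are assumptions rather than the real InnoDB or IMCI APIs.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative model of the per-transaction columnar update cache described above.
struct ColumnChange {
    enum class Op { Insert, Delete } op;
    int64_t primary_key;
    std::vector<int64_t> new_values;   // used for Insert
};

struct Transaction {
    std::vector<ColumnChange> column_update_cache;   // filled while data pages are modified

    // Called for every row change inside the transaction, alongside the row-store update.
    void RecordColumnChange(ColumnChange change) {
        column_update_cache.push_back(std::move(change));
    }

    // Called at commit time, before the row locks are released: the buffered
    // changes are replayed into the column store so that the two stores stay
    // transactionally consistent.
    template <typename ColumnStore>
    void ApplyToColumnStore(ColumnStore& store) {
        for (const ColumnChange& c : column_update_cache) {
            if (c.op == ColumnChange::Op::Insert) store.Insert(c.primary_key, c.new_values);
            else                                  store.Delete(c.primary_key);
        }
        column_update_cache.clear();
    }
};
```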

For general OLTP requests, the time spent updating in-memory data pages accounts for only a small part of the overall transaction, so this method has very little impact on the latency of TP transactions. For large transactions that touch a large number of rows, the columnar index updates are applied to the column store in real time but are not visible to other queries before the transaction is committed, so the increase in commit latency of large transactions stays very small. To further reduce the impact on TP performance, the columnar index updates can be applied to the column store asynchronously if AP queries do not have strict requirements on data freshness.
The column store provides the same level of transaction isolation as the row store. Each row in a row group records the ID of the transaction that wrote it, and each label deletion in the delete bitmap records the ID of the transaction that performed the deletion. With these write and delete transaction IDs, AP queries can obtain a globally consistent snapshot in a very lightweight manner.
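The following C++ sketch shows a highly simplified visibility rule built on such write and delete transaction IDs; the real implementation follows the InnoDB transaction system, so treat the comparison logic as an illustrative assumption.

```cpp
#include <cstdint>

struct RowVersionInfo {
    uint64_t insert_trx_id;                 // transaction that wrote the row
    uint64_t delete_trx_id;                 // transaction that labeled it deleted, 0 if none
};

struct Snapshot {
    uint64_t read_view_trx_id;              // transactions with an ID <= this are visible
};

bool row_visible(const RowVersionInfo& row, const Snapshot& snap) {
    bool inserted_before_snapshot = row.insert_trx_id <= snap.read_view_trx_id;
    bool deleted_before_snapshot  = row.delete_trx_id != 0 &&
                                    row.delete_trx_id <= snap.read_view_trx_id;
    // The row is visible if it was written before the snapshot and has not yet
    // been deleted from the snapshot's point of view.
    return inserted_before_snapshot && !deleted_before_snapshot;
}
```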
Rough index scheme of columnar indexes
As the preceding storage format shows, all data packs in IMCI are unordered and append-only, so data that does not match a query cannot be filtered out as precisely as with ordinary ordered InnoDB indexes. Instead, IMCI uses statistics to filter data blocks and reduce the cost of data access.
When an active data pack finishes writing, the system precomputes statistics for it, including the minimum value, the maximum value, the sum, the number of null values, and the total number of records. These statistics are maintained in the metadata area of the data packs, which resides in memory. Because frozen data packs can still be affected by deletions, their statistics are updated and maintained in the background.
For a query request, data packs are classified as relevant, irrelevant, or possibly relevant based on the query conditions, which reduces actual data block access. For some aggregate operations such as count and sum, results can be computed directly from the precomputed statistics, and the corresponding data blocks do not even need to be decompressed.
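The following C++ sketch illustrates this three-way classification and statistics-only aggregation for a predicate of the form col > threshold; the structures are illustrative and the delete bitmap is ignored for brevity.

```cpp
#include <cstdint>
#include <vector>

struct PackStats {
    int64_t min, max, sum;
    uint32_t row_count;
};

enum class PackRelevance { Irrelevant, FullyRelevant, PossiblyRelevant };

PackRelevance classify_pack(const PackStats& s, int64_t threshold) {
    if (s.max <= threshold) return PackRelevance::Irrelevant;      // no row can match
    if (s.min > threshold)  return PackRelevance::FullyRelevant;   // every row matches
    return PackRelevance::PossiblyRelevant;                        // must scan the pack
}

// COUNT(*) with the predicate: fully relevant packs are answered from metadata
// alone and never decompressed; only possibly relevant packs are scanned.
uint64_t count_greater_than(const std::vector<PackStats>& stats,
                            const std::vector<std::vector<int64_t>>& packs,
                            int64_t threshold) {
    uint64_t count = 0;
    for (size_t i = 0; i < stats.size(); ++i) {
        switch (classify_pack(stats[i], threshold)) {
            case PackRelevance::Irrelevant: break;
            case PackRelevance::FullyRelevant: count += stats[i].row_count; break;
            case PackRelevance::PossiblyRelevant:
                for (int64_t v : packs[i]) if (v > threshold) ++count;
                break;
        }
    }
    return count;
}
```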

The rough index scheme based on statistics is not well suited to queries that must precisely locate individual rows. However, in a hybrid row-column store engine, the columnar index only needs to help accelerate queries that scan large amounts of data, which is exactly where IMCI has significant advantages. For SQL queries that access only a small amount of data, the optimizer usually determines through the cost model that a row store plan is more cost-effective.
Isolation of TP and AP resources in the hybrid row-column store solution
The hybrid row-column store solution of PolarDB supports both AP queries and TP queries in one cluster. However, many services have high OLTP loads, and sudden OLAP loads may interfere with the response latency of TP services. Therefore, it is necessary to support load isolation in HTAP databases. With the write-once-read-many architecture of PolarDB, users can easily isolate AP loads and TP loads. Based on the technical architecture of PolarDB, the following deployment modes are supported:
Mode 1: The hybrid row-column store solution is enabled on the read/write (RW) node to support lightweight AP queries. This mode is suitable when loads are mostly TP and AP requests are relatively few. It is also suitable when PolarDB is used to query reports over data that is imported in batches.
Mode 2: The RW node handles OLTP loads, and a dedicated AP-type RO node is started with the hybrid row-column store solution enabled to serve queries. In this mode, CPU resources are completely isolated, and all the memory of the AP-type RO node can be allocated to the column store and executor. However, because the nodes use the same shared storage, I/O interference still exists.
Mode 3: Both RW and RO nodes support OLTP loads, and the hybrid row-column store solution is enabled on a separate standby node to support AP queries. Since the standby node uses an independent shared storage cluster, this mode can also achieve the isolation of I/O resources, in addition to the isolation of CPU and memory resources that is achieved in Mode 2.

In addition to these deployment architectures, which offer different levels of resource isolation, PolarDB supports automatic degree of parallelism (Auto DOP) for large queries that need to be executed in parallel. This mechanism takes the current system load and the available CPU and memory resources into account and limits the resources that a single query can use, which prevents one query from consuming too many resources and affecting the processing of other requests.
OLAP performance
For more information, see IMCI performance.