This topic describes the In-Memory Column Index (IMCI) feature that is soon to be released for the PolarDB for MySQL engine.
PolarDB for MySQL is oriented to online transaction processing (OLTP) scenarios that involve online business and large amounts of data. As a result, it is difficult for row store-based PolarDB for MySQL to meet the query performance requirements of all scenarios. In most cases, to implement complex analytic queries, you need to export data from PolarDB for MySQL, and then import the data to an online analytical processing (OLAP) system for analysis and queries. In this context, two database systems are required, and costs, architecture complexity, and O&M loads increase.
The PolarDB for MySQL engine will release IMCI oriented to OLAP scenarios that involve large amounts of data and complex query requirements. PolarDB for MySQL provides IMCI to implement integrated real-time transaction processing and data analysis features and provides a one-stop hybrid transaction/analytical processing (HTAP) solution. PolarDB for MySQL allows you to use only one system to meet requirements of OLTP and OLAP scenarios.
How it works
The following figure shows the architecture of the IMCI feature in PolarDB for MySQL.PolarDB for MySQL provides the IMCI feature designed at the storage engine, query engine, and optimizer layers.
Storage engine: supports the hybrid row-column storage that ensures real-time transactional consistency.
Query engine: uses the vectorized parallel query engine that is oriented to column store. Single-table queries and multi-table queries can be implemented with minimal latency.
SQL parser and optimizer: uses the cost-based optimizer (CBO) that is oriented to hybrid row-column storage. The optimizer automatically selects row store or column store based on the cost threshold to execute query requests.
This architecture helps PolarDB for MySQL accelerate queries by several orders of magnitude while PolarDB for MySQL is completely compatible with the MySQL protocol.
Equipped with the IMCI feature, PolarDB for MySQL has the following features:
Complete compatibility with MySQL: A system is provided for column store. This system is consistent with the system provided for row store and supports flexible type conversion.
Ultimate HTAP performance: PolarDB provides ultimate performance in terms of OLTP. The IMCI feature provides OLAP with performance commensurate with the OLAP system.
Hybrid row-column storage: Both row store and column store are supported, which saves costs. Moreover, transactional consistency is ensured for row store and column store. Column store also has an advantage in lower costs.
The IMCI feature of PolarDB for MySQL provides a one-stop HTAP experience that can be used in a variety of business scenarios.
Scenarios where real-time analysis of online data is required, such as real-time reports.
Data warehousing scenarios that depend on the large volume data storage capacity of PolarDB to aggregate multiple upstream data sources and use PolarDB as the dedicated data warehouse.
Extract, transform, load (ETL)-faced accelerated data computing scenarios that depend on the powerful and flexible computing capability of IMCI provided by PolarDB to implement ETL features by using SQL syntax.
The IMCI feature notably accelerates queries executed by using SQL statements by up to one hundred folds. The following section provides a query test to verify the acceleration effects. The data tables and SQL statements contained in the standard TPC-H benchmark are used in the example.
Test method: TPC-H is a commonly used benchmark that is developed and released by the Transaction Processing Performance Council (TPC) to evaluate the analytic query capabilities of databases. The TPC-H benchmark contains eight tables and 22 complex SQL statements. Most of the queries contain JOIN clauses on several tables, subqueries, and GROUP BY clauses.
In this example, a test based on the TPC-H benchmark is implemented, but it does not meet all the requirements of a TPC-H benchmark test. Therefore, the test results are incomparable with the published results of the TPC-H benchmark test.
Data size: 100 GB.
Compare performance between IMCI-enabled and IMCI-disabled scenarios
The following figure shows the response time difference between IMCI-enabled and IMCI-disabled scenarios when 22 complex SQL statements of the TPC-H benchmark are executed.
Compare performance between ClickHouse and IMCI-enabled PolarDB for MySQL
The following figure shows the response time difference between ClickHouse and IMCI-enabled PolarDB for MySQL when 21 complex SQL statements of the TPC-H benchmark are executed. The two databases have the same amount of data and the same data schema. Query Statement 21 is not executed because ClickHouse does not support Query Statement 21.
The IMCI feature notably accelerates most complex queries by up to one hundred folds.
Each of the traditional OLAP database service ClickHouse and IMCI-enabled PolarDB for MySQL has their own advantages. PolarDB for MySQL excels in single table scan and aggregation and JOIN scenarios. In the future, the IMCI feature of PolarDB for MySQL will be tuned on an ongoing basis and make a breakthrough in terms of aggregation acceleration and window functions.