PolarDB supports the Extract Transform Load (ETL) feature. After the ETL from IMCI feature is enabled, you can use In-memory Column Indexes (IMCIs) on the primary node. The SELECT requests in the statements that you execute on the primary node are sent to read-only column store nodes. Then, you can use the IMCI feature to accelerate queries. After data is read, PolarDB transmits data back to the primary node over the internal network and writes data to destination tables.
How it works

Prerequisites
Your PolarDB cluster uses one of the following versions:
A cluster of PolarDB for MySQL 8.0.1 whose revision version is 8.0.1.1.29 or later.
A cluster of PolarDB for MySQL 8.0.2 whose revision version is 8.0.2.2.12 or later.
For information about how to check the version, see the "Query the engine version" section of the Engine versions topic.
Limits
This feature applies only to the following SQL statements:
CREATE TABLE table_name [AS] SELECT ...INSERT ... SELECT ...
Usage
You can configure parameters in the database to specify whether to read data from read-only column store nodes and whether to compress files when reading data from read-only column store nodes. The following table describes the parameters.
Parameter | Description |
etl_from_imci | Specifies whether to read data from read-only column store nodes. Valid values:
|
etl_from_imci_compress | Specifies whether to compress files when reading data from read-only column store nodes. Valid values:
|
When you modify parameter values, you can set each parameter at one of the following levels: global, session, and statement. In the following examples, the etl_from_imci parameter is used.
Set the parameter at the global level.
SET GLOBAL etl_from_imci = ON;After you set the parameter at the global level, all
CREATE TABLE new_tbl [AS] SELECT ..andINSERT ... SELECT ...requests within the current cluster read data from read-only column store nodes.Set the parameter at the session level.
SET etl_from_imci = ON;After you set the parameter at the session level, only the
CREATE TABLE new_tbl [AS] SELECT ..andINSERT ... SELECT ...requests in the current session read data from read-only column store nodes.Set the parameter at the statement level.
You can use hints to specify whether to read data from read-only column store nodes for an SQL statement. Example:
CREATE TABLE t2 SELECT /*+ SET_VAR(etl_from_imci=ON) */ * from t1 where 'A' = 'a';
When the etl_from_imci parameter is set to ON, data is read from read-only column store nodes. After data is read from read-only column store nodes and sent back to the primary node, you can execute the SHOW processlist statement to view the current status of the process list, which is expected to be ETL FROM IMCI.
Usage notes
If complex query conditions are specified and the execution time of SQL statements is long but only a small amount of data is returned, you can enable the ETL from IMCI feature to significantly improve performance.
However, the ETL from IMCI feature may deteriorate performance in the following scenarios:
For simple queries, reading data from remote read-only column store nodes incurs additional overheads for network transmission and result set parsing. Therefore, performance may be degraded.
When the result set of a query contains a large amount of data, reading data from remote read-only column store nodes, sending data back to the primary node, and writing data to the table on the primary node may cause performance degradation.
Performance comparison
The following tests use 10 GB of data that is generated based on TPC Benchmark H (TPC-H) to test the performance of complex queries.
The query result contains one row of data.
select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year and l_discount between .06 - 0.01 and .06 + 0.01 and l_quantity < 24Compare the consumed time values (in seconds) if the query result contains one row of data.
Query data on the read-only column store node
Execute the
CREATE TABLE new_tbl [AS] SELECT ...statement on the primary node when the ETL from IMCI feature is not enabledExecute the
CREATE TABLE new_tbl [AS] SELECT ...statement on the primary node when the ETL from IMCI feature is enabledExecute the
INSERT ... SELECT ...statement on the primary node when the ETL from IMCI feature is not enabledExecute the
INSERT ... SELECT ...statement on the primary node when the ETL from IMCI feature is enabled0.05
>60
0.17
>60
0.08
The query result contains four rows of data.
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '90' day group by l_returnflag, l_linestatus order by l_returnflag, l_linestatusCompare the consumed time values (in seconds) if the query result contains four rows of data.
Query data on the read-only column store node
Execute the
CREATE TABLE new_tbl [AS] SELECT ...statement on the primary node when the ETL from IMCI feature is not enabledExecute the
CREATE TABLE new_tbl [AS] SELECT ...statement on the primary node when the ETL from IMCI feature is enabledExecute the
INSERT ... SELECT ...statement on the primary node when the ETL from IMCI feature is not enabledExecute the
INSERT ... SELECT ...statement on the primary node when the ETL from IMCI feature is enabled0.58
>60
0.64
>60
0.58
The query result contains 27,840 rows of data.
select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'MEDIUM POLISHED%' and p_size in (49, 14, 23, 45, 19, 3, 36, 9) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_sizeCompare the consumed time values (in seconds) if the query result contains 27,840 rows of data.
Query data on the read-only column store node
Execute the
CREATE TABLE new_tbl [AS] SELECT ...statement on the primary node when the ETL from IMCI feature is not enabledExecute the
CREATE TABLE new_tbl [AS] SELECT ...statement on the primary node when the ETL from IMCI feature is enabledExecute the
INSERT ... SELECT ...statement on the primary node when the ETL from IMCI feature is not enabledExecute the
INSERT ... SELECT ...statement on the primary node when the ETL from IMCI feature is enabled0.55
>60
0.92
>60
0.82