The In-Memory Column Index (IMCI) feature stores table columns in columnar format on a PolarDB read-only node, enabling a hybrid row-and-columnar storage model. IMCI accelerates complex analytical queries—including large-scale scans, aggregations, multi-table joins, and subqueries—without affecting write performance on row store nodes. Write performance decreases by less than 5% after creating IMCIs.
When to use IMCI
IMCI delivers the most benefit when:
-
Your queries scan tens of millions of rows or more
-
Your workload includes aggregations (SUM, AVG, COUNT), GROUP BY, or multi-table JOINs
-
You run both online analytical processing (OLAP) and online transaction processing (OLTP) workloads on the same cluster and want automatic resource isolation between them
IMCI does not accelerate a query unless all columns referenced in the query are covered by an IMCI. If only some tables have IMCIs, queries that touch uncovered tables fall back to row index execution.
Prerequisites
Before you begin, make sure you have:
-
A PolarDB cluster running Enterprise Edition
-
At least one read-only IMCI node added to the cluster. For instructions, see Add a read-only IMCI node
-
(Optional) The
tpchdatabase with 100 GB TPC-H benchmark data, if you want to reproduce the benchmarks in this document
Test environment
The benchmarks in this document use the following configuration.
Cluster
| Setting | Value |
|---|---|
| Database edition | Enterprise Edition |
| Cluster type | Cluster Edition (Dedicated) |
| Database engine | 8.0.1.1.45.2 |
| Hot standby storage cluster | Enabled |
| Primary node | 32 cores, 256 GB memory (polar.mysql.x8.4xlarge) |
| Read-only IMCI node | 32 cores, 256 GB memory (polar.mysql.x8.4xlarge) |
| Storage type | PSL5 |
| Parameter template | MySQL_InnoDB_8.0_ Standard Edition_Default parameter template |
Dataset: 100 GB TPC-H benchmark data in the tpch database.
| Table | Rows | Size (GB) |
|---|---|---|
| customer | 13,179,406 | 2.59 |
| lineitem | 590,446,240 | 87.52 |
| nation | 25 | 0.00 |
| orders | 142,929,780 | 18.70 |
| part | 19,354,445 | 3.11 |
| partsupp | 67,862,725 | 20.45 |
| region | 5 | 0.00 |
| supplier | 986,923 | 0.17 |
Row counts and table sizes vary based on indexes, storage engines, statistics, and system tables. Your output may differ. The TPC-H workload in this document is based on the TPC-H benchmark but does not satisfy all benchmark requirements. Results are not comparable to published TPC-H benchmark results.
Single-table query acceleration
Step 1: Establish a baseline
Run the following queries against the lineitem table before creating any IMCI and record the execution times.
Single-table scan with filter
SELECT * FROM lineitem WHERE L_COMMENT > 'aaaaaaaa' AND L_COMMENT < 'aaaaaaz';
-- Sample result
Empty set (8 min 47.29 sec)
Single-column aggregation
SELECT SUM(L_DISCOUNT) FROM lineitem;
-- Sample result
+-----------------+
| SUM(L_DISCOUNT) |
+-----------------+
| 30001636.44 |
+-----------------+
1 row in set (2 min 6.64 sec)
Grouped aggregation
SELECT AVG(L_DISCOUNT) FROM lineitem
WHERE L_SHIPDATE <= date '1998-12-01' - interval '90' day
GROUP BY L_RETURNFLAG, L_LINESTATUS;
-- Sample result: 4 rows in set (6 min 28.96 sec)
Pagination with ORDER BY and LIMIT
SELECT L_ORDERKEY, SUM(L_QUANTITY) FROM lineitem
GROUP BY L_ORDERKEY
ORDER BY SUM(L_QUANTITY) DESC
LIMIT 1000000, 100;
-- Sample result: 100 rows in set (12 min 24.22 sec)
Step 2: Create an IMCI on the lineitem table
ALTER TABLE lineitem COMMENT 'COLUMNAR=1 lineitem';
-- Sample result
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
For additional options, see Create an IMCI.
Step 3: Wait for the IMCI to finish building
Query INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS to monitor build progress.
SELECT * FROM INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS;
Key fields in the output:
| Field | Meaning |
|---|---|
STATUS |
Building: index is being created. Safe to read: index is ready. |
SCANNED_ROWS |
Rows processed so far, shown with percentage (for example, 36718757(6%)). |
ESTIMATE_SECOND |
Estimated remaining build time in seconds. |
Sample output while building:
| SCHEMA_NAME | TABLE_NAME | STATUS | APPROXIMATE_ROWS | SCANNED_ROWS | ESTIMATE_SECOND |
| tpch | lineitem | Building | 590446240 | 36718757(6%) | 299 |
Sample output when complete:
| SCHEMA_NAME | TABLE_NAME | FINISHED_AT | STATUS | APPROXIMATE_ROWS | SCANNED_ROWS |
| tpch | lineitem | 2024-10-21 13:50:11 | Safe to read | 590446240 | 600037902(100%) |
For more information, see View the building progress of IMCIs.
Step 4: Re-run the baseline queries and compare
Run the same four queries from Step 1. Sample results with IMCI:
| Query type | With IMCI (sec) | Without IMCI (sec) |
|---|---|---|
| Single-table scan and filter | 1.47 | 527.29 |
| Single-column aggregation | 0.06 | 126.64 |
| Grouped aggregation (GROUP BY) | 2.54 | 388.96 |
| Pagination (ORDER BY and LIMIT) | 12.80 | 744.22 |
These results are benchmarks, not guarantees. Actual execution times vary with cluster configuration, connection count, query concurrency, and system load.
Multi-table query and subquery acceleration
Step 1: Establish a baseline
At this point, only lineitem has an IMCI. Run the following queries and record the execution times. Queries that join tables without IMCIs do not benefit from columnar acceleration.
To check whether your query is fully covered by IMCIs before running it, call dbms_imci.check_columnar_index('<query_string>'). The stored procedure identifies tables and columns not covered by IMCIs. See Check whether an IMCI is created for a table in an SQL statement and DDL tools for IMCIs.
Multi-table join
SELECT COUNT(l3.L_DISCOUNT)
FROM (
(
(
(
(nation n1 STRAIGHT_JOIN nation n2 ON n1.N_NATIONKEY = n2.N_NATIONKEY)
STRAIGHT_JOIN supplier ON n2.N_NATIONKEY = supplier.S_NATIONKEY AND S_SUPPKEY < 2000
)
STRAIGHT_JOIN lineitem AS l1 ON l1.L_SUPPKEY = supplier.S_SUPPKEY
)
STRAIGHT_JOIN lineitem AS l2 ON l1.L_ORDERKEY = l2.L_ORDERKEY AND l1.L_LINENUMBER = l2.L_LINENUMBER
)
STRAIGHT_JOIN lineitem AS l3 ON l2.L_ORDERKEY = l3.L_ORDERKEY AND l2.L_LINENUMBER = l3.L_LINENUMBER
)
GROUP BY n1.N_NAME;
-- Result: query stopped after 7,200 seconds (execution timeout)
Query with a subquery
SELECT O_ORDERPRIORITY, COUNT(*) AS ORDER_COUNT
FROM orders
WHERE O_ORDERDATE >= '1995-01-01'
AND O_ORDERDATE < date_add('1995-01-01', interval '3' month)
AND EXISTS (
SELECT * FROM lineitem
WHERE L_ORDERKEY = O_ORDERKEY AND L_COMMITDATE < L_RECEIPTDATE
)
GROUP BY O_ORDERPRIORITY
ORDER BY O_ORDERPRIORITY;
-- Sample result: 5 rows in set (4 min 9.51 sec)
Multi-table join with subqueries
SELECT C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE, SUM(L_QUANTITY)
FROM (
SELECT * FROM orders
WHERE O_ORDERKEY IN (
SELECT L_ORDERKEY FROM lineitem GROUP BY L_ORDERKEY HAVING SUM(L_QUANTITY) > 300
)
) AS tmp, customer, lineitem
WHERE C_CUSTKEY = O_CUSTKEY AND O_ORDERKEY = L_ORDERKEY
GROUP BY C_NAME, C_CUSTKEY, O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE
ORDER BY O_TOTALPRICE DESC, O_ORDERDATE;
-- Sample result: 6,398 rows in set (12 min 46.15 sec)
Step 2: Create IMCIs for all tables in the tpch database
Use the batch DDL command to create IMCIs for all tables at once:
CREATE COLUMNAR INDEX FOR TABLES IN tpch;
-- Sample output
+------------+-------------------+
| Table_Name | Result |
+------------+-------------------+
| customer | Ok |
| lineitem | Skip by no change |
| nation | Ok |
| orders | Ok |
| part | Ok |
| partsupp | Ok |
| region | Ok |
| supplier | Ok |
+------------+-------------------+
8 rows in set (56.74 sec)
lineitem shows Skip by no change because an IMCI already exists for it.
Step 3: Wait for all IMCIs to finish building
SELECT * FROM INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS;
Wait until all tables show Safe to read in the STATUS field. Sample output when all IMCIs are ready:
| SCHEMA_NAME | TABLE_NAME | FINISHED_AT | STATUS | APPROXIMATE_ROWS | SCANNED_ROWS |
| tpch | region | 2024-10-21 14:43:27 | Safe to read | 5 | 5(100%) |
| tpch | lineitem | 2024-10-21 14:42:23 | Safe to read | 590446240 | 600037902(100%) |
| tpch | supplier | 2024-10-21 14:44:17 | Safe to read | 986923 | 1000000(100%) |
| tpch | part | 2024-10-21 14:43:38 | Safe to read | 19354445 | 20000000(100%) |
| tpch | customer | 2024-10-21 14:43:27 | Safe to read | 13179406 | 15000000(100%) |
| tpch | nation | 2024-10-21 14:43:27 | Safe to read | 25 | 25(100%) |
| tpch | partsupp | 2024-10-21 14:44:16 | Safe to read | 67862725 | 80000000(100%) |
| tpch | orders | 2024-10-21 14:44:27 | Safe to read | 142929780 | 150000000(100%) |
Step 4: Re-run the queries and compare
Run the same three queries from Step 1. Sample results with full IMCI coverage:
| Query type | With IMCI (sec) | Without IMCI (sec) |
|---|---|---|
| Multi-table join | 6.25 | >7,200 (timeout) |
| Query with a subquery | 2.49 | 249.51 |
| Multi-table join with subqueries | 16.16 | 766.15 |
These results are benchmarks, not guarantees. Actual execution times vary with cluster configuration, connection count, query concurrency, and system load.
Configure a sort key to improve filter performance
By default, IMCI data is organized into row groups of 64,000 rows each, built in parallel based on primary key order. The resulting column data blocks are unordered across row groups. For queries with selective filters on specific columns, configuring a sort key reduces the number of row groups scanned and improves performance.
For details, see Configure sort keys for IMCIs.
Step 1: Verify the imci_enable_pack_order_key parameter
The imci_enable_pack_order_key parameter controls IMCI data sorting. Its default value is ON. If you have not changed it, skip this step.
In the PolarDB console, all parameters use theloose_prefix. To modify this parameter in the console, useloose_imci_enable_pack_order_key. For instructions, see Configure cluster and node parameters.
Step 2: Run a baseline query without a sort key
SELECT
L_SHIPMODE,
SUM(CASE WHEN O_ORDERPRIORITY = '1-URGENT' OR O_ORDERPRIORITY = '2-HIGH' THEN 1 ELSE 0 END) AS high_line_count,
SUM(CASE WHEN O_ORDERPRIORITY <> '1-URGENT' AND O_ORDERPRIORITY <> '2-HIGH' THEN 1 ELSE 0 END) AS low_line_count
FROM orders, lineitem
WHERE O_ORDERKEY = L_ORDERKEY
AND L_SHIPMODE IN ('MAIL', 'SHIP')
AND L_COMMITDATE < L_RECEIPTDATE
AND L_SHIPDATE < L_COMMITDATE
AND L_RECEIPTDATE >= date '1994-01-01'
AND L_RECEIPTDATE < date '1994-01-01' + interval '1' year
GROUP BY L_SHIPMODE
ORDER BY L_SHIPMODE;
-- Sample result: 2 rows in set (4.35 sec)
Step 3: Add a sort key to the lineitem table
ALTER TABLE lineitem COMMENT='COLUMNAR=1 order_key=L_RECEIPTDATE,L_SHIPMODE';
Step 4: Wait for the sorted IMCI to finish building
Monitor progress with SELECT * FROM INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS until lineitem shows Safe to read. For estimated build time, see Time consumption.
Step 5: Re-run the query and compare
+------------+-----------------+----------------+
| L_SHIPMODE | high_line_count | low_line_count |
+------------+-----------------+----------------+
| MAIL | 623115 | 934713 |
| SHIP | 622979 | 934534 |
+------------+-----------------+----------------+
2 rows in set (0.88 sec)
| Ordered dataset (sec) | Unordered dataset (sec) |
|---|---|
| 0.88 | 4.35 |
These results are benchmarks, not guarantees. Actual execution times vary with cluster configuration, connection count, query concurrency, and system load.
Route OLAP and OLTP requests
After you add a read-only IMCI node, PolarDB automatically distributes OLAP and OLTP requests when clients connect through the cluster endpoint. PolarProxy routes a query to the IMCI node when the estimated execution cost exceeds the threshold set by loose_imci_ap_threshold (or loose_cost_threshold_for_imci in earlier engine versions) and Transactional/Analytical Processing Splitting is enabled.
-
Mixed workloads from one application: Use the cluster endpoint. The system selects the row store or IMCI node based on estimated scan cost.
-
Separate OLAP and OLTP applications: Create a custom endpoint for each application and add the appropriate nodes (row store or IMCI) to the selected nodes of each endpoint.
For configuration instructions, see HTAP-based request distribution among row store and IMCI nodes.
Enable serverless on a read-only IMCI node
If your analytical workload fluctuates unpredictably, enable the serverless feature on the IMCI node to scale compute capacity automatically. The node scales up within seconds when load increases and scales down when load decreases.
To enable serverless, go to Basic Information > Database Nodes in the cluster, then select Enable serverless. For step-by-step instructions, see Enable the serverless feature for a fixed-specification cluster.
For an overview of the serverless feature, see Overview about the serverless feature.
Performance and billing
Query acceleration: IMCI can accelerate complex analytical queries by up to 100x, with the greatest gains in SCAN, AGGREGATE, and JOIN operations.
Write overhead: Write performance decreases by less than 5% after creating IMCIs. In Sysbench oltp_insert workloads, the decrease is approximately 3%.
Storage: IMCI uses columnar compression with a 3x–10x compression ratio compared to row storage, occupying approximately 10%–30% of the equivalent row storage space.
Billing: The IMCI feature itself is free of charge. Read-only IMCI nodes are billed as common compute nodes. For details, see Billing rules for compute nodes and Billing rules for storage.
Advanced usage
For sort key tuning, partial column indexing, and other optimization techniques, see Advanced IMCI usage.
FAQ
Why doesn't my query use IMCI?
IMCI acceleration requires four conditions: a read-only IMCI node exists in the cluster, all tables and columns in the query are covered by IMCIs, the estimated execution cost exceeds the configured threshold, and the query is routed to the IMCI node. Work through the following checks to identify the cause.
1. Check whether the query is routed to the IMCI node.
Verify that the IMCI node is included in the selected nodes of the cluster endpoint and that Transactional/Analytical Processing Splitting is enabled. Use SQL Explorer to confirm which node ran the query. For instructions, see SQL Explorer.
To force a query to the IMCI node for testing, prefix it with the /*FORCE_IMCI_NODES*/ hint:
/*FORCE_IMCI_NODES*/EXPLAIN SELECT COUNT(*) FROM t1 WHERE t1.a > 1;
Alternatively, create a custom endpoint associated only with the IMCI node. For details, see HTAP-based request distribution among row store and IMCI nodes.
2. Check whether the estimated execution cost meets the threshold.
On the IMCI node, the optimizer uses IMCI only when the estimated cost exceeds the threshold set by loose_imci_ap_threshold (or loose_cost_threshold_for_imci in engine versions earlier than 8.0.1.1.39 or 8.0.2.2.23). Run EXPLAIN and then check the estimated cost:
EXPLAIN SELECT * FROM t1;
/*ROUTE_TO_LAST_USED*/SHOW STATUS LIKE 'Last_query_cost';
Use the/*ROUTE_TO_LAST_USED*/hint withSHOW STATUS LIKE 'Last_query_cost'when connected through the cluster endpoint to make sure the cost is retrieved from the correct node.
If the estimated cost is below the threshold, lower the threshold or override it for a single query:
/*FORCE_IMCI_NODES*/EXPLAIN SELECT /*+ SET_VAR(cost_threshold_for_imci=0) */ COUNT(*) FROM t1 WHERE t1.a > 1;
3. Check whether all columns in the query are covered by an IMCI.
Call the built-in stored procedure to check coverage:
CALL dbms_imci.check_columnar_index('SELECT COUNT(*) FROM t1 WHERE t1.a > 1');
If the procedure returns uncovered tables or columns, create IMCIs for them. If the result is empty, full coverage is confirmed.
4. Check IMCI usage limits.
Some SQL constructs are not supported by IMCI. Review IMCI usage limits to confirm the query is eligible.
If the query still does not use IMCI after these checks, contact us or join DingTalk group 27520023189 for expert support.
How do I create the right IMCI for a query?
An IMCI accelerates a query only when it covers all columns referenced in the query. Use the dbms_imci.columnar_advise() stored procedure to get the exact DDL statement needed:
CALL dbms_imci.columnar_advise('<your_query_string>');
For a batch of queries, use dbms_imci.columnar_advise_begin(), dbms_imci.columnar_advise(), and dbms_imci.columnar_advise_end() together. For details, see Batch obtain the DDL statements used to create IMCIs and Obtain the DDL statement used to create an IMCI.
What's next
-
What is the IMCI feature? — Architecture overview and supported scenarios
-
Advanced IMCI usage — Sort key tuning, partial indexes, and more
-
IMCI usage limits — Supported and unsupported SQL constructs
-
Specify the thresholds for automatic request distribution — Tune routing thresholds