All Products
Search
Document Center

PolarDB:Instructions

Last Updated:Mar 28, 2026

In-memory column index (IMCI) accelerates complex, time-consuming SQL queries in hybrid transactional and analytical processing (HTAP) workloads. This guide walks you through the setup process and covers advanced options for fine-tuning IMCI to meet your requirements.

Prerequisites

Before you begin, ensure that you have:

  • A PolarDB for MySQL cluster

  • Permissions to add or remove nodes and modify cluster endpoint settings

Quick start

Complete the following steps to enable HTAP query acceleration on your cluster.

Step 1: Add a read-only IMCI node

Log on to the PolarDB console, select the region where your cluster is located, and find the target cluster in the Clusters list. In the Actions column, select Add/Remove Node to add a read-only IMCI node.

image
If you already added read-only IMCI nodes when purchasing the cluster, skip this step. Your cluster must also meet the version requirements for IMCI. For details, see Add a read-only column store node.

Step 2: Configure request routing

Choose how to route queries between row store nodes and IMCI nodes.

Automatic request distribution among row store and IMCI nodes

PolarDB automatically routes each read request to either a read-only IMCI node or a read-only row store node based on the estimated execution cost (number of rows scanned). Use this mode when a single application sends both OLAP and OLTP requests.

On the cluster details page, go to the Database Connections section. For the Cluster Endpoint, set Transactional/Analytical Processing Splitting to On.

image
Important

Manual request distribution among row store and IMCI nodes

If different applications handle OLAP and OLTP requests separately, create a custom endpoint that includes only Read-only IMCI Nodes as service nodes. Point your OLAP application to this endpoint.

On the cluster details page, go to the Database Connections section and create a custom endpoint. Set the service nodes to Read-only IMCI Nodes only.

image

Step 3: Add IMCIs to your tables

Important

All columns referenced in a SQL statement must be covered by an IMCI to accelerate that query.

Manually add IMCIs

IMCI supports the following DDL statements to add or remove column indexes on your tables:

DDL syntaxExample
Create IMCIs when creating a table-- Add IMCI for a specific column<br>CREATE TABLE <table_name>(<br> <column_name_1> INT COMMENT 'COLUMNAR=1',<br> <column_name_2> VARCHAR(100)<br>) ENGINE InnoDB;<br><br>-- Add IMCI for the entire table<br>CREATE TABLE <table_name>(<br> <column_name_1> INT,<br> <column_name_2> VARCHAR(100)<br>) ENGINE InnoDB COMMENT 'COLUMNAR=1';
Dynamically add or delete IMCIs-- Add IMCI for a specific column<br>ALTER TABLE <table_name> MODIFY COLUMN <column_name_1> INT COMMENT 'COLUMNAR=1';<br><br>-- Add IMCI for the entire table<br>ALTER TABLE <table_name> COMMENT 'COLUMNAR=1';
Manage table-level IMCIs-- Add IMCI for an entire table<br>CREATE COLUMNAR INDEX ON <db_name>.<table_name>;<br><br>-- Add IMCI for an entire database<br>CREATE COLUMNAR INDEX FOR TABLES IN <db_name>;
To preserve existing table comments, append COLUMNAR=1 before them: ALTER TABLE <table_name> COMMENT 'COLUMNAR=1 <original comment>';
When adding IMCIs in batches at the database or table level, the table comment is automatically updated to 'COLUMNAR=1 <original comment>'.

Automatically add IMCIs

The AutoIndex feature monitors slow queries and automatically creates IMCIs to accelerate them. As workloads change, AutoIndex continuously adjusts the IMCI strategy to maintain optimal performance — without requiring manual query analysis.

On the cluster details page, enable the AutoIndex feature:

image

Step 4 (Optional): Check IMCI build progress

After adding IMCIs manually, verify that the build is complete before expecting query acceleration. Run the following query:

SELECT * FROM INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS;

The output looks similar to this:

+-------------+------------+---------------------+---------------------+---------------------+--------------+------------------+-----------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+
| SCHEMA_NAME | TABLE_NAME | CREATED_AT          | STARTED_AT          | FINISHED_AT         | STATUS       | APPROXIMATE_ROWS | SCANNED_ROWS    | SCAN_SECOND | SORT_ROUNDS | SORT_SECOND | BUILD_ROWS | BUILD_SECOND | AVG_SPEED | SPEED_LAST_SECOND | ESTIMATE_SECOND |
+-------------+------------+---------------------+---------------------+---------------------+--------------+------------------+-----------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+
| tpch        | lineitem   | 2024-10-21 13:44:02 | 2024-10-21 13:44:02 | 2024-10-21 13:50:11 | Safe to read | 590446240        | 600037902(100%) | 369         | 0           | 0           | 0(0%)      | 0            | 1625058   | 0                 | 0               |
+-------------+------------+---------------------+---------------------+---------------------+--------------+------------------+-----------------+-------------+-------------+-------------+------------+--------------+-----------+-------------------+-----------------+
1 row in set, 1 warning (0.00 sec)

Check the STATUS column to determine build progress:

  • `Safe to read`: The IMCI is fully built and queries on this table can be accelerated.

  • Any other status: The build is still in progress. Wait and re-run the query.

For more information, see View DDL execution speed and build progress for IMCIs.

Step 5 (Optional): Verify that queries use IMCI

Run EXPLAIN on your query to confirm it uses the IMCI execution path:

EXPLAIN SELECT COUNT(*) FROM t1 WHERE t1.a > 1;

IMCI execution plan (horizontal tree format)

If the query uses IMCI, the output shows a horizontal tree structure:

+----+----------------------------+----------+--------+--------+-----------------------------------------------------------------------------+
| ID | Operator                   | Name     | E-Rows | E-Cost | Extra Info                                                                  |
+----+----------------------------+----------+--------+--------+-----------------------------------------------------------------------------+
|  1 | Select Statement           |          |        |        | IMCI Execution Plan (max_dop = 4, max_query_mem = 858993459)                |
|  2 | └─Sort                     |          |        |        | Sort Key: revenue DESC,o_orderdate ASC                                      |
|  3 |   └─Hash Groupby           |          |        |        | Group Key: (lineitem.L_ORDERKEY, orders.O_ORDERDATE, orders.O_SHIPPRIORITY) |
|  4 |     └─Hash Join            |          |        |        | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY                          |
|  5 |       ├─Hash Join          |          |        |        | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY                            |
|  6 |       │ ├─Table Scan       | customer |        |        | Cond: (C_MKTSEGMENT = "BUILDING")                                           |
|  7 |       │ └─Table Scan       | orders   |        |        | Cond: (O_ORDERDATE < 03/24/1995)                                            |
|  8 |       └─Table Scan         | lineitem |        |        | Cond: (L_SHIPDATE > 03/24/1995)                                             |
+----+----------------------------+----------+--------+--------+-----------------------------------------------------------------------------+
8 rows in set (0.01 sec)

The key signal is the Extra Info field on row 1: IMCI Execution Plan. This confirms the query is running on IMCI.

Row store execution plan

If the query uses the row store instead, the output follows the standard MySQL tabular format without the IMCI Execution Plan header:

+----+-------------+----------+------------+------+--------------------+------------+---------+-----------------------------+--------+----------+----------------------------------------------+
| id | select_type | table    | partitions | type | possible_keys      | key        | key_len | ref                         | rows   | filtered | Extra                                        |
+----+-------------+----------+------------+------+--------------------+------------+---------+-----------------------------+--------+----------+----------------------------------------------+
|  1 | SIMPLE      | customer | NULL       | ALL  | PRIMARY            | NULL       | NULL    | NULL                        | 147630 |    10.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | orders   | NULL       | ref  | PRIMARY,ORDERS_FK1 | ORDERS_FK1 | 4       | tpch100g.customer.C_CUSTKEY |     14 |    33.33 | Using where                                  |
|  1 | SIMPLE      | lineitem | NULL       | ref  | PRIMARY            | PRIMARY    | 4       | tpch100g.orders.O_ORDERKEY  |      4 |    33.33 | Using where                                  |
+----+-------------+----------+------------+------+--------------------+------------+---------+-----------------------------+--------+----------+----------------------------------------------+
3 rows in set, 1 warning (0.00 sec)

For more information, see Check whether a query can use IMCIs.

Auxiliary tools

When troubleshooting IMCI coverage for a specific query, use the following built-in stored procedures to check coverage and generate the DDL needed to add missing IMCIs.

FAQ

Why doesn't my SQL query use IMCI?

For IMCI to accelerate a query, all of the following conditions must be met:

  1. A read-only IMCI node is added to the cluster.

  2. IMCIs are created on all tables and columns referenced in the query.

  3. The query's estimated execution cost exceeds the configured threshold.

  4. The query is routed to the read-only IMCI node.

If a query still doesn't use IMCI, work through the following checks.

Check 1: Confirm the query reaches the read-only IMCI node

PolarProxy routes a query to the read-only IMCI node only when:

  • The query accesses the cluster through the cluster endpoint

  • Transactional/Analytical Processing Splitting is On for that endpoint

  • The estimated execution cost exceeds the threshold set by loose_imci_ap_threshold or loose_cost_threshold_for_imci

loose_imci_ap_threshold replaces loose_cost_threshold_for_imci in database engine minor versions 8.0.1.1.39 and later, or 8.0.2.2.23 and later.

To confirm the query is reaching the IMCI node, first check whether the read-only IMCI node is included in the selected nodes of the cluster endpoint. Then use SQL Explorer to confirm the query is executing on the IMCI node. Alternatively, force the query to the IMCI node using the /*FORCE_IMCI_NODES*/ hint:

/*FORCE_IMCI_NODES*/EXPLAIN SELECT COUNT(*) FROM t1 WHERE t1.a > 1;

For a permanent routing solution, create a custom endpoint associated only with the read-only IMCI node. See HTAP-based request distribution.

Check 2: Confirm the estimated execution cost exceeds the threshold

On the IMCI node, the optimizer compares the query's estimated cost against the threshold. If the cost is below the threshold, the optimizer uses row indexes instead of IMCI.

Run EXPLAIN followed by SHOW STATUS to compare the estimated cost against the threshold:

-- Get the execution plan
EXPLAIN SELECT * FROM t1;
-- Get the estimated cost of the previous query
SHOW STATUS LIKE 'Last_query_cost';
When connecting through the cluster endpoint, add the /*ROUTE_TO_LAST_USED*/ hint to the SHOW STATUS statement so it runs on the same node as the EXPLAIN: /*ROUTE_TO_LAST_USED*/SHOW STATUS LIKE 'Last_query_cost';

If the estimated cost is below the threshold, lower the threshold for loose_imci_ap_threshold or loose_cost_threshold_for_imci. To override the threshold for a single query, use a HINT:

/*FORCE_IMCI_NODES*/EXPLAIN SELECT /*+ SET_VAR(cost_threshold_for_imci=0) */ COUNT(*) FROM t1 WHERE t1.a > 1;

Check 3: Confirm IMCI coverage for all columns in the query

Call the built-in stored procedure to check whether all tables and columns in the query are covered by an IMCI:

CALL dbms_imci.check_columnar_index('SELECT COUNT(*) FROM t1 WHERE t1.a > 1');
  • If any columns are missing, the stored procedure returns a list of uncovered tables and columns. Create IMCIs for those columns.

  • If all columns are covered, the stored procedure returns an empty result set.

See Check whether an IMCI covers all columns in an SQL statement.

Check 4: Confirm the query is supported by IMCI

Review the IMCI usage limits to determine whether the query type is supported.

If the query still doesn't use IMCI after completing these checks, see the IMCI FAQ or contact support.

Advanced usage

After completing the quick start, use the following features to optimize IMCI for your workload.

FeatureDescription
Configure sort keys for IMCIsIMCI organizes data into row groups. In each row group, column data blocks are built in parallel based on the primary key order of the raw row data, which results in an overall unordered state. Adding a sort key reorders the column data blocks to improve query performance for specific access patterns.
Analyze JSON-formatted dataIMCI integrates column-oriented storage for JSON, virtual columns, Instant DDL, and extended maximum column counts for large-scale structured and semi-structured data analysis.
Accelerate ETL with IMCIThe ETL feature enables IMCI on read-write (RW) nodes. SELECT statements issued on RW nodes are routed to read-only IMCI nodes for acceleration, and the results are transferred back to the RW node through the internal network for writing to the target table.
Enable serverless on a read-only IMCI nodeServerless automatically scales the IMCI node up during peak hours and down during off-peak hours to reduce costs.
Accelerate wide table queries with hybrid plansA hybrid plan uses both column indexes and row indexes in a single query. IMCI-eligible sub-plans run on IMCI to produce intermediate results containing only primary key information, then the InnoDB primary index retrieves the remaining columns.
Scale out with multi-node MPPFor complex queries on massive datasets where a single IMCI node is insufficient, multi-node parallel processing distributes the query across multiple IMCI nodes.

Related information

For a deeper understanding of how IMCI works: