Frequently asked questions about the In-Memory Column Index (IMCI) feature of PolarDB for MySQL.
How do I enable IMCI?
Enabling IMCI requires three steps:
-
Add a column store read-only node to your cluster. See Add a read-only node with the IMCI feature enabled.
-
Add a columnstore index to the tables you want to accelerate. Use
CREATE TABLEorALTER TABLEwithCOLUMNAR=1in theCOMMENTfield. After the index is ready, the optimizer selects it automatically based on execution cost. See DDL syntax for creating a columnstore index. -
Route SQL statements to the column store node. With automatic request distribution enabled on the cluster endpoint, the database proxy forwards queries whose estimated cost exceeds the
imci_ap_thresholdthreshold automatically. See Configure a cluster endpoint to implement automatic request distribution between row store and column store nodes.
How do I check the status of a columnstore index?
After using ALTER TABLE to add a columnstore index, the index is built asynchronously on the column store node. Query INFORMATION_SCHEMA.IMCI_INDEXES to get the build status.
| Status | Meaning |
|---|---|
COMMITTED |
Index is ready and available for queries |
| Other values | Index is still being built |
For an index that is still building, query INFORMATION_SCHEMA.IMCI_ASYNC_DDL_STATS to track build progress. See View the index status.
Connect to the cluster endpoint that has automatic request distribution enabled, or connect directly to the column store node, to run these queries.
When you log in to a database using Data Management (DMS), DMS connects to the primary endpoint by default. To connect to a different endpoint:
Cluster endpoint: Log in to DMS 5.0DMS 5.0. On the Add Instance page, set Entry Method to Connection String and enter the cluster endpoint. See Add an ApsaraDB instance.
Column store node directly: Create a custom cluster endpoint that contains only the target column store node. Then log in to DMS 5.0DMS 5.0, set Entry Method to Connection String, and enter that custom endpoint. See Add an ApsaraDB instance.
How do I verify that a query is using the columnstore index?
Run EXPLAIN on the query. If the execution plan contains IMCI Execution Plan, the columnstore index is active.
EXPLAIN SELECT COUNT(*) FROM t1 WHERE t1.a > 1;
Sample output:
*************************** 1. row ***************************
IMCI Execution Plan (max_dop = 8, max_query_mem = 3435134976):
Project | Exprs: temp_table3.lineitem.L_ORDERKEY, temp_table3.SUM(lineitem.L_EXTENDEDPRICE * 1.00 - lineitem.L_DISCOUNT), temp_table3.orders.O_ORDERDATE, temp_table3.orders.O_SHIPPRIORITY
Sort | Exprs: temp_table3.SUM(lineitem.L_EXTENDEDPRICE * 1.00 - lineitem.L_DISCOUNT) DESC,temp_table3.orders.O_ORDERDATE ASC
HashGroupby | OutputTable(3): temp_table3 | Grouping: lineitem.L_ORDERKEY orders.O_ORDERDATE orders.O_SHIPPRIORITY | Output Grouping: lineitem.L_ORDERKEY, orders.O_ORDERDATE, orders.O_SHIPPRIORITY | Aggrs: SUM(lineitem.L_EXTENDEDPRICE * 1.00 - lineitem.L_DISCOUNT)
HashJoin | HashMode: DYNAMIC | JoinMode: INNER | JoinPred: orders.O_ORDERKEY = lineitem.L_ORDERKEY
HashJoin | HashMode: DYNAMIC | JoinMode: INNER | JoinPred: orders.O_CUSTKEY = customer.C_CUSTKEY
CTableScan | InputTable(0): orders | Pred: (orders.O_ORDERDATE < 03/24/1995 00:00:00.000000)
CTableScan | InputTable(1): customer | Pred: (customer.C_MKTSEGMENT = "BUILDING")
CTableScan | InputTable(2): lineitem | Pred: (lineitem.L_SHIPDATE > 03/24/1995 00:00:00.000000)
1 row in set (0.04 sec)
The execution plan is a tree structure where each layer represents one operator:
| Operator | Corresponds to |
|---|---|
CTableScan |
A column store table scan |
HashJoin |
The JOIN clause |
HashGroupby |
The GROUP BY clause |
Project |
Column projection |
Sort |
The ORDER BY clause |
Sequence |
Optimizer-generated; no direct SQL counterpart |
The header line also shows max_dop (the degree of parallelism used) and max_query_mem (the memory limit for the query).
Why isn't the query using the columnstore index?
Work through the following checks in order. Stop as soon as you find the cause.
Step 1: Confirm the query reaches the column store node
Use SQL Explorer to verify the query was routed to the column store node.
If it wasn't, add the /*FORCE_IMCI_NODES*/ hint before SELECT to force routing:
/*FORCE_IMCI_NODES*/EXPLAIN SELECT COUNT(*) FROM t1 WHERE t1.a > 1;
Alternatively, create a custom endpoint pointing only to the column store node so all queries go there by default.
With automatic request distribution enabled, the database proxy routes queries whose estimated cost exceeds imci_ap_threshold to the column store node automatically. See Configure automatic request distribution.
Step 2: Check whether the execution cost meets the threshold
On the column store node, the optimizer uses the columnstore index only when the estimated execution cost exceeds cost_threshold_for_imci. Run the following to check the estimated cost of your query:
EXPLAIN SELECT * FROM t1;
SHOW STATUS LIKE 'Last_query_cost';
If the cost is below the threshold, lower cost_threshold_for_imci for that statement using a hint:
/*FORCE_IMCI_NODES*/EXPLAIN SELECT /*+ SET_VAR(cost_threshold_for_imci=0) */ COUNT(*) FROM t1 WHERE t1.a > 1;
Step 3: Check whether the columnstore index covers all required columns
Use the built-in stored procedure to check coverage:
CALL dbms_imci.check_columnar_index('SELECT COUNT(*) FROM t1 WHERE t1.a > 1');
-
Empty result set: The query is fully covered. Move to step 4.
-
Non-empty result set: The returned rows list the uncovered tables and columns. Add those columns to the columnstore index.
Step 4: Check for unsupported SQL features
Some features—such as certain spatiotemporal expressions, full-text indexes, and some forms of correlated subqueries—are not supported by the IMCI engine. Review the list of limits to confirm whether your query uses any unsupported feature. Queries using unsupported features fall back to the row-store index automatically.
If all four checks pass, the query should use the columnstore index.
How do I control whether a query uses the columnstore index?
Three mechanisms are available, from broadest to narrowest scope:
| Mechanism | Scope | How |
|---|---|---|
| Automatic request distribution | Cluster-wide | Enable on the cluster endpoint; the database proxy routes queries based on imci_ap_threshold |
| Custom endpoint | Connection-level | Connect through an endpoint that points only to the column store node |
| SQL hint | Per-statement | Add a hint directly in the SQL statement |
Force a query to use the columnstore index:
SELECT /*+ SET_VAR(cost_threshold_for_imci=0) */ COUNT(*) FROM t1 WHERE t1.a > 1;
Force a query to the column store node:
/*FORCE_IMCI_NODES*/SELECT COUNT(*) FROM t1 WHERE t1.a > 1;
Force a query to skip the columnstore index:
SELECT /*+ SET_VAR(USE_IMCI_ENGINE=OFF) */ COUNT(*) FROM t1 WHERE t1.a > 1;
A column store node is a regular read-only node with IMCI enabled, so it can use both row-store and columnstore indexes. The optimizer selects between them based on cost_threshold_for_imci.
How do I add a columnstore index for a specific query?
Use the dbms_imci.columnar_advise() stored procedure to get the DDL statement needed to fully cover a query:
CALL dbms_imci.columnar_advise('<query_string>');
Execute the returned DDL statement to create the columnstore index. The query will then be fully covered.
On a Multi-master Cluster (Limitless) Edition, run this stored procedure on a global read-only node. Use the /*force_node='<Node ID>'*/ hint to target the correct node:
/*force_node='pi-bpxxxxxxxx'*/ CALL dbms_imci.columnar_advise('<query_string>');
To get DDL statements for a batch of queries, use the dbms_imci.columnar_advise_begin() and dbms_imci.columnar_advise_end() interfaces together with dbms_imci.columnar_advise(). See Obtain DDL statements for creating columnstore indexes in batches.
PolarDB for MySQL IMCI: Does it support single-node parallel queries? If so, how can you customize the degree of parallelism for a specific SQL statement?
Single-node parallel query is enabled by default. You can use EXPLAIN to view the execution plan. The max_dop field indicates the actual degree of parallelism used. To customize the degree of parallelism for a specific SQL statement, you can set the imci_max_dop parameter at the session level before you execute the SQL statement. Example:
set imci_max_dop=8; explain select xxxx
Does IMCI support parallel queries?
Single-node parallel query is enabled by default. The max_dop field in the EXPLAIN output shows the actual degree of parallelism used.
To set the degree of parallelism for a specific query, configure imci_max_dop at the session level before running the query:
SET imci_max_dop = 8;
EXPLAIN SELECT ...;
Why is the column store node using high CPU or memory?
By default, each query can use all available CPU cores. When multiple queries run concurrently, the internal scheduler dynamically limits the CPU and memory allocated per query. As a result, column store nodes typically run at higher average CPU and memory utilization than row store nodes.
To limit per-query CPU usage, lower imci_max_dop:
SET imci_max_dop = 4;
For monitoring, set these thresholds:
-
CPU: alert at 70%
-
Memory: alert at 90%
A column store node can have different specifications than other nodes in the same cluster. The minimum recommended configuration is 8 cores and 16 GB of memory. Upgrade or downgrade the node independently as needed.
Which PolarDB for MySQL versions support IMCI?
IMCI is supported on PolarDB for MySQL 8.0 only. PolarDB for MySQL 5.6 and 5.7 do not support IMCI.
Is IMCI compatible with MySQL? Does it support full-text indexes?
IMCI is fully compatible with MySQL. A small number of less common query features are not yet fully supported:
-
Certain spatiotemporal expressions
-
Full-text indexes
-
Some forms of correlated subqueries
Queries using these features fall back to the row-store index automatically. For the complete list, see Limits.
Can INSERT INTO SELECT and CREATE TABLE AS SELECT use the columnstore index?
Not directly. INSERT and CREATE statements run on the primary (RW) node, while IMCI queries run on read-only nodes. To accelerate INSERT INTO SELECT or CREATE TABLE AS SELECT with a columnstore index, use the IMCI ETL feature. See Use a columnstore index to accelerate ETL.
Is IMCI free? Can I use it on a hot standby-disabled read-only node?
The IMCI feature itself is not charged separately. Using IMCI requires a dedicated read-only node with IMCI enabled. You are charged for:
-
The new read-only node
-
The extra storage consumed by the columnstore indexes
Hot standby-disabled read-only nodes do not support IMCI.
How much extra storage does a columnstore index use?
A columnstore index stores data by column rather than by row, which gives a compression ratio 3 to 10 times higher than row store. However, maintaining this separate column-organized copy increases total storage by 10% to 30%.
How do I view the storage space used by a columnstore index?
The system tables to query differ by cluster version.
PolarDB for MySQL 8.0.1.1.32 and earlier
Query information_schema.imci_columns for both storage space and column compression ratio:
SELECT
SCHEMA_NAME, TABLE_NAME,
SUM(EXTENT_SIZE * TOTAL_EXTENT_COUNT) AS TOTAL_SIZE,
SUM(EXTENT_SIZE * USED_EXTENT_COUNT) AS USED_SIZE,
SUM(EXTENT_SIZE * FREE_EXTENT_COUNT) AS FREE_SIZE,
SUM(RAW_DATA_SIZE) / SUM(FILE_SIZE) AS COMPRESS
FROM
information_schema.imci_columns
WHERE SCHEMA_NAME = 'test' AND TABLE_NAME = 'test';
PolarDB for MySQL 8.0.1.1.33 and later
Query INFORMATION_SCHEMA.IMCI_DATA_FILES for storage space and INFORMATION_SCHEMA.IMCI_COLUMNS for compression ratio separately:
-- Storage space
SELECT
SCHEMA_NAME, TABLE_NAME,
SUM(EXTENT_SIZE * TOTAL_EXTENT_COUNT) AS TOTAL_SIZE,
SUM(EXTENT_SIZE * USED_EXTENT_COUNT) AS USED_SIZE,
SUM(EXTENT_SIZE * FREE_EXTENT_COUNT) AS FREE_SIZE
FROM
INFORMATION_SCHEMA.IMCI_DATA_FILES
WHERE SCHEMA_NAME = 'test' AND TABLE_NAME = 'test';
-- Column compression ratio
SELECT
SCHEMA_NAME, TABLE_NAME,
SUM(RAW_DATA_SIZE) / SUM(CMP_DATA_SIZE) AS COMPRESS_RATIO
FROM
INFORMATION_SCHEMA.IMCI_COLUMNS
WHERE SCHEMA_NAME = 'test' AND TABLE_NAME = 'test';
Parameter reference
| Parameter | Description |
|---|---|
SCHEMA_NAME |
Database name |
TABLE_NAME |
Table name |
EXTENT_SIZE |
Size of one extent, in bytes |
TOTAL_EXTENT_COUNT |
Total number of extents |
USED_EXTENT_COUNT |
Number of used extents |
FREE_EXTENT_COUNT |
Number of free extents |
RAW_DATA_SIZE |
Column data size before compression, in bytes |
FILE_SIZE |
Column data size after compression, in bytes (versions earlier than 8.0.1.1.33) |
CMP_DATA_SIZE |
Column data size after compression, in bytes (8.0.1.1.33 and later) |
Why doesn't instant DDL work after adding a columnstore index?
Behavior differs by version:
| Version | Behavior |
|---|---|
| Earlier than 8.0.1.1.42 and 8.0.2.2.23 | Instant column addition is not supported on tables with a table-level IMCI. Adding a column triggers a full data rebuild, including the index. |
| 8.0.1.1.42 and later, 8.0.2.2.23 and later | Instant DDL is supported on tables with a table-level IMCI. Set imci_enable_add_column_instant_ddl to OFF and ensure the table has a primary key. |
The instant DDL implementation in 8.0.1.1.42 and later is not compatible with the rebuild mode of earlier versions.
How do I view or delete a columnstore index added by AutoIndex?
To view AutoIndex-created columnstore indexes:
SELECT * FROM information_schema.imci_autoindex_executed;
To delete one, use the same command as for a manually created index:
ALTER TABLE t1 COMMENT 'columnar=0';
Why does ALTER TABLE take longer after a columnstore index is added?
When you add or remove columns, the table data must be rebuilt. If the table has a columnstore index, the index data must be rebuilt as well, and that rebuild writes to the redo log. Because a columnstore index typically covers many columns, the redo log volume is proportional to the original table size, which increases I/O and extends the overall execution time.
Does adding a columnstore index affect write performance?
The impact is within 5%. In Sysbench oltp_insert workload tests, write throughput dropped by approximately 3% after adding a columnstore index.
Which transaction isolation levels does IMCI support?
Columnstore indexes support READ_COMMITTED and REPEATABLE_READ.
For REPEATABLE_READ, connect through a custom endpoint that contains only column store read-only nodes.
Some ecosystem tools such as Metabase BI may set the transaction isolation level to an unsupported value such asREAD_UNCOMMITTED. This behavior affects PolarDB for MySQL 8.0.1.1.40 and later, and 8.0.2.2.21 and later. To work around this, runSET imci_ignore_unsupported_isolation_level=ON, or add it to the ODBC/JDBC connection string. For example, in Metabase, addsession Variables=imci_ignore_unsupported_isolation_level='ON'to the connection string.
Does the columnstore index accelerate fuzzy searches?
Yes. The columnstore index supports several fuzzy search optimizations: LIKE PRUNER, NGRAM LIKE, and SMID LIKE. The columnstore full-text index also accelerates fuzzy search workloads.