When a single read-only column store node can no longer keep up with large analytical queries — due to data volume, query complexity, or concurrent OSS table scans — multi-node massively parallel processing (MPP) lets you scale out by adding read-only column store nodes. The In-Memory Column Index (IMCI) optimizer automatically distributes qualifying queries across all nodes in the execution group, increasing available CPU and I/O throughput without changing your SQL.
How it works
Multi-node MPP forms a multi-node execution group from multiple read-only column store nodes. For each incoming SQL statement, the IMCI optimizer decides whether to run it on a single node or distribute it across the group.
The optimizer accurately determines the transactional processing (TP) method of SQL statements and selects single-node execution or multi-node parallel execution for large analytical processing (AP) workloads.
Scenarios
Use the scaling capability provided by multi-node MPP to increase the CPU resources and IOPS used in queries to reduce latency.
Implement in-memory query handling by distributing queries to multiple nodes for higher throughput.
Prerequisites
Before you enable multi-node MPP, make sure your cluster meets the following requirements:
A PolarDB for MySQL Enterprise Edition cluster
Version 8.0.1, revision version 8.0.1.1.38 or later
At least one read-only column store node added to the cluster. For instructions, see Add a read-only IMCI node.
Enable multi-node MPP
To enable and configure multi-node MPP, join the DingTalk group (ID: 27520023189) for technical support.
Best practices
Partition keys
PolarDB supports level-1 and level-2 partitions using HASH or KEY partitioning strategies. Multi-node MPP uses a share-nothing strategy, where each partition is assigned to exactly one node. This means:
Each node caches only its own partition, improving memory efficiency.
JOINandGROUP BYoperations on partition keys are processed locally on each node, eliminating cross-node data transfer.
To take advantage of this:
Create level-1 or level-2 HASH or KEY partitions on the columns most commonly used in
JOINandGROUP BYqueries.Keep the partition count identical across all HASH and KEY tables. If two tables have different partition counts,
JOINoperations between them require cross-node data transfer and cannot be processed locally.Use a large prime number as the partition count to reduce uneven data distribution and improve resource utilization.
Sort keys
For queries that filter large volumes of data, create RANGE partitions or add sort keys to the column store. Apply RANGE partitions and sort keys to columns that appear in WHERE clause predicates.
For example, for the query condition WHERE date > '2024-10-01' AND date < '2024-10-07' AND customer_id = 'X231':
Create RANGE partitions on the
datecolumn to skip entire date-range partitions during the scan.Add IMCI sort keys on the
customer_idcolumn to skip row segments that do not match the predicate.
Together, these reduce the data volume that must be scanned and improve query response time. For more information, see IMCI overview and Configure sort keys for columnstore indexes.
Verify that multi-node MPP is used
Verifying MPP usage is a two-step process: first check whether a query is capable of using MPP, then check whether the optimizer actually chose MPP for that query.
Step 1: Check whether a query can use MPP
Add the SET_VAR(imci_plan_use_mpp=forced) hint to force the optimizer to generate an MPP execution plan. If the resulting plan contains an Exchange operator, the query supports multi-node MPP execution.
EXPLAIN SELECT /*+ SET_VAR(imci_plan_use_mpp=forced) */ COUNT(*) FROM nation;Sample output:
+----+----------------------------+--------+---------------------------------------------------------------------------------+
| ID | Operator | Name | Extra Info |
+----+----------------------------+--------+---------------------------------------------------------------------------------+
| 1 | Select Statement | | IMCI Execution Plan (max_dop = 11, max_query_mem = 37438953472) |
| 2 | └─Compute Scalar | | |
| 3 | └─Aggregation | | |
| 4 | └─Consume | | Consume ProducerPipeId: 1 |
| 5 | └─Exchange | | PipeId: 1, Consumers: 23377031, Producers: 23377031,23377032, Part Type: Gather |
| 6 | └─Aggregation | | |
| 7 | └─Table Scan | nation | |
+----+----------------------------+--------+---------------------------------------------------------------------------------+The Exchange operator in row 5 confirms that the query can run in parallel across multiple nodes.
Step 2: Check whether the optimizer chooses MPP without forcing it
After you determine that multi-node MPP can be used to execute an SQL statement, you can view the execution plan of the SQL statement to check whether multi-node MPP will be used in the execution of the SQL statement. If the execution plan contains the Exchange operator, the SQL statement will be executed in parallel by using multi-node MPP.
Performance test results
For benchmark results across different node configurations, see IMCI performance.