PolarDB:Use multi-node MPP to accelerate mass data analysis

Last Updated: Mar 28, 2026

When a single read-only column store node can no longer keep up with large analytical queries — due to data volume, query complexity, or concurrent OSS table scans — multi-node massively parallel processing (MPP) lets you scale out by adding read-only column store nodes. The In-Memory Column Index (IMCI) optimizer automatically distributes qualifying queries across all nodes in the execution group, increasing available CPU and I/O throughput without changing your SQL.

How it works

Multi-node MPP forms a multi-node execution group from multiple read-only column store nodes. For each incoming SQL statement, the IMCI optimizer decides whether to run it on a single node or distribute it across the group.

The optimizer distinguishes transactional processing (TP) statements from analytical processing (AP) statements: lightweight TP statements run on a single node, while large AP queries are distributed across the execution group for parallel processing.

Scenarios

  • Reduce the latency of large queries by using the scale-out capability of multi-node MPP to bring more CPU resources and IOPS to bear on each query.

  • Increase throughput by distributing queries across multiple nodes so that more data can be cached and processed in memory.

Prerequisites

Before you enable multi-node MPP, make sure your cluster meets the following requirements:

  • A PolarDB for MySQL Enterprise Edition cluster

  • Database engine version 8.0.1 with revision version 8.0.1.1.38 or later

  • At least one read-only column store node added to the cluster. For instructions, see Add a read-only IMCI node.

Enable multi-node MPP

To enable and configure multi-node MPP, join the DingTalk group (ID: 27520023189) for technical support.

Best practices

Partition keys

PolarDB supports level-1 and level-2 partitions using HASH or KEY partitioning strategies. Multi-node MPP uses a shared-nothing strategy, in which each partition is assigned to exactly one node. This means:

  • Each node caches only its own partition, improving memory efficiency.

  • JOIN and GROUP BY operations on partition keys are processed locally on each node, eliminating cross-node data transfer.

To take advantage of this:

  • Create level-1 or level-2 HASH or KEY partitions on the columns most commonly used in JOIN and GROUP BY queries.

  • Keep the partition count identical across all HASH- and KEY-partitioned tables. If two tables have different partition counts, JOIN operations between them require cross-node data transfer and cannot be processed locally.

  • Use a large prime number as the partition count to reduce uneven data distribution and improve resource utilization.
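The guidelines above can be sketched as follows. The table and column names are hypothetical; the key points are that both tables hash-partition on the join column and use the same prime partition count:

```sql
-- Hypothetical schema: both tables are HASH-partitioned on customer_id,
-- the column used in JOIN and GROUP BY, with the same prime partition
-- count (97) so that matching partitions land on the same node.
CREATE TABLE customers (
    customer_id BIGINT NOT NULL PRIMARY KEY,
    name        VARCHAR(64)
) PARTITION BY HASH (customer_id) PARTITIONS 97;

CREATE TABLE orders (
    order_id    BIGINT NOT NULL,
    customer_id BIGINT NOT NULL,
    order_date  DATE   NOT NULL,
    PRIMARY KEY (order_id, customer_id)
) PARTITION BY HASH (customer_id) PARTITIONS 97;

-- This query joins and groups on the partition key, so each node can
-- process its own partitions without cross-node data transfer:
SELECT c.customer_id, COUNT(*)
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```

Note that MySQL requires every unique key, including the primary key, to contain the partitioning column, which is why `orders` includes `customer_id` in its primary key.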

Sort keys

For queries that filter large volumes of data, create RANGE partitions or add sort keys to the column store. Apply RANGE partitions and sort keys to columns that appear in WHERE clause predicates.

For example, for the query condition WHERE date > '2024-10-01' AND date < '2024-10-07' AND customer_id = 'X231':

  • Create RANGE partitions on the date column to skip entire date-range partitions during the scan.

  • Add IMCI sort keys on the customer_id column to skip row segments that do not match the predicate.

Together, these reduce the data volume that must be scanned and improve query response time. For more information, see IMCI overview and Configure sort keys for columnstore indexes.
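A minimal sketch of the RANGE-partitioning half of this example, using a hypothetical `events` table (the sort key on `customer_id` is configured separately, as described in Configure sort keys for columnstore indexes):

```sql
-- Hypothetical table: RANGE COLUMNS partitions on the date column let
-- the scan skip every partition outside 2024-10-01 .. 2024-10-07.
CREATE TABLE events (
    event_id    BIGINT NOT NULL,
    customer_id VARCHAR(16) NOT NULL,
    date        DATE NOT NULL,
    PRIMARY KEY (event_id, date)
) PARTITION BY RANGE COLUMNS (date) (
    PARTITION p202409 VALUES LESS THAN ('2024-10-01'),
    PARTITION p202410 VALUES LESS THAN ('2024-11-01'),
    PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);

-- Only partition p202410 needs to be scanned for this predicate:
SELECT COUNT(*)
FROM events
WHERE date > '2024-10-01' AND date < '2024-10-07'
  AND customer_id = 'X231';
```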

Verify that multi-node MPP is used

Verifying MPP usage is a two-step process: first check whether a query is capable of using MPP, then check whether the optimizer actually chose MPP for that query.

Step 1: Check whether a query can use MPP

Add the SET_VAR(imci_plan_use_mpp=forced) hint to force the optimizer to generate an MPP execution plan. If the resulting plan contains an Exchange operator, the query supports multi-node MPP execution.

EXPLAIN SELECT /*+ SET_VAR(imci_plan_use_mpp=forced) */ COUNT(*) FROM nation;

Sample output:

+----+----------------------------+--------+---------------------------------------------------------------------------------+
| ID | Operator                   | Name   | Extra Info                                                                      |
+----+----------------------------+--------+---------------------------------------------------------------------------------+
|  1 | Select Statement           |        | IMCI Execution Plan (max_dop = 11, max_query_mem = 37438953472)                 |
|  2 | └─Compute Scalar           |        |                                                                                 |
|  3 |   └─Aggregation            |        |                                                                                 |
|  4 |     └─Consume              |        | Consume ProducerPipeId: 1                                                       |
|  5 |       └─Exchange           |        | PipeId: 1, Consumers: 23377031, Producers: 23377031,23377032, Part Type: Gather |
|  6 |         └─Aggregation      |        |                                                                                 |
|  7 |           └─Table Scan     | nation |                                                                                 |
+----+----------------------------+--------+---------------------------------------------------------------------------------+

The Exchange operator in row 5 confirms that the query can run in parallel across multiple nodes.

Step 2: Check whether the optimizer chooses MPP without forcing it

After you confirm that a statement can use multi-node MPP, run EXPLAIN on it without the hint to see the plan that the optimizer actually chooses. If that plan contains the Exchange operator, the statement is executed in parallel by using multi-node MPP.
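For example, using the same query as in step 1, but without the forcing hint:

```sql
-- No SET_VAR hint this time: the optimizer makes its own choice.
-- An Exchange operator in the output means multi-node MPP was selected.
EXPLAIN SELECT COUNT(*) FROM nation;
```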

Performance test results

For benchmark results across different node configurations, see IMCI performance.