PolarDB:Use multi-node MPP to accelerate mass data analysis

Last Updated: Mar 28, 2026

When a single read-only column store node can no longer keep up with large analytical queries — due to data volume, query complexity, or concurrent OSS table scans — multi-node massively parallel processing (MPP) lets you scale out by adding read-only column store nodes. The In-Memory Column Index (IMCI) optimizer automatically distributes qualifying queries across all nodes in the execution group, increasing available CPU and I/O throughput without changing your SQL.

How it works

Multi-node MPP forms a multi-node execution group from multiple read-only column store nodes. For each incoming SQL statement, the IMCI optimizer decides whether to run it on a single node or distribute it across the group.

The optimizer distinguishes transactional processing (TP) statements from analytical processing (AP) statements: lightweight TP statements run on a single node, while large AP queries are distributed across the execution group for parallel processing.

Scenarios

  • Reduce the latency of large queries by using the scale-out capability of multi-node MPP to bring more CPU resources and IOPS to bear on each query.

  • Increase throughput by distributing queries across multiple nodes so that more data can be cached and processed in memory.

Prerequisites

Before you enable multi-node MPP, make sure your cluster meets the following requirements:

  • A PolarDB for MySQL Enterprise Edition cluster

  • Database engine version 8.0.1 with revision version 8.0.1.1.38 or later

  • At least one read-only column store node added to the cluster. For instructions, see Add a read-only IMCI node.

Enable multi-node MPP

To enable and configure multi-node MPP, join the DingTalk group (ID: 27520023189) for technical support.

Best practices

Partition keys

PolarDB supports level-1 and level-2 partitions using HASH or KEY partitioning strategies. Multi-node MPP uses a shared-nothing strategy, in which each partition is assigned to exactly one node. This means:

  • Each node caches only its own partition, improving memory efficiency.

  • JOIN and GROUP BY operations on partition keys are processed locally on each node, eliminating cross-node data transfer.

To take advantage of this:

  • Create level-1 or level-2 HASH or KEY partitions on the columns most commonly used in JOIN and GROUP BY queries.

  • Keep the partition count identical across all HASH- and KEY-partitioned tables. If two tables have different partition counts, JOIN operations between them require cross-node data transfer and cannot be processed locally.

  • Use a large prime number as the partition count to reduce uneven data distribution and improve resource utilization.
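The guidelines above can be sketched as follows. The table and column names are hypothetical; the key points are that both tables hash-partition on the join column and use the same prime partition count:

```sql
-- Hypothetical schema: both tables are HASH-partitioned on customer_id,
-- the column used in JOIN and GROUP BY, with the same prime partition
-- count (97) so that matching partitions land on the same node.
CREATE TABLE customers (
    customer_id BIGINT NOT NULL PRIMARY KEY,
    name        VARCHAR(64)
) PARTITION BY HASH (customer_id) PARTITIONS 97;

CREATE TABLE orders (
    order_id    BIGINT NOT NULL,
    customer_id BIGINT NOT NULL,
    order_date  DATE   NOT NULL,
    PRIMARY KEY (order_id, customer_id)
) PARTITION BY HASH (customer_id) PARTITIONS 97;

-- This query joins and groups on the partition key, so each node can
-- process its own partitions without cross-node data transfer:
SELECT c.customer_id, COUNT(*)
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```

Note that MySQL requires every unique key, including the primary key, to contain the partitioning column, which is why `orders` includes `customer_id` in its primary key.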

Sort keys

For queries that filter large volumes of data, create RANGE partitions or add sort keys to the column store. Apply RANGE partitions and sort keys to columns that appear in WHERE clause predicates.

For example, for the query condition WHERE date > '2024-10-01' AND date < '2024-10-07' AND customer_id = 'X231':

  • Create RANGE partitions on the date column to skip entire date-range partitions during the scan.

  • Add IMCI sort keys on the customer_id column to skip row segments that do not match the predicate.

Together, these reduce the data volume that must be scanned and improve query response time. For more information, see IMCI overview and Configure sort keys for columnstore indexes.
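A minimal sketch of the RANGE-partitioning half of this example, using a hypothetical `events` table (the sort key on `customer_id` is configured separately, as described in Configure sort keys for columnstore indexes):

```sql
-- Hypothetical table: RANGE COLUMNS partitions on the date column let
-- the scan skip every partition outside 2024-10-01 .. 2024-10-07.
CREATE TABLE events (
    event_id    BIGINT NOT NULL,
    customer_id VARCHAR(16) NOT NULL,
    date        DATE NOT NULL,
    PRIMARY KEY (event_id, date)
) PARTITION BY RANGE COLUMNS (date) (
    PARTITION p202409 VALUES LESS THAN ('2024-10-01'),
    PARTITION p202410 VALUES LESS THAN ('2024-11-01'),
    PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);

-- Only partition p202410 needs to be scanned for this predicate:
SELECT COUNT(*)
FROM events
WHERE date > '2024-10-01' AND date < '2024-10-07'
  AND customer_id = 'X231';
```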

Verify that multi-node MPP is used

Verifying MPP usage is a two-step process: first check whether a query is capable of using MPP, then check whether the optimizer actually chose MPP for that query.

Step 1: Check whether a query can use MPP

Add the SET_VAR(imci_plan_use_mpp=forced) hint to force the optimizer to generate an MPP execution plan. If the resulting plan contains an Exchange operator, the query supports multi-node MPP execution.

EXPLAIN SELECT /*+ SET_VAR(imci_plan_use_mpp=forced) */ COUNT(*) FROM nation;

Sample output:

+----+----------------------------+--------+---------------------------------------------------------------------------------+
| ID | Operator                   | Name   | Extra Info                                                                      |
+----+----------------------------+--------+---------------------------------------------------------------------------------+
|  1 | Select Statement           |        | IMCI Execution Plan (max_dop = 11, max_query_mem = 37438953472)                 |
|  2 | └─Compute Scalar           |        |                                                                                 |
|  3 |   └─Aggregation            |        |                                                                                 |
|  4 |     └─Consume              |        | Consume ProducerPipeId: 1                                                       |
|  5 |       └─Exchange           |        | PipeId: 1, Consumers: 23377031, Producers: 23377031,23377032, Part Type: Gather |
|  6 |         └─Aggregation      |        |                                                                                 |
|  7 |           └─Table Scan     | nation |                                                                                 |
+----+----------------------------+--------+---------------------------------------------------------------------------------+

The Exchange operator in row 5 confirms that the query can run in parallel across multiple nodes.

Step 2: Check whether the optimizer chooses MPP without forcing it

After you confirm that a statement can use multi-node MPP, run EXPLAIN on it without the hint to see the plan that the optimizer actually chooses. If that plan contains the Exchange operator, the statement is executed in parallel by using multi-node MPP.
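For example, using the same query as in step 1, but without the forcing hint:

```sql
-- No SET_VAR hint this time: the optimizer makes its own choice.
-- An Exchange operator in the output means multi-node MPP was selected.
EXPLAIN SELECT COUNT(*) FROM nation;
```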

Performance test results

For benchmark results across different node configurations, see IMCI performance.