PolarDB Hands-on | Multi-node Elastic Parallel Query

Overview

PolarDB for MySQL 8.0 provides the elastic Parallel Query(ePQ) feature. The feature is automatically enabled to reduce the response time of queries when the volume of data you query exceeds a specified threshold.

The principle of parallel query is to split a large query task into multiple sub-query tasks, and distribute the sub-tasks to multiple threads for concurrent processing. The core idea is to use the concurrent processing capability of multi-core chips to speed up the query. Elastic parallel query also supports distributing subtasks to multiple nodes for execution, have better elastic expansion ability.

Business Scenario

The elastic parallel query feature is applicable to most SELECT statements, such as queries on large tables, multi-table queries that use JOIN statements, and queries on a large amount of data. This feature does not benefit extremely short queries. The diverse parallel methods make the feature suitable for multiple scenarios.

Analytic Queries on Vast Amounts of Data

If medium or large amounts of data is involved, SQL statements for analytic queries are often complex and time-consuming. You can enable the elastic parallel query feature to linearly reduce the response time.

The steps is based on that we have an PolarDB MySQL instance, and we connected a session to the instance. The following PolarDB cluster and test data is used in the example:

The node specification is 8 Cores and 64GB Memory.
Generate test data by TPC-H tools, the scale factor of data is 100 (100GB).

Step 1: Set Cluster Endpoints as Read-Only Mode.

Click modify.

Set Cluster Endpoints as Read-Only Mode.

Step 2: Connected a session to the Cluster Endpoints, and warm up the data.

-- warm up the data.
mysql> select count(*) from part;
+----------+
| count(*) |
+----------+
| 20000000 |
+----------+

Step 3: Test an aggregation analysis query without ePQ.

mysql> select avg(P_RETAILPRICE),min(P_RETAILPRICE),max(P_RETAILPRICE) from part;
+--------------------+--------------------+--------------------+
| avg(P_RETAILPRICE) | min(P_RETAILPRICE) | max(P_RETAILPRICE) |
+--------------------+--------------------+--------------------+
|        1499.495035 |             900.01 |            2098.99 |
+--------------------+--------------------+--------------------+
1 row in set (8.55 sec)

Without ePQ, the query time is 8.55 sec.

Step 4: Turn on ePQ.

Click the On option of Parallel Query.

Configure degree of parallel is 8, and click OK.

Step 5: Test the same query with ePQ is enabled.

Note: The modified configuration is only valid for new session connections, so you need to disconnect and reconnect the session before testing.

mysql> select avg(P_RETAILPRICE),min(P_RETAILPRICE),max(P_RETAILPRICE) from part;
+--------------------+--------------------+--------------------+
| avg(P_RETAILPRICE) | min(P_RETAILPRICE) | max(P_RETAILPRICE) |
+--------------------+--------------------+--------------------+
|        1499.495035 |             900.01 |            2098.99 |
+--------------------+--------------------+--------------------+
1 row in set (1.57 sec)

After enabled ePQ， the query time is speedup to 1.57 sec. By checking the query plan, the elastic parallel query uses 8 parallel workers(threads) to speed up the query.

Query Plan:

Step 6: Add a Read-only Node to Cluster.

Click Add/Remove Node, and add a Read-only node which has same specification.

You need to wait for the new node to be added successfully.

Step 7: Test the same query after new read-only node is added.

mysql> select avg(P_RETAILPRICE),min(P_RETAILPRICE),max(P_RETAILPRICE) from part;
+--------------------+--------------------+--------------------+
| avg(P_RETAILPRICE) | min(P_RETAILPRICE) | max(P_RETAILPRICE) |
+--------------------+--------------------+--------------------+
|        1499.495035 |             900.01 |            2098.99 |
+--------------------+--------------------+--------------------+
1 row in set (0.62 sec)

After a new node is expanded to the cluster, the query response time is speedup to 0.62 sec. By checking the query plan, the number of parallel workers is flexibly adjusted to 16, that because the cluster has two idle computing nodes currently.

Step 8: Comparison of test results

	ePQ=OFF	ePQ=ON	ePQ=ON
The number of nodes	1	1	2
The query response time	8.55 sec	1.57 sec	0.62 sec

Enabled elastic parallel query, the response time of the slow query can be reduced linearly.

For more details and examples, please refer to Alibaba Cloud documentation for learning.

Community

PolarDB Hands-on | Multi-node Elastic Parallel Query

Overview

Business Scenario

Analytic Queries on Vast Amounts of Data

Step 1: Set Cluster Endpoints as Read-Only Mode.

Step 2: Connected a session to the Cluster Endpoints, and warm up the data.

Step 3: Test an aggregation analysis query without ePQ.

Step 4: Turn on ePQ.

Step 5: Test the same query with ePQ is enabled.

Step 6: Add a Read-only Node to Cluster.

Step 7: Test the same query after new read-only node is added.

Step 8: Comparison of test results

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for PostgreSQL

PolarDB for Xscale

PolarDB for MySQL

Database for FinTech Solution