×
Community Blog PolarDB Hands-on | Multi-node Elastic Parallel Query

PolarDB Hands-on | Multi-node Elastic Parallel Query

In this article, we will introduce the elastic parallel query feature of PolarDB for MySQL and the use case and scenario.

Overview

PolarDB for MySQL 8.0 provides the elastic Parallel Query(ePQ) feature. The feature is automatically enabled to reduce the response time of queries when the volume of data you query exceeds a specified threshold.

The principle of parallel query is to split a large query task into multiple sub-query tasks, and distribute the sub-tasks to multiple threads for concurrent processing. The core idea is to use the concurrent processing capability of multi-core chips to speed up the query. Elastic parallel query also supports distributing subtasks to multiple nodes for execution, have better elastic expansion ability.

1

Business Scenario

The elastic parallel query feature is applicable to most SELECT statements, such as queries on large tables, multi-table queries that use JOIN statements, and queries on a large amount of data. This feature does not benefit extremely short queries. The diverse parallel methods make the feature suitable for multiple scenarios.

Analytic Queries on Vast Amounts of Data

If medium or large amounts of data is involved, SQL statements for analytic queries are often complex and time-consuming. You can enable the elastic parallel query feature to linearly reduce the response time.

The steps is based on that we have an PolarDB MySQL instance, and we connected a session to the instance. The following PolarDB cluster and test data is used in the example:

Step 1: Set Cluster Endpoints as Read-Only Mode.

Click modify.

2

Set Cluster Endpoints as Read-Only Mode.

3

Step 2: Connected a session to the Cluster Endpoints, and warm up the data.

-- warm up the data.
mysql> select count(*) from part;
+----------+
| count(*) |
+----------+
| 20000000 |
+----------+

Step 3: Test an aggregation analysis query without ePQ.

mysql> select avg(P_RETAILPRICE),min(P_RETAILPRICE),max(P_RETAILPRICE) from part;
+--------------------+--------------------+--------------------+
| avg(P_RETAILPRICE) | min(P_RETAILPRICE) | max(P_RETAILPRICE) |
+--------------------+--------------------+--------------------+
|        1499.495035 |             900.01 |            2098.99 |
+--------------------+--------------------+--------------------+
1 row in set (8.55 sec)

Without ePQ, the query time is 8.55 sec.

Step 4: Turn on ePQ.

Click the On option of Parallel Query.

4

Configure degree of parallel is 8, and click OK.

5

Step 5: Test the same query with ePQ is enabled.

Note: The modified configuration is only valid for new session connections, so you need to disconnect and reconnect the session before testing.

mysql> select avg(P_RETAILPRICE),min(P_RETAILPRICE),max(P_RETAILPRICE) from part;
+--------------------+--------------------+--------------------+
| avg(P_RETAILPRICE) | min(P_RETAILPRICE) | max(P_RETAILPRICE) |
+--------------------+--------------------+--------------------+
|        1499.495035 |             900.01 |            2098.99 |
+--------------------+--------------------+--------------------+
1 row in set (1.57 sec)

After enabled ePQ, the query time is speedup to 1.57 sec. By checking the query plan, the elastic parallel query uses 8 parallel workers(threads) to speed up the query.

Query Plan:

6

Step 6: Add a Read-only Node to Cluster.

Click Add/Remove Node, and add a Read-only node which has same specification.

7

You need to wait for the new node to be added successfully.

8

Step 7: Test the same query after new read-only node is added.

mysql> select avg(P_RETAILPRICE),min(P_RETAILPRICE),max(P_RETAILPRICE) from part;
+--------------------+--------------------+--------------------+
| avg(P_RETAILPRICE) | min(P_RETAILPRICE) | max(P_RETAILPRICE) |
+--------------------+--------------------+--------------------+
|        1499.495035 |             900.01 |            2098.99 |
+--------------------+--------------------+--------------------+
1 row in set (0.62 sec)

After a new node is expanded to the cluster, the query response time is speedup to 0.62 sec. By checking the query plan, the number of parallel workers is flexibly adjusted to 16, that because the cluster has two idle computing nodes currently.

9

Step 8: Comparison of test results

ePQ=OFF ePQ=ON ePQ=ON
The number of nodes 1 1 2
The query response time 8.55 sec 1.57 sec 0.62 sec

Enabled elastic parallel query, the response time of the slow query can be reduced linearly.

For more details and examples, please refer to Alibaba Cloud documentation for learning.

0 2 1
Share on

ApsaraDB

451 posts | 97 followers

You may also like

Comments