Parallel hints allow you to specify whether to enable parallel queries. You can also
specify the degree of parallelism (DOP), the tables that need to be scanned in parallel,
and the parallel execution mode of each operator. PolarDB for MySQL 8.0.1 supports the following hints in parallel queries: PARALLEL
and NO_PARALLEL
. In addition to supporting PARALLEL
and NO_PARALLEL
, PolarDB for MySQL 8.0.2 also supports the PQ_DISTRIBUTE, PQ_GROUPBY, PQ_DISTINCT, PQ_WINDOW, and PQ_ORDERBY
hints. The PQ_DISTRIBUTE
hint is used to specify the parallel execution strategy of joins. The PQ_GROUPBY
, PQ_DISTINCT
, PQ_WINDOW
, and PQ_ORDERBY
hints are used to specify the parallel execution mode of each analytic operator.
Enable or disable parallel queries
- Enable parallel queries
You can execute one of the following statements to enable the parallel query feature:
SELECT /*+PARALLEL(x)*/ ... FROM ...; -- x >0
SELECT /*+ SET_VAR(max_parallel_degree=n) */ * FROM ... // n > 0
- Disable parallel queries
You can execute one of the following statements to disable the parallel query feature:
SELECT /*+NO_PARALLEL()*/ ... FROM ...;
SELECT /*+ SET_VAR(max_parallel_degree=0) */ * FROM ...
Use hints to specify the tables that need to be scanned in parallel
PARALLEL
and NO_PARALLEL
, where:
- You can use the
PARALLEL
hint to enforce parallel queries and specify the DOP and the tables that need to be scanned in parallel. The following syntax is supported:/*+ PARALLEL [( [query_block] [table_name] [degree] )] */
- You can use the
NO_PARALLEL
hint to enforce serial queries or specify the tables for which parallel scans are prohibited./*+ NO_PARALLEL [( [query_block] [table_name][, table_name] )] */
Parameter | Description |
---|---|
query_block | The name of the query block to which the hint is applied. |
table_name | The name of the table to which the hint is applied. |
degree | The DOP. |
Examples:
SELECT /*+PARALLEL()*/ * FROM t1, t2;
-- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, parallel queries are forcibly executed. The default value of records_threshold_for_parallelism is 10000.
-- The default max_parallel_degree parameter is used to specify the DOP. If max_parallel_degree is set to a value greater than 0,
-- parallel queries are enabled. If max_parallel_degree is set to 0, parallel queries are disabled.
SELECT /*+PARALLEL(8)*/ * FROM t1, t2;
-- This statement enforces parallel queries in a DOP of 8.
-- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, parallel queries are forcibly executed. The default value of records_threshold_for_parallelism is 10000.
-- The max_parallel_degree parameter is set to 8.
SELECT /*+ SET_VAR(max_parallel_degree=8) */ * FROM ...
-- The max_parallel_degree parameter is set to 8.
-- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, such as 20000, parallel queries are automatically disabled.
SELECT /*+PARALLEL(t1)*/ * FROM t1, t2;
-- This statement enforces parallel queries on table t1 by using the /*+PARALLEL()*/ syntax.
SELECT /*+PARALLEL(t1 8)*/ * FROM t1, t2;
-- This statement enforces parallel queries on table t1 in a DOP of 8 by using the /*+PARALLEL(8)*/ syntax.
SELECT /*+PARALLEL(@subq1)*/ SUM(t.a) FROM t WHERE t.a =
(SELECT /*+QB_NAME(subq1)*/ SUM(t1.a) FROM t1);
-- This statement enforces parallel subqueries. The default parameter max_parallel_degree is used to specify the DOP.
-- If max_parallel_degree is set to a value greater than 0, parallel queries are enabled. If max_parallel_degree is set to 0, parallel queries are disabled.
SELECT /*+PARALLEL(@subq1 8)*/ SUM(t.a) FROM t WHERE t.a =
(SELECT /*+QB_NAME(subq1)*/ SUM(t1.a) FROM t1);
-- This statement enforces parallel subqueries. The max_parallel_degree parameter is set to 8.
SELECT SUM(t.a) FROM t WHERE t.a =
(SELECT /*+PARALLEL()*/ SUM(t1.a) FROM t1);
-- This statement enforces parallel subqueries.
-- The default max_parallel_degree parameter is used to specify the DOP.
-- If max_parallel_degree is set to a value greater than 0, parallel queries are enabled. If max_parallel_degree is set to 0, parallel queries are disabled.
SELECT SUM(t.a) FROM t WHERE t.a =
(SELECT /*+PARALLEL(8)*/ SUM(t1.a) FROM t1);
-- This statement enforces parallel subqueries. The max_parallel_degree parameter is set to 8.
SELECT /*+NO_PARALLEL()*/ * FROM t1, t2;
-- This statement disables parallel queries.
SELECT /*+NO_PARALLEL(t1)*/ * FROM t1, t2;
-- This statement prohibits parallel queries on only table t1. If the system enables parallel queries, the system may perform parallel scans and queries on table t2.
SELECT /*+NO_PARALLEL(t1, t2)*/ * FROM t1, t2;
-- This statement prohibits parallel queries on table t1 and table t2.
SELECT /*+NO_PARALLEL(@subq1)*/ SUM(t.a) FROM t WHERE t.a =
(SELECT /*+QB_NAME(subq1)*/ SUM(t1.a) FROM t1);
-- This statement prohibits parallel subqueries.
SELECT SUM(t.a) FROM t WHERE t.a =
(SELECT /*+NO_PARALLEL()*/ SUM(t1.a) FROM t1);
-- This statement prohibits parallel subqueries.
PARALLEL
hint is ineffective for queries that do not support parallel processing or tables
that do not support parallel scans.
Parallel joins
You can use the PQ_DISTRIBUTE
hint to specify the parallel query execution mode of a join.
- Use the
PQ_DISTRIBUTE
hint to specify the data distribution strategy of a table:/*+ PQ_DISTRIBUTE([query_block] table_name strategy )
The following table describes the required parameters.Parameter Description query_block The name of the query block to which the hint is applied. table_name The name of the table to which the hint is applied. strategy The data distribution strategy. Valid values: - PQ_GATHER: Data is aggregated to a worker of the next phase.
- PQ_HASH: Data is shuffled and distributed to multiple workers of the next phase.
- PQ_BROADCAST: Data is broadcast to multiple workers of the next phase.
- PQ_NONE: Data is not distributed.
Examples:SELECT /*+ PARALLEL(t1) PQ_DISTRIBUTE(t1 PQ_GATHER) */ * FROM t as t1; -- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, parallel queries are forcibly executed. The default value of records_threshold_for_parallelism is 10000. -- The default max_parallel_degree parameter is used to specify the DOP. If max_parallel_degree is set to a value greater than 0, -- parallel queries are enabled. If max_parallel_degree is set to 0, parallel queries are disabled. -- After the table is scanned in parallel, the data is not distributed, but the results are gathered by the leader. SELECT /*+ PARALLEL(t1) PQ_DISTRIBUTE(t1 PQ_HASH) */t1.a, SUM(t1.b) * FROM t as t1 GROUP BY t1.a; -- After the table is scanned in parallel, the data is distributed to workers of the next phase based on the group key.
- Use the
PQ_DISTRIBUTE
hint to specify the joining mode of two tables:/*+ PQ_DISTRIBUTE([query_block] table_name strategy1 [strategy2])
In the preceding statement, if onlystrategy1
is specified, it indicates that only the data distribution mode of the table_name table is specified. If bothstrategy1
andstrategy2
are specified, it indicates that the parallel joining mode between thetable_name
table and a previous table is specified.Note The previous table can be a physical table or an intermediate table in the previous joining operation.Examples:
If queries do not support parallel processing or queries use two conflicting hints, the queries may fail. Example:SELECT /*+ PARALLEL(t1) PARALLEL(t2) PQ_DISTRIBUTE(t2 PQ_HASH PQ_HASH) */ * FROM t as t1 STRAIGHT_JOIN t as t2 on t1.b = t2.c; -- Table t1 is scanned in parallel, and the data is shuffled and distributed to workers of the next phase. -- Table t2 is scanned in parallel, and the data is shuffled and distributed to workers of the next phase. -- Workers of the second phase execute a co-location join. Then, the leader gathers the results. SELECT /*+ PARALLEL(t1) PARALLEL(t2) PQ_DISTRIBUTE(t2 PQ_GATHER PQ_GATHER) */ * FROM t as t1 STRAIGHT_JOIN t as t2 on t1.b = t2.c; -- Table t1 is scanned in parallel, and the data is distributed to the leader. -- Table t2 is scanned in parallel, and the data is distributed to the leader. -- The leader gathers the data and completes the joining operation.
SELECT /*+ PARALLEL(t1) PARALLEL(t2) PQ_DISTRIBUTE(t2 PQ_HASH PQ_GATHER) */ * FROM t as t1 STRAIGHT_JOIN t as t2 on t1.b = t2.c; -- Table t1 is scanned in parallel, and the data is distributed to multiple workers of the next phase. -- Table t2 is scanned in parallel, and the data is distributed to the leader. -- The data distribution modes of the two hints conflict with each other, so a parallel query plan cannot be generated.
Parallel by-group aggregation: PQ_GROUPBY
PQ_GROUPBY
hint to specify the execution mode of by-group aggregation./*+ PQ_GROUPBY(strategy) */
Parameter | Description |
---|---|
strategy | The data distribution strategy. Valid values:
|
SELECT /*+ PARALLEL(t1) PQ_GROUPBY(PQ_ONEPHASE) */ t1.a, sum(t1.b) FROM t as t1 GROUP BY t1.a;
-- Table t1 is scanned in parallel and the data is distributed to multiple workers of the next phase based on column t1.a.
-- Workers of the second phase complete aggregation calculations. Then, the results are gathered by the leader.
SELECT /*+ PARALLEL(t1) PQ_GROUPBY(PQ_TWOPHASE_HASH) */ t1.a, sum(t1.b) FROM t as t1 GROUP BY t1.a;
-- Workers scan table t1 in parallel and perform data aggregation separately.
-- Intermediate aggregation results are distributed to multiple workers of the next phase based on column t1.a.
-- Workers of the second phase complete aggregation calculations. Then, the results are gathered by the leader.
Parallel by-group aggregation: PQ_DISTINCT
PQ_DISTINCT
hint to specify the execution mode of by-group aggregation./*+ PQ_DISTINCT(strategy) */
Parameter | Description |
---|---|
strategy | The data distribution strategy. Valid values:
|
Parallel window functions
PQ_WINDOW
hint to specify the execution mode of a window function./*+ PQ_WINDOW([window_name] strategy) */
Parameter | Description |
---|---|
window_name | The name of the window function to which the data distribution strategy applies. If you do not specify this parameter, the data distribution strategy applies to all window functions. |
strategy | The data distribution strategy. Valid values:
|
SELECT /*+ PQ_WINDOW(PQ_ONEPHASE) PQ_WINDOW(win PQ_SERIAL) */
ROW_NUMBER() OVER(win) AS 'row_number',
RANK() over(partition by name order by salary desc)
FROM employee_salaries WINDOW win as (partition by dept order by salary desc);
-- The window function named win is called to run computations serially.
-- For other window functions, data is distributed to multiple workers based on the key specified in the PARTITION BY clause for parallel computations.
Parallel data sorting
PQ_ORDERBY
hint to specify the data sorting mode./*+ PQ_ORDERBY(strategy) */
Parameter | Description |
---|---|
strategy | The data distribution strategy. Valid values:
|
Parallel subqueries
You can use hints to specify the policy to run parallel subqueries. For more information, see the support for subqueries in View parallel query execution plans. The following hint syntax is provided:
/*+ PQ_PUSHDOWN [( [query_block])] */ The subqueries run in parallel based on the pushdown policy.
/*+ NO_PQ_PUSHDOWN [( [query_block])] */ The subqueries run in parallel based on the shared access policy.
Examples:
# The subqueries run in parallel based on the pushdown policy.
EXPLAIN SELECT /*+ PQ_PUSHDOWN(@qb1) */ * FROM t2 WHERE t2.a =
(SELECT /*+ qb_name(qb1) */ a FROM t1);
# The subqueries run in parallel based on the shared access policy.
EXPLAIN SELECT /*+ NO_PQ_PUSHDOWN(@qb1) */ * FROM t2 WHERE t2.a =
(SELECT /*+ qb_name(qb1) */ a FROM t1);
# No hint is added to the query block.
EXPLAIN SELECT * FROM t2 WHERE t2.a =
(SELECT /*+ NO_PQ_PUSHDOWN() */ a FROM t1);