Parallel hints allow you to specify whether to enable parallel queries. You can also specify the degree of parallelism (DOP), the tables that need to be scanned in parallel, and the parallel execution mode of each operator. PolarDB for MySQL 8.0.1 supports the following hints in parallel queries: PARALLEL and NO_PARALLEL. In addition to supporting PARALLEL and NO_PARALLEL, PolarDB for MySQL 8.0.2 also supports the PQ_DISTRIBUTE, PQ_GROUPBY, PQ_DISTINCT, PQ_WINDOW, and PQ_ORDERBY hints. The PQ_DISTRIBUTE hint is used to specify the parallel execution strategy of joins. The PQ_GROUPBY, PQ_DISTINCT, PQ_WINDOW, and PQ_ORDERBY hints are used to specify the parallel execution mode of each analytic operator.

Enable or disable parallel queries

  • Enable parallel queries
    You can execute one of the following statements to enable the parallel query feature:
    • SELECT /*+PARALLEL(x)*/ ... FROM ...;   -- x >0
    • SELECT /*+ SET_VAR(max_parallel_degree=n) */  *  FROM ...   // n > 0
  • Disable parallel queries
    You can execute one of the following statements to disable the parallel query feature:
    • SELECT /*+NO_PARALLEL()*/ ... FROM ...;
    • SELECT /*+ SET_VAR(max_parallel_degree=0) */  *  FROM ...

Use hints to specify the tables that need to be scanned in parallel

The parallel query feature supports two hints: PARALLEL and NO_PARALLEL, where:
  • You can use the PARALLEL hint to enforce parallel queries and specify the DOP and the tables that need to be scanned in parallel. The following syntax is supported:
    /*+ PARALLEL [( [query_block] [table_name]  [degree] )] */
  • You can use the NO_PARALLEL hint to enforce serial queries or specify the tables for which parallel scans are prohibited.
    /*+ NO_PARALLEL [( [query_block] [table_name][, table_name] )] */
The following table describes the required parameters.
Parameter Description
query_block The name of the query block to which the hint is applied.
table_name The name of the table to which the hint is applied.
degree The DOP.

Examples:

SELECT /*+PARALLEL()*/ * FROM t1, t2; 
-- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, parallel queries are forcibly executed. The default value of records_threshold_for_parallelism is 10000. 
-- The default max_parallel_degree parameter is used to specify the DOP. If max_parallel_degree is set to a value greater than 0, 
-- parallel queries are enabled. If max_parallel_degree is set to 0, parallel queries are disabled. 

SELECT /*+PARALLEL(8)*/ * FROM t1, t2; 
-- This statement enforces parallel queries in a DOP of 8.  
-- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, parallel queries are forcibly executed. The default value of records_threshold_for_parallelism is 10000. 
-- The max_parallel_degree parameter is set to 8. 

SELECT /*+ SET_VAR(max_parallel_degree=8) */  *  FROM ...   
-- The max_parallel_degree parameter is set to 8.  
-- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, such as 20000, parallel queries are automatically disabled. 

SELECT /*+PARALLEL(t1)*/ * FROM t1, t2; 
-- This statement enforces parallel queries on table t1 by using the /*+PARALLEL()*/ syntax.

SELECT /*+PARALLEL(t1 8)*/ * FROM t1, t2; 
-- This statement enforces parallel queries on table t1 in a DOP of 8 by using the /*+PARALLEL(8)*/ syntax.

SELECT /*+PARALLEL(@subq1)*/ SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+QB_NAME(subq1)*/ SUM(t1.a) FROM t1); 
-- This statement enforces parallel subqueries. The default parameter max_parallel_degree is used to specify the DOP.  
-- If max_parallel_degree is set to a value greater than 0, parallel queries are enabled. If max_parallel_degree is set to 0, parallel queries are disabled.

SELECT /*+PARALLEL(@subq1 8)*/ SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+QB_NAME(subq1)*/ SUM(t1.a) FROM t1); 
-- This statement enforces parallel subqueries. The max_parallel_degree parameter is set to 8.

SELECT SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+PARALLEL()*/ SUM(t1.a) FROM t1); 
-- This statement enforces parallel subqueries.  
-- The default max_parallel_degree parameter is used to specify the DOP.  
-- If max_parallel_degree is set to a value greater than 0, parallel queries are enabled. If max_parallel_degree is set to 0, parallel queries are disabled.

SELECT SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+PARALLEL(8)*/ SUM(t1.a) FROM t1); 
-- This statement enforces parallel subqueries. The max_parallel_degree parameter is set to 8.

SELECT /*+NO_PARALLEL()*/ * FROM t1, t2; 
-- This statement disables parallel queries.

SELECT /*+NO_PARALLEL(t1)*/ * FROM t1, t2; 
-- This statement prohibits parallel queries on only table t1. If the system enables parallel queries, the system may perform parallel scans and queries on table t2.

SELECT /*+NO_PARALLEL(t1, t2)*/ * FROM t1, t2; 
-- This statement prohibits parallel queries on table t1 and table t2.

SELECT /*+NO_PARALLEL(@subq1)*/ SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+QB_NAME(subq1)*/ SUM(t1.a) FROM t1); 
-- This statement prohibits parallel subqueries.

SELECT SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+NO_PARALLEL()*/ SUM(t1.a) FROM t1); 
 -- This statement prohibits parallel subqueries.
Note The PARALLEL hint is ineffective for queries that do not support parallel processing or tables that do not support parallel scans.

Parallel joins

You can use the PQ_DISTRIBUTE hint to specify the parallel query execution mode of a join.

  • Use the PQ_DISTRIBUTE hint to specify the data distribution strategy of a table:
    /*+ PQ_DISTRIBUTE([query_block] table_name  strategy )
    The following table describes the required parameters.
    Parameter Description
    query_block The name of the query block to which the hint is applied.
    table_name The name of the table to which the hint is applied.
    strategy The data distribution strategy. Valid values:
    • PQ_GATHER: Data is aggregated to a worker of the next phase.
    • PQ_HASH: Data is shuffled and distributed to multiple workers of the next phase.
    • PQ_BROADCAST: Data is broadcast to multiple workers of the next phase.
    • PQ_NONE: Data is not distributed.
    Examples:
    SELECT /*+ PARALLEL(t1) PQ_DISTRIBUTE(t1 PQ_GATHER) */ * FROM t as t1;
    -- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, parallel queries are forcibly executed. The default value of records_threshold_for_parallelism is 10000. 
    -- The default max_parallel_degree parameter is used to specify the DOP. If max_parallel_degree is set to a value greater than 0, 
    -- parallel queries are enabled. If max_parallel_degree is set to 0, parallel queries are disabled. 
    -- After the table is scanned in parallel, the data is not distributed, but the results are gathered by the leader.
    SELECT /*+ PARALLEL(t1) PQ_DISTRIBUTE(t1 PQ_HASH) */t1.a, SUM(t1.b) * FROM t as t1 GROUP BY t1.a;
    -- After the table is scanned in parallel, the data is distributed to workers of the next phase based on the group key.
  • Use the PQ_DISTRIBUTE hint to specify the joining mode of two tables:
    /*+ PQ_DISTRIBUTE([query_block] table_name strategy1 [strategy2])
    In the preceding statement, if only strategy1 is specified, it indicates that only the data distribution mode of the table_name table is specified. If both strategy1 and strategy2 are specified, it indicates that the parallel joining mode between the table_name table and a previous table is specified.
    Note The previous table can be a physical table or an intermediate table in the previous joining operation.
    Examples:
    SELECT /*+ PARALLEL(t1) PARALLEL(t2) PQ_DISTRIBUTE(t2 PQ_HASH PQ_HASH) */ * FROM t as t1 STRAIGHT_JOIN t as t2 on t1.b = t2.c;
    -- Table t1 is scanned in parallel, and the data is shuffled and distributed to workers of the next phase.
    -- Table t2 is scanned in parallel, and the data is shuffled and distributed to workers of the next phase.
    -- Workers of the second phase execute a co-location join. Then, the leader gathers the results.
    SELECT /*+ PARALLEL(t1) PARALLEL(t2) PQ_DISTRIBUTE(t2 PQ_GATHER PQ_GATHER) */ * FROM t as t1 STRAIGHT_JOIN t as t2 on t1.b = t2.c;
    -- Table t1 is scanned in parallel, and the data is distributed to the leader.
    -- Table t2 is scanned in parallel, and the data is distributed to the leader.
    -- The leader gathers the data and completes the joining operation.
    If queries do not support parallel processing or queries use two conflicting hints, the queries may fail. Example:
    SELECT /*+ PARALLEL(t1) PARALLEL(t2) PQ_DISTRIBUTE(t2 PQ_HASH PQ_GATHER) */ * FROM t as t1 STRAIGHT_JOIN t as t2 on t1.b = t2.c;
    -- Table t1 is scanned in parallel, and the data is distributed to multiple workers of the next phase.
    -- Table t2 is scanned in parallel, and the data is distributed to the leader.
    -- The data distribution modes of the two hints conflict with each other, so a parallel query plan cannot be generated.

Parallel by-group aggregation: PQ_GROUPBY

You can use the PQ_GROUPBY hint to specify the execution mode of by-group aggregation.
/*+ PQ_GROUPBY(strategy) */
The following table describes the required parameter.
Parameter Description
strategy The data distribution strategy. Valid values:
  • PQ_ONEPHASE: Data is distributed to multiple workers based on the group key to complete by-group aggregation in parallel at the same time.
  • PQ_TWOPHASE_GATHER: Each worker performs data aggregation separately and then the leader merges the partial results to produce the final result.
  • PQ_TWOPHASE_HASH: Each worker performs data aggregation separately and distributes the data to workers of the next phase based on the group key.
  • PQ_SERIAL: By-group aggregation is performed serially.
Examples:
SELECT /*+ PARALLEL(t1) PQ_GROUPBY(PQ_ONEPHASE) */ t1.a, sum(t1.b) FROM t as t1 GROUP BY t1.a;
-- Table t1 is scanned in parallel and the data is distributed to multiple workers of the next phase based on column t1.a.
-- Workers of the second phase complete aggregation calculations. Then, the results are gathered by the leader.

SELECT /*+ PARALLEL(t1) PQ_GROUPBY(PQ_TWOPHASE_HASH) */ t1.a, sum(t1.b) FROM t as t1 GROUP BY t1.a;
-- Workers scan table t1 in parallel and perform data aggregation separately.
-- Intermediate aggregation results are distributed to multiple workers of the next phase based on column t1.a.
-- Workers of the second phase complete aggregation calculations. Then, the results are gathered by the leader.

Parallel by-group aggregation: PQ_DISTINCT

You can use the PQ_DISTINCT hint to specify the execution mode of by-group aggregation.
/*+ PQ_DISTINCT(strategy) */
The following table describes the required parameter.
Parameter Description
strategy The data distribution strategy. Valid values:
  • PQ_TWOPHASE_GATHER: Each worker performs data aggregation separately and then the leader merges the partial results to produce the final result.
  • PQ_SERIAL: By-group aggregation is performed serially.

Parallel window functions

You can use the PQ_WINDOW hint to specify the execution mode of a window function.
/*+ PQ_WINDOW([window_name] strategy) */
Note A hint with a specified window name has a higher priority than a hint without a window name.
The following table describes the required parameters.
Parameter Description
window_name The name of the window function to which the data distribution strategy applies. If you do not specify this parameter, the data distribution strategy applies to all window functions.
strategy The data distribution strategy. Valid values:
  • PQ_ONEPHASE: Data is distributed by using the PARTITION BY clause, and the window function is called to run computations on multiple workers in parallel.
  • PQ_SERIAL: By-group aggregation is performed serially.
Example:
SELECT /*+ PQ_WINDOW(PQ_ONEPHASE)  PQ_WINDOW(win PQ_SERIAL) */
  ROW_NUMBER() OVER(win) AS 'row_number',
  RANK() over(partition by name order by salary desc)
FROM employee_salaries WINDOW win as (partition by dept order by salary desc);
-- The window function named win is called to run computations serially.
-- For other window functions, data is distributed to multiple workers based on the key specified in the PARTITION BY clause for parallel computations.

Parallel data sorting

You can use the PQ_ORDERBY hint to specify the data sorting mode.
/*+ PQ_ORDERBY(strategy) */
The following table describes the required parameter.
Parameter Description
strategy The data distribution strategy. Valid values:
  • PQ_TWOPHASE_GATHER: Workers sort data in parallel and then the leader merges the partial results to produce the final sorting result.
  • PQ_SERIAL: Data is sorted serially.

Parallel subqueries

You can use hints to specify the policy to run parallel subqueries. For more information, see the support for subqueries in View parallel query execution plans. The following hint syntax is provided:

/*+ PQ_PUSHDOWN [( [query_block])] */ The subqueries run in parallel based on the pushdown policy. 
/*+ NO_PQ_PUSHDOWN [( [query_block])] */ The subqueries run in parallel based on the shared access policy. 

Examples:

# The subqueries run in parallel based on the pushdown policy.
EXPLAIN SELECT /*+ PQ_PUSHDOWN(@qb1) */ * FROM t2 WHERE t2.a =
                 (SELECT /*+ qb_name(qb1) */ a FROM t1);

# The subqueries run in parallel based on the shared access policy.
EXPLAIN SELECT /*+ NO_PQ_PUSHDOWN(@qb1) */ * FROM t2 WHERE t2.a =
                 (SELECT /*+ qb_name(qb1) */ a FROM t1);
# No hint is added to the query block.
EXPLAIN SELECT * FROM t2 WHERE t2.a =
                 (SELECT /*+ NO_PQ_PUSHDOWN() */ a FROM t1);