All Products
Search
Document Center

PolarDB:Parallel hints

Last Updated:Mar 14, 2024

Parallel hints allow you to specify whether to enable the parallel query feature. You can also specify the degree of parallelism (DOP), the tables that need to be scanned in parallel, and the parallel execution mode of each operator. PolarDB for MySQL 8.0.1 supports the PARALLEL and NO_PARALLEL hints in the parallel query feature. PolarDB for MySQL 8.0.2 supports the following hints: PARALLEL, NO_PARALLEL, PQ_DISTRIBUTE, PQ_GROUPBY, PQ_DISTINCT, PQ_WINDOW, and PQ_ORDERBY. The PQ_DISTRIBUTE hint is added to specify the parallel execution policies of joint operations. The PQ_GROUPBY, PQ_DISTINCT, PQ_WINDOW, and PQ_ORDERBY hints are used to specify the parallel execution mode of each analytic operator.

Enable or disable the parallel query feature

  • Enable the parallel query feature

    You can execute one of the following statements to enable the parallel query feature:

    • SELECT /*+PARALLEL(x)*/ ... FROM ...;   -- x >0
    • SELECT /*+ SET_VAR(max_parallel_degree=n) */  *  FROM ...   -- n > 0
  • Disable the parallel query feature

    You can execute one of the following statements to disable the parallel query feature:

    • SELECT /*+NO_PARALLEL()*/ ... FROM ...;
    • SELECT /*+ SET_VAR(max_parallel_degree=0) */  *  FROM ...

Use hints to specify the tables that need to be scanned in parallel

The parallel query feature supports the PARALLEL and NO_PARALLEL hints. These hints specify tables that need to be scanned in parallel and ones that do not need to be scanned in parallel.

  • You can use the PARALLEL hint to enforce parallel queries and specify the DOP and the tables that need to be scanned in parallel. Sample syntax:

    /*+ PARALLEL [( [query_block] [table_name]  [degree] )] */
  • You can use the NO_PARALLEL hint to enforce sequential queries or specify the tables for which parallel scans are prohibited.

    /*+ NO_PARALLEL [( [query_block] [table_name][, table_name] )] */

The following table describes the parameters that you can configure.

Parameter

Description

query_block

The name of the query block to which the hint is applied.

table_name

The name of the table to which the hint is applied.

degree

The DOP.

Examples:

SELECT /*+PARALLEL()*/ * FROM t1, t2; 
-- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, parallel queries are forcibly executed. The default value of records_threshold_for_parallelism is 10000. 
-- The default parameter max_parallel_degree is used to specify the DOP. If max_parallel_degree is set to a value greater than 0, 
-- The parallel query feature is enabled. If max_parallel_degree is set to 0, the parallel query feature is disabled. 

SELECT /*+PARALLEL(8)*/ * FROM t1, t2; 
-- This statement enforces parallel queries in a DOP of 8.  
-- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, parallel queries are forcibly executed. The default value of records_threshold_for_parallelism is 10000. 
-- The max_parallel_degree parameter is set to 8. 

SELECT /*+ SET_VAR(max_parallel_degree=8) */  *  FROM ...   
-- The max_parallel_degree parameter is set to 8.  
-- When the number of rows in the table is less than the value of records_threshold_for_parallelism, such as 20000, the parallel query feature is automatically disabled. 

SELECT /*+PARALLEL(t1)*/ * FROM t1, t2; 
-- This statement enforces parallel queries on the t1 table by using the /*+PARALLEL()*/ syntax for the table.

SELECT /*+PARALLEL(t1 8)*/ * FROM t1, t2; 
-- This statement enforces parallel queries on the t1 table in a DOP of 8 by using the /*+PARALLEL(8)*/ syntax for the table.

SELECT /*+PARALLEL(@subq1)*/ SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+QB_NAME(subq1)*/ SUM(t1.a) FROM t1); 
-- This statement enforces parallel subqueries. The default parameter max_parallel_degree is used to specify the DOP.  
-- If max_parallel_degree is set to a value greater than 0, the parallel query feature is enabled. If max_parallel_degree is set to 0, the parallel query feature is disabled.

SELECT /*+PARALLEL(@subq1 8)*/ SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+QB_NAME(subq1)*/ SUM(t1.a) FROM t1); 
-- This statement enforces parallel subqueries. The max_parallel_degree parameter is set to 8.

SELECT SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+PARALLEL()*/ SUM(t1.a) FROM t1); 
-- This statement enforces parallel subqueries.  
-- The default parameter max_parallel_degree is used to specify the DOP.  
-- If max_parallel_degree is set to a value greater than 0, the parallel query feature is enabled. If max_parallel_degree is set to 0, the parallel query feature is disabled.

SELECT SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+PARALLEL(8)*/ SUM(t1.a) FROM t1); 
-- This statement enforces parallel subqueries. The max_parallel_degree parameter is set to 8.

SELECT /*+NO_PARALLEL()*/ * FROM t1, t2; 
-- This statement disables the parallel query feature.

SELECT /*+NO_PARALLEL(t1)*/ * FROM t1, t2; 
-- This statement prohibits parallel queries for only the t1 table. If the system enables the parallel query feature, the system may perform parallel scans and run parallel queries on the t2 table.

SELECT /*+NO_PARALLEL(t1, t2)*/ * FROM t1, t2; 
-- This statement prohibits parallel queries for the t1 and t2 tables.

SELECT /*+NO_PARALLEL(@subq1)*/ SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+QB_NAME(subq1)*/ SUM(t1.a) FROM t1); 
-- This statement prohibits parallel subqueries.

SELECT SUM(t.a) FROM t WHERE t.a = 
  (SELECT /*+NO_PARALLEL()*/ SUM(t1.a) FROM t1); 
 -- This statement prohibits parallel subqueries.
Note

The PARALLEL hint is ineffective for queries that do not support parallel processing or tables that do not support parallel scans.

Parallel join operations

You can use the PQ_DISTRIBUTE hint to specify the parallel query execution mode of a join operation.

  • Use the PQ_DISTRIBUTE hint to specify the data distribution strategy of a table:

    /*+ PQ_DISTRIBUTE([query_block] table_name  strategy ) */

    The following table describes the parameters that you can configure.

    Parameter

    Description

    query_block

    The name of the query block to which the hint is applied.

    table_name

    The name of the table to which the hint is applied.

    strategy

    The data distribution strategy. Valid values:

    • PQ_GATHER: Data is aggregated to a worker of the previous phase.

    • PQ_HASH: Data is shuffled and distributed to multiple workers of the previous phase.

    • PQ_BROADCAST: Data is broadcast to multiple workers of the previous phase.

    • PQ_NONE: Data is not distributed.

    Examples:

    SELECT /*+ PARALLEL(t1) PQ_DISTRIBUTE(t1 PQ_GATHER) */ * FROM t as t1;
    -- When the number of rows in the table is less than the specified value of records_threshold_for_parallelism, parallel queries are forcibly executed. The default value of records_threshold_for_parallelism is 10000. 
    -- The default parameter max_parallel_degree is used to specify the DOP. If max_parallel_degree is set to a value greater than 0,
    -- The parallel query feature is enabled. If max_parallel_degree is set to 0, the parallel query feature is disabled. 
    -- After the table is scanned in parallel, the data is not distributed, but the results are gathered by the leader.
    SELECT /*+ PARALLEL(t1) PQ_DISTRIBUTE(t1 PQ_HASH) */t1.a, SUM(t1.b) * FROM t as t1 GROUP BY t1.a;
    -- After the table is scanned in parallel, the data is distributed to the workers of the next phase based on the group key.
  • Use the PQ_DISTRIBUTE hint to specify the joining mode of two tables:

    /*+ PQ_DISTRIBUTE([query_block] table_name strategy1 [strategy2]) */

    In the preceding statement, if only strategy1 is specified, it indicates that the data distribution method of the table_name table is specified. If both strategy1 and strategy2 are specified, it indicates that the parallel joining mode between the table_name table and a previous table is specified.

    Note

    The previous table can be a physical table or an intermediate table in the previous join operation.

    Examples:

    SELECT /*+ PARALLEL(t1) PARALLEL(t2) PQ_DISTRIBUTE(t2 PQ_HASH PQ_HASH) */ * FROM t as t1 STRAIGHT_JOIN t as t2 on t1.b = t2.c;
    -- The t1 table is scanned in parallel, and the data is shuffled and distributed to the workers of the next phase.
    -- The t2 table is scanned in parallel, and the data is shuffled and distributed to the workers of the next phase.
    -- The workers of the next phase execute a co-location join. Then, the results are gathered by the leader.
    SELECT /*+ PARALLEL(t1) PARALLEL(t2) PQ_DISTRIBUTE(t2 PQ_GATHER PQ_GATHER) */ * FROM t as t1 STRAIGHT_JOIN t as t2 on t1.b = t2.c;
    -- The t1 table is scanned in parallel, and the data is distributed to the leader.
    -- The t2 table is scanned in parallel, and the data is distributed to the leader.
    -- The leader gathers the data and completes the join operation.

    If queries do not support parallel processing or queries use two conflicting hints, the queries may fail. Sample syntax:

    SELECT /*+ PARALLEL(t1) PARALLEL(t2) PQ_DISTRIBUTE(t2 PQ_HASH PQ_GATHER) */ * FROM t as t1 STRAIGHT_JOIN t as t2 on t1.b = t2.c;
    -- The t1 table is scanned in parallel, and the data is distributed to multiple workers of the next phase.
    -- The t2 table is scanned in parallel, and the data is distributed to the leader.
    -- The data distribution modes of the two hints conflict with each other, so a parallel query plan cannot be generated.

Parallel by-group aggregation: PQ_GROUPBY

You can use the PQ_GROUPBY hint to specify the execution mode of by-group aggregation.

/*+ PQ_GROUPBY(strategy) */

The following table describes the parameters that you can configure.

Parameter

Description

strategy

The data distribution strategy. Valid values:

  • PQ_ONEPHASE: Data is distributed to multiple workers based on the group key to complete by-group aggregation in parallel at the same time.

  • PQ_TWOPHASE_GATHER: Each worker performs data aggregation separately and then the leader merges the partial results to produce the final result.

  • PQ_TWOPHASE_HASH: Each worker performs data aggregation separately and distributes the data to multiple workers of the next phase based on the group key.

  • PQ_SERIAL: By-group aggregation is performed serially.

Examples:

SELECT /*+ PARALLEL(t1) PQ_GROUPBY(PQ_ONEPHASE) */ t1.a, sum(t1.b) FROM t as t1 GROUP BY t1.a;
-- The t1 table is scanned in parallel and the data is distributed to multiple workers of the next phase based on the t1.a column.
-- Multiple workers of the next phase complete aggregation calculations. Then, the results are gathered by the leader.

SELECT /*+ PARALLEL(t1) PQ_GROUPBY(PQ_TWOPHASE_HASH) */ t1.a, sum(t1.b) FROM t as t1 GROUP BY t1.a;
-- Workers scan the t1 table in parallel and perform data aggregation separately.
-- Intermediate aggregation results are distributed to multiple workers of the next phase based on the t1.a column.
-- Multiple workers of the next phase complete aggregation calculations. Then, the results are gathered by the leader.

Parallel by-group aggregation: PQ_DISTINCT

You can use the PQ_DISTINCT hint to specify the execution mode of by-group aggregation.

/*+ PQ_DISTINCT(strategy) */

The following table describes the parameters that you can configure.

Parameter

Description

strategy

The data distribution strategy. Valid values:

  • PQ_TWOPHASE_GATHER: Each worker performs data aggregation separately and then the leader merges the partial results to produce the final result.

  • PQ_SERIAL: By-group aggregation is performed serially.

Parallel window functions

You can use the PQ_WINDOW hint to specify the execution mode of a window function.

/*+ PQ_WINDOW([window_name] strategy) */
Note

A hint with a specified window name has a higher priority than a hint without a window name.

The following table describes the parameters that you can configure.

Parameter

Description

window_name

The name of the window function to which the data distribution strategy applies. If you do not specify this parameter, the data distribution strategy applies to all window functions.

strategy

The data distribution strategy. Valid values:

  • PQ_ONEPHASE: Data is distributed by using the PARTITION BY clause, and the window function is called to run computations on multiple workers in parallel.

  • PQ_SERIAL: By-group aggregation is performed serially.

Examples:

SELECT /*+ PQ_WINDOW(PQ_ONEPHASE)  PQ_WINDOW(win PQ_SERIAL) */
  ROW_NUMBER() OVER(win) AS 'row_number',
  RANK() over(partition by name order by salary desc)
FROM employee_salaries WINDOW win as (partition by dept order by salary desc);
-- The window function named win is called to run computations serially.
-- For other window functions, data is distributed to multiple workers based on the key specified in the PARTITION BY clause for parallel computations.

Parallel data sorting

You can use the PQ_ORDERBY hint to specify the data sorting mode.

/*+ PQ_ORDERBY(strategy) */

The following table describes the parameters that you can configure.

Parameter

Description

strategy

The data distribution strategy. Valid values:

  • PQ_TWOPHASE_GATHER: Workers sort data in parallel and then the leader merges the partial results to produce the final sorting result.

  • PQ_SERIAL: Data is sorted serially.

Parallel subqueries

You can use hints to specify the policy to run parallel subqueries. For more information, see the support for subqueries in View parallel query execution plans. Sample syntax:

/*+ PQ_PUSHDOWN [( [query_block])] */      # The subqueries run in parallel based on the pushdown policy. 
/*+ NO_PQ_PUSHDOWN [( [query_block])] */  # The subqueries run in parallel based on the shared access policy.

Examples:

# The subqueries run in parallel based on the pushdown policy.
EXPLAIN SELECT /*+ PQ_PUSHDOWN(@qb1) */ * FROM t2 WHERE t2.a =
                 (SELECT /*+ qb_name(qb1) */ a FROM t1);

# The subqueries run in parallel based on the shared access policy.
EXPLAIN SELECT /*+ NO_PQ_PUSHDOWN(@qb1) */ * FROM t2 WHERE t2.a =
                 (SELECT /*+ qb_name(qb1) */ a FROM t1);
# No hint is added to the query block.
EXPLAIN SELECT * FROM t2 WHERE t2.a =
                 (SELECT /*+ NO_PQ_PUSHDOWN() */ a FROM t1);