All Products
Search
Document Center

Hints related to parallel execution

Last Updated: Jun 18, 2021

PARALLEL Hint

The PARALLEL hint is a statement-level hint that instructs the optimizer to specify the number of parallel servers that can be used for parallel operations. The PARALLEL hint overwrites the value of the PARALLEL_DEGREE_ POLICY initialization parameter. The PARALLEL hint is applicable to the SELECT, INSERT, MERGE, UPDATE, and DELETE portions of a statement and to the table scan portion. If a parallel constraint is violated, the PARALLEL hint is ignored.

The PARALLEL hint uses the following syntax:

/*+ PARALLEL [ ( DEFAULT | AUTO | MANUAL | integer ) ] */
Notice

If sorting or grouping is also performed, the number of servers that can be used is twice the value specified in the PARALLEL hint.

You can set the following parameters in the PARALLEL hint:

  • PARALLEL: the degree of parallelism that is calculated by the database. The value can be 2 or a greater value. The statement is always executed as a set of operations that are performed in parallel.
  • PARALLEL(DEFAULT): the degree of parallelism that is calculated by the optimizer. The parameter value is equal to the product of the value of the PARALLEL_THREADS_PER_CPU initialization parameter and the number of CPUs available on all of the participating instances.
  • PARALLEL(AUTO): the degree of parallelism that is calculated by the database. The value can be greater than or equal to 1. If the calculated degree of parallelism is 1, the statement is executed as a set of operations that are performed in sequence.
  • PARALLEL(MANUAL): the degree of parallelism that the optimizer is forced to use. The degree of parallelism is specified by the parallel settings of the objects in the statement.
  • PARALLEL(integer): the degree of parallelism that is used by the optimizer and specified by the integer parameter. The value of the integer parameter is an integer.

In the following example, the database calculates the degree of parallelism and the statement is always executed as a set of operations that are performed in parallel:

SELECT /*+ PARALLEL */ last_name
  FROM employees;

In the following example, the database calculates the degree of parallelism, but the degree of parallelism is 1. Therefore, the statement is executed as a set of operations that are performed in sequence:

SELECT /*+ PARALLEL (AUTO) */ last_name
  FROM employees;

In the following example, 5 is specified in the statement as the degree of parallelism and takes effect on the current table. The PARALLEL hint recommends that the optimizer use 5 as the degree of parallelism:

CREATE TABLE parallel_table (col1 number, col2 VARCHAR2(10)) PARALLEL 5; 
SELECT /*+ PARALLEL (MANUAL) */ col2
 FROM parallel_table;

USE_PX Hint

The USE_PX hint instructs the server to execute SQL statements in PX mode. In PX mode, multithreading is allowed when statements are executed. In general, the USE_PX hint is used in conjunction with the PARALLEL hint.

The USE_PX hint uses the following syntax:

/*+ USE_PX  */

Example:

SELECT /*+ USE_PX PARALLEL(4)*/ e.department_id, sum(e.salary)
 FROM employees e
 WHERE e.department_id = 1001;
 GROUP BY e.department_id;

NO_USE_PX Hint

The NO_USE_PX hint instructs the server not to use the PX mode to execute SQL statements.

The NO_USE_PX hint uses the following syntax:

/*+ NO_USE_PX  */

Example:

SELECT /*+ NO_USE_PX*/ e.department_id, sum(e.salary)
 FROM employees e
 WHERE e.department_id = 1001;
 GROUP BY e.department_id;

PQ_DISTRIBUTE Hint

The PQ_DISTRIBUTE hint instructs the optimizer on how to distribute rows between producer (query) servers and consumer (load) servers. You can use this hint to control row distribution for joins or loads. The PQ_DISTRIBUTE hint uses the following syntax:

/*+ PQ_DISTRIBUTE
  ( [ @ queryblock ] tablespec 
    { distribution | outer_distribution inner_distribution }
  ) */

Control load distribution

You can control the row distribution for parallel INSERT ... SELECT and CREATE TABLE ... AS SELECT statements to determine how rows are distributed between producer (query) servers and consumer (load) servers. Use the upper branch of the syntax to specify a distribution method. The following table lists the values and semantics of distribution methods.

Distribution method

Description

NONE

Row distribution is not performed. This means that query and load operations are combined on each query server. Each server loads all of the partitions. In this distribution method, row distribution is not performed. This helps you avoid the overhead of row distribution when no skew occurs. Skew may occur due to null fields. Skew may also occur because a predicate in the statement filters out all of the rows that are evaluated by the query. If skew occurs due to this distribution method, use RANDOM or RANDOM_ LOCAL distribution instead.

Notice

Use this distribution method with caution. A minimum of 512 KB Program Global Area (PGA) memory is required for the loading of each process. If compression is also used, each server consumes about 1.5 MB PGA memory.

PARTITION

This method uses the partition information of tablespec to distribute rows from query servers to consumer servers. Use this method when the action of combining query and load operations is not possible or expected. Use this method also when the number of partitions being loaded is greater than or equal to the number of load servers and no skew occurs. If no skew occurs, the input data is evenly distributed across the partitions being loaded.

RANDOM

This method distributes rows from producer servers to consumer servers in a round-robin manner. Use this method when the input data is highly skewed.

RANDOM_LOCAL

This method distributes rows from producer servers to a group of servers that are responsible for maintaining a given group of servers. Two or more servers can load the same partition, but none of the servers loads all of the partitions. Use this distribution method when the input data is skewed and query and load operations cannot be combined due to memory limits.

For example, in the following direct-load INSERT operation, the query and load portions of the operation are combined on each query server:

INSERT /*+ APPEND PARALLEL(target_table, 16) PQ_DISTRIBUTE(target_table, NONE) */
  INTO target_table
  SELECT * FROM source_table;

In the following example, the optimizer uses the partitioning of the target_table table to distribute rows when the table is created:

CREATE /*+ PQ_DISTRIBUTE(target_table, PARTITION) */ TABLE target_table
  NOLOGGING PARALLEL 16
  PARTITION BY HASH (l_orderkey) PARTITIONS 512
  AS SELECT * FROM source_table;

Control the distribution for joins

You can specify two methods to control the distribution for joins. In the lower branch of the syntax, one of the distribution methods is for the outer table and the other distribution method is for the inner table:

  • outside_distribution specifies the distribution method for the outer table.
  • inner_distribution specifies the distribution method for the inner table.

The values for distribution methods are HASH, BROADCAST, PARTITION, and NONE. Only the six combinations of distribution methods in the following table are valid.

Distribution method

Description

HASH, HASH

The rows of each table are mapped to consumer query servers by using a hash function on the join keys. After the mapping is completed, each query server performs a join operation on a pair of result partitions. We recommend that you use this distribution method when the sizes of tables are comparable and the join operation is implemented by using a hash join or a sort merge join.

BROADCAST, NONE

All of the rows in the outer table are broadcast to each producer query server. The rows in the inner table are randomly distributed across partitions. We recommend that you use this distribution method when the size of the outer table is extremely smaller than that of the inner table. In some scenarios, the product of the inner table size and the number of query servers is greater than the outer table size. In general, we also recommend that you use this distribution method in these scenarios.

NONE, BROADCAST

All of the rows in the inner table are broadcast to each consumer query server. The rows in the outer table are randomly distributed across partitions. We recommend that you use this distribution method when the size of the inner table is extremely smaller than that of the outer table. In some scenarios, the product of the inner table size and the number of query servers is smaller than the outer table size. In general, we also recommend that you use this distribution method in these scenarios.

PARTITION, NONE

The rows in the outer table are mapped based on the partitioning of the inner table. The inner table must be partitioned based on the join key. We recommend that you use this distribution method when the number of outer table partitions is equal to or nearly equal to a multiple of the number of query servers. For example, the number of partitions is 14 and the number of query servers is 15.

Notice

If the inner table is not partitioned or an equijoin is not performed on the inner table based on the partition key, the optimizer ignores this hint.

NONE, PARTITION

The rows in the inner table are mapped based on the partitioning of the outer table. The outer table must be partitioned based on the join key. We recommend that you use this distribution method when the number of outer table partitions is equal to or nearly equal to a multiple of the number of query servers. For example, the number of partitions is 14 and the number of query servers is 15.

Notice

If the outer table is not partitioned or an equijoin is not performed on the outer table based on the partition key, the optimizer ignores this hint.

NONE, NONE

Each query server performs a join operation on a pair of matching partitions. Each table contains a partition that consists of the pair. The two tables must be equally joined based on the join keys.

In the following example, a hash join is implemented on the two given tables r and s and the query contains a hint that instructs the system to use hash distribution:

SELECT /*+ORDERED PQ_DISTRIBUTE(s HASH, HASH) USE_HASH (s)*/ column_list
  FROM r,s
  WHERE r.c=s.c;

To broadcast the outer table r, execute the following query statement:

SELECT /*+ORDERED PQ_DISTRIBUTE(s BROADCAST, NONE) USE_HASH (s) */ column_list
  FROM r,s
  WHERE r.c=s.c;