This topic describes the pg_hint_plan extension of ApsaraDB RDS for PostgreSQL. You can use this extension to add hints to change the execution plans of SQL statements on an ApsaraDB RDS for PostgreSQL instance.

Prerequisites

Background information

PostgreSQL uses a cost-based optimizer that works based on data statistics rather than static rules. The optimizer evaluates the cost of every possible execution plan for each SQL statement and selects the execution plan that has the lowest cost. However, the optimizer does not consider the possible internal relationships among data. Therefore, the final execution plan may not be the best plan. You can use the pg_hint_plan extension to add hints to SQL statements. The hints specify how you want to execute the SQL statements. This way, you can optimize the execution plans of the SQL statements.

Basic usage

A hint starts with a forward slash, an asterisk, and a plus sign (/*+) and ends with an asterisk and a forward slash (*/). A hint consists of the hint name and the parameters. The hint name precedes the parameters. The parameters are enclosed in a pair of parentheses () and are separated with each other by spaces. For readability purposes, you can separate each hint with a line break.

Example:

In this example, the HashJoin hint specifies that the SeqScan method is used to scan the pgbench_accounts table.

 /*+
    HashJoin(a b)
    SeqScan(a)
  */
 EXPLAIN SELECT *
    FROM pgbench_branches b
    JOIN pgbench_accounts a ON b.bid = a.bid
   ORDER BY a.aid;

The following result is returned:

                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Sort  (cost=31465.84..31715.84 rows=100000 width=197)
   Sort Key: a.aid
   ->  Hash Join  (cost=1.02..4016.02 rows=100000 width=197)
         Hash Cond: (a.bid = b.bid)
         ->  Seq Scan on pgbench_accounts a  (cost=0.00..2640.00 rows=100000 width=97)
         ->  Hash  (cost=1.01..1.01 rows=1 width=100)
               ->  Seq Scan on pgbench_branches b  (cost=0.00..1.01 rows=1 width=100)
(7 rows)

Hint table

Hints can be used to optimize the execution plans of SQL statements. However, this is convenient only when SQL statements are editable. If SQL statements are not editable, you can place hints in a table named hint_plan.hints. The hint_plan.hints table contains the columns that are described in the following table.

Note By default, the user who creates the pg_hint_plan extension has the permissions on the hint_plan.hints table. The hints in the hint_plan.hints table take precedence over the hints that you add by using the pg_hint_plan extension.
FieldDescription
idThe ID of the hint. The ID is unique and automatically generated.
norm_query_stringThe pattern that matches the SQL statement to which you want to add the hint. The constants in the SQL statement must be replaced with wildcards (?). Spaces are crucial parts of the pattern.
application_nameThe name of the application to which the hint is applied. If this parameter is left empty, the hint is applied to all applications.
hintsThe comment that contains the hint. No need to include comment marks.

Example:

INSERT INTO hint_plan.hints(norm_query_string, application_name, hints)
    VALUES (
        'EXPLAIN (COSTS false) SELECT * FROM t1 WHERE t1.id = ?;',
        '',
        'SeqScan(t1)'
    );
INSERT 0 1
postgres=# UPDATE hint_plan.hints
postgres-#    SET hints = 'IndexScan(t1)'
postgres-#  WHERE id = 1;
UPDATE 1
postgres=# DELETE FROM hint_plan.hints
postgres-#  WHERE id = 1;
DELETE 1

Hint types

Hints are divided into the following types based on how they affect execution plans:

  • Hints for scan methods

    This type of hint specifies the method that is used to scan the specified table. If the specified table has an alias, the pg_hint_plan extension identifies the table based on the alias. The supported scan methods include SeqScan, IndexScan, and NoSeqScan.

    The hints for scan methods are valid on ordinary tables, inherited tables, unlogged tables, temporary tables, and system tables. The hints for scan methods are invalid on external tables, table functions, statements in which the values of constants are specified, universal expressions, views, and subqueries.

    Example:

    /*+
        SeqScan(t1)
        IndexScan(t2 t2_pkey)
     */
    SELECT * FROM table1 t1 JOIN table table2 t2 ON (t1.key = t2.key);
  • Hints for join methods

    This type of hint specifies the method that is used to join the specified tables.

    The hints for join methods are valid on ordinary tables, inherited tables, unlogged tables, temporary tables, external tables, system tables, table functions, statements in which the values of constants are specified, and universal expressions. The hints for join methods are invalid on views and subqueries.

  • Hints for join order

    This type of hint specifies the order in which two or more tables are joined. You can use one of the following methods to specify a hint that specifies the join order:

    • Specify the order in which you want to join the specified tables without the need to restrict the direction at each join level.
    • Specify the order in which you want to join the specified tables and the direction at each join level.

    Example:

    /*+
        NestLoop(t1 t2)
        MergeJoin(t1 t2 t3)
        Leading(t1 t2 t3)
     */
    SELECT * FROM table1 t1
        JOIN table table2 t2 ON (t1.key = t2.key)
        JOIN table table3 t3 ON (t2.key = t3.key);
  • Hints for row number correction

    This type of hint corrects row number errors that are caused by the optimizer.

    Example:

    /*+ Rows(a b #10) */ SELECT... ;     //Set the row number to 10. 
    /*+ Rows(a b +10) */ SELECT... ;     //Increase the row number by 10. 
    /*+ Rows(a b -10) */ SELECT... ;     //Decrease the row number by 10. 
    /*+ Rows(a b *10) */ SELECT... ;     //Increase the row number by 10 times. 
  • Hints for parallel execution

    This type of hint specifies the plan that is used to execute SQL statements in parallel.

    The hints for parallel execution are valid on ordinary tables, inherited tables, unlogged tables, and system tables. The hints for parallel execution are invalid on external tables, clauses in which the values of constants are specified, universal expressions, views, and subqueries. You can specify the internal tables of a view based on their real names or aliases.

    The following examples show how an SQL statement is executed in a different way on each table:

    • Example 1:
      explain /*+ Parallel(c1 3 hard) Parallel(c2 5 hard) */
             SELECT c2.a FROM c1 JOIN c2 ON (c1.a = c2.a);

      The following result is returned:

                                        QUERY PLAN                                   
      -------------------------------------------------------------------------------
       Hash Join  (cost=2.86..11406.38 rows=101 width=4)
         Hash Cond: (c1.a = c2.a)
         ->  Gather  (cost=0.00..7652.13 rows=1000101 width=4)
               Workers Planned: 3
               ->  Parallel Seq Scan on c1  (cost=0.00..7652.13 rows=322613 width=4)
         ->  Hash  (cost=1.59..1.59 rows=101 width=4)
               ->  Gather  (cost=0.00..1.59 rows=101 width=4)
                     Workers Planned: 5
                     ->  Parallel Seq Scan on c2  (cost=0.00..1.59 rows=59 width=4)
    • Example 2:
      EXPLAIN /*+ Parallel(tl 5 hard) */ SELECT sum(a) FROM tl;

      The following result is returned:

                                          QUERY PLAN                                  
      -----------------------------------------------------------------------------------
       Finalize Aggregate  (cost=693.02..693.03 rows=1 width=8)
         ->  Gather  (cost=693.00..693.01 rows=5 width=8)
               Workers Planned: 5
               ->  Partial Aggregate  (cost=693.00..693.01 rows=1 width=8)
                     ->  Parallel Seq Scan on tl  (cost=0.00..643.00 rows=20000 width=4)
  • Hints for GUC parameter setting

    This type of hint temporarily changes the value of a GUC parameter. The values of GUC parameters in the execution plan help you achieve the effect that you expect. However, this does not apply if the specified hint conflicts with the execution plans of other SQL statements. If you configure a GUC parameter more than once, the most recent value takes effect.

    Example:

    /*+ Set(random_page_cost 2.0) */
    SELECT * FROM table1 t1 WHERE key = 'value';

The following table describes all hints that are supported by the pg_hint_plan extension.

TypeFormatDescription
Hints for scan methods SeqScan(table)Specifies a sequential scan.
TidScan(table)Specifies a TID scan.
IndexScan(table[ index...])Specifies an index scan. You can specify an index.
IndexOnlyScan(table[ index...])Specifies an index-only scan. You can specify an index.
BitmapScan(table[ index...])Specifies a bitmap scan.
NoSeqScan(table)Prohibits a sequential scan.
NoTidScan(table)Prohibits a TID scan.
NoIndexScan(table)Prohibits an index scan.
NoIndexOnlyScan(table)Prohibits an index scan. Only tables are scanned.
NoBitmapScan(table)Prohibits a bitmap scan.
Hints for join methods NestLoop(table table[ table...])Specifies a nested loop join.
HashJoin(table table[ table...])Specifies a hash join.
MergeJoin(table table[ table...])Specifies a merge join.
NoNestLoop(table table[ table...])Prohibits a nested loop join.
NoHashJoin(table table[ table...])Prohibits a hash join.
NoMergeJoin(table table[ table...])Prohibits a merge join.
Hints for join orderLeading(table table[ table...])Specifies the join order.
Leading(<join pair>)Specifies the join order and direction.
Hints for row number correctionRows(table table[ table...] correction)Corrects the row number of the join result that is obtained from the specified tables. The following operators are supported: #<n>, + <n>, -<n>, and * <n>. The <n> operator is supported by the strtod function.
Hints for parallel executionParallel(table <# of workers> [soft|hard])Specifies or prohibits the parallel execution of the specified tables. The <worker#> parameter specifies the number of working programs that are required. The value 0 specifies that parallel execution is prohibited.

If you set the third parameter to soft, only the value of the max_parallel_workers_per_gather parameter is changed and the other parameters are specified by the optimizer. If you set the third parameter to hard, the values of all related parameters are changed. The default value of the third parameter is soft.

Hints for GUC parameter settingSet(GUC-param value)Specifies the value of a GUC parameter when the optimizer runs.

For more information, visit the PostgreSQL website.