A query optimizer uses data statistics maintained by a database to choose the query plan with the lowest possible cost. Cost is measured in disk I/O and is shown as the number of disk page fetches. You can use EXPLAIN and EXPLAIN ANALYZE statements to identify and optimize a query plan. The syntax of an EXPLAIN statement is as follows:
EXPLAIN [ANALYZE] [VERBOSE] statement

An EXPLAIN statement displays the cost estimated by a query optimizer for a query plan. Example:

EXPLAIN SELECT * FROM names WHERE id=22; 
An EXPLAIN ANALYZE statement enables the statement to be executed, and displays a query plan and additional information. Such information includes the number of executed rows and the runtime. Example:
EXPLAIN ANALYZE SELECT * FROM names WHERE id=22;

EXPLAIN statement output

A query plan is a tree of nodes. Each node in a query plan represents a single operation such as a table scan, join, aggregation, or sorting. Query plans must be read from the bottom up because each node feeds rows into the node directly above it. The bottom nodes of a query plan are usually table scan operations such as sequential scans, index-based scans, or bitmap index-based scans. If a query requires operations such as join, aggregation, and sorting on rows, there are additional nodes above the scan nodes to perform these operations. The topmost plan nodes are AnalyticDB for PostgreSQL motion nodes: redistribute, broadcast, or gather. These operations move rows between compute nodes during query processing.

The output of an EXPLAIN statement has one line for each node in a plan and shows the node type and the estimates of the execution costs for each plan node:

  • cost: It is measured based on the number of disk page fetches. 1.0 equals one sequential disk page read. The first estimate is the startup cost of getting the first row and the second estimate is the total cost of getting all rows.
  • rows: indicates the total number of rows generated by a plan node. This number is usually less than the number of rows processed or scanned by the plan node, because of the filter criterion specified in a WHERE clause. The estimate for the topmost node approximates the number of rows that the query actually returns, updates, or deletes.
  • width: indicates the total bytes of all the rows that the plan node generates.

Note that:

  • The cost of a node includes the cost of all its child nodes. The topmost plan node has the estimated total execution cost for the plan. This is the number the query optimizer intends to minimize.
  • The cost only reflects the time taken to execute the query plan in AnalyticDB for PostgreSQL. In particular, the cost does not consider the time taken to transmit result rows to the client.

Example EXPLAIN statement

The following example describes how to read a query plan displayed by an EXPLAIN statement:

EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
                     QUERY PLAN
------------------------------------------------------------
Gather Motion 4:1 (slice1) (cost=0.00..20.88 rows=1 width=13)

   -> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
         Filter: name::text ~~ 'Joelle'::text

The query optimizer sequentially scans the names table based on the filter criterion specified in the WHERE clause. This means that the query optimizer checks the criterion for each row it scans and only generates rows that meet the criterion. The results of the scan operation are passed up to a gather motion. A gather motion is that compute nodes send rows to the coordinator node. In this example, four compute nodes send rows to the coordinator node, which is indicated by "4:1". The estimated startup cost for this plan is 00.00 (no cost) and the total cost of disk page fetches is 20.88. The query optimizer estimates that this query will return one row.

An EXPLAIN ANALYZE statement displays a query plan and executes statements. The query plan displayed by an EXPLAIN ANALYZE statement shows the actual execution cost along with the estimates provided by the query optimizer. In addition, an EXPLAIN ANALYZE statement displays the following information:

  • The total runtime (in milliseconds) during which the query is executed.
  • The memory used by each slice of a query plan and the memory reserved for the whole query statement.
  • The number of compute nodes involved in a plan node operation. Only compute nodes that return rows are counted.
  • The maximum number of rows returned by the compute node that produced the most rows for an operation. If multiple compute nodes produce an equal number of rows, the EXPLAIN ANALYZE statement displays the number of rows generated by the compute node that takes the longest time to produce the rows.
  • The ID of the compute node that produces the most rows for an operation.
  • The amount of memory required for an operation (work_mem). If the available memory is insufficient to perform an operation, the plan shows the amount of data spilled to disk for the lowest-performing compute node. Example:
    Work_mem used: 64K bytes avg, 64K bytes max (seg0).
    Work_mem wanted: 90K bytes avg, 90K byes max (seg0) to lessen
    workfile I/O affecting 2 workers.
  • The time (in milliseconds) used by the compute node that produces the most rows to retrieve the first row, and the time taken for that compute node to retrieve all rows.
The following example uses the same query to describe how to read a query plan displayed by an EXPLAIN ANALYZE statement. The bold parts of the plan show actual timing and rows returned for each plan node, along with memory and time statistics for the whole query.
EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
                     QUERY PLAN
------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..20.88 rows=1 width=13)
Rows out: 1 rows at destination with 0.305 ms to first row, 0.537 ms to end, start offset by 0.289 ms.
        -> Seq Scan on names (cost=0.00..20.88 rows=1 width=13)
Rows out: Avg 1 rows x 2 workers. Max 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end, start offset by 0.968 ms.
                 Filter: name = 'Joelle'::text
 Slice statistics:

      (slice0) Executor memory: 135K bytes.

    (slice1) Executor memory: 151K bytes avg x 2 workers, 151K bytes max (seg0).

Statement statistics: Memory used: 128000K bytes Total runtime: 22.548 ms
The total time to run this query is 22.548 milliseconds. The sequential scan operation only has one compute node (seg0) that returns rows, and this compute node only returns one row. It takes 0.255 milliseconds to retrieve the first row and 0.486 milliseconds to scan all rows. The gather motion (compute nodes sending data to the coordinator node) receives one row. The total time for this operation is 0.537 milliseconds.

Common query operators

Scan operators scan rows in a table to find a set of rows. There are the following scan operators:
  • Seq Scan: scans all rows in a table.
  • Append-only Scan: scans rows in append-optimized row-oriented tables.
  • Append-only Columnar Scan: scans rows in append-optimized column-oriented tables.
  • Index Scan: traverses a B-tree index to fetch rows from a table.
  • Bitmap Append-only Row-oriented Scan: gathers the pointers of rows in an append-optimized table from an index and sorts the pointers by location on a disk.
  • Dynamic Table Scan: uses one of the following functions to choose partitions to scan. The Function Scan node contains the function.
    • gp_partition_expansion: selects all partitions in a table.
    • gp_partition_selection: selects a partition based on an equality expression.
    • gp_partition_inversion: selects partitions based on a range expression.
    The Function Scan node passes the list of dynamically selected partitions to the Result node which then passes the list to the Sequence node.

Join operators include:

  • Hash Join: builds a hash table from a smaller table with the join column as a hash key, scans a larger table to calculate the hash key for the join column, and probes the hash table to find the rows with the same hash key. Hash joins are typically the fastest joins in AnalyticDB for PostgreSQL. Hash Cond in the query plan identifies the columns that are joined.
  • Nested Loop Join: iterates through rows in a larger table and scans the rows in a smaller table on each iteration. This operator requires the broadcast of one table so that all rows in that table can be compared to all rows in the other table. Nested Loop Join performs well for small tables or the tables that are limited by using an index. There are performance implications when Nested Loop Join is used with large tables. Set the enable_nestloop parameter to OFF (default value) to make a query optimizer favor Hash Join.
  • Merge Join: sorts two tables and merges them together. This operator is fast for pre-ordered data. To make a query optimizer favor Merge Join, set the enable_mergejoin parameter to ON.
Motion operators move rows between compute nodes. There are the following motion operators:
  • Broadcast motion: Each compute node sends its rows to all the other compute nodes so that every compute node has a complete copy of a table. In most cases, a query optimizer only selects a Broadcast motion for small tables. The Broadcast motion is not suitable for large tables. If data is not distributed on the join key, required rows are dynamically redistributed from one of the tables to another compute node.
  • Redistribute motion: Each compute node rehashes data and sends its rows to appropriate compute nodes based on the hash key.
  • Gather motion: Result data from all compute nodes is assembled and then sent to the coordinator node. This is the final operation for most query plans.
Other operators that occur in query plans include:
  • Materialize: materializes a subselect.
  • InitPlan: indicates a pre-query. This operator is used in dynamic partition elimination and is executed if the system does not know the values of partitions to be scanned by the query optimizer.
  • Sort: sorts rows in preparation for another operation that requires ordered rows, such as Aggregation or Merge Join.
  • Group By: groups rows by one or more columns.
  • Group/Hash Aggregate: uses a hash algorithm to aggregate rows.
  • Append: concatenates data sets when rows scanned from partitions in a partitioned table are combined.
  • Filter: selects rows by using the filter criterion specified in a WHERE clause.
  • Limit: limits the number of rows returned.

Query optimizer determination

You can view EXPLAIN statement outputs to determine if an ORCA or legacy query optimizer is used to generate a query plan. This information appears at the end of an EXPLAIN statement output. The Settings line displays the setting of the OPTIMIZER parameter. The Optimizer status line displays whether an ORCA or legacy query optimizer generates the query plan.

The following EXPLAIN statement output shows that the query plan is generated by a GPORCA query optimizer.
                       QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate  (cost=0.00..296.14 rows=1 width=8)
   ->  Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..295.10 rows=1 width=8)
         ->  Aggregate  (cost=0.00..294.10 rows=1 width=8)
               ->  Table Scan on part  (cost=0.00..97.69 rows=100040 width=1)
 Settings:  optimizer=on
 Optimizer status: PQO version 1.609
(5 rows)
explain select count(*) from part;
The following EXPLAIN statement output shows that the query plan is generated by a legacy query optimizer.
                       QUERY PLAN
----------------------------------------------------------------------------------------
 Aggregate  (cost=3519.05..3519.06 rows=1 width=8)
   ->  Gather Motion 2:1  (slice1; segments: 2)  (cost=3518.99..3519.03 rows=1 width=8)
         ->  Aggregate  (cost=3518.99..3519.00 rows=1 width=8)
               ->  Seq Scan on part  (cost=0.00..3018.79 rows=100040 width=1)
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
(5 rows)