An execution plan is a tree of nodes. Each node within an execution plan represents a single operation such as a table scan, join, aggregation, or sort.

Execution plans must be read from the bottom up because the execution results of each node feed into the node directly above it. The lower-level nodes of an execution plan are typically table scan operations such as sequential scans, index-based scans, or bitmap index-based scans. If a query requires operations such as joins, aggregations, and sorts on rows, these operations are performed by the nodes above the scan nodes. The topmost plan nodes are typically motion nodes that handle motion operations, such as redistribute, broadcast, and gather. These operations move data between compute nodes when queries are processed.
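For example, the execution plan of a hypothetical join query is read in the order described by the comments below. The table names t1 and t2 and the plan shape are illustrative assumptions, not actual output:

EXPLAIN SELECT t1.id, count(*)
FROM t1 JOIN t2 ON t1.id = t2.id
GROUP BY t1.id;

-- A plan for a query of this shape is typically read from the bottom up:
--   1. Seq Scan nodes at the bottom read the rows of t1 and t2.
--   2. A Redistribute Motion or Broadcast Motion moves rows so that matching
--      join keys are located on the same compute node.
--   3. A Hash Join node combines the two inputs.
--   4. An aggregate node groups the joined rows by t1.id.
--   5. A Gather Motion node at the top sends the final rows to the coordinator node.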

The output of an EXPLAIN statement has one line for each node in an execution plan and shows the node type and the estimates of the execution costs for each plan node.

  • cost: the number of disk page fetches. 1.0 indicates one sequential disk page read. The first estimate is the startup cost of obtaining the first row. The second estimate is the total cost of obtaining all rows.
  • rows: the total number of rows generated by a plan node. This number may be less than the number of rows processed or scanned by the plan node, because of the filter conditions specified in a WHERE clause. The estimate for the topmost node approximates the number of rows returned, updated, or deleted in the query.
  • width: the total bytes of all the rows that the plan node generates.
Note The cost of a node includes the cost of all its child nodes. The topmost plan node has the estimated total execution cost for the plan. This is the number that the query optimizer intends to minimize. The cost reflects only the aspects of execution that the query optimizer accounts for in AnalyticDB for PostgreSQL. In particular, the cost does not include the time taken to transmit result rows to the client.
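
The examples in the following sections query a small table named names. The table definition is not shown in this topic. A minimal hypothetical sketch that would produce similar plans, assuming the table is distributed by an id column, is:

CREATE TABLE names (
    id   int,
    name varchar(64)
) DISTRIBUTED BY (id);

INSERT INTO names VALUES (1, 'Joelle'), (2, 'Ben');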

Sample EXPLAIN statement

The following example demonstrates how to read an execution plan displayed by an EXPLAIN statement:

EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
         QUERY PLAN
------------------------------------------------------------
 Gather Motion 4:1  (slice1)  (cost=0.00..20.88 rows=1 width=13)
   ->  Seq Scan on names  (cost=0.00..20.88 rows=1 width=13)
         Filter: name::text = 'Joelle'::text

The query optimizer sequentially scans the names table and applies the filter condition specified in the WHERE clause. This means that each row is checked, but only the rows that meet the condition are returned. The results of the scan operation are passed up to a gather motion. During a gather motion, compute nodes send rows to the coordinator node. In this example, four compute nodes send rows to one coordinator node, which is indicated by the ratio 4:1. The estimated startup cost for this plan is 0.00 (no startup cost), and the estimated total cost is 20.88 disk page fetches. The query optimizer estimates that this query returns one row.
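
If an index existed on the name column, the query optimizer might choose an index-based scan instead of a sequential scan for the same query. The following sketch is hypothetical; the index name is an assumption, and the resulting plan depends on table size and statistics:

CREATE INDEX names_name_idx ON names (name);
ANALYZE names;

EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
-- Depending on statistics, the scan node may now appear as an Index Scan or
-- a bitmap index-based scan on names_name_idx instead of a Seq Scan.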

EXPLAIN ANALYZE

An EXPLAIN ANALYZE statement actually executes the statement in addition to displaying its execution plan. The execution plan displayed by an EXPLAIN ANALYZE statement shows the actual execution costs along with the estimates provided by the query optimizer. In addition, an EXPLAIN ANALYZE statement displays the following information:

  • The total elapsed time, in milliseconds, taken to execute the query.
  • The memory used by each slice of an execution plan and the memory reserved for the whole query statement.
  • The number of compute nodes involved in a plan node operation. Only compute nodes that return rows are counted.
  • The maximum number of rows returned by compute nodes. If multiple compute nodes produce an equal number of rows, the EXPLAIN ANALYZE statement displays the number of rows generated by the compute node that takes the longest time to produce the rows.
  • The ID of the compute node that produces the most rows for an operation.
  • The amount of memory required for an operation (work_mem). If the available memory is insufficient to perform an operation, the plan shows the amount of data spilled to disk for the lowest-performing compute node. Example:
    Work_mem used: 64K bytes avg, 64K bytes max (seg0).
    Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to lessen
    workfile I/O affecting 2 workers.
  • The amount of time, in milliseconds, that the compute node that produces the most rows takes to retrieve the first row, and the amount of time that the compute node takes to retrieve all rows.
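
Because an EXPLAIN ANALYZE statement actually executes the statement, analyzing an INSERT, UPDATE, or DELETE statement changes data. A common practice, shown here as a sketch, is to wrap the statement in a transaction and roll the transaction back:

BEGIN;
EXPLAIN ANALYZE UPDATE names SET name = 'Joelle B.' WHERE name = 'Joelle';
ROLLBACK;   -- The plan and execution statistics are displayed, but the data change is undone.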

The following example demonstrates how to read an execution plan displayed by an EXPLAIN ANALYZE statement. The same query as in the preceding EXPLAIN example is used. In addition to the optimizer estimates, the plan shows the actual timing and the number of rows returned for each plan node, along with memory and time statistics for the whole query.

EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
QUERY PLAN
------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..20.88 rows=1 width=13)
   Rows out:  1 rows at destination with 0.305 ms to first row, 0.537 ms to end, start offset by 0.289 ms.
   ->  Seq Scan on names  (cost=0.00..20.88 rows=1 width=13)
         Rows out:  Avg 1 rows x 2 workers.  Max 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end, start offset by 0.968 ms.
         Filter: name = 'Joelle'::text
 Slice statistics:
   (slice0)    Executor memory: 135K bytes.
   (slice1)    Executor memory: 151K bytes avg x 2 workers, 151K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
   Total runtime: 22.548 ms

The total amount of time used to execute this query is 22.548 milliseconds. In the sequential scan operation, only a single compute node (seg0) returns rows, and it returns only a single row. It takes 0.255 milliseconds to retrieve the first row and 0.486 milliseconds to scan all rows. The gather motion, in which compute nodes send data to the coordinator node, receives one row. The total amount of time for this operation is 0.537 milliseconds.
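
If the plan reports a Work_mem wanted value that is larger than the Work_mem used value, the operation spilled workfiles to disk. One way to give the statement more memory and re-run the analysis is sketched below, assuming that the Greenplum-style statement_mem session parameter is available:

SET statement_mem = '256MB';   -- Assumed per-statement memory parameter; adjust to your limits.
EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
RESET statement_mem;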

Common query operators

  • Scan operators

    Scan operators scan table rows to find a set of rows. The following types of scan operators are supported:

    • Seq scan: scans all rows in a table.
    • Append-only scan: scans rows in append-optimized row-oriented (AORO) tables.
    • Append-only columnar scan: scans rows in append-optimized column-oriented (AOCO) tables.
    • Index scan: traverses a B-tree index to fetch rows from a table.
    • Bitmap append-only row-oriented scan: gathers the pointers of rows in an append-optimized (AO) table from an index and sorts the pointers by location on a disk.
    • Dynamic table scan: chooses the partitions to scan by using a partition selection function. The Function Scan node contains one of the following partition selection functions:
      • gp_partition_expansion: selects all partitions in a table.
      • gp_partition_selection: selects a partition based on an equality expression.
      • gp_partition_inversion: selects partitions based on a range expression.

      The Function Scan node passes the list of dynamically selected partitions to the Result node. Then, the Result node passes the list to the Sequence node.

  • Join operators

    The following types of join operators are supported:

    • Hash join: builds a hash table from a smaller table by using the join column as a hash key, scans a larger table to calculate the hash key for the join column, and then probes the hash table to find the rows with the same hash key. Hash joins are typically the fastest joins in AnalyticDB for PostgreSQL. Hash Cond in the execution plan identifies the columns that are joined.
    • Nested loop join: iterates over the rows of the larger table and scans the rows of the smaller table on each iteration. This operator requires one of the tables to be broadcast so that all rows in one table can be compared against all rows in the other table. Nested loop joins perform well for small tables or tables that are restricted by an index. Performance may degrade when nested loop joins are used with large tables.
    • Merge join: sorts two tables and merges them together. This operator is fast for pre-ordered data.
  • Motion operators

    Motion operators move rows between compute nodes. The following types of motion operators are supported:

    • Broadcast motion: Each compute node sends its rows to all the other compute nodes so that every compute node has a complete local copy of the table. In most cases, the query optimizer selects a broadcast motion only for small tables because broadcast motions are not suitable for large tables. If data is not distributed on the join key, the required rows from one of the tables are dynamically redistributed to the other compute nodes.
    • Redistribute motion: Each compute node rehashes data and sends its rows to appropriate compute nodes based on the hash key.
    • Gather motion: Result data from all compute nodes is assembled and then sent to the coordinator node. This is the final operation for most execution plans.
  • Other operators

    The following operators are also supported in execution plans:

    • Materialize: materializes a subselect.
    • InitPlan: indicates a pre-query. This operator is used in dynamic partition elimination and is executed when the values that the query optimizer needs to identify the partitions to scan are unknown until execution time.
    • Sort: sorts rows in preparation for another operation that requires ordered rows, such as aggregation or merge join.
    • Group By: groups rows based on one or more columns.
    • Group/hash aggregate: uses a hash algorithm to aggregate rows.
    • Append: concatenates data sets when rows scanned from partitions in a partitioned table are combined.
    • Filter: selects rows by using the filter conditions specified in a WHERE clause.
    • Limit: limits the number of rows returned.
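
The following sketch shows how several of these operators can appear together in one plan. The query is hypothetical, and the plan shape described in the comments is what such a query typically produces, not captured output:

EXPLAIN
SELECT name, count(*) AS cnt
FROM names
GROUP BY name
ORDER BY cnt DESC
LIMIT 10;

-- From the bottom up, a typical plan for this query contains:
--   Seq Scan on names            (scan operator)
--   HashAggregate                (hash aggregate on the GROUP BY column)
--   Redistribute/Gather Motion   (motion operators that move the grouped rows)
--   Sort                         (orders the rows by cnt)
--   Limit                        (returns at most 10 rows)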

Query optimizer determination

You can view the output of an EXPLAIN statement to determine whether an execution plan is generated by the GPORCA query optimizer or the legacy Postgres query optimizer. This information appears at the end of the EXPLAIN statement output. The Settings line displays the setting of the optimizer parameter, and the Optimizer status or Optimizer line displays which query optimizer generated the execution plan.

The following example demonstrates an execution plan generated by the GPORCA query optimizer:

QUERY PLAN
------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..862.00 rows=1 width=4)
   ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..862.00 rows=1 width=4)
         ->  Hash Join  (cost=0.00..862.00 rows=1 width=4)
               Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
               ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
                     Hash Key: partsupp.ps_suppkey
                     ->  Seq Scan on partsupp  (cost=0.00..431.00 rows=1 width=8)
               ->  Hash  (cost=431.00..431.00 rows=1 width=4)
                     ->  Seq Scan on supplier  (cost=0.00..431.00 rows=1 width=4)
 Optimizer: Pivotal Optimizer (GPORCA) version 3.86.0
(10 rows)

The following example demonstrates an execution plan generated by the legacy query optimizer:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Limit  (cost=235.00..251.29 rows=100 width=4)
   ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=235.00..251.29 rows=100 width=4)
         ->  Limit  (cost=235.00..249.29 rows=34 width=4)
               ->  Hash Join  (cost=235.00..6064.00 rows=13600 width=4)
                     Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
                     ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..304.00 rows=2267 width=8)
                           Hash Key: partsupp.ps_suppkey
                           ->  Seq Scan on partsupp  (cost=0.00..168.00 rows=2267 width=8)
                     ->  Hash  (cost=160.00..160.00 rows=2000 width=4)
                           ->  Seq Scan on supplier  (cost=0.00..160.00 rows=2000 width=4)
 Optimizer: Postgres query optimizer
(11 rows)
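
To reproduce both kinds of plans for the same query, you can switch the query optimizer for the current session. The following sketch assumes that the optimizer parameter shown in the Settings line can be set at the session level:

SET optimizer = on;    -- Use the GPORCA query optimizer.
EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
-- The Optimizer line at the end of the output reports GPORCA.

SET optimizer = off;   -- Fall back to the legacy Postgres query optimizer.
EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
-- The Optimizer line reports "Postgres query optimizer".

RESET optimizer;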