The query optimizer selects a query plan by minimizing cost — a measure of disk I/O expressed as the number of disk page reads. Use EXPLAIN and EXPLAIN ANALYZE to inspect and optimize query plans.
Syntax:
EXPLAIN [ANALYZE] [VERBOSE] statementHow EXPLAIN and EXPLAIN ANALYZE differ
| Statement | What it does |
|---|---|
EXPLAIN | Shows the estimated cost for each node in the query plan without executing the query |
EXPLAIN ANALYZE | Executes the query and shows both estimated and actual costs, including runtime and row counts |
Example — estimated plan only:
EXPLAIN SELECT * FROM names WHERE id=22;Example — with actual execution data:
EXPLAIN ANALYZE SELECT * FROM names WHERE id=22;Read an EXPLAIN output
A query plan is a tree of nodes. Each node represents one operation: a table scan, join, aggregation, or sort. Read the plan from the bottom up — each node feeds rows into the node directly above it.
Bottom nodes are scan operations (sequential scans, index scans, or bitmap index scans). Upper nodes handle joins, aggregations, and sorts. Topmost nodes are always AnalyticDB for PostgreSQL motion nodes — redistribute, broadcast, or gather — which move rows between compute nodes.
Each node in the output shows three cost estimates:
| Field | Description |
|---|---|
cost | Estimated disk I/O in disk page reads. Format: startup_cost..total_cost. Startup cost is the time to return the first row; total cost covers all rows. 1.0 equals one sequential disk page read. |
rows | Estimated number of rows the node outputs. Usually less than rows scanned due to WHERE clause filtering. For the topmost node, this approximates the rows the query returns, updates, or deletes. |
width | Estimated total bytes of all output rows. |
Two important rules for reading costs:
A node's cost includes the costs of all child nodes. The topmost node shows the plan's total estimated cost — the number the optimizer minimizes.
Cost reflects only execution time within AnalyticDB for PostgreSQL. It does not include time to transmit result rows to the client.
Walk through an EXPLAIN example
EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
QUERY PLAN
------------------------------------------------------------
Gather Motion 4:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
-> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
Filter: name::text ~~ 'Joelle'::textRead bottom to top:
Seq Scan on names — the optimizer scans the entire table and applies the filter
name = 'Joelle'to each row. Only matching rows are passed up.Gather Motion 4:1 — all 4 compute nodes send their rows to the coordinator node (
4:1means 4 senders, 1 receiver). Startup cost is 0.00 (no setup overhead); total cost is 20.88 disk page reads. The optimizer estimates this query returns 1 row.
Walk through an EXPLAIN ANALYZE example
EXPLAIN ANALYZE executes the query and augments each node with actual row counts and timing. The output below shows the same query with real execution data (shown in bold):
EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
QUERY PLAN
------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..20.88 rows=1 width=13)
**Rows out: 1 rows at destination with 0.305 ms to first row, 0.537 ms to end, start offset by 0.289 ms.**
-> Seq Scan on names (cost=0.00..20.88 rows=1 width=13)
**Rows out: Avg 1 rows x 2 workers. Max 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end, start offset by 0.968 ms.**
Filter: name = 'Joelle'::text
Slice statistics:
(slice0) Executor memory: 135K bytes.
(slice1) Executor memory: 151K bytes avg x 2 workers, 151K bytes max (seg0).
**Statement statistics: Memory used: 128000K bytes Total runtime: 22.548 ms**Read bottom to top:
Seq Scan (seg0): Only seg0 returned rows — 1 row. It took 0.255 ms to retrieve the first row and 0.486 ms to scan all rows. This matches the optimizer's estimate of 1 row.
Gather Motion: The coordinator received 1 row. The total time for this operation was 0.537 ms.
Total runtime: 22.548 ms.
In addition to estimated costs, EXPLAIN ANALYZE reports:
Total runtime — wall-clock time for the full query execution, in milliseconds.
Memory per slice — executor memory used by each query plan slice, and memory reserved for the whole statement.
Compute nodes involved — only nodes that returned rows are counted.
Maximum rows returned — the row count from the compute node that produced the most rows. If multiple nodes produce equal rows, the count comes from the node that took the longest.
Compute node ID — the segment ID of the node that produced the most rows.
work_mem usage — memory used for the operation. If available memory was insufficient, the plan shows how much data spilled to disk for the worst-performing node:
Work_mem used: 64K bytes avg, 64K bytes max (seg0). Work_mem wanted: 90K bytes avg, 90K byes max (seg0) to lessen workfile I/O affecting 2 workers.Time to first and last row — for the compute node that produced the most rows: time to retrieve the first row and time to scan all rows.
Query plan operators
Scan operators
| Operator | Description |
|---|---|
| Seq Scan | Scans all rows in a table. |
| Append-only Scan | Scans rows in append-optimized row-oriented tables. |
| Append-only Columnar Scan | Scans rows in append-optimized column-oriented tables. |
| Index Scan | Traverses a B-tree index to fetch rows from a table. |
| Bitmap Append-only Row-oriented Scan | Gathers row pointers from an index on an append-optimized table, then sorts them by disk location before reading. |
| Dynamic Table Scan | Chooses partitions to scan dynamically using one of these functions, contained in a Function Scan node: gp_partition_expansion (all partitions), gp_partition_selection (equality match), or gp_partition_inversion (range match). The Function Scan node passes the selected partitions to the Result node, which passes them to the Sequence node. |
Join operators
| Operator | Description |
|---|---|
| Hash Join | Builds a hash table from the smaller table using the join column as a hash key, then scans the larger table to find matching rows. Typically the fastest join in AnalyticDB for PostgreSQL. The join condition appears as Hash Cond in the plan. |
| Nested Loop Join | Iterates over the larger table and scans the smaller table on each iteration. Requires one table to be broadcast so all its rows are available on every compute node. Performs well for small tables or index-restricted scans; avoid for large tables. To discourage the optimizer from choosing this join, set enable_nestloop to OFF (the default). |
| Merge Join | Sorts both tables and merges them. Fast for pre-sorted data. To have the optimizer prefer this join, set enable_mergejoin to ON. |
Motion operators
Motion operators move rows between compute nodes during query execution.
| Operator | Description |
|---|---|
| Broadcast motion | Each compute node sends its rows to all other compute nodes, so every node holds a complete copy of the table. In most cases, a query optimizer only selects a Broadcast motion for small tables. The Broadcast motion is not suitable for large tables. If data is not distributed on the join key, rows are redistributed dynamically. |
| Redistribute motion | Each compute node rehashes its rows and sends them to the appropriate compute node based on the hash key. |
| Gather motion | All compute nodes send their result rows to the coordinator node. This is the final operation in most query plans. |
Other operators
| Operator | Description |
|---|---|
| Materialize | Materializes a subselect. |
| InitPlan | A pre-query used in dynamic partition elimination, executed when the optimizer does not know which partition values to scan at planning time. |
| Sort | Sorts rows to prepare for an operation that requires ordered input, such as aggregation or merge join. |
| Group By | Groups rows by one or more columns. |
| Group/Hash Aggregate | Aggregates rows using a hash algorithm. |
| Append | Concatenates row sets when scanning partitions from a partitioned table. |
| Filter | Applies the filter condition from a WHERE clause to select matching rows. |
| Limit | Limits the number of rows returned. |
Determine which optimizer generated the plan
The last lines of any EXPLAIN output show which optimizer generated the plan. Settings shows the optimizer parameter value; Optimizer status identifies the optimizer.
GPORCA (when optimizer=on and GPORCA generates the plan):
EXPLAIN SELECT count(*) FROM part;
QUERY PLAN
------------------------------------------------------------------------------------
Aggregate (cost=0.00..296.14 rows=1 width=8)
-> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..295.10 rows=1 width=8)
-> Aggregate (cost=0.00..294.10 rows=1 width=8)
-> Table Scan on part (cost=0.00..97.69 rows=100040 width=1)
Settings: optimizer=on
Optimizer status: PQO version 1.609
(5 rows)Legacy query optimizer (when optimizer=off):
QUERY PLAN
----------------------------------------------------------------------------------------
Aggregate (cost=3519.05..3519.06 rows=1 width=8)
-> Gather Motion 2:1 (slice1; segments: 2) (cost=3518.99..3519.03 rows=1 width=8)
-> Aggregate (cost=3518.99..3519.00 rows=1 width=8)
-> Seq Scan on part (cost=0.00..3018.79 rows=100040 width=1)
Settings: optimizer=off
Optimizer status: legacy query optimizer
(5 rows)