When a query runs slowly or consumes excessive memory, the execution plan tells you exactly where time and resources are going. EXPLAIN shows the optimizer's planned path for a query before execution. EXPLAIN ANALYZE runs the query and captures actual distributed execution costs—operator-level durations, peak memory usage, and input/output row counts—so you can compare estimates against reality and pinpoint bottlenecks.
Prerequisites
Before you begin, ensure that:
Your AnalyticDB for MySQL cluster is version V3.1.3 or later. To check the version, see How can I view the version of an AnalyticDB for MySQL cluster?. To upgrade, submit a ticket.
EXPLAIN
EXPLAIN evaluates the planned execution path for an SQL query. The output is an estimate from the optimizer and may not match actual execution results.
Syntax
EXPLAIN (format text) <SELECT statement>;

Add (format text) for non-complex queries to improve the readability of the tree hierarchy in the output.
Example
EXPLAIN (format text)
SELECT count(*)
FROM nation, region, customer
WHERE c_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'ASIA';

The output is a tree where each node is an operator. The Outputs field at each node shows the columns the operator produces, and Estimates shows the optimizer's row count and data volume forecast, which the optimizer uses to choose the join ordering and data shuffling strategy.
Output[count(*)]
│ Outputs: [count:bigint]
│ Estimates: {rows: 1 (8B)}
│ count(*) := count
└─ Aggregate(FINAL)
│ Outputs: [count:bigint]
│ Estimates: {rows: 1 (8B)}
│ count := count(`count_1`)
└─ LocalExchange[SINGLE] ()
│ Outputs: [count_0_1:bigint]
│ Estimates: {rows: 1 (8B)}
└─ RemoteExchange[GATHER]
│ Outputs: [count_0_2:bigint]
│ Estimates: {rows: 1 (8B)}
└─ Aggregate(PARTIAL)
│ Outputs: [count_0_4:bigint]
│ Estimates: {rows: 1 (8B)}
│ count_4 := count(*)
└─ InnerJoin[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue_0_6]
│ Outputs: []
│ Estimates: {rows: 302035 (4.61MB)}
│ Distribution: REPLICATED
├─ Project[]
│ │ Outputs: [c_nationkey:integer, $hashvalue:bigint]
│ │ Estimates: {rows: 1500000 (5.72MB)}
│ │ $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`c_nationkey`), 0))
│ └─ RuntimeFilter
│ │ Outputs: [c_nationkey:integer]
│ │ Estimates: {rows: 1500000 (5.72MB)}
│ ├─ TableScan[adb:AdbTableHandle{schema=tpch, tableName=customer, partitionColumnHandles=[c_custkey]}]
│ │ Outputs: [c_nationkey:integer]
│ │ Estimates: {rows: 1500000 (5.72MB)}
│ │ c_nationkey := AdbColumnHandle{columnName=c_nationkey, type=4, isIndexed=true}
│ └─ RuntimeCollect
│ │ Outputs: [n_nationkey:integer]
│ │ Estimates: {rows: 5 (60B)}
│ └─ LocalExchange[ROUND_ROBIN] ()
│ │ Outputs: [n_nationkey:integer]
│ │ Estimates: {rows: 5 (60B)}
│ └─ RuntimeScan
│ Outputs: [n_nationkey:integer]
│ Estimates: {rows: 5 (60B)}
└─ LocalExchange[HASH][$hashvalue_0_6] ("n_nationkey")
│ Outputs: [n_nationkey:integer, $hashvalue_0_6:bigint]
│ Estimates: {rows: 5 (60B)}
└─ Project[]
│ Outputs: [n_nationkey:integer, $hashvalue_0_10:bigint]
│ Estimates: {rows: 5 (60B)}
│ $hashvalue_10 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_nationkey`), 0))
└─ RemoteExchange[REPLICATE]
│ Outputs: [n_nationkey:integer]
│ Estimates: {rows: 5 (60B)}
└─ InnerJoin[(`n_regionkey` = `r_regionkey`)][$hashvalue_0_7, $hashvalue_0_8]
│ Outputs: [n_nationkey:integer]
│ Estimates: {rows: 5 (60B)}
│ Distribution: REPLICATED
├─ Project[]
│ │ Outputs: [n_nationkey:integer, n_regionkey:integer, $hashvalue_0_7:bigint]
│ │ Estimates: {rows: 25 (200B)}
│ │ $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_regionkey`), 0))
│ └─ RuntimeFilter
│ │ Outputs: [n_nationkey:integer, n_regionkey:integer]
│ │ Estimates: {rows: 25 (200B)}
│ ├─ TableScan[adb:AdbTableHandle{schema=tpch, tableName=nation, partitionColumnHandles=[]}]
│ │ Outputs: [n_nationkey:integer, n_regionkey:integer]
│ │ Estimates: {rows: 25 (200B)}
│ │ n_nationkey := AdbColumnHandle{columnName=n_nationkey, type=4, isIndexed=true}
│ │ n_regionkey := AdbColumnHandle{columnName=n_regionkey, type=4, isIndexed=true}
│ └─ RuntimeCollect
│ │ Outputs: [r_regionkey:integer]
│ │ Estimates: {rows: 1 (4B)}
│ └─ LocalExchange[ROUND_ROBIN] ()
│ │ Outputs: [r_regionkey:integer]
│ │ Estimates: {rows: 1 (4B)}
│ └─ RuntimeScan
│ Outputs: [r_regionkey:integer]
│ Estimates: {rows: 1 (4B)}
└─ LocalExchange[HASH][$hashvalue_0_8] ("r_regionkey")
│ Outputs: [r_regionkey:integer, $hashvalue_0_8:bigint]
│ Estimates: {rows: 1 (4B)}
└─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=region, partitionColumnHandles=[]}]
Outputs: [r_regionkey:integer, $hashvalue_0_9:bigint]
Estimates: {rows: 1 (4B)}/{rows: 1 (4B)}
$hashvalue_9 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`r_regionkey`), 0))
r_regionkey := AdbColumnHandle{columnName=r_regionkey, type=4, isIndexed=true}

| Parameter | Description |
|---|---|
| Outputs: [symbol:type] | The output column and data type of each operator. |
| Estimates: {rows: %s (%sB)} | The optimizer's estimated row count and data volume for each operator. The optimizer uses these estimates to determine the join ordering and data shuffling strategy. |
EXPLAIN ANALYZE
EXPLAIN ANALYZE runs the query and returns actual distributed execution costs—execution durations, peak memory usage, and input/output row counts for each operator and fragment.
Syntax
EXPLAIN ANALYZE <SELECT statement>;

Example
EXPLAIN ANALYZE
SELECT count(*)
FROM nation, region, customer
WHERE c_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'ASIA';

The output is organized by fragment: each fragment corresponds to a stage of the distributed execution plan. For each operator, you can see actual row counts, memory usage, and wall time alongside the optimizer's estimates. Compare Estimates against Output to identify where the optimizer's model diverges from real execution, and use PeakMemory and WallTime to find the costliest operators.
Fragment 1 [SINGLE]
Output: 1 row (9B), PeakMemory: 178KB, WallTime: 1.00ns, Input: 32 rows (288B); per task: avg.: 32.00 std.dev.: 0.00
Output layout: [count]
Output partitioning: SINGLE []
Aggregate(FINAL)
│ Outputs: [count:bigint]
│ Estimates: {rows: 1 (8B)}
│ Output: 2 rows (18B), PeakMemory: 24B (0.00%), WallTime: 70.39us (0.03%)
│ count := count(`count_1`)
└─ LocalExchange[SINGLE] ()
│ Outputs: [count1:bigint]
│ Estimates: {rows: ? (?)}
│ Output: 64 rows (576B), PeakMemory: 8KB (0.07%), WallTime: 238.69us (0.10%)
└─ RemoteSource[2]
Outputs: [count2:bigint]
Estimates:
Output: 32 rows (288B), PeakMemory: 32KB (0.27%), WallTime: 182.82us (0.08%)
Input avg.: 4.00 rows, Input std.dev.: 264.58%
Fragment 2 [adb:AdbPartitioningHandle{schema=tpch, tableName=customer, dimTable=false, shards=32, tableEngineType=Cstore, partitionColumns=c_custkey, prunedBuckets= empty}]
Output: 32 rows (288B), PeakMemory: 6MB, WallTime: 164.00ns, Input: 1500015 rows (20.03MB); per task: avg.: 500005.00 std.dev.: 21941.36
Output layout: [count4]
Output partitioning: SINGLE []
Aggregate(PARTIAL)
│ Outputs: [count4:bigint]
│ Estimates: {rows: 1 (8B)}
│ Output: 64 rows (576B), PeakMemory: 336B (0.00%), WallTime: 1.01ms (0.42%)
│ count_4 := count(*)
└─ INNER Join[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue6]
│ Outputs: []
│ Estimates: {rows: 302035 (4.61MB)}
│ Output: 300285 rows (210B), PeakMemory: 641KB (5.29%), WallTime: 99.08ms (41.45%)
│ Left (probe) Input avg.: 46875.00 rows, Input std.dev.: 311.24%
│ Right (build) Input avg.: 0.63 rows, Input std.dev.: 264.58%
│ Distribution: REPLICATED
├─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=customer, partitionColumnHandles=[c_custkey]}]
│ Outputs: [c_nationkey:integer, $hashvalue:bigint]
│ Estimates: {rows: 1500000 (5.72MB)}/{rows: 1500000 (5.72MB)}
│ Output: 1500000 rows (20.03MB), PeakMemory: 5MB (44.38%), WallTime: 68.29ms (28.57%)
│ $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`c_nationkey`), 0))
│ c_nationkey := AdbColumnHandle{columnName=c_nationkey, type=4, isIndexed=true}
│ Input: 1500000 rows (7.15MB), Filtered: 0.00%
└─ LocalExchange[HASH][$hashvalue6] ("n_nationkey")
│ Outputs: [n_nationkey:integer, $hashvalue6:bigint]
│ Estimates: {rows: 5 (60B)}
│ Output: 30 rows (420B), PeakMemory: 394KB (3.26%), WallTime: 455.03us (0.19%)
└─ Project[]
│ Outputs: [n_nationkey:integer, $hashvalue10:bigint]
│ Estimates: {rows: 5 (60B)}
│ Output: 15 rows (210B), PeakMemory: 24KB (0.20%), WallTime: 83.61us (0.03%)
│ Input avg.: 0.63 rows, Input std.dev.: 264.58%
│ $hashvalue_10 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_nationkey`), 0))
└─ RemoteSource[3]
Outputs: [n_nationkey:integer]
Estimates:
Output: 15 rows (75B), PeakMemory: 24KB (0.20%), WallTime: 45.97us (0.02%)
Input avg.: 0.63 rows, Input std.dev.: 264.58%
Fragment 3 [adb:AdbPartitioningHandle{schema=tpch, tableName=nation, dimTable=true, shards=32, tableEngineType=Cstore, partitionColumns=, prunedBuckets= empty}]
Output: 5 rows (25B), PeakMemory: 185KB, WallTime: 1.00ns, Input: 26 rows (489B); per task: avg.: 26.00 std.dev.: 0.00
Output layout: [n_nationkey]
Output partitioning: BROADCAST []
INNER Join[(`n_regionkey` = `r_regionkey`)][$hashvalue7, $hashvalue8]
│ Outputs: [n_nationkey:integer]
│ Estimates: {rows: 5 (60B)}
│ Output: 11 rows (64B), PeakMemory: 152KB (1.26%), WallTime: 255.86us (0.11%)
│ Left (probe) Input avg.: 25.00 rows, Input std.dev.: 0.00%
│ Right (build) Input avg.: 0.13 rows, Input std.dev.: 264.58%
│ Distribution: REPLICATED
├─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=nation, partitionColumnHandles=[]}]
│ Outputs: [n_nationkey:integer, n_regionkey:integer, $hashvalue7:bigint]
│ Estimates: {rows: 25 (200B)}/{rows: 25 (200B)}
│ Output: 25 rows (475B), PeakMemory: 16KB (0.13%), WallTime: 178.81us (0.07%)
│ $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_regionkey`), 0))
│ n_nationkey := AdbColumnHandle{columnName=n_nationkey, type=4, isIndexed=true}
│ n_regionkey := AdbColumnHandle{columnName=n_regionkey, type=4, isIndexed=true}
│ Input: 25 rows (250B), Filtered: 0.00%
└─ LocalExchange[HASH][$hashvalue8] ("r_regionkey")
│ Outputs: [r_regionkey:integer, $hashvalue8:bigint]
│ Estimates: {rows: 1 (4B)}
│ Output: 2 rows (28B), PeakMemory: 34KB (0.29%), WallTime: 57.41us (0.02%)
└─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=region, partitionColumnHandles=[]}]
Outputs: [r_regionkey:integer, $hashvalue9:bigint]
Estimates: {rows: 1 (4B)}/{rows: 1 (4B)}
Output: 1 row (14B), PeakMemory: 8KB (0.07%), WallTime: 308.99us (0.13%)
$hashvalue_9 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`r_regionkey`), 0))
r_regionkey := AdbColumnHandle{columnName=r_regionkey, type=4, isIndexed=true}
Input: 1 row (5B), Filtered: 0.00%

| Parameter | What it tells you |
|---|---|
| Outputs: [symbol:type] | The output column and data type of each operator. |
| Estimates: {rows: %s (%sB)} | The optimizer's estimated row count and data volume. Compare against Output to see where estimates diverge from actual execution. |
| PeakMemory: %s | Peak memory used by the operator or fragment. Use this to identify the main memory consumer when diagnosing memory bottlenecks. |
| WallTime: %s | The cumulative execution time of the operator summed across parallel threads. Because operators run in parallel, this is larger than the elapsed wall-clock time; use it to rank operators by computational cost when locating bottlenecks. |
| Input: %s rows (%sB) | The number of rows and bytes read as input. |
| per task: avg.: %s std.dev.: %s | The average input rows per task and the standard deviation across tasks. A high standard deviation indicates data skew: some tasks process far more data than others, which limits parallel efficiency. |
| Output: %s rows (%sB) | The number of rows and bytes produced as output. |
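When per task std.dev. is high relative to the average, it is worth checking whether a few key values dominate the table. A minimal sketch, reusing the tpch.customer table from the examples above (swap in your own table and candidate key), surfaces heavy-hitter values of a join or distribution key:

```sql
-- Sketch: count rows per key value to spot heavy hitters that cause skew.
-- Table and column names follow the tpch examples above; adjust as needed.
SELECT c_nationkey, count(*) AS rows_per_key
FROM tpch.customer
GROUP BY c_nationkey
ORDER BY rows_per_key DESC
LIMIT 10;
```

If one value accounts for a disproportionate share of rows, tasks handling that value will run far longer than their peers, matching the high std.dev. reported in the plan.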
Diagnose common issues
Filter not pushed down
When a filter cannot be pushed down to the storage layer, the query scans the full table and then applies the filter in memory—significantly increasing I/O and memory usage.
The following two queries illustrate the difference:
SQL 1 — filter on a raw column value (can be pushed down):
SELECT count(*) FROM test WHERE string_test = 'a';

SQL 2 — filter wrapped in a function (cannot be pushed down):
SELECT count(*) FROM test WHERE length(string_test) = 1;
Run EXPLAIN ANALYZE on both queries and compare Fragment 2.
SQL 1 output (filter pushed down): The fragment uses the TableScan operator and Input avg. is 0.00 rows, meaning the storage layer filtered out all rows before returning data.
Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}]
Output: 4 rows (36B), PeakMemory: 0B, WallTime: 6.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00
Output layout: [count_0_1]
Output partitioning: SINGLE []
Aggregate(PARTIAL)
│ Outputs: [count_0_1:bigint]
│ Estimates: {rows: 1 (8B)}
│ Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 212.92us (3.99%)
│ count_0_1 := count(*)
└─ TableScan[adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}]
Outputs: []
Estimates: {rows: 4 (0B)}
Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 4.76ms (89.12%)
Input avg.: 0.00 rows, Input std.dev.: ?%

SQL 2 output (filter not pushed down): The fragment uses ScanFilterProject instead of TableScan, and Input shows 9999 rows. The filterPredicate field confirms that the filter is applied after the scan, not at the storage layer.
Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}]
Output: 4 rows (36B), PeakMemory: 0B, WallTime: 102.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00
Output layout: [count_0_1]
Output partitioning: SINGLE []
Aggregate(PARTIAL)
│ Outputs: [count_0_1:bigint]
│ Estimates: {rows: 1 (8B)}
│ Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 252.23us (0.12%)
│ count_0_1 := count(*)
└─ ScanFilterProject[table = adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}, filterPredicate = (`test4dmp`.`length`(`string_test`) = BIGINT '1')]
Outputs: []
Estimates: {rows: 9999 (312.47kB)}/{rows: 9999 (312.47kB)}/{rows: ? (?)}
Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 101.31ms (49.84%)
string_test := AdbColumnHandle{columnName=string_test, type=13, isIndexed=true}
Input: 9999 rows (110.32kB), Filtered: 100.00%

To diagnose this pattern, look for ScanFilterProject in the output. If Input shows a large row count and Filtered is close to 100%, the query is scanning far more data than it needs. Rewrite the filter into a form that can be pushed down, or add conditions that narrow the scan.
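One way to make such a predicate pushdown-friendly is to materialize the derived value as its own column at write time, so the filter compares a raw column instead of a function result. A sketch under that assumption; the string_test_len column and the backfill step are illustrative, not part of the example schema:

```sql
-- Hypothetical derived column; populate it on ingest or via a one-time backfill.
ALTER TABLE test ADD COLUMN string_test_len int;
UPDATE test SET string_test_len = length(string_test);

-- The predicate now compares a raw column value, so the storage layer
-- can evaluate it during the scan instead of filtering in memory.
SELECT count(*) FROM test WHERE string_test_len = 1;
```

Re-running EXPLAIN ANALYZE on the rewritten query should show TableScan rather than ScanFilterProject, with the rows filtered before they leave the storage layer.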
High memory usage
Use PeakMemory at the fragment level to identify which stage consumes the most memory. Common causes of high PeakMemory include:
Unintended broadcast joins resulting in high memory consumption
Large intermediate join results due to data bloat
A large probe-side (left) table in a join
TableScan reading a large volume of data
After identifying the fragment, use per-operator PeakMemory to pinpoint the specific operator driving consumption. Add filtering conditions based on business requirements to limit the data volume, or restructure the query to reduce intermediate result sizes.
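For example, filtering the large probe side before the join, rather than joining first and filtering later, shrinks both the scan and the intermediate result that must be held in memory. A sketch against the tpch tables used above; the c_mktsegment predicate is an illustrative stand-in for a real business condition:

```sql
-- Pre-filter the probe side so the join processes fewer rows
-- and builds smaller intermediate results.
SELECT count(*)
FROM (
  SELECT c_nationkey
  FROM tpch.customer
  WHERE c_mktsegment = 'BUILDING'  -- illustrative filter; use your own condition
) c
JOIN tpch.nation n
  ON c.c_nationkey = n.n_nationkey;
```

After the rewrite, compare PeakMemory for the join fragment in EXPLAIN ANALYZE before and after: the probe-side Input row count and the join operator's memory footprint should both drop.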