Use the EXPLAIN statement to display the execution plan of a DML statement in MaxCompute SQL. The plan shows which jobs and tasks run, in what order, and which SQL clauses generate data shuffles — so you can identify performance bottlenecks and optimize queries.
The execution plan is structured in three levels: jobs, tasks within each job, and operators within each task.
For complex queries, the EXPLAIN output may exceed the row limit of the calling application and be truncated. To work around this, split the query into subqueries and run EXPLAIN on each one separately.
Syntax
explain <dml_query>;dml_query: Required. A SELECT statement. For the full syntax, see SELECT syntax.
Output structure
The EXPLAIN output contains three sections.
Job dependencies
Shows the dependency relationships between jobs. A query with a single job displays one line:
job0 is root jobTask dependencies
Shows, within each job, which tasks can run in parallel and which must wait for others to complete:
In Job job0:
root Tasks: M1_Stg1, M2_Stg1
J3_1_2_Stg1 depends on: M1_Stg1, M2_Stg1In this example, M1_Stg1 and M2_Stg1 run in parallel. J3_1_2_Stg1 runs after both complete.
Task naming conventions
Task names encode their type, ID, and dependencies:
| Component | Position | Meaning | Example |
|---|---|---|---|
| Type prefix | First character | Task type: M (map), R (reduce), J (join), local work | M in M1_Stg1 |
| Task ID | Digit(s) after prefix | Unique ID within the query | 1 in M1_Stg1 |
| Dependencies | Underscore-separated digits | IDs of tasks this task depends on | J3_1_2_Stg1 → task 3 depends on tasks 1 and 2 |
Operator chain
Shows the execution logic within each task as a tree of operators. Each operator corresponds to a clause or operation in the original SQL statement.
In Task M2:
Data source: mf_mc_bj.sale_detail_jt/sale_date=2013/region=china
TS: mf_mc_bj.sale_detail_jt/sale_date=2013/region=china
FIL: ISNOTNULL(customer_id)
RS: order: +
nullDirection: *
optimizeOrderBy: False
valueDestLimit: 0
dist: HASH
keys:
customer_id
values:
customer_id (string)
total_price (double)
partitions:
customer_id
In Task J3_1_2:
JOIN:
StreamLineRead1 INNERJOIN StreamLineRead2
keys:
0:customer_id
1:customer_id
AGGREGATE: group by:customer_id
UDAF: SUM(total_price) (__agg_0_sum)[Complete],SUM(total_price) (__agg_1_sum)[Complete]
RS: order: +
nullDirection: *
optimizeOrderBy: True
valueDestLimit: 10
dist: HASH
keys:
customer_id
values:
customer_id (string)
__agg_0 (double)
__agg_1 (double)
partitions:
In Task R4_3:
SEL: customer_id,__agg_0,__agg_1
LIM:limit 10
FS: output: Screen
schema:
customer_id (string) AS ashop
__agg_0 (double) AS ap
__agg_1 (double) AS bpOperator reference
| Abbreviation | Full name | What it shows |
|---|---|---|
| TS | TableScanOperator | The FROM clause — the input table alias |
| SEL | SelectOperator | The SELECT clause — columns passed to the next operator. A column appears as <alias>.<column_name>, an expression as a nested function list, and a constant as its value. |
| FIL | FilterOperator | The WHERE clause — the filter expression |
| JOIN | JoinOperator | The JOIN clause — which tables are joined and the join type |
| AGGREGATE | GroupByOperator | Aggregate operations — appears when an aggregate function is used, showing the function and its arguments |
| RS | ReduceSinkOperator | Data distribution between tasks — the sort order, hash keys, distributed values, and partition columns used at the output stage of a task |
| FS | FileSinkOperator | Final data storage — if the query includes an INSERT statement, shows the target table name |
| LIM | LimitOperator | The LIMIT clause — the maximum number of rows returned |
| HASHJOIN | MapjoinOperator | Hash join on large tables — similar to JoinOperator, used when a MapJoin hint applies |
Sample data
The examples in this topic use the following tables.
-- Create two partitioned tables
create table if not exists sale_detail
(
shop_name string,
customer_id string,
total_price double
)
partitioned by (sale_date string, region string);
create table if not exists sale_detail_jt
(
shop_name string,
customer_id string,
total_price double
)
partitioned by (sale_date string, region string);
-- Add partitions
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
alter table sale_detail_jt add partition (sale_date='2013', region='china');
-- Insert sample data
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
insert into sale_detail_jt partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2);The tables contain:
select * from sale_detail;
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+
select * from sale_detail_jt;
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s5 | c2 | 100.2 | 2013 | china |
+------------+-------------+-------------+------------+------------+-- Create the join target table
create table shop as select shop_name, customer_id, total_price from sale_detail;Examples
Example 1: Inner join with hash distribution
This example shows how EXPLAIN represents a standard inner join. The plan uses HASHJOIN and collapses the query into a single job with two union tasks — one scanning each input table.
-- Run the query
select a.customer_id as ashop, sum(a.total_price) as ap, count(b.total_price) as bp
from (select * from sale_detail_jt where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.customer_id=b.customer_id
group by a.customer_id
order by a.customer_id
limit 10;
-- Get the execution plan
explain
select a.customer_id as ashop, sum(a.total_price) as ap, count(b.total_price) as bp
from (select * from sale_detail_jt where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.customer_id=b.customer_id
group by a.customer_id
order by a.customer_id
limit 10;Output:
job0 is root job
In Job job0:
root Tasks: M1
In Task M1_U0:
TS: doc_test_dev.sale_detail_jt/sale_date=2013/region=china
FIL: ISNOTNULL(customer_id)
HASHJOIN:
Filter1 INNERJOIN Filter2
keys:
0:customer_id
1:customer_id
non-equals:
0:
1:
bigTable: Filter1
LocalSortBy: order: +
nullDirection: *
keys:customer_id
AGGREGATE: group by:customer_id
UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
LIM:limit 10
FS: output: Screen
schema:
customer_id (string) AS ashop
__agg_0 (double) AS ap
__agg_1 (bigint) AS bp
In Task M1_U1:
TS: doc_test_dev.sale_detail/sale_date=2013/region=china
FIL: ISNOTNULL(customer_id)
HASHJOIN:
Filter1 INNERJOIN Filter2
keys:
0:customer_id
1:customer_id
non-equals:
0:
1:
bigTable: Filter1
LocalSortBy: order: +
nullDirection: *
keys:customer_id
AGGREGATE: group by:customer_id
UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
LIM:limit 10
FS: output: Screen
schema:
customer_id (string) AS ashop
__agg_0 (double) AS ap
__agg_1 (bigint) AS bpExample 2: MapJoin hint with a non-equality join condition
This example shows how adding a /*+ mapjoin(a) */ hint and changing the join condition to a non-equality comparison (<) affects the plan. The HASHJOIN operator reflects the hint, and bigTable shifts to TableScan2 (the larger table).
-- Run the query
select /*+ mapjoin(a) */
a.customer_id as ashop, sum(a.total_price) as ap, count(b.total_price) as bp
from (select * from sale_detail_jt
where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.total_price<b.total_price
group by a.customer_id
order by a.customer_id
limit 10;
-- Get the execution plan
explain
select /*+ mapjoin(a) */
a.customer_id as ashop, sum(a.total_price) as ap, count(b.total_price) as bp
from (select * from sale_detail_jt
where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.total_price<b.total_price
group by a.customer_id
order by a.customer_id
limit 10;Output:
job0 is root job
In Job job0:
root Tasks: M1
In Task M1_U0:
TS: doc_test_dev.sale_detail_jt/sale_date=2013/region=china
HASHJOIN:
TableScan1 INNERJOIN TableScan2
keys:
0:
1:
non-equals:
0:
1:
bigTable: TableScan2
FIL: LT(total_price,total_price)
LocalSortBy: order: +
nullDirection: *
keys:customer_id
AGGREGATE: group by:customer_id
UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
LIM:limit 10
FS: output: Screen
schema:
customer_id (string) AS ashop
__agg_0 (double) AS ap
__agg_1 (bigint) AS bp
In Task M1_U1:
TS: doc_test_dev.sale_detail/sale_date=2013/region=china
HASHJOIN:
TableScan1 INNERJOIN TableScan2
keys:
0:
1:
non-equals:
0:
1:
bigTable: TableScan2
FIL: LT(total_price,total_price)
LocalSortBy: order: +
nullDirection: *
keys:customer_id
AGGREGATE: group by:customer_id
UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
LIM:limit 10
FS: output: Screen
schema:
customer_id (string) AS ashop
__agg_0 (double) AS ap
__agg_1 (bigint) AS bp