Use EXPLAIN to analyze SQL query execution plans - MaxCompute

Use the EXPLAIN statement to display the execution plan of a DML statement in MaxCompute SQL. The plan shows which jobs and tasks run, in what order, and which SQL clauses generate data shuffles — so you can identify performance bottlenecks and optimize queries.

The execution plan is structured in three levels: jobs, tasks within each job, and operators within each task.

Note

For complex queries, the EXPLAIN output may exceed the row limit of the calling application and be truncated. To work around this, split the query into subqueries and run EXPLAIN on each one separately.

Syntax

explain <dml_query>;

dml_query: Required. A SELECT statement. For the full syntax, see SELECT syntax.

Output structure

The EXPLAIN output contains three sections.

Job dependencies

Shows the dependency relationships between jobs. A query with a single job displays one line:

job0 is root job

Task dependencies

Shows, within each job, which tasks can run in parallel and which must wait for others to complete:

In Job job0:
root Tasks: M1_Stg1, M2_Stg1
J3_1_2_Stg1 depends on: M1_Stg1, M2_Stg1

In this example, M1_Stg1 and M2_Stg1 run in parallel. J3_1_2_Stg1 runs after both complete.

Task naming conventions

Task names encode their type, ID, and dependencies:

Component	Position	Meaning	Example
Type prefix	First character	Task type: M (map), R (reduce), J (join), local work	`M` in `M1_Stg1`
Task ID	Digit(s) after prefix	Unique ID within the query	`1` in `M1_Stg1`
Dependencies	Underscore-separated digits	IDs of tasks this task depends on	`J3_1_2_Stg1` → task 3 depends on tasks 1 and 2

Operator chain

Shows the execution logic within each task as a tree of operators. Each operator corresponds to a clause or operation in the original SQL statement.

In Task M2:
    Data source: mf_mc_bj.sale_detail_jt/sale_date=2013/region=china
    TS: mf_mc_bj.sale_detail_jt/sale_date=2013/region=china
        FIL: ISNOTNULL(customer_id)
            RS: order: +
                nullDirection: *
                optimizeOrderBy: False
                valueDestLimit: 0
                dist: HASH
                keys:
                      customer_id
                values:
                      customer_id (string)
                      total_price (double)
                partitions:
                      customer_id


In Task J3_1_2:
    JOIN:
         StreamLineRead1 INNERJOIN StreamLineRead2
         keys:
             0:customer_id
             1:customer_id

        AGGREGATE: group by:customer_id
         UDAF: SUM(total_price) (__agg_0_sum)[Complete],SUM(total_price) (__agg_1_sum)[Complete]
            RS: order: +
                nullDirection: *
                optimizeOrderBy: True
                valueDestLimit: 10
                dist: HASH
                keys:
                      customer_id
                values:
                      customer_id (string)
                      __agg_0 (double)
                      __agg_1 (double)
                partitions:


In Task R4_3:
    SEL: customer_id,__agg_0,__agg_1
        LIM:limit 10
            FS: output: Screen
                schema:
                  customer_id (string) AS ashop
                  __agg_0 (double) AS ap
                  __agg_1 (double) AS bp

Operator reference

Abbreviation	Full name	What it shows
TS	TableScanOperator	The `FROM` clause — the input table alias
SEL	SelectOperator	The `SELECT` clause — columns passed to the next operator. A column appears as `<alias>.<column_name>`, an expression as a nested function list, and a constant as its value.
FIL	FilterOperator	The `WHERE` clause — the filter expression
JOIN	JoinOperator	The `JOIN` clause — which tables are joined and the join type
AGGREGATE	GroupByOperator	Aggregate operations — appears when an aggregate function is used, showing the function and its arguments
RS	ReduceSinkOperator	Data distribution between tasks — the sort order, hash keys, distributed values, and partition columns used at the output stage of a task
FS	FileSinkOperator	Final data storage — if the query includes an `INSERT` statement, shows the target table name
LIM	LimitOperator	The `LIMIT` clause — the maximum number of rows returned
HASHJOIN	MapjoinOperator	Hash join on large tables — similar to JoinOperator, used when a MapJoin hint applies

Sample data

The examples in this topic use the following tables.

-- Create two partitioned tables
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

create table if not exists sale_detail_jt
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

-- Add partitions
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
alter table sale_detail_jt add partition (sale_date='2013', region='china');

-- Insert sample data
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
insert into sale_detail_jt partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2);

The tables contain:

select * from sale_detail;
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+

select * from sale_detail_jt;
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s5         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

-- Create the join target table
create table shop as select shop_name, customer_id, total_price from sale_detail;

Examples

Example 1: Inner join with hash distribution

This example shows how EXPLAIN represents a standard inner join. The plan uses HASHJOIN and collapses the query into a single job with two union tasks — one scanning each input table.

-- Run the query
select a.customer_id as ashop, sum(a.total_price) as ap, count(b.total_price) as bp
from (select * from sale_detail_jt where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.customer_id=b.customer_id
group by a.customer_id
order by a.customer_id
limit 10;

-- Get the execution plan
explain
select a.customer_id as ashop, sum(a.total_price) as ap, count(b.total_price) as bp
from (select * from sale_detail_jt where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.customer_id=b.customer_id
group by a.customer_id
order by a.customer_id
limit 10;

Output:

job0 is root job

In Job job0:
root Tasks: M1

In Task M1_U0:
    TS: doc_test_dev.sale_detail_jt/sale_date=2013/region=china
        FIL: ISNOTNULL(customer_id)
            HASHJOIN:
                     Filter1 INNERJOIN Filter2
                     keys:
                         0:customer_id
                         1:customer_id
                     non-equals:
                         0:
                         1:
                     bigTable: Filter1

                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp


In Task M1_U1:
    TS: doc_test_dev.sale_detail/sale_date=2013/region=china
        FIL: ISNOTNULL(customer_id)
            HASHJOIN:
                     Filter1 INNERJOIN Filter2
                     keys:
                         0:customer_id
                         1:customer_id
                     non-equals:
                         0:
                         1:
                     bigTable: Filter1

                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp

Example 2: MapJoin hint with a non-equality join condition

This example shows how adding a /*+ mapjoin(a) */ hint and changing the join condition to a non-equality comparison (<) affects the plan. The HASHJOIN operator reflects the hint, and bigTable shifts to TableScan2 (the larger table).

-- Run the query
select /*+ mapjoin(a) */
       a.customer_id as ashop, sum(a.total_price) as ap, count(b.total_price) as bp
from (select * from sale_detail_jt
where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.total_price<b.total_price
group by a.customer_id
order by a.customer_id
limit 10;

-- Get the execution plan
explain
select /*+ mapjoin(a) */
       a.customer_id as ashop, sum(a.total_price) as ap, count(b.total_price) as bp
from (select * from sale_detail_jt
where sale_date='2013' and region='china') a
inner join (select * from sale_detail where sale_date='2013' and region='china') b
on a.total_price<b.total_price
group by a.customer_id
order by a.customer_id
limit 10;

Output:

job0 is root job

In Job job0:
root Tasks: M1

In Task M1_U0:
    TS: doc_test_dev.sale_detail_jt/sale_date=2013/region=china
        HASHJOIN:
                 TableScan1 INNERJOIN TableScan2
                 keys:
                     0:
                     1:
                 non-equals:
                     0:
                     1:
                 bigTable: TableScan2

            FIL: LT(total_price,total_price)
                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp


In Task M1_U1:
    TS: doc_test_dev.sale_detail/sale_date=2013/region=china
        HASHJOIN:
                 TableScan1 INNERJOIN TableScan2
                 keys:
                     0:
                     1:
                 non-equals:
                     0:
                     1:
                 bigTable: TableScan2

            FIL: LT(total_price,total_price)
                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp