In the actual development process, you usually need to analyze query statements or table schemas to analyze performance bottlenecks. MaxCompute SQL provides the EXPLAIN statement to perform this operation. This topic describes the features, syntax, and examples of the EXPLAIN statement.

Description

The EXPLAIN statement can display the execution plan structure of a DML statement in MaxCompute SQL. This statement helps you understand how an SQL statement is executed and provides guidance for optimizing SQL statements. One query statement corresponds to multiple jobs, and one job corresponds to multiple tasks.

Note If a query statement is complex, the EXPLAIN statement returns a large number of output results. In this case, API limits may be reached, and complete output results may fail to be returned. To address this issue, you can split the query statement into multiple subqueries and execute the EXPLAIN statement on each subquery to obtain the structure of the job.

Syntax

explain <dml query>;

dml query: required. The SELECT statement. For more information, see SELECT syntax.

Execution result

The execution result of the EXPLAIN statement includes the following information:
  • Dependencies between jobs

    For example, job0 is a root job. If a query requires only job0, only one row of data is displayed.

  • Dependencies between tasks
    In Job job0:
    root Tasks: M1_Stg1, M2_Stg1
    J3_1_2_Stg1 depends on: M1_Stg1, M2_Stg1

    job0 contains three tasks, M1_Stg1, M2_Stg1, and J3_1_2_Stg1. MaxCompute executes the J3_1_2_Stg1 task after the M1_Stg1 and M2_Stg1 tasks are executed.

    Naming conventions of tasks:
    • MaxCompute provides four task types: map, reduce, join, and local work. The first letter in a task name indicates the type of the task. For example, M2Stg1 is a map task.
    • The digit that follows the first letter indicates the task ID. This ID is unique among all tasks that correspond to a specific query.
    • Digits separated by underscores (_) represent the direct dependency of a task. For example, J3_1_2_Stg1 indicates that the task with the ID of 3 depends on the M1_Stg1 and M2_Stg1 tasks.
  • Dependencies among all operators in a task
    The operator string describes the execution semantics of a task. Operator string structure:
    In Task M1_Stg1:
      Data source: yudi_2.src                       # Data source describes the input of the task. 
      TS: alias: a                                  # TableScanOperator
          RS: order: +                              # ReduceSinkOperator
              keys:
                   a.value
              values:
                   a.key
              partitions:
                   a.value
    In Task J3_1_2_Stg1:
      JOIN: a INNER JOIN b                          # JoinOperator
          SEL: Abs(UDFToDouble(a._col0)), b._col5   # SelectOperator
              FS: output: None                      # FileSinkOperator
    In Task M2_Stg1:
      Data source: yudi_2.src1
      TS: alias: b
          RS: order: +
              keys:
                   b.value
              values:
                   b.value
              partitions:
                   b.value
    Operator descriptions:
    • TableScanOperator: describes the logic of FROM statement blocks in a query statement. The alias of the input table is displayed in the execution results of the EXPLAIN statement.
    • SelectOperator: describes the logic of SELECT statement blocks in a query statement. The columns that are transferred to the next operator are displayed in the execution results of the EXPLAIN statement. Multiple columns are separated by commas (,).
      • If a column is specified, the value is displayed in the <alias>.<column_name> format.
      • If an expression is specified, the value is displayed as a list of functions, such as func1(arg1_1, arg1_2, func2(arg2_1, arg2_2)).
      • If a constant is specified, the constant value is displayed.
    • FilterOperator: describes the logic of WHERE statement blocks in a query statement. The execution results of the EXPLAIN statement include a WHERE expression, which is in a form that is similar to that of SelectOperator.
    • JoinOperator: describes the logic of JOIN statement blocks in a query statement. The execution results of the EXPLAIN statement show which tables are joined in which way.
    • GroupByOperator: describes the logic of aggregate operations. This operator is displayed if an aggregate function is used in a query. The content of the aggregate function is displayed in the execution results of the EXPLAIN statement.
    • ReduceSinkOperator: describes the logic of data distribution between tasks. If the result of a task is transferred to another task, ReduceSinkOperator must be used to distribute data at the last stage of the task. The output sorting method, the distributed keys, values, and columns that are used to calculate the hash value are displayed in the execution results of the EXPLAIN statement.
    • FileSinkOperator: describes the logic of storage operations on final data records. If INSERT statement blocks are included in a query statement, the destination table name is displayed in the execution results of the EXPLAIN statement.
    • LimitOperator: describes the logic of LIMIT statement blocks in a query statement. The number of returned rows specified in a LIMIT statement block is displayed in the execution results of the EXPLAIN statement.
    • MapjoinOperator: describes JOIN operations on large tables. This operator is similar to JoinOperator.

 
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);
create table if not exists sale_detail_jt
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

 
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
alter table sale_detail_jt add partition (sale_date='2013', region='china');

 
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
insert into sale_detail_jt partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2);

Examples

Execute the following statements based on the sample data:
-- Execute the query statement. 
select a.shop_name as ashop, b.shop_name as bshop from sale_detail_jt a 
       inner join sale_detail b on a.shop_name=b.shop_name;
-- Obtain the execution plan of the query statement. 
explain select a.shop_name as ashop, b.shop_name as bshop from sale_detail_jt a 
       inner join sale_detail b on a.shop_name=b.shop_name;
The following result is returned:
job0 is root job

In Job job0:
root Tasks: M2, M1
J3_1_2 depends on: M1, M2

In Task M2:
    Data source: project_name.sale_detail_jt/sale_date=2013/region=china
    TS: project_name.sale_detail_jt/sale_date=2013/region=china
        FIL: ISNOTNULL(shop_name)
            RS: order: +
                nullDirection: *
                optimizeOrderBy: False
                valueDestLimit: 0
                dist: HASH
                keys:
                      shop_name
                values:
                      shop_name (string)
                partitions:
                      shop_name


In Task J3_1_2:
    JOIN:
         StreamLineRead1 INNERJOIN StreamLineRead2
         keys:
             0:shop_name
             1:shop_name

        FS: output: Screen
            schema:
              shop_name (string) AS ashop
              shop_name (string) AS bshop


In Task M1:
    Data source: project_name.sale_detail/sale_date=2013/region=china, project_name.sale_detail/sale_date=2014/region=shanghai
    TS: project_name.sale_detail/sale_date=2013/region=china, project_name.sale_detail/sale_date=2014/region=shanghai
        FIL: ISNOTNULL(shop_name)
            RS: order: +
                nullDirection: *
                optimizeOrderBy: False
                valueDestLimit: 0
                dist: HASH
                keys:
                      shop_name
                values:
                      shop_name (string)
                partitions:
                      shop_name