全部產品
Search
文件中心

MaxCompute:EXPLAIN

更新時間:Feb 28, 2024

在實際開發過程中,通常需要分析查詢語句或表結構來分析效能瓶頸,MaxCompute SQL為您提供explain語句實現此功能。本文為您介紹explain的功能、命令格式及使用樣本。

功能介紹

explain語句可以顯示MaxCompute SQL對應的DML語句執行計畫(執行SQL語義的程式)的結構,協助您瞭解SQL語句的處理過程,為最佳化SQL語句提供協助。一個查詢語句作業會對應多個Job,一個Job對應多個Task。

說明

如果查詢語句足夠複雜,explain的結果較多,超過4 MB則會觸發API的限制,無法得到完整的EXPLAIN結果。此時您可以拆分查詢語句,對各部分分別執行explain語句,以瞭解Job的結構。

命令格式

explain <dml query>;

dml query:必填。select語句,更多資訊請參見SELECT文法

返回說明

explain的執行結果包含如下資訊:

  • Job間的依賴關係

    例如job0 is root job。如果查詢只需要一個Job(job0),只會顯示一行資訊。

  • Task間的依賴關係

    In Job job0:
    root Tasks: M1, M2
    J3_1_2_Stg1 depends on: M1, M2

    job0包含三個Task,M1M2J3_1_2_Stg1。系統會先執行M1M2兩個Task,執行完成後,再執行J3_1_2_Stg1

    Task的命名規則如下:

    • 在MaxCompute中,共有四種Task類型:MapTask、ReduceTask、JoinTask和LocalWork。Task名稱的第一個字母表示了當前Task的類型,例如M2Stg1就是一個MapTask。

    • 緊跟著第一個字母后的數字,代表了當前Task的ID。這個ID在當前查詢對應的所有Task中是唯一的。

    • 用底線(_)分隔的數字代表當前Task的直接依賴,例如J3_1_2_Stg1表示當前Task ID為3,依賴ID為1(M1)和ID為2(M2)的兩個Task。

  • Task中所有Operator的依賴結構。

    Operator串描述了一個Task的執行語義。結構樣本如下:

    In Task M2:
        Data source: mf_mc_bj.sale_detail_jt/sale_date=2013/region=china  # "Data source"描述了當前Task的輸入內容。
        TS: mf_mc_bj.sale_detail_jt/sale_date=2013/region=china           # TableScanOperator
            FIL: ISNOTNULL(customer_id)                                   # FilterOperator
                RS: order: +                                              # ReduceSinkOperator
                    nullDirection: *
                    optimizeOrderBy: False
                    valueDestLimit: 0
                    dist: HASH
                    keys:
                          customer_id
                    values:
                          customer_id (string)
                          total_price (double)
                    partitions:
                          customer_id
    
    
    In Task J3_1_2:
        JOIN:                                                           # JoinOperator
             StreamLineRead1 INNERJOIN StreamLineRead2
             keys:
                 0:customer_id
                 1:customer_id
    
            AGGREGATE: group by:customer_id                            # GroupByOperator
             UDAF: SUM(total_price) (__agg_0_sum)[Complete],SUM(total_price) (__agg_1_sum)[Complete]
                RS: order: +
                    nullDirection: *
                    optimizeOrderBy: True
                    valueDestLimit: 10
                    dist: HASH
                    keys:
                          customer_id
                    values:
                          customer_id (string)
                          __agg_0 (double)
                          __agg_1 (double)
                    partitions:
    
    
    In Task R4_3:
        SEL: customer_id,__agg_0,__agg_1                               # SelectOperator
            LIM:limit 10                                               # LimitOperator
                FS: output: Screen                                     # FileSinkOperator
                    schema:
                      customer_id (string) AS ashop
                      __agg_0 (double) AS ap
                      __agg_1 (double) AS bp

    各Operator的含義如下:

    • TableScanOperator(TS):描述查詢語句中的from語句塊的邏輯。explain結果中會顯示輸入表的名稱(Alias)。

    • SelectOperator(SEL):描述查詢語句中的select語句塊的邏輯。explain結果中會顯示向下一個Operator傳遞的列,多個列由逗號分隔。

      • 如果是列的引用,則顯示為<alias>.<column_name>

      • 如果是運算式的結果,則顯示為函數形式,例如func1(arg1_1, arg1_2, func2(arg2_1, arg2_2))

      • 如果是常量,則直接顯示常量值。

    • FilterOperator(FIL):描述查詢語句中的where語句塊的邏輯。explain結果中會顯示一個where條件運算式,形式類似SelectOperator的顯示規則。

    • JoinOperator(JOIN):描述查詢語句中的join語句塊的邏輯。explain結果中會顯示哪些表以哪種方式Join在一起。

    • GroupByOperator(例如AGGREGATE):描述彙總操作的邏輯。如果查詢中使用了彙總函式,就會出現該結構,explain結果中會顯示彙總函式的內容。

    • ReduceSinkOperator(RS):描述Task間資料分發操作的邏輯。如果當前Task的結果會傳遞給另一個Task,則必然需要在當前Task的最後,使用ReduceSinkOperator執行資料分發操作。explain的結果中會顯示輸出結果的排序方式、分發的Key、Value以及用來求Hash值的列。

    • FileSinkOperator(FS):描述最終資料的儲存操作。如果查詢中有insert語句塊,explain結果中會顯示目標表名稱。

    • LimitOperator(LIM):描述查詢語句中的limit語句塊的邏輯。explain結果中會顯示limit數。

    • MapjoinOperator(HASHJOIN):類似JoinOperator,描述大表的join操作。

樣本資料

為便於理解,本文為您提供來源資料,基於來源資料提供相關樣本。建立表sale_detail和sale_detail_jt,並添加資料,命令樣本如下:

--建立分區表sale_detail和sale_detail_jt。
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);
create table if not exists sale_detail_jt
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

--向源表增加分區。
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
alter table sale_detail_jt add partition (sale_date='2013', region='china');

--向源表追加資料。
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
insert into sale_detail_jt partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2);

查詢表sale_detail和sale_detail_jt中的資料,命令樣本如下:
select * from sale_detail;
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+
select * from sale_detail_jt;
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s5         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

--建立做關聯的表。
SET odps.sql.allow.fullscan=true;
create table shop as select shop_name, customer_id, total_price from sale_detail;

使用樣本

基於樣本資料,執行命令如下。

--查詢語句。
select a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp 
from (select * from sale_detail_jt where sale_date='2013' and region='china') a 
inner join (select * from sale_detail where sale_date='2013' and region='china') b 
on a.customer_id=b.customer_id 
group by a.customer_id 
order by a.customer_id 
limit 10;
--擷取查詢語句語義。
explain 
select a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp 
from (select * from sale_detail_jt where sale_date='2013' and region='china') a 
inner join (select * from sale_detail where sale_date='2013' and region='china') b 
on a.customer_id=b.customer_id 
group by a.customer_id 
order by a.customer_id 
limit 10;

返回結果如下。

job0 is root job

In Job job0:
root Tasks: M1

In Task M1_U0:
    TS: doc_test_dev.sale_detail_jt/sale_date=2013/region=china
        FIL: ISNOTNULL(customer_id)
            HASHJOIN:
                     Filter1 INNERJOIN Filter2
                     keys:
                         0:customer_id
                         1:customer_id
                     non-equals:
                         0:
                         1:
                     bigTable: Filter1

                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp


In Task M1_U1:
    TS: doc_test_dev.sale_detail/sale_date=2013/region=china
        FIL: ISNOTNULL(customer_id)
            HASHJOIN:
                     Filter1 INNERJOIN Filter2
                     keys:
                         0:customer_id
                         1:customer_id
                     non-equals:
                         0:
                         1:
                     bigTable: Filter1

                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp

基於樣本資料,執行命令如下。

--查詢語句。
select /*+ mapjoin(a) */
       a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp 
 from (select * from sale_detail_jt 
where sale_date='2013' and region='china') a 
inner join (select * from sale_detail where sale_date='2013' and region='china') b 
on a.total_price<b.total_price 
group by a.customer_id 
order by a.customer_id 
limit 10;
--擷取查詢語句語義。
explain 
select /*+ mapjoin(a) */
       a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp 
 from (select * from sale_detail_jt 
where sale_date='2013' and region='china') a 
inner join (select * from sale_detail where sale_date='2013' and region='china') b 
on a.total_price<b.total_price 
group by a.customer_id 
order by a.customer_id 
limit 10;

返回結果如下。

job0 is root job

In Job job0:
root Tasks: M1

In Task M1_U0:
    TS: doc_test_dev.sale_detail_jt/sale_date=2013/region=china
        HASHJOIN:
                 TableScan1 INNERJOIN TableScan2
                 keys:
                     0:
                     1:
                 non-equals:
                     0:
                     1:
                 bigTable: TableScan2

            FIL: LT(total_price,total_price)
                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp


In Task M1_U1:
    TS: doc_test_dev.sale_detail/sale_date=2013/region=china
        HASHJOIN:
                 TableScan1 INNERJOIN TableScan2
                 keys:
                     0:
                     1:
                 non-equals:
                     0:
                     1:
                 bigTable: TableScan2

            FIL: LT(total_price,total_price)
                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp