使用GROUPING SETS CUBE與ROLLUP實現多維彙總分析-雲原生MaxCompute MaxCompute-阿里雲 - MaxCompute

在多維度資料彙總分析中，需要對A列、B列及A、B兩列進行彙總。您可以使用grouping sets解決此問題。本文為您介紹如何使用grouping sets進行多維彙總。

功能介紹

grouping sets是對select語句中group by子句的擴充，允許您採用多種方式對結果分組，而不必使用多個select語句再union all來實現。這樣能夠使MaxCompute的引擎給出更有效執行計畫，從而提高執行效能。

與grouping sets相關聯的文法如下。

類型	說明
`cube`	特殊的`grouping sets`，將指定列的所有可能組合作為`grouping sets`，也可以與`grouping sets`組合使用。 `group by cube (a, b, c) --等效於以下語句。 grouping sets ((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c),()) group by cube ((a,b),(c,d)) --等效於以下語句。 grouping sets ( (a,b,c,d), (a,b), (c,d), () ) group by a, cube (b,c), grouping sets ((d),(e)) --等效於以下語句。 group by grouping sets ( (a, b, c, d), (a, b, c, e), (a, b, d), (a, b, e), (a, c, d), (a, c, e), (a, d), (a, e) )`
`rollup`	特殊的`grouping sets`，以按層級彙總的方式產生`grouping sets`，也可以與`grouping sets`組合使用。 `group by rollup (a,b,c) --等效價於以下語句。 grouping sets ((a,b,c),(a,b),(a),()) group by rollup (a,(b,c),d) --等效於以下語句。 grouping sets ( (a,b,c,d), (a,b,c), (a), () ) group by grouping sets((b),(c),rollup(a,b,c)) --等效於以下語句。 group by grouping sets ( (b),(c),(a,b,c),(a,b),(a),() )`
`grouping`	`grouping`用於區分佔位符NULL與資料中的NULL。接受一個列名作為參數，如果結果行使用了參數列做彙總，返回0，否則返回1。
`grouping_id`	接受一個或多個列名作為參數。結果是將參數列的`grouping`結果按照`Bitmap`的方式組成整數。
`grouping__id`	`grouping__id`不帶參數，用於相容Hive查詢。等價於`grouping_id（group by參數列表）`，參數與`group by`的順序一致。說明 MaxCompute和Hive 2.3.0及以上版本相容該函數，但在Hive 2.3.0以下版本中輸出不一致，因此並不推薦使用此函數。

GROUPING SETS使用樣本

grouping sets使用樣本如下：

準備資料。

create table requests lifecycle 20 as 
select * from values 
    (1, 'windows', 'PC', 'Beijing'),
    (2, 'windows', 'PC', 'Shijiazhuang'),
    (3, 'linux', 'Phone', 'Beijing'),
    (4, 'windows', 'PC', 'Beijing'),
    (5, 'ios', 'Phone', 'Shijiazhuang'),
    (6, 'linux', 'PC', 'Beijing'),
    (7, 'windows', 'Phone', 'Shijiazhuang') 
as t(id, os, device, city);

對資料進行分組。您可以通過如下兩種方式進行分組：

使用多個select語句進行分組。

select NULL, NULL, NULL, count(*)
from requests
union all
select os, device, NULL, count(*)
from requests group by os, device
union all
select null, null, city, count(*)
from requests group by city;

使用grouping sets進行分組。

select os,device, city ,count(*)
from requests
group by grouping sets((os, device), (city), ());

返回結果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | NULL       | Beijing    | 4          |
| NULL       | NULL       | Shijiazhuang | 3          |
| ios        | Phone      | NULL       | 1          |
| linux      | PC         | NULL       | 1          |
| linux      | Phone      | NULL       | 1          |
| windows    | PC         | NULL       | 3          |
| windows    | Phone      | NULL       | 1          |
+------------+------------+------------+------------+

說明

分組集中不使用的運算式，會使用NULL充當預留位置，使得這些結果集可以做操作。例如結果第4~8行的city列。

CUBE | ROLLUP使用樣本

基於grouping sets樣本表，cube及rollup使用樣本如下：

樣本1：通過cube枚舉os、device、city的所有可能列為grouping sets。

select os,device, city, count(*)
from requests 
group by cube (os, device, city);
--等效於如下語句。
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());

返回結果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | NULL       | Beijing    | 4          |
| NULL       | NULL       | Shijiazhuang | 3          |
| NULL       | PC         | NULL       | 4          |
| NULL       | PC         | Beijing    | 3          |
| NULL       | PC         | Shijiazhuang | 1          |
| NULL       | Phone      | NULL       | 3          |
| NULL       | Phone      | Beijing    | 1          |
| NULL       | Phone      | Shijiazhuang | 2          |
| ios        | NULL       | NULL       | 1          |
| ios        | NULL       | Shijiazhuang | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | NULL       | Beijing    | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | NULL       | Beijing    | 2          |
| windows    | NULL       | Shijiazhuang | 2          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

樣本2：通過cube枚舉（os, device）,（device, city）所有可能列為grouping sets。

select os,device, city, count(*) 
from requests 
group by cube ((os, device), (device, city));
--等效於如下語句。
select os,device, city, count(*) 
from requests 
group by grouping sets ((os, device, city),(os, device),(device,city),());

返回結果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | PC         | Beijing    | 3          |
| NULL       | PC         | Shijiazhuang | 1          |
| NULL       | Phone      | Beijing    | 1          |
| NULL       | Phone      | Shijiazhuang | 2          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

樣本3：通過rollup對os、device、city以按層級彙總的方式產生grouping sets。

select os,device, city, count(*)
from requests 
group by rollup (os, device, city);
--等效於如下語句。
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());

返回結果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| ios        | NULL       | NULL       | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

樣本4：通過rollup對os, (os,device), city以按層級彙總的方式產生grouping sets。

select os,device, city, count(*)
from requests 
group by rollup (os, (os,device), city);
--等效於如下語句。
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());

返回結果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| ios        | NULL       | NULL       | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

樣本5：通過group by、cube、grouping sets組合產生grouping sets。

select os,device, city, count(*)
from requests 
group by os, cube(os,device), grouping sets(city);
--等效於如下語句。
select os,device, city, count(*)
from requests 
group by grouping sets((os,device,city),(os,city),(os,device,city));

返回結果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| ios        | NULL       | Shijiazhuang | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | Beijing    | 2          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | Beijing    | 2          |
| windows    | NULL       | Shijiazhuang | 2          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

GROUPING | GROUPING_ID使用樣本

grouping及grouping_id使用樣本如下：

select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid 
from values (1,2,3) as t(a,b,c)
group by cube(a,b,c);

返回結果如下：

+------------+------------+------------+------------+------------+------------+------------+------------+
| a          | b          | c          | _c3        | ga         | gb         | gc         | groupingid |
+------------+------------+------------+------------+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 1          | 1          | 1          | 1          | 7          |
| NULL       | NULL       | 3          | 1          | 1          | 1          | 0          | 6          |
| NULL       | 2          | NULL       | 1          | 1          | 0          | 1          | 5          |
| NULL       | 2          | 3          | 1          | 1          | 0          | 0          | 4          |
| 1          | NULL       | NULL       | 1          | 0          | 1          | 1          | 3          |
| 1          | NULL       | 3          | 1          | 0          | 1          | 0          | 2          |
| 1          | 2          | NULL       | 1          | 0          | 0          | 1          | 1          |
| 1          | 2          | 3          | 1          | 0          | 0          | 0          | 0          |
+------------+------------+------------+------------+------------+------------+------------+------------+

預設情況，group by列表中未使用的列會被填充為NULL。您可以通過grouping輸出更有實際意義的值。基於grouping sets樣本表，命令樣本如下：

select
  if(grouping(os) == 0, os, 'ALL') as os,
  if(grouping(device) == 0, device, 'ALL') as device,
  if(grouping(city) == 0, city, 'ALL') as city, 
  count(*) as count 
from requests 
group by os, device, city grouping sets((os, device), (city), ());

返回結果如下：

+------------+------------+------------+------------+
| os         | device     | city       | count      |
+------------+------------+------------+------------+
| ALL        | ALL        | ALL        | 7          |
| ALL        | ALL        | Beijing    | 4          |
| ALL        | ALL        | Shijiazhuang | 3          |
| ios        | Phone      | ALL        | 1          |
| linux      | PC         | ALL        | 1          |
| linux      | Phone      | ALL        | 1          |
| windows    | PC         | ALL        | 3          |
| windows    | Phone      | ALL        | 1          |
+------------+------------+------------+------------+

GROUPING__ID使用樣本

grouping__id不帶參數的命令樣本如下：

set odps.sql.hive.compatible=true;
select      
a, b, c, count(*), grouping__id 
from values (1,2,3) as t(a,b,c) 
group by a, b, c grouping sets ((a,b,c), (a));
--等效於如下語句。
select      
a, b, c, count(*), grouping_id(a,b,c)  
from values (1,2,3) as t(a,b,c) 
group by a, b, c grouping sets ((a,b,c), (a));

返回結果如下：

+------------+------------+------------+------------+------------+
| a          | b          | c          | _c3        | _c4        |
+------------+------------+------------+------------+------------+
| 1          | NULL       | NULL       | 1          | 3          |
| 1          | 2          | 3          | 1          | 0          |
+------------+------------+------------+------------+------------+