In multidimensional data aggregation and analysis, if you want to aggregate Column a, aggregate Column b, and also aggregate Column a and Column b, you can use GROUPING SETS to perform these operations. This topic describes how to use GROUPING SETS for multidimensional aggregation.
Description
GROUPING SETS is an extension of a GROUP BY clause in a SELECT statement. The GROUPING SETS clause allows you to group your results in multiple ways, without the need to execute multiple SELECT statements and UNION ALL in sequence. This allows MaxCompute to generate more efficient execution plans with higher performance.
The following table describes the syntax that is associated with GROUPING SETS.
Type | Description |
| A special form of |
| A special form of |
|
|
|
|
|
Note If you use Hive 2.3.0 or later, we recommend that you use this function in MaxCompute. If you use a Hive version earlier than 2.3.0, we recommend that you do not use this function in MaxCompute. |
Examples
Example for using GROUPING SETS:
Prepare data.
create table requests lifecycle 20 as select * from values (1, 'windows', 'PC', 'Beijing'), (2, 'windows', 'PC', 'Shijiazhuang'), (3, 'linux', 'Phone', 'Beijing'), (4, 'windows', 'PC', 'Beijing'), (5, 'ios', 'Phone', 'Shijiazhuang'), (6, 'linux', 'PC', 'Beijing'), (7, 'windows', 'Phone', 'Shijiazhuang') as t(id, os, device, city);Use one of the following methods to group data:
Execute multiple
SELECTstatements to group data.select NULL, NULL, NULL, count(*) from requests union all select os, device, NULL, count(*) from requests group by os, device union all select null, null, city, count(*) from requests group by city;Use
GROUPING SETSto group data.select os,device, city ,count(*) from requests group by grouping sets((os, device), (city), ());The following result is returned:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | NULL | Beijing | 4 | | NULL | NULL | Shijiazhuang | 3 | | ios | Phone | NULL | 1 | | linux | PC | NULL | 1 | | linux | Phone | NULL | 1 | | windows | PC | NULL | 3 | | windows | Phone | NULL | 1 | +------------+------------+------------+------------+
NoteIf some expressions are not used in GROUPING SETS, NULL is used as placeholders for these expressions, for example, NULL of the city column in the fourth row to the eighth row. This way, you can perform operations on the result sets.
Examples for using CUBE or ROLLUP
Examples for using CUBE or ROLLUP based on the syntax of GROUPING SETS:
Example 1: Use
CUBEto list all the possible columnsos,device, andcityasgrouping sets.select os,device, city, count(*) from requests group by cube (os, device, city); -- The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());The following result is returned:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | NULL | Beijing | 4 | | NULL | NULL | Shijiazhuang | 3 | | NULL | PC | NULL | 4 | | NULL | PC | Beijing | 3 | | NULL | PC | Shijiazhuang | 1 | | NULL | Phone | NULL | 3 | | NULL | Phone | Beijing | 1 | | NULL | Phone | Shijiazhuang | 2 | | ios | NULL | NULL | 1 | | ios | NULL | Shijiazhuang | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | NULL | Beijing | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | NULL | Beijing | 2 | | windows | NULL | Shijiazhuang | 2 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+Example 2: Use
CUBEto list all the possible combinations of columns,(os, device)and(device, city), asgrouping sets.select os,device, city, count(*) from requests group by cube ((os, device), (device, city)); -- The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(device,city),());The following result is returned:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | PC | Beijing | 3 | | NULL | PC | Shijiazhuang | 1 | | NULL | Phone | Beijing | 1 | | NULL | Phone | Shijiazhuang | 2 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+Example 3: Use
ROLLUPto aggregate theos,device, andcitycolumns in a hierarchical manner to generate multiplegrouping sets.select os,device, city, count(*) from requests group by rollup (os, device, city); -- The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os),());The following result is returned:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | ios | NULL | NULL | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+Example 4: Use
ROLLUPto aggregateos,(os,device), andcityin a hierarchical manner to generate multiplegrouping sets.select os,device, city, count(*) from requests group by rollup (os, (os,device), city); -- The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os),());The following result is returned:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | ios | NULL | NULL | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+Example 5: Use
GROUP BY,CUBE, andGROUPING SETSto generate multiplegrouping sets.select os,device, city, count(*) from requests group by os, cube(os,device), grouping sets(city); -- The preceding statement is equivalent to the following statement: select os,device, city, count(*) from requests group by grouping sets((os,device,city),(os,city),(os,device,city));The following result is returned:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | ios | NULL | Shijiazhuang | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | Beijing | 2 | | linux | PC | Beijing | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | Beijing | 2 | | windows | NULL | Shijiazhuang | 2 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
Example for using GROUPING and GROUPING_ID
Sample statement:
select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid
from values (1,2,3) as t(a,b,c)
group by cube(a,b,c);The following result is returned:
+------------+------------+------------+------------+------------+------------+------------+------------+
| a | b | c | _c3 | ga | gb | gc | groupingid |
+------------+------------+------------+------------+------------+------------+------------+------------+
| NULL | NULL | NULL | 1 | 1 | 1 | 1 | 7 |
| NULL | NULL | 3 | 1 | 1 | 1 | 0 | 6 |
| NULL | 2 | NULL | 1 | 1 | 0 | 1 | 5 |
| NULL | 2 | 3 | 1 | 1 | 0 | 0 | 4 |
| 1 | NULL | NULL | 1 | 0 | 1 | 1 | 3 |
| 1 | NULL | 3 | 1 | 0 | 1 | 0 | 2 |
| 1 | 2 | NULL | 1 | 0 | 0 | 1 | 1 |
| 1 | 2 | 3 | 1 | 0 | 0 | 0 | 0 |
+------------+------------+------------+------------+------------+------------+------------+------------+By default, the columns that are not specified in GROUP BY are filled with NULL. You can use GROUPING to specify the values that you require. Sample statement that is based on the syntax of GROUPING SETS:
select
if(grouping(os) == 0, os, 'ALL') as os,
if(grouping(device) == 0, device, 'ALL') as device,
if(grouping(city) == 0, city, 'ALL') as city,
count(*) as count
from requests
group by os, device, city grouping sets((os, device), (city), ());The following result is returned:
+------------+------------+------------+------------+
| os | device | city | count |
+------------+------------+------------+------------+
| ALL | ALL | ALL | 7 |
| ALL | ALL | Beijing | 4 |
| ALL | ALL | Shijiazhuang | 3 |
| ios | Phone | ALL | 1 |
| linux | PC | ALL | 1 |
| linux | Phone | ALL | 1 |
| windows | PC | ALL | 3 |
| windows | Phone | ALL | 1 |
+------------+------------+------------+------------+Example for using GROUPING__ID
Example for using GROUPING__ID without parameters specified:
set odps.sql.hive.compatible=true;
select
a, b, c, count(*), grouping__id
from values (1,2,3) as t(a,b,c)
group by a, b, c grouping sets ((a,b,c), (a));
-- The preceding statement is equivalent to the following statement:
select
a, b, c, count(*), grouping_id(a,b,c)
from values (1,2,3) as t(a,b,c)
group by a, b, c grouping sets ((a,b,c), (a));The following result is returned:
+------------+------------+------------+------------+------------+
| a | b | c | _c3 | _c4 |
+------------+------------+------------+------------+------------+
| 1 | NULL | NULL | 1 | 3 |
| 1 | 2 | 3 | 1 | 0 |
+------------+------------+------------+------------+------------+