GROUPING SETS is an extension of the GROUP BY clause that lets you aggregate data across multiple dimensions in a single query—without writing multiple SELECT ... UNION ALL statements. MaxCompute generates a single, optimized execution plan for the entire operation.
Syntax extensions
The following syntax extensions work with GROUPING SETS.
| Syntax | Description |
|---|---|
CUBE | Generates all possible combinations of the specified columns as grouping sets. For N columns, CUBE produces 2^N grouping sets. |
ROLLUP | Generates grouping sets in a hierarchical manner—from the most granular level to the grand total. For N elements, ROLLUP produces N+1 grouping sets. |
GROUPING | Distinguishes null placeholders from actual null values in data. Takes a column name as a parameter. Returns 0 if the rows are aggregated by that column, or 1 otherwise. |
GROUPING_ID | Takes one or more column names as parameters and combines their GROUPING results into a single integer using a bitmap representation. |
GROUPING__ID | A Hive-compatible variant that takes no parameters. Equivalent to GROUPING_ID(<GROUP BY column list>), with parameters in the same order as the GROUP BY clause. Requires set odps.sql.hive.compatible=true. Recommended for use with Hive 2.3.0 or later. |
How null placeholders work
When a column is not part of a grouping set, GROUPING SETS inserts NULL in that column as a placeholder—not because the value is missing from the data, but because the row represents an aggregation that does not include that dimension.
Use GROUPING to tell the two cases apart. For example:
if(grouping(city) == 0, city, 'ALL')This replaces the null placeholder with 'ALL' when the row is a cross-city aggregate, while preserving an actual null from the data as-is.
Syntax equivalences
CUBE
CUBE over N columns generates all 2^N grouping sets:
group by cube (a, b, c)
-- Equivalent to:
grouping sets ((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c),())
group by cube ((a,b),(c,d))
-- Equivalent to:
grouping sets (
(a,b,c,d),
(a,b),
(c,d),
()
)
group by a, cube (b,c), grouping sets ((d),(e))
-- Equivalent to:
group by grouping sets (
(a, b, c, d), (a, b, c, e),
(a, b, d), (a, b, e),
(a, c, d), (a, c, e),
(a, d), (a, e)
)ROLLUP
ROLLUP over N elements generates N+1 grouping sets, from the most granular level down to the grand total:
group by rollup (a,b,c)
-- Equivalent to:
grouping sets ((a,b,c),(a,b),(a),())
group by rollup (a,(b,c),d)
-- Equivalent to:
grouping sets (
(a,b,c,d),
(a,b,c),
(a),
()
)
group by grouping sets((b),(c),rollup(a,b,c))
-- Equivalent to:
group by grouping sets (
(b),(c),(a,b,c),(a,b),(a),()
)List columns inROLLUPfrom the highest level of the hierarchy to the lowest—for example,ROLLUP(year, month, day)rather thanROLLUP(day, month, year). Reversing the order produces different aggregation subtotals becauseROLLUPgenerates each intermediate level by dropping the last column, so the rollup path depends on column order.
Examples
Prepare sample data
All examples in this topic use the following table:
create table requests lifecycle 20 as
select * from values
(1, 'windows', 'PC', 'Beijing'),
(2, 'windows', 'PC', 'Shijiazhuang'),
(3, 'linux', 'Phone', 'Beijing'),
(4, 'windows', 'PC', 'Beijing'),
(5, 'ios', 'Phone', 'Shijiazhuang'),
(6, 'linux', 'PC', 'Beijing'),
(7, 'windows', 'Phone', 'Shijiazhuang')
as t(id, os, device, city);The table has 7 rows with columns id, os, device, and city.
Use GROUPING SETS
The following example groups data by (os, device), by city, and by the grand total—all in one query.
Without GROUPING SETS, you need three separate queries joined by UNION ALL:
select NULL, NULL, NULL, count(*)
from requests
union all
select os, device, NULL, count(*)
from requests group by os, device
union all
select null, null, city, count(*)
from requests group by city;With GROUPING SETS, a single query produces the same result:
select os, device, city, count(*)
from requests
group by grouping sets((os, device), (city), ());This query produces three levels of aggregation:
Grand total across all rows (the
()grouping set)Count by city
Count by OS and device combination
Result:
+------------+------------+------------+------------+
| os | device | city | _c3 |
+------------+------------+------------+------------+
| NULL | NULL | NULL | 7 |
| NULL | NULL | Beijing | 4 |
| NULL | NULL | Shijiazhuang | 3 |
| ios | Phone | NULL | 1 |
| linux | PC | NULL | 1 |
| linux | Phone | NULL | 1 |
| windows | PC | NULL | 3 |
| windows | Phone | NULL | 1 |
+------------+------------+------------+------------+Columns not included in a given grouping set appear as NULL. For example, city is NULL in rows 4–8 because those rows are grouped by (os, device), not by city.
Use CUBE
Example 1: All combinations of three columns
CUBE(os, device, city) generates all 2^3 = 8 grouping sets, covering every possible combination of the three columns.
select os, device, city, count(*)
from requests
group by cube (os, device, city);
-- Equivalent to:
select os, device, city, count(*)
from requests
group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());This query produces eight levels of aggregation:
Count by OS, device, and city (most granular)
Count by OS and device
Count by OS and city
Count by device and city
Count by OS
Count by device
Count by city
Grand total
Result (27 rows):
+------------+------------+------------+------------+
| os | device | city | _c3 |
+------------+------------+------------+------------+
| NULL | NULL | NULL | 7 |
| NULL | NULL | Beijing | 4 |
| NULL | NULL | Shijiazhuang | 3 |
| NULL | PC | NULL | 4 |
| NULL | PC | Beijing | 3 |
| NULL | PC | Shijiazhuang | 1 |
| NULL | Phone | NULL | 3 |
| NULL | Phone | Beijing | 1 |
| NULL | Phone | Shijiazhuang | 2 |
| ios | NULL | NULL | 1 |
| ios | NULL | Shijiazhuang | 1 |
| ios | Phone | NULL | 1 |
| ios | Phone | Shijiazhuang | 1 |
| linux | NULL | NULL | 2 |
| linux | NULL | Beijing | 2 |
| linux | PC | NULL | 1 |
| linux | PC | Beijing | 1 |
| linux | Phone | NULL | 1 |
| linux | Phone | Beijing | 1 |
| windows | NULL | NULL | 4 |
| windows | NULL | Beijing | 2 |
| windows | NULL | Shijiazhuang | 2 |
| windows | PC | NULL | 3 |
| windows | PC | Beijing | 2 |
| windows | PC | Shijiazhuang | 1 |
| windows | Phone | NULL | 1 |
| windows | Phone | Shijiazhuang | 1 |
+------------+------------+------------+------------+Example 2: All combinations of two column groups
CUBE((os, device), (device, city)) treats each parenthesized group as a single unit and generates all 2^2 = 4 grouping sets.
select os, device, city, count(*)
from requests
group by cube ((os, device), (device, city));
-- Equivalent to:
select os, device, city, count(*)
from requests
group by grouping sets ((os, device, city),(os, device),(device,city),());This query produces four levels of aggregation:
Count by OS, device, and city
Count by OS and device
Count by device and city
Grand total
Result (16 rows):
+------------+------------+------------+------------+
| os | device | city | _c3 |
+------------+------------+------------+------------+
| NULL | NULL | NULL | 7 |
| NULL | PC | Beijing | 3 |
| NULL | PC | Shijiazhuang | 1 |
| NULL | Phone | Beijing | 1 |
| NULL | Phone | Shijiazhuang | 2 |
| ios | Phone | NULL | 1 |
| ios | Phone | Shijiazhuang | 1 |
| linux | PC | NULL | 1 |
| linux | PC | Beijing | 1 |
| linux | Phone | NULL | 1 |
| linux | Phone | Beijing | 1 |
| windows | PC | NULL | 3 |
| windows | PC | Beijing | 2 |
| windows | PC | Shijiazhuang | 1 |
| windows | Phone | NULL | 1 |
| windows | Phone | Shijiazhuang | 1 |
+------------+------------+------------+------------+Use ROLLUP
Example 3: Hierarchical aggregation over three columns
ROLLUP(os, device, city) generates 4 grouping sets: the full combination, two intermediate levels, and the grand total.
select os, device, city, count(*)
from requests
group by rollup (os, device, city);
-- Equivalent to:
select os, device, city, count(*)
from requests
group by grouping sets ((os, device, city),(os, device),(os),());This query produces four levels of aggregation:
Count by OS, device, and city (most granular)
Count by OS and device
Count by OS
Grand total
Result (15 rows):
+------------+------------+------------+------------+
| os | device | city | _c3 |
+------------+------------+------------+------------+
| NULL | NULL | NULL | 7 |
| ios | NULL | NULL | 1 |
| ios | Phone | NULL | 1 |
| ios | Phone | Shijiazhuang | 1 |
| linux | NULL | NULL | 2 |
| linux | PC | NULL | 1 |
| linux | PC | Beijing | 1 |
| linux | Phone | NULL | 1 |
| linux | Phone | Beijing | 1 |
| windows | NULL | NULL | 4 |
| windows | PC | NULL | 3 |
| windows | PC | Beijing | 2 |
| windows | PC | Shijiazhuang | 1 |
| windows | Phone | NULL | 1 |
| windows | Phone | Shijiazhuang | 1 |
+------------+------------+------------+------------+Example 4: Hierarchical aggregation with a composite group
ROLLUP(os, (os,device), city) treats (os, device) as one unit in the hierarchy.
select os, device, city, count(*)
from requests
group by rollup (os, (os,device), city);
-- Equivalent to:
select os, device, city, count(*)
from requests
group by grouping sets ((os, device, city),(os, device),(os),());This query produces the same four levels as Example 3, with (os, device) treated as a single composite element in the rollup hierarchy.
Result (15 rows):
+------------+------------+------------+------------+
| os | device | city | _c3 |
+------------+------------+------------+------------+
| NULL | NULL | NULL | 7 |
| ios | NULL | NULL | 1 |
| ios | Phone | NULL | 1 |
| ios | Phone | Shijiazhuang | 1 |
| linux | NULL | NULL | 2 |
| linux | PC | NULL | 1 |
| linux | PC | Beijing | 1 |
| linux | Phone | NULL | 1 |
| linux | Phone | Beijing | 1 |
| windows | NULL | NULL | 4 |
| windows | PC | NULL | 3 |
| windows | PC | Beijing | 2 |
| windows | PC | Shijiazhuang | 1 |
| windows | Phone | NULL | 1 |
| windows | Phone | Shijiazhuang | 1 |
+------------+------------+------------+------------+Example 5: Combining GROUP BY, CUBE, and GROUPING SETS
select os, device, city, count(*)
from requests
group by os, cube(os,device), grouping sets(city);
-- Equivalent to:
select os, device, city, count(*)
from requests
group by grouping sets((os,device,city),(os,city),(os,device,city));Result (10 rows):
+------------+------------+------------+------------+
| os | device | city | _c3 |
+------------+------------+------------+------------+
| ios | NULL | Shijiazhuang | 1 |
| ios | Phone | Shijiazhuang | 1 |
| linux | NULL | Beijing | 2 |
| linux | PC | Beijing | 1 |
| linux | Phone | Beijing | 1 |
| windows | NULL | Beijing | 2 |
| windows | NULL | Shijiazhuang | 2 |
| windows | PC | Beijing | 2 |
| windows | PC | Shijiazhuang | 1 |
| windows | Phone | Shijiazhuang | 1 |
+------------+------------+------------+------------+Use GROUPING and GROUPING_ID
The following example uses GROUPING and GROUPING_ID to identify which columns participated in each grouping set.
select a, b, c, count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid
from values (1,2,3) as t(a,b,c)
group by cube(a,b,c);Result:
+------------+------------+------------+------------+------------+------------+------------+------------+
| a | b | c | _c3 | ga | gb | gc | groupingid |
+------------+------------+------------+------------+------------+------------+------------+------------+
| NULL | NULL | NULL | 1 | 1 | 1 | 1 | 7 |
| NULL | NULL | 3 | 1 | 1 | 1 | 0 | 6 |
| NULL | 2 | NULL | 1 | 1 | 0 | 1 | 5 |
| NULL | 2 | 3 | 1 | 1 | 0 | 0 | 4 |
| 1 | NULL | NULL | 1 | 0 | 1 | 1 | 3 |
| 1 | NULL | 3 | 1 | 0 | 1 | 0 | 2 |
| 1 | 2 | NULL | 1 | 0 | 0 | 1 | 1 |
| 1 | 2 | 3 | 1 | 0 | 0 | 0 | 0 |
+------------+------------+------------+------------+------------+------------+------------+------------+ga, gb, and gc show which columns were used for grouping in each row:
0: the row is grouped by this column (the value is real data, not a placeholder)1: the row is not grouped by this column (the value is a null placeholder)
groupingid combines ga, gb, and gc into a single integer using a bitmap. For example, groupingid=7 means ga=1, gb=1, gc=1—none of the three columns participate in this grouping set (the grand total row).
To replace null placeholders with a readable label, use GROUPING in an IF expression:
select
if(grouping(os) == 0, os, 'ALL') as os,
if(grouping(device) == 0, device, 'ALL') as device,
if(grouping(city) == 0, city, 'ALL') as city,
count(*) as count
from requests
group by os, device, city grouping sets((os, device), (city), ());Result:
+------------+------------+------------+------------+
| os | device | city | count |
+------------+------------+------------+------------+
| ALL | ALL | ALL | 7 |
| ALL | ALL | Beijing | 4 |
| ALL | ALL | Shijiazhuang | 3 |
| ios | Phone | ALL | 1 |
| linux | PC | ALL | 1 |
| linux | Phone | ALL | 1 |
| windows | PC | ALL | 3 |
| windows | Phone | ALL | 1 |
+------------+------------+------------+------------+Use GROUPING__ID
GROUPING__ID is a Hive-compatible function that requires no parameters. It is equivalent to GROUPING_ID applied to the full GROUP BY column list, in the same order.
set odps.sql.hive.compatible=true;
select
a, b, c, count(*), grouping__id
from values (1,2,3) as t(a,b,c)
group by a, b, c grouping sets ((a,b,c), (a));
-- Equivalent to:
select
a, b, c, count(*), grouping_id(a,b,c)
from values (1,2,3) as t(a,b,c)
group by a, b, c grouping sets ((a,b,c), (a));Result:
+------------+------------+------------+------------+------------+
| a | b | c | _c3 | _c4 |
+------------+------------+------------+------------+------------+
| 1 | NULL | NULL | 1 | 3 |
| 1 | 2 | 3 | 1 | 0 |
+------------+------------+------------+------------+------------+We recommend that you use GROUPING__ID with Hive 2.3.0 or later in MaxCompute. We recommend that you do not use this function with Hive versions earlier than 2.3.0.