Perform multidimensional aggregation using GROUPING SETS - MaxCompute

GROUPING SETS is an extension of the GROUP BY clause that lets you compute aggregations across multiple grouping dimensions in a single query—without chaining multiple SELECT statements with UNION ALL. This allows MaxCompute to generate more efficient execution plans with higher performance.

Key concepts

The following extensions work with GROUPING SETS.

Extension	Description
`CUBE`	Generates aggregations for all possible combinations of the specified columns.
`ROLLUP`	Generates aggregations at each level of a column hierarchy, from most granular to a grand total.
`GROUPING`	Returns `0` if a row is aggregated on the specified column (NULL is a real value), or `1` if the column is not part of the current grouping set (NULL is a placeholder).
`GROUPING_ID`	Returns a bitmap integer that encodes the `GROUPING` result for one or more columns.
`GROUPING__ID`	A no-parameter variant for Hive-compatible queries. Equivalent to `GROUPING_ID(<GROUP BY column list>)`, with columns in the same order as the `GROUP BY` clause. Recommended for use with Apache Hive 2.3.0 or later.

Usage notes

NULL in results

When GROUPING SETS merges multiple result sets—each grouped by different criteria—columns that are not part of the current grouping set are filled with NULL. This NULL is a placeholder, not a real null value. Use the GROUPING function to distinguish between the two (see GROUPING and GROUPING_ID).

Combining multiple grouping expressions

When you mix GROUP BY, CUBE, ROLLUP, and GROUPING SETS in the same clause, MaxCompute applies cross-product semantics—each combination of grouping sets from all expressions is included in the final result.

For example:

GROUP BY os, CUBE (os, device), GROUPING SETS (city)
-- Equivalent to:
GROUP BY GROUPING SETS ((os, device, city), (os, city), (os, device, city))

Prepare sample data

All examples in this topic use the same requests table.

CREATE TABLE requests LIFECYCLE 20 AS
SELECT * FROM VALUES
    (1, 'windows', 'PC',    'Beijing'),
    (2, 'windows', 'PC',    'Shijiazhuang'),
    (3, 'linux',   'Phone', 'Beijing'),
    (4, 'windows', 'PC',    'Beijing'),
    (5, 'ios',     'Phone', 'Shijiazhuang'),
    (6, 'linux',   'PC',    'Beijing'),
    (7, 'windows', 'Phone', 'Shijiazhuang')
AS t(id, os, device, city);

GROUPING SETS examples

Basic GROUPING SETS

The following example groups requests by (os, device), by city, and produces a grand total—all in one query.

Without GROUPING SETS (three separate scans):

SELECT NULL,   NULL,   NULL, COUNT(*) FROM requests
UNION ALL
SELECT os,     device, NULL, COUNT(*) FROM requests GROUP BY os, device
UNION ALL
SELECT NULL,   NULL,   city, COUNT(*) FROM requests GROUP BY city;

With GROUPING SETS (one scan):

SELECT os, device, city, COUNT(*)
FROM requests
GROUP BY GROUPING SETS ((os, device), (city), ());

Result:

+------------+------------+--------------+-----+
| os         | device     | city         | _c3 |
+------------+------------+--------------+-----+
| NULL       | NULL       | NULL         | 7   |
| NULL       | NULL       | Beijing      | 4   |
| NULL       | NULL       | Shijiazhuang | 3   |
| ios        | Phone      | NULL         | 1   |
| linux      | PC         | NULL         | 1   |
| linux      | Phone      | NULL         | 1   |
| windows    | PC         | NULL         | 3   |
| windows    | Phone      | NULL         | 1   |
+------------+------------+--------------+-----+

NULL in the city column for rows 4–8 is a placeholder indicating that city was not part of that grouping set—not a null value in the data.

CUBE examples

CUBE expands to all possible combinations of the specified columns.

Example 1: All individual columns

SELECT os, device, city, COUNT(*)
FROM requests
GROUP BY CUBE (os, device, city);
-- Equivalent to:
-- GROUP BY GROUPING SETS ((os,device,city),(os,device),(os,city),(device,city),(os),(device),(city),())

Result:

+------------+------------+--------------+-----+
| os         | device     | city         | _c3 |
+------------+------------+--------------+-----+
| NULL       | NULL       | NULL         | 7   |
| NULL       | NULL       | Beijing      | 4   |
| NULL       | NULL       | Shijiazhuang | 3   |
| NULL       | PC         | NULL         | 4   |
| NULL       | PC         | Beijing      | 3   |
| NULL       | PC         | Shijiazhuang | 1   |
| NULL       | Phone      | NULL         | 3   |
| NULL       | Phone      | Beijing      | 1   |
| NULL       | Phone      | Shijiazhuang | 2   |
| ios        | NULL       | NULL         | 1   |
| ios        | NULL       | Shijiazhuang | 1   |
| ios        | Phone      | NULL         | 1   |
| ios        | Phone      | Shijiazhuang | 1   |
| linux      | NULL       | NULL         | 2   |
| linux      | NULL       | Beijing      | 2   |
| linux      | PC         | NULL         | 1   |
| linux      | PC         | Beijing      | 1   |
| linux      | Phone      | NULL         | 1   |
| linux      | Phone      | Beijing      | 1   |
| windows    | NULL       | NULL         | 4   |
| windows    | NULL       | Beijing      | 2   |
| windows    | NULL       | Shijiazhuang | 2   |
| windows    | PC         | NULL         | 3   |
| windows    | PC         | Beijing      | 2   |
| windows    | PC         | Shijiazhuang | 1   |
| windows    | Phone      | NULL         | 1   |
| windows    | Phone      | Shijiazhuang | 1   |
+------------+------------+--------------+-----+

Example 2: Column groups

When you pass column groups to CUBE, each group is treated as a single unit in the combination logic.

SELECT os, device, city, COUNT(*)
FROM requests
GROUP BY CUBE ((os, device), (device, city));
-- Equivalent to:
-- GROUP BY GROUPING SETS ((os,device,city),(os,device),(device,city),())

Result:

+------------+------------+--------------+-----+
| os         | device     | city         | _c3 |
+------------+------------+--------------+-----+
| NULL       | NULL       | NULL         | 7   |
| NULL       | PC         | Beijing      | 3   |
| NULL       | PC         | Shijiazhuang | 1   |
| NULL       | Phone      | Beijing      | 1   |
| NULL       | Phone      | Shijiazhuang | 2   |
| ios        | Phone      | NULL         | 1   |
| ios        | Phone      | Shijiazhuang | 1   |
| linux      | PC         | NULL         | 1   |
| linux      | PC         | Beijing      | 1   |
| linux      | Phone      | NULL         | 1   |
| linux      | Phone      | Beijing      | 1   |
| windows    | PC         | NULL         | 3   |
| windows    | PC         | Beijing      | 2   |
| windows    | PC         | Shijiazhuang | 1   |
| windows    | Phone      | NULL         | 1   |
| windows    | Phone      | Shijiazhuang | 1   |
+------------+------------+--------------+-----+

ROLLUP examples

ROLLUP generates aggregations at each level of a hierarchy, from the most granular grouping down to a grand total.

Example 3: Simple hierarchy

SELECT os, device, city, COUNT(*)
FROM requests
GROUP BY ROLLUP (os, device, city);
-- Equivalent to:
-- GROUP BY GROUPING SETS ((os,device,city),(os,device),(os),())

Result:

+------------+------------+--------------+-----+
| os         | device     | city         | _c3 |
+------------+------------+--------------+-----+
| NULL       | NULL       | NULL         | 7   |
| ios        | NULL       | NULL         | 1   |
| ios        | Phone      | NULL         | 1   |
| ios        | Phone      | Shijiazhuang | 1   |
| linux      | NULL       | NULL         | 2   |
| linux      | PC         | NULL         | 1   |
| linux      | PC         | Beijing      | 1   |
| linux      | Phone      | NULL         | 1   |
| linux      | Phone      | Beijing      | 1   |
| windows    | NULL       | NULL         | 4   |
| windows    | PC         | NULL         | 3   |
| windows    | PC         | Beijing      | 2   |
| windows    | PC         | Shijiazhuang | 1   |
| windows    | Phone      | NULL         | 1   |
| windows    | Phone      | Shijiazhuang | 1   |
+------------+------------+--------------+-----+

Example 4: Hierarchy with a column group

SELECT os, device, city, COUNT(*)
FROM requests
GROUP BY ROLLUP (os, (os, device), city);
-- Equivalent to:
-- GROUP BY GROUPING SETS ((os,device,city),(os,device),(os),())

Result:

+------------+------------+--------------+-----+
| os         | device     | city         | _c3 |
+------------+------------+--------------+-----+
| NULL       | NULL       | NULL         | 7   |
| ios        | NULL       | NULL         | 1   |
| ios        | Phone      | NULL         | 1   |
| ios        | Phone      | Shijiazhuang | 1   |
| linux      | NULL       | NULL         | 2   |
| linux      | PC         | NULL         | 1   |
| linux      | PC         | Beijing      | 1   |
| linux      | Phone      | NULL         | 1   |
| linux      | Phone      | Beijing      | 1   |
| windows    | NULL       | NULL         | 4   |
| windows    | PC         | NULL         | 3   |
| windows    | PC         | Beijing      | 2   |
| windows    | PC         | Shijiazhuang | 1   |
| windows    | Phone      | NULL         | 1   |
| windows    | Phone      | Shijiazhuang | 1   |
+------------+------------+--------------+-----+

Example 5: Combining GROUP BY, CUBE, and GROUPING SETS

The following example applies cross-product semantics across three grouping expressions.

SELECT os, device, city, COUNT(*)
FROM requests
GROUP BY os, CUBE (os, device), GROUPING SETS (city);
-- Equivalent to:
-- GROUP BY GROUPING SETS ((os,device,city),(os,city),(os,device,city))

Result:

+------------+------------+--------------+-----+
| os         | device     | city         | _c3 |
+------------+------------+--------------+-----+
| ios        | NULL       | Shijiazhuang | 1   |
| ios        | Phone      | Shijiazhuang | 1   |
| linux      | NULL       | Beijing      | 2   |
| linux      | PC         | Beijing      | 1   |
| linux      | Phone      | Beijing      | 1   |
| windows    | NULL       | Beijing      | 2   |
| windows    | NULL       | Shijiazhuang | 2   |
| windows    | PC         | Beijing      | 2   |
| windows    | PC         | Shijiazhuang | 1   |
| windows    | Phone      | Shijiazhuang | 1   |
+------------+------------+--------------+-----+

Examples for GROUPING and GROUPING_ID

Because GROUPING SETS fills non-grouped columns with NULL, you cannot tell from the result alone whether a NULL came from the data or from the aggregation. GROUPING and GROUPING_ID resolve this ambiguity.

Distinguish NULL placeholders with GROUPING

GROUPING(col) returns:

0 — the row is aggregated on col (the NULL, if any, is a real null value)
1 — col is not part of the current grouping set (the NULL is a placeholder)

GROUPING_ID(col1, col2, ...) encodes the GROUPING result for multiple columns as a bitmap integer.

SELECT a, b, c, COUNT(*),
    GROUPING(a)          AS ga,
    GROUPING(b)          AS gb,
    GROUPING(c)          AS gc,
    GROUPING_ID(a, b, c) AS groupingid
FROM VALUES (1, 2, 3) AS t(a, b, c)
GROUP BY CUBE (a, b, c);

Result:

+------+------+------+-----+----+----+----+------------+
| a    | b    | c    | _c3 | ga | gb | gc | groupingid |
+------+------+------+-----+----+----+----+------------+
| NULL | NULL | NULL | 1   | 1  | 1  | 1  | 7          |
| NULL | NULL | 3    | 1   | 1  | 1  | 0  | 6          |
| NULL | 2    | NULL | 1   | 1  | 0  | 1  | 5          |
| NULL | 2    | 3    | 1   | 1  | 0  | 0  | 4          |
| 1    | NULL | NULL | 1   | 0  | 1  | 1  | 3          |
| 1    | NULL | 3    | 1   | 0  | 1  | 0  | 2          |
| 1    | 2    | NULL | 1   | 0  | 0  | 1  | 1          |
| 1    | 2    | 3    | 1   | 0  | 0  | 0  | 0          |
+------+------+------+-----+----+----+----+------------+

Replace NULL placeholders with readable labels

Use GROUPING inside an IF expression to display a label instead of NULL for placeholder rows.

SELECT
    IF(GROUPING(os)     = 0, os,     'ALL') AS os,
    IF(GROUPING(device) = 0, device, 'ALL') AS device,
    IF(GROUPING(city)   = 0, city,   'ALL') AS city,
    COUNT(*) AS count
FROM requests
GROUP BY os, device, city GROUPING SETS ((os, device), (city), ());

Result:

+------------+------------+--------------+-------+
| os         | device     | city         | count |
+------------+------------+--------------+-------+
| ALL        | ALL        | ALL          | 7     |
| ALL        | ALL        | Beijing      | 4     |
| ALL        | ALL        | Shijiazhuang | 3     |
| ios        | Phone      | ALL          | 1     |
| linux      | PC         | ALL          | 1     |
| linux      | Phone      | ALL          | 1     |
| windows    | PC         | ALL          | 3     |
| windows    | Phone      | ALL          | 1     |
+------------+------------+--------------+-------+

Example for GROUPING__ID

GROUPING__ID is a no-parameter alias for GROUPING_ID(<GROUP BY column list>), provided for Hive-compatible queries. The column order matches the GROUP BY clause.

If you use Hive 2.3.0 or later, we recommend that you use this function in MaxCompute. If you use a Hive version earlier than 2.3.0, we recommend that you do not use this function in MaxCompute.

SELECT a, b, c, COUNT(*), grouping__id
FROM VALUES (1, 2, 3) AS t(a, b, c)
GROUP BY a, b, c GROUPING SETS ((a, b, c), (a));
-- Equivalent to:
SELECT a, b, c, COUNT(*), GROUPING_ID(a, b, c)
FROM VALUES (1, 2, 3) AS t(a, b, c)
GROUP BY a, b, c GROUPING SETS ((a, b, c), (a));

Result:

+------+------+------+-----+-----+
| a    | b    | c    | _c3 | _c4 |
+------+------+------+-----+-----+
| 1    | NULL | NULL | 1   | 3   |
| 1    | 2    | 3    | 1   | 0   |
+------+------+------+-----+-----+

CUBE and ROLLUP equivalences

The following tables show the GROUPING SETS equivalents for common CUBE and ROLLUP expressions.

CUBE equivalences

Expression	Equivalent GROUPING SETS
`GROUP BY CUBE (a, b, c)`	`GROUPING SETS ((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c),())`
`GROUP BY CUBE ((a, b), (c, d))`	`GROUPING SETS ((a,b,c,d),(a,b),(c,d),())`
`GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e))`	`GROUPING SETS ((a,b,c,d),(a,b,c,e),(a,b,d),(a,b,e),(a,c,d),(a,c,e),(a,d),(a,e))`

ROLLUP equivalences

Expression	Equivalent GROUPING SETS
`GROUP BY ROLLUP (a, b, c)`	`GROUPING SETS ((a,b,c),(a,b),(a),())`
`GROUP BY ROLLUP (a, (b, c), d)`	`GROUPING SETS ((a,b,c,d),(a,b,c),(a),())`
`GROUP BY GROUPING SETS ((b), (c), ROLLUP (a,b,c))`	`GROUPING SETS ((b),(c),(a,b,c),(a,b),(a),())`