All Products
Search
Document Center

MaxCompute:GROUPING SETS

Last Updated:Jul 20, 2023

In some cases, you need to execute a UNION ALL clause multiple times to aggregate and analyze data from multiple dimensions. For example, if you want to aggregate Column a, aggregate Column b, and then aggregate Column a and Column b, you can use GROUPING SETS to perform these operations. This topic describes how to use GROUPING SETS for multidimensional aggregation.

Description

GROUPING SETS is an extension of a GROUP BY clause in a SELECT statement. The GROUPING SETS clause allows you to group your results in multiple ways, without the need to execute multiple SELECT statements and UNION ALL in sequence. This allows MaxCompute to generate more efficient execution plans with higher performance.

The following table describes the syntax that is associated with GROUPING SETS.

Type

Description

CUBE

A special form of GROUPING SETS, CUBE lists all the possible combinations of specified columns as grouping sets. You can also use CUBE with GROUPING SETS.

group by cube (a, b, c)  
-- Equivalent to the following clauses:  
grouping sets ((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c),())

group by cube ( (a, b), (c, d) ) 
-- Equivalent to the following clauses: 
grouping sets (
    ( a, b, c, d ),
    ( a, b       ),
    (       c, d ),
    (            )
)
group by a, cube (b, c), grouping sets ((d), (e)) 
-- Equivalent to the following clause: 
group by grouping sets (
    (a, b, c, d), (a, b, c, e),
    (a, b, d),    (a, b, e),
    (a, c, d),    (a, c, e),
    (a, d),       (a, e)
)

ROLLUP

A special form of GROUPING SETS. ROLLUP aggregates specified columns and generates grouping sets in a hierarchical manner. You can also use ROLLUP with GROUPING SETS.

group by rollup (a, b, c)
-- Equivalent to the following clauses:  
grouping sets ((a,b,c),(a,b),(a), ())

group by rollup ( a, (b, c), d ) 
-- Equivalent to the following clauses:
grouping sets (
    ( a, b, c, d ),
    ( a, b, c    ),
    ( a          ),
    (            )
)
group by grouping sets((b), (c), rollup(a,b,c)) 
-- Equivalent to the following clause: 
group by grouping sets (
    (b), (c),
    (a,b,c), (a,b), (a), ()
 )

GROUPING

NULL is used as placeholders in the results of GROUPING SETS. As a result, NULL placeholders cannot be distinguished from NULL values. To address this issue, MaxCompute provides GROUPING. GROUPING allows you to use the name of a column as a parameter. If specific rows are aggregated based on a column, 0 is returned. In this case, NULL is used as a value. If specific rows are not aggregated based on a column, 1 is returned. In this case, NULL is used as a placeholder.

GROUPING_ID

GROUPING_ID allows you to use the names of one or more columns as parameters. The grouping results of columns are used to bitmap integer values.

GROUPING__ID

GROUPING__ID does not require parameters. It is used for Hive-compatible queries. GROUPING__ID is equivalent to GROUPING__ID(GROUP BY Parameter list) in MaxCompute. The parameters of GROUPING__ID are in the same order as GROUP BY.

Note

If you use Hive 2.3.0 or later, we recommend that you use this function in MaxCompute. If you use a Hive version earlier than 2.3.0, we recommend that you do not use this function in MaxCompute.

Examples

Example for using GROUPING SETS:

  1. Prepare data.

    create table requests lifecycle 20 as 
    select * from values 
        (1, 'windows', 'PC', 'Beijing'),
        (2, 'windows', 'PC', 'Shijiazhuang'),
        (3, 'linux', 'Phone', 'Beijing'),
        (4, 'windows', 'PC', 'Beijing'),
        (5, 'ios', 'Phone', 'Shijiazhuang'),
        (6, 'linux', 'PC', 'Beijing'),
        (7, 'windows', 'Phone', 'Shijiazhuang') 
    as t(id, os, device, city);
  2. Use one of the following methods to group data:

    • Execute multiple SELECT statements to group data.

      select NULL, NULL, NULL, count(*)
      from requests
      union all
      select os, device, NULL, count(*)
      from requests group by os, device
      union all
      select null, null, city, count(*)
      from requests group by city;
    • Use GROUPING SETS to group data.

      select os,device, city ,count(*)
      from requests
      group by grouping sets((os, device), (city), ());

      The following result is returned:

      +------------+------------+------------+------------+
      | os         | device     | city       | _c3        |
      +------------+------------+------------+------------+
      | NULL       | NULL       | NULL       | 7          |
      | NULL       | NULL       | Beijing    | 4          |
      | NULL       | NULL       | Shijiazhuang | 3          |
      | ios        | Phone      | NULL       | 1          |
      | linux      | PC         | NULL       | 1          |
      | linux      | Phone      | NULL       | 1          |
      | windows    | PC         | NULL       | 3          |
      | windows    | Phone      | NULL       | 1          |
      +------------+------------+------------+------------+
    Note

    If some expressions are not used in GROUPING SETS, NULL is used as placeholders for these expressions, for example, NULL of the city column in the fourth row to the eighth row. This way, you can perform operations on the result sets.

Examples for using CUBE or ROLLUP

Examples for using CUBE or ROLLUP based on the syntax of GROUPING SETS:

  • Example 1: Use CUBE to list all the possible columns os, device, and city as grouping sets. Sample statement:

    select os,device, city, count(*)
    from requests 
    group by cube (os, device, city);
    -- The preceding statement is equivalent to the following statement:
    select os,device, city, count(*)
    from requests 
    group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());

    The following result is returned:

    +------------+------------+------------+------------+
    | os         | device     | city       | _c3        |
    +------------+------------+------------+------------+
    | NULL       | NULL       | NULL       | 7          |
    | NULL       | NULL       | Beijing    | 4          |
    | NULL       | NULL       | Shijiazhuang | 3          |
    | NULL       | PC         | NULL       | 4          |
    | NULL       | PC         | Beijing    | 3          |
    | NULL       | PC         | Shijiazhuang | 1          |
    | NULL       | Phone      | NULL       | 3          |
    | NULL       | Phone      | Beijing    | 1          |
    | NULL       | Phone      | Shijiazhuang | 2          |
    | ios        | NULL       | NULL       | 1          |
    | ios        | NULL       | Shijiazhuang | 1          |
    | ios        | Phone      | NULL       | 1          |
    | ios        | Phone      | Shijiazhuang | 1          |
    | linux      | NULL       | NULL       | 2          |
    | linux      | NULL       | Beijing    | 2          |
    | linux      | PC         | NULL       | 1          |
    | linux      | PC         | Beijing    | 1          |
    | linux      | Phone      | NULL       | 1          |
    | linux      | Phone      | Beijing    | 1          |
    | windows    | NULL       | NULL       | 4          |
    | windows    | NULL       | Beijing    | 2          |
    | windows    | NULL       | Shijiazhuang | 2          |
    | windows    | PC         | NULL       | 3          |
    | windows    | PC         | Beijing    | 2          |
    | windows    | PC         | Shijiazhuang | 1          |
    | windows    | Phone      | NULL       | 1          |
    | windows    | Phone      | Shijiazhuang | 1          |
    +------------+------------+------------+------------+
  • Example 2: Use CUBE to list all the possible combinations of columns, (os, device),(device, city), as grouping sets. Sample statement:

    select os,device, city, count(*) 
    from requests 
    group by cube ((os, device), (device, city));
    -- The preceding statement is equivalent to the following statement:
    select os,device, city, count(*) 
    from requests 
    group by grouping sets ((os, device, city),(os, device),(device,city),());

    The following result is returned:

    +------------+------------+------------+------------+
    | os         | device     | city       | _c3        |
    +------------+------------+------------+------------+
    | NULL       | NULL       | NULL       | 7          |
    | NULL       | PC         | Beijing    | 3          |
    | NULL       | PC         | Shijiazhuang | 1          |
    | NULL       | Phone      | Beijing    | 1          |
    | NULL       | Phone      | Shijiazhuang | 2          |
    | ios        | Phone      | NULL       | 1          |
    | ios        | Phone      | Shijiazhuang | 1          |
    | linux      | PC         | NULL       | 1          |
    | linux      | PC         | Beijing    | 1          |
    | linux      | Phone      | NULL       | 1          |
    | linux      | Phone      | Beijing    | 1          |
    | windows    | PC         | NULL       | 3          |
    | windows    | PC         | Beijing    | 2          |
    | windows    | PC         | Shijiazhuang | 1          |
    | windows    | Phone      | NULL       | 1          |
    | windows    | Phone      | Shijiazhuang | 1          |
    +------------+------------+------------+------------+
  • Example 3: Use ROLLUP to aggregate the os, device, and city columns in a hierarchical manner to generate multiple grouping sets. Sample statement:

    select os,device, city, count(*)
    from requests 
    group by rollup (os, device, city);
    -- The preceding statement is equivalent to the following statement:
    select os,device, city, count(*)
    from requests 
    group by grouping sets ((os, device, city),(os, device),(os),());

    The following result is returned:

    +------------+------------+------------+------------+
    | os         | device     | city       | _c3        |
    +------------+------------+------------+------------+
    | NULL       | NULL       | NULL       | 7          |
    | ios        | NULL       | NULL       | 1          |
    | ios        | Phone      | NULL       | 1          |
    | ios        | Phone      | Shijiazhuang | 1          |
    | linux      | NULL       | NULL       | 2          |
    | linux      | PC         | NULL       | 1          |
    | linux      | PC         | Beijing    | 1          |
    | linux      | Phone      | NULL       | 1          |
    | linux      | Phone      | Beijing    | 1          |
    | windows    | NULL       | NULL       | 4          |
    | windows    | PC         | NULL       | 3          |
    | windows    | PC         | Beijing    | 2          |
    | windows    | PC         | Shijiazhuang | 1          |
    | windows    | Phone      | NULL       | 1          |
    | windows    | Phone      | Shijiazhuang | 1          |
    +------------+------------+------------+------------+
  • Example 4: Use ROLLUP to aggregate os, (os,device), city in a hierarchical manner to generate multiple grouping sets. Sample statement:

    select os,device, city, count(*)
    from requests 
    group by rollup (os, (os,device), city);
    -- The preceding statement is equivalent to the following statement:
    select os,device, city, count(*)
    from requests 
    group by grouping sets ((os, device, city),(os, device),(os),());

    The following result is returned:

    +------------+------------+------------+------------+
    | os         | device     | city       | _c3        |
    +------------+------------+------------+------------+
    | NULL       | NULL       | NULL       | 7          |
    | ios        | NULL       | NULL       | 1          |
    | ios        | Phone      | NULL       | 1          |
    | ios        | Phone      | Shijiazhuang | 1          |
    | linux      | NULL       | NULL       | 2          |
    | linux      | PC         | NULL       | 1          |
    | linux      | PC         | Beijing    | 1          |
    | linux      | Phone      | NULL       | 1          |
    | linux      | Phone      | Beijing    | 1          |
    | windows    | NULL       | NULL       | 4          |
    | windows    | PC         | NULL       | 3          |
    | windows    | PC         | Beijing    | 2          |
    | windows    | PC         | Shijiazhuang | 1          |
    | windows    | Phone      | NULL       | 1          |
    | windows    | Phone      | Shijiazhuang | 1          |
    +------------+------------+------------+------------+
  • Example 5: Use GROUP BY, CUBE, and GROUPING SETS to generate multiple grouping sets. Sample statement:

    select os,device, city, count(*)
    from requests 
    group by os, cube(os,device), grouping sets(city);
    -- The preceding statement is equivalent to the following statement:
    select os,device, city, count(*)
    from requests 
    group by grouping sets((os,device,city),(os,city),(os,device,city));

    The following result is returned:

    +------------+------------+------------+------------+
    | os         | device     | city       | _c3        |
    +------------+------------+------------+------------+
    | ios        | NULL       | Shijiazhuang | 1          |
    | ios        | Phone      | Shijiazhuang | 1          |
    | linux      | NULL       | Beijing    | 2          |
    | linux      | PC         | Beijing    | 1          |
    | linux      | Phone      | Beijing    | 1          |
    | windows    | NULL       | Beijing    | 2          |
    | windows    | NULL       | Shijiazhuang | 2          |
    | windows    | PC         | Beijing    | 2          |
    | windows    | PC         | Shijiazhuang | 1          |
    | windows    | Phone      | Shijiazhuang | 1          |
    +------------+------------+------------+------------+

Example for using GROUPING and GROUPING_ID

Sample statement:

select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid 
from values (1,2,3) as t(a,b,c)
group by cube(a,b,c);

The following result is returned:

+------------+------------+------------+------------+------------+------------+------------+------------+
| a          | b          | c          | _c3        | ga         | gb         | gc         | groupingid |
+------------+------------+------------+------------+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 1          | 1          | 1          | 1          | 7          |
| NULL       | NULL       | 3          | 1          | 1          | 1          | 0          | 6          |
| NULL       | 2          | NULL       | 1          | 1          | 0          | 1          | 5          |
| NULL       | 2          | 3          | 1          | 1          | 0          | 0          | 4          |
| 1          | NULL       | NULL       | 1          | 0          | 1          | 1          | 3          |
| 1          | NULL       | 3          | 1          | 0          | 1          | 0          | 2          |
| 1          | 2          | NULL       | 1          | 0          | 0          | 1          | 1          |
| 1          | 2          | 3          | 1          | 0          | 0          | 0          | 0          |
+------------+------------+------------+------------+------------+------------+------------+------------+

By default, the columns that are not specified in GROUP BY are filled with NULL. You can use GROUPING to specify the values that you require. Sample statement that is based on the syntax of GROUPING SETS:

select
  if(grouping(os) == 0, os, 'ALL') as os,
  if(grouping(device) == 0, device, 'ALL') as device,
  if(grouping(city) == 0, city, 'ALL') as city, 
  count(*) as count 
from requests 
group by os, device, city grouping sets((os, device), (city), ());

The following result is returned:

+------------+------------+------------+------------+
| os         | device     | city       | count      |
+------------+------------+------------+------------+
| ALL        | ALL        | ALL        | 7          |
| ALL        | ALL        | Beijing    | 4          |
| ALL        | ALL        | Shijiazhuang | 3          |
| ios        | Phone      | ALL        | 1          |
| linux      | PC         | ALL        | 1          |
| linux      | Phone      | ALL        | 1          |
| windows    | PC         | ALL        | 3          |
| windows    | Phone      | ALL        | 1          |
+------------+------------+------------+------------+

Example for using GROUPING__ID:

Example for using GROUPING__ID without parameters specified:

set odps.sql.hive.compatible=true;
select      
a, b, c, count(*), grouping__id 
from values (1,2,3) as t(a,b,c) 
group by a, b, c grouping sets ((a,b,c), (a));
-- The preceding statement is equivalent to the following statement:
select      
a, b, c, count(*), grouping_id(a,b,c)  
from values (1,2,3) as t(a,b,c) 
group by a, b, c grouping sets ((a,b,c), (a));

The following result is returned:

+------------+------------+------------+------------+------------+
| a          | b          | c          | _c3        | _c4        |
+------------+------------+------------+------------+------------+
| 1          | NULL       | NULL       | 1          | 3          |
| 1          | 2          | 3          | 1          | 0          |
+------------+------------+------------+------------+------------+