GROUPING SETS, ROLLUP, and CUBE extensions - PolarDB - Alibaba Cloud Documentation Center

In relational databases, you must use multiple SELECT and UNION statements to group results based on multiple groups of dimensions. PolarDB-X 1.0 provides GROUPING SETS, ROLLUP, and CUBE extensions that allow you to group results based on multiple groups of dimensions. In addition, PolarDB-X 1.0 allows you to use the GROUPING and GROUPING_ID functions in a SELECT statement or a HAVING clause. This helps to explain the results of the preceding extensions. This topic describes the relevant syntax and examples.

Prerequisites

The PolarDB-X 1.0 instance version is 5.4.10 or later.

Considerations

The syntax of all the GROUP BY extensions in this topic does not allow SQL queries to be pushed down to the LogicalView operators for execution. For more information about SQL query pushdown, see SQL rewrite and pushdown.

The following test data information is used in the examples of this topic:

Execute the following statement to create a table named requests:

CREATE TABLE requests (
  `id` int(10) UNSIGNED NOT NULL,
  `os` varchar(20) DEFAULT NULL,
  `device` varchar(20) DEFAULT NULL,
  `city` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8 dbpartition BY hash(`id`) tbpartition BY hash(`id`);

Execute the following statement to insert the required test data to the requests table:

INSERT INTO requests (id, os, device, city) VALUES
(1, 'windows', 'PC', 'Beijing'),
(2, 'windows', 'PC', 'Shijiazhuang'),
(3, 'linux', 'Phone', 'Beijing'),
(4, 'windows', 'PC', 'Beijing'),
(5, 'ios', 'Phone', 'Shijiazhuang'),
(6, 'linux', 'PC', 'Beijing'),
(7, 'windows', 'Phone', 'Shijiazhuang');

GROUPING SETS extension

Overview
GROUPING SETS is an extension of the GROUP BY clause and can generate a result set. The result set is a concatenation of multiple result sets based on different groups. The result returned by the GROUPING SETS extension is similar to that of the UNION ALL operator. However, the UNION ALL operator and the GROUPING SETS expansion do not remove duplicate rows from the combined result sets.
Syntax
```
GROUPING SETS (
  { expr_1 | ( expr_1a [, expr_1b ] ...) |
    ROLLUP ( expr_list ) | CUBE ( expr_list )
  } [, ...] )
```
Note
A GROUPING SETS extension can contain a combination of one or more comma-separated expressions, such as expr_1 or (expr_1a [, expr_1b ] ...),and lists of expressions enclosed within parentheses (), such as ( expr_list ). In the syntax:
- Each expression can be used to determine how the result set is grouped.
- You can nest a ROLLUP or CUBE extension in a GROUPING SETS extension.

Examples

You can group the data that you want to query by using a GROUPING SETS extension. The following code block shows the relevant syntax:

select os,device, city ,count(*)
from requests
group by grouping sets((os, device), (city), ());

The preceding statement is equivalent to the following statement:
select os, device, NULL, count(*)
from requests group by os, device
union all
select NULL, NULL, NULL, count(*)
from requests
union all
select null, null, city, count(*)
from requests group by city;

The following result is returned:

+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| windows | PC     | NULL         |        3 |
| linux   | PC     | NULL         |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| ios     | Phone  | NULL         |        1 |
| NULL    | NULL   | Shijiazhuang |        3 |
| NULL    | NULL   | Beijing      |        4 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+

Note

If an expression is not used in a grouping set, NULL is used as a placeholder for the expression. This facilitates operations on the result set that is not used in the grouping set. For example, the result set is the rows where the values of the city column are NULL in the returned result.

You can group data by nesting a ROLLUP extension in a GROUPING SETS extension. The following code block shows the relevant syntax:

select os,device, city ,count(*) from requests 
group by grouping sets((city), ROLLUP(os, device));

The preceding statement is equivalent to the following statement:
select os,device, city ,count(*) from requests 
group by grouping sets((city), (os), (os, device), ());

The following result is returned:

+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| NULL    | NULL   | Shijiazhuang |        3 |
| NULL    | NULL   | Beijing      |        4 |
| windows | PC     | NULL         |        3 |
| linux   | PC     | NULL         |        1 |
| ios     | Phone  | NULL         |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| windows | NULL   | NULL         |        4 |
| linux   | NULL   | NULL         |        2 |
| ios     | NULL   | NULL         |        1 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+

You can group data by nesting a CUBE extension in a GROUPING SETS extension. The following code block shows the relevant syntax:

select os,device, city ,count(*) from requests 
group by grouping sets((city), CUBE(os, device));

The preceding statement is equivalent to the following statement:
select os,device, city ,count(*) from requests 
group by grouping sets((city), (os), (os, device), (), (device));

The following result is returned:

+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| NULL    | NULL   | Beijing      |        4 |
| NULL    | NULL   | Shijiazhuang |        3 |
| windows | PC     | NULL         |        3 |
| ios     | Phone  | NULL         |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| linux   | PC     | NULL         |        1 |
| windows | NULL   | NULL         |        4 |
| ios     | NULL   | NULL         |        1 |
| linux   | NULL   | NULL         |        2 |
| NULL    | PC     | NULL         |        4 |
| NULL    | Phone  | NULL         |        3 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+

You can combine the GROUP BY clause and the CUBE, and GROUPING SETS extensions to generate grouping sets, as shown in the following example:

select os,device, city, count(*)
from requests 
group by os, cube(os,device), grouping sets(city);

The preceding statement is equivalent to the following statement:
select os,device, city, count(*)
from requests 
group by grouping sets((os,device,city),(os,city);

The following result is returned:

+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| linux   | Phone  | Beijing      |        1 |
| windows | Phone  | Shijiazhuang |        1 |
| windows | PC     | Shijiazhuang |        1 |
| linux   | PC     | Beijing      |        1 |
| windows | PC     | Beijing      |        2 |
| ios     | Phone  | Shijiazhuang |        1 |
| linux   | NULL   | Beijing      |        2 |
| windows | NULL   | Shijiazhuang |        2 |
| windows | NULL   | Beijing      |        2 |
| ios     | NULL   | Shijiazhuang |        1 |
+---------+--------+--------------+----------+

ROLLUP extension

Overview
A ROLLUP extension generates a hierarchical set of groups. In this set, subtotals for each hierarchical group and a grand total are available. The order of the hierarchy is determined by the order of the expressions that are specified in the ROLLUP expression list. The top of the hierarchy is the leftmost item in the list. Each successive item that proceeds to the right side moves down the hierarchy. The rightmost item is at the lowest level.
Syntax
```
ROLLUP ( { expr_1 | ( expr_1a [, expr_1b ] ...) }
  [, expr_2 | ( expr_2a [, expr_2b ] ...) ] ...)
```
Note
- Each expression is used to determine how the result set is grouped. If the expressions are enclosed in parentheses (), such as ( expr_1a, expr_1b, ...), the combination of values returned by expr_1a and expr_1b defines a single grouping level of the hierarchy.
- For the first item in the list, such as expr_1 or the combination of ( expr_1a, expr_1b, ...), PolarDB-X 1.0a subtotal is returned for each unique value of the first item. For the second item in the list, such as expr_2 or the combination of ( expr_2a, expr_2b, ...), PolarDB-X 1.0a subtotal is returned for each unique value of each group in the second item. Similar rules are used in each grouping level of the first item and other items. Finally, PolarDB-X 1.0a grand total is returned for the entire result set.
- For the subtotal rows, NULL is returned for the items across which the subtotal is taken.

Examples

ROLLUP is used to aggregate (os, device, city) in a hierarchical manner to generate grouping sets. The following code block shows the relevant syntax:

select os,device, city, count(*)
from requests 
group by rollup (os, device, city);

The preceding statement is equivalent to the following statement:
select os,device, city, count(*)
from requests 
group by os, device, city with rollup;

The first statement is also equivalent to the following statement:
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());

The following result is returned:

+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| windows | PC     | Beijing      |        2 |
| ios     | Phone  | Shijiazhuang |        1 |
| windows | PC     | Shijiazhuang |        1 |
| linux   | PC     | Beijing      |        1 |
| linux   | Phone  | Beijing      |        1 |
| windows | Phone  | Shijiazhuang |        1 |
| windows | PC     | NULL         |        3 |
| ios     | Phone  | NULL         |        1 |
| linux   | PC     | NULL         |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| windows | NULL   | NULL         |        4 |
| ios     | NULL   | NULL         |        1 |
| linux   | NULL   | NULL         |        2 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+

ROLLUP is used to aggregate os, (os,device), and city in a hierarchical manner to generate grouping sets. The following code block shows the relevant syntax:

select os,device, city, count(*)
from requests 
group by rollup (os, (os,device), city);

The preceding statement is equivalent to the following statement:
select os,device, city, count(*)
from requests 
group by os, (os,device), city with rollup;

The first statement is also equivalent to the following statement:
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());

The following result is returned:

+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| windows | PC     | Beijing      |        2 |
| windows | PC     | Shijiazhuang |        1 |
| linux   | PC     | Beijing      |        1 |
| linux   | Phone  | Beijing      |        1 |
| windows | Phone  | Shijiazhuang |        1 |
| ios     | Phone  | Shijiazhuang |        1 |
| windows | PC     | NULL         |        3 |
| linux   | PC     | NULL         |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| ios     | Phone  | NULL         |        1 |
| windows | NULL   | NULL         |        4 |
| linux   | NULL   | NULL         |        2 |
| ios     | NULL   | NULL         |        1 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+

CUBE extension

Overview
A CUBE extension is similar to a ROLLUP extension. A ROLLUP extension generates groupings and results in a hierarchy based on a left-to-right listing of items in the ROLLUP expression list. However, a CUBE extension generates groupings and subtotals based on each permutation of all the items in the CUBE expression list. Therefore, a CUBE extension returns more rows in the generated result set than a ROLLUP extension that is performed on the same expression list.
Syntax
```
CUBE ( { expr_1 | ( expr_1a [, expr_1b ] ...) }
  [, expr_2 | ( expr_2a [, expr_2b ] ...) ] ...)
```
Note
- Each expression is used to determine how the result set is grouped. If the expressions are enclosed within parentheses (), such as ( expr_1a, expr_1b, ...), the combination of values that are returned by expr_1a and expr_1b defines a single group.
- For the first item in the list, such as expr_1 or the combination of ( expr_1a, expr_1b, ...), PolarDB-X 1.0a subtotal is returned for each unique value of the first item. For the second item in the list, such as expr_2 or the combination of ( expr_2a, expr_2b, ...), PolarDB-X 1.0a subtotal is returned for each unique value of the second item. A subtotal is also returned for each unique combination of the first item and the second item. If a third item exists, PolarDB-X 1.0a subtotal is returned for each unique value of the third item, each unique combination of the third and first items, each unique combination of the third and second items, and each unique combination of the third, second, and first items. Finally, a grand total is returned for the entire result set.
- For the subtotal rows, NULL is returned for the items across which the subtotal is taken.

Examples

CUBE lists all the possible combinations of (os, device, city) columns as grouping sets. The following code block shows the relevant syntax:

select os,device, city, count(*)
from requests 
group by cube (os, device, city);

The preceding statement is equivalent to the following statement:
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());

The following result is returned:

+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| linux   | Phone  | Beijing      |        1 |
| windows | Phone  | Shijiazhuang |        1 |
| windows | PC     | Beijing      |        2 |
| ios     | Phone  | Shijiazhuang |        1 |
| windows | PC     | Shijiazhuang |        1 |
| linux   | PC     | Beijing      |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| windows | PC     | NULL         |        3 |
| ios     | Phone  | NULL         |        1 |
| linux   | PC     | NULL         |        1 |
| linux   | NULL   | Beijing      |        2 |
| windows | NULL   | Shijiazhuang |        2 |
| windows | NULL   | Beijing      |        2 |
| ios     | NULL   | Shijiazhuang |        1 |
| linux   | NULL   | NULL         |        2 |
| windows | NULL   | NULL         |        4 |
| ios     | NULL   | NULL         |        1 |
| NULL    | Phone  | Beijing      |        1 |
| NULL    | Phone  | Shijiazhuang |        2 |
| NULL    | PC     | Beijing      |        3 |
| NULL    | PC     | Shijiazhuang |        1 |
| NULL    | Phone  | NULL         |        3 |
| NULL    | PC     | NULL         |        4 |
| NULL    | NULL   | Beijing      |        4 |
| NULL    | NULL   | Shijiazhuang |        3 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+

CUBE lists all the possible combinations of (os, device),(device, city) columns as grouping sets. The following code block shows the relevant syntax:

select os,device, city, count(*) 
from requests 
group by cube ((os, device), (device, city));

The preceding statement is equivalent to the following statement:
select os,device, city, count(*) 
from requests 
group by grouping sets ((os, device, city),(os, device),(device,city),());

The following result is returned:

+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| linux   | Phone  | Beijing      |        1 |
| windows | Phone  | Shijiazhuang |        1 |
| windows | PC     | Beijing      |        2 |
| windows | PC     | Shijiazhuang |        1 |
| linux   | PC     | Beijing      |        1 |
| ios     | Phone  | Shijiazhuang |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| windows | PC     | NULL         |        3 |
| linux   | PC     | NULL         |        1 |
| ios     | Phone  | NULL         |        1 |
| NULL    | Phone  | Beijing      |        1 |
| NULL    | Phone  | Shijiazhuang |        2 |
| NULL    | PC     | Beijing      |        3 |
| NULL    | PC     | Shijiazhuang |        1 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+

GROUPING and GROUPING_ID functions

Overview
- GROUPING function
  When you use the GROUPING SETS, ROLLUP, or CUBE extensions in the GROUP BY clause, NULL is used as a placeholder in a return value of the GROUPING SETS extension. As a result, the placeholder NULL cannot be distinguished from the value NULL. You can use the GROUPING function provided by PolarDB-X 1.0 to solve this problem.
  The GROUPING function allows you to use a column name as a parameter. If the corresponding rows are aggregated based on the column, 0 is returned in the result. In this case, NULL is a value. If the corresponding rows are not aggregated based on the column, 1 is returned. In this case, NULL is a placeholder in a return value of the GROUPING SETS extension.
- GROUPING_ID function
  The GROUPING_ID function simplifies the process of implementing the GROUPING function. The GROUPING_ID function is used to determine the subtotal level of a row in the result set from a ROLLBACK, CUBE, or GROUPING SETS extension. The GROUPING function uses only one column expression and returns a value to indicate whether a row is a subtotal for all the values of the specified column. Therefore, multiple GROUPING functions may be required to interpret the level of subtotals for queries that have multiple grouping columns. The GROUPING_ID function supports one or more column expressions that have been used in the ROLLBACK, CUBE, or GROUPING SETS extensions and returns a single integer. This integer indicates the column on which a subtotal has been aggregated.

Syntax

GROUPING function
```
SELECT [ expr ...,] GROUPING( col_expr ) [, expr ] ...
FROM ...
GROUP BY { ROLLUP | CUBE | GROUPING SETS }( [...,] col_expr
  [, ...] ) [, ...]
```
Note
The GROUPING function uses a single parameter. This parameter must be an expression of a dimension column that is specified in the expression list of a ROLLUP, CUBE, or GROUPING SETS extension of the GROUP BY clause.

GROUPING_ID function

SELECT [ expr ...,]
  GROUPING_ID( col_expr_1 [, col_expr_2 ] ... )
  [, expr ] ...
FROM ...
GROUP BY { ROLLUP | CUBE | GROUPING SETS }( [...,] col_expr_1
  [, col_expr_2 ] [, ...] ) [, ...]

Examples

The GROUPING_ID function uses multiple column names as parameters, and converts the grouping results of the parameter columns into integers by using the bitmap algorithm. The following code block shows the relevant syntax:

select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid 
from (select 1 as a ,2 as b,3 as c)
group by cube(a,b,c);

The following result is returned:

+------+------+------+----------+------+------+------+------------+
| a    | b    | c    | count(*) | ga   | gb   | gc   | groupingid |
+------+------+------+----------+------+------+------+------------+
|    1 |    2 |    3 |        1 |    0 |    0 |    0 |          0 |
|    1 |    2 | NULL |        1 |    0 |    0 |    1 |          1 |
|    1 | NULL |    3 |        1 |    0 |    1 |    0 |          2 |
|    1 | NULL | NULL |        1 |    0 |    1 |    1 |          3 |
| NULL |    2 |    3 |        1 |    1 |    0 |    0 |          4 |
| NULL |    2 | NULL |        1 |    1 |    0 |    1 |          5 |
| NULL | NULL |    3 |        1 |    1 |    1 |    0 |          6 |
| NULL | NULL | NULL |        1 |    1 |    1 |    1 |          7 |
+------+------+------+----------+------+------+------+------------+