All Products
Search
Document Center

PolarDB:GROUPING SETS, ROLLUP, and CUBE extensions

Last Updated:Dec 11, 2023

In relational databases, you must use multiple SELECT and UNION statements to group results based on multiple groups of dimensions. PolarDB-X 1.0 provides GROUPING SETS, ROLLUP, and CUBE extensions that allow you to group results based on multiple groups of dimensions. In addition, PolarDB-X 1.0 allows you to use the GROUPING and GROUPING_ID functions in a SELECT statement or a HAVING clause. This helps to explain the results of the preceding extensions. This topic describes the relevant syntax and examples.

Prerequisites

The PolarDB-X 1.0 instance version is 5.4.10 or later.

Considerations

  • The syntax of all the GROUP BY extensions in this topic does not allow SQL queries to be pushed down to the LogicalView operators for execution. For more information about SQL query pushdown, see SQL rewrite and pushdown.

  • The following test data information is used in the examples of this topic:

    Execute the following statement to create a table named requests:

    CREATE TABLE requests (
      `id` int(10) UNSIGNED NOT NULL,
      `os` varchar(20) DEFAULT NULL,
      `device` varchar(20) DEFAULT NULL,
      `city` varchar(20) DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE = InnoDB DEFAULT CHARSET = utf8 dbpartition BY hash(`id`) tbpartition BY hash(`id`);

    Execute the following statement to insert the required test data to the requests table:

    INSERT INTO requests (id, os, device, city) VALUES
    (1, 'windows', 'PC', 'Beijing'),
    (2, 'windows', 'PC', 'Shijiazhuang'),
    (3, 'linux', 'Phone', 'Beijing'),
    (4, 'windows', 'PC', 'Beijing'),
    (5, 'ios', 'Phone', 'Shijiazhuang'),
    (6, 'linux', 'PC', 'Beijing'),
    (7, 'windows', 'Phone', 'Shijiazhuang');

GROUPING SETS extension

  • Overview

    GROUPING SETS is an extension of the GROUP BY clause and can generate a result set. The result set is a concatenation of multiple result sets based on different groups. The result returned by the GROUPING SETS extension is similar to that of the UNION ALL operator. However, the UNION ALL operator and the GROUPING SETS expansion do not remove duplicate rows from the combined result sets.

  • Syntax

    GROUPING SETS (
      { expr_1 | ( expr_1a [, expr_1b ] ...) |
        ROLLUP ( expr_list ) | CUBE ( expr_list )
      } [, ...] )
    Note

    A GROUPING SETS extension can contain a combination of one or more comma-separated expressions, such as expr_1 or (expr_1a [, expr_1b ] ...),and lists of expressions enclosed within parentheses (), such as ( expr_list ). In the syntax:

    • Each expression can be used to determine how the result set is grouped.

    • You can nest a ROLLUP or CUBE extension in a GROUPING SETS extension.

  • Examples

    • You can group the data that you want to query by using a GROUPING SETS extension. The following code block shows the relevant syntax:

      select os,device, city ,count(*)
      from requests
      group by grouping sets((os, device), (city), ());
      
      The preceding statement is equivalent to the following statement:
      select os, device, NULL, count(*)
      from requests group by os, device
      union all
      select NULL, NULL, NULL, count(*)
      from requests
      union all
      select null, null, city, count(*)
      from requests group by city;

      The following result is returned:

      +---------+--------+--------------+----------+
      | os      | device | city         | count(*) |
      +---------+--------+--------------+----------+
      | windows | PC     | NULL         |        3 |
      | linux   | PC     | NULL         |        1 |
      | linux   | Phone  | NULL         |        1 |
      | windows | Phone  | NULL         |        1 |
      | ios     | Phone  | NULL         |        1 |
      | NULL    | NULL   | Shijiazhuang |        3 |
      | NULL    | NULL   | Beijing      |        4 |
      | NULL    | NULL   | NULL         |        7 |
      +---------+--------+--------------+----------+
      Note

      If an expression is not used in a grouping set, NULL is used as a placeholder for the expression. This facilitates operations on the result set that is not used in the grouping set. For example, the result set is the rows where the values of the city column are NULL in the returned result.

    • You can group data by nesting a ROLLUP extension in a GROUPING SETS extension. The following code block shows the relevant syntax:

      select os,device, city ,count(*) from requests 
      group by grouping sets((city), ROLLUP(os, device));
      
      The preceding statement is equivalent to the following statement:
      select os,device, city ,count(*) from requests 
      group by grouping sets((city), (os), (os, device), ());

      The following result is returned:

      +---------+--------+--------------+----------+
      | os      | device | city         | count(*) |
      +---------+--------+--------------+----------+
      | NULL    | NULL   | Shijiazhuang |        3 |
      | NULL    | NULL   | Beijing      |        4 |
      | windows | PC     | NULL         |        3 |
      | linux   | PC     | NULL         |        1 |
      | ios     | Phone  | NULL         |        1 |
      | linux   | Phone  | NULL         |        1 |
      | windows | Phone  | NULL         |        1 |
      | windows | NULL   | NULL         |        4 |
      | linux   | NULL   | NULL         |        2 |
      | ios     | NULL   | NULL         |        1 |
      | NULL    | NULL   | NULL         |        7 |
      +---------+--------+--------------+----------+
    • You can group data by nesting a CUBE extension in a GROUPING SETS extension. The following code block shows the relevant syntax:

      select os,device, city ,count(*) from requests 
      group by grouping sets((city), CUBE(os, device));
      
      The preceding statement is equivalent to the following statement:
      select os,device, city ,count(*) from requests 
      group by grouping sets((city), (os), (os, device), (), (device));

      The following result is returned:

      +---------+--------+--------------+----------+
      | os      | device | city         | count(*) |
      +---------+--------+--------------+----------+
      | NULL    | NULL   | Beijing      |        4 |
      | NULL    | NULL   | Shijiazhuang |        3 |
      | windows | PC     | NULL         |        3 |
      | ios     | Phone  | NULL         |        1 |
      | linux   | Phone  | NULL         |        1 |
      | windows | Phone  | NULL         |        1 |
      | linux   | PC     | NULL         |        1 |
      | windows | NULL   | NULL         |        4 |
      | ios     | NULL   | NULL         |        1 |
      | linux   | NULL   | NULL         |        2 |
      | NULL    | PC     | NULL         |        4 |
      | NULL    | Phone  | NULL         |        3 |
      | NULL    | NULL   | NULL         |        7 |
      +---------+--------+--------------+----------+
    • You can combine the GROUP BY clause and the CUBE, and GROUPING SETS extensions to generate grouping sets, as shown in the following example:

      select os,device, city, count(*)
      from requests 
      group by os, cube(os,device), grouping sets(city);
      
      The preceding statement is equivalent to the following statement:
      select os,device, city, count(*)
      from requests 
      group by grouping sets((os,device,city),(os,city);

      The following result is returned:

      +---------+--------+--------------+----------+
      | os      | device | city         | count(*) |
      +---------+--------+--------------+----------+
      | linux   | Phone  | Beijing      |        1 |
      | windows | Phone  | Shijiazhuang |        1 |
      | windows | PC     | Shijiazhuang |        1 |
      | linux   | PC     | Beijing      |        1 |
      | windows | PC     | Beijing      |        2 |
      | ios     | Phone  | Shijiazhuang |        1 |
      | linux   | NULL   | Beijing      |        2 |
      | windows | NULL   | Shijiazhuang |        2 |
      | windows | NULL   | Beijing      |        2 |
      | ios     | NULL   | Shijiazhuang |        1 |
      +---------+--------+--------------+----------+

ROLLUP extension

  • Overview

    A ROLLUP extension generates a hierarchical set of groups. In this set, subtotals for each hierarchical group and a grand total are available. The order of the hierarchy is determined by the order of the expressions that are specified in the ROLLUP expression list. The top of the hierarchy is the leftmost item in the list. Each successive item that proceeds to the right side moves down the hierarchy. The rightmost item is at the lowest level.

  • Syntax

    ROLLUP ( { expr_1 | ( expr_1a [, expr_1b ] ...) }
      [, expr_2 | ( expr_2a [, expr_2b ] ...) ] ...)
    Note
    • Each expression is used to determine how the result set is grouped. If the expressions are enclosed in parentheses (), such as ( expr_1a, expr_1b, ...), the combination of values returned by expr_1a and expr_1b defines a single grouping level of the hierarchy.

    • For the first item in the list, such as expr_1 or the combination of ( expr_1a, expr_1b, ...), PolarDB-X 1.0a subtotal is returned for each unique value of the first item. For the second item in the list, such as expr_2 or the combination of ( expr_2a, expr_2b, ...), PolarDB-X 1.0a subtotal is returned for each unique value of each group in the second item. Similar rules are used in each grouping level of the first item and other items. Finally, PolarDB-X 1.0a grand total is returned for the entire result set.

    • For the subtotal rows, NULL is returned for the items across which the subtotal is taken.

  • Examples

    • ROLLUP is used to aggregate (os, device, city) in a hierarchical manner to generate grouping sets. The following code block shows the relevant syntax:

      select os,device, city, count(*)
      from requests 
      group by rollup (os, device, city);
      
      The preceding statement is equivalent to the following statement:
      select os,device, city, count(*)
      from requests 
      group by os, device, city with rollup;
      
      The first statement is also equivalent to the following statement:
      select os,device, city, count(*)
      from requests 
      group by grouping sets ((os, device, city),(os, device),(os),());

      The following result is returned:

      +---------+--------+--------------+----------+
      | os      | device | city         | count(*) |
      +---------+--------+--------------+----------+
      | windows | PC     | Beijing      |        2 |
      | ios     | Phone  | Shijiazhuang |        1 |
      | windows | PC     | Shijiazhuang |        1 |
      | linux   | PC     | Beijing      |        1 |
      | linux   | Phone  | Beijing      |        1 |
      | windows | Phone  | Shijiazhuang |        1 |
      | windows | PC     | NULL         |        3 |
      | ios     | Phone  | NULL         |        1 |
      | linux   | PC     | NULL         |        1 |
      | linux   | Phone  | NULL         |        1 |
      | windows | Phone  | NULL         |        1 |
      | windows | NULL   | NULL         |        4 |
      | ios     | NULL   | NULL         |        1 |
      | linux   | NULL   | NULL         |        2 |
      | NULL    | NULL   | NULL         |        7 |
      +---------+--------+--------------+----------+
    • ROLLUP is used to aggregate os, (os,device), and city in a hierarchical manner to generate grouping sets. The following code block shows the relevant syntax:

      select os,device, city, count(*)
      from requests 
      group by rollup (os, (os,device), city);
      
      The preceding statement is equivalent to the following statement:
      select os,device, city, count(*)
      from requests 
      group by os, (os,device), city with rollup;
      
      The first statement is also equivalent to the following statement:
      select os,device, city, count(*)
      from requests 
      group by grouping sets ((os, device, city),(os, device),(os),());

      The following result is returned:

      +---------+--------+--------------+----------+
      | os      | device | city         | count(*) |
      +---------+--------+--------------+----------+
      | windows | PC     | Beijing      |        2 |
      | windows | PC     | Shijiazhuang |        1 |
      | linux   | PC     | Beijing      |        1 |
      | linux   | Phone  | Beijing      |        1 |
      | windows | Phone  | Shijiazhuang |        1 |
      | ios     | Phone  | Shijiazhuang |        1 |
      | windows | PC     | NULL         |        3 |
      | linux   | PC     | NULL         |        1 |
      | linux   | Phone  | NULL         |        1 |
      | windows | Phone  | NULL         |        1 |
      | ios     | Phone  | NULL         |        1 |
      | windows | NULL   | NULL         |        4 |
      | linux   | NULL   | NULL         |        2 |
      | ios     | NULL   | NULL         |        1 |
      | NULL    | NULL   | NULL         |        7 |
      +---------+--------+--------------+----------+

CUBE extension

  • Overview

    A CUBE extension is similar to a ROLLUP extension. A ROLLUP extension generates groupings and results in a hierarchy based on a left-to-right listing of items in the ROLLUP expression list. However, a CUBE extension generates groupings and subtotals based on each permutation of all the items in the CUBE expression list. Therefore, a CUBE extension returns more rows in the generated result set than a ROLLUP extension that is performed on the same expression list.

  • Syntax

    CUBE ( { expr_1 | ( expr_1a [, expr_1b ] ...) }
      [, expr_2 | ( expr_2a [, expr_2b ] ...) ] ...)
    Note
    • Each expression is used to determine how the result set is grouped. If the expressions are enclosed within parentheses (), such as ( expr_1a, expr_1b, ...), the combination of values that are returned by expr_1a and expr_1b defines a single group.

    • For the first item in the list, such as expr_1 or the combination of ( expr_1a, expr_1b, ...), PolarDB-X 1.0a subtotal is returned for each unique value of the first item. For the second item in the list, such as expr_2 or the combination of ( expr_2a, expr_2b, ...), PolarDB-X 1.0a subtotal is returned for each unique value of the second item. A subtotal is also returned for each unique combination of the first item and the second item. If a third item exists, PolarDB-X 1.0a subtotal is returned for each unique value of the third item, each unique combination of the third and first items, each unique combination of the third and second items, and each unique combination of the third, second, and first items. Finally, a grand total is returned for the entire result set.

    • For the subtotal rows, NULL is returned for the items across which the subtotal is taken.

  • Examples

    • CUBE lists all the possible combinations of (os, device, city) columns as grouping sets. The following code block shows the relevant syntax:

      select os,device, city, count(*)
      from requests 
      group by cube (os, device, city);
      
      The preceding statement is equivalent to the following statement:
      select os,device, city, count(*)
      from requests 
      group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());

      The following result is returned:

      +---------+--------+--------------+----------+
      | os      | device | city         | count(*) |
      +---------+--------+--------------+----------+
      | linux   | Phone  | Beijing      |        1 |
      | windows | Phone  | Shijiazhuang |        1 |
      | windows | PC     | Beijing      |        2 |
      | ios     | Phone  | Shijiazhuang |        1 |
      | windows | PC     | Shijiazhuang |        1 |
      | linux   | PC     | Beijing      |        1 |
      | linux   | Phone  | NULL         |        1 |
      | windows | Phone  | NULL         |        1 |
      | windows | PC     | NULL         |        3 |
      | ios     | Phone  | NULL         |        1 |
      | linux   | PC     | NULL         |        1 |
      | linux   | NULL   | Beijing      |        2 |
      | windows | NULL   | Shijiazhuang |        2 |
      | windows | NULL   | Beijing      |        2 |
      | ios     | NULL   | Shijiazhuang |        1 |
      | linux   | NULL   | NULL         |        2 |
      | windows | NULL   | NULL         |        4 |
      | ios     | NULL   | NULL         |        1 |
      | NULL    | Phone  | Beijing      |        1 |
      | NULL    | Phone  | Shijiazhuang |        2 |
      | NULL    | PC     | Beijing      |        3 |
      | NULL    | PC     | Shijiazhuang |        1 |
      | NULL    | Phone  | NULL         |        3 |
      | NULL    | PC     | NULL         |        4 |
      | NULL    | NULL   | Beijing      |        4 |
      | NULL    | NULL   | Shijiazhuang |        3 |
      | NULL    | NULL   | NULL         |        7 |
      +---------+--------+--------------+----------+
    • CUBE lists all the possible combinations of (os, device),(device, city) columns as grouping sets. The following code block shows the relevant syntax:

      select os,device, city, count(*) 
      from requests 
      group by cube ((os, device), (device, city));
      
      The preceding statement is equivalent to the following statement:
      select os,device, city, count(*) 
      from requests 
      group by grouping sets ((os, device, city),(os, device),(device,city),());

      The following result is returned:

      +---------+--------+--------------+----------+
      | os      | device | city         | count(*) |
      +---------+--------+--------------+----------+
      | linux   | Phone  | Beijing      |        1 |
      | windows | Phone  | Shijiazhuang |        1 |
      | windows | PC     | Beijing      |        2 |
      | windows | PC     | Shijiazhuang |        1 |
      | linux   | PC     | Beijing      |        1 |
      | ios     | Phone  | Shijiazhuang |        1 |
      | linux   | Phone  | NULL         |        1 |
      | windows | Phone  | NULL         |        1 |
      | windows | PC     | NULL         |        3 |
      | linux   | PC     | NULL         |        1 |
      | ios     | Phone  | NULL         |        1 |
      | NULL    | Phone  | Beijing      |        1 |
      | NULL    | Phone  | Shijiazhuang |        2 |
      | NULL    | PC     | Beijing      |        3 |
      | NULL    | PC     | Shijiazhuang |        1 |
      | NULL    | NULL   | NULL         |        7 |
      +---------+--------+--------------+----------+

GROUPING and GROUPING_ID functions

  • Overview

    • GROUPING function

      When you use the GROUPING SETS, ROLLUP, or CUBE extensions in the GROUP BY clause, NULL is used as a placeholder in a return value of the GROUPING SETS extension. As a result, the placeholder NULL cannot be distinguished from the value NULL. You can use the GROUPING function provided by PolarDB-X 1.0 to solve this problem.

      The GROUPING function allows you to use a column name as a parameter. If the corresponding rows are aggregated based on the column, 0 is returned in the result. In this case, NULL is a value. If the corresponding rows are not aggregated based on the column, 1 is returned. In this case, NULL is a placeholder in a return value of the GROUPING SETS extension.

    • GROUPING_ID function

      The GROUPING_ID function simplifies the process of implementing the GROUPING function. The GROUPING_ID function is used to determine the subtotal level of a row in the result set from a ROLLBACK, CUBE, or GROUPING SETS extension. The GROUPING function uses only one column expression and returns a value to indicate whether a row is a subtotal for all the values of the specified column. Therefore, multiple GROUPING functions may be required to interpret the level of subtotals for queries that have multiple grouping columns. The GROUPING_ID function supports one or more column expressions that have been used in the ROLLBACK, CUBE, or GROUPING SETS extensions and returns a single integer. This integer indicates the column on which a subtotal has been aggregated.

  • Syntax

    • GROUPING function

      SELECT [ expr ...,] GROUPING( col_expr ) [, expr ] ...
      FROM ...
      GROUP BY { ROLLUP | CUBE | GROUPING SETS }( [...,] col_expr
        [, ...] ) [, ...]
      Note

      The GROUPING function uses a single parameter. This parameter must be an expression of a dimension column that is specified in the expression list of a ROLLUP, CUBE, or GROUPING SETS extension of the GROUP BY clause.

    • GROUPING_ID function

      SELECT [ expr ...,]
        GROUPING_ID( col_expr_1 [, col_expr_2 ] ... )
        [, expr ] ...
      FROM ...
      GROUP BY { ROLLUP | CUBE | GROUPING SETS }( [...,] col_expr_1
        [, col_expr_2 ] [, ...] ) [, ...]
  • Examples

    The GROUPING_ID function uses multiple column names as parameters, and converts the grouping results of the parameter columns into integers by using the bitmap algorithm. The following code block shows the relevant syntax:

    select a,b,c,count(*),
    grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid 
    from (select 1 as a ,2 as b,3 as c)
    group by cube(a,b,c);

    The following result is returned:

    +------+------+------+----------+------+------+------+------------+
    | a    | b    | c    | count(*) | ga   | gb   | gc   | groupingid |
    +------+------+------+----------+------+------+------+------------+
    |    1 |    2 |    3 |        1 |    0 |    0 |    0 |          0 |
    |    1 |    2 | NULL |        1 |    0 |    0 |    1 |          1 |
    |    1 | NULL |    3 |        1 |    0 |    1 |    0 |          2 |
    |    1 | NULL | NULL |        1 |    0 |    1 |    1 |          3 |
    | NULL |    2 |    3 |        1 |    1 |    0 |    0 |          4 |
    | NULL |    2 | NULL |        1 |    1 |    0 |    1 |          5 |
    | NULL | NULL |    3 |        1 |    1 |    1 |    0 |          6 |
    | NULL | NULL | NULL |        1 |    1 |    1 |    1 |          7 |
    +------+------+------+----------+------+------+------+------------+