Use ROLLUP to accelerate parallel queries - PolarDB

A standard GROUP BY query returns one aggregated row per group. To get subtotals at higher levels—such as the total across all products within a country, or the grand total across all years—you must run separate queries or aggregate the results manually. ROLLUP eliminates this by computing all hierarchical subtotals in a single query. When combined with PolarDB's parallel query feature, ROLLUP distributes aggregation work across multiple threads, significantly reducing query time on large datasets.

Prerequisites

Before you begin, ensure that you have:

A PolarDB for MySQL 8.0 cluster with revision version 8.0.1.1.0 or later

To check your cluster's version, see Query the engine cluster.

How it works

ROLLUP extends GROUP BY to generate all higher-level subtotals in the same result set. Add WITH ROLLUP after the columns in your GROUP BY clause:

SELECT year, country, product, SUM(profit) AS profit
FROM sales
GROUP BY year, country, product WITH ROLLUP;

ROLLUP computes subtotals from right to left through the grouping columns. For GROUP BY year, country, product WITH ROLLUP, the query produces four levels of aggregation:

Level	Effective grouping	What the row represents
1	`year, country, product`	Profit per product
2	`year, country`	Profit per country (all products combined)
3	`year`	Profit per year (all countries and products combined)
4	(none)	Grand total

Benefits

Using ROLLUP instead of multiple separate queries provides the following advantages:

Simpler queries: A single statement replaces multiple GROUP BY queries at different granularities.
Reduced client load: The server computes all aggregations and the client reads the data only once, reducing processing overhead and network traffic.
Parallel execution: PolarDB distributes aggregation work across multiple threads when ROLLUP is combined with parallel queries, significantly improving performance on large datasets.

Performance with parallel queries

PolarDB accelerates ROLLUP queries by distributing aggregation work across multiple threads. The following TPC Benchmark H (TPC-H) Q1 query, modified to include WITH ROLLUP, demonstrates the improvement:

Note

The TPC-H implementation in this topic is based on TPC-H benchmarking. These test results cannot be compared with published TPC-H benchmark results because the tests do not meet all TPC-H requirements.

SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) AS sum_qty,
    sum(l_extendedprice) AS sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    avg(l_quantity) AS avg_qty,
    avg(l_extendedprice) AS avg_price,
    avg(l_discount) AS avg_disc,
    count(*) AS count_order
FROM
    lineitem
WHERE
    l_shipdate <= date_sub('1998-12-01', interval ':1' day)
GROUP BY
    l_returnflag,
    l_linestatus
WITH ROLLUP
ORDER BY
    l_returnflag,
    l_linestatus;

Results:

Without parallel queries: 318.73 seconds
With parallel queries: 22.30 seconds—more than 14x faster