This topic provides examples of how to use parallel queries. The examples use TPC Benchmark™ H (TPC-H) queries.
- Support for GROUP BY and ORDER BY
- Support for AGGREGATE functions (SUM, AVG, and COUNT)
- Support for JOIN
- Support for BETWEEN and IN functions
- Support for LIMIT
- Support for INTERVAL functions
- Support for CASE WHEN
- Support for LIKE
- Support for subqueries
- Support for GROUP BY WITH ROLLUP
- Support for INSERT ... SELECT and REPLACE ... SELECT
Test design
- Data volume: The test uses 100 GB of data, which corresponds to a scale factor of 100.
- PolarDB for MySQL cluster that runs MySQL 8.0: The node specification is 88 CPU cores and 710 GB of memory. The test is performed on the primary node of the cluster.
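The before and after response times in this topic are wall-clock measurements of a single statement. The following Python sketch shows one way such a comparison could be collected. It is not the harness used for these tests: `time_query` and `compare_timings` are hypothetical helpers, and the two callables stand in for "execute this SQL on a session with parallel query disabled" and "execute it on a session with parallel query enabled", which you would implement with your own MySQL client.

```python
import time
from typing import Callable, Tuple


def time_query(run_query: Callable[[str], object], sql: str) -> float:
    """Return the wall-clock seconds that run_query(sql) takes."""
    start = time.perf_counter()
    run_query(sql)
    return time.perf_counter() - start


def compare_timings(
    run_serial: Callable[[str], object],
    run_parallel: Callable[[str], object],
    sql: str,
) -> Tuple[float, float]:
    """Time the same statement through both callables.

    run_serial and run_parallel are assumptions: each must execute the
    statement and fetch the full result before returning.
    """
    return time_query(run_serial, sql), time_query(run_parallel, sql)


if __name__ == "__main__":
    # Stand-in callables that only sleep, so the sketch runs without a database.
    serial_s, parallel_s = compare_timings(
        lambda sql: time.sleep(0.05),
        lambda sql: time.sleep(0.01),
        "SELECT 1",
    )
    print(f"serial: {serial_s:.3f}s, parallel: {parallel_s:.3f}s")
```

For stable numbers, each statement would normally be run several times against a warmed-up cluster, and the fastest or median time reported.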
Support for GROUP BY and ORDER BY
For example, run the following query:
SELECT l_returnflag,
l_linestatus,
Sum(l_quantity) AS sum_qty,
Sum(l_extendedprice) AS sum_base_price,
Sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
Sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
Avg(l_quantity) AS avg_qty,
Avg(l_extendedprice) AS avg_price,
Avg(l_discount) AS avg_disc,
Count(*) AS count_order
FROM lineitem
WHERE l_shipdate <= date '1998-12-01' - INTERVAL '93' day
GROUP BY l_returnflag,
l_linestatus
ORDER BY l_returnflag,
l_linestatus ;
- Before the parallel query feature is enabled, the query takes 1,563.32 seconds.
- After the parallel query feature is enabled, the query takes 49.65 seconds, which is 3.18% of the original response time.
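The percentage reported for each example in this topic is the parallel response time divided by the original response time. The following Python sketch reproduces the figure for this query and also expresses it as a speedup factor. The function names are illustrative only.

```python
def percent_of_original(before_s: float, after_s: float) -> float:
    """Parallel response time as a percentage of the original response time."""
    return after_s / before_s * 100


def speedup_factor(before_s: float, after_s: float) -> float:
    """How many times faster the query runs with parallel query enabled."""
    return before_s / after_s


# Response times in seconds, taken from the results above.
before_s, after_s = 1563.32, 49.65

print(f"{percent_of_original(before_s, after_s):.2f}% of the original time")  # 3.18%
print(f"{speedup_factor(before_s, after_s):.1f}x faster")  # 31.5x
```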
Support for AGGREGATE functions (SUM, AVG, and COUNT)
For example, run the following query:
SELECT l_returnflag,
l_linestatus,
Sum(l_quantity) AS sum_qty,
Sum(l_extendedprice) AS sum_base_price,
Sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
Sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
Avg(l_quantity) AS avg_qty,
Avg(l_extendedprice) AS avg_price,
Avg(l_discount) AS avg_disc,
Count(*) AS count_order
FROM lineitem
WHERE l_shipdate <= date '1998-12-01' - INTERVAL '93' day
GROUP BY l_returnflag,
l_linestatus
ORDER BY l_returnflag,
l_linestatus ;
- Before the parallel query feature is enabled, the query takes 1,563.32 seconds.
- After the parallel query feature is enabled, the query takes 49.65 seconds, which is 3.18% of the original response time.
Support for JOIN
For example, run the following query:
select sum(l_extendedprice* (1 - l_discount)) as revenue
from lineitem, part
where ( p_partkey = l_partkey and p_brand = 'Brand#12'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 6 and l_quantity <= 6 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON' )
or ( p_partkey = l_partkey and p_brand = 'Brand#13'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 10 and l_quantity <= 10 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON' )
or ( p_partkey = l_partkey and p_brand = 'Brand#24'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 21 and l_quantity <= 21 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON' );
- Before the parallel query feature is enabled, the query takes 21.73 seconds.
- After the parallel query feature is enabled, the query takes 1.37 seconds, which is 6.30% of the original response time.
Support for BETWEEN and IN functions
For example, run the following query:
select sum(l_extendedprice* (1 - l_discount)) as revenue
from lineitem, part
where ( p_partkey = l_partkey and p_brand = 'Brand#12'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 6 and l_quantity <= 6 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON' )
or ( p_partkey = l_partkey and p_brand = 'Brand#13'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 10 and l_quantity <= 10 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON' )
or ( p_partkey = l_partkey and p_brand = 'Brand#24'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 21 and l_quantity <= 21 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON' );
- Before the parallel query feature is enabled, the query takes 21.73 seconds.
- After the parallel query feature is enabled, the query takes 1.37 seconds, which is 6.30% of the original response time.
Support for LIMIT
For example, run the following query:
select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1
else 0
end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1
else 0
end) as low_line_count
from orders, lineitem
where o_orderkey = l_orderkey
and l_shipmode in ('MAIL', 'TRUCK')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1996-01-01'
and l_receiptdate < date '1996-01-01' + interval '1' year
group by l_shipmode
order by l_shipmode limit 10;
- Before the parallel query feature is enabled, the query takes 339.22 seconds.
- After the parallel query feature is enabled, the query takes 29.31 seconds, which is 8.64% of the original response time.
Support for INTERVAL functions
For example, run the following query:
select
100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from lineitem, part
where l_partkey = p_partkey
and l_shipdate >= date '1996-01-01'
and l_shipdate < date '1996-01-01' + interval '1' month limit 10;
- Before the parallel query feature is enabled, the query takes 220.87 seconds.
- After the parallel query feature is enabled, the query takes 7.75 seconds, which is 3.51% of the original response time.
Support for CASE WHEN
For example, run the following query:
select
100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from lineitem, part
where l_partkey = p_partkey
and l_shipdate >= date '1996-01-01'
and l_shipdate < date '1996-01-01' + interval '1' month limit 10;
- Before the parallel query feature is enabled, the query takes 220.87 seconds.
- After the parallel query feature is enabled, the query takes 7.75 seconds, which is 3.51% of the original response time.
Support for LIKE
For example, run the following query:
select s_name, s_address from
supplier, nation where
s_suppkey in
( select ps_suppkey from partsupp where
ps_partkey in ( select p_partkey from part where p_name like 'dark%')
and ps_availqty>(select 0.0005 * sum(l_quantity) as col1
from lineitem, partsupp
where l_partkey = ps_partkey and l_suppkey = ps_suppkey
and l_shipdate >= date '1993-01-01' and l_shipdate < date '1993-01-01' + interval '1' year)
)
and s_nationkey = n_nationkey and n_name = 'JORDAN'
order by s_name limit 10;
- Before the parallel query feature is enabled, the query takes 427.46 seconds.
- After the parallel query feature is enabled, the query takes 33.72 seconds, which is 7.89% of the original response time.
Support for subqueries
For example, run the following query:
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 35
and p_type like '%STEEL'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
and ps_supplycost = (
select
min(ps_supplycost)
from
partsupp,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
limit 1;
- Before the parallel query feature is enabled, the query takes 9.27 seconds.
- After the parallel query feature is enabled, the query takes 1.12 seconds, which is 12.08% of the original response time.
Support for GROUP BY WITH ROLLUP
For more information about GROUP BY WITH ROLLUP, see MySQL WITH ROLLUP and MySQL ROLLUP.
For example, run the following query:
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date_sub('1998-12-01', interval ':1' day)
group by
l_returnflag,
l_linestatus
with rollup
order by
l_returnflag,
l_linestatus;
- Before the parallel query feature is enabled, the query takes 318.73 seconds.
- After the parallel query feature is enabled, the query takes 22.30 seconds, which is 7.00% of the original response time.
Support for INSERT ... SELECT and REPLACE ... SELECT
For example, run the following statement:
insert into line_item_ap
SELECT l_returnflag,
l_linestatus,
Sum(l_quantity) AS sum_qty,
Sum(l_extendedprice) AS sum_base_price,
Sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
Sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
Avg(l_quantity) AS avg_qty,
Avg(l_extendedprice) AS avg_price,
Avg(l_discount) AS avg_disc,
Count(*) AS count_order
FROM lineitem
WHERE l_shipdate <= date '1998-12-01' - INTERVAL '93' day
GROUP BY l_returnflag,
l_linestatus
ORDER BY l_returnflag,
l_linestatus ;
- Before the parallel query feature is enabled, the statement takes 182.82 seconds.
- After the parallel query feature is enabled, the statement takes 23.25 seconds, which is 12.72% of the original response time.
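To compare the examples side by side, the following Python sketch collects the response times reported in this topic and ranks them by speedup factor. The timings are copied from the sections above. The labels are shortened section titles, and `summarize` is an illustrative helper. Several sections use the same query, so their timings are identical.

```python
# (feature, seconds before parallel query, seconds after parallel query)
RESULTS = [
    ("GROUP BY and ORDER BY", 1563.32, 49.65),
    ("AGGREGATE functions", 1563.32, 49.65),
    ("JOIN", 21.73, 1.37),
    ("BETWEEN and IN", 21.73, 1.37),
    ("LIMIT", 339.22, 29.31),
    ("INTERVAL", 220.87, 7.75),
    ("CASE WHEN", 220.87, 7.75),
    ("LIKE", 427.46, 33.72),
    ("Subqueries", 9.27, 1.12),
    ("GROUP BY WITH ROLLUP", 318.73, 22.30),
    ("INSERT ... SELECT", 182.82, 23.25),
]


def summarize(results):
    """Return (feature, before, after, percent, factor) rows, fastest gain first."""
    rows = [
        (name, before, after, after / before * 100, before / after)
        for name, before, after in results
    ]
    # sorted() is stable, so features with identical timings keep their order.
    return sorted(rows, key=lambda row: row[4], reverse=True)


print(f"{'Feature':<24}{'Before (s)':>12}{'After (s)':>11}{'% of orig.':>12}{'Speedup':>9}")
for name, before, after, percent, factor in summarize(RESULTS):
    print(f"{name:<24}{before:>12.2f}{after:>11.2f}{percent:>12.2f}{factor:>8.1f}x")
```

In these results, the scan-heavy aggregation over lineitem gains the most, and the subquery and INSERT ... SELECT examples gain the least.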