
PolarDB: Examples of parallel queries

Last Updated: Apr 11, 2024

This topic provides examples of how to use parallel queries. The examples are based on TPC Benchmark™ H (TPC-H) queries.

Note

The tests in this topic are based on the TPC-H benchmark but do not meet all of its requirements. Therefore, the test results may not match the published TPC-H benchmark results.

Test design

  • Data volume: 100 GB of test data, which corresponds to a TPC-H scale factor of 100.

  • Cluster: a PolarDB for MySQL 8.0 cluster with a node specification of 88 CPU cores and 710 GB of memory. The test is performed on the primary node of the cluster.
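For context on what a scale factor of 100 means, the TPC-H specification defines most table sizes as fixed multiples of the scale factor (SF). The Python sketch below assumes those per-SF multipliers and estimates the row count of each table at SF = 100. The lineitem figure is approximate because the data generator produces a slightly variable number of rows per order.

```python
# Approximate TPC-H table cardinalities as a function of the scale factor (SF).
# Assumption: per-SF multipliers as defined in the TPC-H specification;
# region and nation have a fixed size regardless of SF.
PER_SF_ROWS = {
    "supplier": 10_000,
    "part": 200_000,
    "partsupp": 800_000,
    "customer": 150_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,  # approximate: about 6 million rows per SF
}
FIXED_ROWS = {"region": 5, "nation": 25}


def tpch_row_counts(scale_factor):
    """Return the (approximate) number of rows per TPC-H table for a scale factor."""
    counts = {name: rows * scale_factor for name, rows in PER_SF_ROWS.items()}
    counts.update(FIXED_ROWS)
    return counts


if __name__ == "__main__":
    for table, rows in tpch_row_counts(100).items():
        print(f"{table:<10} {rows:>13,}")
```

At SF = 100, lineitem holds roughly 600 million rows. Most of the queries below scan that table.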

Support for GROUP BY and ORDER BY

For example, the following query is executed:

SELECT   l_returnflag, 
         l_linestatus, 
         Sum(l_quantity)                                       AS sum_qty, 
         Sum(l_extendedprice)                                  AS sum_base_price, 
         Sum(l_extendedprice * (1 - l_discount))               AS sum_disc_price, 
         Sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge, 
         Avg(l_quantity)                                       AS avg_qty, 
         Avg(l_extendedprice)                                  AS avg_price, 
         Avg(l_discount)                                       AS avg_disc, 
         Count(*)                                              AS count_order 
FROM     lineitem
WHERE    l_shipdate <= date '1998-12-01' - INTERVAL '93' day 
GROUP BY l_returnflag, 
         l_linestatus 
ORDER BY l_returnflag, 
         l_linestatus ;
  • Before the parallel query feature is enabled, the query takes 1,563.32s.

  • After the parallel query feature is enabled, the query takes 49.65s, which is 3.18% of the original response time.
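Conceptually, a parallel GROUP BY works in two phases. Each worker aggregates its own share of the scanned rows, and the partial groups are then merged before the final ORDER BY is applied. The Python sketch below illustrates this idea on a few hypothetical (l_returnflag, l_linestatus, l_quantity) rows. It is not PolarDB's implementation: the contiguous-chunk partitioning and the worker count are assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows: (l_returnflag, l_linestatus, l_quantity).
ROWS = [
    ("N", "O", 17), ("R", "F", 36), ("A", "F", 8), ("N", "O", 28),
    ("R", "F", 24), ("A", "F", 32), ("N", "F", 38), ("N", "O", 45),
]


def partial_group_sum(chunk):
    """Phase 1 (per worker): SUM(l_quantity) grouped by (returnflag, linestatus)."""
    groups = defaultdict(int)
    for flag, status, qty in chunk:
        groups[(flag, status)] += qty
    return groups


def parallel_group_by(rows, workers=3):
    # Assumed strategy: split the rows into one contiguous chunk per worker.
    size = -(-len(rows) // workers)  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_group_sum, chunks))
    # Phase 2 (leader): merge the partial groups, then ORDER BY the group key.
    merged = defaultdict(int)
    for part in partials:
        for key, qty in part.items():
            merged[key] += qty
    return sorted(merged.items())


if __name__ == "__main__":
    for (flag, status), sum_qty in parallel_group_by(ROWS):
        print(flag, status, sum_qty)
```

The result does not depend on the number of workers. Only the work per worker changes.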

Support for aggregate functions (SUM, AVG, and COUNT)

For example, the following query is executed:

SELECT   l_returnflag, 
         l_linestatus, 
         Sum(l_quantity)                                       AS sum_qty, 
         Sum(l_extendedprice)                                  AS sum_base_price, 
         Sum(l_extendedprice * (1 - l_discount))               AS sum_disc_price, 
         Sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge, 
         Avg(l_quantity)                                       AS avg_qty, 
         Avg(l_extendedprice)                                  AS avg_price, 
         Avg(l_discount)                                       AS avg_disc, 
         Count(*)                                              AS count_order 
FROM     lineitem
WHERE    l_shipdate <= date '1998-12-01' - INTERVAL '93' day 
GROUP BY l_returnflag, 
         l_linestatus 
ORDER BY l_returnflag, 
         l_linestatus ;
  • Before the parallel query feature is enabled, the query takes 1,563.32s.

  • After the parallel query feature is enabled, the query takes 49.65s, which is 3.18% of the original response time.
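SUM and COUNT results from different workers can be merged by adding them together. AVG is different: averaging the per-worker averages gives the wrong answer whenever workers process different numbers of rows. A common technique is for each worker to return a (sum, count) pair, with AVG computed once from the merged totals. The Python sketch below shows this with hypothetical numbers. It illustrates the general technique and is not taken from PolarDB's source.

```python
from fractions import Fraction

# Hypothetical l_quantity values scanned by two workers with unequal shares.
WORKER_CHUNKS = [[10, 20, 30], [40]]


def partial_state(chunk):
    """Per-worker partial aggregate state that supports SUM, COUNT, and AVG."""
    return sum(chunk), len(chunk)


def merge_avg(states):
    """Leader: add up the partial sums and counts, then divide once."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return Fraction(total, count)


def naive_avg_of_avgs(states):
    """Incorrect shortcut: averaging the per-worker averages."""
    avgs = [Fraction(s, c) for s, c in states]
    return sum(avgs) / len(avgs)


if __name__ == "__main__":
    states = [partial_state(chunk) for chunk in WORKER_CHUNKS]
    print("correct AVG:", merge_avg(states))          # correct AVG: 25
    print("avg of avgs:", naive_avg_of_avgs(states))  # avg of avgs: 30
```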

Support for JOIN

For example, the following query is executed:

select sum(l_extendedprice* (1 - l_discount)) as revenue 
from   lineitem,   part 
where ( p_partkey = l_partkey and p_brand = 'Brand#12'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') 
        and l_quantity >= 6 and l_quantity <= 6 + 10 
        and p_size between 1 and 5 
        and l_shipmode in ('AIR', 'AIR REG') 
        and l_shipinstruct = 'DELIVER IN PERSON' ) 
    or ( p_partkey = l_partkey and p_brand = 'Brand#13'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') 
        and l_quantity >= 10 and l_quantity <= 10 + 10 
        and p_size between 1 and 10 
        and l_shipmode in ('AIR', 'AIR REG') 
        and l_shipinstruct = 'DELIVER IN PERSON' ) 
    or ( p_partkey = l_partkey and p_brand = 'Brand#24'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') 
        and l_quantity >= 21 and l_quantity <= 21 + 10 
        and p_size between 1 and 15 
        and l_shipmode in ('AIR', 'AIR REG') 
        and l_shipinstruct = 'DELIVER IN PERSON' ); 
  • Before the parallel query feature is enabled, the query takes 21.73s.

  • After the parallel query feature is enabled, the query takes 1.37s, which is 6.30% of the original response time.
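One common way to parallelize a join of lineitem with part is a hash join. A hash table is built on the filtered smaller input, each worker probes it with its own share of the larger input, and each worker produces a partial SUM that the leader adds up. The Python sketch below shows this pattern on hypothetical rows with a simplified filter (only p_brand and p_partkey = l_partkey). PolarDB's optimizer chooses the actual join strategy and degree of parallelism, which may differ.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical part rows: p_partkey -> p_brand.
PART = {1: "Brand#12", 2: "Brand#13", 3: "Brand#55"}
# Hypothetical lineitem rows: (l_partkey, l_extendedprice, l_discount in whole percent).
# Whole-percent discounts keep the arithmetic exact in this example.
LINEITEM = [(1, 1000, 5), (2, 2000, 10), (3, 500, 0), (1, 300, 0), (2, 100, 50)]
WANTED_BRANDS = {"Brand#12", "Brand#13"}


def build_hash_table(part_rows):
    """Build side: the set of part keys that pass the part-side filter."""
    return {key for key, brand in part_rows.items() if brand in WANTED_BRANDS}


def probe(chunk, part_keys):
    """Probe side (per worker): partial SUM(l_extendedprice * (1 - l_discount))."""
    return sum(
        price * (100 - disc) / 100
        for partkey, price, disc in chunk
        if partkey in part_keys
    )


def parallel_revenue(lineitem_rows, part_rows, workers=2):
    part_keys = build_hash_table(part_rows)
    size = -(-len(lineitem_rows) // workers)  # ceiling division
    chunks = [lineitem_rows[i:i + size] for i in range(0, len(lineitem_rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(probe, chunks, [part_keys] * len(chunks)))
    return sum(partial_sums)  # leader: add up the per-worker partial sums


if __name__ == "__main__":
    print(parallel_revenue(LINEITEM, PART))
```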

Support for BETWEEN and IN

For example, the following query is executed:

select sum(l_extendedprice* (1 - l_discount)) as revenue 
from   lineitem,   part 
where ( p_partkey = l_partkey and p_brand = 'Brand#12'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') 
        and l_quantity >= 6 and l_quantity <= 6 + 10 
        and p_size between 1 and 5 
        and l_shipmode in ('AIR', 'AIR REG') 
        and l_shipinstruct = 'DELIVER IN PERSON' ) 
    or ( p_partkey = l_partkey and p_brand = 'Brand#13'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') 
        and l_quantity >= 10 and l_quantity <= 10 + 10 
        and p_size between 1 and 10 
        and l_shipmode in ('AIR', 'AIR REG') 
        and l_shipinstruct = 'DELIVER IN PERSON' ) 
    or ( p_partkey = l_partkey and p_brand = 'Brand#24'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') 
        and l_quantity >= 21 and l_quantity <= 21 + 10 
        and p_size between 1 and 15 
        and l_shipmode in ('AIR', 'AIR REG') 
        and l_shipinstruct = 'DELIVER IN PERSON' ); 
  • Before the parallel query feature is enabled, the query takes 21.73s.

  • After the parallel query feature is enabled, the query takes 1.37s, which is 6.30% of the original response time.
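BETWEEN and IN are row-level predicates. Each row can be tested on its own, so every worker can filter its share of the rows without coordinating with the others. The Python sketch below mirrors the first OR branch of the WHERE clause above, excluding the join condition p_partkey = l_partkey. Note that SQL BETWEEN is inclusive at both ends. The dict-based row format is an assumption made for illustration.

```python
# Python equivalent of the first OR branch of the WHERE clause above.
# Assumption: rows are dicts holding the already-joined lineitem and part columns.
SM_CONTAINERS = {"SM CASE", "SM BOX", "SM PACK", "SM PKG"}
AIR_MODES = {"AIR", "AIR REG"}


def matches_brand12_branch(row):
    return (
        row["p_brand"] == "Brand#12"
        and row["p_container"] in SM_CONTAINERS          # p_container IN (...)
        and 6 <= row["l_quantity"] <= 6 + 10             # l_quantity >= 6 AND <= 6 + 10
        and 1 <= row["p_size"] <= 5                      # p_size BETWEEN 1 AND 5 (inclusive)
        and row["l_shipmode"] in AIR_MODES               # l_shipmode IN ('AIR', 'AIR REG')
        and row["l_shipinstruct"] == "DELIVER IN PERSON"
    )


if __name__ == "__main__":
    base = {
        "p_brand": "Brand#12", "p_container": "SM BOX", "l_quantity": 16,
        "p_size": 5, "l_shipmode": "AIR REG", "l_shipinstruct": "DELIVER IN PERSON",
    }
    print(matches_brand12_branch(base))                            # True: bounds are inclusive
    print(matches_brand12_branch({**base, "p_size": 6}))           # False: outside BETWEEN 1 AND 5
    print(matches_brand12_branch({**base, "l_shipmode": "RAIL"}))  # False: not in the IN list
```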

Support for LIMIT

For example, the following query is executed:

select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 
    else 0 
end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 
else 0 
end) as low_line_count 
from   orders,   lineitem 
where o_orderkey = l_orderkey 
and l_shipmode in ('MAIL', 'TRUCK') 
and l_commitdate < l_receiptdate 
and l_shipdate < l_commitdate 
and l_receiptdate >= date '1996-01-01' 
and l_receiptdate < date '1996-01-01' + interval '1' year 
group by l_shipmode 
order by l_shipmode limit 10; 
  • Before the parallel query feature is enabled, the query takes 339.22s.

  • After the parallel query feature is enabled, the query takes 29.31s, which is 8.64% of the original response time.
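For ORDER BY ... LIMIT n over rows that need no further aggregation, each worker can keep only its local first n rows. The global first n rows are always contained in the union of the local results, and the leader merges these pre-sorted lists and stops after n rows. When the query also has a GROUP BY, as in the example above, the LIMIT can only be applied after the partial groups are merged. The Python sketch below shows the top-n merge on hypothetical sort keys. For brevity, the "workers" run sequentially here.

```python
import heapq
from itertools import islice

# Hypothetical sort keys scanned by three workers.
WORKER_CHUNKS = [[42, 7, 19, 88], [3, 56, 21], [15, 9, 64, 1]]


def local_top_n(chunk, n):
    """Per worker: the n smallest keys, already sorted (ORDER BY key LIMIT n)."""
    return heapq.nsmallest(n, chunk)


def parallel_order_by_limit(chunks, n):
    """Leader: merge the sorted local results and keep only the first n rows."""
    local_results = [local_top_n(chunk, n) for chunk in chunks]
    return list(islice(heapq.merge(*local_results), n))


if __name__ == "__main__":
    print(parallel_order_by_limit(WORKER_CHUNKS, 3))  # [1, 3, 7]
```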

Support for INTERVAL expressions

For example, the following query is executed:

select 
    100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) 
    else 0 
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue 
from   lineitem,   part 
where l_partkey = p_partkey
and l_shipdate >= date '1996-01-01' 
and l_shipdate < date '1996-01-01' + interval '1' month limit 10; 
  • Before the parallel query feature is enabled, the query takes 220.87s.

  • After the parallel query feature is enabled, the query takes 7.75s, which is 3.51% of the original response time.
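The INTERVAL expressions in these queries involve only constants, so each filter reduces to a fixed date range. In the query above, that range is half-open: from 1996-01-01 up to, but not including, 1996-02-01. The Python sketch below reproduces the intervals used in this topic with the standard datetime module. add_months is a hypothetical helper that, like MySQL's date arithmetic, clamps the day to the last valid day of the target month.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Hypothetical helper: d + INTERVAL months MONTH, clamping the day as MySQL does."""
    month_index = d.month - 1 + months
    year, month = d.year + month_index // 12, month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


if __name__ == "__main__":
    # date '1998-12-01' - INTERVAL '93' day
    print(date(1998, 12, 1) - timedelta(days=93))  # 1998-08-30
    # date '1996-01-01' + INTERVAL '1' month (exclusive upper bound)
    print(add_months(date(1996, 1, 1), 1))         # 1996-02-01
    # date '1996-01-01' + INTERVAL '1' year, expressed as 12 months
    print(add_months(date(1996, 1, 1), 12))        # 1997-01-01
```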

Support for CASE WHEN

For example, the following query is executed:

select 
    100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) 
    else 0 
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue 
from   lineitem,   part 
where l_partkey = p_partkey
and l_shipdate >= date '1996-01-01' 
and l_shipdate < date '1996-01-01' + interval '1' month limit 10; 
  • Before the parallel query feature is enabled, the query takes 220.87s.

  • After the parallel query feature is enabled, the query takes 7.75s, which is 3.51% of the original response time.
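The CASE WHEN expression in this query is a conditional sum, so it parallelizes like any other SUM. Each worker returns a partial promo revenue and a partial total revenue, and the ratio is computed once, after the partials are merged. The Python sketch below assumes hypothetical (p_type, l_extendedprice, l_discount in whole percent) rows and uses Fraction to keep the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical joined rows: (p_type, l_extendedprice, l_discount as a whole percent).
WORKER_CHUNKS = [
    [("PROMO BRUSHED TIN", 1000, 10), ("STANDARD POLISHED STEEL", 500, 0)],
    [("PROMO PLATED COPPER", 600, 0), ("ECONOMY ANODIZED BRASS", 1000, 0)],
]


def partial_revenue(chunk):
    """Per worker: (SUM(CASE WHEN p_type LIKE 'PROMO%' THEN rev ELSE 0 END), SUM(rev))."""
    promo = total = Fraction(0)
    for p_type, price, disc in chunk:
        revenue = Fraction(price * (100 - disc), 100)  # l_extendedprice * (1 - l_discount)
        promo += revenue if p_type.startswith("PROMO") else 0  # CASE WHEN ... ELSE 0 END
        total += revenue
    return promo, total


def promo_revenue(chunks):
    """Leader: merge the partial sums, then compute the percentage once."""
    partials = [partial_revenue(chunk) for chunk in chunks]
    promo = sum(p for p, _ in partials)
    total = sum(t for _, t in partials)
    return 100 * promo / total


if __name__ == "__main__":
    print(promo_revenue(WORKER_CHUNKS))  # 50
```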

Support for LIKE

For example, the following query is executed:

select s_name, s_address from
 supplier,  nation where
s_suppkey in 
    ( select ps_suppkey from  partsupp where
             ps_partkey in ( select p_partkey from  part where p_name like 'dark%')
            and ps_availqty>(select 0.0005 * sum(l_quantity) as col1
     from   lineitem,   partsupp
     where l_partkey = ps_partkey and l_suppkey = ps_suppkey
     and l_shipdate >= date '1993-01-01' and l_shipdate < date '1993-01-01' + interval '1' year)
    )
and s_nationkey = n_nationkey and n_name = 'JORDAN'
order by s_name limit 10; 
  • Before the parallel query feature is enabled, the query takes 427.46s.

  • After the parallel query feature is enabled, the query takes 33.72s, which is 7.89% of the original response time.
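In a LIKE pattern, % matches any sequence of characters (including none), and _ matches exactly one character. So p_name LIKE 'dark%' is a prefix match, and p_type LIKE '%STEEL' in the next example is a suffix match. The Python sketch below translates a LIKE pattern into a regular expression to show these semantics. It is a simplified, hypothetical helper: it ignores escape characters and is case-sensitive, whereas case sensitivity in MySQL depends on the collation (the default MySQL 8.0 collation is case-insensitive).

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a Python regular expression (no ESCAPE support)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any sequence of characters, including none
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value, pattern):
    """True if the whole value matches the LIKE pattern (case-sensitive)."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(sql_like("dark olive misty", "dark%"))     # True
    print(sql_like("light dark", "dark%"))           # False
    print(sql_like("LARGE BRUSHED STEEL", "%STEEL")) # True
```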

Support for subqueries

For example, the following query is executed:

select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 35
    and p_type like '%STEEL'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'AMERICA'
    and ps_supplycost = (
        select
            min(ps_supplycost)
        from
            partsupp,
            supplier,
            nation,
            region
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'AMERICA'
    )
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey
limit 1;
  • Before the parallel query feature is enabled, the query takes 9.27s.

  • After the parallel query feature is enabled, the query takes 1.12s, which is about 12.08% of the original response time.
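The scalar subquery in this query is correlated: it computes MIN(ps_supplycost) for the p_partkey of the current outer row. A standard way to evaluate such a subquery efficiently, and in a form that parallelizes well, is to compute the minimum for every part key in one grouped pass. The outer rows whose cost equals that minimum are then kept. The Python sketch below shows this rewrite on hypothetical (ps_partkey, ps_suppkey, ps_supplycost) rows and omits the region filter for brevity. This topic does not state whether PolarDB applies this exact transformation.

```python
# Hypothetical partsupp rows: (ps_partkey, ps_suppkey, ps_supplycost).
PARTSUPP = [
    (1, 10, 250), (1, 11, 120), (1, 12, 120),
    (2, 10, 900), (2, 13, 700),
]


def min_cost_per_part(rows):
    """Decorrelated subquery: MIN(ps_supplycost) GROUP BY ps_partkey in one pass."""
    best = {}
    for partkey, _suppkey, cost in rows:
        if partkey not in best or cost < best[partkey]:
            best[partkey] = cost
    return best


def cheapest_suppliers(rows):
    """Outer query: keep (partkey, suppkey) where the cost equals that part's minimum."""
    best = min_cost_per_part(rows)
    return [(p, s) for p, s, cost in rows if cost == best[p]]


if __name__ == "__main__":
    print(cheapest_suppliers(PARTSUPP))  # [(1, 11), (1, 12), (2, 13)]
```

Ties are kept, as in the SQL version: both suppliers of part 1 with cost 120 qualify.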

Support for GROUP BY WITH ROLLUP

For more information about GROUP BY WITH ROLLUP, see MySQL WITH ROLLUP and MySQL ROLLUP.

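For example, the following query is executed. In this query, ':1' is the substitution parameter from the TPC-H query template for the number of days to subtract (DELTA). Replace it with a concrete integer value before you run the query: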

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date_sub('1998-12-01', interval ':1' day)
group by
    l_returnflag,
    l_linestatus
with rollup
order by
    l_returnflag,
    l_linestatus;
  • Before the parallel query feature is enabled, the query takes 318.73s.

  • After the parallel query feature is enabled, the query takes 22.30s, which is 7.00% of the original response time.
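WITH ROLLUP adds super-aggregate rows to the GROUP BY result. It produces one subtotal row per l_returnflag, with l_linestatus set to NULL, and one grand-total row with both columns set to NULL. The Python sketch below reproduces these semantics for COUNT(*) on hypothetical rows, using None for NULL. It illustrates what the query returns, not how PolarDB computes it.

```python
from collections import Counter

# Hypothetical (l_returnflag, l_linestatus) pairs.
ROWS = [("A", "F"), ("A", "F"), ("N", "F"), ("N", "O"), ("N", "O"), ("R", "F")]


def count_with_rollup(rows):
    """COUNT(*) GROUP BY flag, status WITH ROLLUP; None stands for SQL NULL."""
    counts = Counter()
    for flag, status in rows:
        counts[(flag, status)] += 1  # regular group
        counts[(flag, None)] += 1    # subtotal per l_returnflag
        counts[(None, None)] += 1    # grand total
    return counts


if __name__ == "__main__":
    for key, n in count_with_rollup(ROWS).items():
        print(key, n)
```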

Support for INSERT ... SELECT and REPLACE ... SELECT

For example, the following statement is executed:

insert into line_item_ap
SELECT   l_returnflag, 
         l_linestatus, 
         Sum(l_quantity)                                       AS sum_qty, 
         Sum(l_extendedprice)                                  AS sum_base_price, 
         Sum(l_extendedprice * (1 - l_discount))               AS sum_disc_price, 
         Sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge, 
         Avg(l_quantity)                                       AS avg_qty, 
         Avg(l_extendedprice)                                  AS avg_price, 
         Avg(l_discount)                                       AS avg_disc, 
         Count(*)                                              AS count_order 
FROM     lineitem
WHERE    l_shipdate <= date '1998-12-01' - INTERVAL '93' day 
GROUP BY l_returnflag, 
         l_linestatus 
ORDER BY l_returnflag, 
         l_linestatus ;
  • Before the parallel query feature is enabled, the statement takes 182.82s.

  • After the parallel query feature is enabled, the statement takes 23.25s, which is 12.72% of the original response time.
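The percentages in this topic are the response time with parallel query enabled divided by the response time without it. The Python sketch below recomputes that ratio and the corresponding speedup from the timings reported in this topic. The dictionary labels are shorthand for the sections that share a query. The computed value for the subquery example is about 12.08%.

```python
# Response times in seconds reported in this topic: (before, after).
TIMINGS = {
    "GROUP BY / ORDER BY / aggregates": (1563.32, 49.65),
    "JOIN / BETWEEN and IN": (21.73, 1.37),
    "LIMIT": (339.22, 29.31),
    "INTERVAL / CASE WHEN": (220.87, 7.75),
    "LIKE": (427.46, 33.72),
    "Subqueries": (9.27, 1.12),
    "GROUP BY WITH ROLLUP": (318.73, 22.30),
    "INSERT ... SELECT": (182.82, 23.25),
}


def percent_of_original(before, after):
    """Response time with parallel query, as a percentage of the original time."""
    return after / before * 100


def speedup(before, after):
    """How many times faster the statement ran with parallel query enabled."""
    return before / after


if __name__ == "__main__":
    for name, (before, after) in TIMINGS.items():
        pct = percent_of_original(before, after)
        print(f"{name:<34} {pct:6.2f}%  {speedup(before, after):5.1f}x")
```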