In many scenarios, query statements contain variable parameters. For example, a variable parameter can be set to a day or an hour in a statement to query the number of orders within the day or the hour. Materialized views, even real-time materialized views, must be created ahead of time. In this case, you can use real-time materialized views and the query rewrite feature to accelerate queries that contain variable parameters.

Operation method

The following section describes how to use real-time materialized views and query rewrite to accelerate queries that contain variable parameters. In this example, a TPC-H Q1 query is used.

For more information about TPC-H Q1, see TPC-H.

Note In this example, a test based on the TPC-H benchmark test is implemented, but it does not meet all the requirements of the TPC-H benchmark test. Therefore, the test results are not comparable with the published results of the TPC-H benchmark test.

Execute the following statement to create a table:

CREATE TABLE lineitem (
    l_orderkey    bigint not null,
    l_partkey     integer not null,
    l_suppkey     integer not null,
    l_linenumber  integer not null,
    l_quantity    numeric not null,
    l_extendedprice  numeric not null,
    l_discount    numeric not null,
    l_tax         numeric not null,
    l_returnflag  "char" not null,
    l_linestatus  "char" not null,
    l_shipdate    date not null,
    l_commitdate  date not null,
    l_receiptdate date not null,
    l_shipinstruct char(25) not null,
    l_shipmode     char(10) not null,
    l_comment      varchar(44) not null
) DISTRIBUTED BY (l_orderkey);

Execute the following statement to perform a TPC-H Q1 query:

SELECT l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
FROM
    lineitem
WHERE
    l_shipdate <= date '1998-12-01' - interval '$1' day --$1 specifies a dynamic condition, which is in the range of 60 to 120.
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus
LIMIT 1;

In the WHERE clause of the preceding TPC-H Q1 query statement, the l_shipdate value varies based on the date and interval values, and the interval value ranges from 60 to 120.

Execute the following statement to create a materialized view based on the TPC-H Q1 query statement:

CREATE INCREMENTAL MATERIALIZED VIEW q1_mv
AS
SELECT
    l_returnflag,
    l_linestatus,
    l_shipdate,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    count(*) as count_order,
    sum(l_extendedprice) as sum_price,
    sum(l_discount) as sum_disc,
    count(l_quantity) as count_qty,
    count(l_extendedprice) as count_price,
    count(l_discount) as count_disc
FROM
    lineitem
WHERE
    l_shipdate <= date '1998-12-01' - interval '60' day
GROUP BY
    l_returnflag,
    l_linestatus,
    l_shipdate
DISTRIBUTED BY (l_returnflag, l_linestatus);

In the preceding statement, the following features of query rewrite are used:

  • Re-aggregation and compensation

    In SQL statements, a WHERE clause is executed before a GROUP BY clause. The TPC-H Q1 query statement contains only the l_returnflag and l_linestatus fields in the GROUP BY clause. To support a variable parameter in the WHERE clause, the statement used to create a real-time materialized view includes the l_shipdate field besides l_returnflag and l_linestatus in the GROUP BY clause. In a real query, you can apply a dynamic WHERE condition in the materialized view and perform re-aggregation to obtain the GROUP BY results on the l_returnflag and l_linestatus fields.

    Note The AVG function does not support re-aggregation. Therefore, query rewrite uses the expression compensation mechanism to create the SUM and COUNT functions that support re-aggregation in the materialized view. The compensation mechanism is implemented based on the following formula: AVG = SUM/COUNT.
  • WHERE condition range matching

    If no WHERE condition is specified when you create a materialized view, query rewrite automatically supplements a condition. However, the dynamic condition must be in the range of 60 to 120. When you create a materialized view, you can narrow the range of the WHERE condition down to 60. This way, the materialized view can use a condition in the range of 60 to 120. Query rewrite checks whether the condition contained in the actual query statement is a subset of the condition contained in the materialized view. If yes, query rewrite supplements the required condition.

Execute the following statement to view the use of the materialized view in the query plan:

EXPLAIN SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
FROM
    lineitem
WHERE
    l_shipdate <= date '1998-12-01' - interval '100' day
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus
LIMIT 1;

The following sample result is returned:

                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.01..1.13 rows=1 width=234)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=1.01..1.13 rows=1 width=234)
         Merge Key: l_returnflag, l_linestatus
         ->  Limit  (cost=1.01..1.11 rows=1 width=234)
               ->  GroupAggregate  (cost=1.01..1.11 rows=1 width=234)
                     Group Key: l_returnflag, l_linestatus
                     ->  Sort  (cost=1.01..1.01 rows=1 width=194)
                           Sort Key: l_returnflag, l_linestatus
                           ->  Seq Scan on q1_mv  (cost=0.00..1.00 rows=1 width=194)
                                 Filter: (l_shipdate <= '1998-08-23 00:00:00'::timestamp without time zone)
 Optimizer: Postgres query optimizer
(11 rows)

Test results

The TPC-H Q1 query speed is significantly improved by the use of the real-time materialized view and query rewrite. A TPC-H test of 1 TB data is performed on an AnalyticDB for PostgreSQL instance that contains 16 compute nodes. After real-time materialized views and query rewrite are used for the TPC-H Q1 query, the query speed improves by nearly 10,000 times and the query time decreases significantly from 340 seconds to 0.04 seconds.

References