Both materialized views and real-time materialized views must be created in advance for queries. In many scenarios, query statements contain variable parameters. For example, a variable parameter can be set to a day or an hour in a statement to query the number of orders within the day or the hour. In this case, you can use the query rewrite feature for real-time materialized views to accelerate queries that contain variable parameters.
Procedure
The following section describes how to use real-time materialized views and query rewrite to accelerate queries that contain variable parameters. In this example, a TPC-H Q1 query is used.
For more information about TPC-H Q1, see TPC-H.
In this example, a test based on the TPC-H benchmark is implemented, but it does not meet all the requirements of the TPC-H benchmark test. Therefore, the test results may not match the published results of the TPC-H benchmark test.
Execute the following statement to create a table:
CREATE TABLE lineitem (
l_orderkey bigint not null,
l_partkey integer not null,
l_suppkey integer not null,
l_linenumber integer not null,
l_quantity numeric not null,
l_extendedprice numeric not null,
l_discount numeric not null,
l_tax numeric not null,
l_returnflag "char" not null,
l_linestatus "char" not null,
l_shipdate date not null,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(44) not null
) DISTRIBUTED BY (l_orderkey);
Execute the following statement to perform a TPC-H Q1 query:
SELECT l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '$1' day --$1 specifies a dynamic condition, which is in the range of 60 to 120.
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus
LIMIT 1;
In the WHERE clause of the preceding TPC-H Q1 query statement, the l_shipdate value varies based on the date and interval values, and the interval value ranges from 60 to 120
.
Execute the following statement to create a materialized view based on the TPC-H Q1 query statement:
CREATE INCREMENTAL MATERIALIZED VIEW q1_mv
AS
SELECT
l_returnflag,
l_linestatus,
l_shipdate,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
count(*) as count_order,
sum(l_extendedprice) as sum_price,
sum(l_discount) as sum_disc,
count(l_quantity) as count_qty,
count(l_extendedprice) as count_price,
count(l_discount) as count_disc
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '60' day
GROUP BY
l_returnflag,
l_linestatus,
l_shipdate
DISTRIBUTED BY (l_returnflag, l_linestatus);
In the preceding statement, the following features of query rewrite are used:
Re-aggregation for supplement
In SQL statements, a WHERE clause is executed before a GROUP BY clause. The TPC-H Q1 query statement contains only the
l_returnflag and l_linestatus
fields in the GROUP BY clause. To support a variable parameter in the WHERE clause, the statement used to create a real-time materialized view includes thel_shipdate
field besidesl_returnflag and l_linestatus
in the GROUP BY clause. In a real query, you can apply a dynamic WHERE condition in the materialized view and perform re-aggregation to obtain the GROUP BY results on thel_returnflag and l_linestatus
fields.NoteThe AVG function does not support re-aggregation. Therefore, query rewrite uses the expression supplement mechanism to create the SUM and COUNT functions that support re-aggregation in the materialized view. The supplement mechanism is implemented based on the following formula:
AVG = SUM/COUNT
.WHERE condition range matching
If no WHERE condition is specified when you create a materialized view, query rewrite automatically supplements a condition. However, the dynamic condition must be in the range of
60 to 120
. When you create a materialized view, you can narrow the range of the WHERE condition down to 60. This way, the materialized view can use a condition in the range of 60 to 120. Query rewrite checks whether the condition contained in the actual query statement is a subset of the condition contained in the materialized view. If yes, query rewrite supplements the required condition.
Execute the following statement to view the use of the materialized view in the query plan:
EXPLAIN SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '100' day
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus
LIMIT 1;
The following sample result is returned:
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Limit (cost=1.01..1.13 rows=1 width=234)
-> Gather Motion 3:1 (slice1; segments: 3) (cost=1.01..1.13 rows=1 width=234)
Merge Key: l_returnflag, l_linestatus
-> Limit (cost=1.01..1.11 rows=1 width=234)
-> GroupAggregate (cost=1.01..1.11 rows=1 width=234)
Group Key: l_returnflag, l_linestatus
-> Sort (cost=1.01..1.01 rows=1 width=194)
Sort Key: l_returnflag, l_linestatus
-> Seq Scan on q1_mv (cost=0.00..1.00 rows=1 width=194)
Filter: (l_shipdate <= '1998-08-23 00:00:00'::timestamp without time zone)
Optimizer: Postgres query optimizer
(11 rows)
Test results
The TPC-H Q1 query speed is significantly improved by the use of the real-time materialized view and query rewrite. A TPC-H test of 1 TB data is performed on an AnalyticDB for PostgreSQL instance that contains 16 compute nodes. After real-time materialized views and query rewrite are used for the TPC-H Q1 query, the query speed improves by nearly 10,000 times and the query time decreases significantly from 340 seconds to 0.04 seconds.