In many scenarios, query statements contain variable parameters. For example, a variable parameter can be set to a day or an hour in a statement to query the number of orders within the day or the hour. Materialized views, even real-time materialized views, must be created ahead of time. In this case, you can use real-time materialized views and the query rewrite feature to accelerate queries that contain variable parameters.
Operation method
The following section describes how to use real-time materialized views and query rewrite to accelerate queries that contain variable parameters. In this example, a TPC-H Q1 query is used.
For more information about TPC-H Q1, see TPC-H.
Execute the following statement to create a table:
CREATE TABLE lineitem (
l_orderkey bigint not null,
l_partkey integer not null,
l_suppkey integer not null,
l_linenumber integer not null,
l_quantity numeric not null,
l_extendedprice numeric not null,
l_discount numeric not null,
l_tax numeric not null,
l_returnflag "char" not null,
l_linestatus "char" not null,
l_shipdate date not null,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(44) not null
) DISTRIBUTED BY (l_orderkey);
Execute the following statement to perform a TPC-H Q1 query:
SELECT l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '$1' day --$1 specifies a dynamic condition, which is in the range of 60 to 120.
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus
LIMIT 1;
In the WHERE clause of the preceding TPC-H Q1 query statement, the l_shipdate value
varies based on the date and interval values, and the interval value ranges from 60 to 120
.
Execute the following statement to create a materialized view based on the TPC-H Q1 query statement:
CREATE INCREMENTAL MATERIALIZED VIEW q1_mv
AS
SELECT
l_returnflag,
l_linestatus,
l_shipdate,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
count(*) as count_order,
sum(l_extendedprice) as sum_price,
sum(l_discount) as sum_disc,
count(l_quantity) as count_qty,
count(l_extendedprice) as count_price,
count(l_discount) as count_disc
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '60' day
GROUP BY
l_returnflag,
l_linestatus,
l_shipdate
DISTRIBUTED BY (l_returnflag, l_linestatus);
In the preceding statement, the following features of query rewrite are used:
- Re-aggregation and compensation
In SQL statements, a WHERE clause is executed before a GROUP BY clause. The TPC-H Q1 query statement contains only the
l_returnflag and l_linestatus
fields in the GROUP BY clause. To support a variable parameter in the WHERE clause, the statement used to create a real-time materialized view includes thel_shipdate
field besidesl_returnflag and l_linestatus
in the GROUP BY clause. In a real query, you can apply a dynamic WHERE condition in the materialized view and perform re-aggregation to obtain the GROUP BY results on thel_returnflag and l_linestatus
fields.Note The AVG function does not support re-aggregation. Therefore, query rewrite uses the expression compensation mechanism to create the SUM and COUNT functions that support re-aggregation in the materialized view. The compensation mechanism is implemented based on the following formula:AVG = SUM/COUNT
. - WHERE condition range matching
If no WHERE condition is specified when you create a materialized view, query rewrite automatically supplements a condition. However, the dynamic condition must be in the range of
60 to 120
. When you create a materialized view, you can narrow the range of the WHERE condition down to 60. This way, the materialized view can use a condition in the range of 60 to 120. Query rewrite checks whether the condition contained in the actual query statement is a subset of the condition contained in the materialized view. If yes, query rewrite supplements the required condition.
Execute the following statement to view the use of the materialized view in the query plan:
EXPLAIN SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '100' day
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus
LIMIT 1;
The following sample result is returned:
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Limit (cost=1.01..1.13 rows=1 width=234)
-> Gather Motion 3:1 (slice1; segments: 3) (cost=1.01..1.13 rows=1 width=234)
Merge Key: l_returnflag, l_linestatus
-> Limit (cost=1.01..1.11 rows=1 width=234)
-> GroupAggregate (cost=1.01..1.11 rows=1 width=234)
Group Key: l_returnflag, l_linestatus
-> Sort (cost=1.01..1.01 rows=1 width=194)
Sort Key: l_returnflag, l_linestatus
-> Seq Scan on q1_mv (cost=0.00..1.00 rows=1 width=194)
Filter: (l_shipdate <= '1998-08-23 00:00:00'::timestamp without time zone)
Optimizer: Postgres query optimizer
(11 rows)
Test results
The TPC-H Q1 query speed is significantly improved by the use of the real-time materialized view and query rewrite. A TPC-H test of 1 TB data is performed on an AnalyticDB for PostgreSQL instance that contains 16 compute nodes. After real-time materialized views and query rewrite are used for the TPC-H Q1 query, the query speed improves by nearly 10,000 times and the query time decreases significantly from 340 seconds to 0.04 seconds.