Real-time materialized view - Hologres - Alibaba Cloud Documentation Center

Background information

You do not need to manually refresh materialized views in Hologres. When data is written to the base table, the data is aggregated to materialized views in real time and can immediately be queried by using the materialized views.

The table that is written in real time is called a base table. Users perform INSERT, UPDATE, and DELETE operations on the base table. Materialized views are defined based on aggregation rules. When the base table is modified, the changes are synchronized to the materialized views in real time. Only INSERT operation-related changes can be synchronized.

Limits

To use materialized views, do not perform DELETE or UPDATE operations on a base table. You need to set the appendonly property for the base table. This way, if you perform DELETE or UPDATE operations on the base table, the error Table XXX is append-only is returned. If you use Flink to write data to the base table in real time, you must set the mutateType parameter to InsertOrIgnore.
You cannot create materialized views for an existing table. You need to define materialized views for a table when you create the table.
You can create materialized views only for a single table. Materialized views do not support common table expressions (CTEs), multi-table JOIN operations, subqueries, or WHERE, ORDER BY, LIMIT, and HAVING clauses.
The GROUP BY key and value of a materialized view cannot be expressions. For example, SUM(CASE WHEN COND THEN A ELSE B END), SUM(col1 + col2), and GROUP BY date_trunc('hour', ts) cannot be used as the GROUP BY key or value of a materialized view.
You can create a maximum of 10 materialized views for each base table. More materialized views consume more resources.
If you create a materialized view for a partitioned table, the GROUP BY key of the materialized view must contain the partition key column of the partitioned table. You can create a materialized view only for a parent partitioned table, not child partitioned tables.
If you create a materialized view for a partitioned table, you cannot execute the ATTACH PARTITION statement to attach a partition to the parent partitioned table. However, you can execute the CREATE TABLE PARTITION OF statement to create a partition.
After you create materialized views for a base table, you cannot execute the DROP COLUMN statement to delete columns from the base table.
The underlying data of a materialized view has the same time to live (TTL) as the base table for which the materialized view is created. Do not manually set a TTL for the materialized view. Otherwise, the data in the materialized view may become inconsistent with that in the base table.

Supported aggregate functions

Materialized views support the following aggregate functions:

SUM
COUNT
AVG
MIN
MAX
RB_BUILD_CARDINALITY_AGG: Only the BIGINT data type is supported. In addition, you must create the RoaringBitmap extension.

Sample SQL statements

Create a materialized view

BEGIN;
CREATE TABLE base_sales(
  day text not null,
  hour int ,
  ts timestamptz,
  amount float,
  pk text not null primary key
);
CALL SET_TABLE_PROPERTY('base_sales', 'mutate_type', 'appendonly');

-- After the materialized view is deleted, you can execute the following statement to cancel the appendonly property for the base table:
--CALL SET_TABLE_PROPERTY('base_sales', 'mutate_type', 'none');

CREATE MATERIALIZED VIEW mv_sales AS
  SELECT
    day,
    hour,
    avg(amount) AS amount_avg
  FROM base_sales
  GROUP BY day, hour;

COMMIT;

insert into base_sales values(to_char(now(),'YYYYMMDD'),'12',now(),100,'pk1');
insert into base_sales values(to_char(now(),'YYYYMMDD'),'12',now(),200,'pk2');
insert into base_sales values(to_char(now(),'YYYYMMDD'),'12',now(),300,'pk3');

Create a materialized view for a partitioned table

BEGIN;
CREATE TABLE base_sales_p(
  day text not null,
  hour int,
  ts timestamptz,
  amount float,
  pk text not null,
  primary key (day, pk)
) partition by list(day);
CALL SET_TABLE_PROPERTY('base_sales_p', 'mutate_type', 'appendonly');

-- day is a partition key column. It must be included in the condition of the GROUP BY clause of the materialized view.
CREATE MATERIALIZED VIEW mv_sales_p AS
  SELECT
    day,
    hour,
    avg(amount) AS amount_avg
  FROM base_sales_p
  GROUP BY day, hour;
COMMIT;

create table base_sales_20220101 partition of base_sales_p for values in('20220101');

Query a materialized view

SELECT * FROM mv_sales WHERE day = to_char(now(),'YYYYMMDD') AND hour = 12;

Delete a materialized view
```
DROP MATERIALIZED VIEW mv_sales;
```
Query the storage space occupied by a materialized view
```
select pg_relation_size('mv_sales');
```

Query the storage space occupied by all materialized views

SELECT schemaname || '.' || matviewname AS mv_full_name,
pg_size_pretty(pg_relation_size('"' || schemaname || '"."' || matviewname || '"')) AS mv_size,
pg_relation_size('"' || schemaname || '"."' || matviewname || '"') AS  order_size
FROM pg_matviews
ORDER BY order_size DESC;

Improve the performance of precise UV computing by using materialized views

Precise unique visitor (UV) computing is an operator with high computational complexity and is usually a system performance bottleneck. Hologres supports the RB_BUILD_CARDINALITY_AGG aggregate function. By using the RoaringBitmap data structure, Hologres can pre-aggregate data of the BIGINT type into materialized views. The BIGINT type is usually used by business ID fields. This implements real-time deduplication of UV statistics. You can create materialized views by executing the following statements. Only fields of the BIGINT type are supported for aggregation and duplication.

--UV computing depends on the RoaringBitmap data type. You must create the RoaringBitmap extension in advance.
CREATE EXTENSION if not exists roaringbitmap;

BEGIN;
CREATE TABLE base_sales_r(
  day text not null,
  hour int ,
  ts timestamptz,
  amount float,
  userid bigint,
  pk text not null primary key
);
CALL SET_TABLE_PROPERTY('base_sales_r', 'mutate_type', 'appendonly');

CREATE MATERIALIZED VIEW mv_sales_r AS
  SELECT
    day,
    hour,
    avg(amount) AS amount_avg,
    rb_build_cardinality_agg(userid) as user_count
  FROM base_sales_r
  GROUP BY day, hour;

COMMIT;

insert into base_sales_r values(to_char(now(),'YYYYMMDD'),'12',now(),100,1,'pk1');
insert into base_sales_r values(to_char(now(),'YYYYMMDD'),'12',now(),200,2,'pk2');
insert into base_sales_r values(to_char(now(),'YYYYMMDD'),'12',now(),300,3,'pk3');
select user_count as UV from mv_sales_r where day = to_char(now(),'YYYYMMDD') AND hour = 12;

The RB_BUILD_CARDINALITY_AGG aggregate function is used to compute the UV value. In the mv_sales_r view, the user_count field stores the UV value for the userid field. You can query the user_count field to obtain the UV value.

Perform multi-dimensional aggregate queries by using materialized views

Assume that you have executed the preceding sample statement to create the mv_sales materialized view. The following table describes the data that the base_sales base table contains.


Day	Hour	Amount	PK
20210101	12	2	pk1
20210101	12	4	pk2
20210101	13	6	pk3

The following result is returned if you directly query the sales_mv materialized view:

postgres=> select * from mv_sales;
    day    |   hour  |   amount_avg
-----------+---------+--------------
  20210101 |    12   |     3
  20210101 |    13   |     6

If you query the materialized view from an aggregation dimension, incorrect results are returned. For example, if you use the AVG function to aggregate data in the materialized view by day, an incorrect result is returned. This is because the average of average values does not equal the average of original values.

postgres=> select day, avg(amount_avg) from mv_sales group by day;
    day    |   avg
-----------+--------
  20210101 |   4.5

In this case, you can create another materialized view that is aggregated by day. However, this will cause the number of materialized views to expand. Hologres allows you to use only one materialized view to implement aggregate queries in different dimensions based on the aggregation result in the intermediate state. In the following example, the AVG aggregate function is used. Execute the following modified DDL statements to create the materialized view:

BEGIN;
CREATE TABLE base_sales(
  day text not null,
  hour int ,
  ts timestamptz,
  amount float,
  pk text not null primary key
);
CALL SET_TABLE_PROPERTY('base_sales', 'mutate_type', 'appendonly');

CREATE MATERIALIZED VIEW mv_sales_partial AS
  SELECT
    day,
    hour,
    avg(amount) as avg,
    avg_partial(amount) AS amt_avg_partial
  FROM base_sales
  GROUP BY day, hour;

COMMIT;

As shown in the statements, the AVG aggregate function is replaced with the AVG_PARTIAL aggregate function. The amount_avg_partial column stores the aggregation result in the intermediate state. You can use the AVG_FINAL aggregate function to query the final aggregation of the aggregate result in the intermediate state.

postgres=> select day, avg(avg) as avg_avg, avg_final(amt_avg_partial) as real_avg from mv_sales_partial group by day;
    day    |   avg_avg |  real_avg
-----------+-----------+----------
  20210101 |    4.5    |     4

The following table describes the aggregate functions and their partial and final aggregate functions that are supported.


Aggregate function	Partial aggregate function	Final aggregate function
AVG	AVG_PARTIAL	AVG_FINAL
RB_BUILD_CARDINALITY_AGG	RB_BUILD_AGG	RB_OR_CARDINALITY_AGG

TTL description

If a TTL is set for a base table and a materialized view is created, Hologres cannot ensure the consistency between the query results if you query the base table and the materialized view for data that is about to expire. The results of querying data that is about to expire from materialized views are undefined. In the following example, the base_sales_table base table and the sales_mv materialized view are used.

A TTL is set for the base_sales_table base table, and some data has expired. If you query the base table, the following query result is returned:

postgres=> SELECT
    day,
    hour,
    avg(amount) AS amount_avg
  FROM base_sales
  GROUP BY day, hour;

--Query result
    day    |   hour  |   amount_avg
-----------+---------+--------------
  20210101 |    12   |     4
  20210101 |    13   |     6

However, the expired data has been materialized into the materialized view. The following query result may be returned if you query the materialized view:

postgres=> select * from mv_sales;
--Query result
    day    |   hour  |   amount_avg
-----------+---------+--------------
  20210101 |    12   |     3
  20210101 |    13   |     6

In this case, the query results are inconsistent. We recommend that you make the following improvements:

Do not set a TTL for the base table.
If a TTL is set for the base table and a time field is contained in the GROUP BY key of a materialized view, do not query the materialized view for data that is about to expire.
Create the base table as a partitioned table. Do not set a TTL for the base table. Delete expired data by deleting child partitioned tables.

Best practices for using real-time materialized views

When you create a table, we recommend that you set the GROUP BY key of a materialized view to the distribution key of the base table. This increases the data compression ratio and improves query performance.
We recommend that you put the filter conditions that are commonly used in queries of materialized views in the front of the GROUP BY key based on the leftmost matching principle of the clustering key.

Intelligent routing for materialized views

You do not need to explicitly specify the materialized view to query. You can perform queries based on the base table as before. If matching materialized views exist, the optimizer intelligently routes queries to the most appropriate materialized view to accelerate the queries. Hologres selects a materialized view based on the following rules:

Hologres selects the materialized view that contains all queried columns or the columns that can be used to calculate the values of the queried columns.
Hologres selects the materialized view whose GROUP BY key contains all the columns specified in the GROUP BY key of the original query.
If multiple materialized views meet the requirements, Hologres selects the materialized view that contains the fewest fields in the GROUP BY key.

The aggregate functions that support intelligent routing include SUM, COUNT, AVG, MIN, and MAX.