When you need to optimize repetitive and time-consuming complex queries, you can use the synchronous materialized view feature in ApsaraDB for SelectDB to cache data from complex queries, enabling fast access and efficient data processing.
Overview
A synchronous materialized view is a pre-computed dataset based on a defined SELECT statement and stored in a special table in SelectDB. During queries, the system automatically matches the most optimal synchronous materialized view and reads data directly from it. Synchronous materialized views have the following characteristics:
Can reduce query response time.
Synchronous materialized views reduce query response time by pre-storing computation results.
Automatic synchronization mechanism requires no manual data maintenance.
Data remains consistent with base tables in real time.
In data insertion, update, or deletion scenarios, SelectDB ensures strong consistency between base tables and synchronous materialized views, maintaining the validity of materialized view data through a built-in incremental update mechanism.
Notes
Usage notes:
Direct querying is not supported.
Unlike asynchronous materialized views, you cannot directly query synchronous materialized views. Query statements still query the base table. When SelectDB receives a query statement, it automatically selects the most optimal materialized view, reads data from the materialized view, and performs calculations. For more information, see Query automatic matching.
In the Unique model, you cannot perform coarse-grained aggregation operations on data by creating synchronous materialized views.
This is because synchronous materialized views for the Unique data model can only change column order and cannot serve an aggregation purpose.
Performance impact: Too many synchronous materialized views on a single table will affect data import efficiency.
During data import, synchronous materialized views and base table data are updated synchronously.
For example: If more than 10 synchronous materialized view tables are created for one table, the import speed may become very slow. This is because a single import operation needs to import data into more than 10 tables simultaneously.
Use cases
Accelerate time-consuming aggregation operations.
Queries need to match different prefix indexes.
Reduce the amount of data to be scanned through pre-filtering.
Accelerate queries by pre-completing complex expression calculations.
Create materialized views
Principles
Before creating a materialized view, you need to determine the type of materialized view to create based on the characteristics of the query statement. Having a materialized view definition that exactly matches a query statement is not always the ideal situation. The following are two principles for creating materialized views.
Abstract common grouping and aggregation methods from multiple queries as the definition of the materialized view.
An abstracted materialized view that can be matched by multiple queries is most effective. This is because maintaining materialized views also consumes resources. If a materialized view only matches a specific query and cannot be used by other queries, its cost-effectiveness will be low, as it occupies cluster storage resources while not being useful for more queries.
You do not need to create materialized views for all dimension combinations.
In actual analytical queries, not all dimension analyses are covered. Therefore, creating materialized views only for commonly used dimension combinations can achieve a balance between space and time.
Notes
SELECT statement
Only supports SELECT statements based on a single table.
SELECT columns must not include auto-increment columns, constants, duplicate expressions, or window functions.
If SELECT columns include partition key columns and bucketing columns of the created table, these columns must be used as Key columns in the materialized view.
Clauses
WHERE, GROUP BY, and ORDER BY clauses are allowed, but JOIN, HAVING, LIMIT, and LATERAL VIEW are prohibited.
Supported aggregate functions
ImportantAggregate functions parameters do not support expressions and only support single columns. For example:
<span style="color:rgb(51, 65, 85)">sum(a)</span>is supported, but<span style="color:rgb(51, 65, 85)">sum(a+b)</span>is not supported.Different aggregate functions are not allowed on the same column. For example:
<span style="color:rgb(51, 65, 85)">select sum(a), min(a) from table</span>is not supported.
Currently, the following aggregate functions are supported in statements for creating synchronous materialized views:
SUM, MIN, MAX.
COUNT.
BITMAP_UNION function supports creating materialized views only in the following two cases:
When using the format
<span style="color:rgb(51, 65, 85);background-color:rgba(0,0,0,var(--tw-bg-opacity))">BITMAP_UNION(TO_BITMAP(COLUMN))</span>,the column (COLUMN) type can only be integers, excluding the<span style="color:rgb(51, 65, 85)">largeint</span>type.When using the format
<span style="color:rgb(51, 65, 85);background-color:rgba(0,0,0,var(--tw-bg-opacity))">BITMAP_UNION(COLUMN)</span>, the base table must be an Aggregate model.
HLL_UNION function supports creating materialized views only in the following two cases:
When using the format
<span style="color:rgb(51, 65, 85);background-color:rgba(0,0,0,var(--tw-bg-opacity))">HLL_UNION(HLL_HASH(COLUMN))</span>, the column (COLUMN) type cannot be<span style="color:rgb(51, 65, 85)">DECIMAL</span>.When using the format
<span style="color:rgb(51, 65, 85);background-color:rgba(0,0,0,var(--tw-bg-opacity))">HLL_UNION(COLUMN)</span>,the base table must be an Aggregate model.
Syntax
CREATE MATERIALIZED VIEW <mv_name> as <query>
[PROPERTIES ("key" = "value")]Parameter description
Parameter name | Required | Description |
mv_name | Yes | The name of the synchronous materialized view. Materialized view names for the same base table cannot be duplicated. |
query | Yes | The query statement used to build the synchronous materialized view. The result of the query statement is the data in the materialized view. For more information, see query parameter description below. |
properties | No | Declares the related configuration of the materialized view. The format is as follows. The following configurations can be defined in this property. |
The query parameter description is as follows.
Currently supported query format is as follows.
SELECT select_expr[, select_expr ...]FROM <base_view_name>GROUP BY column_name[, column_name ...]ORDER BY column_name[, column_name ...]Parameter description is as follows.
Parameter name | Required | Description |
select_expr | Yes | All columns in the synchronous materialized view schema. Must include at least one single column. |
base_view_name | Yes | The base table of the synchronous materialized view. Must be a single table, not a subquery. |
group by | No | The grouping columns of the synchronous materialized view. If not specified, data is not grouped. |
order by | No | The sorting of the synchronous materialized view.
|
Examples
Prepare an example base table.
Create an example table
<span style="color:rgb(64, 64, 64)">duplicate_table</span>, as shown below.CREATE TABLE duplicate_table( k1 int null, k2 int null, k3 bigint null, k4 bigint null ) DUPLICATE KEY (k1,k2,k3,k4) DISTRIBUTED BY HASH(k4) BUCKETS 3;Query the table structure of the example table
<span style="color:rgb(64, 64, 64)">duplicate_table</span>as follows.DESC duplicate_table; +-------+--------+------+------+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+--------+------+------+---------+-------+ | k1 | INT | Yes | true | N/A | | | k2 | INT | Yes | true | N/A | | | k3 | BIGINT | Yes | true | N/A | | | k4 | BIGINT | Yes | true | N/A | | +-------+--------+------+------+---------+-------+Create a synchronous materialized view.
Create a materialized view that only contains the original table's (k1, k2) columns, as shown below.
CREATE MATERIALIZED VIEW k1_k2 AS SELECT k2, k1 FROM duplicate_table;The schema of the materialized view is as follows. The materialized view only contains two columns k1, k2 and does not include any aggregation.
+-----------------+-------+--------+------+------+---------+-------+ | IndexName | Field | Type | Null | Key | Default | Extra | +-----------------+-------+--------+------+------+---------+-------+ | k2_k1 | k2 | INT | Yes | true | N/A | | | | k1 | INT | Yes | true | N/A | | +-----------------+-------+--------+------+------+---------+-------+Create a synchronous materialized view with k2 as the sorting column, as shown below.
CREATE MATERIALIZED VIEW k2_order AS SELECT k2, k1 FROM duplicate_table ORDER BY k2;The schema of the materialized view is as follows. The materialized view only contains two columns k2, k1, where k2 is the sorting column and does not include any aggregation.
+-----------------+-------+--------+------+-------+---------+-------+ | IndexName | Field | Type | Null | Key | Default | Extra | +-----------------+-------+--------+------+-------+---------+-------+ | k2_order | k2 | INT | Yes | true | N/A | | | | k1 | INT | Yes | false | N/A | NONE | +-----------------+-------+--------+------+-------+---------+-------+Create a materialized view grouped by k1, k2, with SUM aggregation on the k3 column, as shown below.
CREATE MATERIALIZED VIEW k1_k2_sumk3 AS SELECT k1, k2, sum(k3) FROM duplicate_table GROUP BY k1, k2;The schema of the materialized view is as follows. It contains two columns k1, k2, sum(k3). Here, k1, k2 are grouping columns, and sum(k3) is the sum of the k3 column after grouping by k1, k2. Since the materialized view does not declare sorting columns and includes aggregated data, the system automatically supplements the grouping columns k1, k2 as sorting columns.
+-----------------+-------+--------+------+-------+---------+-------+ | IndexName | Field | Type | Null | Key | Default | Extra | +-----------------+-------+--------+------+-------+---------+-------+ | k1_k2_sumk3 | k1 | INT | Yes | true | N/A | | | | k2 | INT | Yes | true | N/A | | | | k3 | BIGINT | Yes | false | N/A | SUM | +-----------------+-------+--------+------+-------+---------+-------+
Query materialized views
Query all materialized views created based on a specific table.
Syntax
DESC <table_name> ALL;Parameter description
table_name is the base table of the materialized view.
Example
Query materialized views with <span style="color:rgb(64, 64, 64)">duplicate_table</span> as the base table.
DESC duplicate_table ALL;The query result is as follows.
+-----------------+---------------+---------------+--------+--------------+------+-------+---------+-------+---------+------------+-------------+
| IndexName | IndexKeysType | Field | Type | InternalType | Null | Key | Default | Extra | Visible | DefineExpr | WhereClause |
+-----------------+---------------+---------------+--------+--------------+------+-------+---------+-------+---------+------------+-------------+
| duplicate_table | DUP_KEYS | k1 | INT | INT | Yes | true | NULL | | true | | |
| | | k2 | INT | INT | Yes | true | NULL | | true | | |
| | | k3 | BIGINT | BIGINT | Yes | true | NULL | | true | | |
| | | k4 | BIGINT | BIGINT | Yes | true | NULL | | true | | |
| | | | | | | | | | | | |
| k2_order | DUP_KEYS | mv_k2 | INT | INT | Yes | true | NULL | | true | `k2` | |
| | | mv_k1 | INT | INT | Yes | false | NULL | NONE | true | `k1` | |
| | | | | | | | | | | | |
| k1_k2 | DUP_KEYS | mv_k2 | INT | INT | Yes | true | NULL | | true | `k2` | |
| | | mv_k1 | INT | INT | Yes | true | NULL | | true | `k1` | |
| | | | | | | | | | | | |
| k1_k2_sumk3 | AGG_KEYS | mv_k1 | INT | INT | Yes | true | NULL | | true | `k1` | |
| | | mv_k2 | INT | INT | Yes | true | NULL | | true | `k2` | |
| | | mva_SUM__`k3` | BIGINT | BIGINT | Yes | false | NULL | SUM | true | `k3` | |
+-----------------+---------------+---------------+--------+--------------+------+-------+---------+-------+---------+------------+-------------+Based on the query results, the example table <span style="color:rgb(51, 65, 85);background-color:rgba(0,0,0,var(--tw-bg-opacity))">duplicate_table</span> has a total of three materialized views: k1_k2, k2_order and k1_k2_sumk3, along with their table structures.
Delete materialized views
If you no longer need a synchronous materialized view, you can delete it or cancel its creation through commands.
Delete an incomplete materialized view
If you want to cancel an incomplete materialized view creation, you can use the following command.
Syntax
CANCEL ALTER TABLE MATERIALIZED VIEW FROM <database>.<table_name>;Parameter description
Parameter name
Required
Description
database
Yes
The database name to which the materialized view to be deleted belongs.
table_name
Yes
The table name to which the materialized view to be deleted belongs.
Example
Cancel all incomplete materialized views for the base table duplicate_table.
CANCEL ALTER TABLE MATERIALIZED VIEW FROM test_db.duplicate_table;If the materialized view has already been created, it cannot be canceled using this command, but it can be deleted using the delete command.
Delete a completed materialized view
If the materialized view has already been created, you can delete it using the delete command.
Syntax
DROP MATERIALIZED VIEW [IF EXISTS] <mv_name> ON <table_name>;Parameter description
Parameter name
Required
Description
IF EXISTS
No
If the materialized view does not exist, do not throw an error. If this keyword is not specified, an error will be reported if the materialized view does not exist.
mv_name
Yes
The name of the materialized view to be deleted.
table_name
Yes
The base table of the materialized view to be deleted.
Example
View the materialized views of the base table
<span style="color:rgb(64, 64, 64)">duplicate_table</span>and their table structures, as shown below.DESC duplicate_table ALL;The query result is as follows.
+-----------------+---------------+-------+--------+--------------+------+------+---------+-------+---------+------------+-------------+ | IndexName | IndexKeysType | Field | Type | InternalType | Null | Key | Default | Extra | Visible | DefineExpr | WhereClause | +-----------------+---------------+-------+--------+--------------+------+------+---------+-------+---------+------------+-------------+ | duplicate_table | DUP_KEYS | k1 | INT | INT | Yes | true | NULL | | true | | | | | | k2 | INT | INT | Yes | true | NULL | | true | | | | | | k3 | BIGINT | BIGINT | Yes | true | NULL | | true | | | | | | k4 | BIGINT | BIGINT | Yes | true | NULL | | true | | | | | | | | | | | | | | | | | k1_k2 | DUP_KEYS | mv_k2 | INT | INT | Yes | true | NULL | | true | `k2` | | | | | mv_k1 | INT | INT | Yes | true | NULL | | true | `k1` | | +-----------------+---------------+-------+--------+--------------+------+------+---------+-------+---------+------------+-------------+Delete the materialized view named
<span style="color:rgb(64, 64, 64)">k1_k2</span>from the base table<span style="color:rgb(64, 64, 64)">duplicate_table</span>.DROP MATERIALIZED VIEW k1_k2 ON duplicate_table;After deleting the materialized view, view the materialized views of the base table
<span style="color:rgb(64, 64, 64)">duplicate_table</span>and their table structures, as shown below.DESC duplicate_table ALL;The query result is as follows.
+-----------------+---------------+-------+--------+--------------+------+------+---------+-------+---------+------------+-------------+ | IndexName | IndexKeysType | Field | Type | InternalType | Null | Key | Default | Extra | Visible | DefineExpr | WhereClause | +-----------------+---------------+-------+--------+--------------+------+------+---------+-------+---------+------------+-------------+ | duplicate_table | DUP_KEYS | k1 | INT | INT | Yes | true | NULL | | true | | | | | | k2 | INT | INT | Yes | true | NULL | | true | | | | | | k3 | BIGINT | BIGINT | Yes | true | NULL | | true | | | | | | k4 | BIGINT | BIGINT | Yes | true | NULL | | true | | | +-----------------+---------------+-------+--------+--------------+------+------+---------+-------+---------+------------+-------------+
View status
Creating a synchronous materialized view is an asynchronous operation. After you submit the creation task, SelectDB calculates the historical data in the background until the creation is successful. You can check whether the materialized view has been built using the command.
Syntax
SHOW ALTER TABLE MATERIALIZED VIEW FROM <database>Parameter description
database is the database to which the base table of the synchronous materialized view belongs. The result of this command will display all materialized view creation tasks for that database.
Example
View the materialized view creation task information in the test_db database.
SHOW ALTER TABLE MATERIALIZED VIEW FROM test_db;
-- Result
+--------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+----------+------+----------+---------+
| JobId | TableName | CreateTime | FinishTime | BaseIndexName | RollupIndexName | RollupId | TransactionId | State | Msg | Progress | Timeout |
+--------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+----------+------+----------+---------+
| 494349 | sales_records | 2020-07-30 20:04:56 | 2020-07-30 20:04:57 | sales_records | store_amt | 494350 | 133107 | FINISHED | | NULL | 2592000 |
+--------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+----------+------+----------+---------+The result description is as follows.
Field name | Description |
| Refers to the source table of the materialized view data. |
| Base table name. |
| The name of the materialized view. |
| Task status.
When the State becomes FINISHED, the materialized view has been created successfully. |
View statements
View the statement used to create a materialized view through a command.
Syntax
This syntax cannot query materialized views that have already been deleted.
SHOW CREATE MATERIALIZED VIEW <mv_name> ON <table_name>Parameter description
Parameter name | Required | Description |
mv_name | Yes | The name of the materialized view. |
table_name | Yes | The base table of the materialized view. |
Example
Prepare an example materialized view with the following creation statement.
CREATE MATERIALIZED VIEW id_col1 AS SELECT id,col1 FROM table3;View the creation statement of the created materialized view, as shown below.
SHOW CREATE MATERIALIZED VIEW id_col1 ON table3;The query result is as follows.
SHOW CREATE MATERIALIZED VIEW id_col1 on table3; +-----------+----------+----------------------------------------------------------------+ | TableName | ViewName | CreateStmt | +-----------+----------+----------------------------------------------------------------+ | table3 | id_col1 | create materialized view id_col1 as select id,col1 from table3 | +-----------+----------+----------------------------------------------------------------+ 1 row in set (0.00 sec)
Query automatic matching
After a materialized view is created successfully, queries do not need to change at all and still query the base table. SelectDB will automatically select the most optimal materialized view based on the current query statement, read data from the materialized view, and perform calculations.
You can use the <span style="color:rgb(64, 64, 64)">EXPLAIN</span> command to check whether the current query uses a materialized view. For more information about <span style="color:rgb(64, 64, 64)">EXPLAIN</span>, see Query Explain.
The matching relationship between materialized view aggregation and query aggregation is as follows.
Materialized view aggregation | Query aggregation |
sum | sum |
min | min |
max | max |
count | count |
bitmap_union | bitmap_union, bitmap_union_count, count(distinct) |
hll_union | hll_raw_agg, hll_union_agg, ndv, approx_count_distinct |
When the aggregate functions of <span style="color:rgb(51, 65, 85)">bitmap</span> and <span style="color:rgb(51, 65, 85)">hll</span> match a materialized view during a query, the aggregation operator of the query will be rewritten based on the table schema of the materialized view.
Complete usage example
Using materialized views involves the following three steps.
Create a materialized view.
Asynchronously check if the materialized view construction is complete.
Query and automatically match the materialized view.
Example background
For example, in a business scenario that calculates ad PV and UV, the original ad click data is stored in SelectDB. Therefore, ad PV and UV queries can be accelerated by creating a materialized view with aggregation type <span style="color:rgb(64, 64, 64)">bitmap_union</span>.
Preparations: Design the base table
Create a table <span style="color:rgb(64, 64, 64)">advertiser_view_record</span> to store ad click data details. Fields include the time of each ad click, the clicked ad, the channel through which the click occurred, and the user who clicked. The specific statement is as follows.
CREATE TABLE advertiser_view_record(
time date,
advertiser varchar(10),
channel varchar(10),
user_id int
)
DISTRIBUTED BY HASH(time);Query the table structure of the original ad click data table, as shown below.
DESC advertiser_view_record ALL;
+------------------------+---------------+------------+-------------+--------------+------+-------+---------+-------+---------+------------+-------------+
| IndexName | IndexKeysType | Field | Type | InternalType | Null | Key | Default | Extra | Visible | DefineExpr | WhereClause |
+------------------------+---------------+------------+-------------+--------------+------+-------+---------+-------+---------+------------+-------------+
| advertiser_view_record | DUP_KEYS | time | DATE | DATEV2 | Yes | true | NULL | | true | | |
| | | advertiser | VARCHAR(10) | VARCHAR(10) | Yes | true | NULL | | true | | |
| | | channel | VARCHAR(10) | VARCHAR(10) | Yes | false | NULL | NONE | true | | |
| | | user_id | INT | INT | Yes | false | NULL | NONE | true | | |
+------------------------+---------------+------------+-------------+--------------+------+-------+---------+-------+---------+------------+-------------+
4 rows in set (0.02 sec)Step 1: Create a materialized view
Design analysis.
To query ad UV values, you need to perform exact deduplication of users for the same ad. The usual query method is as follows.
SELECT advertiser, channel, COUNT(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel;For this type of UV query scenario, you can create a materialized view with
<span style="color:rgb(64, 64, 64)">bitmap_union</span>to achieve pre-exact deduplication. In SelectDB, the results of<span style="color:rgb(64, 64, 64)">count(distinct)</span>aggregation and<span style="color:rgb(64, 64, 64)">bitmap_union_count</span>aggregation are completely consistent. And<span style="color:rgb(64, 64, 64)">bitmap_union_count</span>equals counting the result of<span style="color:rgb(64, 64, 64)">bitmap_union</span>. So if a query involves<span style="color:rgb(64, 64, 64)">count(distinct)</span>, query speed can be accelerated by creating a materialized view with<span style="color:rgb(64, 64, 64)">bitmap_union</span>aggregation. For this case, you can create a materialized view that groups by ad and channel and performs exact deduplication on the<span style="color:rgb(64, 64, 64)">user_id</span>column.Create a materialized view.
Based on the above analysis, create a materialized view that groups by advertisement and channel, and performs exact deduplication on
<span style="color:rgb(64, 64, 64)">user_id</span>. The statement is as follows:CREATE MATERIALIZED VIEW advertiser_uv AS SELECT advertiser, channel, bitmap_union(to_bitmap(user_id)) FROM advertiser_view_record GROUP BY advertiser, channel; Query OK, 0 rows affected (0.012 sec)NoteBecause user_id is an INT type, in SelectDB you need to first transform the field into bitmap type using the function
<span style="color:rgb(64, 64, 64)">to_bitmap</span>before you can perform<span style="color:rgb(64, 64, 64)">bitmap_union</span>aggregation.After creating the materialized view, query the materialized views of the ad click detail table and their table structures, as shown below.
DESC advertiser_view_record ALL; +------------------------+---------------+-------------------------------------------------------------------+-------------+--------------+------+-------+---------+--------------+---------+-------------------------------------------------+-------------+ | IndexName | IndexKeysType | Field | Type | InternalType | Null | Key | Default | Extra | Visible | DefineExpr | WhereClause | +------------------------+---------------+-------------------------------------------------------------------+-------------+--------------+------+-------+---------+--------------+---------+-------------------------------------------------+-------------+ | advertiser_view_record | DUP_KEYS | time | DATE | DATEV2 | Yes | true | NULL | | true | | | | | | advertiser | VARCHAR(10) | VARCHAR(10) | Yes | true | NULL | | true | | | | | | channel | VARCHAR(10) | VARCHAR(10) | Yes | false | NULL | NONE | true | | | | | | user_id | INT | INT | Yes | false | NULL | NONE | true | | | | | | | | | | | | | | | | | advertiser_uv | AGG_KEYS | mv_advertiser | VARCHAR(*) | VARCHAR(*) | Yes | true | NULL | | true | `advertiser` | | | | | mv_channel | VARCHAR(*) | VARCHAR(*) | Yes | true | NULL | | true | `channel` | | | | | mva_BITMAP_UNION__to_bitmap_with_check(CAST(`user_id` AS BIGINT)) | BITMAP | BITMAP | No | false | NULL | BITMAP_UNION | true | to_bitmap_with_check(CAST(`user_id` AS BIGINT)) | | +------------------------+---------------+-------------------------------------------------------------------+-------------+--------------+------+-------+---------+--------------+---------+-------------------------------------------------+-------------+ 8 rows in set (0.03 sec)
Step 2: Check if the view construction is complete
Use the command View status. When the status is FINISHED, the materialized view has been created successfully.
Step 3: Query automatic matching
Execute the query statement.
When the materialized view table is created, when querying ad UV, SelectDB will automatically query data from the created materialized view
<span style="color:rgb(64, 64, 64)">advertiser_uv</span>. Example as follows.SELECT advertiser, channel, COUNT(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel;After hitting the materialized view, SelectDB implicitly transforms the actual query into the following query.
SELECT advertiser, channel, bitmap_union_count(to_bitmap(user_id)) FROM advertiser_uv GROUP BY advertiser, channel;Verify whether the query is using the materialized view.
You can use the EXPLAIN command to verify whether SelectDB has matched the materialized view, as shown below.
EXPLAIN SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel; +-------------------------------------------------------------------------------------------------------------------------------------------------+ | Explain String | +-------------------------------------------------------------------------------------------------------------------------------------------------+ | PLAN FRAGMENT 0 | | OUTPUT EXPRS: | | advertiser[#13] | | channel[#14] | | count(DISTINCT user_id)[#15] | | PARTITION: UNPARTITIONED | | | | VRESULT SINK | | | | 4:VEXCHANGE | | offset: 0 | | | | PLAN FRAGMENT 1 | | | | PARTITION: HASH_PARTITIONED: mv_advertiser[#7], mv_channel[#8] | | | | STREAM DATA SINK | | EXCHANGE ID: 04 | | UNPARTITIONED | | | | 3:VAGGREGATE (merge finalize) | | | output: bitmap_union_count(partial_bitmap_union_count(mva_BITMAP_UNION__to_bitmap_with_check(cast(user_id as BIGINT)))[#9])[#12] | | | group by: mv_advertiser[#7], mv_channel[#8] | | | cardinality=1 | | | projections: mv_advertiser[#10], mv_channel[#11], bitmap_union_count(mva_BITMAP_UNION__to_bitmap_with_check(cast(user_id as BIGINT)))[#12] | | | project output tuple id: 4 | | | | | 2:VEXCHANGE | | offset: 0 | | | | PLAN FRAGMENT 2 | | | | PARTITION: HASH_PARTITIONED: time[#3] | | | | STREAM DATA SINK | | EXCHANGE ID: 02 | | HASH_PARTITIONED: mv_advertiser[#7], mv_channel[#8] | | | | 1:VAGGREGATE (update serialize) | | | STREAMING | | | output: partial_bitmap_union_count(mva_BITMAP_UNION__to_bitmap_with_check(cast(user_id as BIGINT))[#2])[#9] | | | group by: mv_advertiser[#0], mv_channel[#1] | | | cardinality=1 | | | | | 0:VOlapScanNode | | TABLE: default_cluster:test.advertiser_view_record(advertiser_uv), PREAGGREGATION: ON | | partitions=1/1, tablets=10/10, tabletList=13531,13533,13535 ... | | cardinality=1, avgRowSize=2745.0, numNodes=1 | | pushAggOp=NONE | +-------------------------------------------------------------------------------------------------------------------------------------------------+ 49 rows in set (0.11 sec)In the EXPLAIN result, first you can see that the OlapScanNode's
<span style="color:rgb(64, 64, 64)">rollup</span>attribute value is<span style="color:rgb(64, 64, 64)">advertiser_uv</span>. This indicates that the query will directly scan data from the materialized view, confirming a successful match.Additionally, for the
<span style="color:rgb(64, 64, 64)">user_id</span>field,<span style="color:rgb(64, 64, 64)">count(distinct)</span>is rewritten as<span style="color:rgb(64, 64, 64)">bitmap_union_count(to_bitmap)</span>. This indicates that the query will use the Bitmap method to achieve exact deduplication.
FAQ
Q: Error: DATA_QUALITY_ERR: "The data quality does not satisfy, please check your data."
A: Due to data quality issues or schema changes, memory usage exceeds limits, causing materialized view creation to fail. If it's a memory issue, increase the
<span style="color:rgb(64, 64, 64)">memory_limitation_per_thread_for_schema_change_bytes</span>parameter.ImportantBitmap type only supports positive integers. If there are negative numbers in the original data, materialized view creation will fail.
For String type fields, you can use bitmap_hash or bitmap_hash64 to calculate the Hash value and return the bitmap of the Hash value.