This topic describes how to use partitioned materialized views to meet different business requirements.
Background
StarRocks asynchronous materialized views support various partitioning strategies and functions to provide the following benefits:
Incremental build: When configuring a partitioned materialized view, you can flexibly set partition refresh tasks to execute in batches, effectively avoiding resource overload issues caused by parallel processing of all partitions.
Incremental refresh: Configure the refresh policy to update only the materialized view partitions that correspond to changed partitions in the base table. This partition-level approach significantly reduces resource consumption compared to refreshing the entire view.
Partial materialization: You can set TTL for materialized view partitions to achieve partial data materialization.
Transparent query rewrite: Queries can be transparently rewritten based only on the latest materialized view partitions. Expired partitions will not participate in the query plan, and corresponding queries will be executed directly on the base table, ensuring data consistency.
Limits
Partitioned materialized views can only be created on partitioned base tables (typically fact tables). You must define a mapping between the base table's partitions and the materialized view's partitions.
Currently, StarRocks supports building partitioned materialized views on tables in the following data sources:
OLAP tables in StarRocks Default Catalog
Supported partitioning strategies: Range partitioning.
Supported partition key data types: INT, DATE, DATETIME and STRING.
Supported table types: Primary key tables, detail tables, aggregate tables, and update tables.
Supports both storage-compute integrated and storage-compute separated instances.
Tables in Hive Catalog, Hudi Catalog, Iceberg Catalog, and Paimon Catalog
Supported partition level: First-level partitioning only.
Supported partition key data types: INT, DATE, DATETIME, and STRING.
Creating partitioned materialized views on non-partitioned base tables is not supported.
StarRocks OLAP table specific limitations:
List partitioning and expression partitioning are not currently supported.
Two adjacent partitions of the base table must have continuous ranges.
For multi-level partitioned base tables in External Catalog, only the first-level partition path can be used to create materialized views. For example, for tables with the format
yyyyMMdd/hour, only materialized views withyyyyMMddas the partition can be created.Starting from v3.2.3, StarRocks supports creating partitioned materialized views on Iceberg tables that use Partition Transforms, and the materialized view will be partitioned based on the transformed columns.
Use cases
Assume there are three tables with the following structures and partition designs.
CREATE TABLE IF NOT EXISTS par_tbl1 (
datekey DATE, -- DATE type date column used as partition key.
k1 STRING,
v1 INT,
v2 INT
)
ENGINE=olap
PARTITION BY RANGE (datekey) (
START ("2021-01-01") END ("2021-01-04") EVERY (INTERVAL 1 DAY)
)
DISTRIBUTED BY HASH(k1);
CREATE TABLE IF NOT EXISTS par_tbl2 (
datekey STRING, -- STRING type date column used as partition key.
k1 STRING,
v1 INT,
v2 INT
)
ENGINE=olap
PARTITION BY RANGE (str2date(datekey, '%Y-%m-%d')) (
START ("2021-01-01") END ("2021-01-04") EVERY (INTERVAL 1 DAY)
)
DISTRIBUTED BY HASH(k1);
CREATE TABLE IF NOT EXISTS par_tbl3 (
datekey_new DATE, -- Equivalent to par_tbl1.datekey column.
k1 STRING,
v1 INT,
v2 INT
)
ENGINE=olap
PARTITION BY RANGE (datekey_new) (
START ("2021-01-01") END ("2021-01-04") EVERY (INTERVAL 1 DAY)
)
DISTRIBUTED BY HASH(k1);One-to-One partition mapping
You can create a materialized view with partitions that correspond one-to-one with the base table partitions by using the same partition key.
Create materialized views based on the type of partition key:
If the base table partition key is DATE or DATETIME type, you can directly specify the same partition key for the materialized view.
Syntax
PARTITION BY <base_table_partitioning_column>Example
CREATE MATERIALIZED VIEW par_mv1 REFRESH ASYNC PARTITION BY datekey AS SELECT k1, sum(v1) AS SUM, datekey FROM par_tbl1 GROUP BY datekey, k1;
If the base table partition key is STRING type, you can use the str2date function to convert the date string to DATE or DATETIME type.
Syntax
PARTITION BY str2date(<base_table_partitioning_column>, <format>)Example
CREATE MATERIALIZED VIEW par_mv2 REFRESH ASYNC PARTITION BY str2date(datekey, '%Y-%m-%d') AS SELECT k1, sum(v1) AS SUM, datekey FROM par_tbl2 GROUP BY datekey, k1;
Time granularity roll-up aligned partitions
You can create a materialized view with a coarser time granularity partition than the base table by using the date_trunc function on the partition key. When data changes are detected in the base table partitions, StarRocks will refresh the corresponding rolled-up partition in the materialized view.
If the base table partition key is DATE or DATETIME type, you can directly use the date_trunc function on the base table partition key.
Syntax
PARTITION BY date_trunc(<format>, <base_table_partitioning_column>)Example
CREATE MATERIALIZED VIEW par_mv3 REFRESH ASYNC PARTITION BY date_trunc('month', datekey) AS SELECT k1, sum(v1) AS SUM, datekey FROM par_tbl1 GROUP BY datekey, k1;
If the base table partition key is STRING type, you must convert the base table partition key to DATE or DATETIME type in the SELECT List, set an alias for it, and then use the date_trunc function and specify the result as the materialized view partition key.
Syntax
PARTITION BY date_trunc(<format>, <mv_partitioning_column>) AS SELECT str2date(<base_table_partitioning_column>, <format>) AS <mv_partitioning_column>Example
CREATE MATERIALIZED VIEW par_mv4 REFRESH ASYNC PARTITION BY date_trunc('month', mv_datekey) AS SELECT datekey, k1, sum(v1) AS SUM, str2date(datekey, '%Y-%m-%d') AS mv_datekey FROM par_tbl2 GROUP BY datekey, k1;
Custom time granularity aligned partitions
If your business scenario requires partitioning with custom time granularity, you can create a materialized view using the time_slice or date_slice function to define the time granularity of its partitions. Both functions can convert a given time to the start or end time of its time granularity period based on the specified time granularity cycle.
You need to use the time_slice or date_slice function on the base table partition key in the SELECT List to define a new time granularity, set an alias for it, and then specify the materialized view partition key in combination with the date_trunc function to create a materialized view with custom partition time granularity.
Syntax
PARTITION BY date_trunc(<format>, <mv_partitioning_column>) AS SELECT -- You can use either time_slice or date_slice function. date_slice(<base_table_partitioning_column>, <interval>) AS <mv_partitioning_column>Example
CREATE MATERIALIZED VIEW par_mv5 REFRESH ASYNC PARTITION BY date_trunc('day', mv_datekey) AS SELECT k1, sum(v1) AS SUM, date_slice(datekey, INTERVAL 5 HOUR) AS mv_datekey FROM par_tbl1 GROUP BY datekey, k1;
Multi-base table aligned partitions
This scenario is supported from v3.3 onwards.
If multiple base tables can have aligned partitions, meaning the base tables use the same type of partition key, you can create a materialized view based on these tables. You can use JOIN to connect the base tables and set the partition key as the common column. You can also use UNION to connect the base tables. Base tables with aligned partitions are called Reference Tables. Data changes in any Reference Table will trigger a refresh task for the corresponding materialized view partition.
-- Use JOIN to connect base tables.
CREATE MATERIALIZED VIEW par_mv6
REFRESH ASYNC
PARTITION BY datekey
AS SELECT
par_tbl1.datekey,
par_tbl1.k1 AS t1k1,
par_tbl3.k1 AS t2k1,
sum(par_tbl1.v1) AS SUM1,
sum(par_tbl3.v1) AS SUM2
FROM par_tbl1 JOIN par_tbl3 ON par_tbl1.datekey = par_tbl3.datekey_new
GROUP BY par_tbl1.datekey, t1k1, t2k1;
-- Use UNION to connect base tables.
CREATE MATERIALIZED VIEW par_mv7
REFRESH ASYNC
PARTITION BY datekey
AS SELECT
par_tbl1.datekey,
par_tbl1.k1 AS t1k1,
sum(par_tbl1.v1) AS SUM1
FROM par_tbl1
GROUP BY
par_tbl1.datekey,
par_tbl1.k1
UNION ALL
SELECT
par_tbl3.datekey_new,
par_tbl3.k1 AS t2k1,
sum(par_tbl3.v1) AS SUM2
FROM par_tbl3
GROUP BY
par_tbl3.datekey_new,
par_tbl3.k1;Implementing incremental refresh and transparent rewrite
You can create a partitioned materialized view to achieve incremental updates through partition refresh and transparent query rewriting through partial data materialization.
When creating a materialized view, you must consider the following aspects:
Refresh granularity
You can specify the granularity of each refresh operation through the
partition_refresh_numberproperty. This property controls the maximum number of partitions refreshed in a single refresh operation. If the number of partitions to be refreshed exceeds this value, StarRocks automatically splits the refresh task and completes it in batches, with partitions refreshed in order from oldest to newest (excluding future partitions). The default value ofpartition_refresh_numberis -1, indicating that the refresh task is not split.Materialization range
The retention range of materialized data is controlled by the
partition_ttl_number(in versions before v3.1.5) orpartition_ttl(recommended for v3.1.5 and higher versions) property.partition_ttl_numberspecifies the number of latest partitions to retain, whilepartition_ttlsets the time range of materialized view data. In each refresh, StarRocks arranges partitions in chronological order and only retains partitions that meet the TTL requirements.Refresh strategy
Auto-refresh (
REFRESH ASYNC): The materialized view is automatically refreshed when the base table data changes.Scheduled refresh (
REFRESH ASYNC [START (<start_time>)] EVERY (INTERVAL <interval>)): Periodically refreshes the materialized view based on the set interval.NoteAuto-refresh and scheduled refresh are executed automatically after the refresh task is triggered. StarRocks records and compares the data version of each partition in the base table, and a change in the data version indicates that the corresponding data has been updated. Once a data change is detected in the base table partition, StarRocks will refresh the corresponding materialized view partition. If there is no data change, the refresh of that partition is skipped.
Manual refresh (
REFRESH MANUAL): In this mode, you need to manually execute theREFRESH MATERIALIZED VIEWstatement to refresh. You can specify the time range for refresh to avoid refreshing the entire materialized view. IfFORCEis specified in the statement, StarRocks will force refresh the corresponding view or partition regardless of whether the data in the base table has changed. By addingWITH SYNC MODEto the statement, the refresh task can be executed synchronously, returning only the success or failure result of the task.
The following example shows how to create a partitioned materialized view named par_mv8. This view is designed to automatically refresh the corresponding partition when data updates occur in the base table's partition. The refresh operation is performed in batches, with only one partition at a time (set by "partition_refresh_number" = "1"), and to control storage overhead, the materialized view only maintains the two most recent partitions (implemented by setting "partition_ttl_number" = "2"), with excess old partitions automatically deleted during the refresh cycle.
CREATE MATERIALIZED VIEW par_mv8
REFRESH ASYNC
PARTITION BY datekey
PROPERTIES (
"partition_ttl_number" = "2",
"partition_refresh_number" = "1"
) AS
SELECT k1, SUM(v1) AS total_v1, datekey
FROM par_tbl1
GROUP BY datekey, k1;You can use the REFRESH MATERIALIZED VIEW statement to refresh this materialized view. The following example forces a refresh of partitions within a specified time range in par_mv8 through a synchronous call.
REFRESH MATERIALIZED VIEW par_mv8
PARTITION START ('2021-01-03') END ('2021-01-04')
FORCE WITH SYNC MODE;After execution, you can view the partition refresh status on the Asynchronous Materialized View page in the EMR StarRocks Manager console.
The TTL (Time-To-Live) mechanism enables par_mv8 to automatically manage its partition lifecycle, which is critical for scenarios that focus on querying the latest data. It not only accelerates queries for recent data (such as data from the past week or month) but also effectively reduces storage usage. When a query requests data outside the materialized view's retained partitions, the optimizer automatically rewrites the query to fetch the data directly from the base table. This ensures results are always complete and correct, whether they come from the materialized view or the base table.
In the following example, Query 1 is accelerated through the materialized view because it hits a partition retained in par_mv8, while Query 2 is executed on the base table because it is not within the time range of retained partitions.
-- Query 1
SELECT
k1,
sum(v1) AS SUM,
datekey
FROM par_tbl1
WHERE datekey='2021-01-04'
GROUP BY datekey, k1;
-- Query 2
SELECT
k1,
sum(v1) AS SUM,
datekey
FROM par_tbl1
WHERE datekey='2021-01-01'
GROUP BY datekey, k1;The following figure demonstrates the role of materialized views in database query performance, improving response speed by caching partial results.