A materialized view is a database object that pre-computes and stores query results to significantly boost the performance of complex queries. SelectDB provides two types of materialized views, a sync materialized view and an async materialized view, each suited for different business scenarios.
Use cases
Materialized views are primarily used in the following three scenarios:
-
Query acceleration: For frequent queries involving complex aggregations and multi-table JOINs, you can pre-compute and materialize the results. This allows queries to read the materialized data directly, avoiding repeated calculations and significantly reducing query latency.
-
Lightweight ETL (data processing): A materialized view can act as a lightweight ETL tool to pre-aggregate and transform raw data. This provides a processed data layer for downstream reports or analyses, enabling the construction of a layered data warehouse.
-
Lakehouse acceleration: For external data stored in a data lake (such as Hive or Iceberg), you can use an async materialized view to cache frequently queried results in SelectDB's local storage. This leverages SelectDB's high-performance engine to accelerate data lake queries.
Types of materialized views
SelectDB provides two types of materialized views for different use cases:
Sync materialized view
A sync materialized view updates in real time as data is written to the base table, guaranteeing strong consistency between the materialized view and the base table. Its features include:
-
Real-time consistency: Every write operation to the base table synchronously updates the materialized view, ensuring that query results are always consistent with the base table.
-
Single-table only: A sync materialized view can only be created on a single base table and does not support multi-table JOINs.
-
Automatic transparent rewriting: The query optimizer can automatically identify and use a sync materialized view to accelerate a query without requiring any changes to the SQL statement.
-
Applicable scenarios: Aggregation acceleration queries that require high data freshness, such as real-time dashboards and online reports.
Async materialized view
An async materialized view uses an asynchronous refresh mechanism. A flexible combination of a refresh method (full or partitioned incremental) and a trigger method (manual, scheduled, or on commit) meets various data freshness requirements. Its features include:
-
Multi-table JOIN support: An async materialized view can be created on multiple tables, including internal and external tables, supporting pre-computation of complex JOIN queries.
-
Flexible refresh strategies: It supports three trigger methods: manual refresh (
ON MANUAL), scheduled refresh (ON SCHEDULE), and refresh on commit (ON COMMIT). You can configure these flexibly based on data freshness requirements. -
Lakehouse support: Supports creating materialized views on external data sources like Hive and Iceberg to accelerate data lake queries.
-
Transparent rewriting: The query optimizer supports transparent rewriting based on the SPJG algorithm. It automatically rewrites the query to use the materialized view without modifying the business SQL.
-
Applicable scenarios: Analytical scenarios with relatively relaxed data freshness requirements (e.g., T+1) and high query complexity, such as data warehouse layering and multi-dimensional reporting.
Sync vs. async materialized views
|
Dimension |
Sync materialized view |
Async materialized view |
|
Data consistency |
Strong consistency (real-time sync) |
Eventual consistency (asynchronous refresh) |
|
Number of supported base tables |
Single table only |
Multiple tables (including JOINs) |
|
External table support |
Not supported |
Supported (Hive, Iceberg, etc.) |
|
Refresh method |
Synchronously updated on write |
Full refresh ( |
|
Trigger method |
Automatic (synchronously triggered on write) |
Manual refresh ( |
|
Transparent rewriting |
Supported |
Supported (SPJG algorithm) |
|
Applicable scenarios |
High data freshness requirements, single-table aggregation acceleration |
T+1 analytics, multi-table JOIN pre-computation, Lakehouse acceleration |
Async materialized view refresh mechanism
The refresh of an async materialized view is determined by two independent dimensions: the refresh method (the scope of each refresh) and the trigger method (when to trigger a refresh). You can combine these as needed and specify them using the REFRESH clause when creating a materialized view.
Refresh method
The refresh method determines the data scope for each refresh:
|
Refresh method |
Refresh scope |
Description |
|
Full refresh ( |
All data |
Each refresh recalculates all data for the materialized view based on its defining query and overwrites the existing data. This is suitable for non-partitioned tables or scenarios requiring a full data rebuild. |
|
Partitioned incremental refresh ( |
Changed partitions |
Refreshes only the partitions where data has changed since the last refresh, resulting in lower overhead. If the system cannot detect partition changes, it automatically falls back to a full refresh. This is ideal for partitioned tables. |
Trigger method
The trigger method determines when a refresh task starts. The following three trigger methods can be combined with any of the refresh methods described above:
|
Trigger method |
Trigger time |
Description |
Scenarios |
|
Manual refresh ( |
User-initiated |
Manually trigger a refresh by using the |
Scenarios that require precise control over refresh timing. |
|
Scheduled refresh ( |
At a set interval |
Automatically triggers a refresh at a specified interval (e.g., minute, hour, or day). |
Scenarios with periodic data updates, such as daily reports. |
|
Refresh on commit ( |
After base table data changes |
Automatically triggers a refresh after data is committed to the base table, providing the lowest data latency. |
Scenarios where the base table changes infrequently, but data freshness is a high priority. |
The refresh method and trigger method are two independent dimensions. For example, you can configure a "full refresh triggered on a schedule" to rebuild the entire view at a set time each day, or a "partitioned incremental refresh triggered on commit" to refresh only the affected partitions after the base table is updated.