SelectDB materialized view overview - ApsaraDB for SelectDB

Use cases

Materialized views are primarily used in the following three scenarios:

Query acceleration: For frequent queries involving complex aggregations and multi-table JOINs, you can pre-compute and materialize the results. This allows queries to read the materialized data directly, avoiding repeated calculations and significantly reducing query latency.
Lightweight ETL (data processing): A materialized view can act as a lightweight ETL tool to pre-aggregate and transform raw data. This provides a processed data layer for downstream reports or analyses, enabling the construction of a layered data warehouse.
Lakehouse acceleration: For external data stored in a data lake (such as Hive or Iceberg), you can use an async materialized view to cache frequently queried results in SelectDB's local storage. This leverages SelectDB's high-performance engine to accelerate data lake queries.

Types of materialized views

SelectDB provides two types of materialized views for different use cases:

Sync materialized view

A sync materialized view updates in real time as data is written to the base table, guaranteeing strong consistency between the materialized view and the base table. Its features include:

Real-time consistency: Every write operation to the base table synchronously updates the materialized view, ensuring that query results are always consistent with the base table.
Single-table only: A sync materialized view can only be created on a single base table and does not support multi-table JOINs.
Automatic transparent rewriting: The query optimizer can automatically identify and use a sync materialized view to accelerate a query without requiring any changes to the SQL statement.
Applicable scenarios: Aggregation acceleration queries that require high data freshness, such as real-time dashboards and online reports.

Async materialized view

An async materialized view uses an asynchronous refresh mechanism. A flexible combination of a refresh method (full or partitioned incremental) and a trigger method (manual, scheduled, or on commit) meets various data freshness requirements. Its features include:

Multi-table JOIN support: An async materialized view can be created on multiple tables, including internal and external tables, supporting pre-computation of complex JOIN queries.
Flexible refresh strategies: It supports three trigger methods: manual refresh (ON MANUAL), scheduled refresh (ON SCHEDULE), and refresh on commit (ON COMMIT). You can configure these flexibly based on data freshness requirements.
Lakehouse support: Supports creating materialized views on external data sources like Hive and Iceberg to accelerate data lake queries.
Transparent rewriting: The query optimizer supports transparent rewriting based on the SPJG algorithm. It automatically rewrites the query to use the materialized view without modifying the business SQL.
Applicable scenarios: Analytical scenarios with relatively relaxed data freshness requirements (e.g., T+1) and high query complexity, such as data warehouse layering and multi-dimensional reporting.

Sync vs. async materialized views

Dimension	Sync materialized view	Async materialized view
Data consistency	Strong consistency (real-time sync)	Eventual consistency (asynchronous refresh)
Number of supported base tables	Single table only	Multiple tables (including JOINs)
External table support	Not supported	Supported (Hive, Iceberg, etc.)
Refresh method	Synchronously updated on write	Full refresh (`COMPLETE`) or partitioned incremental refresh (`AUTO`)
Trigger method	Automatic (synchronously triggered on write)	Manual refresh (`ON MANUAL`), scheduled refresh (`ON SCHEDULE`), or refresh on commit (`ON COMMIT`)
Transparent rewriting	Supported	Supported (SPJG algorithm)
Applicable scenarios	High data freshness requirements, single-table aggregation acceleration	T+1 analytics, multi-table JOIN pre-computation, Lakehouse acceleration

Async materialized view refresh mechanism

The refresh of an async materialized view is determined by two independent dimensions: the refresh method (the scope of each refresh) and the trigger method (when to trigger a refresh). You can combine these as needed and specify them using the REFRESH clause when creating a materialized view.

Refresh method

The refresh method determines the data scope for each refresh:

Refresh method	Refresh scope	Description
Full refresh (`COMPLETE`)	All data	Each refresh recalculates all data for the materialized view based on its defining query and overwrites the existing data. This is suitable for non-partitioned tables or scenarios requiring a full data rebuild.
Partitioned incremental refresh (`AUTO`)	Changed partitions	Refreshes only the partitions where data has changed since the last refresh, resulting in lower overhead. If the system cannot detect partition changes, it automatically falls back to a full refresh. This is ideal for partitioned tables.

Trigger method

The trigger method determines when a refresh task starts. The following three trigger methods can be combined with any of the refresh methods described above:

Trigger method	Trigger time	Description	Scenarios
Manual refresh (`ON MANUAL`)	User-initiated	Manually trigger a refresh by using the `REFRESH MATERIALIZED VIEW` statement.	Scenarios that require precise control over refresh timing.
Scheduled refresh (`ON SCHEDULE`)	At a set interval	Automatically triggers a refresh at a specified interval (e.g., minute, hour, or day).	Scenarios with periodic data updates, such as daily reports.
Refresh on commit (`ON COMMIT`)	After base table data changes	Automatically triggers a refresh after data is committed to the base table, providing the lowest data latency.	Scenarios where the base table changes infrequently, but data freshness is a high priority.

Note

The refresh method and trigger method are two independent dimensions. For example, you can configure a "full refresh triggered on a schedule" to rebuild the entire view at a set time each day, or a "partitioned incremental refresh triggered on commit" to refresh only the affected partitions after the base table is updated.