Realtime Compute for Apache Flink: Materialized tables

Last Updated: Jun 17, 2026

Traditional data warehouse architectures such as Lambda and Kappa face three main challenges: high maintenance costs from running separate batch and streaming frameworks, wasted storage from duplicate copies of data, and consistency risks when processing logic drifts apart across layers. Materialized tables in Realtime Compute for Apache Flink address these issues. You provide a query and a data freshness target (from daily to every few minutes); Flink derives the table schema from the query and creates a continuously refreshing data pipeline that aims to meet the target. Because batch and stream processing share a single path, materialized tables eliminate redundant data copies and keep processing logic and schemas consistent end to end, which simplifies maintenance of a real-time data warehouse.

Core concepts

Data freshness

  • Definition: Data freshness is a crucial attribute of the materialized table. It defines the maximum acceptable lag between a materialized table and its base tables. It is a best-effort target, not a guarantee. Flink uses this value to determine the refresh frequency of automated data pipelines.

  • Purposes:

    • Determines the refresh mode: Continuous or Full.

    • Balances data freshness against resource consumption. For example, minute-level freshness suits real-time dashboards, while daily or hourly freshness suits batch analytics.

Refresh mode

Materialized tables support two refresh modes: Continuous and Full.

| Refresh mode | Description | Visibility | Applicable scenario |
| --- | --- | --- | --- |
| Continuous mode | Incrementally updates the materialized table through a streaming job. | Updates are visible either immediately for low latency, or after checkpoint completion for consistency. | Ideal for real-time applications such as risk control or real-time recommendation. |
| Full mode | A scheduler periodically triggers a batch job (daily or hourly) to fully overwrite the materialized table. By default, overwriting occurs at the table level. If partition fields such as time partitions are defined, overwriting happens at the partition level, refreshing only the latest partition each time. | Data is visible after a full refresh is complete. | Suitable for scenarios such as backfilling historical data and generating regular reports. |
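
As an illustration of Full mode with partition-level overwriting, the following is a minimal sketch that assumes the open-source Apache Flink materialized table syntax. The source table `orders`, its columns, and the table name `daily_user_totals` are hypothetical. The `REFRESH_MODE` clause and the `partition.fields.ds.date-formatter` option come from the open-source Flink documentation, so confirm that your engine version supports them.

```sql
-- Hypothetical source table: orders(user_id, amount, order_time).
CREATE MATERIALIZED TABLE daily_user_totals
PARTITIONED BY (ds)
WITH (
  -- Describes how the time partition value is formatted, so that a
  -- scheduled refresh can target the latest partition only.
  'partition.fields.ds.date-formatter' = 'yyyy-MM-dd'
)
FRESHNESS = INTERVAL '1' DAY
REFRESH_MODE = FULL                 -- explicit here; otherwise derived from FRESHNESS
AS SELECT
  DATE_FORMAT(order_time, 'yyyy-MM-dd') AS ds,
  user_id,
  SUM(amount) AS total_amount
FROM orders
GROUP BY DATE_FORMAT(order_time, 'yyyy-MM-dd'), user_id;
```

Without the `PARTITIONED BY` clause, each scheduled run would overwrite the whole table instead of a single partition.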

Query definition

You can use any Flink SQL query to define the data source and computation logic.

Dynamic update:

  • In Continuous mode, query results are populated to the materialized table in real time.

  • In Full mode, query results overwrite the materialized table to ensure accuracy.

Schema

Column names and types are automatically derived from the query, with no manual declaration required.

Optionally, you can also:

  • Explicitly declare primary keys to optimize query performance.

  • Define partition keys (like time) to organize data in layers, improving refresh efficiency.
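
The schema options above can be sketched as follows, assuming the open-source Apache Flink syntax, in which only a primary key constraint may appear in the column list. The source table `orders` and all column names are hypothetical, and the sketch assumes the key columns are non-null in the query result.

```sql
-- Hypothetical source table:
--   orders(order_id BIGINT NOT NULL, user_id BIGINT,
--          amount DECIMAL(10, 2), order_time TIMESTAMP(3) NOT NULL)
CREATE MATERIALIZED TABLE dwd_orders (
  -- Only the constraint is declared. Column names and types are still
  -- derived from the SELECT list below.
  -- Including the partition column in the key is an assumption made for
  -- storage formats that require it.
  PRIMARY KEY (ds, order_id) NOT ENFORCED
)
PARTITIONED BY (ds)
FRESHNESS = INTERVAL '5' MINUTE
AS SELECT
  DATE_FORMAT(order_time, 'yyyy-MM-dd') AS ds,
  order_id,
  user_id,
  amount
FROM orders;
```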

How materialized tables work

When you create a materialized table, you must specify the FRESHNESS parameter and the AS <select_statement> clause. The Flink engine automatically derives and registers the table schema in a catalog, and creates a streaming or batch refresh job based on the FRESHNESS value.
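
A minimal sketch of such a statement, assuming the open-source Apache Flink syntax, looks like this. The source table `orders`, its columns, and the table name `user_order_totals` are hypothetical.

```sql
-- Hypothetical source table: orders(order_id, user_id, amount, order_time).
CREATE MATERIALIZED TABLE user_order_totals
FRESHNESS = INTERVAL '3' MINUTE     -- target maximum lag behind the base table
AS SELECT
  user_id,
  COUNT(*)    AS order_cnt,
  SUM(amount) AS total_amount
FROM orders
GROUP BY user_id;
```

No column list is written here: the columns `user_id`, `order_cnt`, and `total_amount` and their types come from the query, and the refresh job is created from the `FRESHNESS` value.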

[Figure: example pipeline of source tables and dependent materialized tables]

For example, if materialized table C has a freshness of 30 minutes, Flink tries to refresh C within 30 minutes of each update to its source table A. Downstream materialized tables such as E and F must use a freshness value that is a positive integer multiple of C's freshness, such as 60 or 90 minutes. Increasing the freshness value (for example, from a few minutes to several hours, up to a maximum of 1 day) reduces the refresh frequency and lowers resource consumption.
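
The relationship between upstream and downstream freshness can be sketched as follows, assuming the open-source Apache Flink syntax. The lowercase table names `a`, `c`, and `e` stand in for tables A, C, and E, and the columns are hypothetical.

```sql
-- Table C: aims to stay within 30 minutes of source table A.
CREATE MATERIALIZED TABLE c
FRESHNESS = INTERVAL '30' MINUTE
AS SELECT
  user_id,
  SUM(amount) AS total_amount
FROM a
GROUP BY user_id;

-- Table E: reads from C, so its freshness is a multiple of C's.
-- 1 hour = 60 minutes = 2 x 30 minutes.
CREATE MATERIALIZED TABLE e
FRESHNESS = INTERVAL '1' HOUR
AS SELECT
  COUNT(*) AS active_users
FROM c
WHERE total_amount > 0;
```

The 60-minute target is written as `INTERVAL '1' HOUR` because open-source Flink restricts minute values in Full mode. Whether `INTERVAL '60' MINUTE` is also accepted depends on the engine version.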

Scenarios

By unifying batch and stream processing, materialized tables offer technical and cost advantages in the following use cases:

  • Backfilling historical data.

    Late or delayed data, caused by issues such as transmission latency, can leave final results partially incorrect. Correcting historical data traditionally requires a separate batch job. Materialized tables support on-demand refresh, allowing you to manually trigger a refresh for a specific table and all its downstream dependents.

  • Unifying data processing logic and table schemas.

    In the Lambda architecture, historical and real-time data reside in separate systems, making it difficult to align processing logic and table schemas. Materialized tables store only a single copy of the data, eliminating complex joins and computations. This improves storage efficiency while unifying batch and stream processing logic and the schemas for historical and real-time data.

  • Building dynamic dashboards with adaptable data freshness.

    Dynamic dashboards often require different data freshness levels across business scenarios. Materialized tables let you adjust refresh intervals, from daily to every few seconds, by modifying the freshness value, without building and maintaining separate real-time pipelines.
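
For the backfill scenario above, a manual refresh of one partition can be sketched with the open-source Apache Flink statement below. The table name `daily_user_totals`, the partition column `ds`, and the date value are hypothetical. The statement targets a single table, so treat the refresh of downstream dependents as a feature of the product's console rather than of this statement.

```sql
-- Recompute one partition of a hypothetical partitioned materialized table
-- after late data has arrived for that day.
ALTER MATERIALIZED TABLE daily_user_totals
  REFRESH PARTITION (ds = '2024-06-28');
```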

Use materialized tables

| References | Description |
| --- | --- |
| Create and use materialized tables | Learn how to create a materialized table, backfill historical data, change data freshness, and view data lineage. |
| Materialized Table (Build a Unified Stream-Batch Lakehouse) | Learn how to use materialized tables and Apache Paimon tables to build a stream-batch integrated data lakehouse, and how to adjust freshness to switch from batch to streaming execution modes for real-time data updates. |

References