This topic describes the major updates in the version of Realtime Compute for Apache Flink released on December 20, 2024.
The version upgrade is incrementally rolled out across the network using a canary release plan. You can use the new features in this version only after the upgrade is complete for your account. To apply for an early upgrade, submit a ticket.
Overview
This release introduces materialized tables, a unified stream and batch processing feature. You define the required data freshness in Flink SQL, and Flink automatically manages the underlying refresh pipelines. You no longer need to write separate streaming and batch job logic or manually manage job transitions between the two modes.
Why materialized tables
Different business scenarios place different demands on data timeliness:
| Scenario | Required freshness |
| --- | --- |
| Risk control | Seconds to milliseconds |
| User profiling and real-time recommendations | Minutes |
| BI reporting and historical data analytics (year-on-year and month-on-month comparisons) | Day level |
Traditional data warehouse architectures such as Kappa and Lambda each address part of this spectrum, but neither provides a consistent development experience across all freshness levels. Maintaining separate streaming and batch pipelines adds complexity and risks inconsistent processing logic between the two paths.
Realtime Compute for Apache Flink solves this with materialized tables. Built on Apache Paimon's integrated stream-batch storage, materialized tables let you:
- Define data freshness using Flink SQL
- Have Flink attempt to refresh data at the defined interval
- Streamline ETL processes
- Transition jobs seamlessly between stream and batch modes
- Apply cascading updates across dependent tables
- Improve data update efficiency significantly
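The first capability above, declaring freshness in Flink SQL, might look like the following. This is a minimal sketch that assumes the open-source Apache Flink `CREATE MATERIALIZED TABLE` syntax applies in this release; the release notes themselves do not show the statement. The table names (`orders`, `dws_shop_sales`) and all column names are hypothetical.

```sql
-- Hypothetical example: `orders`, `dws_shop_sales`, and all columns are
-- illustrative names, not taken from the release notes.
-- Assumes the current catalog supports materialized tables
-- (for example, a Paimon catalog).
CREATE MATERIALIZED TABLE dws_shop_sales
FRESHNESS = INTERVAL '3' MINUTE   -- results should lag the source by about 3 minutes at most
AS
SELECT
  shop_id,
  DATE_FORMAT(order_time, 'yyyy-MM-dd') AS ds,   -- assumes order_time is a TIMESTAMP
  COUNT(*)    AS order_cnt,
  SUM(amount) AS total_amount
FROM orders
GROUP BY shop_id, DATE_FORMAT(order_time, 'yyyy-MM-dd');
```

In this sketch the query is written once. The same definition with a coarser interval, such as `INTERVAL '1' DAY`, would presumably be served by a batch-style refresh instead, which is the stream-to-batch transition the list describes. Check the product's SQL reference for the exact options it supports.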
Use cases
Materialized tables are suited for:
- Consistent data processing logic: when a Lambda architecture cannot guarantee that its streaming and batch paths apply the same logic
- Real-time statistics on offline reports: when offline reports also need real-time statistics
- Real-time dashboards backed by historical data: when real-time dashboards depend on historical data for accuracy