Materialized Table Overview for Real-Time Data Processing - Realtime Compute for Apache Flink

This topic describes the major updates in the version of Realtime Compute for Apache Flink released on December 20, 2024.

Important

The upgrade is rolled out incrementally across the network using a canary release plan. You can use the new features in this version only after the upgrade is complete for your account. To apply for an early upgrade, submit a ticket.

Overview

This release introduces materialized tables, a unified stream and batch processing feature. You define the required data freshness in Flink SQL, and Flink automatically manages the underlying refresh pipelines. You no longer need to write separate streaming and batch job logic or manually manage job transitions between the two modes.

Why materialized tables

Different business scenarios place different demands on data timeliness:

Scenario	Required freshness
Risk control	Milliseconds to seconds
User profiling and real-time recommendations	Minutes
BI reporting and historical data analytics (year-on-year and month-on-month comparisons)	Days

The traditional data warehouse architectures, Kappa and Lambda, each address part of this spectrum, but neither provides a consistent development experience across all freshness levels. Maintaining separate streaming and batch pipelines increases complexity and risks inconsistent data processing logic between the two paths.

Realtime Compute for Apache Flink solves this with materialized tables. Built on Apache Paimon's integrated stream-batch storage, materialized tables let you:

Define data freshness using Flink SQL
Have Flink attempt to refresh data at the defined interval
Streamline ETL processes
Transition jobs seamlessly between stream and batch modes
Apply cascading updates across dependent tables
Improve data update efficiency significantly
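
As a minimal sketch of defining freshness in Flink SQL, the statements below use the CREATE MATERIALIZED TABLE syntax documented for open-source Apache Flink; the syntax accepted by Realtime Compute for Apache Flink may differ. The table and column names (orders, orders_by_region, orders_daily_summary, region, amount) are hypothetical, and the current catalog is assumed to be an Apache Paimon catalog.

```sql
-- Hypothetical source table `orders` with columns `region` and `amount`.
-- Assumption: the current catalog is an Apache Paimon catalog.

-- Minute-level freshness: the result should lag the source by about 3 minutes at most.
CREATE MATERIALIZED TABLE orders_by_region
FRESHNESS = INTERVAL '3' MINUTE
AS SELECT
  region,
  COUNT(*) AS order_cnt,
  SUM(amount) AS total_amount
FROM orders
GROUP BY region;

-- Day-level freshness on a dependent table that reads from the table above.
CREATE MATERIALIZED TABLE orders_daily_summary
FRESHNESS = INTERVAL '1' DAY
AS SELECT
  SUM(order_cnt) AS order_cnt,
  SUM(total_amount) AS total_amount
FROM orders_by_region;
```

Only the FRESHNESS value differs in kind between the two statements. In open-source Flink, a short interval typically results in a continuously running streaming refresh job and a long interval in scheduled batch refreshes, unless REFRESH_MODE is set explicitly.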

Use cases

Materialized tables are suited for:

Consistent data processing logic: the separate stream and batch paths of a Lambda architecture cannot ensure that both apply the same processing logic
Real-time statistics on offline reports: offline reports also require statistics that are computed in real time
Real-time dashboards backed by historical data: real-time dashboard applications rely on historical data for accuracy