Real-Time Data Warehouse Architecture with Stream-Batch Unification - AnalyticDB for PostgreSQL

AnalyticDB for PostgreSQL lets you ingest, process, and analyze data in a single system — using one copy of data, one write operation, one system, and one set of SQL scripts. The foundation is real-time materialized views, which replace external scheduling systems and stream compute engines with declarative, auto-refreshing SQL pipelines.

When to use real-time data warehouses

Real-time data warehouses are a good fit when:

Your business requires sub-minute data freshness for dashboards, reports, or decision systems.
You want to eliminate a separate scheduling system and stream processing engine from your data stack.
Your ETL logic is too complex for pure stream processing — for example, joins across dozens of tables, multi-layer aggregations, or access to historical data beyond a time window.
You want a single SQL codebase that handles both real-time incremental updates and periodic batch refreshes.

How it works

Building a real-time data warehouse involves three stages:

Real-time data writing — Write multi-source data (real-time and batch) in parallel with high performance and low latency. Written data is immediately visible for querying.
Real-time data processing — Use real-time materialized views to run extract, transform, load (ETL) operations in incremental streaming mode. Real-time materialized views support data cleansing, field conversion, JOIN, GROUP BY, and UNION ALL. Unlike standard materialized views, they refresh automatically when base table data changes — no REFRESH statements required. Chain multiple real-time materialized views to build a complete ETL processing pipeline: when the base table changes, each view in the chain updates automatically.
Real-time data analysis — Query the materialized views directly for ad hoc queries, report displays, and real-time dashboards. AnalyticDB for PostgreSQL uses a vectorization execution engine based on single instruction, multiple data (SIMD) instruction sets, a cost-based optimizer (CBO), and column-oriented storage to deliver efficient analytical performance.

For complex data with large volumes, nonstandard formats, or high dependency on intermediate tables, AnalyticDB for PostgreSQL supports the standard hierarchical architecture: operational data store (ODS), data warehouse detail (DWD), data warehouse service (DWS), and application data service (ADS) layers.

Key capabilities

Simple architecture, lower costs

All data writing, processing, and analysis happens inside one system. This eliminates the need to maintain separate pipelines for moving data between systems, reducing both development effort and O&M complexity.

Benefit	Detail
Cost-effective	A single copy of data written to a single system — no duplication across multiple stores
Simple development	One set of SQL scripts covers the entire pipeline; no multi-system integration required
High performance	Incremental processing keeps resource consumption low and latency minimal
Simple O&M	One system to monitor; data anomalies are easy to trace and correct

Scheduling-free real-time ETL

Traditional batch ETL requires a scheduling system to manage job dependencies and trigger refreshes. With real-time materialized views, data dependencies are resolved automatically: when data is written to a base table, all downstream real-time materialized views in the chain update incrementally.

Complex SQL without stream compute limits

Stream processing engines impose significant constraints on SQL — many don't support joins against large offline dimension tables, multi-layer table nesting, or historical data lookback. Real-time materialized views in AnalyticDB for PostgreSQL are built on top of a database engine backed by disk storage, so they support complex join relationships across dozens of tables without data window limits. This makes it practical to convert complex batch processing jobs into real-time pipelines at lower cost and effort.

No data window limits

Because real-time materialized views are implemented on a disk-based database engine (rather than memory-based stream compute), they support joins over ultra-large tables and backtracking across all historical data. Updates to raw data are automatically reflected throughout the real-time processing chain. In practice, you can create a real-time materialized view by joining dozens of tables to implement real-time incremental updates without data window limits at a latency of several seconds.

Integrated stream and batch processing

Real-time materialized views use one SQL script for both modes:

Initial load: When a real-time materialized view is first created, SQL runs in batch processing mode to compute the full result set.
Ongoing updates: After that, the result set is incrementally updated as data arrives in real time.

For mixed workloads — where some tables need real-time updates and others only need periodic refreshes — configure specific tables to be excluded from real-time updates, then execute REFRESH statements to batch refresh the result sets that are configured with scheduled updates.

Zero-ETL

AnalyticDB for PostgreSQL also provides zero-ETL. After you create an AnalyticDB for PostgreSQL instance, you can use the instance to perform parallel data import, ETL processing, ad hoc queries, and data analysis in real time.