Realtime Compute for Apache Flink: Streaming lakehouse with Paimon

Last Updated: Jun 02, 2026

Apache Paimon is a lakehouse storage format that unifies streaming and batch data processing. Built on the log-structured merge-tree (LSM) structure, Paimon brings real-time update semantics directly into the lake layer, giving you consistent reads without sacrificing throughput. Use Paimon tables in Realtime Compute for Apache Flink to build a streaming lakehouse on cloud storage such as Object Storage Service (OSS).

Paimon integrates with Apache Flink for stream processing and Apache Spark for batch processing through a single storage format. Key capabilities include:

  • Real-time data ingestion: Ingest tens of millions of records from database change streams (such as MySQL CDC) with automatic schema change synchronization and low latency.

  • Unified stream and batch processing: Read the same Paimon table as a bounded batch source in Spark or as an unbounded changelog stream in Flink, with no format conversion required.

  • Broad ecosystem integration: Connect Paimon tables to Realtime Compute for Apache Flink, E-MapReduce (Spark, StarRocks, Hive, and Trino), and MaxCompute without data duplication.

  • Low-latency OLAP queries: Deletion vectors and primary key indexes keep streaming, batch, and online analytical processing (OLAP) query latency at the minute level.

For the full Apache Paimon specification, see Apache Paimon.

Usage

Get started with Paimon

Create a Paimon catalog

A Paimon catalog is a centralized registry for Paimon tables stored in external systems such as OSS. Other Alibaba Cloud services can access tables through the same catalog. Set up a catalog in any of the following ways:
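As a minimal sketch of one such way, a catalog can be registered with a Flink SQL `CREATE CATALOG` statement. The Python helper below only renders that statement as text. The helper `create_catalog_sql`, the catalog name, and the OSS path are hypothetical. The option keys `type`, `metastore`, and `warehouse` follow the open-source Paimon Flink catalog. Realtime Compute for Apache Flink may require further options (for example, OSS endpoint and credentials) that are not shown here.

```python
def create_catalog_sql(name: str, options: dict[str, str]) -> str:
    """Render a Flink SQL CREATE CATALOG statement from option key/value pairs."""
    # Each option becomes one 'key' = 'value' line inside the WITH clause.
    body = ",\n".join(f"  '{key}' = '{value}'" for key, value in options.items())
    return f"CREATE CATALOG `{name}` WITH (\n{body}\n);"


# Hypothetical catalog name and OSS location; replace with your own values.
ddl = create_catalog_sql(
    "my_paimon_catalog",
    {
        "type": "paimon",  # selects the Paimon catalog implementation
        "metastore": "filesystem",  # keeps table metadata alongside the data files
        "warehouse": "oss://my-bucket/warehouse",  # placeholder OSS path
    },
)
print(ddl)
```

Submitting the printed statement in a SQL draft would register the catalog under the assumptions above. Tables created in it are then stored under the `warehouse` path.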

Create a Paimon table

Write data to a Paimon table

Consume data from a Paimon table

  • Query or consume data from a Paimon table in batch or streaming mode. For more information, see Consume data. To consume data from a primary key table in streaming mode, configure the changelog producer first.

  • Configure the consumer offset of a Paimon table. For more information, see Consume data from a specified offset.

  • Save the consumer offset of a Paimon table so that a restarted job can resume from it, and prevent snapshots that an active consumer has not yet read from being deleted on expiration. For more information, see Save consumption progress with consumer ID.

  • Run a batch job to read the historical state of a Paimon table from a specific snapshot. For more information, see Time travel.
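The consumption options above are typically passed as Flink SQL dynamic table option hints. The sketch below renders three such queries as text. The helper `select_with_options`, the table path, the consumer ID, and the snapshot and timestamp values are hypothetical. The option keys `scan.snapshot-id`, `consumer-id`, and `scan.timestamp-millis` come from open-source Paimon. Whether a query runs in streaming or batch mode is determined by the job, not by the hint.

```python
def select_with_options(table: str, options: dict[str, str]) -> str:
    """Render SELECT * on a table with a Flink dynamic table options hint."""
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in options.items())
    return f"SELECT * FROM {table} /*+ OPTIONS({pairs}) */;"


TABLE = "my_paimon_catalog.my_db.orders"  # hypothetical table path

# Streaming job: start consuming changes from snapshot 5 (a specified offset).
from_offset = select_with_options(TABLE, {"scan.snapshot-id": "5"})

# Streaming job: record progress under a consumer ID so a restart can resume.
with_consumer = select_with_options(TABLE, {"consumer-id": "orders-reader"})

# Batch job: time travel to the table state as of a timestamp (epoch millis).
time_travel = select_with_options(TABLE, {"scan.timestamp-millis": "1678883047356"})

for statement in (from_offset, with_consumer, time_travel):
    print(statement)
```

Using a hint keeps the option scoped to a single query, so the same table can be read with different offsets or consumer IDs by different jobs.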

Maintain a Paimon table