Data Lake Formation: Paimon tables

Last Updated: Mar 26, 2026

Data Lake Formation (DLF) supports Paimon, a lakehouse format that unifies real-time and batch storage. This page covers three operations for Paimon tables in DLF: creating, viewing, and deleting tables.

Table types

DLF supports two Paimon table types. Choose based on whether your data has a primary key and whether you need per-row stream updates.

Table type | Primary key | Best for
Primary key table | Required | Stream processing, real-time inserts, updates, and deletes; OLAP queries filtered by primary key
Append-only table | None | Batch processing, stream writes without per-row updates; OLAP with sorting and bitmap indexes

Fully managed Paimon tables

All Paimon tables created in DLF are fully managed. DLF controls all metadata and underlying data files. Deleting a table removes both.

Feature | What it does | Managed automatically?
Compaction | Merges small files; runs independently from data writes for stable operations | Yes
Concurrent writes | Multiple write jobs can write to the same partition simultaneously | Yes
Partition-level metrics | Tracks real-time row count, file count, and file size per partition | Yes
Multi-version (time travel) | Tracks table history; supports fine-grained insert, update, and delete | Yes

DLF stores data at a path auto-generated from a universally unique identifier (UUID). No manual path configuration is needed.

Note

Paimon tables created in DLF use write-only mode by default. Background operations — compaction, snapshot cleanup, and partition cleanup — are handled automatically by DLF.

Prerequisites

Before you begin, make sure that you have created a data catalog and a database in DLF, because a Paimon table is always created inside an existing database.

Create a Paimon table

Create from the console

  1. Log on to the Data Lake Formation console.

  2. On the Data Catalog list page, click a catalog name.

  3. In the Database list, click a database name to open the table list.

  4. In the table list, click Create Table.

  5. Configure the following settings and click OK.

    Configuration item | Description
    Table Format | Select Paimon Table.
    Data Table Name | Required. Must be unique within the database.
    Data Table Description | Optional.
    Columns | Define each column: name, primary key flag, not null flag, partition field flag, data type, length/type, and description.
    Custom Table Properties | Add properties that override DLF's default meta service parameters during table creation. For available options, see the official Paimon documentation. Supported file formats: PARQUET, AVRO, ORC, CSV, TEXT, JSON, LANCE, and BLOB. Example: file.format = LANCE.

Create from SQL

If you have associated a DLF catalog with Flink, EMR (E-MapReduce), or another engine, you can create tables directly from that engine, and the metadata is written to DLF. For details, see Engine integration.
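
The following Flink SQL sketch shows what such an association can look like. The catalog options shown (metastore, uri, warehouse, token.provider) follow Paimon's REST catalog convention, and every value is a placeholder assumption; refer to Engine integration for the exact options required by your Flink and DLF versions.

-- Sketch only: option names and values are assumptions, not verified DLF settings.
CREATE CATALOG dlf_catalog WITH (
  'type'           = 'paimon',
  'metastore'      = 'rest',
  'uri'            = '<dlf-catalog-endpoint>',  -- placeholder
  'warehouse'      = '<dlf-catalog-name>',      -- placeholder
  'token.provider' = 'dlf'
);

USE CATALOG dlf_catalog;

-- Tables created here are registered in DLF.
CREATE TABLE my_db.orders (
  order_id BIGINT,
  price    BIGINT,
  customer STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
);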

Primary key tables

A primary key table uses a primary key as a unique row identifier. It supports real-time inserts, updates, and deletes, and automatically generates change logs for downstream stream consumers. Use this table type for stream data processing and online analytical processing (OLAP) queries filtered by primary key.

Flink SQL

CREATE TABLE orders (
  order_id BIGINT,
  price    BIGINT,
  customer STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
);

Spark SQL

CREATE TABLE orders (
  order_id BIGINT,
  price    BIGINT,
  customer STRING
) TBLPROPERTIES (
  'primary-key' = 'order_id'
);
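
The following Spark SQL sketch (table and values are illustrative) shows the upsert behavior of a primary key table: a second write with an existing order_id replaces the earlier row instead of appending a duplicate.

-- Two writes with the same primary key; the later write wins.
INSERT INTO orders VALUES (1001, 50, 'alice');
INSERT INTO orders VALUES (1001, 80, 'alice');

-- Once the writes are visible (see the bucket mode note below),
-- the query returns a single row: (1001, 80, 'alice').
SELECT * FROM orders WHERE order_id = 1001;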

Bucket allocation (Postpone Bucket mode)

DLF uses Postpone Bucket mode by default. This adaptive strategy dynamically adjusts bucket count based on partition data volume, avoiding read performance degradation from too many buckets and write bottlenecks from too few.

Data visibility in Postpone mode: Newly written data is not visible until compaction completes. To eliminate this latency:

  • Use Flink (VVR 11.4 or later) or Spark (esr-4.5 or later). These versions write batches directly to buckets, removing the latency.

  • For latency-sensitive tables, explicitly set the bucket count (for example, 'bucket' = '5'; see the sketch after this list). Target about one bucket per 1 GB of partition data.
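
A minimal Spark SQL sketch of fixing the bucket count at creation time; the table name orders_fixed is illustrative, and 'bucket' and 'primary-key' are standard Paimon table properties.

-- Fixed-bucket primary key table instead of the default Postpone mode.
-- Rule of thumb from above: about one bucket per 1 GB of partition data.
CREATE TABLE orders_fixed (
  order_id BIGINT,
  price    BIGINT,
  customer STRING
) TBLPROPERTIES (
  'primary-key' = 'order_id',
  'bucket'      = '5'
);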

Dynamic bucketing and automatic scaling

The system adjusts bucket allocation based on six factors:

Factor | How it influences bucket count
Total partition storage | Larger total file size → more buckets
Data record scale | More rows (when deletion vectors are enabled) → more buckets
Write traffic load | Higher write throughput → more buckets to prevent bottlenecks
Data distribution skew | Detected skew → more buckets for even distribution
Single-row data size | Very small average row size → more buckets to optimize file structure
Historical partition reference | Heuristic algorithm uses prior partition bucket config as a baseline

Advanced configuration

Configure the following Paimon options for additional tuning on primary key tables (a combined example follows this list):

  • merge-engine: Defines custom merge logic for complex calculations during compaction.

  • Deletion Vectors (deletion-vectors.enabled = true): Significantly improves query performance. After enabling, all newly written data is visible only after compaction, regardless of bucket mode. This requires more compaction resources but delivers more stable query performance.

  • changelog-producer (changelog-producer = 'lookup'): Generates full change logs for downstream stream reads.

  • sequence.field: Handles out-of-order data and ensures correct update sequence.
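
The sketch below combines these options in one Spark SQL statement. The table name, the update_time column used as the sequence field, and the choice of partial-update as the merge engine are illustrative assumptions; choose the values that match your workload.

-- Hypothetical primary key table combining the advanced options above.
CREATE TABLE orders_adv (
  order_id    BIGINT,
  price       BIGINT,
  customer    STRING,
  update_time TIMESTAMP
) TBLPROPERTIES (
  'primary-key'              = 'order_id',
  'merge-engine'             = 'partial-update',  -- custom merge logic during compaction
  'deletion-vectors.enabled' = 'true',            -- more stable, faster OLAP queries
  'changelog-producer'       = 'lookup',          -- full change logs for stream readers
  'sequence.field'           = 'update_time'      -- resolves out-of-order updates
);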

If your upstream data is Change Data Capture (CDC) data, use Flink CDC or a data integration product to load data into DLF. These tools support full database sync, automatic table creation, and schema synchronization.

Note

For high-performance OLAP queries, enable deletion vectors mode. Although it consumes more compaction resources, it provides more stable and higher-performance OLAP queries.

Append-only tables

An append-only table has no primary key. It does not support per-row stream updates, but its batch processing performance is significantly better than that of a primary key table. Use it for most batch workloads, or for stream scenarios where per-row updates are not needed.

Append-only tables support:

  • Stream writes and stream reads, with DLF automatically merging small files in the background

  • Fine-grained DELETE, UPDATE, and MERGE INTO operations (see the example after the CREATE statements below)

  • Version management and time travel

  • Accelerated queries via sorting and bitmap indexes, with excellent direct read performance for OLAP engines

Flink SQL

CREATE TABLE orders (
  order_id BIGINT,
  price    BIGINT,
  customer STRING
);

Spark SQL

CREATE TABLE orders (
  order_id BIGINT,
  price    BIGINT,
  customer STRING
);
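
A short Spark SQL sketch of the row-level operations listed above, run against the append-only orders table; order_updates is a hypothetical staging table used only to illustrate MERGE INTO.

-- Row-level delete and update on the append-only table.
DELETE FROM orders WHERE order_id = 1001;
UPDATE orders SET price = 99 WHERE customer = 'alice';

-- Merge changes from a (hypothetical) staging table.
MERGE INTO orders AS t
USING order_updates AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;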

View a data table

  1. In the Database list, click a database name to open the table list.

  2. In the table list, click a table name to view its fields.

  3. Click the Table Details tab to see the table's basic information, field list, and partition list.

    Note

    On the Table Details tab, you can manually modify the storage class for both partitioned and non-partitioned tables. For details, see Manually change the storage class.

  4. Click the Permissions tab to grant table-level permissions to users or roles. For details, see Data authorization management.

Delete a data table

Warning

Deleting a table removes both its metadata and data. Data is retained for one day as a safeguard against accidental deletion. After one day, the data is permanently deleted.

  1. In the Database list, click a database name to open the table list.

  2. In the table list, click Delete in the Actions column.

  3. In the dialog box, click OK.