This topic describes how to manage Paimon tables in Data Lake Formation (DLF).
Introduction to Table Types

| Item | Description |
| --- | --- |
| Features | Uses the lakehouse-format Paimon table. Supports integrated real-time and batch storage. Enables efficient read and write access via compute engines and open source APIs. |
| Use cases | Stream processing, real-time updates, and high-performance OLAP queries. |
| Data management | Fully managed by DLF, including metadata and data files. Deleting a table removes both metadata and data. |
| Storage system | DLF automatically generates the storage path using UUIDs. You do not need to specify a storage path manually. |
| Deletion behavior | By default, data is retained for 1 day after table deletion to reduce the risk of accidental deletion. After 1 day, the data is permanently deleted. |
Managed Paimon table features:
Fully managed compaction: Runs independently from data writes to improve stability.
Concurrent writes: Multiple jobs can write to the same partition of the same table simultaneously.
Real-time partition-level metrics: Includes row count, file count, and size.
Multi-version support: Enables time travel and fine-grained insert, update, and delete operations.
Create a table
Log on to the Data Lake Formation console.
On the Data Catalog list page, click the Catalog name to go to the Catalog details page.
In the Databases list, click your database name.
In the Tables list, click Create Table.
Configure the following settings and click OK.
| Configuration item | Description |
| --- | --- |
| Table Format | Select Paimon Table. |
| Table Name | Required. Must be unique within the database. |
| Table Description | Optional. Enter a description. |
| Column | Define column information, including the column name, whether it is a primary key, whether it is non-null, whether it is a partition field, the data type, the length or type, a description, and actions. |
| User-defined Table Properties | Add custom properties. These overwrite the default parameters of the DLF global meta service during table creation. For supported configuration items, see the Apache Paimon documentation. |
Note: Paimon tables created in DLF use write-only mode by default. Background table optimizations, such as compaction, snapshot cleanup, and partition cleanup, are automatically handled by DLF.
SQL Examples
DLF supports primary key tables and append-only tables. If you have registered a DLF catalog in other platforms—such as EMR or Flink—you can create databases and tables on those platforms. Metadata is written directly to DLF. For more information, see Engine integration.
Primary key tables
A primary key table uses a primary key as its unique identifier. It is designed for stream processing scenarios. It supports real-time updates, inserts, and deletes on records and automatically generates precise change logs for downstream stream consumption. In addition, primary key tables support efficient data queries based on primary key conditions.
Flink SQL example

```sql
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
);
```

Spark SQL example

```sql
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING
) TBLPROPERTIES (
  'primary-key' = 'order_id'
);
```
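Once created, a primary key table can be updated and queried by key. The following is a minimal sketch in Spark SQL, assuming the `orders` table defined above and illustrative values; with Paimon's default merge behavior, writing a row whose key already exists updates that row:

```sql
-- Upsert: a row with an existing order_id replaces the previous version;
-- a new order_id is inserted
INSERT INTO orders VALUES (1001, 250, 'alice');

-- Point query on the primary key, which primary key tables serve efficiently
SELECT price, customer FROM orders WHERE order_id = 1001;
```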
DLF uses Postpone Bucket mode by default. This adaptive bucket allocation strategy dynamically adjusts the number of buckets based on partition data volume, avoiding the performance issues caused by too many buckets (reduced read performance) or too few buckets (reduced write performance). You do not need to configure buckets manually. However, Postpone mode introduces data latency: newly written data is not visible until compaction completes.
To avoid latency, do one of the following:
Use Flink with Ververica Runtime (VVR) 11.4+ or Spark with esr-4.5+. These versions write batches directly to buckets in Postpone mode, eliminating latency.
For latency-sensitive tables, explicitly set the number of buckets, for example 'bucket' = '5'. We recommend one bucket per 1 GB of partition data.
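For example, a table whose partitions hold roughly 5 GB of data could fix the bucket count at creation time. This is a sketch in Spark SQL; the table name and values are illustrative:

```sql
CREATE TABLE orders_fixed_bucket (
  order_id BIGINT,
  price BIGINT,
  customer STRING
) TBLPROPERTIES (
  'primary-key' = 'order_id',
  -- fixed bucket count; roughly one bucket per 1 GB of partition data
  'bucket' = '5'
);
```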
The system dynamically adjusts the number of buckets based on data characteristics and traffic loads, and scales automatically as needed to maintain storage efficiency and read and write performance. For more information about the scaling strategy, see Dynamic bucketing and automatic scaling strategy.
For other business-related configurations, define the following:
Merge engine (merge-engine) for complex calculations.
Deletion vectors (deletion-vectors.enabled) to significantly improve query performance.
Note: After you enable this feature, all newly written data must be compacted before it becomes visible, regardless of the bucket mode.
Changelog producer (changelog-producer) set to 'lookup' to generate changelogs for downstream stream reads.
Sequence field (sequence.field) to handle out-of-order data and ensure correct update order.
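Taken together, these options are set as table properties at creation time. The following is a hedged sketch in Spark SQL; the table name, the `update_time` column, and the property values are illustrative, and you should check the Apache Paimon documentation for the values supported in your scenario:

```sql
CREATE TABLE orders_configured (
  order_id BIGINT,
  price BIGINT,
  customer STRING,
  update_time TIMESTAMP
) TBLPROPERTIES (
  'primary-key' = 'order_id',
  -- merge strategy for complex calculations
  'merge-engine' = 'partial-update',
  -- faster OLAP queries; new data becomes visible only after compaction
  'deletion-vectors.enabled' = 'true',
  -- generate changelogs for downstream stream reads
  'changelog-producer' = 'lookup',
  -- resolve out-of-order data and ensure correct update order
  'sequence.field' = 'update_time'
);
```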
If your upstream data is CDC data, use Flink CDC or a data integration product to write data to the lake. These tools provide full-database synchronization, automatic table creation, and table schema synchronization.
To achieve high-performance OLAP queries on primary key tables, we highly recommend enabling deletion vectors. Although this consumes more compaction resources, it delivers more stable and higher-performing OLAP queries.
Append-only tables
An append-only table has no primary key. Unlike primary key tables, it does not support direct stream updates. However, its batch processing performance is significantly better.
Supports stream writes and stream reads. DLF automatically merges small files to improve data timeliness at low compute cost.
Supports fine-grained operations such as DELETE, UPDATE, and MERGE INTO. Also provides version management and time travel to meet diverse business needs.
Supports query acceleration through sorting and bitmaps. OLAP engines deliver excellent direct-read performance.
Use append-only tables for most batch processing scenarios or stream processing without a primary key. Compared to primary key tables, append-only tables are simpler to use and still deliver efficient data writes and queries.
Flink SQL example

```sql
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING
);
```

Spark SQL example

```sql
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING
);
```
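The fine-grained operations described above can then be run against the table. The following is a minimal sketch in Spark SQL, assuming the `orders` table defined above; the `orders_updates` staging table and all values are hypothetical:

```sql
-- Update and delete individual rows in the append-only table
UPDATE orders SET price = 300 WHERE order_id = 1001;
DELETE FROM orders WHERE order_id = 1002;

-- Merge changes from a staging table (orders_updates is hypothetical)
MERGE INTO orders t
USING orders_updates s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```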
View a table
In the Databases list, click the name of the database to view its table list.
In the Tables list, click a table name to view its fields.
Click the Table Details tab to view basic table information, column list, and partition list.
Note: On the Table Details tab, you can manually modify the storage class for both partitioned and non-partitioned tables. For more information, see Manually change the storage class.
Click the Permissions tab to grant table permissions to users or roles. For more information, see Data authorization management.
Delete a table
After you delete a table, data is retained for 1 day by default to reduce the risk of accidental deletion. Data is permanently deleted after 1 day.
In the Databases list, click your database name.
In the Tables list, click Delete in the Actions column.
In the confirmation dialog box, click OK to complete the deletion.