Data Lake Formation (DLF) supports Paimon, a lakehouse format that unifies real-time and batch storage. This page covers three operations for Paimon tables in DLF: creating, viewing, and deleting tables.
Table types
DLF supports two Paimon table types. Choose based on whether your data has a primary key and whether you need per-row stream updates.
| Table type | Primary key | Best for |
|---|---|---|
| Primary key table | Required | Stream processing, real-time inserts, updates, and deletes; OLAP queries filtered by primary key |
| Append-only table | None | Batch processing, stream writes without per-row updates; OLAP with sorting and bitmap indexes |
Fully managed Paimon tables
All Paimon tables created in DLF are fully managed. DLF controls all metadata and underlying data files. Deleting a table removes both.
| Feature | What it does | Managed automatically? |
|---|---|---|
| Compaction | Merges small files; runs independently from data writes for stable operations | Yes |
| Concurrent writes | Multiple write jobs can write to the same partition simultaneously | Yes |
| Partition-level metrics | Tracks real-time row count, file count, and file size per partition | Yes |
| Multi-version (time travel) | Tracks table history; supports fine-grained insert, update, and delete | Yes |
DLF stores data at a path auto-generated from a universally unique identifier (UUID). No manual path configuration is needed.
Paimon tables created in DLF use write-only mode by default. Background operations — compaction, snapshot cleanup, and partition cleanup — are handled automatically by DLF.
Prerequisites
Before you begin, ensure that you have:

- Access to the Data Lake Formation console
- An existing catalog and database in DLF
Create a Paimon table
Create from the console
1. Log on to the Data Lake Formation console.
2. On the Data Catalog list page, click a catalog name.
3. In the Database list, click a database name to open the table list.
4. In the table list, click Create Table.
5. Configure the following settings and click OK.
| Configuration item | Description |
|---|---|
| Table Format | Select Paimon Table. |
| Data Table Name | Required. Must be unique within the database. |
| Data Table Description | Optional. |
| Columns | Define each column: name, primary key flag, not null flag, partition field flag, data type, length/type, and description. |
| Custom Table Properties | Add properties that overwrite DLF's default meta service parameters during table creation. For available options, see the official Paimon documentation. Supported file formats: PARQUET, AVRO, ORC, CSV, TEXT, JSON, LANCE, and BLOB. Example: `file.format = LANCE`. |
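Custom table properties can also be supplied when creating a table through SQL. A minimal Spark SQL sketch, assuming a catalog already associated with DLF (the table name and columns are illustrative):

```sql
-- Illustrative: override the default file format via a custom table property.
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING
) TBLPROPERTIES (
  'file.format' = 'lance'  -- any supported format: parquet, avro, orc, csv, ...
);
```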
Create from SQL
If you have associated a catalog on Flink, EMR (E-MapReduce), or another platform, create tables directly on those platforms — metadata is written directly to DLF. For details, see Engine integration.
Primary key tables
A primary key table uses a primary key as a unique row identifier. It supports real-time inserts, updates, and deletes, and automatically generates change logs for downstream stream consumers. Use this table type for stream data processing and online analytical processing (OLAP) queries filtered by primary key.
Flink SQL
```sql
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
);
```
Spark SQL
```sql
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING
) TBLPROPERTIES (
  'primary-key' = 'order_id'
);
```
Bucket allocation (Postpone Bucket mode)
DLF uses Postpone Bucket mode by default. This adaptive strategy dynamically adjusts bucket count based on partition data volume, avoiding read performance degradation from too many buckets and write bottlenecks from too few.
Data visibility in Postpone mode: Newly written data is not visible until compaction completes. To eliminate this latency:
- Use Flink (VVR 11.4 or later) or Spark (esr-4.5 or later). These versions write batches directly to buckets, removing the latency.
- For latency-sensitive tables, explicitly set the bucket count, for example `'bucket' = '5'`. Target one bucket per 1 GB of partition data.
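A fixed bucket count is set through table properties at creation time. A Spark SQL sketch (table name, columns, and the count of 5 are illustrative; size the count at roughly one bucket per 1 GB of partition data):

```sql
-- Illustrative: fix the bucket count for a latency-sensitive primary key table
-- instead of relying on the default Postpone Bucket mode.
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING
) TBLPROPERTIES (
  'primary-key' = 'order_id',
  'bucket' = '5'
);
```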
Advanced configuration
Configure the following Paimon options for additional tuning on primary key tables:
- `merge-engine`: Defines custom merge logic for complex calculations during compaction.
- Deletion vectors (`deletion-vectors.enabled = true`): Significantly improves query performance. After enabling, all newly written data is visible only after compaction, regardless of bucket mode. This requires more compaction resources but delivers more stable query performance.
- `changelog-producer` (`changelog-producer = 'lookup'`): Generates full change logs for downstream stream reads.
- `sequence.field`: Handles out-of-order data and ensures the correct update order.
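These options are set in the table properties at creation time. A Spark SQL sketch combining the options above (the table name, columns, and the `update_time` sequence field are illustrative; the option keys are standard Paimon options):

```sql
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING,
  update_time TIMESTAMP
) TBLPROPERTIES (
  'primary-key' = 'order_id',
  'deletion-vectors.enabled' = 'true',   -- more stable, higher-performance OLAP reads
  'changelog-producer' = 'lookup',       -- full change logs for downstream stream reads
  'sequence.field' = 'update_time'       -- resolve out-of-order updates by this field
);
```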
If your upstream data is Change Data Capture (CDC) data, use Flink CDC or a data integration product to load data into DLF. These tools support full database sync, automatic table creation, and schema synchronization.
For high-performance OLAP queries, enable deletion vectors mode. Although it consumes more compaction resources, it provides more stable and higher-performance OLAP queries.
Append-only tables
An append-only table has no primary key. It does not support per-row stream updates, but its batch processing performance is significantly better than that of primary key tables. Use it for most batch workloads, or for stream scenarios where per-row updates are not needed.
Append-only tables support:
- Stream writes and stream reads, with DLF automatically merging small files in the background
- Fine-grained `DELETE`, `UPDATE`, and `MERGE INTO` operations
- Version management and time travel
- Accelerated queries via sorting and bitmap indexes, with excellent direct read performance for OLAP engines
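For example, the fine-grained operations and time travel above can be used from Spark SQL. A sketch assuming Paimon's Spark integration (the table name, predicate, and snapshot ID are illustrative):

```sql
-- Illustrative: fine-grained row-level deletion on an append-only table.
DELETE FROM orders WHERE price < 0;

-- Illustrative: time travel back to an earlier snapshot of the table.
SELECT * FROM orders VERSION AS OF 1;
```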
Flink SQL
```sql
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING
);
```
Spark SQL
```sql
CREATE TABLE orders (
  order_id BIGINT,
  price BIGINT,
  customer STRING
);
```
View a data table
1. In the Database list, click a database name to open the table list.
2. In the table list, click a table name to view its fields.
3. Click the Table Details tab to see the table's basic information, field list, and partition list.
   Note: On the Table Details tab, you can manually modify the storage class for both partitioned and non-partitioned tables. For details, see Manually change the storage class.
4. Click the Permissions tab to grant table-level permissions to users or roles. For details, see Data authorization management.
Delete a data table
Deleting a table removes both its metadata and data. Data is retained for one day as a safeguard against accidental deletion. After one day, the data is permanently deleted.
1. In the Database list, click a database name to open the table list.
2. In the table list, click Delete in the Actions column.
3. In the dialog box, click OK.