Optimization of Paimon primary key and append tables - Realtime Compute for Apache Flink

This topic explains how to improve write and read performance for Apache Paimon (Paimon) primary key tables and append scalable tables in Realtime Compute for Apache Flink.

Limits

Paimon tables are supported only in Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 8.0.5 or later.

Primary key tables

Optimize write performance

Write operations in Paimon primary key tables are blocked by small-file compaction. When a bucket contains many small files, or when the changelog-producer parameter is set to lookup, compaction must finish before each checkpoint completes. If compaction takes too long, checkpoints time out, causing backpressure and reduced throughput.

To address write performance issues, choose one or more of the following methods based on your workload:

Adjust sink parallelism — Scale the number of sink workers to match your write volume.
Adjust checkpoint settings — Increase the checkpoint interval or allow concurrent checkpoints.
Enable completely asynchronous compaction — Decouple compaction from the checkpoint cycle so writes are never blocked.
Change the file format to Avro — Reduce write overhead when Online Analytical Processing (OLAP) ad hoc queries are not required.
Limit local temporary file size — Cap disk usage from spilled write buffers.

Adjust sink parallelism

Use SQL Hints to set the sink.parallelism parameter. Increasing parallelism distributes the write and compaction load across more workers, reducing the chance that any single bucket becomes a bottleneck. Note that higher parallelism also increases resource consumption.

Adjust checkpoint settings

Because write performance in Paimon is closely tied to checkpoint frequency, adjusting checkpoint behavior is often the fastest way to reduce backpressure.

Use any combination of the following adjustments:

Increase the checkpoint interval by configuring execution.checkpointing.interval. For configuration steps, see How do I configure parameters for deployment running?
Important
The checkpoint interval directly affects data latency — the time between when data is written and when it becomes available for consumption. Increase this interval only if your workload tolerates higher latency.
Add execution.checkpointing.max-concurrent-checkpoints: 3 to allow up to three checkpoints to run simultaneously. This reduces the impact of slow checkpoints on overall throughput.
Switch to a batch deployment to eliminate checkpoint overhead entirely.

Enable completely asynchronous compaction

By default, compaction is synchronous with the checkpoint cycle. Completely asynchronous compaction breaks this dependency: compaction runs in the background whenever resources are available, and checkpoints proceed without waiting for compaction to finish.

The tradeoff is that small files can accumulate during high write periods. This does not affect stream consumption, but it degrades batch consumption and OLAP query performance until background compaction catches up. Monitor the number of small files in each bucket using the files table provided by Paimon.

Configure the following parameters using the ALTER TABLE statement or SQL Hints:

'num-sorted-run.stop-trigger' = '2147483647',
'sort-spill-threshold' = '10',
'changelog-producer.lookup-wait' = 'false'

Parameter	Type	Default	Description
`num-sorted-run.stop-trigger`	Integer	5	The maximum number of small files allowed in a bucket before writes are paused until compaction catches up. Setting this to `2147483647` effectively disables the write-pause threshold, allowing compaction to run fully in the background. A large number of small files reduces batch consumption and OLAP query efficiency but has minimal impact on stream consumption.
`sort-spill-threshold`	Integer	N/A	The number of small files at which in-memory merge sort switches to external sorting. Configure this parameter to prevent heap memory exhaustion when small files accumulate. If you are unsure where to start, set this to `10`.
`changelog-producer.lookup-wait`	Boolean	true	Controls whether the sink waits for changelog generation (which involves compaction) to complete before finishing a checkpoint. Set to `false` to allow tasks that have completed compaction to continue processing without waiting for others. When set to `false`, checkpoint duration no longer reflects data processing latency. This parameter applies only when `changelog-producer` is set to `lookup`.

Change the file format to Avro

If your business focuses on batch or stream consumption and does not require OLAP ad hoc queries, configure the following parameters to change the data file format and disable statistics collection. This improves the efficiency of write operations.

Configure the following parameters when creating the table:

'file.format' = 'avro',
'metadata.stats-mode' = 'none'

Note

These parameters must be set at table creation. You cannot change the file format of an existing table.

Limit local temporary file size

Set write-buffer-spill.max-disk-size in SQL Hints to cap the maximum disk space used by spilled write buffers. This prevents write workers from consuming excessive local disk.

Optimize read performance

Adjust source parallelism

Use SQL Hints to set the scan.parallelism parameter of the Paimon source. Increasing this value allows more parallel readers to process data simultaneously.

Use the read-optimized table

During batch consumption, the full-scan phase of stream consumption, and OLAP ad hoc queries, Paimon must merge data from small files in memory using merge sort. A large number of small files slows this process significantly.

If you do not need to consume the most recent data, you can use the read-optimized table to improve consumption efficiency. When you have enabled completely asynchronous compaction to maximize write throughput, small files accumulate faster than they are merged — in this case, using the read-optimized table to read only already-compacted data improves read performance at the cost of not seeing the most recently written records.

Limit the lookup join cache size

When performing lookup joins against a Paimon table, cached files can grow without bound. Configure the following parameters in SQL Hints to control cache growth:

lookup.cache-max-disk-size — The maximum disk space used by the lookup cache.
lookup.cache-file-retention — How long cached files are kept before expiration.

Append scalable tables

Optimize write performance

Write throughput for append scalable tables is determined by sink parallelism and the bandwidth of the underlying file system or Object Storage Service (OSS). Before tuning parameters, verify that your storage system has enough bandwidth to support your target write rate. Then apply the following optimizations:

Adjust sink parallelism — Scale the number of sink workers to match your write volume.
Resolve data skew — Force a shuffle between the upstream operator and the sink when upstream data is unevenly distributed.
Limit local temporary file size — Cap disk usage from spilled write buffers.

Adjust sink parallelism

Use SQL Hints to set the sink.parallelism parameter. Note that higher parallelism increases resource consumption.

Resolve data skew

Upstream data is not shuffled before being written to an append scalable table. If the upstream data distribution is skewed, some sink workers receive far more data than others, leaving some resources underutilized and creating a write bottleneck.

Set sink.parallelism to a value different from the upstream node's parallelism. This forces a shuffle between the upstream operator and the sink. To confirm that the shuffle is active, check the Realtime Compute for Apache Flink development console: if the sink operator and its upstream node appear in different subtasks, data is being shuffled.

Limit local temporary file size

Set write-buffer-spill.max-disk-size in SQL Hints to cap the disk space used by spilled write buffers.

Optimize read performance

Adjust source parallelism

Use SQL Hints to set the scan.parallelism parameter of the Paimon source.

Sort data to improve query efficiency

Data order significantly affects batch processing and OLAP ad hoc query performance. Sorting data by the columns most commonly used in filter conditions reduces the amount of data each query must scan.

Sorting requires running a Flink job in batch mode. Before you begin, complete the required setup described in Data management configuration. Then configure the job parameters in the Entry Point Main Arguments field.

When to use each sort strategy:

Strategy	Use when
`zorder`	Range queries with fewer than five filter columns
`hilbert`	Range queries with five or more filter columns
`order`	Queries that use only equality conditions

The following example sorts the data in specific partitions by the date and type columns using Z-order:

compact
--warehouse 'oss://your-bucket/data-warehouse'
--database 'your_database'
--table 'your_table'
--order_strategy 'zorder'
--order_by 'date,type'
--partition 'dt=20240311,hh=08;dt=20240312,hh=09'
--catalog_conf 'fs.oss.endpoint=oss-cn-hangzhou-internal.aliyuncs.com'
--catalog_conf 'fs.oss.endpoint=oss-cn-beijing-internal.aliyuncs.com'
--table_conf 'write-buffer-size=256 MB'
--table_conf 'your_table.logRetentionDuration=7 days'

Parameter	Description
`warehouse`	The OSS path of the data warehouse that contains the catalog for the Paimon table.
`database`	The name of the database that contains the Paimon table.
`table`	The name of the Paimon table.
`order_strategy`	The sort strategy. Valid values: `zorder`, `hilbert`, `order`. See the strategy selection table above.
`order_by`	The columns to sort by, separated by commas.
`partition`	The partitions to sort, separated by semicolons. Omit this parameter if the table is not partitioned.
`catalog_conf`	Parameters from the WITH clause of the catalog that contains the Paimon table. Specify one parameter per line.
`table_conf`	Temporary table configuration, equivalent to SQL Hints. Specify one parameter per line.