This report compares DLF Compaction and self-managed Paimon Compaction based on compute resource consumption. We developed a multi-dimensional performance evaluation framework focusing on resource utilization and elastic scheduling capabilities.
Test Scenarios
We systematically validate DLF’s technical advantages using the following three test scenarios:
Adaptive bucketing strategy: Dynamically adjusts the number of buckets based on partition data volume, enabling fine-grained resource allocation.
Deletion Vectors (DV) optimization: Speeds up lookup and merge operations in DV scenarios to improve Compaction performance.
Dynamic resource elasticity optimization: Automatically scales compute resources (CU) based on real-time load, eliminating waste from over-provisioning or shortages during peak demand.
Adaptive Bucketing Strategy
Test Motivation
In scenarios with highly skewed data distribution, traditional fixed-bucket strategies struggle to balance performance and resource efficiency. DLF implements a partition-level adaptive bucketing mechanism. This mechanism dynamically calculates the optimal number of buckets per partition based on actual data volume. This eliminates the need for manual bucket count configuration and enables precise resource allocation.
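As an illustration of sizing buckets from data volume, here is a minimal sketch. It is not DLF's actual algorithm, which this report does not describe. The function name `adaptive_bucket_count`, the `target_bucket_gb` parameter, and the sample partition sizes are all assumptions introduced for the example.

```python
import math


def adaptive_bucket_count(partition_gb: float, target_bucket_gb: float = 1.0) -> int:
    """Return a bucket count that keeps each bucket near target_bucket_gb.

    Hypothetical heuristic for illustration only; not DLF's real algorithm.
    """
    if target_bucket_gb <= 0:
        raise ValueError("target_bucket_gb must be positive")
    # Every partition gets at least one bucket, even when it is nearly empty.
    return max(1, math.ceil(partition_gb / target_bucket_gb))


# Hypothetical partition sizes in GB: one large partition and two small ones.
partitions = {"p_large": 350.0, "p_small_1": 15.0, "p_small_2": 15.0}
adaptive = {name: adaptive_bucket_count(gb) for name, gb in partitions.items()}
print(adaptive)
```

Under this heuristic a small partition receives proportionally fewer buckets. With a fixed count of 500 buckets per partition, a 15 GB partition would hold only about 0.03 GB per bucket.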
Test Plan
Table schema design
Create a DLF-managed Paimon table whose buckets are managed automatically, and compare it with a self-managed Paimon table that uses a fixed bucket count.
-- DLF table (intelligent dynamic partitioning)
-- Enable Deletion Vectors; system manages buckets automatically
CREATE TABLE perf_rest.pk_partitions_db.t (
    -- Primary key fields and extended attributes
) PARTITIONED BY (`partition_id`) WITH (
    'deletion-vectors.enabled' = 'true'
);

-- Self-managed Paimon table (fixed bucket configuration)
-- Bucket count must be specified upfront
CREATE TABLE perf_filesystem.pk_partitions_db.t (
    -- Primary key fields and extended attributes
) PARTITIONED BY (`partition_id`) WITH (
    'bucket' = '500',
    'write-only' = 'true',
    'deletion-vectors.enabled' = 'true'
);
Data injection strategy
Initial data layer: Inject 500 GB of baseline data to create a non-uniform distribution. Assign 70% of the data to the primary partition and 3% each to nine secondary partitions, which creates significant data skew.
Incremental stream: Simulate real-time data writes under production workloads.
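As a quick check on the volumes this distribution implies, the sketch below converts the stated shares into gigabytes. The stated shares add up to 97%, so the sketch reports the remaining 3% as unassigned rather than guessing where it goes. The variable names are illustrative.

```python
TOTAL_GB = 500
PRIMARY_SHARE_PCT = 70
SECONDARY_SHARE_PCT = 3
SECONDARY_PARTITIONS = 9

# Integer percentages keep the arithmetic exact.
primary_gb = TOTAL_GB * PRIMARY_SHARE_PCT / 100
secondary_gb = TOTAL_GB * SECONDARY_SHARE_PCT / 100
assigned_pct = PRIMARY_SHARE_PCT + SECONDARY_SHARE_PCT * SECONDARY_PARTITIONS
unassigned_gb = TOTAL_GB * (100 - assigned_pct) / 100

print(f"primary partition: {primary_gb:.0f} GB")
print(f"each secondary partition: {secondary_gb:.0f} GB")
print(f"assigned share: {assigned_pct}%, unassigned: {unassigned_gb:.0f} GB")
print(f"skew ratio (primary / secondary): {primary_gb / secondary_gb:.1f}x")
```

The primary partition holds roughly 23 times the data of each secondary partition, which is the skew that a single fixed bucket count cannot fit well.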
Compaction execution
DLF: Trigger intelligent Compaction. The system automatically adapts to partition characteristics.
Self-managed Paimon: Run Compaction jobs using Flink Action with fixed configurations.
Performance Evaluation
Metric Dimension | DLF Compaction | Self-managed Paimon Compaction |
Compaction CU Consumption | 237 CU | 482 CU |
Resource Allocation Method | Dynamic optimized allocation | Static fixed allocation |
DLF reduces Compaction resource consumption by 50.8% (237 CU versus 482 CU) using its intelligent bucketing strategy. In skewed data distribution scenarios, this advantage stems from two key mechanisms:
Dynamic bucketing algorithm: Calculates the optimal bucket count per partition in real time to avoid resource misallocation.
Partition-level resource isolation: Eliminates the “long-tail effect” and prevents large partitions from degrading overall job performance.
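The 50.8% figure follows directly from the two CU values in the table. The sketch below reproduces it; the helper name `cu_reduction_percent` is introduced here for illustration.

```python
def cu_reduction_percent(baseline_cu: float, optimized_cu: float) -> float:
    """Percentage of baseline CU saved by the optimized run."""
    if baseline_cu <= 0:
        raise ValueError("baseline_cu must be positive")
    return (baseline_cu - optimized_cu) / baseline_cu * 100


# Self-managed Paimon is the baseline; DLF is the optimized run.
reduction = cu_reduction_percent(baseline_cu=482, optimized_cu=237)
print(f"{reduction:.1f}%")  # prints 50.8%
```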
Deletion Vectors Optimization
Test Motivation
In Partial-Update scenarios with Deletion Vectors (DV) enabled, data merging often encounters performance bottlenecks. DLF applies kernel-level optimizations for this scenario to improve Compaction efficiency during high-frequency updates.
Test Plan
Table schema design
Create a Paimon table that supports high-frequency updates, emphasizing Lookup File processing efficiency.
-- DLF table
CREATE TABLE ... WITH (
    'deletion-vectors.enabled' = 'true',
    'merge-engine' = 'partial-update'
);

-- Self-managed Paimon table
CREATE TABLE ... WITH (
    'bucket' = '1024',
    'write-only' = 'true',
    'deletion-vectors.enabled' = 'true',
    'merge-engine' = 'partial-update'
);
Test workflow
Load injection: Simulate 100,000 mixed read-write operations per second.
Continuous monitoring: Record memory usage, garbage collection (GC) frequency, and system latency during each Compaction epoch.
Performance Evaluation
Metric Dimension | DLF Compaction | Self-managed Paimon Compaction |
Compaction CU Consumption | 41 CU | 102 CU |
DLF applies kernel-level optimizations to Lookup File processing in Deletion Vectors mode. Test results show that under equal throughput pressure, DLF consumes about 40% of the compute resources used by the self-managed cluster (41 CU versus 102 CU).
Dynamic Resource Elasticity Optimization
Test Motivation
Service traffic follows a peak-and-trough pattern. Self-managed Compaction jobs run with a fixed resource allocation: sizing it for peak load wastes resources during low-traffic periods, while sizing it lower causes shortages during peaks. DLF automatically adjusts CU consumption based on real-time data volume, delivering cloud-native elasticity.
Test Plan
Table schema design
-- DLF table
CREATE TABLE perf_rest.pk_elastic.t (
    ...
    PRIMARY KEY (`id`,`item_id`) NOT ENFORCED
) WITH (
    'deletion-vectors.enabled' = 'true'
);

-- Self-managed Paimon table
CREATE TABLE perf_fs.pk_elastic.t (
    ...
    PRIMARY KEY (`id`,`item_id`) NOT ENFORCED
) WITH (
    'bucket' = '500',
    'write-only' = 'true',
    'deletion-vectors.enabled' = 'true'
);
Test workflow
Baseline data: Preload 500 GB of data.
Dynamic traffic simulation: Write 10 million rows per minute for 20 minutes (peak), then 250,000 rows per minute for 40 minutes (off-peak).
Resource configuration: Self-managed jobs use a fixed CU count sized for peak demand. DLF uses automatic elastic scaling.
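The traffic profile above can be summarized with a few derived numbers. The sketch below computes them from the stated rates and durations; the constant names are illustrative.

```python
PEAK_ROWS_PER_MIN = 10_000_000
PEAK_MINUTES = 20
OFF_PEAK_ROWS_PER_MIN = 250_000
OFF_PEAK_MINUTES = 40

peak_rows = PEAK_ROWS_PER_MIN * PEAK_MINUTES
off_peak_rows = OFF_PEAK_ROWS_PER_MIN * OFF_PEAK_MINUTES
total_rows = peak_rows + off_peak_rows
total_minutes = PEAK_MINUTES + OFF_PEAK_MINUTES

avg_rows_per_min = total_rows / total_minutes
peak_to_off_peak = PEAK_ROWS_PER_MIN / OFF_PEAK_ROWS_PER_MIN

print(f"total rows per cycle: {total_rows:,}")
print(f"average rows per minute: {avg_rows_per_min:,.0f}")
print(f"peak rate per second: {PEAK_ROWS_PER_MIN / 60:,.0f}")
print(f"peak / off-peak ratio: {peak_to_off_peak:.0f}x")
```

Each 60-minute cycle writes 210 million rows, and the peak rate is 40 times the off-peak rate. A job sized for the peak is therefore mostly idle for two thirds of the cycle.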
Performance Evaluation
Metric Dimension | DLF Compaction | Self-managed Paimon Compaction |
Average Compaction CU Consumption | 135 CU | 400 CU |
DLF reduces average CU consumption by about 66% (135 CU versus 400 CU) through adaptive resource adjustment:
Elastic scaling: Automatically releases compute resources when data traffic drops, greatly reducing average CU consumption.
Fully managed: Eliminates operational overhead from manually adjusting job parallelism in response to traffic fluctuations.
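As a rough plausibility check, the sketch below compares the average write load (relative to peak) with the ratio of the reported CU figures. It assumes that Compaction CU demand scales linearly with the write rate. This is a simplification not stated in the report, and the helper `time_weighted_average` is introduced for illustration.

```python
def time_weighted_average(segments: list[tuple[float, float]]) -> float:
    """Average a value over segments given as (value, duration) pairs."""
    total_duration = sum(duration for _, duration in segments)
    if total_duration <= 0:
        raise ValueError("total duration must be positive")
    return sum(value * duration for value, duration in segments) / total_duration


# Write rates from the test plan: (rows per minute, minutes).
avg_rate = time_weighted_average([(10_000_000, 20), (250_000, 40)])
load_ratio = avg_rate / 10_000_000  # average load relative to peak load

# Reported figures: DLF average CU versus the fixed, peak-sized allocation.
cu_ratio = 135 / 400

print(f"average load / peak load: {load_ratio:.2%}")
print(f"DLF average CU / fixed CU: {cu_ratio:.2%}")
```

Under the linear-scaling assumption, an ideally elastic job would average about 35% of a peak-sized allocation, which is in the same range as the reported ratio of 135 CU to 400 CU.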