
DataWorks:Advanced parameters for real-time synchronization

Last Updated: Feb 11, 2026

This topic describes the advanced parameters that you can configure for real-time synchronization tasks. For each parameter, the valid values, default value, description, and scope are listed.

Parameter: Auto-set Runtime Configuration
Valid values: true/false
Default: true
Description:
  • true (default): The system automatically sets parameters such as the concurrency based on the number of CUs. However, if the total concurrency is not an even multiple of the number of source shards, shards are distributed unevenly among read threads, which causes data skew and degrades performance.
  • false: Allows manual parameter configuration. If you encounter a performance bottleneck or data skew, select false and tune the parameters to optimize data distribution.
Scope: General
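The skew described above comes from how shards are divided among read threads. The following sketch illustrates the effect, assuming round-robin assignment; `shard_distribution` and its inputs are illustrative, not DataWorks internals.

```python
# Sketch: why an uneven shard-to-thread ratio causes data skew.
# If the concurrency does not divide the shard count evenly, some read
# threads own one more shard than others and carry more load.

def shard_distribution(num_shards: int, concurrency: int) -> list[int]:
    """Return how many shards each of `concurrency` read threads receives."""
    base, extra = divmod(num_shards, concurrency)
    return [base + 1 if i < extra else base for i in range(concurrency)]

# 12 shards over 4 threads: balanced, 3 shards each.
print(shard_distribution(12, 4))   # [3, 3, 3, 3]
# 12 shards over 5 threads: skewed, two threads carry an extra shard.
print(shard_distribution(12, 5))   # [3, 3, 2, 2, 2]
```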

Parameter: Global Flush Interval (seconds)
Valid values: Integer from 5 to 1000
Default: 60
Description: Sets the interval at which all write threads batch-flush cached data to the destination and advance the synchronization offset.
  • A higher value increases throughput but also increases data visibility latency.
  • A lower value reduces data visibility latency but may reduce throughput.
Recommendation: For high-volume synchronization scenarios where latency is not a primary concern, consider increasing this value. For use cases that require low latency, decrease it.
Scope: General
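The trade-off above can be made concrete with a rough calculation. The sketch below assumes records arrive uniformly, so a record waits on average half a flush interval before it becomes visible; the arrival rate is an illustrative assumption, not a DataWorks figure.

```python
# Rough model of the flush-interval trade-off: longer intervals mean
# larger batches (better throughput) but longer waits until data is
# visible in the destination.

def flush_tradeoff(interval_s: int, records_per_s: int) -> tuple[float, int]:
    avg_visibility_latency_s = interval_s / 2      # mean wait until next flush
    batch_size = interval_s * records_per_s        # records committed per flush
    return avg_visibility_latency_s, batch_size

# Default 60 s interval at 5,000 records/s: ~30 s average latency,
# 300,000 records per batch flush.
print(flush_tradeoff(60, 5000))
# Lowering the interval to 5 s cuts latency to ~2.5 s but shrinks each batch 12x.
print(flush_tradeoff(5, 5000))
```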

Parameter: Failover Restart Policy Time Window (minutes)
Valid values: Integer from 1 to 60
Default: 30
Description: When a synchronization task encounters a recoverable exception, the system counts the failures that occurred within this time window and compares the count against the failure count threshold to decide whether to restart the task automatically.
  • If the failure count has not exceeded the threshold, the task restarts and the failure count for the window is incremented.
  • If the failure count has exceeded the threshold, the task is marked as failed and is not restarted.
This parameter defines the time frame for counting failures. A shorter window makes the system more tolerant of brief, intermittent issues.
Important: Setting this window too small can cause the task to restart repeatedly when it faces a persistent, unrecoverable issue. This wastes resources and can delay the detection and manual resolution of the underlying problem.
Scope: General

Parameter: Failover Restart Policy Failure Count Threshold
Valid values: Integer from 1 to 100
Default: 3
Description: Specifies the maximum number of times the task is allowed to restart automatically within the configured time window.
  • A higher value increases tolerance for frequent, intermittent errors.
  • A lower value lets the system mark a persistent problem as a failure more quickly, preventing resource waste.
Recommendation: Set a relatively small value to balance fault tolerance with rapid problem detection.
Scope: General
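Together, the two failover parameters above describe a sliding-window failure counter. The class below is an illustrative model of that behavior, not the DataWorks implementation.

```python
from collections import deque

class RestartPolicy:
    """Illustrative sliding-window model of the failover restart policy:
    restart while the number of failures inside the time window is below
    the threshold, otherwise mark the task as failed."""

    def __init__(self, window_minutes: int = 30, threshold: int = 3):
        self.window_s = window_minutes * 60
        self.threshold = threshold
        self.failures = deque()  # timestamps of counted failures

    def on_failure(self, now_s: float) -> str:
        # Drop failures that have fallen out of the time window.
        while self.failures and now_s - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.threshold:
            return "fail"       # threshold exceeded: do not restart
        self.failures.append(now_s)
        return "restart"        # below threshold: restart and count it

# Four failures one minute apart with the default 30-minute window and
# threshold 3: the first three restart, the fourth marks the task failed.
policy = RestartPolicy(window_minutes=30, threshold=3)
print([policy.on_failure(t) for t in (0, 60, 120, 180)])
# → ['restart', 'restart', 'restart', 'fail']
```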

Parameter: Partition Cache Queue Size
Valid values: Integer from 5 to 100
Default: 5
Description: When data is written to non-Delta partitioned tables in MaxCompute, it is cached per partition. Within the time window set by the Global Flush Interval, the synchronization task must allocate a separate cache for each destination partition. This parameter limits the maximum number of partitions that can be cached simultaneously.

If the number of partitions to be written within a single flush interval exceeds this queue size, the system triggers an early global flush to commit all cached data. Frequent early flushes can significantly reduce write efficiency and overall synchronization performance.

Recommendation: Set this parameter to a value greater than the maximum number of distinct partitions you expect to write within a single Global Flush Interval, to avoid premature flushes.

If the synchronization task is delayed and the log contains the message "uploader map size has reached uploaderMapMaximumSize", the partition cache limit has been reached. Consider increasing this parameter to improve throughput.

Important: Increasing this parameter raises memory consumption linearly. Use the following formula to estimate memory usage: Memory consumption ≈ 10 MB × Partition Cache Queue Size × Number of Workers × Concurrency per Worker × Async Write Thread Pool Size. Configure this value carefully based on your cluster's available resources to prevent out-of-memory (OOM) errors.
Scope: Writes to MaxCompute
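The memory formula in the note above translates directly into a quick estimate. The worker and thread counts below are sample assumptions for illustration.

```python
# Estimate the partition-cache memory footprint using the formula above:
# ~10 MB per cached partition, multiplied across workers, per-worker
# concurrency, and async write threads.

def partition_cache_memory_mb(queue_size: int, workers: int,
                              concurrency_per_worker: int,
                              async_write_threads: int) -> int:
    return 10 * queue_size * workers * concurrency_per_worker * async_write_threads

# Default queue size 5 and 1 async write thread on 2 workers x 3 threads:
print(partition_cache_memory_mb(5, 2, 3, 1))    # 300 (MB)
# Raising the queue size to 50 scales the estimate linearly:
print(partition_cache_memory_mb(50, 2, 3, 1))   # 3000 (MB)
```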

Parameter: Async Write Thread Pool Size
Valid values: Integer from 1 to 100
Default: 1
Description: If the synchronization task experiences delays and you identify the write destination as the performance bottleneck, increase this parameter to improve write throughput. For example, reading from Log Service (Loghub) is often faster than writing to MaxCompute; in such cases, increasing the number of write threads can help balance the pipeline.
Note: To prevent excessive overhead from internal thread scheduling on a worker, we recommend keeping this value at 10 or less.
Scope: Writes to MaxCompute

Parameter: Oversized Field Handling Rule
Valid values: Do Not Process/Truncate/Set to Null
Default: Do Not Process
Description: MaxCompute enforces a maximum length for a single field, which defaults to 8 MB. This parameter defines the strategy for handling fields that exceed this limit during synchronization.
  • Do Not Process: The task attempts to write the oversized field as is. If the field's length exceeds the MaxCompute limit and you have not modified the default 8 MB limit, the synchronization task fails.
  • Truncate: The field is truncated to the maximum allowed length of 8 MB before it is written.
  • Set to Null: The content of the oversized field is discarded and a NULL value is written instead.
Scope: Writes to MaxCompute
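The three rules above can be sketched as a single dispatch on the field length. The function is an illustrative model of the described behavior, not DataWorks code; only the 8 MB default limit and the rule names come from the table.

```python
from typing import Optional

MAX_FIELD_BYTES = 8 * 1024 * 1024  # MaxCompute default single-field limit (8 MB)

def handle_oversized(value: bytes, rule: str) -> Optional[bytes]:
    """Illustrative model of the oversized-field handling rules above."""
    if len(value) <= MAX_FIELD_BYTES:
        return value                        # within the limit: write as is
    if rule == "Do Not Process":
        # Written unchanged; if the 8 MB limit was not raised, the write
        # is rejected and the synchronization task fails.
        raise ValueError("field exceeds MaxCompute limit; task fails")
    if rule == "Truncate":
        return value[:MAX_FIELD_BYTES]      # keep only the first 8 MB
    if rule == "Set to Null":
        return None                         # discard content, write NULL
    raise ValueError(f"unknown rule: {rule}")

big = b"x" * (MAX_FIELD_BYTES + 1)
print(len(handle_oversized(big, "Truncate")))   # 8388608
print(handle_oversized(big, "Set to Null"))     # None
```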

Parameter: Real-time Task Session Cache Size (bytes)
Valid values: Positive integer
Default: 67108864 (64 MB)
Description: When data is written to a MaxCompute Delta table, a session is created for each destination partition; for a non-partitioned table, a single global session is created. Each session contains multiple buckets, whose number is defined when the table is created. The synchronization task caches data for each bucket. When the total size of cached data across all buckets within a single session exceeds this value (in bytes), the system triggers a batch commit of all data from that session to the MaxCompute server.
Note: This parameter controls the trade-off between memory usage and write frequency. If the synchronization task fails with an out-of-memory (OOM) error, consider decreasing this value to reduce peak memory pressure.
Scope: Writes to MaxCompute

Parameter: Real-time Task Bucket Cache Size (bytes)
Valid values: Positive integer
Default: 1048576 (1 MB)
Description: When data is written to a MaxCompute Delta table, a session is created for each destination partition (or a single global session for a non-partitioned table). Each session is divided into multiple buckets, and the synchronization task allocates an independent cache for each bucket. When the amount of cached data in a single bucket exceeds this value (in bytes), the data in that specific bucket is committed to the MaxCompute server.
Note: This parameter controls the trade-off between memory consumption and write frequency. You typically do not need to adjust it; the default value is recommended.
Scope: Writes to MaxCompute
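The session and bucket cache sizes above interact: a single bucket crossing the bucket threshold commits only that bucket, while the session total crossing the session threshold commits every bucket in the session. The sketch below models that decision with the default thresholds; it is an illustration, not the DataWorks implementation.

```python
SESSION_CACHE_BYTES = 64 * 1024 * 1024  # default 67108864
BUCKET_CACHE_BYTES = 1 * 1024 * 1024    # default 1048576

def flush_decision(bucket_sizes: dict) -> tuple:
    """Decide what to commit, given cached byte counts per bucket ID."""
    if sum(bucket_sizes.values()) > SESSION_CACHE_BYTES:
        return "session", sorted(bucket_sizes)   # commit all buckets in the session
    hot = [b for b, size in bucket_sizes.items() if size > BUCKET_CACHE_BYTES]
    if hot:
        return "bucket", sorted(hot)             # commit only the full buckets
    return "none", []

# One bucket over 1 MB: only that bucket is committed.
print(flush_decision({0: 2_000_000, 1: 500_000}))       # ('bucket', [0])
# Session total over 64 MB: the whole session is committed.
print(flush_decision({b: 9_000_000 for b in range(8)}))
```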

Parameter: Dynamic Disk Write Threshold
Valid values: Positive integer
Default: None
Description: This parameter applies when you write data to MaxCompute Delta tables. If the number of destination partitions with pending data exceeds this threshold, the bucket cache is offloaded from memory to disk to reduce memory consumption. Enabling this feature degrades synchronization performance; use it only when the task must write to a very large number of partitions and memory pressure becomes excessive.
Scope: Writes to MaxCompute

Parameter: Single-table Flush Concurrency
Valid values: Positive integer
Default: 2
Description: When you write data to a MaxCompute Delta table, this parameter determines how many buckets from a single session can be flushed to the MaxCompute server concurrently. In most cases, you do not need to modify this parameter.
Scope: Writes to MaxCompute

Parameter: Data Partitioning Strategy
Valid values: Primary key value/Table partition field value
Default: Primary key value
Description: When you write data to a MaxCompute Delta table with a write concurrency greater than 1, you must configure a data partitioning strategy to ensure data consistency. The strategy determines how source records are distributed among the parallel writer instances.
  • Primary key value: Records with the same primary key are always routed to the same writer instance.
    • Advantage: Primary keys are often uniformly distributed, which helps prevent data skew and ensures a balanced write load across instances.
    • Disadvantage: Each writer must maintain an independent cache for every destination partition. This consumes more memory (Memory ≈ Concurrency × Partitions × Cache overhead) and can cause OOM errors.
  • Table partition field value: Records that belong to the same destination partition are processed by the same writer instance.
    • Advantage: Writer instances can share the cache for a given partition, which significantly reduces overall memory usage.
    • Disadvantage: If data is unevenly distributed across partitions, for example due to a hot partition, some writer instances are overloaded while others sit idle (writer data skew).
Configuration recommendations:
  1. Prioritize the Primary key value strategy for optimal performance and load balancing in most scenarios. If the synchronization task fails with an OOM error due to excessive cache size, switch to the Table partition field value strategy to reduce memory pressure.
  2. Before you switch strategies, evaluate the distribution of your data across partitions to avoid introducing a new performance bottleneck.
Scope: Writes to MaxCompute
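Both strategies amount to choosing which field's hash routes a record to a writer. The sketch below contrasts them on hypothetical records with `pk` (primary key) and `pt` (partition) fields; the field names and routing function are assumptions for illustration only.

```python
# Illustrative routing for the two strategies above: a record goes to the
# writer whose index is the hash of the chosen key modulo the concurrency.

def route(record: dict, strategy: str, concurrency: int) -> int:
    key = record["pk"] if strategy == "primary_key" else record["pt"]
    return hash(key) % concurrency

# Eight records with distinct primary keys that all land in one partition.
records = [{"pk": i, "pt": "2026-02-11"} for i in range(8)]

# Primary key value: records spread across all four writers (balanced load,
# but each writer caches the partition independently).
print(sorted({route(r, "primary_key", 4) for r in records}))
# Table partition field value: the hot partition lands on a single writer
# (shared cache, but this one writer does all the work).
print(len({route(r, "partition", 4) for r in records}))   # 1
```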

Parameter: Concurrency per Worker
Valid values: Integer from 1 to 100
Default: Varies based on the number of CUs
Description: The total concurrency of a synchronization task is calculated as Concurrency per Worker × Number of Workers. You can adjust this value to address data skew. For example, if the source is a Log Service (Loghub) Logstore, the total concurrency should ideally equal (highest shard ID - lowest shard ID) + 1.
To reduce overhead from thread scheduling within a worker, we recommend setting the concurrency per worker to less than 10.
Scope: General

Parameter: Number of Workers
Valid values: Integer from 1 to 100
Default: Varies based on the number of CUs
Description: The total concurrency of a synchronization task is calculated as Concurrency per Worker × Number of Workers. Allocating too many CUs to a single worker can lead to long resource scheduling delays. We recommend configuring the number of workers so that each worker is allocated 10 CUs or fewer.
Scope: General
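The sizing rules from the last two rows can be combined into one small planning sketch. The CU figures below are sample assumptions, and the helper is illustrative, not a DataWorks API.

```python
def plan(total_cus: int, cus_per_worker: int, concurrency_per_worker: int):
    """Derive a worker count and total concurrency from the rules above:
    at most 10 CUs per worker, and
    total concurrency = Concurrency per Worker x Number of Workers."""
    assert cus_per_worker <= 10, "keep each worker at 10 CUs or fewer"
    workers = -(-total_cus // cus_per_worker)   # ceiling division
    return workers, workers * concurrency_per_worker

# 24 CUs at 8 CUs per worker -> 3 workers; 4 threads each -> total
# concurrency 12, matching a Loghub source with shard IDs 0..11
# ((11 - 0) + 1 = 12).
print(plan(24, 8, 4))   # (3, 12)
```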