
DataWorks:Advanced parameters for real-time synchronization

Last Updated: Feb 11, 2026

This topic describes the advanced parameters that you can configure for real-time synchronization tasks. For each parameter, the valid values, default value, description, and scope are listed.

Parameter: Auto-set Runtime Configuration
Valid values: true/false
Default: true
Description:
  • true (default): The system automatically sets parameters such as the concurrency based on the number of CUs. However, if the total concurrency is not an even multiple of the number of source shards, shards are distributed unevenly among read threads, which causes data skew and degrades performance.
  • false: Allows manual parameter configuration. If you encounter a performance bottleneck or data skew, select false and tune the parameters to optimize data distribution.
Scope: General
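The skew described above comes from how shards are divided among read threads. The following sketch illustrates the effect, assuming round-robin assignment; `shard_distribution` and its inputs are illustrative, not DataWorks internals.

```python
# Sketch: why an uneven shard-to-thread ratio causes data skew.
# If the concurrency does not divide the shard count evenly, some read
# threads own one more shard than others and carry more load.

def shard_distribution(num_shards: int, concurrency: int) -> list[int]:
    """Return how many shards each of `concurrency` read threads receives."""
    base, extra = divmod(num_shards, concurrency)
    return [base + 1 if i < extra else base for i in range(concurrency)]

# 12 shards over 4 threads: balanced, 3 shards each.
print(shard_distribution(12, 4))   # [3, 3, 3, 3]
# 12 shards over 5 threads: skewed, two threads carry an extra shard.
print(shard_distribution(12, 5))   # [3, 3, 2, 2, 2]
```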

Parameter: Global Flush Interval (seconds)
Valid values: Integer from 5 to 1000
Default: 60
Description: Sets the interval at which all write threads batch-flush cached data to the destination and advance the synchronization offset.
  • A higher value increases throughput but also increases data visibility latency.
  • A lower value reduces data visibility latency but may reduce throughput.
Recommendation: For high-volume synchronization scenarios where latency is not a primary concern, consider increasing this value. For use cases that require low latency, decrease it.
Scope: General
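The trade-off above can be made concrete with a rough calculation. The sketch below assumes records arrive uniformly, so a record waits on average half a flush interval before it becomes visible; the arrival rate is an illustrative assumption, not a DataWorks figure.

```python
# Rough model of the flush-interval trade-off: longer intervals mean
# larger batches (better throughput) but longer waits until data is
# visible in the destination.

def flush_tradeoff(interval_s: int, records_per_s: int) -> tuple[float, int]:
    avg_visibility_latency_s = interval_s / 2      # mean wait until next flush
    batch_size = interval_s * records_per_s        # records committed per flush
    return avg_visibility_latency_s, batch_size

# Default 60 s interval at 5,000 records/s: ~30 s average latency,
# 300,000 records per batch flush.
print(flush_tradeoff(60, 5000))
# Lowering the interval to 5 s cuts latency to ~2.5 s but shrinks each batch 12x.
print(flush_tradeoff(5, 5000))
```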

Parameter: Failover Restart Policy Time Window (minutes)
Valid values: Integer from 1 to 60
Default: 30
Description: When a synchronization task encounters a recoverable exception, the system counts the failures that occurred within this time window and compares the count against the failure count threshold to decide whether to restart the task automatically.
  • If the failure count has not exceeded the threshold, the task restarts and the failure count for the window is incremented.
  • If the failure count has exceeded the threshold, the task is marked as failed and is not restarted.
This parameter defines the time frame for counting failures. A shorter window makes the system more tolerant of brief, intermittent issues.
Important: Setting this window too small can cause the task to restart repeatedly when it faces a persistent, unrecoverable issue. This wastes resources and can delay the detection and manual resolution of the underlying problem.
Scope: General

Parameter: Failover Restart Policy Failure Count Threshold
Valid values: Integer from 1 to 100
Default: 3
Description: Specifies the maximum number of times the task is allowed to restart automatically within the configured time window.
  • A higher value increases tolerance for frequent, intermittent errors.
  • A lower value lets the system mark a persistent problem as a failure more quickly, preventing resource waste.
Recommendation: Set a relatively small value to balance fault tolerance with rapid problem detection.
Scope: General
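Together, the two failover parameters above describe a sliding-window failure counter. The class below is an illustrative model of that behavior, not the DataWorks implementation.

```python
from collections import deque

class RestartPolicy:
    """Illustrative sliding-window model of the failover restart policy:
    restart while the number of failures inside the time window is below
    the threshold, otherwise mark the task as failed."""

    def __init__(self, window_minutes: int = 30, threshold: int = 3):
        self.window_s = window_minutes * 60
        self.threshold = threshold
        self.failures = deque()  # timestamps of counted failures

    def on_failure(self, now_s: float) -> str:
        # Drop failures that have fallen out of the time window.
        while self.failures and now_s - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.threshold:
            return "fail"       # threshold exceeded: do not restart
        self.failures.append(now_s)
        return "restart"        # below threshold: restart and count it

# Four failures one minute apart with the default 30-minute window and
# threshold 3: the first three restart, the fourth marks the task failed.
policy = RestartPolicy(window_minutes=30, threshold=3)
print([policy.on_failure(t) for t in (0, 60, 120, 180)])
# → ['restart', 'restart', 'restart', 'fail']
```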

Parameter: Partition Cache Queue Size
Valid values: Integer from 5 to 100
Default: 5
Description: When data is written to non-Delta partitioned tables in MaxCompute, it is cached per partition. Within the time window set by the Global Flush Interval, the synchronization task must allocate a separate cache for each destination partition. This parameter limits the maximum number of partitions that can be cached simultaneously.

If the number of partitions to be written within a single flush interval exceeds this queue size, the system triggers an early global flush to commit all cached data. Frequent early flushes can significantly reduce write efficiency and overall synchronization performance.

Recommendation: Set this parameter to a value greater than the maximum number of distinct partitions you expect to write within a single Global Flush Interval, to avoid premature flushes.

If the synchronization task is delayed and the log contains the message "uploader map size has reached uploaderMapMaximumSize", the partition cache limit has been reached. Consider increasing this parameter to improve throughput.

Important: Increasing this parameter raises memory consumption linearly. Use the following formula to estimate memory usage: Memory consumption ≈ 10 MB × Partition Cache Queue Size × Number of Workers × Concurrency per Worker × Async Write Thread Pool Size. Configure this value carefully based on your cluster's available resources to prevent out-of-memory (OOM) errors.
Scope: Writes to MaxCompute
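The memory formula in the note above translates directly into a quick estimate. The worker and thread counts below are sample assumptions for illustration.

```python
# Estimate the partition-cache memory footprint using the formula above:
# ~10 MB per cached partition, multiplied across workers, per-worker
# concurrency, and async write threads.

def partition_cache_memory_mb(queue_size: int, workers: int,
                              concurrency_per_worker: int,
                              async_write_threads: int) -> int:
    return 10 * queue_size * workers * concurrency_per_worker * async_write_threads

# Default queue size 5 and 1 async write thread on 2 workers x 3 threads:
print(partition_cache_memory_mb(5, 2, 3, 1))    # 300 (MB)
# Raising the queue size to 50 scales the estimate linearly:
print(partition_cache_memory_mb(50, 2, 3, 1))   # 3000 (MB)
```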

Parameter: Async Write Thread Pool Size
Valid values: Integer from 1 to 100
Default: 1
Description: If the synchronization task experiences delays and you identify the write destination as the performance bottleneck, increase this parameter to improve write throughput. For example, reading from Log Service (Loghub) is often faster than writing to MaxCompute; in such cases, increasing the number of write threads can help balance the pipeline.
Note: To prevent excessive overhead from internal thread scheduling on a worker, we recommend keeping this value at 10 or less.
Scope: Writes to MaxCompute

Parameter: Oversized Field Handling Rule
Valid values: Do Not Process/Truncate/Set to Null
Default: Do Not Process
Description: MaxCompute enforces a maximum length for a single field, which defaults to 8 MB. This parameter defines the strategy for handling fields that exceed this limit during synchronization.
  • Do Not Process: The task attempts to write the oversized field as is. If the field's length exceeds the MaxCompute limit and you have not modified the default 8 MB limit, the synchronization task fails.
  • Truncate: The field is truncated to the maximum allowed length of 8 MB before it is written.
  • Set to Null: The content of the oversized field is discarded and a NULL value is written instead.
Scope: Writes to MaxCompute
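The three rules above can be sketched as a single dispatch on the field length. The function is an illustrative model of the described behavior, not DataWorks code; only the 8 MB default limit and the rule names come from the table.

```python
from typing import Optional

MAX_FIELD_BYTES = 8 * 1024 * 1024  # MaxCompute default single-field limit (8 MB)

def handle_oversized(value: bytes, rule: str) -> Optional[bytes]:
    """Illustrative model of the oversized-field handling rules above."""
    if len(value) <= MAX_FIELD_BYTES:
        return value                        # within the limit: write as is
    if rule == "Do Not Process":
        # Written unchanged; if the 8 MB limit was not raised, the write
        # is rejected and the synchronization task fails.
        raise ValueError("field exceeds MaxCompute limit; task fails")
    if rule == "Truncate":
        return value[:MAX_FIELD_BYTES]      # keep only the first 8 MB
    if rule == "Set to Null":
        return None                         # discard content, write NULL
    raise ValueError(f"unknown rule: {rule}")

big = b"x" * (MAX_FIELD_BYTES + 1)
print(len(handle_oversized(big, "Truncate")))   # 8388608
print(handle_oversized(big, "Set to Null"))     # None
```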

Parameter: Real-time Task Session Cache Size (bytes)
Valid values: Positive integer
Default: 67108864 (64 MB)
Description: When data is written to a MaxCompute Delta table, a session is created for each destination partition; for a non-partitioned table, a single global session is created. Each session contains multiple buckets, whose number is defined when the table is created. The synchronization task caches data for each bucket. When the total size of cached data across all buckets within a single session exceeds this value (in bytes), the system triggers a batch commit of all data from that session to the MaxCompute server.
Note: This parameter controls the trade-off between memory usage and write frequency. If the synchronization task fails with an out-of-memory (OOM) error, consider decreasing this value to reduce peak memory pressure.
Scope: Writes to MaxCompute

Parameter: Real-time Task Bucket Cache Size (bytes)
Valid values: Positive integer
Default: 1048576 (1 MB)
Description: When data is written to a MaxCompute Delta table, a session is created for each destination partition (or a single global session for a non-partitioned table). Each session is divided into multiple buckets, and the synchronization task allocates an independent cache for each bucket. When the amount of cached data in a single bucket exceeds this value (in bytes), the data in that specific bucket is committed to the MaxCompute server.
Note: This parameter controls the trade-off between memory consumption and write frequency. You typically do not need to adjust it; the default value is recommended.
Scope: Writes to MaxCompute
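The session and bucket cache sizes above interact: a single bucket crossing the bucket threshold commits only that bucket, while the session total crossing the session threshold commits every bucket in the session. The sketch below models that decision with the default thresholds; it is an illustration, not the DataWorks implementation.

```python
SESSION_CACHE_BYTES = 64 * 1024 * 1024  # default 67108864
BUCKET_CACHE_BYTES = 1 * 1024 * 1024    # default 1048576

def flush_decision(bucket_sizes: dict) -> tuple:
    """Decide what to commit, given cached byte counts per bucket ID."""
    if sum(bucket_sizes.values()) > SESSION_CACHE_BYTES:
        return "session", sorted(bucket_sizes)   # commit all buckets in the session
    hot = [b for b, size in bucket_sizes.items() if size > BUCKET_CACHE_BYTES]
    if hot:
        return "bucket", sorted(hot)             # commit only the full buckets
    return "none", []

# One bucket over 1 MB: only that bucket is committed.
print(flush_decision({0: 2_000_000, 1: 500_000}))       # ('bucket', [0])
# Session total over 64 MB: the whole session is committed.
print(flush_decision({b: 9_000_000 for b in range(8)}))
```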

Parameter: Dynamic Disk Write Threshold
Valid values: Positive integer
Default: None
Description: This parameter applies when you write data to MaxCompute Delta tables. If the number of destination partitions with pending data exceeds this threshold, the bucket cache is offloaded from memory to disk to reduce memory consumption. Enabling this feature degrades synchronization performance; use it only when the task must write to a very large number of partitions and memory pressure becomes excessive.
Scope: Writes to MaxCompute

Parameter: Single-table Flush Concurrency
Valid values: Positive integer
Default: 2
Description: When you write data to a MaxCompute Delta table, this parameter determines how many buckets from a single session can be flushed to the MaxCompute server concurrently. In most cases, you do not need to modify this parameter.
Scope: Writes to MaxCompute

Parameter: Data Partitioning Strategy
Valid values: Primary key value/Table partition field value
Default: Primary key value
Description: When you write data to a MaxCompute Delta table with a write concurrency greater than 1, you must configure a data partitioning strategy to ensure data consistency. The strategy determines how source records are distributed among the parallel writer instances.
  • Primary key value: Records with the same primary key are always routed to the same writer instance.
    • Advantage: Primary keys are often uniformly distributed, which helps prevent data skew and ensures a balanced write load across instances.
    • Disadvantage: Each writer must maintain an independent cache for every destination partition. This consumes more memory (Memory ≈ Concurrency × Partitions × Cache overhead) and can cause OOM errors.
  • Table partition field value: Records that belong to the same destination partition are processed by the same writer instance.
    • Advantage: Writer instances can share the cache for a given partition, which significantly reduces overall memory usage.
    • Disadvantage: If data is unevenly distributed across partitions, for example due to a hot partition, some writer instances are overloaded while others sit idle (writer data skew).
Configuration recommendations:
  1. Prioritize the Primary key value strategy for optimal performance and load balancing in most scenarios. If the synchronization task fails with an OOM error due to excessive cache size, switch to the Table partition field value strategy to reduce memory pressure.
  2. Before you switch strategies, evaluate the distribution of your data across partitions to avoid introducing a new performance bottleneck.
Scope: Writes to MaxCompute
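Both strategies amount to choosing which field's hash routes a record to a writer. The sketch below contrasts them on hypothetical records with `pk` (primary key) and `pt` (partition) fields; the field names and routing function are assumptions for illustration only.

```python
# Illustrative routing for the two strategies above: a record goes to the
# writer whose index is the hash of the chosen key modulo the concurrency.

def route(record: dict, strategy: str, concurrency: int) -> int:
    key = record["pk"] if strategy == "primary_key" else record["pt"]
    return hash(key) % concurrency

# Eight records with distinct primary keys that all land in one partition.
records = [{"pk": i, "pt": "2026-02-11"} for i in range(8)]

# Primary key value: records spread across all four writers (balanced load,
# but each writer caches the partition independently).
print(sorted({route(r, "primary_key", 4) for r in records}))
# Table partition field value: the hot partition lands on a single writer
# (shared cache, but this one writer does all the work).
print(len({route(r, "partition", 4) for r in records}))   # 1
```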

Parameter: Concurrency per Worker
Valid values: Integer from 1 to 100
Default: Varies based on the number of CUs
Description: The total concurrency of a synchronization task is calculated as Concurrency per Worker × Number of Workers. You can adjust this value to address data skew. For example, if the source is a Log Service (Loghub) Logstore, the total concurrency should ideally equal (highest shard ID - lowest shard ID) + 1.
To reduce overhead from thread scheduling within a worker, we recommend setting the concurrency per worker to less than 10.
Scope: General

Parameter: Number of Workers
Valid values: Integer from 1 to 100
Default: Varies based on the number of CUs
Description: The total concurrency of a synchronization task is calculated as Concurrency per Worker × Number of Workers. Allocating too many CUs to a single worker can lead to long resource scheduling delays. We recommend configuring the number of workers so that each worker is allocated 10 CUs or fewer.
Scope: General
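The sizing rules from the last two rows can be combined into one small planning sketch. The CU figures below are sample assumptions, and the helper is illustrative, not a DataWorks API.

```python
def plan(total_cus: int, cus_per_worker: int, concurrency_per_worker: int):
    """Derive a worker count and total concurrency from the rules above:
    at most 10 CUs per worker, and
    total concurrency = Concurrency per Worker x Number of Workers."""
    assert cus_per_worker <= 10, "keep each worker at 10 CUs or fewer"
    workers = -(-total_cus // cus_per_worker)   # ceiling division
    return workers, workers * concurrency_per_worker

# 24 CUs at 8 CUs per worker -> 3 workers; 4 threads each -> total
# concurrency 12, matching a Loghub source with shard IDs 0..11
# ((11 - 0) + 1 = 12).
print(plan(24, 8, 4))   # (3, 12)
```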