DataWorks: Single-table batch synchronization tasks

Last Updated: Jun 02, 2026

Data Integration offline sync uses Reader and Writer plugins to synchronize full or incremental data between source and destination databases. Define data sources and configure scheduling parameters to automate the sync process.

Core capabilities

Offline sync provides the following capabilities:

  • Data synchronization between disparate data sources: Data Integration supports 50+ data source types, including relational databases, unstructured storage, big data storage, and message queues. Reader and Writer plugins transfer data between any structured or semi-structured sources. For the full list, see Supported data sources and sync solutions.

  • Data synchronization in complex network environments: Supports data sources in ApsaraDB, on-premises data centers, self-managed databases on ECS, and databases outside Alibaba Cloud. Ensure network connectivity between the resource group and both endpoints before you configure a task. For more information, see Network connectivity solutions.

Sync scenarios

1. Sync modes

  • Periodic full synchronization: Periodically overwrites the destination table with all source data. Suitable for full-refresh scenarios.

  • Periodic incremental synchronization: Syncs only new or changed data on a daily or hourly basis. A WHERE clause that references built-in scheduling parameters such as ${bizdate} filters the source data, and each run writes the filtered data to the corresponding time partition. For an example, see Configure an incremental offline sync task.

  • Historical data backfill: Use Data Backfill in the Operation Center to run sync tasks in batches and efficiently archive historical data.

Note

For more information about scheduling parameters, see Common scenarios for scheduling parameters in Data Integration and Scheduling parameter formats.
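
The incremental filter described above can be sketched in Python. This is an illustration only, not DataWorks code: it assumes that ${bizdate} resolves to the day before the scheduled run date in yyyymmdd format, and the helper names and the gmt_modified column are hypothetical.

```python
from datetime import date, timedelta


def resolve_bizdate(run_date: date) -> str:
    """Return the assumed ${bizdate} value: the day before run_date, as yyyymmdd."""
    return (run_date - timedelta(days=1)).strftime("%Y%m%d")


def render_filter(template: str, run_date: date) -> str:
    """Substitute the ${bizdate} placeholder in a WHERE-clause template."""
    return template.replace("${bizdate}", resolve_bizdate(run_date))


# Hypothetical filter: keep only rows modified on the business date.
FILTER_TEMPLATE = "DATE_FORMAT(gmt_modified, '%Y%m%d') = '${bizdate}'"

if __name__ == "__main__":
    print(render_filter(FILTER_TEMPLATE, date(2026, 6, 2)))
```

Because each run resolves the parameter from its own schedule date, the same template can serve both daily runs and backfill runs for past dates.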

2. Source structures

  • Single table to single table: Syncs data from one source table to one destination table.

  • Sharded databases and tables to a single table:

    • Automatically aggregates data from multiple physical tables, such as order_01 and order_02, and writes the data to a single destination table.

    • Supports MySQL, SQL Server, Oracle, PostgreSQL, PolarDB, and AnalyticDB. For details, see Synchronize sharded databases and tables.
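
To make the aggregation idea concrete, here is a minimal sketch that merges shard tables into one destination table. It uses an in-memory SQLite database purely for illustration; it is not how Data Integration implements the feature, and the function name, the order_all table, and the shard-name pattern are assumptions.

```python
import re
import sqlite3


def merge_shards(conn: sqlite3.Connection, pattern: str, dest: str) -> list:
    """Append all rows from tables whose names fully match pattern into dest.

    Assumes every shard has the same column layout as dest.
    Returns the shard names that were merged, in name order.
    """
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    shards = [name for name in names if re.fullmatch(pattern, name)]
    for shard in shards:
        # Table names cannot be bound as parameters, so they are quoted inline.
        conn.execute(f'INSERT INTO "{dest}" SELECT * FROM "{shard}"')
    return shards


def demo():
    conn = sqlite3.connect(":memory:")
    for name in ("order_01", "order_02", "order_all"):
        conn.execute(f'CREATE TABLE "{name}" (id INTEGER, amount REAL)')
    conn.execute('INSERT INTO "order_01" VALUES (1, 10.0)')
    conn.execute('INSERT INTO "order_02" VALUES (2, 20.0), (3, 30.0)')
    merged = merge_shards(conn, r"order_\d+", "order_all")
    rows = conn.execute('SELECT id, amount FROM "order_all" ORDER BY id').fetchall()
    return merged, rows
```

Matching shard names with a strict pattern such as order_\d+ keeps the destination table itself out of the source set.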

Configuration methods

Configure offline sync tasks using one of the following methods:

  • Codeless UI: A visual wizard for step-by-step configuration. Easy to learn but lacks some advanced features.

  • Code editor: Define sync logic directly in JSON. Supports complex configurations for fine-grained control.

  • Create using OpenAPI: Manage the entire task lifecycle programmatically through the OpenAPI.

Note

For more information about task configuration capabilities, see Feature overview.

Offline sync task O&M

  • Monitoring and alerting: Monitor offline sync task status and receive alerts when tasks are incomplete, fail, or succeed. Alerts are delivered by email, SMS, phone, DingTalk chatbot, or webhooks.

  • Data quality: After a task is submitted and published, configure data quality monitoring rules for the destination table in the Operation Center. Only some database types are supported.

  • Data source environment fencing: Bind one data source name to separate development and production configurations. Tasks automatically switch data sources based on the runtime environment, preventing test operations from affecting production data.

Feature overview

Feature

Description

Full or incremental data synchronization

Use Data Filtering and scheduling parameters to perform full or incremental sync. Incremental sync configuration varies by plugin. For an example, see Configure an incremental offline sync task.

Field mapping

Map source fields to destination fields. Ensure compatible data types between source and destination.

  • Available mapping methods:

    • The codeless UI supports mapping by name, mapping by row, and custom field relationships. Unmapped fields are ignored, so ensure that unmapped destination fields have default values or allow nulls to avoid write failures.

    • The code editor maps fields by column order. The reader and writer must have identical field counts, or the task fails.

  • Destination fields support dynamic value assignment with constants, scheduling parameters, and built-in variables such as ${bizdate}. Parameter values are resolved during the scheduling phase.
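
The order-based mapping rule for the code editor can be sketched as a small pre-flight check. This is a hypothetical helper, not part of DataWorks; the column names are placeholders.

```python
def map_by_order(reader_columns, writer_columns):
    """Pair reader and writer columns by position.

    Raises ValueError when the counts differ, mirroring the rule that a
    task fails if reader and writer field counts are not identical.
    """
    if len(reader_columns) != len(writer_columns):
        raise ValueError(
            f"column count mismatch: reader has {len(reader_columns)}, "
            f"writer has {len(writer_columns)}"
        )
    return list(zip(reader_columns, writer_columns))


if __name__ == "__main__":
    # Hypothetical column lists; names do not need to match, only positions.
    for source, target in map_by_order(["id", "name"], ["id", "user_name"]):
        print(f"{source} -> {target}")
```

Running a check like this before submitting a script surfaces a count mismatch early instead of at task run time.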

Job rate limit control

  • Task concurrency control: Limits the maximum concurrent connections for reading from and writing to the database.

  • Sync rate: Controls traffic to prevent excessive load on data sources. Without a limit, the task uses the maximum available transfer performance.
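
As a rough illustration of what a sync rate limit implies, the sketch below computes how long a transfer loop would need to pause to keep its average throughput under a cap. The function and its parameters are hypothetical and do not reflect the internal throttling mechanism.

```python
def throttle_delay(bytes_sent: int, elapsed_s: float, limit_bytes_per_s: float) -> float:
    """Return the seconds to pause so the average rate stays at or below the limit.

    A limit of zero or less is treated here as "no limit".
    """
    if limit_bytes_per_s <= 0:
        return 0.0
    # Minimum time the transfer should have taken at the capped rate.
    min_elapsed = bytes_sent / limit_bytes_per_s
    return max(0.0, min_elapsed - elapsed_s)
```

A caller would sleep for the returned duration after each batch, so a task that runs ahead of the cap slows down while one that is already below it proceeds without delay.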

Distributed task execution

For supported data sources, task segmentation distributes a sync task across multiple nodes for concurrent execution. Sync speed scales linearly with cluster size, breaking single-node bottlenecks. This pattern suits high-throughput, low-latency scenarios and improves hardware utilization.
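
One common way to segment a task is to split a key range into contiguous slices that workers read independently. The sketch below assumes an integer split key; the source does not describe how tasks are actually segmented, so treat the function as hypothetical.

```python
def split_range(min_id: int, max_id: int, parts: int) -> list:
    """Split the inclusive range [min_id, max_id] into at most `parts`
    contiguous, non-overlapping inclusive sub-ranges of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    parts = min(parts, total)  # never create empty slices
    base, extra = divmod(total, parts)
    ranges = []
    start = min_id
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

Each slice can then be turned into its own filter, for example id BETWEEN 1 AND 4, and handed to a separate node.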

Dirty data policy

Dirty data refers to records that fail to write due to exceptions such as type conflicts or constraint violations. Define a dirty data threshold and its impact on the task:

  • Ignore dirty data: Filters out dirty records and writes only compliant data. The task continues.

  • Tolerate a limited number of dirty data records: Set a threshold N. The task discards up to N dirty records and continues. If dirty records exceed N, the task fails.

  • Do not tolerate dirty data: The task fails immediately on any dirty record.
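
The three policies differ only in the threshold they apply, which the following sketch makes explicit. The names are hypothetical, and a record counts as dirty here when the supplied write function raises TypeError or ValueError.

```python
from typing import Callable, Iterable, Optional, Tuple


class DirtyDataLimitExceeded(Exception):
    """Raised when dirty records exceed the configured threshold."""


def write_with_policy(
    records: Iterable,
    write_one: Callable,
    max_dirty: Optional[int],
) -> Tuple[int, int]:
    """Write records one by one and apply a dirty data threshold.

    max_dirty=None ignores all dirty records, max_dirty=0 tolerates none,
    and max_dirty=N tolerates up to N dirty records.
    Returns (written_count, dirty_count).
    """
    written = 0
    dirty = 0
    for record in records:
        try:
            write_one(record)
            written += 1
        except (TypeError, ValueError):
            dirty += 1
            if max_dirty is not None and dirty > max_dirty:
                raise DirtyDataLimitExceeded(
                    f"{dirty} dirty records exceed the limit of {max_dirty}"
                )
    return written, dirty
```

For example, with a write function that parses integers, the input ["1", "x", "2", "y"] yields two written and two dirty records when the threshold is 2 or unset, and raises when the threshold is 1 or 0.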

Time zone

Set the source time zone to convert timestamps automatically when source and destination are in different time zones.
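
The conversion amounts to interpreting each source timestamp in the source time zone and re-expressing it in the destination time zone. The sketch below uses fixed UTC offsets for simplicity, so it ignores daylight saving time; the function name and offset parameters are assumptions rather than product settings.

```python
from datetime import datetime, timedelta, timezone


def convert_timestamp(
    naive_ts: datetime, source_offset_hours: float, dest_offset_hours: float
) -> datetime:
    """Reinterpret a naive timestamp recorded at the source UTC offset
    as a naive timestamp at the destination UTC offset."""
    source_tz = timezone(timedelta(hours=source_offset_hours))
    dest_tz = timezone(timedelta(hours=dest_offset_hours))
    aware = naive_ts.replace(tzinfo=source_tz)
    return aware.astimezone(dest_tz).replace(tzinfo=None)
```

With a source at UTC+8 and a destination at UTC, 08:00 on June 2 becomes 00:00 on June 2, which matters when the value decides which daily partition a row lands in.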

Intelligent data processing

DataWorks can transform and process source data during synchronization before writing it to the destination:

  • String replacement: Perform lightweight in-flight data transformations during transfer without landing the data or requiring extra ETL steps.

  • AI-assisted processing: Integrate large AI models to perform semantic, sentiment, and other analyses on natural language from the source. Results are written directly to the destination table.

  • Data vectorization: Extract source data, create vector embeddings, and write them to a vector database.
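
As a small illustration of an in-flight transformation, the sketch below applies a string replacement to one column while rows stream from a reader to a writer, without staging the data. The function, the row shape (dictionaries), and the column name are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def replace_in_rows(
    rows: Iterable[Dict], column: str, old: str, new: str
) -> Iterator[Dict]:
    """Yield rows with `old` replaced by `new` in the given column.

    Non-string values, including None, pass through unchanged,
    and input rows are not modified in place.
    """
    for row in rows:
        value = row.get(column)
        if isinstance(value, str):
            row = {**row, column: value.replace(old, new)}
        yield row
```

Because the function is a generator, each row is transformed as it passes through, and the full data set is never held in memory.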

Get started

Create a sync task using one of the following methods: