DataWorks: Batch database synchronization

Last Updated: Jun 02, 2026

DataWorks Data Integration lets you migrate all or selected tables from a source database to a destination store through full or incremental synchronization, on a one-time or recurring schedule. It automatically creates destination table schemas, eliminating the need to configure individual tasks per table.

Use cases

  • Data migration and cloud adoption

    • Migrate databases, such as MySQL and Oracle, from an on-premises data center to a cloud data warehouse or data lake.

    • Migrate data between different cloud platforms or database systems.

  • Data warehouse or data lake construction

    Periodically synchronize full or incremental data from OLTP databases to the ODS layer of a data warehouse or data lake for downstream analytics.

  • Data backup and disaster recovery

    • Regularly back up full datasets from production databases to cost-effective storage media, such as HDFS or OSS.

    • Implement cross-region or cross-availability-zone disaster recovery solutions.

Core capabilities

The following table describes the core capabilities.

| Core capability | Feature | Description |
| --- | --- | --- |
| Synchronization across heterogeneous data sources | - | Migrates data from on-premises data centers or other cloud platforms to a data warehouse or data lake such as MaxCompute, Hologres, or OSS. For more information, see Supported data sources and synchronization solutions. |
| Data synchronization in complex network environments | - | Transfers data from Alibaba Cloud databases, on-premises databases, self-managed databases on ECS instances, or non-Alibaba Cloud databases. Before you configure a task, ensure network connectivity between the resource group and both the source and the destination. For more information, see Network connectivity configuration. |
| Synchronization scenarios | Full synchronization | Synchronizes all data to destination tables or specified partitions, either once or on a schedule. |
| Synchronization scenarios | Incremental synchronization | Supports one-time or periodic incremental synchronization of data based on time, a partition, or a primary key. |
| Synchronization scenarios | Combined full and incremental | First run: performs an automatic, one-time full synchronization. Subsequent runs: the task automatically switches to periodic incremental synchronization to specified partitions. |
| Database and table mapping | Batch table synchronization | Synchronize all tables in a database, or use checkboxes and filter rules to select a subset. |
| Database and table mapping | Automatic schema creation | Processes hundreds of tables in a single configuration. The system automatically creates destination table schemas without manual intervention. |
| Database and table mapping | Flexible mapping | Define custom naming rules for destination databases and tables, and customize field type mappings to adapt to the destination data model. |
| Scheduling and dependency management | Scheduling time | Supports scheduling cycles by the minute, hour, day, week, month, and year. If you synchronize a large number of tables at once, stagger execution times to prevent task buildup and resource contention. |
| Scheduling and dependency management | Task dependencies | Both entire-database tasks and table-level subtasks can serve as upstream dependencies. When a subtask completes, it automatically triggers downstream tasks. |
| Scheduling and dependency management | Parameter support | Supports scheduling parameters for incremental synchronization, such as ${bizdate} to represent the business date. |
| Advanced parameters | Dirty data handling | Dirty data refers to records that fail to write because of type conflicts or constraint violations. The setting that ignores dirty data defaults to false, so the task fails when it encounters dirty data. If the setting is true, dirty data is ignored. |
| Advanced parameters | Reader and writer configuration | Configure the maximum connections for the reader (source) and writer (destination). You can also define a cleanup policy on the destination before writing. |
| Advanced parameters | Concurrency and rate limiting | Concurrency controls the maximum concurrent connections for reading from and writing to databases. Rate limiting controls data flow to prevent excessive load on source or destination systems. When rate limiting is disabled, the task uses the maximum available transfer throughput. |
| Operations and maintenance | Runtime intervention | Supports rerun, data backfill, mark as successful, freeze, and restore operations. |
| Operations and maintenance | Monitoring and alerting | Configure monitoring rules for baselines, task status, and runtime duration, with alerts for triggered rules. |
| Operations and maintenance | Data quality | After you submit and deploy a task, configure data quality monitoring rules for destination tables in Operation Center. Rules can be created manually or generated by AI. This feature is currently supported for specific database types only. For more information, see Data quality. |
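The parameter support entry in the table above mentions using ${bizdate} for incremental synchronization. The following Python sketch illustrates the idea only. It assumes that the business date is the day before the scheduled run date in yyyymmdd format, and the column name modified_day and the helper functions are hypothetical, not part of DataWorks.

```python
from datetime import date, timedelta
from string import Template


def business_date(scheduled: date) -> str:
    """Return the day before the scheduled run date, formatted as yyyymmdd.

    Assumption: this mirrors how ${bizdate} is commonly resolved.
    """
    return (scheduled - timedelta(days=1)).strftime("%Y%m%d")


def render_filter(template: str, scheduled: date) -> str:
    """Substitute ${bizdate} in a filter template for one scheduled run."""
    return Template(template).substitute(bizdate=business_date(scheduled))


# Hypothetical incremental filter on a hypothetical column.
incremental_filter = render_filter("modified_day = '${bizdate}'", date(2026, 6, 2))
```

Each scheduled run renders the same template with a different date, so a recurring task reads only the rows for its own business date.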
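The dirty data handling entry in the table above describes two behaviors: fail on dirty data (the default) or ignore it. A minimal Python sketch of that behavior follows. The function write_records, the ignore_dirty flag, and the DirtyDataError class are illustrative names, not DataWorks APIs, and the sketch treats TypeError and ValueError as stand-ins for type conflicts and constraint violations.

```python
from typing import Callable, Iterable, Tuple


class DirtyDataError(Exception):
    """Raised when a record cannot be written and dirty data is not ignored."""


def write_records(
    records: Iterable[object],
    write_one: Callable[[object], None],
    ignore_dirty: bool = False,
) -> Tuple[int, int]:
    """Write records one by one and return (written, skipped) counts.

    With ignore_dirty=False, the first failing record aborts the run.
    With ignore_dirty=True, failing records are counted and skipped.
    """
    written = 0
    skipped = 0
    for record in records:
        try:
            write_one(record)
            written += 1
        except (TypeError, ValueError) as exc:
            if not ignore_dirty:
                raise DirtyDataError(f"dirty record {record!r}: {exc}") from exc
            skipped += 1
    return written, skipped
```

Failing by default is the conservative choice: a silent skip can hide a schema mismatch, so ignoring dirty data is best reserved for cases where partial loads are acceptable.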
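The scheduling time entry in the table above recommends staggering execution times when many tables are synchronized at once. One way to picture this is to assign start times in fixed-size batches, as in the Python sketch below. The function name, batch size, and gap are hypothetical planning values, not DataWorks settings.

```python
from datetime import datetime, timedelta
from typing import Dict, Sequence


def stagger_start_times(
    tables: Sequence[str],
    first_start: datetime,
    batch_size: int,
    gap: timedelta,
) -> Dict[str, datetime]:
    """Assign start times so that at most batch_size tables start together.

    Tables are grouped in input order, and each group starts one gap
    later than the previous group.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {
        table: first_start + gap * (index // batch_size)
        for index, table in enumerate(tables)
    }
```

For example, five tables with a batch size of 2 and a 10-minute gap yield three start times, 10 minutes apart, instead of five simultaneous starts.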
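The database and table mapping entries in the table above combine filter rules for selecting tables with custom naming rules for destination tables. The Python sketch below shows one possible shape of such a plan. The wildcard patterns, the ods_ prefix, and the plan_tables function are assumptions for illustration and do not reflect the actual DataWorks rule syntax.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, Sequence


def plan_tables(
    source_tables: Iterable[str],
    include_patterns: Sequence[str],
    exclude_patterns: Sequence[str] = (),
    dest_template: str = "ods_{table}",
) -> Dict[str, str]:
    """Map each selected source table to a destination table name.

    A table is selected when it matches any include pattern and no
    exclude pattern. The destination name comes from dest_template.
    """
    plan = {}
    for table in source_tables:
        included = any(fnmatchcase(table, p) for p in include_patterns)
        excluded = any(fnmatchcase(table, p) for p in exclude_patterns)
        if included and not excluded:
            plan[table] = dest_template.format(table=table)
    return plan
```

Keeping selection and naming in one rule set means a new source table that matches the patterns is picked up without a per-table configuration.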

Get started

To create a batch synchronization task, follow the steps in Configure a batch synchronization task for an entire database.

Supported data sources

The following table lists supported source-to-destination combinations.

Source

Destination

MaxCompute

Data Lake Formation

Hive

Hologres

OSS

OSS-HDFS

Elasticsearch

StarRocks

MySQL