
DataWorks: Batch database synchronization

Last Updated: Feb 13, 2026

Data Integration in DataWorks supports batch synchronization of entire databases. With this feature, you can migrate all or selected tables from a source database to a destination data store in a single operation or on a recurring schedule, using full or incremental data transfers. Because one task covers many tables, you do not need to create a separate synchronization task for each table. The feature also creates the table schemas in the destination automatically, which simplifies database migrations.

Use cases

  • Data migration and cloud adoption

    • Migrate databases like MySQL and Oracle from an on-premises data center to a cloud-based Data Warehouse or Data Lake.

    • Migrate data between different cloud platforms or database systems.

  • Data warehouse and data lake construction

    Periodically migrate full or incremental data from online transaction processing (OLTP) databases to the Operational Data Store (ODS) layer of a Data Warehouse or Data Lake, where it serves as the foundation for downstream data analytics.

  • Data backup and disaster recovery

    • Regularly back up entire production databases to cost-effective storage like HDFS or Object Storage Service (OSS).

    • Implement disaster recovery solutions across different regions or Availability Zones.

Core capabilities

The following table describes the core capabilities of batch synchronization for entire databases.


Core capability

Feature

Description

Batch synchronization between heterogeneous data sources

-

Batch synchronization supports migrating data from on-premises data centers or other cloud platforms to MaxCompute, Hologres, Object Storage Service (OSS), and other Data Warehouses or Data Lakes. For more information, see Supported data sources and synchronization solutions.

Data synchronization in complex network environments

-

Batch synchronization supports data transfer from various environments, including ApsaraDB, on-premises data centers, self-hosted databases on Elastic Compute Service (ECS) instances, and databases on third-party clouds. Before you configure the task, ensure network connectivity between the resource group and both the source and destination data sources. For more information, see Network connectivity.

Synchronization modes

Full synchronization

Supports one-time or periodic full data synchronization to a destination table or a specified partition.

Incremental synchronization

Supports one-time or periodic incremental data synchronization based on time, partitions, or primary keys.
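To make the time-based and primary-key-based options more concrete, the following Python sketch builds the kind of filter condition an incremental run might apply to a source table. It is illustrative only: the column names `gmt_modified` and `id` are hypothetical, and DataWorks generates its own filters from your task configuration.

```python
from datetime import datetime, timedelta


def time_window_filter(column: str, bizdate: str) -> str:
    """Select rows changed on one business date (yyyymmdd), as a half-open interval.

    `column` is a hypothetical modification-time column, for example gmt_modified.
    """
    start = datetime.strptime(bizdate, "%Y%m%d")
    end = start + timedelta(days=1)
    fmt = "%Y-%m-%d %H:%M:%S"
    return f"{column} >= '{start.strftime(fmt)}' AND {column} < '{end.strftime(fmt)}'"


def primary_key_filter(column: str, last_synced_key: int) -> str:
    """Select rows whose key is greater than the largest key synchronized so far.

    Assumes a monotonically increasing key, such as an auto-increment id.
    """
    return f"{column} > {last_synced_key}"


print(time_window_filter("gmt_modified", "20260212"))
# gmt_modified >= '2026-02-12 00:00:00' AND gmt_modified < '2026-02-13 00:00:00'
print(primary_key_filter("id", 1000))
# id > 1000
```

The half-open interval (`>=` start, `<` end) means a row modified exactly at midnight is picked up by exactly one daily run. Building SQL with string formatting is acceptable in this sketch only because the inputs come from task configuration, not from end users.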

Combined full and incremental synchronization

Initial run: Performs a one-time full data synchronization.

Subsequent runs: Automatically switches to periodic incremental data synchronization to a specified partition.
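The switch from the initial full run to periodic incremental runs can be pictured as a small decision function. This is a minimal sketch under two stated assumptions: the task stays in full mode until the initial full sync succeeds, and each incremental run writes to a partition named after the business date. The partition column name `ds` and the function `plan_run` are hypothetical.

```python
from typing import Dict, Optional


def plan_run(full_sync_done: bool, bizdate: str, partition_key: str = "ds") -> Dict[str, Optional[str]]:
    """Decide what a combined full-and-incremental task does on this run.

    Assumption: full mode repeats until the initial full sync has succeeded.
    `partition_key` ("ds") is a hypothetical partition column name.
    """
    if not full_sync_done:
        # Initial run: copy every row once; no date partition is selected in this sketch.
        return {"mode": "full", "partition": None}
    # Subsequent runs: copy only changed rows into this run's date partition.
    return {"mode": "incremental", "partition": f"{partition_key}={bizdate}"}
```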

Database and table mapping

Batch table sync

Supports synchronizing all tables in a database. You can also select specific tables individually or by configuring filter rules.
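Filter rules can be thought of as include and exclude patterns applied to the list of source tables. The sketch below assumes regular-expression rules matched against the full table name; the actual rule syntax in the DataWorks console may differ, so treat `select_tables` as a hypothetical illustration.

```python
import re
from typing import Iterable, List


def select_tables(tables: Iterable[str], include: Iterable[str], exclude: Iterable[str] = ()) -> List[str]:
    """Keep tables that fully match any include pattern and no exclude pattern.

    The regex rule format is a hypothetical stand-in for the console's filter rules.
    """
    include_res = [re.compile(p) for p in include]
    exclude_res = [re.compile(p) for p in exclude]
    return [
        table
        for table in tables
        if any(r.fullmatch(table) for r in include_res)
        and not any(r.fullmatch(table) for r in exclude_res)
    ]


selected = select_tables(
    ["orders", "orders_2024", "users", "tmp_orders"],
    include=[r"orders.*", r"users"],
    exclude=[r".*_2024"],
)
print(selected)  # ['orders', 'users']
```

Using `fullmatch` rather than `search` keeps a pattern such as `orders.*` from accidentally selecting `tmp_orders`.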

Automatic schema creation

A single configuration can handle hundreds of tables from the source database. The feature automatically creates the corresponding table schemas at the destination, so you do not need to create them manually.

Flexible mapping

Supports custom naming rules for destination databases and tables. You can also define custom mappings for field types between the source and destination to adapt to the destination data model.
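The following sketch shows how a naming rule and a field-type mapping could combine to produce a destination table definition, which is also the essence of automatic schema creation. Everything here is an assumption for illustration: the `ods_{db}_{table}` naming rule, the type mapping (not DataWorks' built-in defaults), the STRING fallback for unknown types, and the MaxCompute-style `PARTITIONED BY (ds STRING)` clause.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative source-to-destination type mapping (hypothetical; these are not
# DataWorks' built-in defaults).
DEFAULT_TYPE_MAP: Dict[str, str] = {
    "tinyint": "BIGINT",
    "int": "BIGINT",
    "bigint": "BIGINT",
    "varchar": "STRING",
    "text": "STRING",
    "decimal": "DECIMAL",
    "datetime": "DATETIME",
}


def dest_table_name(source_db: str, source_table: str, rule: str = "ods_{db}_{table}") -> str:
    """Apply a hypothetical naming rule with {db} and {table} placeholders."""
    return rule.format(db=source_db, table=source_table).lower()


def create_table_ddl(
    source_db: str,
    source_table: str,
    columns: List[Tuple[str, str]],
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    """Build a CREATE TABLE statement for the destination from source column types."""
    type_map = {**DEFAULT_TYPE_MAP, **(overrides or {})}
    col_defs = []
    for name, source_type in columns:
        # "decimal(10,2)" -> "decimal"; precision is dropped in this sketch.
        base_type = source_type.split("(")[0].strip().lower()
        # Unknown types fall back to STRING (an assumption made for this sketch).
        col_defs.append(f"{name} {type_map.get(base_type, 'STRING')}")
    table = dest_table_name(source_db, source_table)
    body = ",\n  ".join(col_defs)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n  {body}\n) PARTITIONED BY (ds STRING);"


print(create_table_ddl("shop", "orders", [("id", "INT"), ("amount", "decimal(10,2)"), ("note", "json")]))
```

The `overrides` argument plays the role of a per-task custom field-type mapping: it is merged over the defaults, so only the types you want to change need to be listed.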

Scheduling and dependency management

Scheduling

Supports scheduling by the minute, hour, day, week, month, and year.

If you are synchronizing a large number of tables at once, we recommend staggering the execution times in your scheduling configuration to prevent task buildup and resource contention.
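One simple way to stagger execution times is to start tables in small batches at a fixed interval. The sketch below computes such a plan; the batch size, interval, and the helper `stagger_start_times` are hypothetical, and the resulting times would still need to be entered in each task's scheduling configuration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stagger_start_times(
    tables: List[str], first_start: str, batch_size: int, step_minutes: int
) -> Dict[str, str]:
    """Assign HH:MM start times so that at most `batch_size` tables start together.

    Each successive batch starts `step_minutes` after the previous one. Times wrap
    past midnight because only the time of day is kept.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    base = datetime.strptime(first_start, "%H:%M")
    schedule: Dict[str, str] = {}
    for index, table in enumerate(tables):
        offset = timedelta(minutes=(index // batch_size) * step_minutes)
        schedule[table] = (base + offset).strftime("%H:%M")
    return schedule


print(stagger_start_times(["t0", "t1", "t2", "t3", "t4"], "00:30", batch_size=2, step_minutes=15))
# {'t0': '00:30', 't1': '00:30', 't2': '00:45', 't3': '00:45', 't4': '01:00'}
```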

Task dependencies

In DataWorks, both the main database task and the individual table-level subtasks can be used as upstream dependencies for other development tasks. When a table synchronization task is complete, DataWorks automatically triggers its downstream development tasks.
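The triggering behavior can be modeled as a small dependency graph in which a downstream task becomes runnable once all of its upstream tasks have succeeded. This sketch assumes that usual DAG rule; the task names (`sync_db` for the whole-database task, `sync_db.orders` for a table-level subtask) and the `DependencyTracker` class are hypothetical and do not correspond to DataWorks node names or APIs.

```python
from typing import Dict, Iterable, List, Set


class DependencyTracker:
    """Report which downstream tasks become runnable as upstream tasks finish.

    Assumes the usual DAG rule: a task runs once all of its upstream tasks succeed.
    """

    def __init__(self, upstreams: Dict[str, Iterable[str]]):
        self.upstreams: Dict[str, Set[str]] = {task: set(ups) for task, ups in upstreams.items()}
        self.done: Set[str] = set()

    def complete(self, task: str) -> List[str]:
        """Mark `task` as succeeded and return newly runnable downstream tasks."""
        self.done.add(task)
        return sorted(
            downstream
            for downstream, ups in self.upstreams.items()
            if task in ups and ups <= self.done and downstream not in self.done
        )


tracker = DependencyTracker({
    "report_orders": ["sync_db.orders"],                       # depends on one table subtask
    "join_orders_users": ["sync_db.orders", "sync_db.users"],  # depends on two subtasks
    "report_daily": ["sync_db"],                               # depends on the whole-database task
})
print(tracker.complete("sync_db.orders"))  # ['report_orders']
```

Depending on a table-level subtask rather than the whole-database task lets a downstream job start as soon as its own table is ready, instead of waiting for every table in the database.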

Parameter support

Supports the use of scheduling parameters for incremental synchronization, such as using ${bizdate} to represent the business date.
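The sketch below shows how a filter that uses `${bizdate}` could be resolved for a given run, assuming the common DataWorks convention that `${bizdate}` is the day before the scheduled run date, in yyyymmdd format. Python's `string.Template` happens to use the same `${name}` syntax, which keeps the sketch short; this is not how DataWorks resolves parameters internally, and `render_parameters` is a hypothetical helper.

```python
from datetime import date, timedelta
from string import Template


def resolve_bizdate(scheduled: date) -> str:
    """Business date: assumed to be one day before the scheduled run date, as yyyymmdd."""
    return (scheduled - timedelta(days=1)).strftime("%Y%m%d")


def render_parameters(expression: str, scheduled: date) -> str:
    """Replace ${bizdate}; unknown ${...} placeholders are left untouched."""
    return Template(expression).safe_substitute(bizdate=resolve_bizdate(scheduled))


print(render_parameters("ds='${bizdate}'", date(2026, 2, 13)))  # ds='20260212'
```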

Advanced parameters

Dirty data handling

Dirty data refers to any record that cannot be written to the destination because of an error, such as a type conflict or a constraint violation. The dirty data setting defaults to false, which means the task fails if any dirty data occurs. If it is set to true, all dirty data is ignored and the task continues.
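The two settings can be contrasted with a small write loop. In this sketch, a `ValueError` or `TypeError` from the writer stands in for a type conflict or constraint violation, and the names `write_records` and `DirtyDataError` are hypothetical. Note that in fail mode the records written before the dirty one remain written; whether a real task rolls them back depends on the destination.

```python
from typing import Any, Callable, Iterable, Tuple


class DirtyDataError(Exception):
    """Raised when a record cannot be written and dirty data is not allowed."""


def write_records(
    records: Iterable[Any], write: Callable[[Any], None], allow_dirty: bool = False
) -> Tuple[int, int]:
    """Write records one by one; return (written, skipped).

    allow_dirty=False mirrors the default: the first dirty record fails the task.
    allow_dirty=True mirrors the opt-in behavior: dirty records are skipped.
    """
    written = skipped = 0
    for record in records:
        try:
            write(record)
        except (ValueError, TypeError) as exc:  # stand-ins for type/constraint errors
            if not allow_dirty:
                raise DirtyDataError(f"dirty record {record!r}: {exc}") from exc
            skipped += 1
        else:
            written += 1
    return written, skipped


sink = []
print(write_records(["1", "x", "3"], lambda v: sink.append(int(v)), allow_dirty=True))  # (2, 1)
```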

Reader and writer configuration

You can configure the maximum number of connections for both the reader (source) and writer (destination). You can also define a cleanup policy for the destination before data is written.

Concurrency and rate limiting

  • Task concurrency control limits the degree of concurrency when the task reads from and writes to the database.

  • Rate limiting controls data transfer speed to prevent excessive pressure on the source or destination data source. If no rate limit is set, the task runs at the maximum possible transfer speed allowed by the hardware.
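The two controls above can be illustrated with common techniques: a token bucket for rate limiting and a bounded thread pool for concurrency. These are swapped-in textbook techniques, not a description of how DataWorks implements its limits, and the names `TokenBucket`, `sync_splits`, and the notion of table "splits" are hypothetical. The injectable `clock` makes the limiter deterministic to test.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class TokenBucket:
    """Token-bucket limiter: `rate` units per second on average, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, amount: float) -> bool:
        """Take `amount` tokens (for example bytes or records) if available."""
        self._refill()
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False

    def wait_time(self, amount: float) -> float:
        """Seconds until `amount` tokens are available (assumes amount <= capacity)."""
        self._refill()
        return max(0.0, (amount - self.tokens) / self.rate)


def sync_splits(splits: Iterable[T], copy_split: Callable[[T], R], concurrency: int) -> List[R]:
    """Copy hypothetical table splits with at most `concurrency` running at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(copy_split, splits))
```

A writer would call `try_acquire` with the size of each batch and sleep for `wait_time` when it returns False; leaving the limiter out entirely corresponds to running without a rate limit.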

Operations and maintenance

Manual intervention

You can perform manual interventions such as rerunning tasks, backfilling data, marking tasks as successful, and freezing or restoring tasks.

Monitoring and alerting

You can configure monitoring rules for baselines, task statuses, and run durations. When a rule is triggered, an alert is sent.
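As a rough illustration of status and run-duration rules, the sketch below checks a single run record and returns alert messages. Baseline monitoring is not modeled, and the status values, the `RunRecord` type, and `evaluate_rules` are hypothetical; real rules and alert channels are configured in DataWorks, not in code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunRecord:
    task: str
    status: str  # hypothetical status values: "SUCCESS" or "FAILED"
    duration_s: float


def evaluate_rules(run: RunRecord, max_duration_s: float) -> List[str]:
    """Return alert messages for a failed run or a run that exceeded its duration limit."""
    alerts: List[str] = []
    if run.status == "FAILED":
        alerts.append(f"{run.task}: run failed")
    if run.duration_s > max_duration_s:
        alerts.append(f"{run.task}: ran {run.duration_s:.0f}s, limit {max_duration_s:.0f}s")
    return alerts


print(evaluate_rules(RunRecord("sync_db.users", "FAILED", 900), max_duration_s=600))
```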

Data Quality

After a task is committed and deployed, you can configure data quality monitoring rules for the destination table in the Operation Center. You can configure rules manually or generate them with AI. Currently, quality rule monitoring is available only for specific database types. For more information, see Data Quality.

Get started

To create a batch synchronization task for entire databases, see Configure batch synchronization for entire databases.

Supported data sources

DataWorks supports migrating entire databases from various data sources to destinations like MaxCompute, Object Storage Service (OSS), and Elasticsearch. The supported data sources are listed below.

Source

Destination

MaxCompute

Data Lake Formation

Hologres

Object Storage Service (OSS)

Elasticsearch

StarRocks