DataWorks: Batch synchronization for entire databases

Last Updated: Oct 15, 2025

DataWorks Data Integration simplifies full-database migration with automated batch synchronization. Move all or selected tables from source to destination, on demand or on schedule, with full or incremental sync. Table schemas are created automatically at the destination.

Use cases

  • Data migration and cloud adoption

    • Migrate on-premises databases like MySQL and Oracle to cloud data warehouses or data lakes.

    • Move data between cloud platforms or database systems.

  • Data warehouse or data lake construction

    Periodically sync full or incremental data from online transactional processing (OLTP) databases to the operational data store (ODS) layer for downstream analysis.

  • Data backup and disaster recovery

    • Back up production databases regularly to low-cost storage like HDFS or OSS.

    • Implement cross-region or cross-zone disaster recovery.

Core features


Cross-source database migration

Migrate data from on-premises data centers or other cloud platforms to data warehouses or data lakes such as MaxCompute, Hologres, and OSS. For more information, see Supported data sources and synchronization solutions.

Complex network environments

Sync data from ApsaraDB databases, self-managed databases hosted in on-premises data centers or on ECS instances, and databases outside Alibaba Cloud. The resource group that runs the task must be able to connect to both the source and the destination. For more information, see Network connectivity.
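Before configuring a task, it can help to confirm that both endpoints accept TCP connections from a host in the same network as the resource group (an assumption: you may not be able to run scripts on the resource group itself). The following is a minimal Python sketch using only the standard library `socket` module. It is not a DataWorks tool, and the functions `can_reach` and `check_endpoints` are hypothetical. A successful TCP handshake shows only that the port is reachable, not that whitelists, credentials, or database permissions are correct.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures,
        # which are all OSError subclasses in Python 3.
        return False


def check_endpoints(endpoints: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Check every named (host, port) endpoint and report which ones are reachable."""
    return {name: can_reach(host, port) for name, (host, port) in endpoints.items()}
```

For example, `check_endpoints({"source": ("192.168.0.10", 3306), "destination": ("dest.example.internal", 80)})` reports a boolean per endpoint; both addresses are placeholders to replace with your own.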

Synchronization modes

  • Full synchronization

    One-time or scheduled full sync to destination tables or partitions.

  • Incremental synchronization

    One-time or scheduled incremental sync based on time, partition, or primary key.

  • Combined full and incremental synchronization

    The first run automatically performs a one-time full sync. Subsequent runs automatically switch to scheduled incremental sync into the specified partitions.
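To illustrate how the combined mode behaves from one run to the next, here is a hypothetical Python sketch of the decision logic: the first run reads the whole table, and later runs read only one business day's window based on a time column. The names `plan_run` and `RunPlan`, the time column, and the `pt=yyyymmdd` partition naming are assumptions for illustration. DataWorks generates and runs these tasks itself, and its internal queries may differ.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RunPlan:
    mode: str              # "full" or "incremental"
    query: str             # query a reader would run against the source table
    target_partition: str  # destination partition written by this run


def plan_run(table: str, time_column: str, bizdate: date, first_run: bool) -> RunPlan:
    """Hypothetical planner: full sync on the first run, daily incremental windows afterwards."""
    partition = f"pt={bizdate:%Y%m%d}"
    if first_run:
        # First run: copy every existing row once.
        return RunPlan("full", f"SELECT * FROM {table}", partition)
    # Later runs: only rows whose time column falls in [bizdate, bizdate + 1 day).
    end = bizdate + timedelta(days=1)
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {time_column} >= '{bizdate:%Y-%m-%d}' AND {time_column} < '{end:%Y-%m-%d}'"
    )
    return RunPlan("incremental", query, partition)
```

The query strings are built by interpolation only for readability; do not build SQL this way from untrusted input.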

Database and table mapping

  • Batch table sync

    Sync all tables in a database, or use filters to select specific tables.

  • Automatic schema creation

    Process hundreds of source tables with a single configuration. Destination table schemas are created automatically.

  • Flexible mapping

    Customize naming rules for destination databases and tables, and define field type mappings that match the destination structure.
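Table filters and naming rules are configured in the DataWorks console; the Python sketch below only imitates the idea so the effect of a rule is easy to reason about. It assumes regular-expression include/exclude filters, a hypothetical `ods_<database>_<table>` naming convention, and an illustrative MaxCompute-style type table. None of these are DataWorks defaults, and the functions `select_tables`, `map_table_name`, and `map_column_type` are hypothetical.

```python
import re
from typing import Optional


def select_tables(tables: list[str], include: str, exclude: Optional[str] = None) -> list[str]:
    """Keep tables whose names fully match `include` and do not fully match `exclude`."""
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    return [t for t in tables if inc.fullmatch(t) and not (exc and exc.fullmatch(t))]


def map_table_name(source_db: str, table: str, prefix: str = "ods_") -> str:
    """Hypothetical naming rule: <prefix><source_db>_<table>, lowercased."""
    return f"{prefix}{source_db}_{table}".lower()


# Illustrative source-to-destination type mapping (assumed, not DataWorks' built-in table).
TYPE_MAP = {
    "INT": "BIGINT",
    "BIGINT": "BIGINT",
    "VARCHAR": "STRING",
    "TEXT": "STRING",
    "DATETIME": "DATETIME",
    "DECIMAL": "DECIMAL",
}


def map_column_type(source_type: str, default: str = "STRING") -> str:
    """Map a source type such as 'varchar(255)' by its base name; fall back to `default`."""
    base = source_type.split("(", 1)[0].strip().upper()
    return TYPE_MAP.get(base, default)
```

With these rules, filtering `["orders", "order_items", "users", "order_log"]` by include `order.*` and exclude `.*_log` keeps `orders` and `order_items`, which would land as `ods_<database>_orders` and `ods_<database>_order_items`.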

Scheduling and dependency management

  • Scheduling

    Supported scheduling frequencies: minute, hour, day, week, month, and year. When syncing many tables, stagger their execution times to avoid resource bottlenecks.

  • Task dependencies

    Both the entire-database task and its individual table subtasks can serve as upstream dependencies. Downstream tasks are triggered automatically when the sync of their upstream table completes.

  • Parameter support

    Use scheduling parameters for incremental sync, such as ${bizdate} for the business date.
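To preview what an incremental filter resolves to on a given run, the minimal Python sketch below imitates the substitution locally. It assumes the standard definition of ${bizdate}: the data timestamp in yyyymmdd format, one day before the instance's scheduled run date. It is not the DataWorks parameter engine, and `resolve_bizdate`, `substitute`, and the `gmt_modified` column are hypothetical.

```python
import re
from datetime import datetime, timedelta


def resolve_bizdate(scheduled_time: datetime) -> str:
    """Business date (assumed definition): the day before the scheduled run, as yyyymmdd."""
    return (scheduled_time - timedelta(days=1)).strftime("%Y%m%d")


def substitute(template: str, params: dict[str, str]) -> str:
    """Replace ${name} placeholders with values from `params`; leave unknown ones untouched."""
    return re.sub(r"\$\{(\w+)\}", lambda m: params.get(m.group(1), m.group(0)), template)
```

For a run scheduled at 00:30 on 2025-10-15, `substitute("DATE_FORMAT(gmt_modified, '%Y%m%d') = '${bizdate}'", {"bizdate": resolve_bizdate(datetime(2025, 10, 15, 0, 30))})` yields a filter comparing against `'20251014'`, so each daily run picks up the previous day's changes.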

Advanced parameters

  • Dirty data handling

    Dirty data refers to records that fail to write due to errors such as type conflicts or constraint violations. This setting defaults to false, which makes the task fail when it encounters dirty data. Set it to true to skip all dirty data.

  • Reader and writer configuration

    Configure the maximum number of connections for the reader and writer data sources, and define cleanup policies to apply before data is written.

  • Concurrency and rate limiting

    • Control the maximum number of concurrent connections for reading and writing.

    • Limit the sync speed to reduce pressure on the source or destination system. Without a rate limit, tasks run at the maximum available transfer throughput.
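The effect of the dirty data setting can be sketched as a write loop: with skipping disabled (the default, false), the first record that fails to write aborts the task; with skipping enabled, failed records are set aside and the rest are written. This is a hypothetical Python sketch, not the Data Integration writer. `write_all`, `WriteResult`, `DirtyDataError`, and the use of `TypeError`/`ValueError` to stand in for type conflicts and constraint violations are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


class DirtyDataError(Exception):
    """Raised when a record cannot be written and skipping is disabled."""


@dataclass
class WriteResult:
    written: int = 0
    # (record, reason) pairs that were skipped because they could not be written.
    dirty: list[tuple[dict, str]] = field(default_factory=list)


def write_all(records: Iterable[dict], write: Callable[[dict], None],
              skip_dirty: bool = False) -> WriteResult:
    """Write records one by one; fail on the first dirty record unless skip_dirty is True."""
    result = WriteResult()
    for record in records:
        try:
            write(record)
        except (TypeError, ValueError) as exc:
            if not skip_dirty:
                # Default (false): the whole task fails on dirty data.
                raise DirtyDataError(f"dirty record {record!r}: {exc}") from exc
            # Skip mode: remember the record and why it failed, then continue.
            result.dirty.append((record, str(exc)))
            continue
        result.written += 1
    return result
```

When skipping is enabled, keeping a count of skipped records (as `WriteResult.dirty` does here) makes silent data loss visible.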

Operations and maintenance

  • Runtime intervention

    Rerun tasks, backfill data, mark tasks as successful, and freeze or restore tasks.

  • Monitoring and alerting

    Configure monitoring rules for baselines, task status, and runtime duration, with alert notifications.

  • Data Quality

    After deployment, configure data quality monitoring rules for destination tables in Operation Center, either generated with AI assistance or configured manually. This feature is currently available for certain database types only. For more information, see Data Quality.

Quick start

Configure batch synchronization for entire databases

Supported data sources

DataWorks supports entire-database migration from various sources to destinations like MaxCompute, OSS, and Elasticsearch:

Source data source

Destination data source

MaxCompute

Hologres

OSS

Elasticsearch

StarRocks