Batch synchronize entire databases in DataWorks

DataWorks Data Integration lets you migrate all or selected tables from a source database to a destination store through full or incremental synchronization, on a one-time or recurring schedule. It automatically creates destination table schemas, eliminating the need to configure individual tasks per table.
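The per-table expansion described above can be pictured with the following minimal Python sketch. It is not the DataWorks implementation. The shell-style wildcard filters, the `{table}` naming template, and the `plan_subtasks` and `SubtaskPlan` names are hypothetical stand-ins for the filter rules and custom naming rules you configure in the console.

```python
import fnmatch
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SubtaskPlan:
    """One table-level subtask derived from an entire-database task (hypothetical model)."""
    source_table: str
    destination_table: str


def plan_subtasks(source_tables: Iterable[str],
                  include_patterns: List[str],
                  exclude_patterns: List[str],
                  name_template: str) -> List[SubtaskPlan]:
    """Expand one entire-database configuration into per-table subtask plans.

    include_patterns / exclude_patterns: shell-style wildcards (an assumption;
    the real filter-rule syntax may differ).
    name_template: a str.format template with a {table} placeholder, standing in
    for a custom destination naming rule.
    """
    plans = []
    for table in sorted(source_tables):
        included = any(fnmatch.fnmatchcase(table, p) for p in include_patterns)
        excluded = any(fnmatch.fnmatchcase(table, p) for p in exclude_patterns)
        if included and not excluded:
            plans.append(SubtaskPlan(table, name_template.format(table=table)))
    return plans


if __name__ == "__main__":
    tables = ["orders", "order_items", "users", "tmp_import", "audit_log"]
    # Include every table except temporary ones, and prefix destination names with ods_.
    for plan in plan_subtasks(tables, ["*"], ["tmp_*"], "ods_{table}"):
        print(plan.source_table, "->", plan.destination_table)
```

One configuration yields one plan per matching table, which mirrors why no per-table task setup is needed.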

Use cases

Data migration and cloud adoption
- Migrate databases, such as MySQL and Oracle, from an on-premises data center to a cloud data warehouse or data lake.
- Migrate data between different cloud platforms or database systems.
Data warehouse or data lake construction
- Periodically synchronize full or incremental data from OLTP databases to the ODS layer of a data warehouse or data lake for downstream analytics.
Data backup and disaster recovery
- Regularly back up full datasets from production databases to cost-effective storage media, such as HDFS or OSS.
- Implement cross-region or cross-availability-zone disaster recovery solutions.

Core capabilities

The following table describes the core capabilities.

Core capability	Feature	Description
Synchronization across heterogeneous data sources	-	Migrates data from on-premises data centers or other cloud platforms to a data warehouse or data lake such as MaxCompute, Hologres, or OSS. For more information, see Supported data sources and synchronization solutions.
Data synchronization in complex network environments	-	Transfers data from Alibaba Cloud databases, on-premises databases, self-managed databases on ECS instances, or non-Alibaba Cloud databases. Ensure network connectivity between the resource group and both the source and destination before configuration. For more information, see Network connectivity configuration.
Synchronization scenarios	Full synchronization	Synchronizes all data to destination tables or specified partitions, either once or on a schedule.
	Incremental synchronization	Supports one-time or periodic incremental synchronization of data based on time, a partition, or a primary key.
	Combined full and incremental	The first run automatically performs a one-time full synchronization. Subsequent runs switch to periodic incremental synchronization into the specified partitions.
Database and table mapping	Batch table synchronization	Synchronize all tables in a database, or use checkboxes and filter rules to select a subset.
	Automatic schema creation	Processes hundreds of tables in a single configuration. The system automatically creates destination table schemas without manual intervention.
	Flexible mapping	Define custom naming rules for destination databases and tables, and customize field type mappings to adapt to the destination data model.
Scheduling and dependency management	Scheduling time	Supports various scheduling cycles, including by the minute, hour, day, week, month, and year. If you synchronize a large number of tables at once, stagger execution times to prevent task buildup and resource contention.
	Task dependencies	Both entire-database tasks and table-level subtasks can serve as upstream dependencies. When a subtask completes, it automatically triggers downstream tasks.
	Parameter support	Supports scheduling parameters for incremental synchronization. For example, `${bizdate}` represents the business date.
Advanced parameters	Dirty data handling	Dirty data refers to records that fail to write due to type conflicts or constraint violations. By default (`false`), the task fails when it encounters dirty data. If set to `true`, dirty data is ignored and the task continues.
	Reader and writer configuration	Configure the maximum connections for the reader (source) and writer (destination). You can also define a cleanup policy on the destination before writing.
	Concurrency and rate limiting	Controls the maximum number of concurrent connections for reading from and writing to databases. Rate limiting caps the data transfer rate to prevent excessive load on the source or destination. When rate limiting is disabled, the task transfers data at the maximum available throughput.
Operations and maintenance	Runtime intervention	Supports rerun, data backfill, mark as successful, freeze, and restore operations.
	Monitoring and alerting	Configure monitoring rules for baselines, task status, and runtime duration, and receive alerts when a rule is triggered.
	Data quality	After you submit and deploy a task, configure data quality monitoring rules for destination tables in Operation Center. Rules can be created manually or generated by AI. This feature is currently supported only for specific database types. For more information, see Data quality.
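The following minimal Python sketch illustrates how combined full and incremental synchronization can use `${bizdate}`. It assumes the default behavior of `${bizdate}` (the day before the scheduled run date, in `yyyymmdd` format). The `gmt_modified` time column, the `pt` partition key, and the helper functions are hypothetical; DataWorks generates its own reader queries rather than these.

```python
from datetime import datetime, timedelta


def resolve_bizdate(scheduled_run: datetime) -> str:
    """Mimic the default ${bizdate} value: the day before the scheduled run, as yyyymmdd."""
    return (scheduled_run.date() - timedelta(days=1)).strftime("%Y%m%d")


def destination_partition(scheduled_run: datetime, partition_key: str = "pt") -> str:
    """Partition spec a run writes to, e.g. pt=20240314 (partition_key is hypothetical)."""
    return f"{partition_key}={resolve_bizdate(scheduled_run)}"


def build_extract_query(table: str, scheduled_run: datetime, is_first_run: bool,
                        time_column: str = "gmt_modified") -> str:
    """Return the SELECT a combined full + incremental run might issue.

    First run: full extract of the table.
    Later runs: only rows whose time column falls on the business date.
    table and time_column are assumed to be trusted configuration values.
    """
    if is_first_run:
        return f"SELECT * FROM {table}"
    start = datetime.strptime(resolve_bizdate(scheduled_run), "%Y%m%d")
    end = start + timedelta(days=1)
    return (f"SELECT * FROM {table} "
            f"WHERE {time_column} >= '{start:%Y-%m-%d}' "
            f"AND {time_column} < '{end:%Y-%m-%d}'")


if __name__ == "__main__":
    run = datetime(2024, 3, 15, 0, 30)
    print(resolve_bizdate(run))
    print(destination_partition(run))
    print(build_extract_query("orders", run, is_first_run=True))
    print(build_extract_query("orders", run, is_first_run=False))
```

Using a half-open time window (`>=` start, `<` end) keeps consecutive daily runs from overlapping or leaving gaps.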
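The dirty data handling and rate-limiting settings can be illustrated with a minimal Python sketch of a writer loop. This is a generic illustration, not DataWorks code. The token-bucket algorithm is one common way to implement rate limiting and is an assumption here, and `write_all`, `TokenBucket`, and `DirtyDataError` are hypothetical names. Write failures are modeled as `ValueError` or `TypeError` raised by the `write` callback.

```python
import time
from typing import Callable, Iterable, Optional


class DirtyDataError(Exception):
    """Raised when a record fails to write and dirty data is not tolerated."""


class TokenBucket:
    """Token-bucket limiter: on average `rate` records per second, bursts up to `capacity`.

    `clock` and `sleep` are injectable so the limiter can be tested without real waiting.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep roughly long enough for one full token to accumulate.
            self.sleep((1 - self.tokens) / self.rate)


def write_all(records: Iterable, write: Callable[[object], None],
              ignore_dirty: bool = False,
              limiter: Optional[TokenBucket] = None) -> int:
    """Write every record and return how many dirty records were skipped.

    ignore_dirty=False mirrors the documented default: the first dirty record fails the run.
    ignore_dirty=True skips records whose write raises ValueError or TypeError.
    limiter=None mirrors disabled rate limiting: no throttling at all.
    """
    skipped = 0
    for record in records:
        if limiter is not None:
            limiter.acquire()
        try:
            write(record)
        except (ValueError, TypeError) as exc:
            if not ignore_dirty:
                raise DirtyDataError(f"dirty record: {record!r}") from exc
            skipped += 1
    return skipped
```

Passing `limiter=None` corresponds to disabling rate limiting, so records are written as fast as the destination accepts them.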

Get started

To create a batch synchronization task, follow the steps in Configure a batch synchronization task for an entire database.

Supported data sources

The following table lists supported source-to-destination combinations.

Source	Destination
Amazon Redshift, AnalyticDB for MySQL 3.0, ClickHouse, Doris, Hive, Hologres, MongoDB, MySQL, Oracle, PolarDB, PostgreSQL, SQL Server, ApsaraDB for OceanBase	MaxCompute
Hive	Data Lake Formation
MySQL	Hive
AnalyticDB for MySQL 3.0, ClickHouse, Doris, Hologres, MaxCompute, Oracle, PolarDB, PostgreSQL, SQL Server	Hologres
MySQL, PolarDB, Hive	OSS
Hive, MySQL	OSS-HDFS
MySQL, SQL Server, PolarDB	Elasticsearch
MySQL	StarRocks
Hologres	MySQL
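If you script checks against this list, the combinations can be encoded as a lookup table. The following Python sketch is a snapshot of the source-to-destination table on this page; `SUPPORTED`, `is_supported`, and `destinations_for` are hypothetical helpers, and the official documentation remains the source of truth because the list may change over time.

```python
from typing import Dict, List, Set

# Destination -> set of supported sources, transcribed from the table on this page.
SUPPORTED: Dict[str, Set[str]] = {
    "MaxCompute": {"Amazon Redshift", "AnalyticDB for MySQL 3.0", "ClickHouse", "Doris",
                   "Hive", "Hologres", "MongoDB", "MySQL", "Oracle", "PolarDB",
                   "PostgreSQL", "SQL Server", "ApsaraDB for OceanBase"},
    "Data Lake Formation": {"Hive"},
    "Hive": {"MySQL"},
    "Hologres": {"AnalyticDB for MySQL 3.0", "ClickHouse", "Doris", "Hologres",
                 "MaxCompute", "Oracle", "PolarDB", "PostgreSQL", "SQL Server"},
    "OSS": {"MySQL", "PolarDB", "Hive"},
    "OSS-HDFS": {"Hive", "MySQL"},
    "Elasticsearch": {"MySQL", "SQL Server", "PolarDB"},
    "StarRocks": {"MySQL"},
    "MySQL": {"Hologres"},
}


def is_supported(source: str, destination: str) -> bool:
    """True if the table lists this source for this destination."""
    return source in SUPPORTED.get(destination, set())


def destinations_for(source: str) -> List[str]:
    """All destinations that accept the given source, sorted alphabetically."""
    return sorted(dest for dest, sources in SUPPORTED.items() if source in sources)


if __name__ == "__main__":
    print(is_supported("MySQL", "StarRocks"))
    print(destinations_for("MySQL"))
```

Keying the map by destination matches the layout of the table, where each row lists the sources for one destination.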