DataWorks Data Integration provides a streamlined solution for batch database synchronization. It allows you to migrate all or selected tables from a source database to a destination data store, either as a one-time operation or on a recurring schedule, using full or incremental synchronization. This feature eliminates the need to create a task for each table manually and automatically creates destination table schemas, streamlining the entire database migration process.
Use cases
- Data migration and cloud adoption
  - Migrate on-premises databases such as MySQL and Oracle to cloud data warehouses or data lakes.
  - Migrate data between different cloud platforms or database systems.
- Data warehouse and data lake construction
  - Periodically synchronize full or incremental data from online transaction processing (OLTP) databases to the operational data store (ODS) layer of a data warehouse or data lake. This data then serves as the foundation for subsequent data analysis.
- Data backup and disaster recovery
  - Regularly back up full data from production databases to cost-effective storage media, such as HDFS or Object Storage Service (OSS).
  - Implement cross-region or cross-Availability Zone disaster recovery solutions.
Core features
Batch synchronization for entire databases offers the following core features:
| Core feature | Feature | Description |
| --- | --- | --- |
| Batch synchronization between heterogeneous data sources | - | Batch synchronization supports migrating data from an on-premises data center or other cloud platforms to a data warehouse or data lake, such as MaxCompute, Hologres, or OSS. For more information, see Supported data sources and synchronization solutions. |
| Data synchronization in complex network environments | - | Batch synchronization supports data transfer from Alibaba Cloud databases, self-managed databases on ECS or in on-premises data centers, and non-Alibaba Cloud databases. Before you begin, ensure network connectivity between the resource group and both the source and destination data sources. For configuration details, see Network connectivity. |
| Synchronization modes | Full synchronization | Supports one-time or scheduled full data synchronization to a destination table or a specified partition. |
|  | Incremental synchronization | Supports one-time or scheduled incremental data synchronization based on timestamps, partitions, or primary keys. |
|  | Combined full and incremental synchronization | The first run performs a one-time full data synchronization. Subsequent runs automatically switch to periodic incremental data synchronization to a specified partition. |
| Database and table mapping | Batch table synchronization | Synchronize all tables in a database, or select specific tables using checkboxes or filter rules. |
|  | Automatic schema creation | A single configuration can process hundreds of tables from the source database, and the system automatically creates the corresponding table structures at the destination without manual intervention. |
|  | Flexible mapping | Customize naming rules for destination databases and tables. You can also define mappings between source and destination field types to adapt to the target data model. |
| Scheduling and dependency management | Scheduling | Supports multiple scheduling frequencies: minute, hour, day, week, month, and year. When synchronizing many tables at once, stagger the execution times in the schedule to prevent task queuing and resource contention. |
|  | Task dependencies | Both the entire-database task and its individual subtasks can be used as upstream dependencies for other tasks in DataWorks. When a synchronization task completes, its downstream tasks are automatically triggered. |
|  | Parameter support | You can use scheduling parameters to implement incremental synchronization, for example by filtering source data based on the business date of each scheduled run. |
| Advanced parameters | Dirty data handling | Dirty data refers to records that cannot be written to the destination due to errors such as type mismatches or constraint violations. You can configure whether the task tolerates dirty data and how many dirty records are allowed before the task fails. |
|  | Reader and writer configuration | You can configure the maximum number of connections for both the reader and writer data sources and define cleanup policies that run on the destination before data is written. |
|  | Concurrency and rate limiting | You can set the task concurrency and limit the synchronization rate to control the load that synchronization places on the source and destination. |
| O&M (Operations and Maintenance) | Runtime intervention | Supports runtime interventions such as rerunning tasks, backfilling data, marking tasks as successful, and freezing or restoring tasks. |
|  | Monitoring and alerting | You can configure monitoring rules for baselines, task status, and runtime duration, and set up alerts to trigger when rule conditions are met. |
|  | Data quality | After you commit and deploy a task, you can configure data quality monitoring rules for the destination tables in the Operation Center. The feature supports both AI-powered rule generation and manual configuration. This feature is currently available only for specific database types. For more information, see Data Quality. |
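The incremental synchronization modes above rely on filtering source rows by a timestamp that is driven by a scheduling parameter. The following Python sketch is illustrative only, not a DataWorks API: the `gmt_modified` column name and the daily one-day window are assumptions, but it shows the kind of WHERE clause a date-based scheduling parameter typically resolves to on each run.

```python
from datetime import date, timedelta

def incremental_filter(run_date: date, ts_column: str = "gmt_modified") -> str:
    """Build a WHERE clause that selects only rows modified on the business
    date (the day before the scheduled run), mimicking how a date-based
    scheduling parameter drives timestamp-based incremental synchronization."""
    biz_date = run_date - timedelta(days=1)
    start = biz_date.strftime("%Y-%m-%d 00:00:00")
    end = run_date.strftime("%Y-%m-%d 00:00:00")
    return f"{ts_column} >= '{start}' AND {ts_column} < '{end}'"

# A run scheduled on 2024-05-02 picks up rows modified on 2024-05-01:
print(incremental_filter(date(2024, 5, 2)))
# gmt_modified >= '2024-05-01 00:00:00' AND gmt_modified < '2024-05-02 00:00:00'
```

Because each run's window starts exactly where the previous run's window ended, every row is synchronized exactly once as long as the timestamp column is reliably updated on write.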
Get started
To create a batch synchronization task for an entire database, see Configure batch synchronization for entire databases.
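The core features above recommend staggering execution times when many tables are scheduled together. A minimal Python sketch (illustrative only, not a DataWorks feature; the even-spacing policy is an assumption) of spreading tables across a scheduling window:

```python
def staggered_offsets(tables: list[str], window_minutes: int = 60) -> dict[str, int]:
    """Assign each table a start-minute offset, spread evenly across a
    scheduling window, so that syncs do not all start at the same instant
    and queue up against limited resource-group capacity."""
    step = max(1, window_minutes // max(1, len(tables)))
    return {name: (i * step) % window_minutes for i, name in enumerate(tables)}

# Three tables spread across a 30-minute window start at minutes 0, 10, 20:
print(staggered_offsets(["orders", "users", "items"], window_minutes=30))
# {'orders': 0, 'users': 10, 'items': 20}
```

The same idea applies at any frequency: the wider the window relative to the table count, the less the tasks contend for connections and resource-group slots.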
Supported data sources
DataWorks supports batch database migration from various data sources to destinations such as MaxCompute, Object Storage Service (OSS), and Elasticsearch. The supported destinations include:
- MaxCompute
- Data Lake Formation
- Hive
- Hologres
- OSS
- OSS-HDFS
- Elasticsearch
- StarRocks
- MySQL

For the source data sources supported by each destination, see Supported data sources and synchronization solutions.