DataWorks Data Integration lets you migrate all or selected tables from a source database to a destination store through full or incremental synchronization, on a one-time or recurring schedule. It automatically creates destination table schemas, eliminating the need to configure individual tasks per table.
Use cases
- Data migration and cloud adoption
  - Migrate databases, such as MySQL and Oracle, from an on-premises data center to a cloud data warehouse or data lake.
  - Migrate data between different cloud platforms or database systems.
- Data warehouse or data lake construction
  - Periodically synchronize full or incremental data from OLTP databases to the ODS layer of a data warehouse or data lake for downstream analytics.
- Data backup and disaster recovery
  - Regularly back up full datasets from production databases to cost-effective storage media, such as HDFS or OSS.
  - Implement cross-region or cross-availability-zone disaster recovery solutions.
Core capabilities
The following table describes the core capabilities.
| Core capability | Feature | Description |
| --- | --- | --- |
| Synchronization across heterogeneous data sources | - | Migrates data from on-premises data centers or other cloud platforms to a data warehouse or data lake such as MaxCompute, Hologres, or OSS. For more information, see Supported data sources and synchronization solutions. |
| Data synchronization in complex network environments | - | Transfers data from Alibaba Cloud databases, on-premises databases, self-managed databases on ECS instances, or non-Alibaba Cloud databases. Ensure network connectivity between the resource group and both the source and destination before configuration. For more information, see Network connectivity configuration. |
| Synchronization scenarios | Full synchronization | Synchronizes all data to destination tables or specified partitions, either once or on a schedule. |
| | Incremental synchronization | Supports one-time or periodic incremental synchronization of data based on time, a partition, or a primary key. |
| | Combined full and incremental | First run: Performs an automatic, one-time full synchronization. Subsequent runs: The task automatically switches to periodic incremental synchronization to specified partitions. |
| Database and table mapping | Batch table synchronization | Synchronize all tables in a database, or use checkboxes and filter rules to select a subset. |
| | Automatic schema creation | Processes hundreds of tables in a single configuration. The system automatically creates destination table schemas without manual intervention. |
| | Flexible mapping | Define custom naming rules for destination databases and tables, and customize field type mappings to adapt to the destination data model. |
| Scheduling and dependency management | Scheduling time | Supports various scheduling cycles, including by the minute, hour, day, week, month, and year. If you synchronize a large number of tables at once, stagger execution times to prevent task buildup and resource contention. |
| | Task dependencies | Both entire-database tasks and table-level subtasks can serve as upstream dependencies. When a subtask completes, it automatically triggers downstream tasks. |
| | Parameter support | Supports using scheduling parameters to implement incremental synchronization, such as using |
| Advanced parameters | Dirty data handling | Dirty data refers to records that fail to write due to type conflicts or constraint violations. Default: |
| | Reader and writer configuration | Configure the maximum connections for the reader (source) and writer (destination). You can also define a cleanup policy on the destination before writing. |
| | Concurrency and rate limiting | |
| Operations and maintenance | Runtime intervention | Supports rerun, data backfill, mark as successful, freeze, and restore operations. |
| | Monitoring and alerting | Configure monitoring rules for baselines, task status, and runtime duration, with alerts for triggered rules. |
| | Data quality | After you submit and deploy a task, configure data quality monitoring rules for destination tables in Operation Center. Rules can be created manually or generated by AI. Currently supported for specific database types only. For more information, see Data quality. |
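
Two of the capabilities above, custom naming rules for destination tables and staggered execution times for large batches, can be illustrated with a small planning sketch. This is not DataWorks code or a DataWorks API: the function `plan_subtasks`, the `ods_` prefix, the `_df` suffix, and the five-minute gap are all hypothetical choices made for illustration. In practice you configure the equivalent settings in the synchronization task itself.

```python
from datetime import datetime, timedelta


def plan_subtasks(tables, first_run, gap_minutes=5, prefix="ods_", suffix="_df"):
    """Return one plan entry per source table.

    Hypothetical helper: the prefix/suffix naming rule and the fixed gap
    are illustrative assumptions, not DataWorks settings.
    """
    plan = []
    for index, table in enumerate(tables):
        plan.append({
            "source_table": table,
            # Custom naming rule: wrap the source name in a prefix and suffix.
            "destination_table": f"{prefix}{table}{suffix}",
            # Stagger start times so subtasks do not all start at once.
            "start_time": first_run + timedelta(minutes=gap_minutes * index),
        })
    return plan


plan = plan_subtasks(
    ["orders", "customers", "payments"],
    first_run=datetime(2024, 1, 1, 0, 30),
)
for entry in plan:
    print(
        entry["source_table"],
        "->",
        entry["destination_table"],
        "at",
        entry["start_time"].strftime("%H:%M"),
    )
```

With the sample input, each table gets its own destination name and a start time five minutes after the previous one, so the three subtasks do not compete for the same resource group slots at the same moment. A fixed gap is the simplest choice; for tables of very different sizes, a gap proportional to expected runtime may fit better.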
Get started
To create a batch synchronization task, follow the steps in Configure a batch synchronization task for an entire database.
Supported data sources
The following table lists supported source-to-destination combinations.
|
Source |
Destination |
|
MaxCompute |
|
|
Data Lake Formation |
|
|
Hive |
|
|
Hologres |
|
|
OSS |
|
|
OSS-HDFS |
|
|
Elasticsearch |
|
|
StarRocks |
|
|
MySQL |