DataWorks Data Integration simplifies full-database migration with automated batch synchronization. Move all or selected tables from source to destination, on demand or on schedule, with full or incremental sync. Table schemas are created automatically at the destination.
Use cases
Data migration and cloud adoption
Migrate on-premises databases like MySQL and Oracle to cloud data warehouses or data lakes.
Move data between cloud platforms or database systems.
Data warehouse or data lake construction
Periodically sync full or incremental data from online transactional processing (OLTP) databases to the operational data store (ODS) layer for downstream analysis.
Data backup and disaster recovery
Back up production databases regularly to low-cost storage like HDFS or OSS.
Implement cross-region or cross-zone disaster recovery.
Core features
Category | Feature | Description |
Cross-source database migration | - | Migrate data from on-premises data centers or other cloud platforms to data warehouses or data lakes like MaxCompute, Hologres, and OSS. For more information, see Supported data sources and synchronization solutions. |
Complex network environments | - | Sync data from ApsaraDB databases, self-managed databases in on-premises data centers or on ECS instances, or non-Alibaba Cloud databases. Ensure the resource group can connect to both source and destination. For more information, see Network connectivity. |
Synchronization modes | Full synchronization | One-time or scheduled full sync to destination tables or partitions. |
Synchronization modes | Incremental synchronization | One-time or scheduled incremental sync based on time, partition, or primary key. |
Synchronization modes | Combined full and incremental synchronization | First run: Automatic one-time full sync. Subsequent runs: Automatic switch to scheduled incremental sync to specified partitions. |
Database and table mapping | Batch table sync | Sync all database tables or select specific ones using filters. |
Database and table mapping | Automatic schema creation | Process hundreds of source tables with a single configuration. Destination schemas are created automatically. |
Database and table mapping | Flexible mapping | Customize destination database/table naming conventions and field type mappings to match destination structure. |
Scheduling and dependency management | Scheduling | Multiple scheduling frequencies: minute, hour, day, week, month, year. When syncing many tables, stagger execution to avoid resource bottlenecks. |
Scheduling and dependency management | Task dependencies | Both entire-database tasks and individual table subtasks can act as upstream dependencies. Downstream tasks trigger automatically when table sync completes. |
Scheduling and dependency management | Parameter support | Use scheduling parameters for incremental sync, such as |
Advanced parameters | Dirty data handling | Dirty data refers to records that fail to write due to errors like type conflicts or constraint violations. Default is |
Advanced parameters | Reader and writer configuration | Configure maximum connections for reader and writer data sources, and define pre-write cleanup policies. |
Advanced parameters | Concurrency and rate limiting | |
Operations and maintenance | Runtime intervention | Rerun tasks, backfill data, mark as successful, freeze, or restore tasks. |
Operations and maintenance | Monitoring and alerting | Configure monitoring rules for baselines, task status, and runtime duration with alert notifications. |
Operations and maintenance | Data Quality | After deployment, configure data quality monitoring rules for destination tables in Operation Center. Supports AI-powered generation and manual configuration. Currently available for certain database types only. For more information, see Data Quality. |
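The combined full and incremental mode described in the table can be pictured with a small, generic sketch. It is not DataWorks code and calls no DataWorks API. The function name `plan_sync`, the `ods_` table prefix, the `ds=` partition naming, and the `gmt_modified` change-tracking column are all assumptions chosen for illustration. The sketch only shows the decision the table describes: a full copy on the first run, then a time-based filter that selects one business date per scheduled run.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SyncPlan:
    source_table: str
    dest_table: str
    mode: str       # "full" or "incremental"
    where: str      # filter applied when reading the source ("" means no filter)
    partition: str  # destination partition that receives the rows


def plan_sync(source_table: str, bizdate: date, is_first_run: bool,
              time_column: str = "gmt_modified",
              prefix: str = "ods_") -> SyncPlan:
    """Plan one run of a combined full and incremental sync for one table.

    All naming conventions here (prefix, partition key, time column) are
    hypothetical and only illustrate the idea.
    """
    dest_table = f"{prefix}{source_table}"
    # Assumption: every run, including the first, writes to a date partition.
    partition = f"ds={bizdate:%Y%m%d}"

    if is_first_run:
        # First run: read the whole source table without a filter.
        return SyncPlan(source_table, dest_table, "full", "", partition)

    # Later runs: read only rows changed during the business date,
    # using the half-open window [bizdate, bizdate + 1 day).
    next_day = bizdate + timedelta(days=1)
    where = (f"{time_column} >= '{bizdate:%Y-%m-%d}' "
             f"AND {time_column} < '{next_day:%Y-%m-%d}'")
    return SyncPlan(source_table, dest_table, "incremental", where, partition)


if __name__ == "__main__":
    print(plan_sync("orders", date(2024, 1, 15), is_first_run=True))
    print(plan_sync("orders", date(2024, 1, 16), is_first_run=False))
```

A half-open time window means a row modified exactly at midnight falls into exactly one run, which avoids duplicates across consecutive daily partitions. In a real task, the business date would come from a scheduling parameter rather than a function argument.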
Quick start
Supported data sources
DataWorks supports entire-database migration from various sources to destinations like MaxCompute, OSS, and Elasticsearch:
Source data source | Destination data source |
| MaxCompute |
| Hologres |
| OSS |
| Elasticsearch |
| StarRocks |