Data Integration offline sync uses Reader and Writer plugins to synchronize full or incremental data between source and destination databases. Define data sources and configure scheduling parameters to automate the sync process.
Core capabilities
Offline sync provides the following capabilities:
|
Capability |
Description |
|
Data synchronization between disparate data sources |
Data Integration supports 50+ data source types, including relational databases, unstructured storage, big data storage, and message queues. Reader and Writer plugins transfer data between any structured or semi-structured sources. For more information, see Supported data sources and sync solutions. |
|
Data synchronization in complex network environments |
Supports ApsaraDB, on-premises data centers, self-managed databases on ECS, and databases outside Alibaba Cloud. Ensure network connectivity between the resource group and both endpoints before configuration. For more information about configuration, see Network connectivity solutions. |
|
Sync scenarios |
1. Sync modes
Note
For more information about scheduling parameters, see Common scenarios for scheduling parameters in Data Integration and Scheduling parameter formats.
2. Source structures
|
|
Configuration methods |
Configure offline sync tasks using one of the following methods:
Note
For more information about task configuration capabilities, see Feature overview. |
|
Offline sync task O&M |
|
Feature overview
|
Feature |
Description |
|
Full or incremental data synchronization |
Use Data Filtering and scheduling parameters to perform full or incremental sync. Incremental sync configuration varies by plugin. For an example, see Scenario: Configure an incremental offline sync task. |
|
Field mapping |
Map source fields to destination fields. Ensure compatible data types between source and destination.
|
|
Job rate limit control |
|
|
Distributed task execution |
For supported data sources, task segmentation distributes a sync task across multiple nodes for concurrent execution. Sync speed scales linearly with cluster size, breaking single-node bottlenecks. This pattern suits high-throughput, low-latency scenarios and improves hardware utilization. |
|
Dirty data policy |
Dirty data refers to records that fail to write due to exceptions such as type conflicts or constraint violations. Define a dirty data threshold and its impact on the task:
|
|
Time zone |
Set the source time zone to convert timestamps automatically when source and destination are in different time zones. |
|
Intelligent data processing |
DataWorks can transform and process source data during synchronization before writing it to the destination:
String replacement: Perform lightweight in-flight data transformations during transfer without landing the data or requiring extra ETL steps.
AI-assisted processing: Integrate large AI models to perform semantic, sentiment, and other analyses on natural language from the source. Results are written directly to the destination table.
Data vectorization: Extract source data, create vector embeddings, and write them to a vector database. |
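The incremental sync and dirty data rows in the feature table can be illustrated with a short Python sketch. It is not the Data Integration implementation. It only shows the idea of substituting a scheduling parameter into a data filter and of comparing a dirty record count against a threshold. The column name `gmt_modified_day`, the helper names, the `${bizdate}` placeholder resolving to the day before the run date in yyyymmdd format, and the threshold semantics are all assumptions made for illustration. The actual parameter formats are described in Scheduling parameter formats, and the actual dirty data options are set in the task configuration.

```python
from datetime import date, timedelta
from typing import Optional


def resolve_bizdate(run_date: date) -> str:
    """Return the day before run_date as yyyymmdd (assumed convention)."""
    return (run_date - timedelta(days=1)).strftime("%Y%m%d")


def render_filter(template: str, params: dict) -> str:
    """Replace each ${name} placeholder in the filter template with its value."""
    rendered = template
    for name, value in params.items():
        rendered = rendered.replace("${" + name + "}", value)
    return rendered


def dirty_data_exceeded(dirty_records: int, threshold: Optional[int]) -> bool:
    """Hypothetical policy: None means no limit; otherwise the task
    is treated as failed once the count goes above the threshold."""
    if threshold is None:
        return False
    return dirty_records > threshold


# Hypothetical incremental filter on a modification-date column.
template = "gmt_modified_day = '${bizdate}'"
params = {"bizdate": resolve_bizdate(date(2024, 5, 2))}
incremental_filter = render_filter(template, params)
print(incremental_filter)  # gmt_modified_day = '20240501'
```

With this shape, a daily scheduled run only reads the rows for the previous day, while a full sync would simply omit the filter. A threshold of 0 in this sketch means that a single dirty record marks the run as failed.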
Get started
Create a sync task using one of the following methods: