The offline sync feature of Data Integration provides Reader and Writer plugins. You can define source and destination data sources and use DataWorks scheduling parameters to synchronize full or incremental data from a source database to a destination database. This topic describes the capabilities of offline sync.
Core capabilities
The capabilities of offline sync are described in the following table:
| Capability | Description |
| --- | --- |
| Data synchronization between disparate data sources | Data Integration supports over 50 data source types, such as relational databases, unstructured storage, big data storage, and message queues. You can define source and destination data sources and use the provided Reader and Writer plugins to transfer data between any structured or semi-structured data sources. For more information, see Supported data sources and sync solutions. |
| Data synchronization in complex network environments | Offline sync supports data synchronization for ApsaraDB, on-premises data centers, self-managed databases on ECS, and databases outside Alibaba Cloud. Before you configure the sync, ensure network connectivity between the resource group and both the source and destination. For more information, see Network connectivity solutions. |
| Sync scenarios | Offline sync supports multiple sync modes, such as full and incremental synchronization, and multiple source table structures. Note: For more information about scheduling parameters, see Common scenarios for scheduling parameters in Data Integration and Supported formats for scheduling parameters. |
| Configuration methods | You can configure Data Integration offline sync tasks by using the codeless UI or the code editor. Note: For more information about task configuration capabilities, see Function overview. |
| Offline sync task O&M | You can schedule, monitor, and maintain offline sync tasks in DataWorks. |
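As a sketch of how Reader and Writer plugins fit together, the following script-mode-style configuration outlines an offline sync task that reads from a MySQL source and writes to a MaxCompute (ODPS) table. The data source names, table names, and columns are illustrative assumptions, not a complete or authoritative task definition:

```json
{
  "type": "job",
  "steps": [
    {
      "category": "reader",
      "stepType": "mysql",
      "parameter": {
        "datasource": "my_source_db",
        "table": "orders",
        "column": ["order_id", "amount", "gmt_create"]
      }
    },
    {
      "category": "writer",
      "stepType": "odps",
      "parameter": {
        "datasource": "my_dest_odps",
        "table": "ods_orders",
        "partition": "ds=${bizdate}",
        "column": ["order_id", "amount", "gmt_create"]
      }
    }
  ],
  "setting": {
    "speed": { "concurrent": 2 }
  }
}
```

The `column` lists in the reader and writer define the field mapping by position, and the `${bizdate}` scheduling parameter in the writer's `partition` is resolved at run time by DataWorks scheduling.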
Function overview
| Feature | Description |
| --- | --- |
| Full or incremental data synchronization | Configure Data Filtering and use scheduling parameters in offline sync tasks to perform full or incremental data synchronization. The configuration for incremental sync varies by plugin. For more information, see Scenario: Configure an incremental offline sync task. |
| Field mapping | Establish mapping rules between fields to write source data to the corresponding destination fields. Ensure that the data types of the source and destination fields are compatible during configuration. |
| Job rate limit control | You can set the maximum number of concurrent threads and a rate limit for a sync task to control the synchronization speed and reduce the impact on the source and destination databases. |
| Distributed task execution | Data sources that support distributed execution can use task segmentation technology to distribute a sync task across multiple nodes for concurrent execution. This allows the sync speed to scale linearly with the cluster size and overcomes single-node performance bottlenecks. This pattern is especially useful for high-throughput sync scenarios. It also efficiently schedules idle cluster resources, which significantly improves hardware utilization. |
| Dirty data policy | Dirty data refers to records that fail to be written to the destination due to exceptions, such as type conflicts or constraint violations. Offline sync lets you define a dirty data policy: the number of tolerable dirty data records and how dirty data affects the task. |
| Time zone | If the source and destination are in different time zones, set the source time zone so that time values are converted during synchronization. |
| Intelligent data processing | DataWorks supports data processing during synchronization, which lets you transform source data before it is written to the destination. String replacement: offline sync tasks have a built-in string replacement feature that performs lightweight transformations during transfer without landing the data or requiring extra extract, transform, and load (ETL) steps. AI-assisted processing: you can integrate large AI models to perform semantic, sentiment, and other analyses on natural language from the source, and the results are written directly to the destination table. Data vectorization: extracts source data, creates vector embeddings, and writes them to a vector database. |
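To make the data filtering, rate limit, and dirty data features above concrete, the fragment below sketches how such options might appear in a script-mode task configuration: the `where` clause uses the `${bizdate}` scheduling parameter for incremental sync, `speed` caps concurrency and throughput, and `errorLimit` tolerates at most a fixed number of dirty records. The table name, column, and threshold values are illustrative assumptions:

```json
{
  "steps": [
    {
      "category": "reader",
      "stepType": "mysql",
      "parameter": {
        "table": "orders",
        "where": "gmt_modified >= '${bizdate}'"
      }
    }
  ],
  "setting": {
    "speed": { "concurrent": 4, "throttle": true, "mbps": 10 },
    "errorLimit": { "record": 100 }
  }
}
```

With a configuration like this, each scheduled run reads only the rows modified since the business date, and the task fails only if more than 100 records are rejected by the destination.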
More operations
For more information about how to create a task, see: