DataWorks Data Integration moves data between databases, object storage, message queues, and SaaS systems through batch and real-time synchronization. This page helps you choose a synchronization approach and verify whether your data source supports the read or write operations you need.
Choose a synchronization approach
Two factors drive the choice:
- Latency: Is a daily update sufficient (T+1 batch), or do you need changes reflected within seconds or minutes (real-time)?
- Scale: Are you moving a handful of complex, heterogeneous tables, or replicating hundreds of uniform tables at once?
The following table maps those factors to the available approaches.
| Approach | Latency | Best for | Key trade-off |
|---|---|---|---|
| Single-table batch | T+1 or periodic | A small number of core tables with complex transformation logic, or non-standard sources such as APIs and log files | High configuration overhead and resource cost at scale; 100 single-table tasks consume roughly 100 CUs versus 2 CUs for one whole-database task |
| Whole-database batch | T+1 or periodic | Hundreds of homogeneous tables — building an Operational Data Store (ODS) layer, cloud migration, and periodic backups | No per-table transformation logic |
| Single-table real-time | Second-to-minute latency | Complex processing of real-time change streams from a single, critical table | Higher operational complexity than batch |
| Whole-database real-time | Second-to-minute latency | Real-time data warehouses, database disaster recovery, and real-time data lake integration | Requires Change Data Capture (CDC) support or a message queue source |
| Whole-database full and incremental | Full sync: batch; incremental: T+1 | Append-only targets (such as non-Delta MaxCompute tables) that cannot process CDC updates directly | Final merged state is visible only after the T+1 merge task completes |
| Serverless | T+1 or periodic | See Serverless synchronization task | — |
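The mapping in the table above can be read as a small decision function. The sketch below is only an illustration of that reading: the function name and its boolean parameters are hypothetical, and it ignores the Serverless row and any cost considerations.

```python
def choose_approach(needs_realtime: bool,
                    many_uniform_tables: bool,
                    append_only_target: bool = False) -> str:
    """Map the latency and scale factors to an approach name from the table."""
    if needs_realtime:
        if many_uniform_tables:
            # Append-only targets cannot apply CDC updates directly, so the
            # full and incremental approach is the closest fit.
            if append_only_target:
                return "Whole-database full and incremental"
            return "Whole-database real-time"
        return "Single-table real-time"
    if many_uniform_tables:
        return "Whole-database batch"
    return "Single-table batch"


print(choose_approach(needs_realtime=False, many_uniform_tables=True))
print(choose_approach(needs_realtime=True, many_uniform_tables=True,
                      append_only_target=True))
```

Treat the result as a starting point; the trade-off column still needs to be weighed for each workload.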
Prerequisite for incremental batch synchronization: The source table must have a field that tracks changes, such as a timestamp column (gmt_modified) or an auto-incrementing ID. Without one, use periodic full synchronization instead.
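To illustrate why a change-tracking field is needed, here is a minimal sketch of watermark-based incremental extraction. It uses an in-memory SQLite table as a stand-in for the source; the `orders` table, its columns, and the `fetch_incremental` helper are assumptions made for this example and are not part of Data Integration.

```python
import sqlite3


def fetch_incremental(conn, table, change_column, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark."""
    # The table and column names are assumed to come from trusted task
    # configuration; only the watermark value is bound as a parameter.
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {change_column} > ? ORDER BY {change_column}"
    )
    cur = conn.execute(query, (last_watermark,))
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    # Keep the old watermark when nothing changed.
    new_watermark = rows[-1][change_column] if rows else last_watermark
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, gmt_modified TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01 08:00:00"),
        (2, 25.5, "2024-01-02 09:30:00"),
        (3, 7.25, "2024-01-03 11:15:00"),
    ],
)

# Only rows modified after the previous run's watermark are extracted.
rows, watermark = fetch_incremental(
    conn, "orders", "gmt_modified", "2024-01-01 23:59:59"
)
print([r["id"] for r in rows], watermark)
```

The timestamps are stored as text in a fixed `YYYY-MM-DD HH:MM:SS` format, so string comparison matches chronological order. A table with no such column offers nothing to filter on, which is why periodic full synchronization is the fallback.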
Prerequisite for real-time synchronization: The source must support CDC or act as a message queue. For MySQL, binary logging must be enabled.
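For MySQL, one way to confirm the binary logging prerequisite is to query the `log_bin` system variable. The sketch below assumes a DB-API style cursor that returns rows as tuples; `FakeCursor` is a hypothetical stand-in so the example runs without a server, and a real check would pass a cursor from your MySQL driver instead.

```python
def binlog_enabled(cursor) -> bool:
    """Return True if the MySQL server reports log_bin = ON."""
    cursor.execute("SHOW VARIABLES LIKE 'log_bin'")
    row = cursor.fetchone()  # expected shape: ('log_bin', 'ON')
    return row is not None and str(row[1]).upper() == "ON"


class FakeCursor:
    """Hypothetical stand-in for a real cursor, used only for this demo."""

    def __init__(self, value):
        self._row = ("log_bin", value)

    def execute(self, sql):
        self.last_sql = sql

    def fetchone(self):
        return self._row


print(binlog_enabled(FakeCursor("ON")))
print(binlog_enabled(FakeCursor("OFF")))
```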
How the whole-database full and incremental approach works
Append-only storage systems like non-Delta MaxCompute tables cannot process physical Update or Delete operations. Writing a CDC stream to them directly produces inconsistent data — for example, deleted rows remain visible in the target.
Data Integration addresses this with the Base + Log pattern, implemented as a whole-database full and incremental task:
- A Base table holds the latest full snapshot.
- A Log table captures the real-time CDC stream.
CDC changes are written to the Log table within minutes. On a T+1 schedule, the system automatically merges the Log table into the Base table to produce an updated full snapshot. This balances near-real-time data capture with the eventual consistency that batch data warehouses require.
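The merge step can be pictured as replaying the Log table's events over the Base snapshot. The following is a conceptual sketch only: the event format (`op` and `row` keys) and the `merge_log_into_base` function are assumptions for illustration and do not reflect the actual schema or implementation that Data Integration uses.

```python
def merge_log_into_base(base, log, key="id"):
    """Apply ordered CDC events to a snapshot and return the new snapshot."""
    merged = {row[key]: dict(row) for row in base}
    for event in log:  # events are assumed to arrive in commit order
        op, row = event["op"], event["row"]
        if op == "DELETE":
            merged.pop(row[key], None)
        else:
            # INSERT and UPDATE both leave the latest row image in place.
            merged[row[key]] = dict(row)
    return sorted(merged.values(), key=lambda r: r[key])


base = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
log = [
    {"op": "UPDATE", "row": {"id": 1, "status": "paid"}},
    {"op": "DELETE", "row": {"id": 2}},
    {"op": "INSERT", "row": {"id": 3, "status": "new"}},
]

new_base = merge_log_into_base(base, log)
print(new_base)
```

Because the deletion of row 2 takes effect only during the merge, the target shows the deleted row until the scheduled merge task runs.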
For setup details, see Whole-database full and incremental (near real-time) task.
Synchronization types at a glance
The following table summarizes each synchronization type by source granularity, target granularity, latency, and typical scenario.
| Type | Source granularity | Target granularity | Timeliness | Synchronization scenario |
|---|---|---|---|---|
| Single-table batch | Single table | Single table/partition | T+1 or periodic | Periodic full or incremental synchronization |
| Sharding batch | Multiple tables with identical schema | Single table/partition | T+1 or periodic | Periodic full or incremental synchronization |
| Single-table real-time | Single table | Single table/partition | Second-to-minute latency | Change Data Capture (CDC) |
| Whole-database batch | Whole database or multiple tables | Matching tables and partitions | One-time or periodic | One-time or periodic full/incremental synchronization. Supports an initial full synchronization followed by periodic incremental updates. |
| Whole-database real-time | Whole database or multiple tables | Matching tables and partitions | Second-to-minute latency | Full synchronization + Change Data Capture (CDC) |
| Whole-database full and incremental | Whole database or multiple tables | Matching tables and partitions | Initial full synchronization: batch processing; subsequent incremental synchronization: T+1 | Full synchronization + Change Data Capture (CDC) |
Data source read/write capabilities
The tables below list every supported data source and its read/write capabilities for each synchronization type. Read means Data Integration can read from the data source (use it as a source); Write means Data Integration can write to it (use it as a destination). A dash (—) indicates the combination is not supported.
Sources are grouped by category to help you find your data source faster.
Alibaba Cloud databases
| Data source | Single-table batch | Single-table real-time | Whole-database batch | Whole-database real-time | Whole-database full and incremental |
|---|---|---|---|---|---|
| AnalyticDB for MySQL 2.0 | Read/Write | — | — | — | — |
| AnalyticDB for MySQL 3.0 | Read/Write | Write | Read | Write | — |
| AnalyticDB for PostgreSQL | Read/Write | — | Read | — | — |
| ApsaraDB for OceanBase | Read/Write | Write | — | Read/Write | — |
| ApsaraDB for Memcache | Write | — | — | — | — |
| DataHub | Read/Write | Read/Write | — | Write | — |
| Data Lake Formation | Read/Write | Write | Write | Write | — |
| Hologres | Read/Write | Read/Write | Read/Write | Write | — |
| Lindorm | Read/Write | Write | — | Write | — |
| MaxCompute | Read/Write | Write | Write | Write | Write |
| MaxGraph | Write | — | — | — | — |
| MetaQ | Read | — | — | — | — |
| Milvus | Read/Write | — | — | — | — |
| Object Storage Service (OSS) | Read/Write | — | Write | Write | — |
| OpenSearch | Write | — | — | — | — |
| OSS-HDFS | Read/Write | — | Write | Write | — |
| PolarDB | Read/Write | Read | Read | Read | Read |
| PolarDB-X 2.0 | Read/Write | — | Read | Read | — |
| Simple Log Service (SLS) | Read/Write | Read | — | — | — |
| Tablestore | Read/Write | Write | — | — | — |
Relational databases
| Data source | Single-table batch | Single-table real-time | Whole-database batch | Whole-database real-time | Whole-database full and incremental |
|---|---|---|---|---|---|
| DB2 | Read/Write | — | Read | — | — |
| DM (Dameng) | Read/Write | — | Read | — | — |
| DRDS (PolarDB-X 1.0) | Read/Write | — | Read | — | — |
| KingbaseES (Renda Jingcang) | Read/Write | — | — | — | — |
| MariaDB | Read/Write | — | — | — | — |
| MySQL | Read/Write | Read | Read | Read | Read |
| Oracle | Read/Write | Read | Read | Read | Read |
| PostgreSQL | Read/Write | — | Read | Read | — |
| SAP HANA | Read/Write | — | — | — | — |
| SQL Server | Read/Write | — | Read | — | — |
| TiDB | Read/Write | — | — | — | — |
| GBase8a | Read/Write | — | — | — | — |
| Vertica | Read/Write | — | — | — | — |
Analytical and columnar databases
| Data source | Single-table batch | Single-table real-time | Whole-database batch | Whole-database real-time | Whole-database full and incremental |
|---|---|---|---|---|---|
| ClickHouse | Read/Write | — | Read | — | — |
| Doris | Read/Write | Write | Read | — | — |
| StarRocks | Read/Write | Write | Write | Write | — |
NoSQL and search
| Data source | Single-table batch | Single-table real-time | Whole-database batch | Whole-database real-time | Whole-database full and incremental |
|---|---|---|---|---|---|
| Elasticsearch | Read/Write | Write | Write | Write | — |
| HBase | HBase: Read/Write; HBase 20xsql: Read; HBase 11xsql: Write | — | — | — | — |
| MongoDB | Read/Write | — | — | Read | — |
| Redis | Write | — | — | — | — |
| TSDB | Write | — | — | — | — |
Object storage and file systems
| Data source | Single-table batch | Single-table real-time | Whole-database batch | Whole-database real-time | Whole-database full and incremental |
|---|---|---|---|---|---|
| Amazon S3 | Read/Write | — | — | — | — |
| Azure Blob Storage | Read | — | — | — | — |
| COS | Read | — | — | — | — |
| FTP | Read/Write | — | — | — | — |
| HDFS | Read/Write | — | — | — | — |
| Hive | Read/Write | — | Read/Write | — | — |
| HttpFile | Read | — | — | — | — |
| TOS | Read | — | — | — | — |
Message queues
| Data source | Single-table batch | Single-table real-time | Whole-database batch | Whole-database real-time | Whole-database full and incremental |
|---|---|---|---|---|---|
| Kafka | Read/Write | Read/Write | — | Write | — |
Cloud data warehouses and analytics platforms
| Data source | Single-table batch | Single-table real-time | Whole-database batch | Whole-database real-time | Whole-database full and incremental |
|---|---|---|---|---|---|
| Amazon Redshift | Read/Write | — | — | — | — |
| BigQuery | Read | — | — | — | — |
| Databricks | Read | — | — | — | — |
| Public dataset | Read | — | — | — | — |
| Snowflake | Read/Write | — | — | — | — |
SaaS and APIs
| Data source | Single-table batch | Single-table real-time | Whole-database batch | Whole-database real-time | Whole-database full and incremental |
|---|---|---|---|---|---|
| RestAPI | Read/Write | — | — | — | — |
| Salesforce | Read/Write | — | — | — | — |
| Sensors Data (Shen Ce) | Write | — | — | — | — |
If your data source is not listed but exposes a RESTful API, use the RestAPI connector.
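When validating a planned pipeline, the capability tables can be encoded as a lookup. The sketch below copies only three rows (MySQL, MaxCompute, Kafka) from the tables on this page; the `CAPABILITIES` structure and the `supports` function are hypothetical helpers, not an API that DataWorks provides.

```python
BATCH = "Single-table batch"
RT = "Single-table real-time"
DB_BATCH = "Whole-database batch"
DB_RT = "Whole-database real-time"
DB_FULL_INCR = "Whole-database full and incremental"

# A subset of the capability tables; unsupported combinations are omitted.
CAPABILITIES = {
    "MySQL": {
        BATCH: {"read", "write"}, RT: {"read"}, DB_BATCH: {"read"},
        DB_RT: {"read"}, DB_FULL_INCR: {"read"},
    },
    "MaxCompute": {
        BATCH: {"read", "write"}, RT: {"write"}, DB_BATCH: {"write"},
        DB_RT: {"write"}, DB_FULL_INCR: {"write"},
    },
    "Kafka": {
        BATCH: {"read", "write"}, RT: {"read", "write"}, DB_RT: {"write"},
    },
}


def supports(source: str, sync_type: str, operation: str) -> bool:
    """Return True if the source supports the operation for the sync type."""
    return operation in CAPABILITIES.get(source, {}).get(sync_type, set())


# A MySQL-to-MaxCompute full and incremental task needs read on the source
# and write on the target.
print(supports("MySQL", DB_FULL_INCR, "read")
      and supports("MaxCompute", DB_FULL_INCR, "write"))
```

Sources missing from the lookup simply return `False`, which corresponds to the not-listed case described above.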
What's next
- Configure a data source connection: Data source management
- Create a synchronization task:
- Explore use cases: