Supported data sources and synchronization solutions - DataWorks

DataWorks Data Integration moves data between databases, object storage, message queues, and SaaS systems through batch and real-time synchronization. This page helps you choose a synchronization approach and verify whether your data source supports the read or write operations you need.

Choose a synchronization approach

Two factors drive the choice:

Latency — Is a daily update sufficient (T+1 batch), or do you need changes reflected within seconds or minutes (real-time)?
Scale — Are you moving a handful of complex, heterogeneous tables, or replicating hundreds of uniform tables at once?

The following table maps those factors to the available approaches.

Approach	Latency	Best for	Key trade-off
Single-table batch	T+1 or periodic	A small number of core tables with complex transformation logic, or non-standard sources such as APIs and log files	High configuration overhead and resource cost at scale; 100 single-table tasks consume roughly 100 CUs versus 2 CUs for one whole-database task
Whole-database batch	T+1 or periodic	Hundreds of homogeneous tables — building an Operational Data Store (ODS) layer, cloud migration, and periodic backups	No per-table transformation logic
Single-table real-time	Second-to-minute latency	Complex processing of real-time change streams from a single, critical table	Higher operational complexity than batch
Whole-database real-time	Second-to-minute latency	Real-time data warehouses, database disaster recovery, and real-time data lake integration	Requires Change Data Capture (CDC) support or a message queue source
Whole-database full and incremental	Full sync: batch; incremental: T+1	Append-only targets (such as non-Delta MaxCompute tables) that cannot process CDC updates directly	Final merged state is visible only after the T+1 merge task completes
Serverless	T+1 or periodic	See Serverless synchronization task	—

Prerequisite for incremental batch synchronization: The source table must have a field that tracks changes, such as a timestamp column (gmt_modified) or an auto-incrementing ID. Without one, use periodic full synchronization instead.
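The incremental prerequisite above can be illustrated with a minimal sketch. It assumes a hypothetical `orders` table with a `gmt_modified` timestamp column and uses an in-memory SQLite database as a stand-in for the real source; the helper name `fetch_incremental` is illustrative and is not part of DataWorks.

```python
import sqlite3


def fetch_incremental(conn, table, ts_column, window_start, window_end):
    """Return rows whose change-tracking column falls in [window_start, window_end)."""
    # Table and column names are interpolated into the SQL text, so they must
    # come from trusted configuration. The window bounds are bound parameters.
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} >= ? AND {ts_column} < ? "
        f"ORDER BY {ts_column}"
    )
    return conn.execute(query, (window_start, window_end)).fetchall()


# Hypothetical source table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, gmt_modified TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01 08:00:00"),
        (2, 20.0, "2024-01-02 09:30:00"),
        (3, 30.0, "2024-01-02 23:59:59"),
        (4, 40.0, "2024-01-03 00:00:00"),
    ],
)

# A T+1 run on 2024-01-03 picks up only the rows modified during 2024-01-02.
rows = fetch_incremental(
    conn, "orders", "gmt_modified", "2024-01-02 00:00:00", "2024-01-03 00:00:00"
)
print([row[0] for row in rows])
```

The half-open window (inclusive start, exclusive end) keeps consecutive runs from reading the same row twice. Without a column like `gmt_modified` there is nothing to filter on, which is why the fallback is periodic full synchronization.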

Prerequisite for real-time synchronization: The source must support CDC or act as a message queue. For MySQL, binary logging must be enabled.
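As a rough sketch, the two decision factors and the two prerequisites above can be expressed as a small lookup. The function name, the `"few"`/`"many"` labels, and the choice to fall back to batch when the real-time prerequisite is not met are assumptions made for illustration; they do not describe DataWorks behavior.

```python
# Maps (latency, scale) to the approach names used in the table above.
APPROACHES = {
    ("batch", "few"): "Single-table batch",
    ("batch", "many"): "Whole-database batch",
    ("realtime", "few"): "Single-table real-time",
    ("realtime", "many"): "Whole-database real-time",
}


def choose_approach(latency, scale, source_supports_cdc_or_mq=False,
                    has_change_tracking_column=False):
    """Return (approach, notes) for the given latency and scale requirements."""
    if latency not in ("batch", "realtime"):
        raise ValueError(f"unknown latency: {latency!r}")
    if scale not in ("few", "many"):
        raise ValueError(f"unknown scale: {scale!r}")

    notes = []
    if latency == "realtime" and not source_supports_cdc_or_mq:
        # Assumption for this sketch: fall back to batch instead of failing.
        latency = "batch"
        notes.append("source has no CDC or message queue support; using batch")
    if latency == "batch" and not has_change_tracking_column:
        notes.append("no change-tracking column; use periodic full synchronization")
    return APPROACHES[(latency, scale)], notes


print(choose_approach("realtime", "many", source_supports_cdc_or_mq=True))
print(choose_approach("batch", "few", has_change_tracking_column=True))
print(choose_approach("realtime", "few"))
```

The whole-database full and incremental approach is left out on purpose: it depends on whether the target is append-only, which the two factors alone do not capture.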

How the whole-database full and incremental approach works

Append-only storage systems like non-Delta MaxCompute tables cannot process physical Update or Delete operations. Writing a CDC stream to them directly produces inconsistent data — for example, deleted rows remain visible in the target.

Data Integration addresses this with the Base + Log pattern, implemented as a whole-database full and incremental task:

A Base table holds the latest full snapshot.
A Log table captures the real-time CDC stream.

CDC changes are written to the Log table within minutes. On a T+1 schedule, the system automatically merges the Log table into the Base table to produce an updated full snapshot. This balances near-real-time data capture with the eventual consistency that batch data warehouses require.
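The merge step can be sketched in a few lines. This is a simplified illustration of the Base + Log idea, not the actual merge logic that Data Integration runs: the record layout (`seq`, `op`, `row`), the primary-key name `id`, and the function name `merge_log_into_base` are all assumptions.

```python
def merge_log_into_base(base, log, key="id"):
    """Apply ordered change records to a full snapshot and return the new snapshot.

    base: list of row dicts (the latest full snapshot).
    log:  list of change dicts with 'seq' (ordering), 'op', and 'row'.
    """
    merged = {row[key]: dict(row) for row in base}
    # Replay changes in sequence order so the last change per key wins.
    for change in sorted(log, key=lambda c: c["seq"]):
        row_key = change["row"][key]
        if change["op"] == "delete":
            merged.pop(row_key, None)
        elif change["op"] in ("insert", "update"):
            merged[row_key] = dict(change["row"])
        else:
            raise ValueError(f"unknown operation: {change['op']!r}")
    return sorted(merged.values(), key=lambda r: r[key])


base = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
log = [
    {"seq": 1, "op": "update", "row": {"id": 1, "status": "paid"}},
    {"seq": 2, "op": "insert", "row": {"id": 3, "status": "new"}},
    {"seq": 3, "op": "delete", "row": {"id": 2}},
]

new_base = merge_log_into_base(base, log)
print(new_base)
```

Until the merge runs, the deleted row with `id` 2 is still present in the Base table, which is the trade-off noted earlier: the final merged state is visible only after the T+1 merge task completes.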

For setup details, see Whole-database full and incremental (near real-time) task.

Synchronization types at a glance

The following table summarizes each synchronization type by source granularity, target granularity, latency, and typical scenario.

Type	Source granularity	Target granularity	Latency	Synchronization scenario
Single-table batch	Single table	Single table/partition	T+1 or periodic	Periodic full or incremental synchronization
Sharding batch	Multiple tables with identical schema	Single table/partition	T+1 or periodic	Periodic full or incremental synchronization
Single-table real-time	Single table	Single table/partition	Second-to-minute latency	Change Data Capture (CDC)
Whole-database batch	Whole database or multiple tables	Matching tables and partitions	One-time or periodic	One-time or periodic full/incremental synchronization. Supports an initial full synchronization followed by periodic incremental updates.
Whole-database real-time	Whole database or multiple tables	Matching tables and partitions	Second-to-minute latency	Full synchronization + Change Data Capture (CDC)
Whole-database full and incremental	Whole database or multiple tables	Matching tables and partitions	Initial full synchronization: batch processing; subsequent incremental synchronization: T+1	Full synchronization + Change Data Capture (CDC)

Data source read/write capabilities

The tables below list every supported data source and its read/write capabilities for each synchronization type. Read means the data source can serve as the source of a synchronization task; Write means it can serve as the destination. A dash (—) indicates that the combination is not supported.

Sources are grouped by category to help you find your data source faster.

Alibaba Cloud databases

Data source	Single-table batch	Single-table real-time	Whole-database batch	Whole-database real-time	Whole-database full and incremental
AnalyticDB for MySQL 2.0	Read/Write	—	—	—	—
AnalyticDB for MySQL 3.0	Read/Write	Write	Read	Write	—
AnalyticDB for PostgreSQL	Read/Write	—	Read	—	—
ApsaraDB for OceanBase	Read/Write	Write	—	Read/Write	—
ApsaraDB for Memcache	Write	—	—	—	—
DataHub	Read/Write	Read/Write	—	Write	—
Data Lake Formation	Read/Write	Write	Write	Write	—
Hologres	Read/Write	Read/Write	Read/Write	Write	—
Lindorm	Read/Write	Write	—	Write	—
MaxCompute	Read/Write	Write	Write	Write	Write
MaxGraph	Write	—	—	—	—
MetaQ	Read	—	—	—	—
Milvus	Read/Write	—	—	—	—
Object Storage Service (OSS)	Read/Write	—	Write	Write	—
OpenSearch	Write	—	—	—	—
OSS-HDFS	Read/Write	—	Write	Write	—
PolarDB	Read/Write	Read	Read	Read	Read
PolarDB-X 2.0	Read/Write	—	Read	Read	—
Simple Log Service (SLS)	Read/Write	Read	—	—	—
Tablestore	Read/Write	Write	—	—	—

Relational databases

Data source	Single-table batch	Single-table real-time	Whole-database batch	Whole-database real-time	Whole-database full and incremental
DB2	Read/Write	—	Read	—	—
DM (Dameng)	Read/Write	—	Read	—	—
DRDS (PolarDB-X 1.0)	Read/Write	—	Read	—	—
KingbaseES (Renda Jingcang)	Read/Write	—	—	—	—
MariaDB	Read/Write	—	—	—	—
MySQL	Read/Write	Read	Read	Read	Read
Oracle	Read/Write	Read	Read	Read	Read
PostgreSQL	Read/Write	—	Read	Read	—
SAP HANA	Read/Write	—	—	—	—
SQL Server	Read/Write	—	Read	—	—
TiDB	Read/Write	—	—	—	—
GBase8a	Read/Write	—	—	—	—
Vertica	Read/Write	—	—	—	—

Analytical and columnar databases

Data source	Single-table batch	Single-table real-time	Whole-database batch	Whole-database real-time	Whole-database full and incremental
ClickHouse	Read/Write	—	Read	—	—
Doris	Read/Write	Write	Read	—	—
StarRocks	Read/Write	Write	Write	Write	—

NoSQL and search

Data source	Single-table batch	Single-table real-time	Whole-database batch	Whole-database real-time	Whole-database full and incremental
Elasticsearch	Read/Write	Write	Write	Write	—
HBase	HBase: Read/Write; HBase 20xsql: Read; HBase 11xsql: Write	—	—	—	—
MongoDB	Read/Write	—	—	Read	—
Redis	Write	—	—	—	—
TSDB	Write	—	—	—	—

Object storage and file systems

Data source	Single-table batch	Single-table real-time	Whole-database batch	Whole-database real-time	Whole-database full and incremental
Amazon S3	Read/Write	—	—	—	—
Azure Blob Storage	Read	—	—	—	—
COS	Read	—	—	—	—
FTP	Read/Write	—	—	—	—
HDFS	Read/Write	—	—	—	—
Hive	Read/Write	—	Read/Write	—	—
HttpFile	Read	—	—	—	—
TOS	Read	—	—	—	—

Message queues

Data source	Single-table batch	Single-table real-time	Whole-database batch	Whole-database real-time	Whole-database full and incremental
Kafka	Read/Write	Read/Write	—	Write	—

Cloud data warehouses and analytics platforms

Data source	Single-table batch	Single-table real-time	Whole-database batch	Whole-database real-time	Whole-database full and incremental
Amazon Redshift	Read/Write	—	—	—	—
BigQuery	Read	—	—	—	—
Databricks	Read	—	—	—	—
Public dataset	Read	—	—	—	—
Snowflake	Read/Write	—	—	—	—

SaaS and APIs

Data source	Single-table batch	Single-table real-time	Whole-database batch	Whole-database real-time	Whole-database full and incremental
RestAPI	Read/Write	—	—	—	—
Salesforce	Read/Write	—	—	—	—
Sensors Data (Shen Ce)	Write	—	—	—	—

If your data source is not listed, use the RestAPI connector for sources that expose RESTful APIs.
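One way to work with the capability tables above is to load them into a lookup and check a source and destination pair programmatically. The sketch below copies four rows (MySQL, MaxCompute, Kafka, Hologres) from the tables; the dictionary layout and the function names `supports` and `can_sync` are hypothetical. A pair that passes this check is not guaranteed to be offered as a specific pairing, so confirm against the task configuration itself.

```python
SYNC_TYPES = [
    "Single-table batch",
    "Single-table real-time",
    "Whole-database batch",
    "Whole-database real-time",
    "Whole-database full and incremental",
]

# Rows copied from the tables above; an empty string stands for an unsupported cell.
CAPABILITIES = {
    "MySQL": ["Read/Write", "Read", "Read", "Read", "Read"],
    "MaxCompute": ["Read/Write", "Write", "Write", "Write", "Write"],
    "Kafka": ["Read/Write", "Read/Write", "", "Write", ""],
    "Hologres": ["Read/Write", "Read/Write", "Read/Write", "Write", ""],
}


def supports(data_source, sync_type, operation):
    """Return True if data_source supports 'Read' or 'Write' for sync_type."""
    if operation not in ("Read", "Write"):
        raise ValueError(f"unknown operation: {operation!r}")
    cell = CAPABILITIES[data_source][SYNC_TYPES.index(sync_type)]
    return operation in cell.split("/")


def can_sync(source, destination, sync_type):
    """Return True if source is readable and destination is writable for sync_type."""
    return (supports(source, sync_type, "Read")
            and supports(destination, sync_type, "Write"))


print(can_sync("MySQL", "MaxCompute", "Whole-database full and incremental"))
print(can_sync("Kafka", "MySQL", "Whole-database real-time"))
```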

What's next

Configure a data source connection: Data source management
Create a synchronization task:
Explore use cases:
- Use case: Real-time synchronization for a single table
- Use case: Batch whole-database synchronization
Data Integration FAQ
