DataWorks:Data Integration

Last Updated: Mar 26, 2026

DataWorks Data Integration supports data synchronization in complex network environments. Two synchronization modes are available: batch synchronization for periodic offline data transfers, and real-time synchronization for continuous incremental replication. Configure both on the DataStudio page.

Choose a synchronization mode

The two modes differ in how often data is transferred, what each run transfers, the source topologies they support, and how you configure them.

| Aspect | Batch synchronization | Real-time synchronization |
|---|---|---|
| Transfer cadence | Scheduled (periodic) | Continuous |
| Data transferred | Full or incremental snapshots | Incremental changes only |
| Typical use case | Periodic reporting, data warehousing | Low-latency pipelines |
| Source topology | Single table to single table; tables in sharded databases to single table | Star-shaped multi-source link |
| Configuration | Codeless UI or code editor | Input/output configuration |

Use batch synchronization when:

  • Downstream workloads tolerate a delay (for example, daily or hourly refreshes)

  • You need to backfill historical data into specific partitions

  • Your source is one of the 40+ supported data source types, including relational databases, unstructured storage systems, big data storage systems, and message queues

Use real-time synchronization when:

  • Data must arrive at the destination within seconds of a source change

  • You want to continuously replicate an entire database to a destination

Batch synchronization is not ideal when:

  • You need sub-minute data freshness

  • Your source is not one of the 40+ supported data source types

For additional synchronization solutions, such as combined full and incremental synchronization and whole-database batch synchronization, see Supported data source types and data synchronization solutions.
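The criteria above can be condensed into a small decision helper. This is a minimal Python sketch, not a DataWorks API: SyncRequirements and recommend_modes are hypothetical names, and the 60-second threshold only encodes the "sub-minute freshness" boundary described above. Returning both modes covers cases such as a low-latency pipeline that also needs a historical backfill.

```python
from dataclasses import dataclass


@dataclass
class SyncRequirements:
    """Hypothetical summary of what a downstream workload needs."""
    max_latency_seconds: float         # how stale the destination data may be
    needs_backfill: bool = False       # historical data must land in specific partitions
    continuous_whole_db: bool = False  # an entire database must be replicated continuously


def recommend_modes(req: SyncRequirements) -> list[str]:
    """Map the guidance above to one or both synchronization modes."""
    modes = []
    # Sub-minute freshness or continuous whole-database replication calls for real-time sync.
    if req.max_latency_seconds < 60 or req.continuous_whole_db:
        modes.append("real-time")
    # Delay-tolerant workloads and partition backfills fit scheduled batch runs.
    if req.max_latency_seconds >= 60 or req.needs_backfill:
        modes.append("batch")
    return modes


print(recommend_modes(SyncRequirements(max_latency_seconds=86400)))  # ['batch']
print(recommend_modes(SyncRequirements(max_latency_seconds=5, needs_backfill=True)))  # ['real-time', 'batch']
```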

Prerequisites

Before you begin, ensure that you have:

  • The Development role in your DataWorks workspace

To add a RAM (Resource Access Management) user and assign roles, see Add a RAM user to a workspace as a member and assign roles to the member.

Batch synchronization

How it works

Batch synchronization reads data from a source using a Reader plug-in and writes it to a destination using a Writer plug-in. Before creating a batch synchronization node, add the data sources to DataWorks so they are available during node configuration.
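The Reader/Writer split can be illustrated with a minimal Python sketch. In DataWorks you configure plug-ins rather than write them, so InMemoryReader, InMemoryWriter, and run_batch_sync are hypothetical stand-ins that only show the data flow: the reader produces records, and the writer consumes them without knowing where they came from.

```python
from typing import Iterable, Iterator, Protocol

Record = dict  # one row, keyed by column name


class Reader(Protocol):
    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


class InMemoryReader:
    """Hypothetical stand-in for a Reader plug-in: serves rows from a list."""

    def __init__(self, rows: list[Record]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        yield from self.rows


class InMemoryWriter:
    """Hypothetical stand-in for a Writer plug-in: appends rows to a list."""

    def __init__(self) -> None:
        self.rows: list[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        written = 0
        for record in records:
            self.rows.append(record)
            written += 1
        return written


def run_batch_sync(reader: Reader, writer: Writer) -> int:
    # The reader and writer never reference each other; the run only
    # connects the reader's output stream to the writer's input.
    return writer.write(reader.read())


source = InMemoryReader([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
destination = InMemoryWriter()
print(run_batch_sync(source, destination))  # 2
```

In this sketch, changing the source type means changing only the reader; the run function and the writer stay the same.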

Each run transfers either full data or incremental data to a specific partition in the destination table. To target the correct partition for each scheduled run, reference the built-in variable ${bizdate}; by default, DataWorks assigns the scheduling parameter $bizdate to this variable. You can also use the data backfill feature in Operation Center to synchronize historical data to specific tables or partitions, based on the configuration of the batch synchronization node.
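DataWorks documents $bizdate as the data timestamp: the day before the scheduled run date, in yyyymmdd format. The Python sketch below assumes that definition to show which partition a run writes to and which data timestamps a daily backfill over a date range would cover. The function names and the ds partition column are hypothetical.

```python
from datetime import date, timedelta


def bizdate_for(run_date: date) -> str:
    """Data timestamp for a run, assuming $bizdate = run date minus one day (yyyymmdd)."""
    return (run_date - timedelta(days=1)).strftime("%Y%m%d")


def partition_for(bizdate: str, column: str = "ds") -> str:
    # "ds" is a hypothetical partition column name.
    return f"{column}={bizdate}"


def backfill_bizdates(start: date, end: date) -> list[str]:
    """One data timestamp per day from start to end, inclusive, for a daily node."""
    span = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(span + 1)]


print(partition_for(bizdate_for(date(2026, 3, 26))))  # ds=20260325
print(backfill_bizdates(date(2026, 3, 1), date(2026, 3, 3)))  # ['20260301', '20260302', '20260303']
```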

Configure a batch synchronization node

Choose the configuration method based on your data source and requirements:

| Scenario | Method | Reference |
|---|---|---|
| Data source is added to DataWorks and supports the codeless UI | Codeless UI | Configure a batch synchronization node by using the codeless UI (2.0) |
| Data source cannot be added to DataWorks | Code editor | Configure a batch synchronization node by using the code editor (2.0) |
| Data source does not support the codeless UI | Code editor | Configure a batch synchronization node by using the code editor (2.0) |
| Reader or Writer plug-in parameters can only be set in script mode | Code editor | Configure a batch synchronization node by using the code editor (2.0) |
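The scenario table reduces to one rule: use the codeless UI only when every condition for it holds; otherwise, use the code editor. The function below is a hypothetical Python encoding of that table, not part of any DataWorks SDK.

```python
def choose_config_method(
    added_to_dataworks: bool,
    supports_codeless_ui: bool,
    needs_script_only_params: bool = False,
) -> str:
    """Apply the scenario table: any code-editor condition takes precedence."""
    if not added_to_dataworks:
        return "code editor"  # source cannot be added to DataWorks
    if not supports_codeless_ui:
        return "code editor"  # source does not support the codeless UI
    if needs_script_only_params:
        return "code editor"  # plug-in parameters settable only in script mode
    return "codeless UI"


print(choose_config_method(added_to_dataworks=True, supports_codeless_ui=True))  # codeless UI
```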

For the full list of supported data sources, Reader plug-ins, and Writer plug-ins, see Supported data source types, Reader plug-ins, and Writer plug-ins and Overview of the batch synchronization feature.

Real-time synchronization

Real-time synchronization uses a star-shaped synchronization link that combines multiple data source types. Configure the input and output of a real-time synchronization node to sync from a single table to another single table, or to replicate all data from an entire database to a destination.

For supported data source types and setup details, see Data source types that support real-time synchronization and Overview of the real-time synchronization feature.

Configure scheduling dependencies

Scheduling dependencies control when a node runs relative to other nodes in the workspace.

Batch synchronization node

  • Ancestor node: Set the root node of the workspace or a zero load node as the ancestor. This triggers the batch synchronization node within the workspace scheduling cycle.

  • Descendant node: To let DataWorks automatically parse the dependency between a batch synchronization node and a downstream SQL node, set the output of the batch synchronization node in the Project name.Table name format.
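To see why the Project name.Table name format matters, consider a simplified lineage check that matches the qualified table names a SQL statement reads against the outputs declared by upstream nodes. This is a hypothetical Python sketch, not the actual DataWorks parser: the regular expression recognizes only qualified names after FROM or JOIN, the node and table names are illustrative, and the case-insensitive comparison is an assumption.

```python
import re

# Matches a qualified name such as my_project.ods_orders after FROM or JOIN.
QUALIFIED_TABLE = re.compile(
    r"\b(?:from|join)\s+([A-Za-z_]\w*\.[A-Za-z_]\w*)", re.IGNORECASE
)


def infer_upstream_nodes(sql: str, node_outputs: dict[str, str]) -> set[str]:
    """Return nodes whose declared output (project.table) the SQL reads from."""
    referenced = {name.lower() for name in QUALIFIED_TABLE.findall(sql)}
    return {node for node, output in node_outputs.items() if output.lower() in referenced}


outputs = {
    "sync_orders": "my_project.ods_orders",
    "sync_users": "ods_users",  # unqualified, so this sketch can never match it
}
sql = "SELECT * FROM my_project.ods_orders o JOIN ods_users u ON o.uid = u.id"
print(infer_upstream_nodes(sql, outputs))  # {'sync_orders'}
```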

Real-time synchronization node

Real-time synchronization nodes run continuously and do not generate outputs the same way auto-triggered nodes do. Table-lineage-based scheduling dependencies are not supported for downstream nodes. Instead, set the root node of the workspace or a zero load node as the ancestor of the downstream node directly.
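Because lineage from a real-time synchronization node cannot drive scheduling, a workspace review could flag any scheduled node that lists a real-time node as its parent. The Python sketch below is a hypothetical check of that kind; the node names, including workspace_root for the root node, are illustrative.

```python
def invalid_dependencies(
    parents: dict[str, list[str]], realtime_nodes: set[str]
) -> list[tuple[str, str]]:
    """List (child, parent) pairs where a scheduled node depends on a real-time node."""
    return [
        (child, parent)
        for child, node_parents in parents.items()
        for parent in node_parents
        if parent in realtime_nodes
    ]


parents = {
    "dws_order_report": ["rt_orders"],      # invalid: parent is a real-time node
    "dws_user_report": ["workspace_root"],  # valid: depends on the root node
}
print(invalid_dependencies(parents, {"rt_orders"}))  # [('dws_order_report', 'rt_orders')]
```

Each flagged pair would be fixed by replacing the real-time parent with the workspace root node or a zero load node.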

Note: To make sure a real-time synchronization node produces data as expected, configure a monitoring rule for the node.

Use scheduling parameters in batch synchronization

DataWorks provides the built-in variable ${bizdate} for batch synchronization nodes. By default, the scheduling parameter $bizdate is assigned to ${bizdate} as its value.
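Substitution of ${bizdate} can be mimicked with Python's string.Template, which uses the same ${name} placeholder syntax. This is a sketch under assumptions: DataWorks resolves parameters itself at run time, the space-separated name=value assignment string imitates how scheduling parameters are typically entered, and the resolved value 20260325 and the ds partition column are illustrative.

```python
from string import Template


def resolve_params(assignments: str, builtins: dict[str, str]) -> dict[str, str]:
    """Turn space-separated 'name=value' pairs into concrete values.

    A value that names a built-in parameter (for example $bizdate) is replaced
    by that parameter's resolved value; any other value is kept as a literal.
    """
    resolved = {}
    for pair in assignments.split():
        name, value = pair.split("=", 1)
        resolved[name] = builtins.get(value, value)
    return resolved


def render(config_text: str, params: dict[str, str]) -> str:
    # Template.substitute replaces ${name} placeholders and raises KeyError
    # if the text references a name that has no value.
    return Template(config_text).substitute(params)


params = resolve_params("bizdate=$bizdate", {"$bizdate": "20260325"})
print(render("ds=${bizdate}", params))  # ds=20260325
```

Using substitute rather than safe_substitute makes an unresolved placeholder fail loudly instead of leaving a literal ${...} in the partition name.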