
DataWorks:Configure batch synchronization in the codeless UI

Last Updated:Mar 27, 2026

Data Integration's codeless UI lets you set up periodic batch synchronization tasks without writing code. You select a source and destination, configure field mappings and scheduling properties, then publish — no scripts required. This topic covers the general configuration flow for a single-table batch synchronization task. Settings vary by data source; for source-specific details, see Supported data sources and synchronization solutions.

Note

The codeless UI does not support all advanced features. If you need fine-grained control, you can convert the task to a script and configure it in the code editor at any time. If a data source does not support the codeless UI, the UI displays a message prompting you to switch. See Code editor configuration.

Prerequisites

Ensure that the required prerequisites are met before you begin.

Step 1: Create a Data Integration node

Data Studio (new version)

  1. Log on to the DataWorks console. In the left-side navigation pane, choose Data Development and O&M > Data Studio. Select your workspace from the drop-down list and click Go to Data Studio.

  2. Create a workflow. See Workflows.

  3. Create a Data Integration node using one of the following methods:

    • Method 1: In the workflow list, click the create icon in the upper-right corner and choose Create Node > Data Integration.

    • Method 2: Double-click the workflow name and drag the Data Integration node from the Data Integration directory to the workflow editor on the right.

  4. Configure the source and destination types, select Single Table Batch Sync, and click OK.

Data Studio (legacy version)

  1. Log on to the DataWorks console. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select your workspace from the drop-down list and click Go to Data Development.

  2. Create a workflow. See Create a workflow.

  3. Create a batch synchronization node using one of the following methods:

    • Method 1: Expand the workflow, right-click Data Integration, and choose Create Node > Batch Synchronization.

    • Method 2: Double-click the workflow name and drag the Batch Synchronization node from the Data Integration directory to the workflow editor on the right.

  4. Follow the prompts to finish creating the node.

Step 2: Configure data sources and runtime resources

  1. In the Source and Destination sections, select the source and destination objects to read from and write to.

  2. In the Runtime Resources section, select a resource group and allocate CUs. If the task fails with an out-of-memory (OOM) error, increase the CU value. For recommended configurations, see Data Integration performance metrics.

  3. Verify that both the source and destination pass the connectivity test. If connectivity fails, configure the network connection as prompted or see Configure network connectivity.

Note

If your resource group does not appear in the list, confirm that it is associated with the workspace. See Use a serverless resource group.

Step 3: Configure the synchronization solution

Important

Plugin configurations vary. The following sections describe common settings. For plugin-specific support and configuration details, see Data source list.

Source

Configure the source table and fill in the required parameters.

  • Data filtering (optional): Filter source rows using a WHERE clause condition (omit the WHERE keyword). Only rows that match the condition are synchronized; if no filter is set, the task synchronizes the full table by default. Configure a filter when you need incremental synchronization. For example, gmt_create >= '${bizdate}' synchronizes only the rows created on the current day; assign a value to the variable in the scheduling properties. See Supported formats of scheduling parameters and Scenario: Configure a batch synchronization task for incremental data.

  • Sharding key (optional): The source field to use as the sharding key (splitPk). The task splits the data into subtasks based on this key for concurrent, batched reads. Configure a sharding key when you want to improve read throughput on large tables, and use the table's primary key: primary keys are usually evenly distributed, which prevents data skew across shards. The sharding key must be an integer field; string, float, date, and other types are not supported. If an unsupported type is specified, DataWorks ignores it and reads through a single channel, as it does when no sharding key is set. Not all plugins support sharding keys; check your plugin's documentation for details.
The method for configuring incremental synchronization varies by data source and plugin.
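To illustrate how a sharding key enables concurrent reads, the following Python sketch splits an integer primary-key range into per-channel WHERE predicates. This is a simplified model for explanation only, not DataWorks' actual splitting logic; the column name id is a placeholder.

```python
def split_pk_ranges(min_pk: int, max_pk: int, shards: int) -> list[str]:
    """Split an integer primary-key range into one WHERE predicate per
    concurrent read channel (illustrative model, column name is a placeholder)."""
    if shards <= 1 or min_pk >= max_pk:
        return ["1 = 1"]  # single channel: read the whole table
    step = (max_pk - min_pk + 1) / shards
    predicates = []
    for i in range(shards):
        lo = min_pk + round(i * step)
        # last shard absorbs any rounding remainder
        hi = max_pk if i == shards - 1 else min_pk + round((i + 1) * step) - 1
        predicates.append(f"id >= {lo} AND id <= {hi}")
    return predicates

# Example: split ids 1..100 across 4 read channels
for predicate in split_pk_ranges(1, 100, 4):
    print(predicate)  # id >= 1 AND id <= 25, id >= 26 AND id <= 50, ...
```

An evenly distributed key yields shards of similar size; a skewed key would leave some channels with far more rows than others, which is why the primary key is recommended.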

Data processing

Important

Data processing is available in the new version of Data Studio. To use it in the legacy version, select Use New Version (with data processing feature) when creating the task. To access all features, upgrade your workspace to the new version. See Data Studio upgrade guide.

Data processing lets you transform source data — using string replacement, AI-assisted processing, or data embedding — before writing it to the destination table.

  1. Click the switch to enable data processing.

  2. In the data processing list, click Add Node and select a processing type: Replace String, AI Process, or Data Embedding. Add multiple nodes as needed — DataWorks runs them in sequential order.

  3. Configure the processing rules as prompted. For details on AI-assisted processing and data embedding, see Data processing.

Note

Data processing consumes additional computing resources, increasing resource overhead and task runtime. Keep processing logic as simple as possible to avoid affecting synchronization throughput.
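The sequential execution of processing nodes can be modeled as a small pipeline. The following Python sketch is illustrative only: replace_string is a stand-in for the Replace String node type, not a DataWorks API.

```python
from typing import Callable, Iterable, Iterator

Row = dict  # a row is modeled as a field-name -> value mapping

def replace_string(field: str, old: str, new: str) -> Callable[[Row], Row]:
    """Build a processing node that mimics the Replace String type (hypothetical)."""
    def node(row: Row) -> Row:
        row = dict(row)  # do not mutate the caller's row
        row[field] = row[field].replace(old, new)
        return row
    return node

def run_pipeline(rows: Iterable[Row], nodes: list) -> Iterator[Row]:
    """Apply processing nodes to each row in the order they were added."""
    for row in rows:
        for node in nodes:
            row = node(row)
        yield row

rows = [{"city": "NY"}, {"city": "SF"}]
nodes = [replace_string("city", "NY", "New York")]
print(list(run_pipeline(rows, nodes)))  # [{'city': 'New York'}, {'city': 'SF'}]
```

Because every node runs on every row, each additional node adds per-row work, which is why the note above recommends keeping the processing logic simple.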

Destination

Configure the destination table and fill in the required parameters.

  • Pre- and post-synchronization statements (optional): SQL statements that run on the destination before data is written (pre-sync) and after the write completes (post-sync). Configure a pre-sync statement when you need to prepare the destination table before loading. For example, enter truncate table tablename in the Pre-import Preparation Statement field to clear existing data before new rows are written. MySQL Writer exposes these settings as preSql and postSql.

  • Write mode (required by some plugins): Specifies how to handle write conflicts, for example when a primary key already exists in the destination. The available options depend on the destination data source and writer plugin; see your plugin's documentation for details.
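To make the statement ordering concrete, the following Python sketch models the writer flow with sqlite3 standing in for the real destination. The table and column names are hypothetical, and DELETE FROM substitutes for TRUNCATE TABLE, which SQLite does not support.

```python
import sqlite3

def write_with_pre_post(conn, rows, pre_sql, post_sql):
    """Sketch of the writer flow: run the pre-sync statement, load the rows,
    then run the post-sync statement. Names here are illustrative only."""
    cur = conn.cursor()
    cur.execute(pre_sql)   # e.g. clear existing data before the load
    cur.executemany("INSERT INTO dest (id, name) VALUES (?, ?)", rows)
    cur.execute(post_sql)  # e.g. record that the load completed
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE audit (note TEXT)")
write_with_pre_post(
    conn,
    [(1, "a"), (2, "b")],
    pre_sql="DELETE FROM dest",  # SQLite analogue of TRUNCATE TABLE
    post_sql="INSERT INTO audit VALUES ('load done')",
)
print(conn.execute("SELECT COUNT(*) FROM dest").fetchone()[0])  # 2
```

The key point is ordering: the pre-sync statement always runs before any row is written, and the post-sync statement only runs after the write completes.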

Field mapping

After selecting the source and destination, define the mapping between source and destination columns. The task writes data from each source field to its mapped destination field.

  • Source fields not mapped to any destination field are not synchronized.

  • If the automatic mapping is incorrect, you can adjust it manually.

  • To exclude a field, delete the connecting line between the source and destination fields.

Type mismatches between source and destination fields during synchronization can produce dirty data and cause write failures. Set the tolerance for dirty data in Advanced Settings (the next step).

Map fields by name or by row position. Additional options:

  • Assign values to destination fields: In the Source Field column, click Add Field to add constants, scheduling parameters (for example, '${scheduling_parameter}'), or built-in variables (for example, '#{built_in_variable}#'). For scheduling parameter syntax, see Supported formats of scheduling parameters.

  • Edit source fields: Click Manually Edit Mapping to apply source database functions to fields (for example, Max(id) to synchronize only the maximum value), or to manually add fields that weren't retrieved during auto-mapping.

    Note

    MaxCompute Reader does not support source database functions.

The following built-in variables are available for select plugins:

  • '#{DATASOURCE_NAME_SRC}#': Name of the source data source.

  • '#{DB_NAME_SRC}#': Name of the database where the source table resides.

  • '#{SCHEMA_NAME_SRC}#': Name of the schema where the source table resides.

  • '#{TABLE_NAME_SRC}#': Name of the source table.

All four variables are supported by PolarDB Reader, PolarDB (Sharded) Reader, PostgreSQL Reader, PolarDB-O Reader, and PolarDB-O (Sharded) Reader. MySQL Reader and MySQL (Sharded) Reader support all of them except '#{SCHEMA_NAME_SRC}#'.
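Conceptually, the '#{...}#' syntax behaves like template substitution over the task's context. The following Python sketch is a mental model only, not how DataWorks resolves built-in variables internally; the context values are hypothetical.

```python
import re

def resolve_builtins(value: str, ctx: dict) -> str:
    """Expand '#{VAR}#' placeholders from a task context (conceptual sketch)."""
    return re.sub(r"#\{(\w+)\}#", lambda m: ctx[m.group(1)], value)

# Hypothetical task context for a MySQL source
ctx = {"DATASOURCE_NAME_SRC": "my_mysql", "TABLE_NAME_SRC": "orders"}
print(resolve_builtins("#{DATASOURCE_NAME_SRC}#.#{TABLE_NAME_SRC}#", ctx))
# my_mysql.orders
```

Writing such values into a destination field is useful when rows from several sharded source tables land in one destination table and you need to record which table each row came from.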

Step 4: Configure advanced settings

Note

Advanced settings was called Channel Control in earlier versions of DataWorks.

Use advanced settings to control synchronization behavior. For parameter details, see Relationship between concurrency and throttling for batch synchronization.

  • Expected maximum concurrency (optional): The maximum number of threads used to concurrently read from the source or write to the destination. Because of resource constraints, actual runtime concurrency may not reach this value. Fees for the debugging resource group are based on actual concurrency, not the configured value; task scheduling fees are based on the number of batch synchronization tasks, not on concurrency. See Performance metrics.

  • Synchronization rate (optional): Controls data transfer speed. Throttling caps the extraction rate to protect the source database from excessive load; the minimum speed limit is 1 MB/s. Without throttling, the task runs at the maximum transfer rate the hardware supports, within the configured concurrency limit. The traffic metric reflects throughput within Data Integration itself, not actual network interface card (NIC) traffic. NIC traffic is typically one to two times the channel traffic, depending on the serialization overhead of the data source's transfer protocol.

  • Policy for dirty data records (optional): Dirty data refers to records that could not be written to the destination because of exceptions such as type conflicts or constraint violations. By default, dirty data is allowed and does not affect task execution. Set the tolerance to 0 to fail the task if any dirty data is generated. Set a threshold to allow dirty data up to that limit: records within the threshold are skipped and not written to the destination; if the limit is exceeded, the task fails. Excessive dirty data slows overall synchronization.

  • Distributed execution (optional): Splits task slices into multiple processes for concurrent execution, breaking the single-process bottleneck and improving throughput. Enable this option when you have high throughput requirements and sufficient resources; it requires a concurrency of 8 or greater. Disable it if an OOM error occurs at runtime.

  • Time zone (optional): Set the source time zone to ensure correct timestamp conversion when synchronizing data across time zones.
Note

Overall synchronization speed depends not only on the settings above, but also on source database performance and network conditions. For tuning guidance, see Accelerate or rate-limit batch synchronization tasks.
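The dirty-data policy described above amounts to a skip-until-threshold rule. The following Python sketch models it; the write callback and ValueError stand in for a real writer plugin and its type or constraint errors.

```python
def sync_with_dirty_limit(rows, write, dirty_limit):
    """Skip rows that fail to write; fail the whole task once the dirty-row
    count exceeds dirty_limit (a limit of 0 fails on the first dirty row)."""
    dirty = 0
    for row in rows:
        try:
            write(row)
        except ValueError:  # stand-in for a type-conflict or constraint error
            dirty += 1
            if dirty > dirty_limit:
                raise RuntimeError(f"dirty data limit exceeded: {dirty} rows")
    return dirty

# Usage: a toy writer that only accepts integers
dest = []
def write(row):
    if not isinstance(row, int):
        raise ValueError("type conflict")
    dest.append(row)

print(sync_with_dirty_limit([1, "x", 2], write, dirty_limit=1))  # 1 (one row skipped)
print(dest)  # [1, 2]
```

With dirty_limit=0 the same input would raise immediately on the "x" row, mirroring the zero-tolerance setting described above.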

Step 5: Configure scheduling properties

For a periodically scheduled task, click Scheduling in the right-side panel to configure scheduling parameters, scheduling policy, schedule time, and scheduling dependencies.

For scheduling parameter usage examples, see Typical scenarios of using scheduling parameters in Data Integration.

Step 6: Test and publish the task

Configure debugging parameters

On the task configuration page, click Run Configuration in the right-side panel.

  • Resource group: Select a resource group connected to the data source.

  • Script parameters: Assign values to placeholder parameters. For example, if the task uses ${bizdate}, enter a date in yyyymmdd format.
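For example, if ${bizdate} resolves to the day before the scheduled run (its conventional meaning, assumed here; confirm the semantics of your scheduling parameters), the debug value for a run scheduled on March 27, 2026 can be computed as follows.

```python
from datetime import date, timedelta

# Assumption: $bizdate is the business date, i.e. the day before the
# scheduled run. Format it as yyyymmdd for the Script parameters field.
scheduled_day = date(2026, 3, 27)
bizdate = (scheduled_day - timedelta(days=1)).strftime("%Y%m%d")
print(bizdate)  # 20260326
```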

Run the task

Click the Run icon in the toolbar to run and debug the task. After the task completes, query the destination table to verify that the data was synchronized correctly.

Publish the task

After the task runs successfully, if it needs to run on a recurring schedule, click the publish icon on the node configuration page to publish the task to the production environment. See Publish tasks.

Limitations

  • Single-table batch synchronization tasks can only be configured in Data Development.

  • Some data sources do not support the codeless UI. If a message indicates that the codeless UI is not supported after you select a data source, click the conversion icon in the toolbar to switch to the code editor. See Code editor configuration.

  • The codeless UI does not support all advanced features. For fine-grained control, convert the task to a script and configure it in the code editor.

Next steps

After publishing the task, go to Operation Center to monitor its schedule and execution. For task management, status monitoring, and resource group O&M, see O&M for batch synchronization tasks.
