
DataWorks:Configure a batch synchronization node by using the Code Editor

Last Updated: Mar 26, 2026

The Code Editor gives you full control over offline sync task configuration by letting you write a JSON script directly. Use it to sync full or incremental data from a single source table or sharded tables to a destination table, with support for DataWorks scheduling parameters to automate periodic runs.

For data-source-specific configuration details, see Data source list.

When to use the Code Editor

Use the Code Editor instead of the codeless UI in these situations:

  • The data source does not support codeless UI configuration.

    The data source page in the DataWorks UI indicates whether a data source supports the codeless UI.


  • The data source has configuration parameters that are only available in the Code Editor.

  • The data source cannot be created in the DataWorks UI.

Prerequisites

Before you begin, make sure that:

Step 1: Create a batch synchronization node

Data Studio (new version)

  1. Log on to the DataWorks console. Switch to the destination region. In the left navigation pane, choose Data Development & O&M > Data Development. Select the desired workspace from the drop-down list and click Go to Data Studio.

  2. Create a workflow. See Orchestrate workflows.

  3. Create a batch synchronization node using one of these methods:

    • Method 1: Click the create icon in the upper-right corner of the workflow list and choose Create Node > Data Integration > Batch Synchronization.

    • Method 2: Double-click the workflow name and drag the Batch Synchronization node from the Data Integration directory to the workflow editor.

  4. Configure the basic information, source, and destination, then click OK.

DataStudio (legacy version)

  1. Log on to the DataWorks console. Switch to the destination region. In the left navigation pane, choose Data Development & O&M > Data Development. Select the desired workspace from the drop-down list and click Go to Data Development.

  2. Create a workflow. See Create a workflow.

  3. Create a batch synchronization node using one of these methods:

    • Method 1: Expand the workflow, right-click Data Integration, and select Create Node > Batch Synchronization.

    • Method 2: Double-click the workflow name and drag the Batch Synchronization node from the Data Integration directory to the workflow editor.

  4. Create the node as prompted.

Step 2: Configure the data source and resource group

You can switch from the codeless UI to the Code Editor at any point. To obtain a fully populated JSON script, follow this order:

  1. Select the data source and resource group in the codeless UI and test network connectivity. The system automatically populates the generated JSON script with this information.

  2. Switch to the Code Editor.

Alternatively, switch to the Code Editor directly, specify the data source in the JSON code, and set the resource group and required resources in the Advanced Settings panel on the right.

If a resource group is not displayed, check whether it is attached to the workspace. See Use a Serverless resource group and Use exclusive resource groups for Data Integration. For recommended resource specifications, see Resource group performance metrics - Data Integration.

Step 3: Switch to the Code Editor and import a template

In the toolbar, click the Code Editor icon.


If the script is not yet configured, click the Import Template icon in the toolbar and follow the on-screen instructions to import a script template.

Step 4: Edit the script

The sync task script has the following top-level structure:

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "<reader-plugin>",
      "parameter": {
        "column": [],
        "where": "",
        "splitPk": ""
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "<writer-plugin>",
      "parameter": {
        "writeMode": "",
        "preSql": [],
        "postSql": []
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "executeMode": null,
    "speed": {
      "concurrent": 1,
      "throttle": false
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

The type and version fields have default values and cannot be changed. You can ignore processor-related configurations.

The script has three functional sections: reader, writer, and channel control (in the setting section). Configuration details vary by plug-in. For plug-in-specific parameters, see the Reader Script Demo and Writer Script Demo sections for each data source in the Data source list.
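As a concrete illustration, a minimal MySQL-to-MySQL job script might look like the following sketch. The data source names (my_mysql_source, my_mysql_dest), table names, and column list are hypothetical placeholders; the exact parameter set varies by plug-in, so verify each field against the Reader Script Demo and Writer Script Demo for your data source:

```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "mysql",
      "parameter": {
        "datasource": "my_mysql_source",
        "table": ["orders"],
        "column": ["order_id", "amount", "gmt_create"],
        "where": "gmt_create >= '${bizdate}'",
        "splitPk": "order_id"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "mysql",
      "parameter": {
        "datasource": "my_mysql_dest",
        "table": "orders_copy",
        "column": ["order_id", "amount", "gmt_create"],
        "writeMode": "insert",
        "preSql": [],
        "postSql": []
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "executeMode": null,
    "speed": {
      "concurrent": 2,
      "throttle": false
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
```

Here ${bizdate} is replaced at run time by the scheduling system, so each daily run reads only that business date's records.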

Reader parameters

Configure the basic information and field mappings for reading source data.

| Parameter | Description | Required? |
| --- | --- | --- |
| where | A filter condition (a WHERE clause without the where keyword) that limits which source data is synced. Combine it with scheduling parameters for incremental sync. For example, gmt_create >= '${bizdate}' syncs only records created on the current business date. If not set, all data is synced. See Scenario: Configure a batch synchronization task for incremental data and Supported formats of scheduling parameters. | No |
| splitPk | The field used to split source data into shards for concurrent reading. Use the table's primary key. Primary keys are typically evenly distributed, which prevents data hot spots. Only integer fields are supported; strings, floating-point numbers, and dates are not. If not set or left blank, data is synced through a single channel. Not all plug-ins support this parameter. | No |
| column | An array of source fields to sync. Supports constants, variables (for example, ${variable_name}), and functions (for example, now()). | Yes |
The incremental sync method varies by data source (plug-in).
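For example, a reader parameter fragment that combines where with the ${bizdate} scheduling parameter for a daily incremental sync might look like the following sketch (the column names are hypothetical, and a datetime-typed gmt_create column is assumed):

```json
"parameter": {
  "column": ["id", "name", "gmt_create"],
  "where": "gmt_create >= '${bizdate} 00:00:00' AND gmt_create <= '${bizdate} 23:59:59'",
  "splitPk": "id"
}
```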

Writer parameters

Configure how data is written to the destination.

| Parameter | Description | Required? |
| --- | --- | --- |
| preSql | SQL statements to run on the destination before data is written. For example, configure truncate table tablename in MySQL Writer to clear existing data before the sync starts. | No |
| postSql | SQL statements to run on the destination after data is written. | No |
| writeMode | Defines how to write data when conflicts occur, such as path or primary key conflicts. The behavior and available values vary by data source and writer plug-in. | Yes |
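For example, a writer parameter fragment that truncates the destination table before each run might be sketched as follows (MySQL Writer is assumed, and the table name is a hypothetical placeholder):

```json
"parameter": {
  "writeMode": "insert",
  "preSql": ["truncate table orders_copy"],
  "postSql": []
}
```

Because preSql runs before every sync, this pattern implements a full overwrite: each run replaces the destination table's contents rather than appending to them.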

Channel control parameters

Configure performance settings in the setting section.

| Parameter | Description | Required? |
| --- | --- | --- |
| executeMode | Controls distributed processing. Set to distribute to split the task into shards and distribute them across multiple execution nodes for concurrent execution, which allows sync speed to scale horizontally with the cluster size. Set to null for single-node mode, where concurrency is limited to a single machine. A concurrency of 8 or more is required to enable distributed mode. If an out-of-memory (OOM) error occurs at runtime, disable distributed mode. | No |
| concurrent | The maximum number of threads for parallel reading from the source or writing to the destination. The actual concurrency at runtime may be less than or equal to the configured value, depending on resource specifications. See Performance metrics. | No |
| throttle | Controls the sync rate. Set to true to enable throttling and protect the source database from excessive extraction load; also set the mbps parameter to define the rate (minimum 1 MB/s). Set to false to use the maximum transfer performance within the configured concurrency limits. | No |
| errorLimit | The threshold for dirty data records. If not set, dirty data is allowed and the task continues running. Set to 0 to fail the task on any dirty data. Set a positive integer to allow dirty data up to that count; the task fails if the count is exceeded. | No |
  • executeMode: If the exclusive resource group has only one machine, distributed mode cannot leverage multi-machine resources. If a single machine meets your speed requirements, use single-node mode to simplify task execution.

  • throttle: The traffic metric is internal to Data Integration and does not represent actual network interface card (NIC) traffic. NIC traffic is typically 1–2 times the channel traffic.

  • Dirty data is any record that fails to write to the destination due to errors such as type mismatches (for example, writing a VARCHAR value to an INT column). An excessive amount of dirty data can reduce overall sync speed.

  • Overall sync speed is also affected by the performance of the source data source and the network environment. See Optimize an offline sync task.
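Putting these parameters together, a setting fragment that throttles the sync to 5 MB/s across 4 concurrent threads and fails on any dirty data might be sketched as follows (the exact shape of the errorLimit value can vary; verify it against the script reference for your DataWorks version):

```json
"setting": {
  "executeMode": null,
  "errorLimit": {
    "record": "0"
  },
  "speed": {
    "concurrent": 4,
    "throttle": true,
    "mbps": "5"
  }
}
```

Note that mbps takes effect only when throttle is true; with throttle set to false, the task runs at the maximum speed the configured concurrency allows.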

Step 5: Configure scheduling properties

For periodically scheduled batch synchronization, configure the scheduling properties. On the node's edit page, click Scheduling on the right to configure scheduling parameters, a scheduling policy, a scheduling time, and scheduling dependencies.

For scheduling parameter usage examples, see Common scenarios of scheduling parameters in Data Integration.

Step 6: Submit and publish the task

Configure test parameters

On the batch synchronization task configuration page, click Debugging Configurations on the right and set the following:

| Configuration item | Description |
| --- | --- |
| Resource Group | Select a resource group that is connected to the data source. |
| Script Parameters | Assign values to placeholder parameters in the sync script. For example, if the script uses ${bizdate}, enter a date in yyyymmdd format. |

Run the task

Click the Run icon in the toolbar. After the task completes, create a node of the destination table type to query the destination table and verify that the synced data meets your expectations.

Publish the task

After the task runs successfully, click the publish icon in the toolbar to publish the task to the production environment. See Publish tasks.

What's next

After publishing, go to Operation Center in the production environment to view and manage the scheduled task. See O&M for batch synchronization tasks.
