This topic describes how to configure a sync node by using the code editor.

Development process

To create a sync node by using the code editor, follow these steps:
  1. Create connections.
  2. Create a workflow.
  3. Create a batch synchronization node.
  4. Apply a template.
  5. Configure a reader for the sync node.
  6. Configure a writer for the sync node.
  7. Map the fields in the source and destination tables.
  8. Configure channel control policies, such as the maximum transmission rate and the maximum number of dirty data records allowed.
  9. Configure the node properties.

Create connections

A sync node can synchronize data between various homogeneous and heterogeneous data stores. In the DataWorks console, click the Workspace Manage icon in the upper-right corner. On the page that appears, click Data Source and add a connection. For more information, see Add connections.

After a connection is added, you can directly select it when configuring a sync node on the DataStudio page. For more information about connection types supported by Data Integration, see Supported data stores.

Note
  • Data Integration does not support connectivity testing for some connection types. For more information, see Test data store connectivity.
  • If a data store is deployed on premises and has no public IP address, or cannot be reached directly over the network, the connectivity test fails when you configure the connection. To resolve this issue, use a custom resource group. For more information, see Add a custom resource group.
    Note When a data store cannot be reached directly over the network, Data Integration cannot obtain the table schema. In this case, you can create a sync node for this data store only by using the code editor.

Create a workflow

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. On the DataStudio page that appears, move the pointer over the Create a workflow icon and click Workflow.
  3. In the Create Workflow dialog box that appears, set Workflow Name and Description.
  4. Click Create.

Create a batch synchronization node

  1. Click the workflow to show its content and right-click Data Integration.
  2. Choose Create > Batch Synchronization.
  3. In the Create Node dialog box that appears, set Node Name and Location.
  4. Click Commit.

Apply a template

  1. After the sync node is created, the node configuration tab appears. Click the Switch to Code Editor icon in the toolbar.
  2. In the Confirm dialog box that appears, click OK to switch to the code editor.
    Note The code editor supports more features than the codeless user interface (UI). For example, you can configure sync nodes in the code editor even when the connectivity test fails.
  3. Click the Apply Template icon in the toolbar.
  4. In the Apply Template dialog box that appears, set Source Connection Type and the source Connection, and then set Target Connection Type and the target Connection.
  5. Click OK.

Configure a reader for the sync node

After the template is applied, the basic settings of the reader are generated. You can configure the source connection and source table as needed.
{
    "type": "job",
    "version": "2.0",
    "steps": [   // Do not modify the preceding lines. They indicate the header code of the sync node.
        {
            "stepType": "mysql",
            "parameter": {
                "datasource": "MySQL",
                "column": [
                    "id",
                    "value",
                    "table"
                ],
                "socketTimeout": 3600000,
                "connection": [
                    {
                        "datasource": "MySQL",
                        "table": [
                            "`case`"
                        ]
                    }
                ],
                "where": "",
                "splitPk": "",
                "encoding": "UTF-8"
            },
            "name": "Reader",
            "category": "reader" // Specifies that these settings are related to the reader.
        },   
The parameters are described as follows:
  • type: the type of the sync node. You must set the value to job.
  • version: the version number of the sync node. You can set the value to 1.0 or 2.0.
Note
  • For more information about how to configure the source connection, see Reader Configuration.
  • Some sync nodes need to synchronize only incremental data. In this case, you can use DataWorks scheduling parameters in the reader configuration to obtain the date and time that define the incremental data.
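
The following Python sketch illustrates the idea behind using a scheduling parameter in the reader's where parameter. It is only an illustration of placeholder substitution, not the way DataWorks implements it. The column name gmt_modified, the parameter name bizdate, the `${name}` placeholder form, and the helper render_where are assumptions made for this example.

```python
from string import Template

# Hypothetical filter for the reader's "where" parameter. The column name
# gmt_modified and the parameter name bizdate are assumptions for illustration.
WHERE_TEMPLATE = "gmt_modified >= '${bizdate}'"


def render_where(template: str, params: dict) -> str:
    """Replace ${name} placeholders with the values supplied for one run."""
    # substitute() raises KeyError if a placeholder has no value, which
    # surfaces a missing scheduling parameter instead of hiding it.
    return Template(template).substitute(params)


# Only the keys relevant to filtering are shown here.
reader_parameter = {"where": WHERE_TEMPLATE, "splitPk": "", "encoding": "UTF-8"}

rendered = render_where(reader_parameter["where"], {"bizdate": "20240101"})
print(rendered)  # gmt_modified >= '20240101'
```

With a filter of this shape, each scheduled run reads only the rows for the date that the scheduler supplies, rather than the full table.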

Configure a writer for the sync node

After the reader is configured, you can configure the destination connection and destination table as needed.

        {
            "stepType": "odps",
            "parameter": {
                "partition": "",
                "truncate": true,
                "compress": false,
                "datasource": "odps_first",
                "column": [
                    "*"
                ],
                "emptyAsNull": false,
                "table": ""
            },
            "name": "Writer",
            "category": "writer" // Specifies that these settings are related to the writer.
        }
    ],
Note
  • For more information about how to configure the destination connection, see Writer Configuration.
  • For most nodes, you can select a write mode, such as overwrite or append. The supported write modes vary with the connection type.

Map the fields in the source and destination tables

The code editor supports only same-row mapping: the fields in the reader's column list are mapped, in order, to the fields in the writer's column list.
Note Make sure that the data type of a source field is the same as or compatible with that of the mapped destination field.
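
As a rough illustration of same-row mapping, the following Python sketch pairs the reader and writer column lists by position. The helper map_columns and the destination column names are hypothetical. The source column names are taken from the reader example in this topic.

```python
def map_columns(reader_columns, writer_columns):
    """Pair source and destination fields by their position in each list."""
    if len(reader_columns) != len(writer_columns):
        # A length mismatch means at least one field has no counterpart.
        raise ValueError(
            f"{len(reader_columns)} source fields cannot be mapped to "
            f"{len(writer_columns)} destination fields"
        )
    return list(zip(reader_columns, writer_columns))


source_columns = ["id", "value", "table"]            # from the reader example
destination_columns = ["id", "value", "table_name"]  # hypothetical names

pairs = map_columns(source_columns, destination_columns)
for src, dst in pairs:
    print(f"{src} -> {dst}")
```

Because the pairing depends on position rather than on field names, reordering either list changes which fields are mapped to each other.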

Configure channel control policies

After you complete the preceding steps, configure the channel control policies of the sync node. The setting parameter controls how the node runs, including the number of concurrent threads, bandwidth throttling, the dirty data policy, and the resource group.

    "setting": {
        "errorLimit": {
            "record": "1024" // The maximum number of dirty data records allowed.
        },
        "speed": {
            "throttle": false, // Specifies whether to enable bandwidth throttling.
            "concurrent": 1 // The maximum number of concurrent threads.
        }
    },
The parameters are described as follows:
  • Expected Maximum Concurrency (concurrent): the maximum number of concurrent threads that the sync node uses to read data from the source and write data to the destination. You can configure the concurrency for a node on the codeless UI.
  • Bandwidth Throttling (throttle): specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid a heavy read workload on the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to a proper value.
  • Dirty Data Records Allowed (record in errorLimit): the maximum number of dirty data records allowed.
  • Resource Group: the resource group used to run the sync node. You can specify a resource group by clicking Resource Group in the upper-right corner of the configuration tab of the sync node. If a large number of nodes, including this sync node, are deployed on the default resource group, the sync node may need to wait for resources. We recommend that you purchase an exclusive resource group for data integration or add a custom resource group. For more information, see DataWorks exclusive resources and Add a custom resource group.
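
The reader, writer, and setting fragments in this topic belong to one job document. The following Python sketch assembles them and serializes the result, which can help you check that the brackets and commas line up. The helper build_job is hypothetical, the values are copied from the fragments in this topic, and a template generated by DataWorks may contain additional keys that are not shown here.

```python
import json

# Standard JSON has no comment syntax, so the // annotations used in the
# fragments of this topic are omitted here.
reader = {
    "stepType": "mysql",
    "parameter": {
        "datasource": "MySQL",
        "column": ["id", "value", "table"],
        "socketTimeout": 3600000,
        "connection": [{"datasource": "MySQL", "table": ["`case`"]}],
        "where": "",
        "splitPk": "",
        "encoding": "UTF-8",
    },
    "name": "Reader",
    "category": "reader",
}

writer = {
    "stepType": "odps",
    "parameter": {
        "partition": "",
        "truncate": True,
        "compress": False,
        "datasource": "odps_first",
        "column": ["*"],
        "emptyAsNull": False,
        "table": "",
    },
    "name": "Writer",
    "category": "writer",
}

setting = {
    "errorLimit": {"record": "1024"},
    "speed": {"throttle": False, "concurrent": 1},
}


def build_job(reader_step, writer_step, channel_setting):
    """Combine the fragments under the fixed header keys (hypothetical helper)."""
    return {
        "type": "job",
        "version": "2.0",
        "steps": [reader_step, writer_step],
        "setting": channel_setting,
    }


job = build_job(reader, writer, setting)
job_text = json.dumps(job, indent=4)
print(job_text)
```

Building the document from Python dictionaries avoids hand-editing mistakes such as a trailing comma before a closing brace, because json.dumps always emits well-formed JSON.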

Configure the node properties

This section describes how to use scheduling parameters for data filtering.

On the DataStudio page, double-click the target batch synchronization node in the workflow. On the node configuration tab that appears, click the Properties tab in the right-side navigation pane to configure the node properties.

On the Properties tab, you can configure the properties of the sync node, such as the recurrence, the time at which the sync node runs, and its dependencies. Batch synchronization nodes have no ancestor nodes because they run before extract, transform, and load (ETL) nodes. We recommend that you specify the root node of the workspace as their parent node.

After the sync node is configured, save and commit the node. For more information about how to configure the node properties, see Schedule.