This topic describes how to create a sync node by using the code editor.

Procedure

To create a sync node by using the code editor, perform the following steps:
  1. Create connections.
  2. Create a batch sync node.
  3. Apply a template.
  4. Configure a reader for the sync node.
  5. Configure a writer for the sync node.
  6. Map the fields in the source and destination tables.
  7. Configure channel control policies, such as the maximum transmission rate and the maximum number of dirty data records allowed.
  8. Configure the node properties.

Create connections

A sync node can synchronize data between various homogeneous and heterogeneous data stores. Log on to the DataWorks console, go to the workspace in which you want to create connections, and then click the Workspace Manage icon in the upper-right corner. On the page that appears, click Data Source in the left-side navigation pane. On the Data Source page, click New data source in the upper-right corner to create a connection. For more information, see Configure an SQL Server connection.

After you create a connection, you can select it when you configure a sync node on the DataStudio page. For more information about the connection types that Data Integration supports, see Supported data sources, readers, and writers.
Note
  • Data Integration does not support connectivity testing for some connection types. For more information, see Select a network connectivity solution.
  • If a data store is deployed locally and does not have a public IP address or cannot be directly connected over a network, the connectivity test fails when you configure the connection. To resolve this failure, you can add a custom resource group. For more information, see Create a custom resource group for Data Integration.

    If a data store cannot be directly connected over a network, Data Integration cannot obtain the table schema. In this case, you can create a sync node for this data store only by using the code editor.

Create a workflow

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. After you select the region where the required workspace resides, find the workspace and click Data Analytics.
  4. On the DataStudio page, move the pointer over the Create icon and select Workflow.
  5. In the Create Workflow dialog box, set the Workflow Name and Description parameters.
    Notice The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
  6. Click Create.

Create a batch sync node

  1. Click the workflow that you created in the previous step to show its content and right-click Data Integration.
  2. Choose Create > Batch Synchronization.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
  4. Click Commit.

Apply a template

  1. On the node configuration tab that appears, click the Switch to Code Editor icon in the toolbar.
  2. In the Confirm message, click OK.
    Note The code editor supports more features than the codeless user interface (UI). For example, you can configure sync nodes in the code editor even when the connectivity test fails.
  3. Click the Apply Template icon in the toolbar.
  4. In the Apply Template dialog box, set the following parameters: Source Connection Type, Connection, Target Connection Type, and Connection.
  5. Click OK.
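
After you click OK, the template generates a code skeleton that you complete in the following sections. The following sketch shows the typical overall structure for a MySQL reader and a MaxCompute (odps) writer. The order section, which specifies that data flows from the reader to the writer, is shown here as an assumption based on commonly generated templates and may differ in your environment.
{
    "type": "job",
    "version": "2.0",
    "steps": [
        { ... }, // Reader settings. See the reader section below.
        { ... }  // Writer settings. See the writer section below.
    ],
    "setting": { ... }, // Channel control policies. See the channel control section below.
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer" // Specifies that data flows from the reader to the writer.
            }
        ]
    }
}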

Configure a reader for the sync node

After the template is applied, the basic settings of the reader are generated. You can configure the source connection and source table as needed.
{"type": "job",
    "version": "2.0",
    "Steps": [ // Do not modify the preceding lines. They indicate the header code of the sync node.
        {
            "stepType": "mysql",
            "parameter": {
                "datasource": "MySQL",
                "column": [
                    "id",
                    "value",
                    "table"
                ],
                "socketTimeout": 3600000,
                "connection": [
                    {
                        "datasource": "MySQL",
                        "table": [
                            "`case`"
                        ]
                    }
                ],
                "where": "",
                "splitPk": "",
                "encoding": "UTF-8"
            },
            "name": "Reader",
            "category": "reader" // Specifies that these settings are related to the reader.
        },   
Parameter description:
  • type: the type of the sync node. You must set the value to job.
  • version: the version number of the sync node. You can set the value to 1.0 or 2.0.
Note
  • For more information about how to configure the source connection, see Configure MySQL Reader.
  • Some sync nodes may need to synchronize incremental data. In this case, you can use the scheduling parameters of DataWorks to specify the date and time for incremental data synchronization. For more information, see Configure scheduling parameters.
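
For example, the following sketch restricts the reader to incremental data by referencing a scheduling parameter in the where parameter. The column name gmt_modified is an assumption for illustration, and the ${bizdate} parameter must be assigned on the Properties tab of the node, for example, as bizdate=$bizdate.
"parameter": {
    ...
    "where": "gmt_modified >= '${bizdate}'", // gmt_modified is an assumed date column in the source table. ${bizdate} is a scheduling parameter.
    ...
},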

Configure a writer for the sync node

After the reader is configured, you can configure the destination connection and destination table as needed.
        {
            "stepType": "odps",
            "parameter": {
                "postSql": [], // The SQL statement to execute after the sync node is run.
                "partition": "",
                "truncate": true,
                "compress": false,
                "datasource": "odps_first",
                "column": [
                    "*"
                ],
                "emptyAsNull": false,
                "table": "",
                "preSql": [
                    "delete from XXX;" // The SQL statement to execute before the sync node is run. Separate multiple statements with semicolons (;).
                ]
            },
            "name": "Writer",
            "category": "writer" // Specifies that these settings are related to the writer.
        }
    ],
Note
  • For more information about how to configure the destination connection, see Configure MaxCompute Writer.
  • For most writers, you can select a writing method, such as overwriting or appending data. The supported writing methods vary with the connection type, as illustrated in the sketch below.
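
For the MaxCompute (odps) writer shown above, the writing method is controlled by the truncate and partition parameters. The following sketch is for illustration only; the partition value pt=${bizdate} is an assumption.
"partition": "pt=${bizdate}", // Write data to the partition of the data timestamp. ${bizdate} is a scheduling parameter.
"truncate": true, // true: clear existing data in the destination partition before writing (overwrite). false: retain existing data (append).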

Map the fields in the source and destination tables

The code editor supports only the mapping of fields in the same row, that is, mapping is positional. The data types of the fields must match.
Note Make sure that the data type of a source field is the same as or compatible with that of the mapped destination field.
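
The following sketch shows how positional mapping works in the code editor. The field names are assumptions for illustration; the field in each row of the reader's column list is written to the field in the same row of the writer's column list.
// In the reader step:
"column": [
    "id",    // Row 1: maps to "id" in the writer.
    "value", // Row 2: maps to "val" in the writer.
    "name"   // Row 3: maps to "username" in the writer.
],
// In the writer step:
"column": [
    "id",
    "val",
    "username"
],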

Configure channel control policies

After you complete the preceding steps, configure the channel control policies of the sync node. The setting parameter specifies efficiency settings for the node, including the maximum number of concurrent threads, bandwidth throttling, the dirty data policy, and the resource group.
"setting": {
        "errorLimit": {
            "record": "1024" // The maximum number of dirty data records allowed.
        },
        "speed": {
            "throttle": false // Specifies whether to enable bandwidth throttling.
            "concurrent": 1, // The maximum number of concurrent threads.   
        }
    },
Parameter description:
  • concurrent: the maximum number of concurrent threads that the sync node uses to read data from or write data to data stores. You can also configure the concurrency for the node on the codeless UI.
  • throttle: specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid heavy read workloads on the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to a proper value.
  • record: the maximum number of dirty data records allowed.
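
For example, the following sketch enables bandwidth throttling and tolerates no dirty data records. The key that sets the maximum transmission rate (shown here as mbps) and its unit are assumptions and may vary with the DataWorks version.
"setting": {
    "errorLimit": {
        "record": "0" // The node fails if any dirty data record is generated.
    },
    "speed": {
        "throttle": true, // Enable bandwidth throttling.
        "concurrent": 2, // Use at most two concurrent threads.
        "mbps": "1" // Assumed key for the maximum transmission rate; verify the key name for your DataWorks version.
    }
},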

Configure the node properties

This section describes how to use scheduling parameters for data filtering.

On the DataStudio page, double-click the batch sync node in the workflow. On the node configuration tab that appears, click the Properties tab in the right-side navigation pane to configure the node properties.

On the Properties tab, you can configure the properties of the sync node, such as the recurrence, time when the sync node is run, and dependencies. Batch sync nodes have no ancestor nodes because they are run before extract, transform, and load (ETL) nodes. We recommend that you specify the root node of the workspace as their parent node.

After the sync node is configured, save and commit the node. For more information about the node properties, see Basic properties.