This topic describes how to configure a sync node by using the codeless user interface (UI).

Development process

  1. Create connections.
  2. Create a workflow.
  3. Create a batch synchronization node.
  4. Select a source connection.
  5. Select a destination connection.
  6. Map the fields in the source and destination tables.
  7. Configure channel control policies, such as the maximum transmission rate and the maximum number of dirty data records allowed.
  8. Configure the node properties.

Create connections

A sync node can synchronize data between various homogeneous and heterogeneous data stores. In the DataWorks console, click the Workspace Manage icon in the upper-right corner. On the page that appears, click Data Source and add a connection. For more information, see Add connections.

After a connection is added, you can directly select it when configuring a sync node on the DataStudio page. For more information about connection types supported by Data Integration, see Supported data stores.

Note
  • Data Integration does not support connectivity testing for some connection types. For more information, see Test data store connectivity.
  • If a data store is deployed on-premises and has no public IP address, or cannot otherwise be reached directly over a network, the connectivity test fails when you configure the connection. To resolve this issue, you can use a custom resource group. For more information, see Add a custom resource group. When a data store cannot be reached directly over a network, Data Integration cannot obtain the table schema. In this case, you can create a sync node for this data store only by using the code editor.

Create a workflow

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. On the DataStudio page that appears, move the pointer over the Create a workflow icon and click Workflow.
  3. In the Create Workflow dialog box that appears, set Workflow Name and Description.
  4. Click Create.

Create a batch synchronization node

  1. Click the workflow to expand it, and right-click Data Integration.
  2. Choose Create > Batch Synchronization.
  3. In the Create Node dialog box that appears, set Node Name and Location.
  4. Click Commit.

Select a source connection

After the sync node is created, you can configure the source connection and source table as needed.
Note
  • For more information about how to configure the source connection, see Reader configuration.
  • Some sync nodes must synchronize data incrementally, which you specify when you configure the source connection. In this case, you can use DataWorks scheduling parameters to obtain the date and time that incremental synchronization requires.
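As a rough illustration of how a scheduling parameter feeds an incremental filter, the following Python sketch substitutes ${bizdate} into a filter condition. It assumes that ${bizdate} resolves to the business date (the day before the scheduled date) in yyyymmdd format, and that the source table has a hypothetical partition column named ds. In DataWorks the scheduler performs this substitution for you; the helper names here are illustrative only.

```python
from datetime import date, timedelta
from string import Template


def resolve_bizdate(scheduled_date: date) -> str:
    # Assumption: ${bizdate} is the business date, one day before the scheduled date.
    return (scheduled_date - timedelta(days=1)).strftime("%Y%m%d")


def render_filter(template: str, scheduled_date: date) -> str:
    # string.Template understands the same ${name} placeholder syntax.
    return Template(template).substitute(bizdate=resolve_bizdate(scheduled_date))


if __name__ == "__main__":
    # Hypothetical partition column "ds" that stores dates as yyyymmdd strings.
    # A run scheduled on 2024-03-01 reads only the records of 2024-02-29.
    print(render_filter("ds='${bizdate}'", date(2024, 3, 1)))  # ds='20240229'
```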

Select a destination connection

After the source settings are completed, you can configure the destination connection and destination table as needed.
Note
  • For more information about how to configure the destination connection, see Writer configuration.
  • You can select the writing method for most nodes. For example, the writing method can be overwriting or appending. Supported writing methods vary with the connection type.
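The difference between the overwriting and appending writing methods can be sketched with an in-memory table. This is a simplified model, not how any particular writer is implemented; the function write_rows and the mode strings are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def write_rows(destination: List[Row], incoming: List[Row], mode: str) -> List[Row]:
    """Return the destination table's contents after one write (hypothetical model)."""
    if mode == "overwrite":
        # Existing rows are discarded and replaced by the incoming rows.
        return list(incoming)
    if mode == "append":
        # Existing rows are kept and the incoming rows are added after them.
        return destination + list(incoming)
    raise ValueError(f"unsupported writing method: {mode!r}")


if __name__ == "__main__":
    table = [{"id": 1}]
    batch = [{"id": 2}]
    once = write_rows(table, batch, "append")    # [{'id': 1}, {'id': 2}]
    twice = write_rows(once, batch, "append")    # a rerun adds id 2 again
    print(len(twice))                            # 3
    print(write_rows(write_rows(table, batch, "overwrite"), batch, "overwrite"))  # [{'id': 2}]
```

One consequence the sketch makes visible: rerunning an append write adds the same rows again, while rerunning an overwrite write produces the same result. This is worth considering if a node may be rerun for the same data.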

Map the fields in the source and destination tables

After selecting the source and destination connections, you must specify the mappings between fields in the source and destination tables. You can click Map Fields with the Same Name, Map Fields in the Same Line, Delete All Mappings, or Auto Layout to perform related operations.
  • Map Fields with the Same Name: Click Map Fields with the Same Name to map fields that have the same name. Note that the data types of the mapped fields must match.
  • Map Fields in the Same Line: Click Map Fields in the Same Line to map fields that are in the same row. Note that the data types of the mapped fields must match.
  • Delete All Mappings: Click Delete All Mappings to remove all established mappings.
  • Auto Layout: Click Auto Layout to automatically sort the fields based on specified rules.
  • Change Fields: Click the Change Fields icon. In the Change Fields dialog box that appears, you can manually edit the fields in the source table. Each field occupies one row. The first and the last blank rows are included, whereas other blank rows are ignored.
  • Add: Click Add to add a field. Added fields follow these rules:
    ◦ You can enter constants. Each constant must be enclosed in single quotation marks (' '), such as 'abc' and '123'.
    ◦ You can use scheduling parameters, such as ${bizdate}.
    ◦ You can enter functions supported by relational databases, such as now() and count(1).
    ◦ Fields that cannot be parsed are marked as Unidentified.
Note
  • Make sure that the data type of each source field is the same as or compatible with that of the mapped destination field.
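The two automatic mapping operations can be approximated as follows. This sketch assumes that name matching is exact and case-sensitive and that the type check is a strict comparison of type names; the console's actual rules, including which types count as compatible, may differ. All function names are hypothetical.

```python
from typing import List, Tuple

Field = Tuple[str, str]          # (field name, data type)
Mapping = List[Tuple[str, str]]  # (source field name, destination field name)


def map_same_name(source: List[Field], dest: List[Field]) -> Mapping:
    """Pair each source field with the destination field that has the same name."""
    dest_names = {name for name, _ in dest}
    return [(name, name) for name, _ in source if name in dest_names]


def map_same_line(source: List[Field], dest: List[Field]) -> Mapping:
    """Pair fields by row position; extra fields on the longer side stay unmapped."""
    return [(s_name, d_name) for (s_name, _), (d_name, _) in zip(source, dest)]


def type_mismatches(source: List[Field], dest: List[Field], mapping: Mapping) -> Mapping:
    """Return the mapped pairs whose data types differ (strict comparison)."""
    src_types, dest_types = dict(source), dict(dest)
    return [(s, d) for s, d in mapping if src_types[s] != dest_types[d]]


if __name__ == "__main__":
    source = [("id", "bigint"), ("name", "string"), ("age", "bigint")]
    dest = [("id", "bigint"), ("age", "string"), ("city", "string")]
    by_name = map_same_name(source, dest)   # [('id', 'id'), ('age', 'age')]
    by_line = map_same_line(source, dest)   # [('id', 'id'), ('name', 'age'), ('age', 'city')]
    print(type_mismatches(source, dest, by_name))  # [('age', 'age')]
```

Mapping by name is robust to column order, while mapping by line is useful when the two tables use different names for the same columns in the same order.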

Configure channel control policies

When the preceding steps are completed, you can configure the channel control policies of the sync node.
  • Expected Maximum Concurrency: The maximum number of concurrent threads that the sync node uses to read data from the source or write data to the destination. You can configure the concurrency for a node on the codeless UI.
  • Bandwidth Throttling: Specifies whether to enable bandwidth throttling. If you enable bandwidth throttling, you can set a maximum transmission rate to prevent an excessive read load on the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to an appropriate value.
  • Dirty Data Records Allowed: The maximum number of dirty data records allowed.
  • Resource Group: The resource group that is used to run the sync node. If a large number of nodes, including this sync node, run on the default resource group, the sync node may need to wait for resources. We recommend that you purchase an exclusive resource group for Data Integration or add a custom resource group. For more information, see DataWorks exclusive resources and Add a custom resource group.
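To make the Bandwidth Throttling and Dirty Data Records Allowed settings more concrete, the following sketch shows one common way to enforce each: a token bucket that caps the average transfer rate, and a counter that stops the copy when dirty records exceed the allowed maximum. This topic does not describe how DataWorks implements either policy, so treat the algorithm, the class and function names, and the choice that the copy fails only when the count exceeds (rather than reaches) the limit as assumptions.

```python
import time
from typing import Callable, Iterable, List, Tuple


class TokenBucket:
    """Caps the average transfer rate at `rate` bytes per second (illustrative)."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate
        self.capacity = rate   # permit a burst of up to one second of traffic
        self.tokens = rate     # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self, nbytes: int) -> None:
        """Block until `nbytes` may be sent without exceeding the rate."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        deficit = nbytes - self.tokens
        if deficit > 0:
            # Wait exactly long enough for the missing tokens to accumulate.
            self.sleep(deficit / self.rate)
            self.tokens = 0.0
            self.last = self.clock()
        else:
            self.tokens -= nbytes


class DirtyDataLimitExceeded(Exception):
    """Raised when more dirty records are found than the configured maximum."""


def copy_records(records: Iterable[str], convert: Callable[[str], object],
                 max_dirty: int) -> Tuple[List[object], int]:
    """Convert each record; records that fail conversion count as dirty data."""
    converted: List[object] = []
    dirty = 0
    for record in records:
        try:
            converted.append(convert(record))
        except ValueError:
            dirty += 1
            # Assumption: the copy fails only when the count exceeds the limit.
            if dirty > max_dirty:
                raise DirtyDataLimitExceeded(f"{dirty} dirty records > {max_dirty}")
    return converted, dirty
```

In a copy loop, you would call acquire(len(chunk)) before sending each chunk. Injecting the clock and sleep functions keeps the throttle testable without real waiting.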

Configure the node properties

This section describes how to use scheduling parameters for data filtering.

On the configuration tab of the batch synchronization node, click the Properties tab in the right-side navigation pane.

You can reference a scheduling parameter in the code by using ${Variable name}. After you reference a variable, enter its value in the Arguments field. The value can be a time expression or a constant. In the following example, the time expression is enclosed in $[].

For example, if you write ${today} in the code and enter today=$[yyyymmdd] in the Arguments field, the variable resolves to the current date. For more information about how to add days to or subtract days from a date, see Scheduling parameters.
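The substitution described above can be modeled in a few lines of Python. This sketch assumes that the Arguments field holds space-separated name=value pairs, and it supports only the yyyymmdd time expression with an optional +N or -N day offset; real scheduling parameters support more formats. The function names are hypothetical, and the scheduled time is passed in explicitly.

```python
import re
from datetime import datetime, timedelta
from string import Template
from typing import Dict

# Matches $[yyyymmdd], $[yyyymmdd-1], $[yyyymmdd+7], ... (day offsets only).
_DATE_EXPR = re.compile(r"^\$\[yyyymmdd(?:([+-])(\d+))?\]$")


def eval_value(value: str, scheduled: datetime) -> str:
    """Evaluate a time expression; any other value is treated as a constant."""
    match = _DATE_EXPR.match(value)
    if match is None:
        return value
    sign, days = match.groups()
    offset = int(days) if days else 0
    if sign == "-":
        offset = -offset
    return (scheduled + timedelta(days=offset)).strftime("%Y%m%d")


def parse_arguments(arguments: str, scheduled: datetime) -> Dict[str, str]:
    """Parse space-separated name=value pairs, as typed into the Arguments field."""
    values: Dict[str, str] = {}
    for pair in arguments.split():
        name, _, value = pair.partition("=")
        values[name] = eval_value(value, scheduled)
    return values


def render(code: str, arguments: str, scheduled: datetime) -> str:
    """Replace each ${name} in the code with its resolved value."""
    return Template(code).substitute(parse_arguments(arguments, scheduled))


if __name__ == "__main__":
    run_time = datetime(2024, 3, 1, 0, 30)
    print(render("ds='${today}'", "today=$[yyyymmdd]", run_time))  # ds='20240301'
```

Using Template.substitute rather than safe_substitute means a variable that appears in the code but not in the Arguments field raises a KeyError instead of silently staying unresolved.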

On the Properties tab, you can configure the properties of the sync node, such as the recurrence, the time at which the sync node runs, and its dependencies. Batch synchronization nodes have no ancestor nodes because they run before extract, transform, and load (ETL) nodes. Therefore, we recommend that you specify the root node of the workspace as their parent node.
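The dependency recommendation can be pictured as a small directed acyclic graph. The node names below are hypothetical. The sketch uses Python's graphlib module (Python 3.9 or later) to show that a valid run order always places the workspace root node before the sync nodes, and the sync nodes before the ETL node that consumes their output.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def run_order(parents: Dict[str, Set[str]]) -> List[str]:
    """Return one valid run order in which every node follows its parent nodes."""
    return list(TopologicalSorter(parents).static_order())


if __name__ == "__main__":
    # Hypothetical workflow: each key maps a node to the set of its parent nodes.
    workflow = {
        "workspace_root": set(),
        "sync_orders": {"workspace_root"},
        "sync_users": {"workspace_root"},
        "etl_daily_report": {"sync_orders", "sync_users"},
    }
    print(run_order(workflow))
```

The relative order of the two sync nodes is not fixed, because neither depends on the other; only the parent-before-child constraints are guaranteed.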