This topic describes how to configure a sync node by using the codeless user interface (UI).

Development process

  1. Create connections.
  2. Create a workflow.
  3. Create a batch sync node.
  4. Select a connection to the source data store and a source table.
  5. Select a connection to the destination data store and a destination table.
  6. Map the fields in the source and destination tables.
  7. Configure channel control policies, such as the maximum transmission rate and the maximum number of dirty data records allowed.
  8. Configure the node properties.

Create connections

A sync node can synchronize data between a wide range of homogeneous and heterogeneous data stores. Before you configure a sync node, create connections to its source and destination data stores. Log on to the DataWorks console, go to the DataStudio page of the workspace in which you want to create the connections, and click the Workspace Manage icon in the upper-right corner. On the page that appears, click Data Source in the left-side navigation pane. On the Data Source page, click New data source in the upper-right corner, and then create a connection in the Add data source dialog box. For more information, see Connection configuration.

After you create a connection, you can select it when you configure a sync node on the DataStudio page. For more information about the types of connections that Data Integration supports, see Supported data sources, readers, and writers.
Note
  • Data Integration does not support connectivity testing for some types of connections. For more information, see Select a network connectivity solution.
  • If an on-premises data store does not have a public IP address or is not accessible over the network, the connectivity test fails when you configure the connection. To resolve this issue, you can create a custom resource group. For more information, see Create a custom resource group for Data Integration. If a data store is not accessible over the network, Data Integration cannot obtain its table schema, and you can configure a sync node for the data store only in the code editor.

Create a workflow

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. After you select the region where the required workspace resides, find the workspace and click Data Analytics.
  4. On the DataStudio page, move the pointer over the Create icon and select Workflow.
  5. In the Create Workflow dialog box, set the Workflow Name and Description parameters.
    Notice The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
  6. Click Create.
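The naming rule in the preceding notice (node names follow the same rule) can be checked before you submit a name. The following Python snippet is a minimal, hypothetical sketch of such a check and is not part of DataWorks. It assumes that "letters" means ASCII letters and that "not case-sensitive" means two names that differ only in case are treated as the same name. The function names `is_valid_name` and `names_conflict` are illustrative.

```python
import re

# Hypothetical client-side check mirroring the documented naming rule:
# 1 to 128 characters; letters, digits, underscores (_), and periods (.).
# Assumption: "letters" means ASCII letters A-Z and a-z.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_name(name: str) -> bool:
    """Return True if the whole name satisfies the naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def names_conflict(a: str, b: str) -> bool:
    """Assumption: names are not case-sensitive, so compare them case-insensitively."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print(is_valid_name("ods_user.daily"))             # True
    print(is_valid_name("user-sync"))                  # False: hyphen is not allowed
    print(is_valid_name("a" * 129))                    # False: longer than 128 characters
    print(names_conflict("Sales_Flow", "sales_flow"))  # True
```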

Create a batch sync node

  1. Click the workflow that you created in the previous step to expand it, and right-click Data Integration.
  2. Choose Create > Batch Synchronization.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
  4. Click Commit.

Select a connection to the source data store and a source table

After you create a batch sync node, you must select a connection to the source data store and a source table in the Source section.
Note
  • For more information about how to set the parameters in the Source section, see Reader configuration.
  • By default, the Table drop-down list displays a maximum of 25 tables in the selected data store. If the data store contains more than 25 tables and the table that you want is not displayed in the list, enter the name of the table in the Table field, or configure the batch sync node in the code editor.
  • If a sync node needs to synchronize incremental data, you can use DataWorks scheduling parameters to specify the date and time of the data to synchronize. For more information, see Configure scheduling parameters.

Select a connection to the destination data store and a destination table

After you select a connection to the source data store and a source table, you must select a connection to the destination data store and a destination table.
Note
  • For more information about how to set the parameters in the Target section, see Writer configuration.
  • For most sync nodes, you can specify a write mode, such as overwriting or appending. The available write modes vary based on the type of the destination connection that you selected.

Map the fields in the source and destination tables

After you specify the source and destination tables, you must specify the mappings between the fields in the two tables. You can click Map Fields with the Same Name, Map Fields in the Same Line, Delete All Mappings, or Auto Layout to perform the corresponding operations.
The following GUI elements are available in the Mappings section:
Map Fields with the Same Name: Click this button to map fields that have the same name in the source and destination tables. The data types of the mapped fields must match.
Map Fields in the Same Line: Click this button to map fields that are in the same row. The data types of the mapped fields must match.
Delete All Mappings: Click this button to remove all established mappings.
Auto Layout: Click this button to sort the fields based on specified rules.
Change Fields: Click the Change Fields icon to open the Change Fields dialog box, in which you can manually edit the fields of the source table. Each field occupies a row. The first and the last blank rows are included, whereas other blank rows are ignored.
Add: Click Add to add a field. Take note of the following items:
  • You can enter constants. Each constant must be enclosed in single quotation marks (' '), for example, 'abc' and '123'.
  • You can use scheduling parameters such as ${bizdate}.
  • You can enter functions that are supported by relational databases, for example, now() and count(1).
  • Fields that cannot be parsed are marked as Unidentified.
Note Make sure that the data type of a source field is the same as or compatible with that of the mapped destination field.
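The behavior of the Map Fields with the Same Name and Map Fields in the Same Line buttons can be modeled as follows. This Python sketch is a simplified, hypothetical model, not the DataWorks implementation: each field is a (name, data type) pair, names are compared case-sensitively, and any pair whose data types are not identical is left unmapped. The actual UI may also accept compatible data types, as the preceding note implies.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a field is a (name, data_type) pair.
Field = Tuple[str, str]


def map_same_name(source: List[Field], dest: List[Field]) -> Dict[str, str]:
    """Mimic Map Fields with the Same Name: pair fields whose names are equal
    (case-sensitive) and whose data types are identical."""
    dest_types = {name: dtype for name, dtype in dest}
    return {
        name: name
        for name, dtype in source
        if dest_types.get(name) == dtype
    }


def map_same_line(source: List[Field], dest: List[Field]) -> Dict[str, str]:
    """Mimic Map Fields in the Same Line: pair the i-th source field with the
    i-th destination field if their data types are identical. Rows without a
    counterpart on the other side are left unmapped."""
    return {
        s_name: d_name
        for (s_name, s_type), (d_name, d_type) in zip(source, dest)
        if s_type == d_type
    }


if __name__ == "__main__":
    src = [("id", "bigint"), ("name", "string"), ("created", "datetime")]
    dst = [("id", "bigint"), ("user_name", "string"), ("created", "string")]
    print(map_same_name(src, dst))  # {'id': 'id'}
    print(map_same_line(src, dst))  # {'id': 'id', 'name': 'user_name'}
```

In the usage example, the created fields share a name but not a data type, so neither approach maps them.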

Configure channel control policies

After you complete the preceding steps, you can configure the channel control policies for the sync node.
The following parameters are available in the Channel section:
Expected Maximum Concurrency: The maximum number of concurrent threads that the sync node uses to read data from the source or write data to the destination. You can configure the concurrency for the node on the codeless UI.
Bandwidth Throttling: Specifies whether to enable bandwidth throttling. If you enable bandwidth throttling, you can set a maximum transmission rate to prevent the node from placing a heavy read load on the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to an appropriate value.
Dirty Data Records Allowed: The maximum number of dirty data records allowed. If the number of dirty data records exceeds this value, the sync node fails.
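The Dirty Data Records Allowed parameter acts as a tolerance threshold. The following Python sketch models it under stated assumptions and is not how Data Integration is implemented: a record is treated as dirty when it cannot be converted to the destination type (here, `int`), and synchronization stops as soon as the number of dirty records exceeds the configured maximum. The names `sync_records` and `DirtyDataLimitExceeded` are hypothetical.

```python
from typing import Callable, Iterable, List


class DirtyDataLimitExceeded(Exception):
    """Hypothetical error raised when dirty records exceed the allowed maximum."""


def sync_records(records: Iterable[str],
                 convert: Callable[[str], int],
                 max_dirty: int) -> List[int]:
    """Convert each record for the destination. Records that fail conversion
    are counted as dirty and skipped; once the dirty count exceeds max_dirty,
    synchronization stops with an error."""
    written: List[int] = []
    dirty = 0
    for record in records:
        try:
            written.append(convert(record))
        except ValueError:
            dirty += 1
            if dirty > max_dirty:
                raise DirtyDataLimitExceeded(
                    f"{dirty} dirty records exceed the limit of {max_dirty}")
    return written


if __name__ == "__main__":
    print(sync_records(["1", "2", "x", "4"], int, max_dirty=1))  # [1, 2, 4]
    try:
        sync_records(["1", "x", "y"], int, max_dirty=1)
    except DirtyDataLimitExceeded as exc:
        print(exc)  # 2 dirty records exceed the limit of 1
```

Setting the limit to 0 in this model means that the first dirty record stops the synchronization.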

Configure the node properties

This section describes how to use scheduling parameters to filter data and how to configure the scheduling properties of the sync node.

On the configuration tab of the batch sync node, click the Properties tab in the right-side navigation pane.

You can reference a scheduling parameter as ${Variable name}. After you reference a variable, assign a value to it in the Arguments field. In this example, the value of the variable is enclosed in $[]. The value can be a time expression or a constant.

For example, if you write ${today} in the code and enter today=$[yyyymmdd] in the Arguments field, ${today} is replaced with the current date in the yyyymmdd format. For more information about how to add days to or subtract days from a date, see Configure scheduling parameters.
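The substitution of ${today} can be sketched in Python. This is a hypothetical, simplified model of how ${...} references are resolved from the Arguments field, not the DataWorks scheduler. It assumes space-separated name=value pairs, supports only the yyyy, mm, and dd tokens (and hyphens) inside $[...], treats any other value as a constant, and takes the date from the caller. Date arithmetic, such as subtracting days, is intentionally rejected. The filter column gmt_create in the usage example is hypothetical.

```python
import re
from datetime import datetime
from typing import Dict

# Hypothetical subset of time-expression tokens and their strftime codes.
TOKENS = {"yyyy": "%Y", "mm": "%m", "dd": "%d"}
# A supported expression is $[...] built only from the tokens above and hyphens.
TIME_EXPR = re.compile(r"\$\[((?:yyyy|mm|dd|-)+)\]")
# A reference in the code looks like ${name}.
REFERENCE = re.compile(r"\$\{(\w+)\}")


def parse_arguments(arguments: str) -> Dict[str, str]:
    """Split 'name1=value1 name2=value2' into a dict (assumed format)."""
    pairs = {}
    for item in arguments.split():
        name, _, value = item.partition("=")
        pairs[name] = value
    return pairs


def evaluate(value: str, now: datetime) -> str:
    """Resolve a $[...] time expression against now; other values are constants."""
    if not (value.startswith("$[") and value.endswith("]")):
        return value
    match = TIME_EXPR.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported time expression: {value}")
    fmt = re.sub(r"yyyy|mm|dd", lambda m: TOKENS[m.group(0)], match.group(1))
    return now.strftime(fmt)


def substitute(code: str, arguments: str, now: datetime) -> str:
    """Replace every ${name} in code with the value assigned in arguments."""
    values = {name: evaluate(v, now) for name, v in parse_arguments(arguments).items()}

    def resolve(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value assigned to ${{{name}}}")
        return values[name]

    return REFERENCE.sub(resolve, code)


if __name__ == "__main__":
    now = datetime(2024, 3, 15)
    print(substitute("gmt_create >= '${today}'", "today=$[yyyymmdd]", now))
    # gmt_create >= '20240315'
```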

On the Properties tab, you can configure the properties of the sync node, such as the scheduling cycle, the time at which the node runs, and its dependencies. Batch sync nodes have no ancestor nodes because they run before extract, transform, and load (ETL) nodes. Therefore, we recommend that you specify the root node of the workspace as the parent node of a batch sync node.