This topic describes how to configure a synchronization node by using the codeless user interface (UI) in Data Integration.

Procedure

  1. Add data sources.
  2. Create a workflow.
  3. Create a batch synchronization node.
  4. Select the source.
  5. Select the destination.
  6. Map the fields in the source and destination tables.
  7. Configure channel control policies, such as the maximum transmission rate and the maximum number of dirty data records allowed.
  8. Configure the properties of the synchronization node.

Add data sources

A synchronization node can synchronize data between various homogeneous or heterogeneous data sources. On the DataStudio page of the DataWorks console, click the Workspace Manage icon in the upper-right corner. On the page that appears, click Data Source in the left-side navigation pane. On the Data Source page, add a data source. For more information, see Connection configuration.

After you add a data source, you can select the data source when you configure a synchronization node on the DataStudio page. For more information about the types of data sources that are supported by Data Integration, see Supported data sources, readers, and writers.
Note
  • Data Integration does not support connectivity testing for some types of data sources. For more information, see Select a network connectivity solution.
  • If an on-premises data source has no public IP address or is otherwise not reachable over the network, the connectivity test fails when you add the data source. To resolve this issue, you can use a custom resource group to connect to the data source. For more information, see Create a custom resource group for Data Integration. If a data source is not reachable over the network, Data Integration cannot obtain the table schema of the data source. In this case, you can configure a synchronization node for this data source only by using the code editor.

Create a workflow

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. After you select the region in which the workspace that you want to manage resides, find the workspace and click Data Analytics in the Actions column.
  4. On the DataStudio page, move the pointer over the Create icon and select Workflow.
  5. In the Create Workflow dialog box, set the Workflow Name and Description parameters.
    Notice The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
  6. Click Create.
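The naming rule above (the same rule applies to node names later in this topic) can be checked before you submit the dialog box. The following Python sketch uses hypothetical helpers that are not part of DataWorks. It assumes that "letters" means ASCII letters, and it treats names that differ only in case as the same name because names are not case-sensitive.

```python
import re

# Assumption: "letters" means ASCII letters; the rule is taken literally from
# this topic (1 to 128 letters, digits, underscores, or periods).
NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_name(name: str) -> bool:
    """Return True if the name satisfies the documented naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def same_name(a: str, b: str) -> bool:
    """Names are not case-sensitive, so compare them without regard to case."""
    return a.casefold() == b.casefold()


print(is_valid_name("ods_orders.daily"))      # True
print(is_valid_name("orders-daily"))          # False: hyphens are not allowed
print(same_name("Sales_Flow", "sales_flow"))  # True
```

Using fullmatch instead of match ensures that the whole name, not only a prefix, satisfies the rule.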

Create a batch synchronization node

  1. Expand the workflow that you created in the previous step and right-click Data Integration.
  2. Choose Create > Batch Synchronization.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
  4. Click Commit.

Select the source

After you create a batch synchronization node, you must select a data source and a table in the Source section.
Note
  • For more information about how to set the parameters in the Source section, see Configure a reader.
  • By default, the Table drop-down list displays a maximum of 25 tables. If the selected data source contains more than 25 tables and the table that you want to select is not displayed in the Table drop-down list, enter the name of the table in the Table field. Alternatively, configure the batch synchronization node by using the code editor.
  • To synchronize incremental data from the source, you can use the scheduling parameters of DataWorks to specify the date and time of the data to synchronize. For more information, see Configure scheduling parameters.
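The following Python sketch shows the resolved filter condition that a daily incremental synchronization might use. It assumes that the business date is the day before the scheduled run date and that the source table has a hypothetical timestamp column named gmt_modified. In the node itself, you reference a scheduling parameter instead of hard-coding dates, and DataWorks substitutes the value at run time; the helper name daily_increment_filter is also hypothetical.

```python
from datetime import date, timedelta


def daily_increment_filter(run_date: date, column: str = "gmt_modified") -> str:
    """Return a filter condition that selects rows changed on the business date.

    Assumptions for illustration: the business date is the day before the
    scheduled run date, and `column` (hypothetical default: gmt_modified)
    stores the last-modified time of each row.
    """
    bizdate = run_date - timedelta(days=1)
    # Half-open window [bizdate, run_date) so that reruns select the same rows
    # and consecutive days neither overlap nor leave gaps.
    return f"{column} >= '{bizdate:%Y-%m-%d}' AND {column} < '{run_date:%Y-%m-%d}'"


print(daily_increment_filter(date(2024, 3, 1)))
# gmt_modified >= '2024-02-29' AND gmt_modified < '2024-03-01'
```

A half-open window is a common design choice for incremental loads because each row falls into exactly one daily run.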

Configure the destination

After you configure the source, you must select a data source and a table in the Target section.
Note
  • For more information about how to set the parameters in the Target section, see Configure a writer.
  • For most synchronization nodes, you must specify a write mode, such as overwrite or append. The available write modes vary based on the type of the destination data source.
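The write mode matters most when a node is rerun for the same period. The following in-memory Python sketch is a conceptual model only, not the behavior of any specific writer: overwrite replaces the destination contents, whereas append adds rows, so appending the same batch twice duplicates it. Whether overwrite replaces a whole table or, for example, a single partition depends on the writer. The function write_rows is hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def write_rows(existing: List[Row], incoming: List[Row], mode: str) -> List[Row]:
    """Conceptual model of two common write modes (not a DataWorks writer).

    overwrite: the destination keeps only the incoming rows.
    append:    the incoming rows are added after the existing rows.
    """
    if mode == "overwrite":
        return list(incoming)
    if mode == "append":
        return existing + incoming
    raise ValueError(f"unsupported write mode: {mode!r}")


batch = [{"id": 1}, {"id": 2}]
once = write_rows([], batch, "append")
# A rerun with overwrite is idempotent; a rerun with append duplicates rows.
print(len(write_rows(once, batch, "overwrite")))  # 2
print(len(write_rows(once, batch, "append")))     # 4
```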

Map the fields in the source and destination tables

After you specify the source and destination tables, you must configure the mappings between fields in the source and destination tables. You can click Map Fields with the Same Name, Map Fields in the Same Line, Delete All Mappings, or Auto Layout to perform the related operation.
The following list describes these operations:
  • Map Fields with the Same Name: Click Map Fields with the Same Name to establish mappings between fields that have the same name. The data types of the fields must match.
  • Map Fields in the Same Line: Click Map Fields in the Same Line to establish mappings between fields in the same row. The data types of the fields must match.
  • Delete All Mappings: Click Delete All Mappings to remove all mappings that have been established.
  • Auto Layout: Click Auto Layout to sort the fields based on specific rules.
  • Change Fields: Click the Change Fields icon. In the Change Fields dialog box, you can manually edit the fields in the source table. Each field occupies a row. The first and the last blank rows are included, whereas other blank rows are ignored.
  • Add: Click Add to add a field. The following rules apply:
      • You can enter constants. Each constant must be enclosed in single quotation marks ('), such as 'abc' and '123'.
      • You can use scheduling parameters, such as ${bizdate}.
      • You can specify the partition key columns from which you want to read data, such as pt.
      • You can use functions supported by relational databases, such as now() and count(1). MaxCompute functions are not supported.
      • If the field that you entered cannot be parsed, the value of the Type parameter for the field is displayed as Custom.

        For example, if you add a partition key column of a MaxCompute table or a column of a LogHub table that cannot be previewed, the value of the Type parameter for this column is displayed as Custom. This does not affect the execution of the synchronization node.
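The following Python sketch illustrates the rules for added fields by classifying one entry from the Change Fields dialog box. It is a hypothetical heuristic that mirrors the list above, not the parser that DataWorks uses. The known_columns argument stands for the columns that could be previewed from the source table. An identifier that is not among them, such as a partition key column like pt, falls back to Custom, as in the example above.

```python
import re
from typing import Set

# Hypothetical heuristic that mirrors the rules listed above; it is not the
# parser that DataWorks uses.
_SCHEDULING_PARAM = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*\}")
_FUNCTION_CALL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\(.*\)")
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_field(entry: str, known_columns: Set[str]) -> str:
    """Classify one row of the Change Fields dialog box."""
    text = entry.strip()
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        return "constant"  # for example 'abc' or '123'
    if _SCHEDULING_PARAM.fullmatch(text):
        return "scheduling parameter"  # for example ${bizdate}
    if _FUNCTION_CALL.fullmatch(text):
        return "function"  # for example now() or count(1)
    if _IDENTIFIER.fullmatch(text) and text in known_columns:
        return "column"
    # Anything else, including columns that cannot be previewed, such as a
    # partition key column like pt, is shown as Custom.
    return "Custom"


columns = {"id", "name"}
for entry in ["'abc'", "${bizdate}", "now()", "id", "pt"]:
    print(entry, "->", classify_field(entry, columns))
```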

Note Make sure that the data type of each source field is the same as that of the mapped destination field, or that the source data type can be converted to the destination data type.
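The two automatic mapping options can be modeled as follows. This Python sketch is an illustration under simplifying assumptions: field names are compared exactly, and "the data types match" means that the type strings are identical, whereas the codeless UI may also accept feasible type conversions. The function names are hypothetical.

```python
from typing import List, Tuple

Field = Tuple[str, str]    # (field name, data type)
Mapping = Tuple[str, str]  # (source field, destination field)


def map_same_name(source: List[Field], target: List[Field]) -> List[Mapping]:
    """Pair fields that have the same name and the same data type."""
    target_types = dict(target)
    return [
        (name, name)
        for name, dtype in source
        if target_types.get(name) == dtype
    ]


def map_same_line(source: List[Field], target: List[Field]) -> List[Mapping]:
    """Pair fields by position; rows whose data types differ stay unmapped."""
    return [
        (s_name, t_name)
        for (s_name, s_type), (t_name, t_type) in zip(source, target)
        if s_type == t_type
    ]


src = [("id", "bigint"), ("name", "string"), ("amount", "double")]
dst = [("id", "bigint"), ("amount", "double"), ("name", "string")]
print(map_same_name(src, dst))  # [('id', 'id'), ('name', 'name'), ('amount', 'amount')]
print(map_same_line(src, dst))  # [('id', 'id')]
```

In this example, mapping by name pairs all three fields, whereas mapping by line pairs only id because the second and third rows have different data types.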

Configure channel control policies

After you complete the preceding steps, you can configure channel control policies for the synchronization node.
The following list describes the parameters:
  • Expected Maximum Concurrency: The maximum number of parallel threads that the synchronization node uses to read data from the source or write data to the destination.
  • Bandwidth Throttling: Specifies whether to enable bandwidth throttling. If you enable bandwidth throttling, you can specify a maximum transmission rate to avoid placing a heavy read load on the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to an appropriate value based on the configurations of the source.
  • Dirty Data Records Allowed: The maximum number of dirty data records allowed.
  • Distributed Execution: Specifies whether to use the distributed execution mode. In distributed mode, the node is split into slices that run in parallel on multiple Elastic Compute Service (ECS) instances, which speeds up synchronization. If a large number of synchronization nodes run in parallel, excessive access requests are sent to the data source. Evaluate the load on the data source before you use this mode. This mode is available only when you use exclusive resource groups for Data Integration.
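The following Python sketch models how a dirty data limit behaves. It assumes that a record counts as dirty when a caller-supplied check rejects it, that dirty records are skipped, and that the run fails as soon as the count exceeds the configured limit. These semantics are assumptions for illustration, and DirtyDataLimitExceeded and sync_records are hypothetical names.

```python
from typing import Callable, Iterable, List


class DirtyDataLimitExceeded(Exception):
    """Raised when the number of dirty records exceeds the configured limit."""


def sync_records(
    records: Iterable[dict],
    is_valid: Callable[[dict], bool],
    max_dirty: int,
) -> List[dict]:
    """Return the records that would be written, skipping dirty ones.

    Assumed semantics: up to `max_dirty` dirty records are tolerated; the run
    fails as soon as one more dirty record is found.
    """
    written: List[dict] = []
    dirty = 0
    for record in records:
        if is_valid(record):
            written.append(record)
            continue
        dirty += 1
        if dirty > max_dirty:
            raise DirtyDataLimitExceeded(
                f"{dirty} dirty records exceed the limit of {max_dirty}"
            )
    return written


def has_amount(row: dict) -> bool:
    return row["amount"] is not None


rows = [{"amount": 10}, {"amount": None}, {"amount": 7}]
print(sync_records(rows, has_amount, max_dirty=1))  # [{'amount': 10}, {'amount': 7}]
```

With max_dirty=0, the same input raises DirtyDataLimitExceeded at the second record.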

Configure the properties of the synchronization node

In most cases, synchronization nodes use scheduling parameters to filter data. This section describes how to set scheduling parameters for a synchronization node.

On the configuration tab of the batch synchronization node, click Properties in the right-side navigation pane.

You can reference a variable in the ${Variable name} format. Then, assign a value to the variable in the corresponding field. The value can be a time expression enclosed in $[] or a constant.

For example, if you write ${today} in the code and enter today=$[yyyymmdd] in the corresponding field, the value of the variable is the current date. For more information about how to add days to or subtract days from the date, see Configure scheduling parameters.
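The following Python sketch shows how the ${today} and today=$[yyyymmdd] pair from the example resolves for a given run time. It is a simplified model, not the DataWorks scheduler: it supports only the yyyy, mm, and dd tokens inside $[], treats any other value as a constant, and assumes that multiple assignments are separated by spaces. The function names eval_value and resolve are hypothetical.

```python
import re
from datetime import datetime
from string import Template

# Simplified model: only these tokens are supported inside $[...].
_TOKENS = {"yyyy": "%Y", "mm": "%m", "dd": "%d"}


def eval_value(expr: str, run_time: datetime) -> str:
    """Evaluate $[...] as a time expression; return anything else unchanged."""
    match = re.fullmatch(r"\$\[(.+)\]", expr)
    if match is None:
        return expr  # a constant
    fmt = match.group(1)
    for token, directive in _TOKENS.items():
        fmt = fmt.replace(token, directive)
    return run_time.strftime(fmt)


def resolve(code: str, assignments: str, run_time: datetime) -> str:
    """Replace ${name} in code by using assignments such as "today=$[yyyymmdd]".

    Assumes space-separated name=value pairs. Template.substitute raises
    KeyError if the code references a variable that has no value.
    """
    values = {}
    for pair in assignments.split():
        name, _, expr = pair.partition("=")
        values[name] = eval_value(expr, run_time)
    return Template(code).substitute(values)


sql_filter = "ds = '${today}'"
print(resolve(sql_filter, "today=$[yyyymmdd]", datetime(2024, 5, 20)))  # ds = '20240520'
```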

On the Properties tab, you can configure the properties of the synchronization node, such as the recurrence, run time, and dependencies. Batch synchronization nodes usually have no ancestor nodes because they run before extract, transform, and load (ETL) nodes. Therefore, we recommend that you specify the root node of the workspace as their ancestor node.