DataWorks provides data comparison nodes that allow you to compare data between different tables in multiple ways. You can use data comparison nodes in workflows. This topic describes how to use a data comparison node to develop tasks.
Node introduction
Data comparison nodes compare data between tables, for example to check data after synchronization. You can specify custom comparison ranges and metrics to compare data from different aspects.
Limits
Data comparison nodes support only serverless resource groups. For more information about how to use a serverless resource group, see Resource group management.
Procedure
Step 1: Create a data comparison node
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the Scheduled Workflow pane of the DataStudio page, move the pointer over the icon and choose .
In the Create Node dialog box, configure the Path and Name parameters as prompted and click Confirm. The configuration tab of the node appears.
Step 2: Configure the data comparison node
Configure parameters in the Configure Information of Tables to Compare section
Data comparison nodes let you compare table data across different data sources by configuring only basic information about the tables that you want to compare. The following table describes the parameters.
| Parameter | Description |
| --- | --- |
| Resource Group | Select an existing serverless resource group from the drop-down list. |
| Task Resource Usage | The number of compute units (CUs) that are allocated to run the data comparison node. Configure this parameter based on your business requirements. |
| Data Source Type | Select the type of the data source to which the source table belongs and the type of the data source to which the destination table belongs. |
| Data Source Name | Select the data source to which the source table belongs and the data source to which the destination table belongs. |
| Connection Status | Click Test to the right of the Connection Status parameter to check whether the selected data sources can connect to the selected resource group. |
| Table Name | Select the source table and the destination table from the drop-down lists. |
| WHERE Condition | Enter a WHERE condition to filter the data to compare in the source and destination tables. |
| Shard Key | The column in the source table to use as the shard key. We recommend that you use the primary key or an indexed column. |
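The WHERE Condition and Shard Key parameters work together to scope the comparison. A common reason to recommend a primary key or indexed column as the shard key is that the table can be split into key ranges that are each read with an efficient range scan. The following Python sketch illustrates that idea under stated assumptions. It is not the DataWorks implementation, and the helper names `split_by_shard_key` and `shard_queries`, the `orders` table, and the sample WHERE condition are hypothetical.

```python
from typing import List, Tuple


def split_by_shard_key(min_key: int, max_key: int, num_shards: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_key, max_key] into contiguous,
    non-overlapping ranges of near-equal size (hypothetical helper)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if max_key < min_key:
        return []  # empty key range: nothing to compare
    total = max_key - min_key + 1
    num_shards = min(num_shards, total)  # never create empty shards
    base, extra = divmod(total, num_shards)
    ranges = []
    start = min_key
    for i in range(num_shards):
        # The first `extra` shards take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def shard_queries(table: str, shard_key: str, where: str,
                  ranges: List[Tuple[int, int]]) -> List[str]:
    """Build one SELECT per shard by combining the user-supplied WHERE
    condition with the shard range. Illustration only: real code should
    use parameter binding instead of string formatting."""
    queries = []
    for lo, hi in ranges:
        cond = f"{shard_key} BETWEEN {lo} AND {hi}"
        if where:
            cond = f"({where}) AND {cond}"
        queries.append(f"SELECT * FROM {table} WHERE {cond}")
    return queries


# Usage: split ids 1..10 into three shards of the hypothetical orders table.
ranges = split_by_shard_key(1, 10, 3)
for q in shard_queries("orders", "id", "dt = '20240101'", ranges):
    print(q)
```

Because each shard is bounded by the shard key, an indexed column lets each range be read without a full table scan, and shards can be compared independently.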
Configure parameters in the Configure Comparison Rule section
You can configure metric-based comparison or full-text comparison rules for data comparison.
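The difference between the two rule types can be illustrated with a small in-memory sketch. Here, metric-based comparison is assumed to mean computing aggregate metrics (row count and the sum of a numeric column) on each side and comparing them, and full-text comparison is assumed to mean matching rows by a key column and comparing every column value. These definitions, the helper names `metric_compare` and `full_text_compare`, and the sample rows are illustrative assumptions, not the exact DataWorks rule definitions; the metrics actually offered are those shown in the Configure Comparison Rule section.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def metric_compare(source: List[Row], dest: List[Row], column: str,
                   tolerance: float = 0.0) -> Dict[str, bool]:
    """Compare aggregate metrics of two tables: row count and column sum."""
    src_sum = sum(r[column] for r in source)
    dst_sum = sum(r[column] for r in dest)
    return {
        "row_count_match": len(source) == len(dest),
        "sum_match": abs(src_sum - dst_sum) <= tolerance,
    }


def full_text_compare(source: List[Row], dest: List[Row], key: str) -> Dict[str, list]:
    """Match rows by key and report missing, extra, and mismatched keys."""
    src = {r[key]: r for r in source}
    dst = {r[key]: r for r in dest}
    missing = sorted(k for k in src if k not in dst)  # in source only
    extra = sorted(k for k in dst if k not in src)    # in destination only
    mismatched = sorted(k for k in src if k in dst and src[k] != dst[k])
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


# Hypothetical sample data: same row count and same total amount,
# but the rows themselves differ.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
dest = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 25}]

print(metric_compare(source, dest, "amount"))
print(full_text_compare(source, dest, "id"))
```

In this sample, both metrics match even though the tables differ row by row. Metric-based checks are cheaper because they read only aggregates, while full-text comparison detects row-level differences at the cost of reading every row.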
Configure scheduling properties
After you configure comparison rules, you can configure scheduling properties for the data comparison node. For more information, see Node scheduling configuration.
Step 3: Deploy and perform O&M operations on the data comparison node
Deploy the data comparison node
After you configure the data comparison node, you must commit and deploy it. The system then runs the node periodically based on its scheduling configuration.
Click the icon in the top toolbar to save the node.
Click the icon in the top toolbar to commit the node.
In the Submit dialog box, configure the Change description parameter. Then, based on your business requirements, determine whether to review the node code and perform smoke testing after you commit the node.
Note: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
You can use the code review feature to ensure the code quality of nodes and prevent execution errors caused by invalid node code. If you enable the code review feature, the node code that is committed can be deployed only after the node code passes the code review. For more information, see Code review.
To ensure that a task on the node you created can be run as expected, we recommend that you perform smoke testing before you deploy the node. For more information, see Perform smoke testing.
If your workspace is in standard mode, after you commit the node, you must click Deploy in the upper-right corner of the node configuration tab to deploy the node to the production environment. For more information, see Deploy nodes.
Perform O&M operations on the data comparison node
After the data comparison node is deployed, you can perform O&M operations on the node in Operation Center. For more information, see Operation Center.
View a data comparison report
You can use one of the following methods to view a data comparison report:
View in Operation Center:
In the upper-left corner of the current page, click the icon and choose .
In the left-side navigation pane of the Operation Center page, choose . On the Instance Perspective tab of the page that appears, find the instance that is generated for the data comparison node and choose More > View Runtime Log in the Actions column. On the Running Details tab of the page that appears, click the Data Comparison tab in the Execution step.
View on the Log tab:
If the data comparison node is run only on the DataStudio page, click the link shown in the following figure on the Log tab in the Execution step to go to the data comparison report page and view details.