DataWorks provides data comparison nodes that allow you to compare data between different tables in multiple ways. You can use data comparison nodes in workflows. This topic describes how to use a data comparison node to develop tasks.
Node introduction
Data comparison nodes compare data between tables, including tables that belong to different data sources. You can also define a custom comparison scope and custom comparison metrics for a more comprehensive comparison.
Limitations
Only Serverless resource groups are supported. For more information about how to add and use Serverless resource groups, see Add and use a Serverless resource group.
I. Create a data comparison node
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and open it from the Actions column.
In the navigation pane on the left, click the Data Development icon to access Data Development. On the right side of Project Directory, click the add icon and select the data comparison node type. Follow the on-screen prompts to enter the node path and name, and complete the node creation.
II. Configure the data comparison node
1. Configure comparison table information
To compare table data from different data sources, you only need to configure the comparison table information for the node. The following table describes the parameters.
Parameter | Description |
Resource Group | Select an existing resource group from the drop-down list. |
Task Resource Usage | The number of compute units (CUs) that are allocated to run the data comparison node. You can configure this parameter as needed. |
Data Source Type | Select the data source type of the source table and the data source type of the destination table. |
Data Source Name | Select the data source to which the source table belongs and the data source to which the destination table belongs. |
Connectivity | After the configuration is complete, click Test to check whether the data source is connected to the resource group. |
Table Name | Select the source and destination tables to be compared from the drop-down list. |
Where Filter | Enter a WHERE condition to filter data in the source and destination tables. |
Shard Key | Specify a column in the source table as the shard key. We recommend that you use the primary key or an indexed column as the shard key. |
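The Where Filter and Shard Key parameters can be easier to reason about with a small local example. The following Python sketch is only an illustration under stated assumptions: this topic does not describe how the node uses the shard key internally, so the range-splitting logic, the helper names `shard_ranges` and `count_per_shard`, and the `src` and `dst` tables are all hypothetical. It applies the same WHERE condition to both tables and counts rows per shard-key range by using an in-memory SQLite database.

```python
import sqlite3


def shard_ranges(lo, hi, shards):
    """Split the inclusive key range [lo, hi] into at most `shards` contiguous ranges."""
    size = -(-(hi - lo + 1) // shards)  # ceiling division
    return [(start, min(start + size - 1, hi)) for start in range(lo, hi + 1, size)]


def count_per_shard(conn, table, key, where, ranges):
    """Count the rows that pass the WHERE filter in each shard-key range of one table."""
    counts = []
    for start, end in ranges:
        # Table, key, and filter are trusted literals in this sketch, not user input.
        sql = f"SELECT COUNT(*) FROM {table} WHERE ({where}) AND {key} BETWEEN ? AND ?"
        counts.append(conn.execute(sql, (start, end)).fetchone()[0])
    return counts


# Hypothetical source and destination tables with an integer primary key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, ds TEXT, amount INTEGER);
    CREATE TABLE dst (id INTEGER PRIMARY KEY, ds TEXT, amount INTEGER);
""")
rows = [(i, "20240101" if i <= 8 else "20240102", i * 10) for i in range(1, 11)]
conn.executemany("INSERT INTO src VALUES (?, ?, ?)", rows)
# The destination table is missing the row with id = 5.
conn.executemany("INSERT INTO dst VALUES (?, ?, ?)", [r for r in rows if r[0] != 5])

where = "ds = '20240101'"         # the same filter is applied to both tables
ranges = shard_ranges(1, 10, 3)   # shard on the primary key `id`
src_counts = count_per_shard(conn, "src", "id", where, ranges)
dst_counts = count_per_shard(conn, "dst", "id", where, ranges)

for rng, s, d in zip(ranges, src_counts, dst_counts):
    print(rng, "source:", s, "destination:", d, "match:", s == d)
```

In this sketch, a primary key or an indexed column works well as the shard key because each range lookup can use the index instead of scanning the whole table, which is consistent with the recommendation in the table above.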
2. Configure comparison rules
You can configure comparison rules for Metric-based Comparison or Full-text Comparison to compare the source and destination tables in different ways.
Metric-based comparison
Full-text comparison
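To make the difference between the two rule types concrete, the following Python sketch contrasts them on in-memory rows. It is an illustration under stated assumptions, not the node's implementation: this topic does not list the metrics that the node supports, so the row count and column sum used here, the function names `metric_comparison` and `full_comparison`, and the sample rows are all hypothetical.

```python
def metric_comparison(src_rows, dst_rows, column):
    """Compare aggregate metrics of one numeric column between two tables."""
    def metrics(rows):
        values = [row[column] for row in rows]
        return {"row_count": len(values), "sum": sum(values)}

    src, dst = metrics(src_rows), metrics(dst_rows)
    return {
        name: {"src": src[name], "dst": dst[name], "match": src[name] == dst[name]}
        for name in src
    }


def full_comparison(src_rows, dst_rows, key):
    """Compare rows one by one, matching them on a key column."""
    src = {row[key]: row for row in src_rows}
    dst = {row[key]: row for row in dst_rows}
    return {
        "missing_in_dst": sorted(src.keys() - dst.keys()),
        "extra_in_dst": sorted(dst.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


# Hypothetical sample data: the aggregates agree, but the rows do not.
src_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
dst_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 25}]

metric_result = metric_comparison(src_rows, dst_rows, "amount")
full_result = full_comparison(src_rows, dst_rows, "id")
print(metric_result)
print(full_result)
```

In this sample, both tables have three rows and the same total amount, so the aggregate check reports a match, while the row-level check still finds a missing row, an extra row, and a changed value. Aggregate checks are cheaper, and row-level checks catch differences that aggregates can hide.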
3. Scheduling configuration
After completing the rule configuration, you can click Scheduling Configuration on the right side of the page to configure scheduling for the data comparison node. For configuration details, see Configure scheduling for a node.
III. Deployment and operations
1. Deploy the data comparison node
After a task on the data comparison node is configured, you must commit and deploy the node. After you commit and deploy the node, the system runs the node on a regular basis based on scheduling configurations.
Click the save icon in the top toolbar to save the node.
Click the deploy icon in the top toolbar to deploy the node.
For more information about how to deploy nodes, see Deploy a node or workflow.
2. Operate the data comparison node
After the data comparison node is successfully deployed, you can perform operations on the node in Operation Center. For more information, see Operation Center.
3. View the data validation report
You can view the data validation report in the task execution log by using one of the following methods:
View in Operation Center:
Click the icon in the upper-left corner of the page and go to Operation Center.
In the navigation pane on the left of Operation Center, go to the instance page to view the instances generated by the data comparison node. In the Operation column, click More and select View Running Log.
On the log page, click the Data Comparison tab to view the report.
View on the Log tab:
If you run the data comparison node only on the Data Development page, click the report link on the Log tab of the Data Development page. You are redirected to the data validation report page.

