DataWorks provides data comparison nodes that allow you to compare data between different tables in multiple ways. You can use data comparison nodes in workflows. This topic describes how to use a data comparison node to develop tasks.
Node introduction
Data comparison nodes are not limited to data integration scenarios. They support comparison between tables in general, with custom comparison scopes and custom comparison metrics for more comprehensive data comparisons.
Limitations
Only Serverless resource groups are supported. For more information about how to add and use Serverless resource groups, see Add and use a Serverless resource group.
I. Create a data comparison node
- Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and choose in the Actions column.
- In the navigation pane on the left, click the icon to go to Data Development. To the right of the Project Directory, click the icon and choose . Follow the prompts to specify a path and name for the node.
II. Configure the data comparison node
1. Configure comparison table information
To compare table data from different data sources, you only need to configure the comparison table information for the data comparison node. The following table describes the parameters:
| Parameter | Description |
| --- | --- |
| Resource Group | Select an existing resource group from the drop-down list. |
| Task Resource Usage | The number of compute units (CUs) that are allocated to run the data comparison node. You can configure this parameter as needed. |
| Data Source Type | Select the types of data sources to which the source and destination tables separately belong. |
| Data Source Name | Select the data sources to which the source and destination tables separately belong. |
| Connectivity | After you complete the configuration, click Test to verify connectivity between the data source and the resource group. |
| Table Name | Select the source and destination tables to be compared from the drop-down list. Note: For a MaxCompute data source, you can select a schema. |
| WHERE Condition | Enter a WHERE condition to filter data in the source and destination tables. |
| Shard Key | Specifies a column in the source table as the shard key. We recommend that you use the primary key or an indexed column as the shard key. |
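The Shard Key recommendation can be illustrated with a small sketch. This topic does not describe how DataWorks actually splits a table, so the following Python code is only a conceptual illustration under stated assumptions: the shard key is an integer column, its minimum and maximum values are known, and each shard is read with a range predicate. The function names `shard_ranges` and `shard_predicates` and the column name `id` are hypothetical. Range predicates of this kind are cheap to evaluate on a primary key or an indexed column, which is one plausible reason for the recommendation.

```python
from typing import List, Tuple


def shard_ranges(min_key: int, max_key: int, shards: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_key, max_key] into contiguous ranges.

    Hypothetical helper: illustrates range-based sharding in general,
    not the sharding logic that DataWorks uses internally.
    """
    if shards < 1:
        raise ValueError("shards must be at least 1")
    if max_key < min_key:
        raise ValueError("max_key must not be smaller than min_key")
    total = max_key - min_key + 1
    shards = min(shards, total)  # never create an empty shard
    base, extra = divmod(total, shards)
    ranges = []
    start = min_key
    for i in range(shards):
        # The first `extra` shards take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def shard_predicates(column: str, min_key: int, max_key: int, shards: int) -> List[str]:
    """Build one range predicate per shard for the given (assumed integer) column."""
    return [
        f"{column} BETWEEN {low} AND {high}"
        for low, high in shard_ranges(min_key, max_key, shards)
    ]


if __name__ == "__main__":
    for predicate in shard_predicates("id", 1, 10, 3):
        print(predicate)
```

Each predicate selects a disjoint slice of the key range, so the slices can be compared independently and together cover every key between the minimum and the maximum.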
2. Configure comparison rules
You can configure rules for a Metric-based Comparison or a Full-text Comparison to compare source and destination data.
Metric-based comparison
Full-text comparison
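To make the difference between the two rule types concrete, the following Python sketch compares two small tables in an in-memory SQLite database. It is an illustration only and does not reflect how DataWorks implements the comparison. The table names `orders_src` and `orders_dst`, the chosen metrics (row count, SUM, MIN, MAX), and the function names are all assumptions made for this example. The optional `where` argument plays the role of the WHERE condition that filters both tables.

```python
import sqlite3


def metric_comparison(conn, source, target, column, where="1=1"):
    """Compare aggregate metrics of one column between two tables.

    Identifiers are interpolated directly into SQL, which is acceptable
    only for this demo with trusted, hard-coded names.
    """
    query = "SELECT COUNT(*), SUM({c}), MIN({c}), MAX({c}) FROM {t} WHERE {w}"
    names = ("row_count", "sum", "min", "max")
    src = conn.execute(query.format(c=column, t=source, w=where)).fetchone()
    dst = conn.execute(query.format(c=column, t=target, w=where)).fetchone()
    return {
        name: {"source": s, "target": d, "match": s == d}
        for name, s, d in zip(names, src, dst)
    }


def full_comparison(conn, source, target, key, where="1=1"):
    """Compare the tables row by row, matching rows on the key column."""
    src = {r[0]: r[1:] for r in conn.execute(f"SELECT {key}, * FROM {source} WHERE {where}")}
    dst = {r[0]: r[1:] for r in conn.execute(f"SELECT {key}, * FROM {target} WHERE {where}")}
    return {
        "missing_in_target": sorted(src.keys() - dst.keys()),
        "missing_in_source": sorted(dst.keys() - src.keys()),
        "different": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


def build_demo():
    """Create two hypothetical tables that differ in a few rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders_src (id INTEGER PRIMARY KEY, amount INTEGER);
        CREATE TABLE orders_dst (id INTEGER PRIMARY KEY, amount INTEGER);
        INSERT INTO orders_src VALUES (1, 10), (2, 20), (3, 30);
        INSERT INTO orders_dst VALUES (1, 10), (2, 25), (4, 25);
        """
    )
    return conn


if __name__ == "__main__":
    demo = build_demo()
    print(metric_comparison(demo, "orders_src", "orders_dst", "amount"))
    print(full_comparison(demo, "orders_src", "orders_dst", "id"))
```

In this demo data, the row count and the sum are identical in both tables even though the rows differ, so those two metrics alone would not reveal the discrepancy. The row-by-row comparison reports the missing and changed keys, at the cost of reading every row.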
3. Scheduling configuration
After you configure the rules, click Scheduling Settings in the right-side pane to configure the scheduling properties for the data comparison node. For more information, see Configure scheduling for a node.
III. Deployment and operations
1. Deploy the data comparison node
After you configure a task on the data comparison node, you must commit and deploy the node. The system then runs the node periodically based on its scheduling configuration.
- Click the icon in the top toolbar to save the node.
- Click the icon in the top toolbar to deploy the node.
For more information about how to deploy nodes, see Deploy a node or workflow.
2. Operate the data comparison node
After the data comparison node is successfully deployed, you can perform operations on the node in Operation Center. For more information, see Operation Center.
3. View the data validation report
You can view the data validation report in the task execution log through the following methods:
- View in Operation Center:
  - In the upper-left corner of the page, click the icon and choose to go to Operation Center.
  - In the navigation pane on the left of Operation Center, choose to view the instances that are generated for the data comparison node. In the Operation column, click More and select View Run Logs.
  - On the log page, click the Data Comparison tab to view the report.
- View on the Log tab:
  If you run the data comparison node on the Data Development page, click the link in the log to open the data validation report.
Run successfully
Click url below to view more details:
https://dqc-cn-shanghai.data.aliyun.com/?defaultProjectId=814397&instanceId=1748247966625d1dfc93063c04f38a65499862196c455#/job/consistency-result-check/detail
2025-05-26 16:27:14 INFO ========================================================================
2025-05-26 16:27:14 INFO Exit code of the Shell command 0
2025-05-26 16:27:14 INFO --- Invocation of Shell command completed ---
2025-05-26 16:27:14 INFO Shell run successfully!
2025-05-26 16:27:14 INFO Current task status: FINISH
2025-05-26 16:27:14 INFO Cost time is: 70.195s
/home/admin/alisatasknode/taskinfo//20250526/executor/16/25/45/scn55otppszv9757kmhjsjbw/T3_6915242209.log-END-EOF
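If you collect run logs outside the console, the report link can be pulled out of the log text with a few lines of Python. The following is a minimal sketch, assuming the log contains the marker line `Click url below to view more details:` as in the sample log above and that the URL itself contains no whitespace. The function name `find_report_url` and the placeholder URL in the demo are hypothetical, and the log format is not a documented contract, so it may change.

```python
import re
from typing import Optional

# Marker text copied from the sample log; treat it as an assumption.
MARKER = "Click url below to view more details:"
_URL_PATTERN = re.compile(r"https?://\S+")


def find_report_url(log_text: str) -> Optional[str]:
    """Return the first URL that follows the marker line, or None if absent."""
    _, found, rest = log_text.partition(MARKER)
    if not found:
        return None
    match = _URL_PATTERN.search(rest)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Placeholder log with a made-up URL, used only to exercise the function.
    sample = (
        "Run successfully\n"
        "Click url below to view more details:\n"
        "https://example.com/?instanceId=123#/job/detail\n"
        "2025-05-26 16:27:14 INFO Current task status: FINISH\n"
    )
    print(find_report_url(sample))
```

The function returns `None` when the marker is missing, which lets a caller distinguish a run that produced no report link from one that did.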