The DataWorks data comparison node lets you compare data between tables and can be used in a workflow. This topic describes how to develop tasks by using a data comparison node.
Introduction
The data comparison node supports table-to-table comparisons and allows you to customize the comparison scope and metrics for various scenarios. It is not limited to data integration.
Limitations
This feature supports only serverless resource groups. To learn more about using them, see Resource group management.
Procedure
Step 1: Create a data comparison node
- Log on to the DataWorks console. In the target region, click in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.
- Click the icon and choose . Follow the on-screen instructions to enter the node path, name, and other information.
Step 2: Configure the data comparison node
Configure table information
Configure the table information for the data comparison node to compare data from different data sources. The following table describes the parameters.
| Parameter | Description |
| --- | --- |
| Resource Group | Select an existing resource group from the drop-down list. |
| Task Resource Usage | Adjust the amount of resources that the data comparison node consumes when it runs. |
| Data Source Type | Select the data source types for the source and destination tables to be compared. |
| Data Source Name | Select the data sources for the source and destination tables to be compared. |
| Connection Status | After you complete the configuration, click Test to check the connectivity between the data source and the resource group. |
| Table Name | Select the source and destination tables from the drop-down list. |
| WHERE Condition | Filters the data from the source and destination tables for comparison. |
| Sharding Key | Configure a shard key for the source table to partition data by a specific column. Use a primary key or an indexed column as the shard key. |
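To make the WHERE Condition and Sharding Key parameters more concrete, the following is a minimal sketch of how a filter and a shard key can work together: the filter narrows the rows, and the shard key splits the remaining key range into contiguous chunks that can be processed separately. It uses Python's built-in sqlite3 module with an in-memory table. The `orders` table, its columns, and the helpers `shard_ranges` and `count_per_shard` are hypothetical and only illustrate the idea; they do not describe how DataWorks shards data internally.

```python
import sqlite3


def shard_ranges(min_key, max_key, shard_count):
    """Split the inclusive key range [min_key, max_key] into contiguous shards."""
    total = max_key - min_key + 1
    size = -(-total // shard_count)  # ceiling division
    ranges = []
    start = min_key
    while start <= max_key:
        end = min(start + size - 1, max_key)
        ranges.append((start, end))
        start = end + 1
    return ranges


def count_per_shard(conn, table, key, where, shard_count):
    """Count the filtered rows that fall into each shard.

    table, key, and where are interpolated into SQL, so they must come
    from trusted configuration (an assumption of this sketch).
    """
    lo, hi = conn.execute(
        f"SELECT MIN({key}), MAX({key}) FROM {table} WHERE ({where})"
    ).fetchone()
    if lo is None:  # the filter matched no rows
        return []
    counts = []
    for start, end in shard_ranges(lo, hi, shard_count):
        (n,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} "
            f"WHERE ({where}) AND {key} BETWEEN ? AND ?",
            (start, end),
        ).fetchone()
        counts.append(((start, end), n))
    return counts


# Hypothetical sample data: ten orders, odd ids in 'east', even ids in 'west'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "east" if i % 2 else "west", float(i)) for i in range(1, 11)],
)

result = count_per_shard(conn, "orders", "id", "region = 'east'", 2)
print(result)
```

The sketch shards on the primary key `id`, which mirrors the recommendation above to use a primary key or an indexed column: range lookups on such a column stay cheap even when the table is large.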
Configure comparison rules
You can perform a Metric-based Comparison or a Full-text Comparison to compare data between the source and destination tables based on different rules.
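The difference between the two comparison types can be sketched outside DataWorks as follows. This is an illustrative example only, written with Python's built-in sqlite3 module: the tables `src` and `dst`, the chosen metrics (row count and column sum), and the helpers `metric_comparison` and `full_text_comparison` are assumptions made for this sketch, not the rules or implementation that DataWorks uses.

```python
import sqlite3


def metric_comparison(conn, source, target, column):
    """Compare aggregate metrics (row count and column sum) of two tables."""
    query = "SELECT COUNT(*), COALESCE(SUM({c}), 0) FROM {t}"
    src = conn.execute(query.format(c=column, t=source)).fetchone()
    dst = conn.execute(query.format(c=column, t=target)).fetchone()
    return {
        "row_count": {"source": src[0], "target": dst[0], "match": src[0] == dst[0]},
        "sum": {"source": src[1], "target": dst[1], "match": src[1] == dst[1]},
    }


def load_rows(conn, table, key):
    """Load all rows of a table into a dict keyed by the given column."""
    cur = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cur.description]
    k = names.index(key)
    return {row[k]: row for row in cur}


def full_text_comparison(conn, source, target, key):
    """Compare two tables row by row, matching rows on the key column."""
    src = load_rows(conn, source, key)
    dst = load_rows(conn, target, key)
    return {
        "missing_in_target": sorted(src.keys() - dst.keys()),
        "extra_in_target": sorted(dst.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


# Hypothetical sample data for a source and a destination table.
conn = sqlite3.connect(":memory:")
for table in ("src", "dst"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
conn.executemany("INSERT INTO src VALUES (?, ?, ?)",
                 [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)])
conn.executemany("INSERT INTO dst VALUES (?, ?, ?)",
                 [(1, "a", 10.0), (2, "b", 25.0), (4, "d", 25.0)])

metrics = metric_comparison(conn, "src", "dst", "amount")
diff = full_text_comparison(conn, "src", "dst", "id")
print(metrics)
print(diff)
```

In this sample data both metrics match (three rows, a sum of 60.0 in each table), while the row-by-row comparison still finds one missing row, one extra row, and one row with a different value. A metric check is cheaper but coarser, and a row-level check is more thorough but costs more to run.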
Scheduling configuration
After you configure the rules, you can configure the scheduling properties for the data comparison node. For more information, see Configure scheduling properties for a node.
Step 3: Deploy and manage
Deploy the node
After configuring the node, submit and deploy it. The deployed node then runs periodically based on its scheduling configuration.
- In the toolbar, click the icon to Save the node.
- In the toolbar, click the icon to Submit the node. In the Submission dialog box, enter a Change Description. If required, select whether to perform a code review and smoke testing after the node is submitted.
  Note:
  - You must configure the Rerun attribute and Parent Nodes dependencies before you can submit the node.
  - Code review helps ensure code quality and prevents tasks with flawed code from being deployed to the production environment. If code review is enabled, the submitted node code must be approved by a reviewer before it can be deployed. For more information, see Code review.
  - To ensure that the scheduled node runs as expected, we recommend that you perform smoke testing before deployment. For more information, see Smoke testing.
- If you are using a workspace in standard mode, you must also click Deploy in the upper-right corner of the node editing page after you submit the task. This action publishes the task to the production environment. For more information, see Deploy tasks.
Manage the node
After the data comparison node is deployed, you can manage its operations in the Operation Center. For more information, see Operation Center.
Data validation report
You can view the data validation report in the task's runtime log. You can access the report in the following ways:
- View in Operation Center:
  - Click the icon and choose to go to the Operation Center.
  - In the navigation pane on the left of Operation Center, go to to view the node's instances. In the Operation column, click More and select View Runtime Log.
  - On the log page, click the Data Comparison tab to view the report.
- View from the runtime log:
  When you run the data comparison node from the Data Development page, a link to the data validation report appears. Click this link to view the report.
Click url below to view more details: xxx