DataWorks data comparison nodes allow you to compare data between different tables in various ways. You can use these nodes in workflows. This topic describes how to use a data comparison node to develop a task.
Node introduction
Data comparison nodes are not limited to data integration scenarios. They compare data between tables, and you can specify custom comparison ranges and metrics for more flexible comparisons.
Limits
Data comparison nodes support only serverless resource groups. For more information about serverless resource groups, see Resource group management.
Procedure
Step 1: Create a data comparison node
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Click the icon and choose . Follow the on-screen instructions to specify the node path and name.
Step 2: Configure the data comparison node
Configure table information for comparison
You can compare table data from different data sources by configuring the basic information of the tables. The following table describes the parameters.
Parameter | Description |
Resource Group | Select an existing resource group from the drop-down list. |
Task Resource Usage | Adjust the amount of resources that the data comparison node consumes when it runs. |
Data Source Type | Select the data source types for the source and destination tables that you want to compare. |
Data Source Name | Select the data sources for the source and destination tables that you want to compare. |
Connection Status | After you complete the configuration, click Test to check if the data source is connected to the resource group. |
Table Name | Select the source and destination tables from the drop-down list. |
WHERE Condition | Filter the data in the source and destination tables that you want to compare. |
Shard Key | Configure a shard key for the source table. A shard key is a column used to partition the data. We recommend that you use a primary key or an indexed column as the shard key. |
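To illustrate why a primary key or indexed column makes a good shard key, the following minimal sketch splits a numeric key range into contiguous shards and combines each shard with an optional WHERE condition. It is not the DataWorks implementation. The function names `shard_ranges` and `shard_predicates`, the column `id`, and the filter `ds = '20240101'` are hypothetical and used only for illustration.

```python
from typing import List, Optional, Tuple


def shard_ranges(min_key: int, max_key: int, shards: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_key, max_key] into at most
    `shards` contiguous half-open ranges (lo, hi)."""
    if shards < 1:
        raise ValueError("shards must be >= 1")
    if max_key < min_key:
        return []
    total = max_key - min_key + 1
    step = -(-total // shards)  # ceiling division
    ranges = []
    lo = min_key
    while lo <= max_key:
        hi = min(lo + step, max_key + 1)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def shard_predicates(column: str,
                     ranges: List[Tuple[int, int]],
                     where: Optional[str] = None) -> List[str]:
    """Build one SQL predicate per shard, optionally combined with a
    user-supplied WHERE condition (hypothetical helper)."""
    predicates = []
    for lo, hi in ranges:
        predicate = f"{column} >= {lo} AND {column} < {hi}"
        if where:
            predicate = f"({where}) AND {predicate}"
        predicates.append(predicate)
    return predicates


# Example: keys 1..10 split into 3 shards, filtered to one partition.
ranges = shard_ranges(1, 10, 3)
predicates = shard_predicates("id", ranges, where="ds = '20240101'")
```

Each predicate selects a disjoint slice of the table, so the slices can be read independently. Range filters of this kind are only efficient when the database can use an index on the column, which is why a primary key or indexed column is the recommended choice.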
Configure comparison rules
You can configure Metric-based Comparison or Full-text Comparison rules to compare the source data with the destination data.
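The difference between the two rule types can be sketched as follows, assuming that a metric-based comparison checks aggregate values such as row count and column sum, and that a full-text comparison checks the data row by row. This is an illustrative sketch that uses an in-memory SQLite database, not the logic that DataWorks runs. The table names `src` and `dst`, the columns `id` and `amount`, and both function names are hypothetical.

```python
import sqlite3


def metric_comparison(conn, source, target, column):
    """Compare aggregate metrics (row count and column sum) of two tables.
    Returns {metric: (source_value, target_value, is_equal)}."""
    def metrics(table):
        row = conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({column}), 0) FROM {table}"
        ).fetchone()
        return {"count": row[0], "sum": row[1]}

    src, dst = metrics(source), metrics(target)
    return {name: (src[name], dst[name], src[name] == dst[name]) for name in src}


def full_comparison(conn, source, target, key):
    """Compare two tables row by row, matching rows on the `key` column."""
    def load(table):
        cur = conn.execute(f"SELECT * FROM {table}")
        key_index = [d[0] for d in cur.description].index(key)
        return {row[key_index]: row for row in cur}

    src, dst = load(source), load(target)
    return {
        "missing_in_target": sorted(src.keys() - dst.keys()),
        "extra_in_target": sorted(dst.keys() - src.keys()),
        "mismatched": sorted(
            k for k in src.keys() & dst.keys() if src[k] != dst[k]
        ),
    }


# Sample data: the aggregates match, but the rows do not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE dst (id INTEGER PRIMARY KEY, amount INTEGER);
INSERT INTO src VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO dst VALUES (1, 10), (2, 25), (4, 25);
""")

metric_result = metric_comparison(conn, "src", "dst", "amount")
full_result = full_comparison(conn, "src", "dst", "id")
```

In this sample, both tables contain 3 rows with a total amount of 60, so the metric check reports no difference, while the row-by-row check finds a missing row, an extra row, and a changed value. Aggregate checks are cheaper to run, whereas a row-level check detects differences that aggregates can hide.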
Scheduling configuration
After you configure the rules, you can configure scheduling properties for the data comparison node. For more information, see Node scheduling configuration.
Step 3: Deploy and maintain the node
Deploy the data comparison node
After you configure the node task, you must commit and deploy it. After the task is committed and deployed, it runs periodically based on the scheduling configuration.
Click the save icon in the toolbar to save the node.
Click the submit icon in the toolbar to submit the node.
In the Submit dialog box, enter a description in the Change description field. If required, select whether to perform a code review and smoke testing after the node is committed.
Note: You must set the Rerun property and Parent Nodes for the node before you can commit it.
Code review helps control the quality of your task code. It prevents task errors that can occur if incorrect code is published to the production environment without review. If you enable code review, the committed code must be approved by a reviewer before it can be deployed. For more information, see Code review.
To ensure that the scheduled node task runs as expected, we recommend that you perform smoke testing on the task before you deploy it. For more information, see Smoke testing.
If you use a workspace in standard mode, you must also click Deploy in the upper-right corner of the node editing page after the task is committed. This publishes the task to the production environment. For more information, see Deploy tasks.
Maintain the data comparison node
After the data comparison node is deployed, you can perform operations and maintenance (O&M) on the node in the Operation Center. For more information, see Operation Center.
View the data validation report
The data validation report is available from the task run log. You can open it in either of the following ways:
View in the Operation Center:
Click the icon and choose to go to the Operation Center.
In the navigation pane on the left of the Operation Center, choose to view the instance generated for the data comparison node. In the Actions column, click More and select View Runtime Log.
On the log page, click the Data Comparison tab to view the report.
View in the runtime log:
If you run the data comparison node from the Data Development page, you can click the link in the runtime log, as shown in the following figure, to go to the data validation report page.
