DataWorks data comparison nodes let you verify data consistency between tables across different data sources. Use them in workflows to detect discrepancies in row counts, field-level metrics, or row content — without writing custom scripts.
Prerequisites
Before you begin, ensure that you have:
A DataWorks workspace with Data Development access
At least one serverless resource group. Data comparison nodes run only on serverless resource groups. For details, see Resource group management.
Data sources for both the source and destination tables already configured in DataWorks
Step 1: Create a data comparison node
Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select your workspace from the drop-down list and click Go to Data Development.
On the DataStudio page, click the create icon and choose Create Node > Data Quality > Data Comparison. Specify the node path and name, then confirm to create the node.
Step 2: Configure the node
Configure table information
Set up the source and destination tables to compare.
Parameter | Description |
Resource Group | Select the serverless resource group to run the node. |
Task Resource Usage | Adjust the compute resources allocated to the node. |
Data Source Type | Select the data source type for the source and destination tables. |
Data Source Name | Select the data sources for the source and destination tables. |
Connection Status | Click Test to verify the data source can connect to the resource group. |
Table Name | Select the source and destination tables from the drop-down list. |
WHERE Condition | (Optional) Filter the rows to include in the comparison. |
Shard Key | Specify a column to partition the source data. Use a primary key or indexed column for best performance. |
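The shard key matters because the comparison job splits the source table into independent key ranges that can be compared in parallel. The sketch below is illustrative only; `shard_ranges` is a hypothetical helper, not DataWorks internals:

```python
# Illustrative sketch: how a numeric shard key lets a comparison job
# split a table into independent, contiguous row ranges.

def shard_ranges(min_key, max_key, shards):
    """Split [min_key, max_key] into at most `shards` contiguous ranges."""
    span = max_key - min_key + 1
    step = -(-span // shards)  # ceiling division
    ranges = []
    lo = min_key
    while lo <= max_key:
        hi = min(lo + step - 1, max_key)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# Each range becomes an independent "WHERE shard_key BETWEEN lo AND hi"
# comparison task, which is why a primary key or indexed column performs best.
print(shard_ranges(1, 100, 4))  # -> [(1, 25), (26, 50), (51, 75), (76, 100)]
```

A column with many duplicate or skewed values would produce unbalanced ranges, which is why a unique, indexed column is recommended.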
Choose a comparison type
Select either Metric-based Comparison or Full-text Comparison based on your validation goal.
Goal | Comparison type |
Verify that row counts or aggregated field values (SUM, AVG, MAX, MIN) fall within an acceptable range | Metric-based Comparison |
Verify that every row's content matches exactly, or that all source rows exist in the destination | Full-text Comparison |
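The difference between the two types can be sketched in Python. This is a hypothetical illustration of the two validation styles, not the DataWorks implementation; `metric_comparison` and `full_text_comparison` are made-up helper names, and tables are modeled as lists of (id, value) rows:

```python
# Metric-based: compare aggregates within a tolerance.
def metric_comparison(src, dst, tolerance=0.0):
    """Pass if row counts match and SUM(value) differs by at most `tolerance`."""
    count_ok = len(src) == len(dst)
    sum_src = sum(v for _, v in src)
    sum_dst = sum(v for _, v in dst)
    return count_ok and abs(sum_src - sum_dst) <= tolerance

# Full-text: every source row must exist in the destination with identical content.
def full_text_comparison(src, dst):
    """Return source rows that are missing or mismatched in the destination."""
    dst_rows = set(dst)
    return [row for row in src if row not in dst_rows]

src = [(1, 10.0), (2, 20.0), (3, 30.0)]
dst = [(1, 10.0), (2, 20.5), (3, 29.5)]

print(metric_comparison(src, dst, tolerance=1.0))  # True: counts match, sums within 1.0
print(full_text_comparison(src, dst))              # [(2, 20.0), (3, 30.0)]
```

Note how the same data passes the metric check but fails the full-text check: aggregate drift can cancel out, while row-level differences cannot.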
Configure scheduling
After configuring the comparison rules, set up scheduling to run the node automatically. For details, see Node scheduling configuration.
Step 3: Deploy the node
Click the save icon in the toolbar to save the node.
Click the submit icon to submit the node. In the Submit dialog box, enter a Change description.
Code review (optional): If enabled, a reviewer must approve the code before deployment. This prevents untested code from reaching the production environment. For details, see Code review.
Smoke testing (optional): Run a test before deployment to verify the node runs as expected. For details, see Smoke testing.
You must set the Rerun property and Parent Nodes for the node before you can submit it.
If your workspace uses standard mode, click Deploy in the upper-right corner after submitting to publish the node to the production environment. For details, see Deploy tasks.
After deployment, the node runs automatically on the schedule you configured.
View the data validation report
After a run completes, check the data validation report to see whether the comparison passed and where discrepancies were found.
View from Operation Center
Click the icon and choose All Products > Data Development And Task Operation > Operation Center.
In the left-side navigation pane, choose Auto Triggered Node O&M > Auto Triggered Instances to find the instance for your data comparison node.
In the Actions column, click More and select View Runtime Log.
On the log page, click the Data Comparison tab to view the report.
View from the Data Development page
If you run the node directly from DataStudio, click the link in the runtime log to go to the data validation report page.

Manage the node
After the node is deployed, perform operations and maintenance in Operation Center. For details, see Operation Center.
What's next
Node scheduling configuration — adjust when and how often the node runs
Operation Center — monitor instances and view run history