DataWorks: Data comparison node

Last Updated: Jun 21, 2026

The DataWorks data comparison node lets you compare data between tables and can be used in a workflow. This topic describes how to develop tasks by using a data comparison node.

Introduction

The data comparison node supports table-to-table comparisons and allows you to customize the comparison scope and metrics for various scenarios. It is not limited to data integration.

Limitations

This feature supports only serverless resource groups. To learn more about using them, see Resource group management.

Procedure

Step 1: Create a data comparison node

  1. Log on to the DataWorks console. In the target region, click Data Development and O&M > Data Development in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.

  2. Click the image icon and choose Create Node > Data Quality > Data Comparison.

    Follow the on-screen instructions to enter the node path, name, and other information.

Step 2: Configure the data comparison node

Configure table information

Configure the table information for the data comparison node to compare data from different data sources. The following table describes the parameters.

| Parameter | Description |
| --- | --- |
| Resource Group | Select an existing resource group from the drop-down list. |
| Task Resource Usage | Adjust the amount of resources that the data comparison node consumes when it runs. |
| Data Source Type | Select the data source types for the source and destination tables to be compared. |
| Data Source Name | Select the data sources for the source and destination tables to be compared. |
| Connection Status | After you complete the configuration, click Test to check the connectivity between the data source and the resource group. |
| Table Name | Select the source and destination tables from the drop-down list. |
| WHERE Condition | Filters the data from the source and destination tables for comparison. |
| Sharding Key | Configure a sharding key for the source table to partition data by a specific column. Use a primary key or an indexed column as the sharding key. |

Configure comparison rules

You can perform a Metric-based Comparison or a Full-text Comparison to compare data between the source and destination tables based on different rules.

Metric-based Comparison

  • Table Row Comparison:

    Metric-based Comparison supports table-level comparisons by checking the number of rows. The comparison passes if the difference is within the specified error threshold.

    Note

    The error threshold can be a Percentage, an Absolute Value, or Consistent or Not.

  • Field-level Comparison:

    Field-level comparison matches fields with the same name by default. If the field names in the source and destination tables differ, click Add Field for Comparison to manually select the source and destination fields to create a comparison pair.

    • Source field: Select the field from the source table for comparison.

    • Destination field: Select the field from the destination table for comparison.

    • Comparison Metric: Includes common metrics such as MAX, AVG, MIN, and SUM.

      • You can configure multiple comparison metrics for a single pair of source and destination fields.

      • You can set different error thresholds and ignore options for each comparison metric.

    • Error Threshold: The calculated difference between the source and destination tables is compared against this threshold. The comparison passes if the difference is less than the threshold. You can set the threshold as a Percentage, an Absolute Value, or Consistent or Not.

      Note
      • Absolute value of error threshold = |Source table metric value - Destination table metric value|

      • Percentage of error threshold = (|Source table metric value - Destination table metric value|) / Source table metric value × 100%

    • Ignore: The supported ignore options vary based on the data types of the fields being compared.

      | Field type for comparison | Supported ignore options |
      | --- | --- |
      | Integer type fields (such as INT and BIGINT) | You can ignore the Difference Between Null Value and Value 0. |
      | String type fields (such as STRING, VARCHAR, and TEXT) | You can ignore the Difference Between Null Value and Empty String. |
      | Numeric type fields (including integer and floating-point types) | Set the Floating Precision for the comparison. You can ignore the Difference Between Null Value and Value 0. You can Ignore trailing zeros in the decimal part. |
      | Integer and string type comparison | You can Ignore trailing zeros in the decimal part. |
      | Integer and floating-point type comparison | You can Ignore trailing zeros in the decimal part. You can ignore the Difference Between Null Value and Value 0. |
      | Floating-point and string type comparison | You can Ignore trailing zeros in the decimal part. |

    • Operation: You can remove redundant or unnecessary fields from the field comparison.

  • Configure Custom Comparison Rules:

    You can add custom SQL comparison metrics to compare the source and destination tables. The following steps describe the process:

    1. Click Add Custom SQL Comparison Metric to add a metric. You can manually rename the metric.

      After you add the metric, you can set its error threshold, click Configure to edit the custom SQL content, or click Delete to remove the metric.

    2. Adjust the Error Threshold as needed. Supported options are Percentage, Absolute Value, and Consistent or Not.

    3. After you configure the error threshold, click Configure in the Custom SQL column to write SQL for the source and destination tables to calculate a custom metric.

    4. After you complete the configuration, click OK.
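
The error-threshold formulas and the custom SQL metrics described above can be illustrated with a minimal Python sketch. It is not DataWorks code: it uses the standard-library sqlite3 module, with two in-memory tables standing in for the source and destination tables. The names orders_src, orders_dst, metric_value, and passes are hypothetical. The sketch assumes that each custom SQL statement returns a single numeric value, follows the "less than the threshold" rule stated for field-level comparison, and treats Consistent or Not as requiring a zero difference. How the node handles a source value of 0 in percentage mode is not stated in this topic, so the sketch simply fails the check in that case.

```python
import sqlite3


def metric_value(conn, sql):
    """Run a SQL statement that returns one numeric value and return it."""
    (value,) = conn.execute(sql).fetchone()
    return value


def passes(source, destination, mode, threshold=0.0):
    """Apply the error-threshold formulas to a pair of metric values.

    mode is "absolute", "percentage", or "consistent" (hypothetical names
    for Absolute Value, Percentage, and Consistent or Not).
    """
    diff = abs(source - destination)
    if mode == "consistent":
        return diff == 0
    if mode == "absolute":
        return diff < threshold
    if mode == "percentage":
        if source == 0:
            # Assumption: the percentage is undefined for a zero source value.
            return False
        return diff / source * 100 < threshold
    raise ValueError(f"unknown mode: {mode}")


# Two in-memory tables stand in for the source and destination tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_src (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE orders_dst (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO orders_src VALUES (1, 100.0), (2, 200.0), (3, 300.0);
    INSERT INTO orders_dst VALUES (1, 100.0), (2, 200.0), (3, 297.0);
""")

# One custom SQL statement per side, each yielding a single metric value.
src_sum = metric_value(conn, "SELECT SUM(amount) FROM orders_src")
dst_sum = metric_value(conn, "SELECT SUM(amount) FROM orders_dst")

absolute_ok = passes(src_sum, dst_sum, "absolute", threshold=5)
percentage_ok = passes(src_sum, dst_sum, "percentage", threshold=0.1)
consistent_ok = passes(src_sum, dst_sum, "consistent")
print(absolute_ok, percentage_ok, consistent_ok)
```

In this example the sums differ by 3, so the absolute check with a threshold of 5 passes, while the percentage check with a threshold of 0.1 fails (the difference is 0.5% of the source value) and the consistency check fails.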

Full-text Comparison

  1. Full-text Comparison provides the following methods:

    • Source Data Contained in Destination: The comparison passes if the destination table contains all rows from the source table. The destination table may also contain additional data.

    • Comparison by Row: Compares the row count and content between the source and destination tables row by row.

      When configuring a row-by-row comparison, you must set an error threshold. Supported options are Percentage, Absolute Value, and Consistent or Not.

      Note
      • Absolute value of error threshold = |Source table metric value - Destination table metric value|

      • Percentage of error threshold = (|Source table metric value - Destination table metric value|) / Source table metric value × 100%

  2. After you configure the full-text comparison method, you can select the fields to compare. By default, fields with the same name are compared. To compare fields with different names, click Add Field for Comparison and select the source and destination fields.

    • Source field: Select the field from the source table for comparison.

    • Destination field: Select the field from the destination table for comparison.

    • Full-text Comparison Based on Primary Keys: A full-text comparison requires a primary key to match rows before comparing their content.

    • Ignore: The supported ignore options vary based on the data types of the fields being compared.

      | Field type for comparison | Supported ignore options |
      | --- | --- |
      | Integer type fields (such as INT and BIGINT) | You can ignore the Difference Between Null Value and Value 0. |
      | String type fields (such as STRING, VARCHAR, and TEXT) | You can ignore the Difference Between Null Value and Empty String. |
      | Numeric type fields (including integer and floating-point types) | Set the Floating Precision for the comparison. You can ignore the Difference Between Null Value and Value 0. You can Ignore trailing zeros in the decimal part. |
      | Integer and string type comparison | You can Ignore trailing zeros in the decimal part. |
      | Integer and floating-point type comparison | You can Ignore trailing zeros in the decimal part. You can ignore the Difference Between Null Value and Value 0. |
      | Floating-point and string type comparison | You can Ignore trailing zeros in the decimal part. |

    • Operation: You can Delete redundant or unnecessary fields from the field comparison.

  3. The results of a Full-text Comparison must be stored in a configured data source for later review.

    • Data Source Type: Only MaxCompute data sources are supported.

    • Data Source Name: Select a MaxCompute data source that is bound to the workspace from the drop-down list.

    • Connection Status: Ensure that the selected MaxCompute data source can connect to the resource group that is configured for the table comparison.

    • Table for Storage: Click Generate Storage Table to create a table named data_comparison_xxxxxx.

    • Tunnel quota: Select a MaxCompute data transmission resource. For more information, see Purchase and use exclusive resource groups for Data Transmission Service.
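
The two full-text comparison methods described earlier in this section, together with some of the ignore options, can be sketched in Python. This is an illustrative model under stated assumptions, not the node's implementation: rows are plain dictionaries, the key argument names the primary-key field, and the function names normalize, mismatched_keys, contained_in_destination, and compare_by_row are hypothetical. For brevity the ignore options are applied to every compared field rather than per field type, Floating Precision is not modeled, and the way the node applies an error threshold to a row-by-row comparison is left out because this topic does not define it.

```python
from decimal import Decimal, InvalidOperation


def normalize(value, null_as_zero=False, null_as_empty=False,
              ignore_trailing_zeros=False):
    """Map a field value to a comparable form under the chosen ignore options."""
    if value is None:
        if null_as_zero:
            return 0
        if null_as_empty:
            return ""
        return None
    if ignore_trailing_zeros:
        try:
            # "1.50" and 1.5 both become Decimal("1.5").
            return Decimal(str(value)).normalize()
        except InvalidOperation:
            return value  # not a number, compare as-is
    return value


def mismatched_keys(source, destination, key, fields, **ignore):
    """Return keys of source rows that are missing from, or differ in, the destination."""
    dest_by_key = {row[key]: row for row in destination}
    bad = []
    for row in source:
        other = dest_by_key.get(row[key])
        if other is None or any(
            normalize(row[f], **ignore) != normalize(other[f], **ignore)
            for f in fields
        ):
            bad.append(row[key])
    return bad


def contained_in_destination(source, destination, key, fields, **ignore):
    """Source Data Contained in Destination: extra destination rows are allowed."""
    return not mismatched_keys(source, destination, key, fields, **ignore)


def compare_by_row(source, destination, key, fields, **ignore):
    """Comparison by Row: report row counts and per-row differences."""
    src_keys = {row[key] for row in source}
    return {
        "source_rows": len(source),
        "destination_rows": len(destination),
        "mismatched_or_missing": mismatched_keys(
            source, destination, key, fields, **ignore),
        "extra_in_destination": [
            row[key] for row in destination if row[key] not in src_keys],
    }


source = [
    {"id": 1, "qty": None, "price": "1.50"},
    {"id": 2, "qty": 3, "price": "2.00"},
]
destination = [
    {"id": 1, "qty": 0, "price": 1.5},
    {"id": 2, "qty": 3, "price": 2},
    {"id": 3, "qty": 7, "price": 9.99},
]
fields = ["qty", "price"]

strict = contained_in_destination(source, destination, "id", fields)
relaxed = contained_in_destination(
    source, destination, "id", fields,
    null_as_zero=True, ignore_trailing_zeros=True)
report = compare_by_row(
    source, destination, "id", fields,
    null_as_zero=True, ignore_trailing_zeros=True)
print(strict, relaxed, report)
```

Without ignore options the containment check fails because a null quantity is not equal to 0. With both options enabled it passes, while the row-by-row report still shows that the destination holds one extra row.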

Scheduling configuration

After you configure the rules, you can configure the scheduling properties for the data comparison node. For more information, see Configure scheduling properties for a node.

Step 3: Deploy and manage

Deploy the node

After configuring the node, submit and deploy it. The deployed node then runs periodically based on its scheduling configuration.

  1. In the toolbar, click the Save icon to save the node.

  2. In the toolbar, click the Submit icon to submit the node.

    In the Submission dialog box, enter a Change Description. If required, select whether to perform a code review and smoke testing after the node is submitted.

    Note
    • You must configure the Rerun attribute and Parent Nodes dependencies before you can submit the node.

    • Code review helps ensure code quality and prevents tasks with flawed code from being deployed to the production environment. If code review is enabled, the submitted node code must be approved by a reviewer before it can be deployed. For more information, see Code review.

    • To ensure that the scheduled node runs as expected, we recommend that you perform smoke testing before deployment. For more information, see Smoke testing.

If you are using a workspace in standard mode, you must also click Deploy in the upper-right corner of the node editing page after you submit the task. This action publishes the task to the production environment. For more information, see Deploy tasks.

Manage the node

After the data comparison node is deployed, you can manage its operations in the Operation Center. For more information, see Operation Center.

Data validation report

You can view the data validation report in the task's runtime log. You can access the report in the following ways:

  • View in Operation Center:

    1. Click the image icon and choose All Products > Data Development and O&M > Operation and Maintenance Center (Workflow) to go to the Operation Center.

    2. In the navigation pane on the left of Operation Center, go to Auto Triggered Task O&M > Auto Triggered Instances to view the node's instances. In the Operation column, click More and select View Runtime Log.

    3. On the log page, click the Data Comparison tab to view the report.

  • View from the runtime log:

    When you run the data comparison node from the Data Development page, the runtime log contains a line similar to the following, with a link to the data validation report. Click the link to view the report.

    Click url below to view more details: xxx