DataWorks: Data comparison

Last Updated: Jun 21, 2026

DataWorks provides data comparison nodes that allow you to compare data between different tables in multiple ways. You can use data comparison nodes in workflows. This topic describes how to use a data comparison node to develop tasks.

Node introduction

Data comparison nodes can be used in data integration scenarios and to compare data between tables. You can also define custom comparison scopes and custom comparison metrics for more comprehensive comparisons.

Limitations

Only Serverless resource groups are supported. For more information about how to add and use Serverless resource groups, see Add and use a Serverless resource group.

I. Create a data comparison node

  1. Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose Shortcuts > Data Studio in the Actions column.

  2. In the navigation pane on the left, click the image icon to go to Data Development. To the right of the Project Directory, click the image icon and choose New Node > Data Quality > Data Comparison. Follow the prompts to specify a path and name for the node.

II. Configure the data comparison node

1. Configure comparison table information

To compare table data from different data sources, you only need to configure the comparison table information. The following parameters are available:

Parameter

Description

Resource Group

Select an existing resource group from the drop-down list.

Task Resource Usage

The number of compute units (CUs) that are allocated to run the data comparison node. You can configure this parameter as needed.

Data Source Type

Select the data source type of the source table and the data source type of the destination table.

Data Source Name

Select the data source that contains the source table and the data source that contains the destination table.

Connectivity

After you complete the configuration, click Test to verify connectivity between the data source and the resource group.

Table Name

Select the source and destination tables to be compared from the drop-down list.

Note

For a MaxCompute data source, you can select a schema.

WHERE Condition

Enter a WHERE condition to filter data in the source and destination tables.

Note
  • You do not need to enter the WHERE keyword when configuring.

  • When you compare partitioned tables, we recommend that you specify the partition in the WHERE condition. Otherwise, the following error occurs: Semantic analysis exception - physical plan generation failed: Table(<MaxCompute project name>,<table name>) is full scan with all partitions, please specify partition predicates.

Shard Key

Specifies a column in the source table as the shard key. We recommend that you use the primary key or an indexed column as the shard key.
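The WHERE Condition notes above can be illustrated with a minimal Python sketch. It shows how a condition that is entered without the WHERE keyword, and that contains a partition predicate, might be combined with a table name into a filtered row-count query. The helper name build_filtered_count_query, the table name sales_orders, and the partition column pt are hypothetical. The query that DataWorks actually generates is not described in this topic.

```python
def build_filtered_count_query(table: str, condition: str = "") -> str:
    """Return a row-count query, appending the condition after WHERE if present.

    Hypothetical helper for illustration only; it is not a DataWorks API.
    """
    condition = condition.strip()
    # The condition is meant to be entered without the keyword, so a
    # leading WHERE typed by mistake is dropped instead of duplicated.
    parts = condition.split(None, 1)
    if parts and parts[0].upper() == "WHERE":
        condition = parts[1] if len(parts) > 1 else ""
    query = f"SELECT COUNT(*) FROM {table}"
    return f"{query} WHERE {condition}" if condition else query


# A partition predicate on the assumed partition column "pt".
print(build_filtered_count_query("sales_orders", "pt = '20250526'"))
```

In this sketch, the same condition text would be applied to both the source table and the destination table, which matches the description of the WHERE Condition parameter.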

2. Configure comparison rules

You can configure rules for a Metric-based Comparison or a Full-text Comparison to compare source and destination data.

Metric-based comparison

  • Table-level Comparison:

    Table Row Comparison: Metric-based comparison supports table-level checks, such as comparing the row counts of two tables. The comparison succeeds if the difference is within the specified error threshold.

    Note

    The error threshold can be set as a Percentage, Absolute Value, or Consistent or Not.

  • Field-level Comparison:

    By default, fields with the same name are compared. If the field names in the source and destination tables are different, click Add Field for Comparison to manually map a source field to a destination field for comparison.

    • Source Field: Select the field from the source table to compare.

    • Target Field: Select the field from the destination table to compare.

    • Comparison Metric: You can compare fields by using common aggregate functions, including MAX, AVG, MIN, and SUM.

      • You can configure multiple comparison metrics for a pair of source and destination table fields.

      • You can set the Error Threshold and Ignored Object parameters to different values for different comparison metrics.

    • Error Threshold: The comparison succeeds if the actual difference is less than the configured error threshold. You can set the threshold as a Percentage, Absolute Value, or Consistent or Not.

      Note
      • Error Threshold Absolute Value = |Source metric value - Destination metric value|

      • Error Threshold Percentage = (|Source metric value - Destination metric value|) / Source metric value × 100%

    • Ignore: The supported ignore options vary based on the data types of the fields being compared.

      Field type for comparison

      Supported ignore options

      Integer type fields (such as INT and BIGINT)

      You can ignore the Difference Between Null Value and Value 0.

      String type fields (such as STRING, VARCHAR, and TEXT)

      You can ignore the Difference Between Null Value and Empty String.

      Numeric type fields (including integer and floating-point types)

      • Set the Floating Precision for the comparison.

      • You can ignore the Difference Between Null Value and Value 0.

      • You can ignore trailing zeros in the decimal part.

      Integer and string type comparison

      You can ignore trailing zeros in the decimal part.

      Integer and floating-point type comparison

      • You can ignore trailing zeros in the decimal part.

      • You can ignore the Difference Between Null Value and Value 0.

      Floating-point and string type comparison

      You can ignore trailing zeros in the decimal part.

    • Operation: You can remove unneeded comparison fields.

  • Configure Custom Comparison Rules:

    You can perform the following operations to add custom SQL comparison metrics to compare data in the source and destination tables:

    1. Click Add Custom SQL Comparison Metric to add the metrics that you want to compare. You can manually rename the metrics.

      After you add a metric, the table displays options to configure the Error Threshold, write the Custom SQL (by clicking Configure), and delete the metric by using the Delete option.

    2. Adjust the Error Threshold as needed. You can set the threshold as a Percentage, Absolute Value, or Consistent or Not.

    3. After you configure the error threshold, click Custom SQL in the Custom SQL column to write SQL queries for the source and destination tables to define how the metric is calculated.

    4. After you complete the configuration, click OK.
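The metric-based rules described above can be sketched in Python as follows. The sketch computes the MAX, AVG, MIN, and SUM metrics for one column in two in-memory SQLite tables and checks each pair of values against a percentage error threshold by using the formulas from the Note. The function names, table names, and sample data are hypothetical. The sketch also assumes positive source metric values and treats a difference equal to the threshold as passing, which this topic does not state. It is not the DataWorks implementation.

```python
import sqlite3


def within_threshold(source, destination, mode, threshold=0.0):
    """Check one metric pair against an error threshold.

    mode is "consistent", "absolute", or "percentage" (hypothetical names
    for Consistent or Not, Absolute Value, and Percentage).
    """
    diff = abs(source - destination)
    if mode == "consistent":
        return diff == 0
    if mode == "absolute":
        # Assumption: a difference equal to the threshold passes.
        return diff <= threshold
    if mode == "percentage":
        if source == 0:
            raise ValueError("percentage is undefined for a source value of 0")
        # |source - destination| / source * 100, assuming source > 0.
        return diff / source * 100 <= threshold
    raise ValueError(f"unknown mode: {mode}")


def compare_metrics(conn, source_table, destination_table, column, threshold_pct):
    """Compare the four aggregate metrics of one column in two tables."""
    results = {}
    for func in ("MAX", "AVG", "MIN", "SUM"):
        src = conn.execute(f"SELECT {func}({column}) FROM {source_table}").fetchone()[0]
        dst = conn.execute(f"SELECT {func}({column}) FROM {destination_table}").fetchone()[0]
        results[func] = within_threshold(src, dst, "percentage", threshold_pct)
    return results


# Sample data: the last destination value differs from the source value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (amount REAL)")
conn.execute("CREATE TABLE dst_orders (amount REAL)")
conn.executemany("INSERT INTO src_orders VALUES (?)", [(10.0,), (20.0,), (30.0,)])
conn.executemany("INSERT INTO dst_orders VALUES (?)", [(10.0,), (20.0,), (33.0,)])

report = compare_metrics(conn, "src_orders", "dst_orders", "amount", threshold_pct=8.0)
print(report)
```

With this sample data and an 8% threshold, only the MAX metric exceeds the threshold, because 30 and 33 differ by 10% of the source value. A custom SQL comparison metric would follow the same pattern, with one query for the source table and one for the destination table in place of the aggregate function.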

Full-text comparison

  1. If you set the comparison method to full-text, select a Full-text Comparison Type to control how rows are compared.

    • Source Data Contained in Destination: The comparison succeeds if every row in the source table is also present in the destination table. In this case, the destination table may contain more rows than the source table.

    • Comparison by Row: Compares the source and destination tables row by row to identify differences in content and row count.

      When you configure a row-by-row comparison, you must set an Error Threshold. You can set the threshold as a Percentage, Absolute Value, or Consistent or Not.

      Note
      • Error Threshold Absolute Value = |Source metric value - Destination metric value|

      • Error Threshold Percentage = (|Source metric value - Destination metric value|) / Source metric value × 100%

  2. After you configure the Full-text Comparison Type, select the fields to compare. By default, fields with the same name are compared. To compare fields with different names, click Add Comparison Field and select the source and destination fields.

    • Source Field: Select the source table field to compare.

    • Target Field: Select the destination table field to compare.

    • Compare Primary Keys: For a full-text comparison, you must specify a primary key to match rows before comparing their content.

    • Ignore: The supported ignore options vary based on the data types of the fields being compared.

      Field type for comparison

      Supported ignore options

      Integer type fields (such as INT and BIGINT)

      You can ignore the Difference Between Null Value and Value 0.

      String type fields (such as STRING, VARCHAR, and TEXT)

      You can ignore the Difference Between Null Value and Empty String.

      Numeric type fields (including integer and floating-point types)

      • Set the Floating Precision for the comparison.

      • You can ignore the Difference Between Null Value and Value 0.

      • You can ignore trailing zeros in the decimal part.

      Integer and string type comparison

      You can ignore trailing zeros in the decimal part.

      Integer and floating-point type comparison

      • You can ignore trailing zeros in the decimal part.

      • You can ignore the Difference Between Null Value and Value 0.

      Floating-point and string type comparison

      You can ignore trailing zeros in the decimal part.

    • Operation: Click Delete to remove unneeded comparison fields.

  3. Full-text comparison results must be stored so that you can view the comparison details after the comparison is complete. Configure a data source to store the results.

    • Data Source Type: Only MaxCompute data sources are supported.

    • Data Source Name: Select a MaxCompute data source that is bound to the workspace from the drop-down list.

    • Connectivity: Ensure that the selected MaxCompute data source can connect to the resource group that you configured for the comparison tables.

    • Table for Storage: Click Generate Storage Table to create a storage table with a name in the data_comparison_xxxxxx format.

    • Tunnel Quota: Select a MaxCompute data transmission resource from the drop-down list. For more information, see Purchase and use an exclusive resource group for Data Integration.
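The two full-text comparison types and the ignore options described above can be sketched in Python as follows. Rows are matched on a primary key and then compared column by column after optional normalization. All names (normalize, source_contained_in_destination, compare_by_row, and the option names) and the sample rows are hypothetical. For brevity, the options apply to every column, whereas the node lets you configure them per field. This is an illustration of the idea, not the DataWorks implementation.

```python
from decimal import Decimal, InvalidOperation


def normalize(value, *, null_as_zero=False, null_as_empty=False,
              strip_trailing_zeros=False):
    """Apply the selected ignore options to one value before comparison."""
    if value is None:
        if null_as_zero:
            value = 0
        elif null_as_empty:
            value = ""
        else:
            return None
    if strip_trailing_zeros:
        try:
            # Decimal("10.50") == Decimal("10.5"), so trailing zeros and the
            # int/float/string representation no longer matter.
            return Decimal(str(value))
        except InvalidOperation:
            return value  # Not a number: compare the value unchanged.
    return value


def source_contained_in_destination(source_rows, destination_rows, **options):
    """True if every source row also appears in the destination rows."""
    dest = {tuple(normalize(v, **options) for v in row) for row in destination_rows}
    return all(tuple(normalize(v, **options) for v in row) in dest
               for row in source_rows)


def compare_by_row(source, destination, **options):
    """Compare two mappings of primary key -> tuple of column values."""
    common = source.keys() & destination.keys()
    different = [
        key for key in common
        if [normalize(v, **options) for v in source[key]]
        != [normalize(v, **options) for v in destination[key]]
    ]
    return {
        "missing_in_destination": sorted(source.keys() - destination.keys()),
        "extra_in_destination": sorted(destination.keys() - source.keys()),
        "different": sorted(different),
    }


source = {1: ("alice", "10.50"), 2: ("bob", None), 3: ("carol", "7")}
destination = {1: ("alice", 10.5), 2: ("bob", 0), 4: ("dave", 1)}

strict = compare_by_row(source, destination)
relaxed = compare_by_row(source, destination,
                         null_as_zero=True, strip_trailing_zeros=True)
print(strict)
print(relaxed)
```

In the strict run, rows 1 and 2 are reported as different because "10.50" does not equal 10.5 and a null value does not equal 0. With both ignore options enabled, only the unmatched primary keys 3 and 4 remain. In a row-by-row comparison, the number of differing rows would then be checked against the configured error threshold.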

3. Scheduling configuration

After you configure the rules, click Scheduling Settings in the right-side pane to configure the scheduling properties for the data comparison node. For more information, see Configure scheduling for a node.

III. Deployment and operations

1. Deploy the data comparison node

After you configure the data comparison node, you must commit and deploy it. The system then runs the node periodically based on the scheduling configuration.

  1. Click the image icon in the top toolbar to save the node.

  2. Click the image icon in the top toolbar to deploy the node.

For more information about how to deploy nodes, see Deploy a node or workflow.

2. Operate the data comparison node

After the data comparison node is successfully deployed, you can perform operations on the node in Operation Center. For more information, see Operation Center.

3. View the data validation report

You can open the data validation report from the task execution log by using either of the following methods:

  • View in Operation Center:

    1. In the upper-left corner of the page, click the image icon and choose All Products > Data Development and O&M > Operation and Maintenance Center (Workflow) to go to Operation Center.

    2. In the navigation pane on the left of Operation Center, choose Auto Triggered Task O&M > Cycle Examples to view the instances that are generated for the data comparison node. In the Operation column, click More and select View Run Logs.

    3. On the log page, click the Data Comparison tab to view the report.

  • View on the Log tab:

    If you run the data comparison node on the Data Development page, click the link in the log to open the data validation report.

    Run successfully
    Click url below to view more details: https://dqc-cn-shanghai.data.aliyun.com/?defaultProjectId=814397&instanceId=1748247966625d1dfc93063c04f38a65499862196c455#/job/consistency-result-check/detail
    2025-05-26 16:27:14 INFO ========================================================================
    2025-05-26 16:27:14 INFO Exit code of the Shell command 0
    2025-05-26 16:27:14 INFO --- Invocation of Shell command completed ---
    2025-05-26 16:27:14 INFO Shell run successfully!
    2025-05-26 16:27:14 INFO Current task status: FINISH
    2025-05-26 16:27:14 INFO Cost time is: 70.195s
    /home/admin/alisatasknode/taskinfo//20250526/executor/16/25/45/scn55otppszv9757kmhjsjbw/T3_6915242209.log-END-EOF
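If you collect run logs outside the console, the report link can be extracted from the log text. The following minimal Python sketch assumes that the log contains the marker text shown in the sample above, followed by a URL on a single line. The function name extract_report_url and the example.com URL are hypothetical.

```python
import re
from typing import Optional

# Marker text that precedes the report link in the sample log above.
MARKER = "Click url below to view more details:"


def extract_report_url(log_text: str) -> Optional[str]:
    """Return the first URL that follows the marker text, or None if absent."""
    _, found, tail = log_text.partition(MARKER)
    if not found:
        return None
    match = re.search(r"https?://\S+", tail)
    return match.group(0) if match else None


# Placeholder log with a made-up URL, for illustration only.
sample_log = (
    "Run successfully\n"
    "Click url below to view more details: "
    "https://example.com/report?instanceId=abc123#/job/detail\n"
    "2025-05-26 16:27:14 INFO Current task status: FINISH\n"
)
print(extract_report_url(sample_log))
```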