DataWorks: Data comparison

Last Updated: Nov 27, 2025

DataWorks provides data comparison nodes that allow you to compare data between different tables in multiple ways. You can use data comparison nodes in workflows. This topic describes how to develop tasks by using a data comparison node.

Node introduction

Data comparison nodes are not limited to verifying data integration results: they can compare data between any two tables. They also support custom comparison scopes and custom comparison metrics, which enables more comprehensive data comparisons.

Limitations

Only Serverless resource groups are supported. For more information about how to add and use Serverless resource groups, see Add and use a Serverless resource group.

I. Create a data comparison node

  1. Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose Shortcuts > Data Studio in the Actions column.

  2. In the left navigation pane, click the image icon to go to Data Development. To the right of Project Directory, click the image icon and choose Create Node > Data Quality > Data Comparison. Follow the on-screen instructions to enter the node path and name to complete the node creation.

II. Configure the data comparison node

1. Configure comparison table information

To compare table data across different data sources, the data comparison node only requires basic information about the comparison tables. The configuration details are as follows:

  • Resource Group: Select an existing resource group from the drop-down list.

  • Task Resource Usage: The number of compute units (CUs) allocated to run the data comparison node. Configure this parameter as needed.

  • Data Source Type: Select the data source types to which the source and destination tables belong.

  • Data Source Name: Select the data sources to which the source and destination tables belong.

  • Connectivity: After the configuration is complete, click Test to check whether the data source is connected to the resource group.

  • Table Name: Select the source and destination tables to compare from the drop-down lists.

  • Where Filter: Enter a WHERE condition to filter the data in the source and destination tables. A sample filter is shown after this list.

    Note
    • Do not include the WHERE keyword in the condition.

    • When you compare partitioned tables, we recommend that you specify the partition before execution. Otherwise, the following error occurs: Semantic analysis exception - physical plan generation failed: Table(<MaxCompute project name>,<table name>) is full scan with all partitions, please specify partition predicates.

  • Shard Key: Specify a column in the source table as the shard key. We recommend that you use the primary key or an indexed column as the shard key.
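
For reference, a minimal Where Filter for a partitioned MaxCompute table might look like the following sketch. The partition column pt, the ${bizdate} scheduling parameter, and the order_status column are assumptions made for this example, not values required by the node:

    -- Illustrative Where Filter value (note: no WHERE keyword).
    -- Restricts the comparison to a single partition and filters rows.
    pt = '${bizdate}' AND order_status = 'FINISHED'

Because the partition predicate is present, the comparison avoids the full-scan error described above.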

2. Configure comparison rules

You can configure comparison rules as Metric-based Comparison or Full-text Comparison to compare the source and destination data in different ways.

Metric-based comparison

  • Table-level Comparison:

    Table Row Count Comparison: Metric-based comparison supports table-level comparison, which compares the number of rows in the source and destination tables. If the difference rate of the comparison result is less than the threshold specified by the Error Threshold parameter, the comparison succeeds.

    Note

    The error threshold supports Percentage, Absolute Value, and Exact Match judgment methods.

  • Field-level Comparison:

    For field-level comparison, fields with the same name are compared by default. If the field names in the source and destination tables are different, you can click the Add Comparison Field button to manually select source and destination fields for comparison.

    • Source Field: Select the field from the source table that you want to compare.

    • Destination Field: Select the field from the destination table that you want to compare.

    • Comparison Metric: Four common metrics are supported: MAX, AVG, MIN, and SUM.

      • You can configure multiple comparison metrics for a pair of source and destination table fields.

      • You can set the Error Threshold and Ignored Object parameters to different values for different comparison metrics.

    • Error Threshold: The difference calculated between the source and destination tables is compared with the configured threshold. If the calculated difference is less than the error threshold, the comparison is considered successful. Three threshold types are supported: Percentage, Absolute Value, and Exact Match.

      Note
      • Absolute value = |source table metric value - destination table metric value|

      • Percentage = |source table metric value - destination table metric value| / source table metric value × 100%

      For example, if the source SUM is 1,000 and the destination SUM is 990, the absolute difference is 10 and the percentage difference is 1%.

    • Ignored Object: Different field types support different ignore options:

      • Integer fields (such as INT and BIGINT): You can ignore Difference Between Null Value and Value 0.

      • String fields (such as STRING, VARCHAR, and TEXT): You can ignore Difference Between Null Value and Empty String.

      • Numeric fields (integer and floating-point types): You can set the Floating Precision for the comparison, ignore Difference Between Null Value and Value 0, and Ignore trailing zeros in the decimal part.

      • Integer and string type comparison: You can Ignore trailing zeros in the decimal part.

      • Integer and floating-point type comparison: You can Ignore trailing zeros in the decimal part and ignore Difference Between Null Value and Value 0.

      • Floating-point and string type comparison: You can Ignore trailing zeros in the decimal part.

      For example, with trailing zeros ignored, 1.50 and 1.5 compare as equal.

    • Operation: Delete redundant comparison fields or fields that do not need to be compared.

  • Custom Comparison:

    You can perform the following operations to add custom SQL comparison metrics to compare data in the source and destination tables:

    1. Click the Add Custom SQL Comparison Metric button to add the metrics that you want to compare. You can manually rename the metrics.

      image

    2. Adjust the Error Threshold as needed. You can configure Percentage, Absolute Value, and Exact Match.

    3. After you configure the error threshold, click the configuration entry in the Custom SQL column to write SQL statements that compute the custom metrics for the source and destination tables.

    4. After the configuration is complete, click OK to complete the custom comparison configuration. A sample custom SQL metric is sketched below.
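
As an illustration only, a pair of custom SQL comparison metrics might look like the following. The table names ods_orders and dwd_orders and the columns order_amount and order_status are hypothetical; the node compares the two query results against the configured error threshold:

    -- Hypothetical custom metric for the source table: total order amount,
    -- excluding canceled orders.
    SELECT SUM(order_amount) FROM ods_orders WHERE order_status <> 'CANCELED';

    -- The matching metric configured for the destination table.
    SELECT SUM(order_amount) FROM dwd_orders WHERE order_status <> 'CANCELED';

With a Percentage error threshold of 1%, this comparison succeeds only if the two SUM values differ by less than 1% of the source value.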

Full-text comparison

  1. When you set the comparison method to full-text comparison, you can adjust the Full-text Comparison Method to achieve different comparison effects.

    • Destination Data Contains Source Data: The comparison succeeds as long as every row of source data exists in the destination. In this case, the destination may contain more data than the source.

    • Row-by-Row Comparison: Compares the row counts and row content of the source and destination tables row by row (see the conceptual sketch after these steps).

      When you configure row-by-row comparison, you must also configure the Error Threshold, which supports Percentage, Absolute Value, and Exact Match.

      Note
      • Absolute value = |source table metric value - destination table metric value|

      • Percentage = |source table metric value - destination table metric value| / source table metric value × 100%

  2. After you configure the Full-text Comparison Method, select and configure the fields to compare. By default, fields with the same name are compared. To compare fields with different names, click Add Comparison Field and manually select the source and destination fields. The parameters are as follows:

    • Source Field: Select the source table field to be compared.

    • Destination Field: Select the destination table field to be compared.

    • Comparison Primary Key: Full-text comparison matches source and destination rows by this primary key and then checks whether the remaining fields are identical.

    • Ignored Object: Different field types support different ignore options:

      • Integer fields (such as INT and BIGINT): You can ignore Difference Between Null Value and Value 0.

      • String fields (such as STRING, VARCHAR, and TEXT): You can ignore Difference Between Null Value and Empty String.

      • Numeric fields (integer and floating-point types): You can set the Floating Precision for the comparison, ignore Difference Between Null Value and Value 0, and Ignore trailing zeros in the decimal part.

      • Integer and string type comparison: You can Ignore trailing zeros in the decimal part.

      • Integer and floating-point type comparison: You can Ignore trailing zeros in the decimal part and ignore Difference Between Null Value and Value 0.

      • Floating-point and string type comparison: You can Ignore trailing zeros in the decimal part.

    • Operation: Delete redundant comparison fields or fields that do not need to be compared.

  3. Full-text comparison results must be stored so that you can view the comparison details after the comparison is complete. Configure a data source to store the comparison results:

    • Data Source Type: Only MaxCompute data sources are supported.

    • Data Source Name: Select a MaxCompute data source bound to this workspace from the drop-down list.

    • Connectivity: Ensure that the selected MaxCompute data source is connected to the resource group that you configured in the comparison table information.

    • Storage Table: Click Generate Storage Table to generate a storage table in the format of data_comparison_xxxxxx.

    • Tunnel Quota: Select the data transmission resource for MaxCompute from the drop-down list. For more information, see Purchase and use an exclusive resource group for Data Integration.
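
Conceptually, row-by-row full-text comparison behaves like a full outer join on the comparison primary key: it flags rows that exist on only one side, as well as matched rows whose remaining fields differ. The following SQL is only a sketch with hypothetical names (src_orders, dst_orders, id, name, amount); it does not reflect the node's actual internals or its ignore options:

    -- Conceptual sketch of row-by-row full-text comparison.
    -- Rows are matched on the comparison primary key (id).
    SELECT COALESCE(s.id, d.id) AS id
    FROM src_orders s
    FULL OUTER JOIN dst_orders d ON s.id = d.id
    WHERE d.id IS NULL                               -- row missing in destination
       OR s.id IS NULL                               -- row missing in source
       OR s.name <> d.name OR s.amount <> d.amount;  -- row content differs

Under Destination Data Contains Source Data, only rows that are missing in the destination count as differences; extra destination rows do not cause the comparison to fail.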

3. Scheduling configuration

After completing the rule configuration, you can click Scheduling Configuration on the right side of the page to configure scheduling for the data comparison node. For configuration details, see Configure scheduling for a node.

III. Deployment and operations

1. Deploy the data comparison node

After you configure the data comparison node, you must commit and deploy it. The system then runs the node on a regular basis based on the scheduling configuration.

  1. Click the image icon in the top toolbar to save the node.

  2. Click the image icon in the top toolbar to deploy the node.

For detailed operations on deploying nodes, see Deploy a node or workflow.

2. Operate the data comparison node

After the data comparison node is successfully deployed, you can perform operations on the node in Operation Center. For more information, see Operation Center.

3. View the data validation report

You can view the data validation report in the task run logs by using either of the following methods:

  • View in Operation Center:

    1. Click the image button in the upper-left corner of the page and choose All Products > Data Development And Operations > Operation Center (Workflow) to go to Operation Center.

    2. In the left navigation pane of Operation Center, choose Cycle Task Maintenance > Cycle Instance to view the instances generated by the data comparison node. In the Operation column, click More and select View Running Log.

    3. On the log page, click the Data Comparison tab to view the report.

  • View on the Log tab:

    If you run the data comparison node only on the Data Development page, click the link shown in the following image on the Data Development page to go to the data validation report page.

    image