DataWorks: Data comparison node

Last Updated: Mar 26, 2026

DataWorks data comparison nodes let you verify data consistency between tables across different data sources. Use them in workflows to detect discrepancies in row counts, field-level metrics, or row content — without writing custom scripts.

Prerequisites

Before you begin, ensure that you have:

  • A DataWorks workspace with Data Development access

  • At least one serverless resource group. Data comparison nodes run only on serverless resource groups. For details, see Resource group management.

  • Data sources for both the source and destination tables already configured in DataWorks

Step 1: Create a data comparison node

  1. Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select your workspace from the drop-down list and click Go to Data Development.

  2. On the DataStudio page, click the creation icon and choose Create Node > Data Quality > Data Comparison.

  3. Specify the node path and name, then confirm to create the node.

Step 2: Configure the node

Configure table information

Set up the source and destination tables to compare.

  • Resource Group: Select the serverless resource group that runs the node.

  • Task Resource Usage: Adjust the compute resources allocated to the node.

  • Data Source Type: Select the data source type for the source and destination tables.

  • Data Source Name: Select the data sources for the source and destination tables.

  • Connection Status: Click Test to verify that the data source can connect to the resource group.

  • Table Name: Select the source and destination tables from the drop-down list.

  • WHERE Condition: (Optional) Filter the rows to include in the comparison.

  • Shard Key: Specify a column to partition the source data. Use a primary key or an indexed column for best performance.

Choose a comparison type

Select either Metric-based Comparison or Full-text Comparison based on your validation goal.

  • Metric-based Comparison: Verify that row counts or aggregated field values (SUM, AVG, MAX, MIN) are within an acceptable range.

  • Full-text Comparison: Verify that every row's content matches exactly, or that all source rows exist in the destination.

Metric-based comparison

Metric-based comparison checks whether row counts or field-level aggregated metrics differ by more than an acceptable threshold.

Table row comparison

Compares the total number of rows in the source and destination tables. Set an Error Threshold to define the acceptable difference. The comparison passes if the actual difference is within the threshold.

The error threshold can be set as:

  • Percentage: (|Metric value of source table - Metric value of destination table|) / (Metric value of source table) × 100%

  • Absolute Value: |Metric value of source table - Metric value of destination table|

  • Consistent or Not: passes only if the row counts are exactly equal
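The three threshold modes can be sketched as follows. This is a minimal illustration of the arithmetic above, not the DataWorks implementation; the function and parameter names are hypothetical.

```python
def passes_threshold(src: float, dst: float, mode: str, threshold: float = 0) -> bool:
    """Return True if the source/destination difference is within the threshold."""
    diff = abs(src - dst)
    if mode == "percentage":      # (|src - dst|) / src × 100% <= threshold
        return diff / src * 100 <= threshold
    if mode == "absolute":        # |src - dst| <= threshold
        return diff <= threshold
    if mode == "consistent":      # values must match exactly
        return diff == 0
    raise ValueError(f"unknown threshold mode: {mode}")
```

For example, a source with 1,000 rows and a destination with 990 rows passes a 1% Percentage threshold but fails Consistent or Not.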

Field-level comparison

Compares aggregated metric values for specific fields. Fields with the same name are compared by default. To compare fields with different names, click Add Field for Comparison and select the source and destination fields manually.

For each field pair, configure:

  • Source Field: The field from the source table.

  • Destination Field: The field from the destination table to compare against.

  • Comparison Metric: The aggregation to compute before comparing. Valid values: MAX, AVG, MIN, SUM. Add multiple metrics per field pair if needed.

  • Error Threshold: The acceptable difference between the two metric values. Set as Percentage, Absolute Value, or Consistent or Not. Each metric can have its own threshold.

  • Ignored Object: Differences to ignore when comparing. The supported options depend on the field type (see Ignore options by field type below).

Example: To verify that the SUM of the amount column in the source table matches the destination within 5%, set Comparison Metric to SUM and Error Threshold to Percentage with a value of 5.
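This SUM-within-5% check can be sketched end to end. The snippet below uses sqlite3 in memory as a stand-in for the real source and destination data sources; the orders table and amount column are illustrative names, not part of DataWorks.

```python
import sqlite3

def sum_metric(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Compute the SUM comparison metric for one side."""
    return conn.execute(f"SELECT SUM({column}) FROM {table}").fetchone()[0]

# Build toy source and destination tables with slightly different totals.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn, rows in ((src, [100.0, 200.0, 300.0]), (dst, [100.0, 200.0, 290.0])):
    conn.execute("CREATE TABLE orders (amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(r,) for r in rows])

src_sum = sum_metric(src, "orders", "amount")   # 600.0
dst_sum = sum_metric(dst, "orders", "amount")   # 590.0
pct_diff = abs(src_sum - dst_sum) / src_sum * 100
passed = pct_diff <= 5   # Error Threshold: Percentage, value 5
```

Here the totals differ by roughly 1.7%, so the comparison passes the 5% threshold.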

Ignore options by field type

  • Integer (INT, BIGINT, etc.): Difference Between Null Value and Value 0

  • String (STRING, VARCHAR, TEXT, etc.): Difference Between Null Value and Empty String

  • Numeric (integer and floating-point): Floating Precision; Difference Between Null Value and Value 0; Ignore trailing zeros in the decimal part

  • Integer and string comparison: Ignore trailing zeros in the decimal part

  • Integer and floating-point comparison: Ignore trailing zeros in the decimal part; Difference Between Null Value and Value 0

  • Floating-point and string comparison: Ignore trailing zeros in the decimal part
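A rough sketch of what these ignore options mean when two field values are compared. The normalization rules below are illustrative only, not the engine's internals.

```python
def values_equal(a, b, *, null_is_zero=False, null_is_empty=False,
                 ignore_trailing_zeros=False) -> bool:
    """Compare two field values after applying the selected ignore options."""
    def norm(v):
        if v is None:
            if null_is_zero:        # Difference Between Null Value and Value 0
                v = 0
            elif null_is_empty:     # Difference Between Null Value and Empty String
                v = ""
            else:
                return None
        s = str(v)
        if ignore_trailing_zeros and "." in s:  # Ignore trailing zeros in the decimal part
            s = s.rstrip("0").rstrip(".")
        return s
    return norm(a) == norm(b)
```

With Ignore trailing zeros in the decimal part enabled, the integer 3 and the string "3.0" compare as equal; without it, they differ.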

Custom SQL comparison

Add custom SQL expressions as comparison metrics when built-in aggregations are not enough:

  1. Click Add Custom SQL Comparison Metric. Rename the metric if needed.

  2. Set an Error Threshold (Percentage, Absolute Value, or Consistent or Not).

  3. In the Custom SQL column, click Configure and write the SQL expressions for the source and destination tables.

  4. Click Confirm to save.
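For instance, a custom metric might count distinct customers on each side and require exact agreement. The sketch below uses sqlite3 in memory; the SQL expressions, table names, and column names are made up for illustration and are not DataWorks defaults.

```python
import sqlite3

# Custom SQL expressions configured for the source and destination tables
# (hypothetical; any scalar-returning SQL works the same way).
SRC_SQL = "SELECT COUNT(DISTINCT customer_id) FROM orders"
DST_SQL = "SELECT COUNT(DISTINCT customer_id) FROM orders_copy"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (2,), (3,)])
conn.execute("CREATE TABLE orders_copy (customer_id INTEGER)")
conn.executemany("INSERT INTO orders_copy VALUES (?)", [(1,), (2,), (3,)])

src_metric = conn.execute(SRC_SQL).fetchone()[0]  # distinct customers: 3
dst_metric = conn.execute(DST_SQL).fetchone()[0]  # distinct customers: 3
consistent = src_metric == dst_metric             # "Consistent or Not" threshold
```

Note that the duplicate row on the source side does not affect the metric, because the custom SQL counts distinct values.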

Full-text comparison

Full-text comparison checks row-level content, not just aggregates. Use it to find missing rows or content differences between tables.

Select a comparison type

  • Source Data Contained in Destination: Verifies that no source rows are missing from the destination (the destination may contain extra rows). Passes if every source row exists in the destination.

  • Comparison by Row: Finds differences in both row count and content. Compares row by row and requires an error threshold for the acceptable row difference.

Example: If your ETL pipeline copies orders from a source database to a data lake, use Source Data Contained in Destination to confirm no orders were dropped — even if the destination has additional enriched rows.
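A minimal sketch of the containment check in this example. The row tuples and the missing-row report are illustrative, not the actual report format.

```python
def source_contained(src_rows, dst_rows):
    """Return (passed, missing): passed is True when every source row
    exists in the destination; missing lists source rows that do not."""
    dst_set = set(dst_rows)
    missing = [row for row in src_rows if row not in dst_set]
    return len(missing) == 0, missing
```

A destination with extra enriched rows still passes, as long as no source row is absent.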

For Comparison by Row, set an Error Threshold:

  • Percentage: (|Metric value of source table - Metric value of destination table|) / (Metric value of source table) × 100%

  • Absolute Value: |Metric value of source table - Metric value of destination table|

  • Consistent or Not: passes only if all rows match exactly

Select fields to compare

Fields with the same name are compared by default. To compare fields with different names, click Add Field for Comparison and select the source and destination fields.

  • Source Field: The field from the source table.

  • Destination Field: The field from the destination table to compare against.

  • Full-text Comparison Based on Primary Keys: Use the primary key as the join key to compare the other field values row by row.

  • Ignored Object: The same ignore options as in metric-based comparison (see Ignore options by field type above).
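Full-text comparison based on primary keys can be pictured as a join on the key followed by a per-row comparison of the remaining fields. The sketch below assumes dict-shaped rows; the key and field names are hypothetical.

```python
def compare_by_primary_key(src_rows, dst_rows, key="id"):
    """Join rows on the primary key, then compare the remaining fields.
    Returns (diff_keys, missing_keys)."""
    dst_by_key = {row[key]: row for row in dst_rows}
    diff_keys, missing_keys = [], []
    for row in src_rows:
        other = dst_by_key.get(row[key])
        if other is None:
            missing_keys.append(row[key])   # source row absent from destination
        elif any(v != other.get(k) for k, v in row.items() if k != key):
            diff_keys.append(row[key])      # same key, different field values
    return diff_keys, missing_keys
```

In practice the join and comparison run inside the engine, but the outcome is the same: each source key is classified as matched, changed, or missing.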

Configure result storage

Full-text comparison results must be stored so you can review the details after the run. Only MaxCompute data sources are supported for storage.

  • Data Source Type: MaxCompute only.

  • Data Source Name: Select a MaxCompute data source associated with the workspace.

  • Connection Status: Verify that the MaxCompute data source can connect to the resource group configured for the comparison.

  • Table for Storage: Click Generate Storage Table to create a storage table. The table name follows the format data_comparison_xxxxxx.

  • Tunnel Quota: Select the MaxCompute data transmission resources used to write the results. For details, see Purchase and use exclusive resource groups for Data Transmission Service.

Configure scheduling

After configuring the comparison rules, set up scheduling to run the node automatically. For details, see Node scheduling configuration.

Step 3: Deploy the node

  1. Click the save icon in the toolbar to save the node.

  2. Click the submit icon to submit the node. In the Submit dialog box, enter a Change description.

    • Code review (optional): If enabled, a reviewer must approve the code before deployment. This prevents untested code from reaching the production environment. For details, see Code review.

    • Smoke testing (optional): Run a test before deployment to verify the node runs as expected. For details, see Smoke testing.

    You must set the Rerun property and Parent Nodes for the node before you can submit it.
  3. If your workspace uses standard mode, click Deploy in the upper-right corner after submitting to publish the node to the production environment. For details, see Deploy tasks.

After deployment, the node runs automatically on the schedule you configured.

View the data validation report

After a run completes, check the data validation report to see whether the comparison passed and where discrepancies were found.

View from Operation Center

  1. Click the menu icon and choose All Products > Data Development And Task Operation > Operation Center.

  2. In the left navigation pane, choose Auto Triggered Node O&M > Auto Triggered Instances to find the instance for your data comparison node.

  3. In the Actions column, click More and select View Runtime Log.

  4. On the log page, click the Data Comparison tab to view the report.

View from the Data Development page

If you run the node directly from DataStudio, click the link in the runtime log to go to the data validation report page.


Manage the node

After the node is deployed, perform operations and maintenance in Operation Center. For details, see Operation Center.

What's next