Configure a data quality test - DataWorks - Alibaba Cloud Documentation Center

In DataWorks DataStudio, you can configure data quality monitoring rules for MaxCompute SQL nodes to validate the output table a node produces. Rules are defined in YAML and bound to the node's SQL code. You can test them in the IDE during development, and after publishing they run automatically after each scheduled execution, so data issues can be caught during development rather than only after deployment.

How it works

Quality rules are authored in the Data Quality Test tab while you write SQL. When you commit and publish the node, the rules are versioned and deployed alongside the SQL code. At runtime, after the SQL executes successfully, the quality check runs automatically. The node's final status reflects both the SQL result and the quality check result.
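The run order can be modeled in a few lines. This is a hypothetical Python sketch, not a DataWorks API: `run_node`, `run_sql`, and `run_quality_check` are illustrative stand-ins. For simplicity, any failed check fails the node here; weak rules behave differently, as described under Rule severity and blocking behavior.

```python
from typing import Callable, List


def run_node(run_sql: Callable[[], bool],
             run_quality_check: Callable[[], bool]) -> str:
    """Hypothetical model of one node run.

    Both arguments are stand-in callables that return True on success.
    """
    if not run_sql():
        # The quality check starts only after the SQL executes successfully.
        return "FAILED: SQL"
    if not run_quality_check():
        # Simplification: every failed check fails the node in this sketch.
        return "FAILED: quality check"
    return "SUCCEEDED"


# Usage: record whether the check ran when the SQL step fails.
calls: List[str] = []


def failing_sql() -> bool:
    return False


def passing_check() -> bool:
    calls.append("quality check")
    return True


status = run_node(failing_sql, passing_check)
print(status)  # FAILED: SQL
print(calls)   # [] because the check never started
```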

Limitations

Currently, data quality rules can be configured directly only for MaxCompute SQL nodes.

Relationship with the Data Quality Center

The Data Governance module's Data Quality Center also supports rule configuration for data tables, with an extensive rule template library and cross-task governance capabilities.

Aspect	IDE-embedded rules	Data Quality Center
Scope	Tightly coupled to a specific node's output	Global, cross-task governance
Best for	Validating node-level output during development	Unified data quality management across pipelines
Rule templates	Access and reuse templates from the Data Quality Center	Full template library and management

Within the IDE's Data Quality Test feature, you can directly reference and reuse rule templates from the Data Quality Center to ensure rule consistency and improve configuration efficiency.

If you configure rules for the same data table in both places, both sets run independently. To avoid duplicate configurations and redundant alerts, pick one primary method per data table.
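If you keep an inventory of rules from both places, a short check can flag the tables that have rules in both, which are the candidates for consolidation. This Python sketch is hypothetical: it assumes you have already exported each inventory as a mapping from table name to rule names, and it does not call any DataWorks API.

```python
from typing import Dict, List, Set


def tables_configured_twice(ide_rules: Dict[str, List[str]],
                            dqc_rules: Dict[str, List[str]]) -> Set[str]:
    """Return tables that have rules in both the IDE and the Data Quality Center.

    Each argument maps a table name to the rule names configured for it.
    Tables whose rule list is empty are ignored.
    """
    ide_tables = {table for table, rules in ide_rules.items() if rules}
    dqc_tables = {table for table, rules in dqc_rules.items() if rules}
    return ide_tables & dqc_tables


# Hypothetical inventories; table and rule names are illustrative only.
ide = {"table1": ["null id check"], "table2": []}
dqc = {"table1": ["row count check"], "table3": ["null id check"]}
print(sorted(tables_configured_twice(ide, dqc)))  # ['table1']
```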

Configure quality rules

Open the Data Quality Test tab

In DataStudio, open a MaxCompute SQL node that contains data output logic (for example, INSERT OVERWRITE).
In the toolbar above the node editor, click Data Quality Test. The Data Quality Test tab opens at the bottom of the IDE.

Write rules

Quality rules follow the Data Quality Spec and are written in YAML. Each rule targets a specific dataset and field, and defines what a passing result looks like.

Generate rules with AI (recommended)

Write rules manually

For precise control, write YAML directly using the Data Quality Spec format. Click a template in the Rule Templates panel on the right to insert a complete rule snippet, then modify it.

- datasets:
    - type: Table
      dataSource:
        name: odps_first
      tables:
        - table1  # Only one table is supported currently; wildcards are not allowed.
      filter: partition:dt=${bizdate}  # Filter by WHERE condition or partition (use "partition:" prefix for partitions).
  rules:
    - templateId: SYSTEM:field:null_value:fixed  # Insert a template from the Rule Templates panel, then customize.
      fields:
        - id
      pass:
        - when = 0
      name: The number of rows with null id is 0
      severity: High
      identity: dq_suggestion_monitor_spec_The number of rows with null id is 0
    # Add further rules here as additional list items.
# Repeat the - datasets: block for each additional table.
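A pass condition such as `when = 0` compares the metric a rule computes (in the example, the number of rows where `id` is null) against a threshold. The following Python sketch evaluates conditions of that shape. It is hypothetical: it assumes a condition is `when`, one comparison operator, and a number, which may be only a subset of what the Data Quality Spec accepts.

```python
import operator
import re
from typing import Callable, Dict

# Comparison operators this sketch assumes; the real spec may support more.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

# Two-character operators come first in the alternation so ">=" is not read as ">".
CONDITION = re.compile(r"^when\s*(!=|>=|<=|=|>|<)\s*(-?\d+(?:\.\d+)?)$")


def passes(condition: str, metric: float) -> bool:
    """Return True if `metric` satisfies a condition such as 'when = 0'."""
    match = CONDITION.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    op, threshold = match.group(1), float(match.group(2))
    return OPERATORS[op](metric, threshold)


print(passes("when = 0", 0))  # True: no rows with a null id
print(passes("when = 0", 3))  # False: three rows with a null id
```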

For the full rule syntax, see Data Quality spec configuration.

YAML is indentation-sensitive. Use consistent indentation throughout your rule definitions.
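Tabs are a common source of indentation errors because YAML does not allow tab characters for indentation. A small pre-check can catch them before you paste rules into the editor. This Python sketch is a hypothetical helper: the tab check reflects a YAML rule, while the `step` check (indentation in multiples of two spaces, as in the example above) is only an assumed consistency convention.

```python
from typing import List, Tuple


def indentation_problems(yaml_text: str, step: int = 2) -> List[Tuple[int, str]]:
    """Flag indentation that is likely to break or confuse a YAML rule file.

    Returns (line_number, message) pairs. Tabs are flagged because YAML does
    not allow tabs for indentation; the `step` check is only a consistency
    convention assumed here, not a YAML requirement.
    """
    problems = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        if not line.strip():
            continue  # Skip blank lines.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            problems.append((number, "tab used for indentation"))
        elif len(indent) % step != 0:
            problems.append((number, f"indent of {len(indent)} is not a multiple of {step}"))
    return problems


sample = "- datasets:\n    - type: Table\n\trules:\n   - templateId: x\n"
for number, message in indentation_problems(sample):
    print(number, message)
```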

The editor includes the following features to speed up rule authoring:

Syntax highlighting: Distinguishes YAML keywords, rule names, functions, table names, and field names with different colors.
Auto-completion: Suggests keywords, built-in rule templates, and available table and field names as you type.
Real-time validation: Flags YAML syntax errors and highlights referenced table or field names that do not exist.
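The field-name part of real-time validation can be approximated offline, for example in a review script. This Python sketch is hypothetical: it assumes the rules are already parsed into dictionaries shaped like the entries under `rules:` and that you have the table's column names as a set. It does not reproduce the editor's actual implementation.

```python
from typing import Dict, List, Set


def unknown_fields(rules: List[Dict], schema: Set[str]) -> List[str]:
    """Return field names referenced by `rules` that are not in `schema`.

    Each rule is a dict with an optional `fields` list, mirroring the YAML.
    Names are reported once, in the order they first appear.
    """
    missing = []
    for rule in rules:
        for field in rule.get("fields", []):
            if field not in schema and field not in missing:
                missing.append(field)
    return missing


# Hypothetical schema and rules; "user_id" is deliberately absent from the schema.
table1_columns = {"id", "name", "dt"}
rules = [
    {"templateId": "SYSTEM:field:null_value:fixed", "fields": ["id"]},
    {"templateId": "SYSTEM:field:null_value:fixed", "fields": ["user_id"]},
]
print(unknown_fields(rules, table1_columns))  # ['user_id']
```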

Test rules

After writing rules, test them in the IDE without waiting for a scheduled run.

Method	How to trigger	What runs
Standalone test	Click Test Run in the Data Quality Test panel	All rules, or a single selected rule
Linked test	Select Trigger data quality test after the node runs successfully, then click Run in the IDE toolbar	SQL executes first; quality test runs on success
Workflow test	Click the run button in the workflow DAG toolbar, or run a single script from the DAG view	Task code executes first; quality test runs after

Publish and monitor

Publish rules

When you commit and publish the node, the YAML rules in the Data Quality Test tab are published to the production environment as part of the node.

To compare rule versions, open the Quality Configuration tab. It shows the rules for the current version alongside historical versions so you can track every change.
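The Quality Configuration tab performs this comparison for you. If you also keep rule YAML outside DataWorks, for example in code review, Python's standard `difflib` module produces a similar line-by-line view. The two versions below are hypothetical; the change raises the rule's severity from `Normal` to `High`.

```python
import difflib

# Two hypothetical versions of the same rule, as YAML text.
v1 = """rules:
  - templateId: SYSTEM:field:null_value:fixed
    fields:
      - id
    pass:
      - when = 0
    severity: Normal
"""
v2 = v1.replace("severity: Normal", "severity: High")

# unified_diff yields diff lines; keepends=True preserves the newlines.
diff = difflib.unified_diff(
    v1.splitlines(keepends=True),
    v2.splitlines(keepends=True),
    fromfile="version 1",
    tofile="version 2",
)
changes = "".join(diff)
print(changes)
```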

Scheduled execution behavior

After publishing, quality rules run automatically every time the node executes on a schedule. The check starts after the SQL node succeeds.

Rule severity and blocking behavior

Rule type	Setting	If the check fails
Strong rule	`severity: High`	The node instance fails and downstream tasks are blocked.
Weak rule	`severity: Normal` or not set	The node instance succeeds normally; downstream tasks are not blocked.
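For a single rule, the table reduces to a small decision function. The Python sketch below is a hypothetical model for reasoning about blocking behavior, not DataWorks code; `CheckOutcome` and `outcome` are illustrative names, and any severity other than `High` is assumed to behave as a weak rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckOutcome:
    instance_status: str
    downstream_blocked: bool


def outcome(check_passed: bool, severity: Optional[str] = None) -> CheckOutcome:
    """Map one rule's check result and severity to the behavior in the table.

    `severity` is the YAML value: "High" for a strong rule; "Normal" or
    None (not set) for a weak rule.
    """
    if check_passed:
        return CheckOutcome("SUCCEEDED", downstream_blocked=False)
    if severity == "High":
        # Strong rule: the failed check fails the instance and blocks downstream.
        return CheckOutcome("FAILED", downstream_blocked=True)
    # Weak rule: the check failed, but the instance still succeeds.
    return CheckOutcome("SUCCEEDED", downstream_blocked=False)


print(outcome(False, "High"))  # CheckOutcome(instance_status='FAILED', downstream_blocked=True)
print(outcome(False))          # CheckOutcome(instance_status='SUCCEEDED', downstream_blocked=False)
```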

For details on configuring severity and pass conditions, see Data Quality Rules.

Alert notifications

If a check result is abnormal (a red alert, orange alert, or check failure), the system sends an email alert to the node owner, regardless of whether the rule blocks the node. Currently, the alert method cannot be changed.
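Because alerting does not depend on severity, the decision to notify depends only on the check result. A hypothetical Python sketch, with made-up result labels standing in for the three abnormal outcomes named in this section:

```python
from typing import Optional

# Hypothetical labels for the abnormal results: red alert, orange alert, check failure.
ABNORMAL_RESULTS = {"red_alert", "orange_alert", "check_failed"}


def should_email_owner(result: str, severity: Optional[str] = None) -> bool:
    """Return True if the node owner receives an email alert.

    `severity` is accepted but deliberately unused: an alert is sent for any
    abnormal result, whether or not the rule blocks the node.
    """
    return result in ABNORMAL_RESULTS


print(should_email_owner("orange_alert", "Normal"))  # True
print(should_email_owner("passed", "High"))          # False
```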
