In DataWorks DataStudio, you can configure data quality monitoring rules for MaxCompute SQL nodes to validate the output table a node produces. Rules are defined in YAML, bound to the node's SQL code, and run automatically after each scheduled execution—catching data issues during development rather than after deployment.
How it works
Quality rules are authored in the Data Quality Test tab while you write SQL. When you commit and publish the node, the rules are versioned and deployed alongside the SQL code. At runtime, after the SQL executes successfully, the quality check runs automatically. The node's final status reflects both the SQL result and the quality check result.
Limitations
Currently, data quality rules can be configured directly only for MaxCompute SQL nodes.
Relationship with the Data Quality Center
The Data Governance module's Data Quality Center also supports rule configuration for data tables, with an extensive rule template library and cross-task governance capabilities.
| | IDE-embedded rules | Data Quality Center |
|---|---|---|
| Scope | Tightly coupled to a specific node's output | Global, cross-task governance |
| Best for | Validating node-level output during development | Unified data quality management across pipelines |
| Rule templates | Access and reuse templates from the Data Quality Center | Full template library and management |
Within the IDE's Data Quality Test feature, you can directly reference and reuse rule templates from the Data Quality Center to ensure rule consistency and improve configuration efficiency.
If you configure rules for the same data table in both places, both sets run independently. To avoid duplicate configurations and redundant alerts, pick one primary method per data table.
Configure quality rules
Open the Data Quality Test tab
- In DataStudio, open a MaxCompute SQL node that contains data output logic (for example, INSERT OVERWRITE).
- In the toolbar above the node editor, click Data Quality Test. The Data Quality Test tab opens at the bottom of the IDE.
Write rules
Quality rules follow the Data Quality Spec and are written in YAML. Each rule targets a specific dataset and field, and defines what a passing result looks like.
Generate rules with AI (recommended)
Write rules manually
For precise control, write YAML directly using the Data Quality Spec format. Click a template in the Rule Templates panel on the right to insert a complete rule snippet, then modify it.
- datasets:
    - type: Table
      dataSource:
        name: odps_first
      tables:
        - table1 # Only one table is supported currently; wildcards are not allowed.
      filter: partition:dt=${bizdate} # Filter by WHERE condition or partition (use "partition:" prefix for partitions).
  rules:
    - templateId: SYSTEM:field:null_value:fixed # Insert a template from the Rule Templates panel, then customize.
      fields:
        - id
      pass:
        - when = 0
      name: The number of rows with null id is 0
      severity: High
      identity: dq_suggestion_monitor_spec_The number of rows with null id is 0
    - templateId: #
- datasets:
    #...
For the full rule syntax, see Data Quality spec configuration.
YAML is indentation-sensitive. Use consistent indentation throughout your rule definitions.
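To make the example rule concrete, the following Python sketch mimics what a null-value check with the pass condition `when = 0` evaluates: it counts the rows in one partition whose id is null and passes only if that count is zero. It is an illustration, not how DataWorks executes the rule. The in-memory rows, the `count_nulls` and `passes` helpers, and the reading of `when = 0` as "the metric equals 0" are all assumptions.

```python
# Illustrative sketch only: DataWorks evaluates rules against MaxCompute,
# not against in-memory rows. All names below are hypothetical.

rows = [
    {"id": 1, "dt": "20240101"},
    {"id": None, "dt": "20240101"},
    {"id": 3, "dt": "20240102"},
]


def count_nulls(rows, field, partition_value):
    """Count rows in the given dt partition where `field` is null."""
    return sum(
        1 for r in rows
        if r["dt"] == partition_value and r[field] is None
    )


def passes(metric, expected=0):
    """Assumed reading of `pass: when = 0`: the metric must equal 0."""
    return metric == expected


null_count_0101 = count_nulls(rows, "id", "20240101")
null_count_0102 = count_nulls(rows, "id", "20240102")
print(null_count_0101, passes(null_count_0101))
print(null_count_0102, passes(null_count_0102))
```

In this sketch the first partition contains one null id and fails the check, while the second contains none and passes.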
The editor includes the following features to speed up rule authoring:
- Syntax highlighting: Distinguishes YAML keywords, rule names, functions, table names, and field names with different colors.
- Auto-completion: Suggests keywords, built-in rule templates, and available table and field names as you type.
- Real-time validation: Flags YAML syntax errors and highlights referenced table or field names that do not exist.
Test rules
After writing rules, test them in the IDE without waiting for a scheduled run.
| Method | How to trigger | What runs |
|---|---|---|
| Standalone test | Click Test Run in the Data Quality Test panel | All rules, or a single selected rule |
| Linked test | Select Trigger data quality test after the node runs successfully, then click Run in the IDE toolbar | SQL executes first; quality test runs on success |
| Workflow test | Click the run button in the workflow DAG toolbar, or run a single script from the DAG view | Task code executes first; quality test runs after |
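The ordering in a linked test can be pictured with the short Python sketch below: the quality test is attempted only when the SQL step reports success. The `run_linked_test` function and its two callables are hypothetical stand-ins used for illustration, not DataWorks APIs.

```python
# Hypothetical model of the linked-test ordering; not a DataWorks API.

def run_linked_test(run_sql, run_quality_test):
    """Run the SQL step first; run the quality test only if it succeeded."""
    steps = ["sql"]
    if not run_sql():
        return steps  # SQL failed, so the quality test is skipped
    steps.append("quality_test")
    run_quality_test()
    return steps


ok_run = run_linked_test(lambda: True, lambda: True)
failed_run = run_linked_test(lambda: False, lambda: True)
print(ok_run)
print(failed_run)
```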
Publish and monitor
Publish rules
When you commit and publish the node, the YAML rules in the Data Quality Test tab are published to the production environment as part of the node.
To compare rule versions, open the Quality Configuration tab. It shows the rules for the current version alongside historical versions so you can track every change.
Scheduled execution behavior
After publishing, quality rules run automatically every time the node executes on a schedule. The check starts after the SQL node succeeds.
Rule severity and blocking behavior
| Severity | Setting | If the check fails |
|---|---|---|
| Strong rule | severity: High | The node instance fails and downstream tasks are blocked. |
| Weak rule | severity: Normal or not set | The node instance succeeds normally; downstream tasks are not blocked. |
For details on configuring severity and pass conditions, see Data Quality Rules.
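As a rough model of the severity table, the Python sketch below derives a node instance's outcome from the severities of the rules whose checks failed, assuming the SQL itself already succeeded. The `node_outcome` function and its return shape are hypothetical. Only the `High` and `Normal` values and the blocking behavior come from the table.

```python
# Hypothetical model of strong/weak rule behavior; names are illustrative.

def node_outcome(failed_rule_severities):
    """Map the severities of failed rules to a node outcome.

    failed_rule_severities: one entry per rule whose check failed.
    "High" marks a strong rule; "Normal" or None (not set) marks a weak rule.
    """
    strong_failed = any(s == "High" for s in failed_rule_severities)
    if strong_failed:
        return {"node_status": "failed", "downstream_blocked": True}
    return {"node_status": "succeeded", "downstream_blocked": False}


print(node_outcome(["High", "Normal"]))
print(node_outcome(["Normal", None]))
print(node_outcome([]))
```

A single failed strong rule is enough to fail the node in this model. Failed weak rules leave the node status unchanged.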
Alert notifications
If a check result is abnormal—a red alert, orange alert, or check failure—the system sends an email alert to the node owner, regardless of whether the rule blocks the node. The alert method cannot be changed at this time.