DataWorks provides data quality monitoring nodes. You can configure monitoring rules on these nodes to check the data quality of tables in a data source, such as detecting dirty data. You can also define a custom scheduling policy to run monitoring tasks periodically. This topic describes how to use a data quality monitoring node.
Background information
The Data Quality feature in DataWorks helps you detect changes in source data and track dirty data generated during the extract, transform, and load (ETL) process. It automatically blocks tasks that have issues to prevent dirty data from spreading to downstream nodes. This prevents tasks from producing unexpected data that can affect normal operations and business decisions. It also significantly reduces the time spent on troubleshooting and avoids wasting resources on rerunning tasks. For more information, see Data Quality.
Limits
Supported data source types: MaxCompute, E-MapReduce, Hologres, CDH Hive, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and StarRocks.
Supported table scope:
You can monitor only tables in a data source that is bound to the same workspace as the data quality monitoring node.
Each node can monitor only one table, but you can configure multiple monitoring rules for the node. The monitoring scope varies based on the table type:
Non-partitioned tables: The entire table is monitored by default.
Partitioned tables: Specify a partition filter expression to monitor a specific partition.
Note: To monitor multiple tables, you must create multiple data quality monitoring nodes.
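For partitioned tables, the partition filter expression typically targets the most recent data partition. The following sketch shows how such an expression resolves to a concrete partition at run time. The partition column name `dt` and the day offset are illustrative assumptions; use the partition column and expression defined for your own table.

```python
from datetime import date, timedelta

def resolve_partition(offset_days: int = 1) -> str:
    """Resolve a dt=$[yyyymmdd-1]-style partition filter to a concrete value.

    The column name `dt` and the offset are illustrative assumptions, not
    values prescribed by DataWorks.
    """
    run_date = date.today() - timedelta(days=offset_days)
    return f"dt={run_date.strftime('%Y%m%d')}"

# For a task run on 20240102 with a one-day offset, this resolves to dt=20240101.
print(resolve_partition())
```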
Limits on supported operations:
Data quality monitoring rules created in DataStudio can be run, modified, published, and managed only in DataStudio. You can view these rules in the Data Quality module, but you cannot trigger scheduled runs or manage them there.
If you modify the monitoring rules in a data quality monitoring node and then publish the node, the original monitoring rules are replaced.
Prerequisites
A business flow is created.
In Data Development (DataStudio), development operations for different data sources are performed based on business flows. Therefore, you must create a business flow before you create a node. For more information, see Create a business flow.
A data source is created and bound to the current workspace, and the table to be monitored has been created in the data source.
Before you run a data quality monitoring task, you must create the table that the monitoring node will monitor in the data source. For more information, see Data Source Management, Resource Management, and Node development.
A resource group is created.
Data quality monitoring nodes can run only on Serverless resource groups. For more information, see Resource Management.
(Optional, for RAM users) The Resource Access Management (RAM) user used for task development has been added to the workspace and granted the Development or Workspace Manager role. The Workspace Manager role has extensive permissions, so grant it with caution. For more information about adding members and granting permissions, see Add workspace members.
Step 1: Create a data quality monitoring node
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Right-click the target business flow and choose .
In the Create Node dialog box, enter a name for the node and click Confirm. After the node is created, you can develop and configure the task on the node's configuration page.
Step 2: Configure data quality monitoring rules
1. Select the table to monitor
2. Configure the data range for monitoring
3. Configure data quality monitoring rules
4. Configure compute resources
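To illustrate what a monitoring rule checks, the following sketch implements a simple row-count fluctuation rule: the latest row count is compared against the average of recent runs and flagged when it deviates beyond a threshold. The function name, the history window, and the 10% threshold are assumptions for illustration, not DataWorks internals.

```python
def fluctuation_check(history: list, latest: int, threshold: float = 0.10) -> bool:
    """Return True if the latest row count is within `threshold` of the
    historical average; False means the rule is violated.

    `history` holds the row counts of previous runs. An empty history passes
    trivially because there is no baseline to compare against.
    """
    if not history:
        return True
    baseline = sum(history) / len(history)
    if baseline == 0:
        return latest == 0
    return abs(latest - baseline) / baseline <= threshold

# A drop to 600 rows against a ~1005-row baseline deviates by about 40%,
# far beyond the 10% threshold, so the rule is violated.
print(fluctuation_check([1000, 1020, 990, 1010, 1005, 995, 1015], 600))  # False
```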
Step 3: Configure a policy for handling check results
In the Handling Policy section of the node configuration page, you can configure a policy for handling abnormal check results and the method for subscribing to them.
Exception categories
Handling policy for exceptions
Subscription method for exceptions
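Conceptually, the handling policy maps each rule's severity and check outcome to an action, for example blocking downstream tasks when a strong rule fails (so dirty data cannot spread) and only raising an alert when a weak rule fails. The severity labels and action names below are illustrative assumptions, not the exact strings used by DataWorks.

```python
def handle_check_result(severity: str, passed: bool) -> str:
    """Map a rule's severity and check outcome to a handling action.

    A failed "strong" rule blocks downstream tasks; a failed "weak" rule
    only triggers an alert. Labels are illustrative assumptions.
    """
    if passed:
        return "continue"
    return "block_downstream" if severity == "strong" else "alert_only"

print(handle_check_result("strong", False))  # block_downstream
print(handle_check_result("weak", False))    # alert_only
```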
Step 4: Configure task scheduling
To run the created node task periodically, click Properties in the right-side pane of the node configuration page and configure the scheduling properties for the node task as needed. For more information, see Configure scheduling properties for a node.
You must set the Rerun and Parent Nodes properties for the node before you can submit it.
Step 5: Debug the task
Perform the following debugging operations as needed to check whether the task runs as expected.
(Optional) Select a resource group and assign values to custom parameters.
Click the icon in the toolbar. In the Parameters dialog box, select the scheduling resource group to use for debugging. If your task uses scheduling parameters, you can assign values to the variables here for debugging. For more information about the parameter assignment logic, see Task debugging process.
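For instance, if a task expression references a scheduling variable such as `${bizdate}` (a common DataWorks scheduling parameter, assumed here for illustration), assigning debug values amounts to substituting concrete strings for the variables, as this sketch shows:

```python
from string import Template

def assign_debug_values(expr: str, values: dict) -> str:
    """Substitute concrete debug values for scheduling variables in an
    expression such as a partition filter.

    `${bizdate}` is an assumed variable name for illustration; unknown
    variables are left untouched so partial assignment is safe.
    """
    return Template(expr).safe_substitute(values)

print(assign_debug_values("dt=${bizdate}", {"bizdate": "20240101"}))  # dt=20240101
```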
Save and run the task.
Click the save icon in the toolbar to save the task, and then click the run icon to run the task. After the task is complete, you can view the run result at the bottom of the node configuration page. If the run fails, troubleshoot the issue based on the error message.
(Optional) Perform smoke testing.
If you want to perform smoke testing in the development environment to check whether the scheduling node task runs as expected, you can perform smoke testing when you submit the node or after the node is submitted. For more information, see Perform smoke testing.
Step 6: Submit and publish the task
After the node task is configured, submit and publish it. After the node is published, it will run periodically based on its scheduling configuration.
When you submit and publish the node, the quality rules configured for the node are also submitted and published.
Click the save icon in the toolbar to save the node, and then click the submit icon to submit the node task. When you submit the task, enter a Change description in the Submit dialog box. If needed, you can also select whether to perform a code review after the node is submitted.
Note: You must set the Rerun and Parent Nodes properties for the node before you can submit it.
Code review helps control the quality of task configurations and prevents errors that can occur if incorrect configurations are published online without review. If you perform a code review, the submitted node can be published only after it is approved by a reviewer. For more information, see Code review.
If you are using a workspace in standard mode, after the task is submitted successfully, click Deploy in the upper-right corner of the node configuration page to publish the task to the production environment. For more information, see Publish tasks.
Next steps
Task O&M: After the task is submitted and published, it runs periodically based on the node's configuration. You can click Operation Center in the upper-right corner of the node configuration page to go to the Operation Center and view the scheduling and running status of the auto triggered task, including the node status and details of triggered rules. For more information, see Manage auto triggered tasks.
Data Quality: After the data quality monitoring rules are published, you can also go to the Data Quality module to view rule details. However, you cannot perform management operations, such as modifying or deleting the rules. For more information, see Data Quality.