DataWorks:Quality monitoring

Last Updated:Jun 21, 2026

The quality monitoring node in DataWorks lets you configure rules to monitor the data quality of tables in your data sources and detect dirty data. You can also customize scheduling policies to periodically run data validation tasks. This topic describes how to use a quality monitoring node to monitor data quality.

Background

The Data Quality feature in DataWorks helps you detect changes in source data and identify dirty data generated during the extract, transform, and load (ETL) process. It automatically intercepts problematic tasks to prevent dirty data from propagating downstream, so that unexpected data does not disrupt operations or affect business decisions. It also reduces troubleshooting time and saves resources by avoiding task reruns. For more information, see Data Quality.

Limitations

  • Supported table types: MaxCompute, E-MapReduce, Hologres, CDH Hive, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and StarRocks.

  • Scope of supported tables:

    • You can monitor only tables in data sources that are bound to the workspace where the quality monitoring node is located.

    • Each node can monitor only one table, but you can configure multiple monitoring rules for it. The monitoring scope varies by table type:

      • For a non-partitioned table, the entire table is monitored by default.

      • For a partitioned table, you must specify a partition to monitor by using a partition filter expression.

      Note

      To monitor multiple tables, create multiple nodes.

  • Operational limitations:

    • Quality monitoring rules created in Data Studio can be run, modified, and deployed only within Data Studio. Although these rules are visible in the Data Quality module, you cannot trigger them on a schedule or manage them from there.

    • If you modify the monitoring rules in a quality monitoring node and then deploy the node, the previously generated monitoring rules are replaced.

Prerequisites

  • A computing engine is bound to your workspace, and the table you want to monitor has been created in it.

    Before you run a data quality monitoring task, you must create the table that the monitoring node will check. For more information, see Bind a computing engine and Develop a node.

  • A resource group has been created.

    Quality monitoring nodes can run only on a serverless resource group. For more information, see Manage resource groups.

  • (Optional, for RAM users) The RAM user for task development is added to the corresponding workspace and granted the Development or Workspace Administrator role. The Workspace Administrator role has extensive permissions and should be granted with caution. For more information about adding and authorizing members, see Add members to a workspace.

Step 1: Create a quality monitoring node

  1. Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and choose Shortcuts > Data Studio in the Actions column.

  2. In the left-side navigation pane, click the Data Development icon to go to Data Development. Click the icon next to Project Directory and choose New Node > Data Quality > Quality Monitor. Follow the on-screen instructions to enter the path and name for the node and create it.

Step 2: Configure quality monitoring rules

1. Select the table to monitor

On the quality monitoring node editor page, click Add Table. In the Add Table dialog box, select the table to monitor. You can use the More filter to locate the table faster.

Note

If your table is not listed, go to My Data in Data Map.

2. Configure the monitoring data scope

  • Non-partitioned table: The entire table is monitored by default. You can skip this step.

  • Partitioned table: You must select the partition data to monitor. You can use scheduling parameters. Click Preview to verify that the partition filter expression is resolved correctly.
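The following minimal sketch illustrates what resolving a partition filter expression means. It is not the DataWorks implementation: the helper `resolve_partition_filter` is hypothetical, it handles only `$[yyyymmdd]` with an optional day offset, and the base date is passed in explicitly. The date that DataWorks actually uses and the full expression syntax are described in Appendix 2: Built-in partition filter expressions.

```python
import re
from datetime import date, timedelta

# Hypothetical helper for illustration only. It supports just
# $[yyyymmdd], $[yyyymmdd-N], and $[yyyymmdd+N], where N is a number of days.
_TOKEN = re.compile(r"\$\[yyyymmdd(?:([+-])(\d+))?\]")


def resolve_partition_filter(expression: str, base_date: date) -> str:
    """Replace each $[yyyymmdd±N] token with a concrete yyyymmdd string."""

    def _substitute(match: re.Match) -> str:
        sign, days = match.group(1), match.group(2)
        offset = int(days) if days else 0
        if sign == "-":
            offset = -offset
        return (base_date + timedelta(days=offset)).strftime("%Y%m%d")

    return _TOKEN.sub(_substitute, expression)


# Assumed base date; in practice, use Preview to see what the node resolves.
print(resolve_partition_filter("ds=$[yyyymmdd-1]", date(2026, 6, 21)))
# prints: ds=20260620
```

Comparing the output of a sketch like this with the result of Preview is a quick way to confirm that an offset points at the partition you expect.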

3. Configure data quality monitoring rules

You can create new rules or import existing ones. Configured rules are enabled by default.

Note
  • When you create rules in a quality monitoring node, you can use the DataWorks Copilot rule recommendation feature to intelligently generate quality rules based on your table information. You can then accept or reject the suggestions as needed.

  • DataWorks Copilot is in public preview in some regions. If it is not available in the region where your workspace is located, you can manually create or import rules as described in this topic.

  • Create a new rule

    Click Create Rule to create a quality monitoring rule based on a template or custom SQL. The following sections describe these methods.

    Built-in template

    You can follow these steps to quickly create a quality monitoring rule from a rule template.

    Note

    You can also find the required rule template in the built-in template list on the left and click + Use to create it.

    Click the + Built-in Template Rule tab on the right side of the dialog box. In the Built-in Templates panel on the left, expand a template category such as Table Row Count, find the target template, and click + Use. In the form on the right, configure the rule parameters and click OK.

    System rule template parameters

    Parameter

    Description

    Rule Name

    Enter a custom name for the rule.

    Template

    Defines the type of rule check to perform on the table.

    Data quality provides a wide range of built-in table-level and field-level rule templates. For more information, see View built-in rule templates.

    Note

    Average, sum, minimum, and maximum values apply only to numeric fields.

    Rule Scope

    The scope to which the rule applies. For a table-level rule, the scope is the current table by default. For a field-level rule, you must select a specific field.

    Comparison Method

    Specifies how the rule validates the table data.

    • Manual Settings: Allows you to manually set the comparison method and thresholds.

      The available comparison methods vary by rule template. The options displayed in the console prevail.

      • Numeric Type: The check result is compared against a fixed expected value. The comparison methods are Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To. You can customize the normal data range (normal threshold) and the abnormal data range (critical threshold).

      • Fluctuation: The change in the check result is compared against a range. The comparison methods are Absolute Value, Raise, and Drop. You can customize the normal data range (normal threshold). Based on the degree of deviation, you can also define a warning range for anomalous data (warning threshold) and a critical range for data that does not meet expectations (critical threshold).

    • Intelligent Dynamic Threshold: The system uses an intelligent algorithm to automatically determine a reasonable threshold, so you do not need to configure one manually. If the system detects abnormal data, it immediately triggers an alert or blocks the associated task. Dynamic thresholds also support strong and weak rules.

      Note

      Only custom SQL, custom range, and dynamic threshold quality rules support the intelligent dynamic threshold comparison method.

    Monitoring Threshold

    • If you set Comparison Method to Manual Settings, you can specify a Normal threshold and an Error Threshold.

      • Normal threshold: A condition that the check result must meet for the data to be considered normal.

      • Error Threshold: A condition that, if met by the check result, indicates critically abnormal data and triggers a critical alert.

    • If the rule performs a Fluctuation check, you must also specify a Warning Threshold.

      • Warning Threshold: A condition that, if met by the check result, indicates a non-critical anomaly that does not affect business operations.

    Retain problem data

    If this feature is enabled and a rule check fails, the system automatically creates a table to store the resulting problem data.

    Important
    • You can configure the retain problem data feature for MaxCompute and Hologres tables.

    • Only some data quality rules support the retain problem data feature.

    • This feature depends on the rule status. If the rule status is Deactivate, the system does not retain problem data.

    Status

    The status of the rule, which can be Enable or Deactivate. This controls whether the rule runs in the production environment.

    Important

    If you set the status to Deactivate, the rule cannot be triggered for a test run or by an associated scheduling task.

    Degree of importance

    The severity of the rule.

    • Strong rule: An important rule. By default, a critical anomaly blocks the associated scheduling task.

    • Weak rule: A regular rule. By default, a critical anomaly does not block the associated scheduling task.

    Configuration Source

    The source of the rule configuration. In this case, the value is Data Quality.

    Description

    An optional description of the rule.
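The Comparison Method and Monitoring Threshold parameters above can be easier to reason about with a small model. The sketch below is a simplified assumption, not the behavior of DataWorks: it treats the warning and error thresholds as percentage cutoffs on a fluctuation rate, and all names (`CheckStatus`, `fluctuation_rate`, `classify_fluctuation`) are hypothetical.

```python
from enum import Enum


class CheckStatus(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    ERROR = "error"


def fluctuation_rate(current: float, baseline: float) -> float:
    """Change of the current value relative to a baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) * 100.0 / abs(baseline)


def classify_fluctuation(rate_pct: float, method: str,
                         warning_pct: float, error_pct: float) -> CheckStatus:
    """Classify a fluctuation rate under an assumed cutoff model.

    method mirrors the Absolute Value, Raise, and Drop comparison methods:
    "absolute" looks at the size of the change in either direction,
    "raise" only at increases, and "drop" only at decreases.
    """
    if method == "absolute":
        magnitude = abs(rate_pct)
    elif method == "raise":
        magnitude = max(rate_pct, 0.0)
    elif method == "drop":
        magnitude = max(-rate_pct, 0.0)
    else:
        raise ValueError(f"unknown comparison method: {method}")

    if magnitude > error_pct:
        return CheckStatus.ERROR
    if magnitude > warning_pct:
        return CheckStatus.WARNING
    return CheckStatus.NORMAL


# Row count went from 1000 to 1150: a 15% increase.
rate = fluctuation_rate(current=1150, baseline=1000)
print(classify_fluctuation(rate, "absolute", warning_pct=10, error_pct=50).value)
# prints: warning
```

Under this model, a drop from 1000 to 400 rows is an error for the "absolute" and "drop" methods but normal for "raise", which shows why the comparison method matters as much as the threshold values.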

    Custom template

    Before you use this method, you must first create a custom rule template in Data Quality > Quality Assets > Rule Template Library. You can then create quality monitoring rules from that template. For more information, see Create and manage custom rule templates.

    The following steps show how to create a data quality rule from a custom template.

    Note

    You can also find the required rule template in the custom template list on the left and click + Use to create it.

    In the Create Rule dialog box, click the + Custom Template Rule tab. Configure Rule Name, Rule Template, Quality Dimension, FLAG Parameter, SQL, Comparison Method, and Monitoring Threshold (including the normal and error thresholds), and then click OK.

    Custom rule template parameters

    This section describes only the parameters that are unique to custom rule templates. For information about other parameters, see System rule template parameters.

    Parameter

    Description

    FLAG parameter

    Defines the SET command to execute before the data quality check SQL runs.

    SQL

    The SQL validation logic. The query must return a single numeric value.

    In the custom SQL, use square brackets to match the partition filter expression of the table. Example:

    SELECT count(*) FROM ${tableName} WHERE ds=$[yyyymmdd];
    Note
    • The ${tableName} variable is dynamically replaced with the name of the monitored table.

    • For more information about how to configure a partition filter expression, see Appendix 2: Built-in partition filter expressions.

    • If you create a quality monitor for the table and configure a rule by using this method, the Data Scope specified for the monitor is ignored. The WHERE clause in this SQL determines which partitions are checked.

    Custom SQL

    This method allows you to define custom data quality validation logic for a table.

    Click the + Custom SQL tab at the top. In the form on the right, configure the Rule Name, Rule Template (select Custom SQL), Quality Dimension, FLAG Parameter, SQL, Comparison Method (choose Manual Settings or Intelligent Dynamic Threshold), and Monitoring Threshold, and then click OK.

    Custom SQL parameters

    This section describes only the parameters that are unique to custom SQL. For information about other parameters, see System rule template parameters.

    Parameter

    Description

    FLAG parameter

    Defines the SET command to execute before the data quality check SQL runs.

    SQL

    The SQL validation logic. The query must return a single numeric value.

    In the custom SQL, use square brackets to match the partition filter expression of the table. Example:

    SELECT count(*) FROM <table_name> WHERE ds=$[yyyymmdd];
    Note
    • In your configuration, you must replace <table_name> with the actual name of the table. This SQL statement determines which table is monitored.

    • For more information about how to configure a partition filter expression, see Appendix 2: Built-in partition filter expressions.

    • If you create a quality monitor for the table and configure a rule by using this method, the Data Scope specified for the monitor is ignored. The WHERE clause in this SQL determines which partitions are checked.
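Because a custom SQL check must return a single numeric value, it can help to try the query shape locally before you configure the rule. The sketch below is only an approximation under stated assumptions: it uses an in-memory SQLite database instead of a DataWorks data source, the table `orders` and the helpers `render_check_sql` and `run_single_value_check` are hypothetical, and the partition value is supplied by hand.

```python
import sqlite3


def render_check_sql(sql: str, partition_value: str) -> str:
    """Substitute a concrete value for the $[yyyymmdd] token (simplified)."""
    return sql.replace("$[yyyymmdd]", partition_value)


def run_single_value_check(conn: sqlite3.Connection, sql: str) -> float:
    """Run the query and enforce the one-row, one-column, numeric contract."""
    rows = conn.execute(sql).fetchall()
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("check SQL must return exactly one row and one column")
    value = rows[0][0]
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("check SQL must return a numeric value")
    return float(value)


# Stand-in for the monitored table; ds plays the role of the partition column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, ds INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20260621), (2, 20260621), (3, 20260620)],
)

check_sql = "SELECT count(*) FROM orders WHERE ds=$[yyyymmdd];"
rendered = render_check_sql(check_sql, "20260621")
print(run_single_value_check(conn, rendered))
# prints: 2.0
```

A query such as `SELECT id FROM orders` would be rejected by this helper because it returns several rows, which is the kind of mistake worth catching before the rule runs on a schedule.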

  • Import existing rules

    If monitoring rules for the target table already exist in the Data Quality module, you can import them to quickly clone the rules. If no rules exist, go to Data Quality to create them first. For more information, see Configure rules: By table (single table).

    Note

    This method supports importing multiple rules in bulk and allows for configuring monitoring rules at the table field level.

    Click Import Rule. You can search for and select the rules to import by rule ID or name, rule template, or associated scope (entire table or specific fields).

    After you select the rules, click OK to complete the import.

Note

After a quality monitoring node is deployed, the rules it contains can be viewed in the Data Quality module, but management operations such as modifying or deleting them are not allowed there.

4. Configure runtime resources

Select the runtime resources for the quality rule checks. This selection determines the data source where the quality monitoring task runs. By default, this is the data source where the monitored table is located.

Note

If you select another data source, confirm that it has access permissions to the table.

Step 3: Configure handling policies

In the Handling Policy section of the node editor page, you can configure handling policies and notification subscriptions for exceptions that the quality monitoring rules detect.

Exception categories

Each exception category combines a rule severity with a check outcome. The following categories are available:

  • Strong rule: Check failed

  • Strong rule: Critical exception

  • Strong rule: Warning exception

  • Weak rule: Check failed

  • Weak rule: Critical exception

  • Weak rule: Warning exception

The terms used in the category names have the following meanings:

  • Strong/Weak: Indicates the severity of the rule.

  • Error: The check result triggered the critical threshold. This typically indicates a serious issue that is likely to affect downstream operations.

  • Warning: The check result triggered the warning threshold. This typically indicates a minor issue that might not affect downstream operations.

  • Check Failed: The quality check failed to run. For example, this can happen if the monitored partition was not generated or the SQL query for the check failed.

Exception handling policies

You can configure handling policies for exceptions generated by rule checks:

  • Do not ignore: If a specific exception category is detected (for example, a strong rule triggers a critical exception), you can configure the system to stop the current node and set its status to Failed.

    Note
    • After the current node fails, downstream nodes will not be executed. This blocks the production pipeline and prevents the spread of problematic data.

    • You can add multiple exception categories to check for.

    • This policy is typically used when an exception has a major impact and needs to block downstream tasks.

  • Ignore: Ignore the exception and continue to execute downstream nodes.
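The relationship between exception categories and handling policies can be modeled as a set lookup. The sketch below is an illustrative assumption rather than the DataWorks implementation: `RuleResult`, `exception_category`, and `should_block` are hypothetical names, and a category is represented as a (severity, outcome) pair.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

Category = Tuple[str, str]  # (severity, outcome), e.g. ("strong", "critical")


@dataclass(frozen=True)
class RuleResult:
    rule_name: str
    severity: str  # "strong" or "weak"
    outcome: str   # "normal", "check_failed", "critical", or "warning"


def exception_category(result: RuleResult) -> Optional[Category]:
    """Return the exception category of a result, or None if it passed."""
    if result.outcome == "normal":
        return None
    return (result.severity, result.outcome)


def should_block(results: Iterable[RuleResult],
                 do_not_ignore: Set[Category]) -> bool:
    """True if any result falls into a category marked Do not ignore."""
    for result in results:
        category = exception_category(result)
        if category is not None and category in do_not_ignore:
            return True
    return False


# Assumed policy: block on strong-rule critical exceptions and failed checks.
policy = {("strong", "critical"), ("strong", "check_failed")}

results = [
    RuleResult("row_count_not_zero", "weak", "critical"),
    RuleResult("null_rate_user_id", "strong", "warning"),
]
print(should_block(results, policy))  # prints: False

results.append(RuleResult("pk_unique", "strong", "critical"))
print(should_block(results, policy))  # prints: True
```

In this model, a blocked node corresponds to the Failed status described above, and any category left out of the set behaves like the Ignore policy.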

Exception notification methods

You can configure how to receive notifications for exceptions (for example, by email). When an exception occurs, the platform sends a notification through the specified method so you can handle the exception promptly.

Note

The platform supports multiple notification methods. The methods displayed in the console prevail. Note the following:

  • Email, Email and SMS, and Phone notifications can only be sent to users under the current account. Make sure the recipients' email addresses and phone numbers are configured correctly. For more information, see View and set alert contacts.

  • For other methods, you must enter the webhook URL for receiving notifications. For instructions on how to obtain this URL, see Obtain a webhook URL.

Step 4: Configure scheduling

To run the node task periodically, click Scheduling Settings on the right side of the node editor page and configure the scheduling properties based on your business requirements. For more information, see Configure scheduling for a node.

Step 5: Debug the task

Perform the following debugging operations as needed to check whether the task runs as expected.

  1. (Optional) Select the runtime resource group and assign values to custom parameters.

    • On the right side of the quality monitoring node, click Run Configuration and configure the Resource Group for Scheduling to use for the debug run.

    • If your task uses scheduling parameters, you can assign values to variables in the Script Parameters section for debugging. For more information about the parameter assignment logic, see Debug a task.

  2. Save and run the task.

    Click the Save icon in the top toolbar to save the task, and then click the Run icon to run it.

    After the task is complete, you can view the run results at the bottom of the node editor page. If the run fails, troubleshoot the issue based on the error message.

Step 6: Deploy the task

After the node task is configured, you must deploy it. After deployment, the node runs periodically according to its scheduling configuration.

Note

When you deploy a quality monitoring node, the quality rules configured within it are also deployed.

  1. In the top toolbar, click the Save icon to save the node.

  2. In the top toolbar, click the Deploy icon to deploy the node.

For more information about how to deploy nodes, see Deploy nodes and workflows.

Next steps

  • Task O&M: After a task is deployed, it runs periodically based on its scheduling configuration. You can click O&M in the upper-right corner of the node editor page to go to Operation Center and view the scheduling and run details of the task, such as the node run status and triggered rule details. For more information, see Manage scheduled tasks.

  • Data Quality: After a quality monitoring rule is deployed, you can go to the Data Quality module to view the rule details. However, you cannot perform management operations such as modifying or deleting the rule there. For more information, see Data Quality.