DataWorks: Configure rules for a table

Last Updated: Mar 28, 2026

Data Quality lets you configure quality monitoring rules for data tables. These rules check whether table data meets your requirements, automatically block tasks that produce problematic data, and prevent dirty data from propagating downstream. This topic describes how to configure, run, and manage quality monitoring rules for a specific table.

Prerequisites

Quality rules are configured on tables in a compute engine, so you must first collect the metadata of that engine. For more information, see Metadata acquisition.

Limitations

  • Data source limits: You can configure quality monitoring rules only for MaxCompute, E-MapReduce, Hologres, CDH Hive, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, StarRocks, MySQL, SQL Server, DLF, and Lindorm data sources.

  • Network limits: After a rule is configured, the scheduling node that generates the table data must run on a resource group that has network connectivity to the data source. Otherwise, the node cannot trigger the Data Quality rule check.

  • Rule effectiveness limits: Rules that use a dynamic threshold require at least 21 days of sample data to work correctly. With fewer than 21 days of data, the rule check may fail or produce inaccurate results. If you do not have 21 days of sample data, you can configure the rule, associate it with a scheduling task, and then use the backfill feature to generate the required data.

Core components of quality monitoring

Configuring quality monitoring rules for a table is the core process of defining your data quality validation logic. This process involves four key components:

  1. Monitoring scope: Specifies the target asset for data quality checks. The configuration includes:

    • Monitored Object: Select one or more physical tables for data quality checks. Both partitioned and non-partitioned tables are supported.

    • Data range: For a partitioned table, you must use a partition filter expression to dynamically define which partitions to scan during each check. For example, use $[yyyymmdd-1] to check the partition data from the day before the data timestamp.

  2. Monitoring Rule: Defines the validation logic and measurement standards that determine whether data meets expectations.

    • Rule definition: You can add one or more quality rules to a monitored object. Each rule is instantiated from a rule template. The template can be one of the following types:

      • System template: Use a built-in template provided by DataWorks. System templates cover multiple dimensions, such as integrity, uniqueness, and validity. Examples include "table row count fluctuation" and "field unique value count".

      • Custom template: Create reusable validation logic with custom SQL.

    • Rule properties: Each rule requires you to configure its key properties, including a threshold (for example, fluctuation rate not exceeding 30%) and its severity (strong or weak rule). If a check for a strong rule fails, it can block the associated scheduling task.

  3. Trigger Method: Defines when the quality monitoring task runs.

    • Scheduled trigger: Associates the quality monitoring with an upstream DataWorks scheduling node, typically the one that generates the monitored table. When the scheduling node runs successfully, the associated quality rules are automatically triggered for validation. This is the best practice for automated data quality assurance.

    • Triggered Manually: The quality monitor is not associated with a scheduling task, and you start the check manually from the UI. This method suits temporary, one-time data exploration and validation.

  4. Alert policy: Configures the notification strategy for when data quality issues occur.

    • Alert subscription: You can configure alerts for specific rule check results, such as "failed" or "warning". The system supports sending notifications through various channels, including email, SMS, telephone, DingTalk chatbots, Lark chatbots, WeCom chatbots, and custom webhooks.

After you configure these four components and save the settings, a complete quality monitoring plan is created. Before you deploy it to the production environment, we recommend that you use the test run feature to verify your configuration.
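The four components above can be pictured as one configuration object. The following Python sketch is illustrative only: the class names, field names, and the validate checks are hypothetical and do not correspond to a DataWorks API. It simply mirrors the structure described in this section (scope, rules, trigger method, alert channels).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    """One quality rule instantiated from a template (hypothetical model)."""
    name: str
    template: str          # a system template name, or a custom template
    threshold: str         # free-text here, e.g. "fluctuation rate <= 30%"
    strong: bool = False   # strong rules can block the scheduling task


@dataclass
class QualityMonitor:
    """Groups the four components: scope, rules, trigger, alerts."""
    table: str
    data_range: Optional[str]               # None for a non-partitioned table
    rules: List[Rule] = field(default_factory=list)
    trigger: str = "manual"                 # "manual" or "scheduled"
    scheduling_node: Optional[str] = None   # used when trigger == "scheduled"
    alert_channels: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of configuration problems (empty if none)."""
        problems = []
        if self.trigger not in ("manual", "scheduled"):
            problems.append("unknown trigger method")
        if self.trigger == "scheduled" and not self.scheduling_node:
            problems.append("scheduled trigger needs an associated scheduling node")
        if not self.rules:
            problems.append("no rules attached; nothing will be checked")
        return problems


monitor = QualityMonitor(
    table="orders",
    data_range="ds=$[yyyymmdd-1]",
    rules=[Rule("row count fluctuation", "table row count fluctuation",
                "fluctuation rate <= 30%", strong=True)],
    trigger="scheduled",
    scheduling_node="load_orders",
    alert_channels=["email"],
)
print(monitor.validate())
```

The table name, node name, and channel name in the usage lines are placeholders. In the product, the equivalent settings are entered in the console as described in the procedure that follows.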

Procedure

Step 1: Access the table quality details page

  1. Go to the Data Quality page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Quality. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Quality.

  2. Go to the Configure by Table page.

    In the left-side navigation pane, click Configure Rules > Configure by Table to go to the rule configuration page.

    1. In the Data Source list on the left, select the database that contains the table for which you want to configure rules.

    2. Filter tables by database type, database, table name, or other criteria. Click the target table name, or click Rule Management in the Actions column to go to the table's quality details page.

      This page displays all configured quality monitors and rules for the current table. You can filter rules based on whether they are associated with a quality monitor. You can also define the run configuration for rules that are not yet associated with a quality monitor.

Step 2: Create a quality monitor

  1. Create a quality monitor.

    You can create a quality monitor in one of the following two ways:

    Rule management tab

    On the Table Quality Details page, click the Rule Management tab. Next to Monitor Perspective, click the icon to create a new quality monitor.

    Quality monitoring tab

    On the Table Quality Details page, switch to the Quality monitoring tab. Click Create Monitor.

  2. Configure the parameters for the quality monitor.

    Section

    Parameter

    Description

    Basic Configurations

    Monitor Name

    Enter a custom name for the quality monitor.

    Quality Monitoring Owner

    Specify the owner of this quality monitor. When you configure an alert subscription, you can specify the owner as the alert recipient by using the Email, Email and SMS, or Telephone notification method.

    Monitored Object

    The object for data quality checks. By default, this is the current table.

    Data Range

    Use a partition filter expression to define which partitions of the table the quality rule will check.

    • For a non-partitioned table, you do not need to configure this parameter. The default is Full Table.

    • For a partitioned table, the expression format is partition_name=partition_value. The partition value can be a fixed value or a built-in partition filter expression.

    Note

    This configuration does not take effect when configuring rules with a custom template or custom SQL. For quality rules configured with a custom template or custom SQL, the custom SQL determines which partitions to check.

    Select Quality Rules

    Select Quality Rules

    Selects the quality rules to associate with the quality monitor. These rules will check if the data in the specified range meets expectations.

    Note
    • You can create multiple quality monitors for the same table, each covering a different partition range and associated with its own set of quality rules.

    • If you have not created a quality rule, you can skip this step. You can first create the quality monitor and then add rules to it later. For more information about how to create a quality rule, see Step 3: Configure data quality rules.

    Running Settings

    Trigger Method

    The trigger method for the quality monitor.

    • Triggered by Node Scheduling in Production Environment: Associates the quality monitor with a specific, periodically scheduled task in DataWorks Operation Center. After the task runs successfully, the quality rules in this quality monitor are automatically triggered. Dry-run tasks do not trigger quality rule checks.

    • Triggered Manually: Allows you to manually trigger the quality monitoring rules that are associated with the current quality monitor.

    Important

    If the table you are monitoring is not a MaxCompute table and you set Trigger Method to Triggered by Node Scheduling in Production Environment, the selected periodically scheduled task cannot use a public scheduling resource group. Otherwise, the quality monitor reports an error when it runs.

    Associate Scheduling Node

    If you set Trigger Method to Triggered by Node Scheduling in Production Environment, you can configure this parameter to specify an associated scheduling node. After the specified scheduling node runs successfully, the quality monitoring rules are automatically triggered.

    Select Run Resource

    Specifies the computing resources required to run the quality rule checks. By default, the resource for the data source of the monitored table is selected. If you select another data source, make sure that its resources can access the table.

    Handling Policies

    Quality Issue Handling Policies

    Configure the blocking or alerting policy to use when the system detects a data quality issue.

    • Alert: When a data quality issue is detected, the system sends an alert to the subscribed channels for the quality monitor.

      The default conditions are Strong Rule · Critical Anomaly, Strong Rule · Warning Anomaly, Strong Rule · Check Failed, Weak Rule · Critical Anomaly, Weak Rule · Warning Anomaly, and Weak Rule · Check Failed.

    • Blocks: When a data quality issue is detected, the system identifies the production scheduling node that triggered the table quality check, sets the node to Failed, and prevents downstream nodes from running. This process blocks the production pipeline to prevent problematic data from spreading.

      The default condition is Strong Rule · Critical Anomaly.

      Important

      If you set the policy to Blocks, the system also triggers an alert when a data quality rule's conditions are met.

    Alert Method Configuration

    You can send alert notifications by using Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, Enterprise WeChat Chatbot, Custom Webhook, or Telephone.

    Note
    • To use a chatbot, add a DingTalk, Lark, or WeCom chatbot, obtain its webhook URL, and then paste the URL into the alert subscription.

    • The Custom Webhook notification method is supported only in DataWorks Enterprise Edition. For information about the message format of an alert notification sent by using a Custom Webhook, see Appendix: Webhook message format.

    • If you select Email, Email and SMS, or Telephone as the notification method, you can set Recipient to Data Quality Monitoring Owner, Shift Schedule, or Scheduling Task Owner.

      • Data Quality Monitoring Owner: Alert notifications are sent to the Quality Monitoring Owner specified in the Basic Configurations section of the current quality monitor.

      • Shift Schedule: When an alert for a quality rule check is triggered by a scheduling node that is associated with a quality monitor, the system sends an alert notification to the on-duty user for the current day in the shift schedule.

      • Scheduling Task Owner: Alert notifications are sent to the owner of the scheduling node that is associated with the quality monitor.

  3. Click Save to create the quality monitor.
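The default alert and block conditions listed under Quality Issue Handling Policies can be read as a small decision table. The following Python sketch encodes those defaults. The tuple encoding, the outcome labels, and the function name are assumptions made for illustration and are not part of DataWorks.

```python
# Default conditions as listed in the Handling Policies section above.
# Keys are (rule importance, check outcome); the encoding is hypothetical.
ALERT_CONDITIONS = {
    ("strong", "critical"), ("strong", "warning"), ("strong", "check_failed"),
    ("weak", "critical"), ("weak", "warning"), ("weak", "check_failed"),
}
BLOCK_CONDITIONS = {("strong", "critical")}


def handle(importance, outcome, policy_blocks=True):
    """Return which actions apply to one rule check result.

    policy_blocks=True models a monitor whose policy is set to Blocks;
    False models a monitor that only alerts.
    """
    key = (importance, outcome)
    blocked = policy_blocks and key in BLOCK_CONDITIONS
    # When a block condition is met, an alert is sent as well.
    alerted = blocked or key in ALERT_CONDITIONS
    return {"alert": alerted, "block": blocked}


print(handle("strong", "critical"))
print(handle("weak", "critical"))
print(handle("strong", "normal"))
```

With the defaults, only a critical anomaly on a strong rule sets the triggering node to Failed. Every other listed anomaly sends a notification without stopping the pipeline.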

Step 3: Configure data quality rules

Note

You can configure quality rules based on built-in table-level and field-level monitoring templates. For more information about built-in rule templates, see View built-in rule templates.

  1. On the Table Quality Details page, on the Rule Management tab, select the quality monitor that you created and click Create Rule to go to the rule configuration page.

  2. Create a data quality rule.

    Data Quality provides several methods for configuring quality monitoring rules. Select the one that best suits your business requirements.

    Method 1: System template

    Data Quality provides dozens of built-in quality rule templates. In the left-side pane, click + Use next to a template to quickly create a quality monitoring rule. You can add multiple rules at the same time.

    You can click + System Template Rule at the top and then modify the Template parameter to change the rule template.

    System rule template parameters

    Parameter

    Description

    Rule Name

    You can enter a custom rule name.

    Template

    Defines the type of rule check that you want to perform on the table.

    Data Quality provides a wide range of built-in table-level and field-level monitoring templates that you can use. For more information, see View built-in rule templates.

    Note

    The average, sum, minimum, and maximum values are applicable only to numeric fields.

    Rule Scope

    The scope of the rule. For a table-level rule, the scope is the current table by default. For a field-level rule, you need to select a specific field.

    Comparison Method

    Defines how the rule checks whether the table data meets your expectations.

    • Manual Settings: Customizes the comparison between the data output and the rule.

      Different rule templates support different comparison methods. The options available in the UI may differ.

      • For numeric results, you can compare the result with a fixed value, known as the expected value. The supported comparison methods include Greater Than, Greater Than or Equal To, Equal To, Unequal To, Less Than, and Less Than or Equal To. You can customize the normal data range (normal threshold) and the critical data range (critical threshold).

      • For fluctuation results, you can perform a range comparison. The supported comparison methods include Absolute Value, Raise, and Drop. You can customize the normal data range (normal threshold). You can also define a warning range (warning threshold) and a critical range (critical threshold) based on the degree of deviation.

    • Intelligent Dynamic Threshold: You do not need to manually configure a fluctuation threshold or an expected value. The system uses an intelligent algorithm to automatically determine a reasonable threshold. If the system detects abnormal data, it immediately triggers an alert or blocks the associated task. Dynamic thresholds also support strong and weak rules.

      Note

      Only quality rules created from custom SQL, custom range, and dynamic threshold templates support the intelligent dynamic threshold comparison method.

    Monitoring Threshold

    • If you set Comparison Method to Manual Settings, you can specify Normal threshold and Critical Threshold.

      • Normal threshold: If the quality rule check result meets the value that you specify here, the data is as expected.

      • Critical Threshold: If the quality rule check result meets the value that you specify here, the data is not as expected.

    • If the rule is a fluctuation-type check, you must also specify Warning Threshold.

      • Warning Threshold: If the quality rule check result meets the value that you specify here, the data has an anomaly but it does not affect business operations.

    Retain problem data

    If the rule is enabled and the quality rule check fails, the system automatically creates a table to store any problematic data found during the check.

    Important
    • Currently, you can enable this feature for MaxCompute and Hologres tables.

    • Currently, only some quality monitoring rules support this feature.

    • The rule's status affects this feature. If the rule is Disabled, the system does not retain problem data.

    Status

    The Enable or Disable status of the rule, which controls whether the rule runs in the production environment.

    Important

    If you set the status to Disable, you cannot trigger the rule for a test run, and associated scheduling tasks will not trigger it.

    Degree of importance

    Sets the severity of the rule.

    • Strong rule: An important rule. If a critical anomaly occurs, it blocks the associated scheduling task by default.

    • Weak rule: A regular rule. If a critical anomaly occurs, it does not block the associated scheduling task by default.

    Configuration Source

    Displays the source of the rule configuration. In this case, the value is Data Quality.

    Description

    You can add a supplementary description for the rule.
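To make the Comparison Method and Monitoring Threshold parameters concrete, the following Python sketch evaluates a numeric result against an expected value and classifies a fluctuation result by its absolute value. The threshold layout (two upper bounds on the absolute fluctuation rate) and all function names are assumptions for illustration. The console lets you define each range independently.

```python
import operator

# The comparison methods named above for numeric results.
COMPARATORS = {
    "Greater Than": operator.gt,
    "Greater Than or Equal To": operator.ge,
    "Equal To": operator.eq,
    "Unequal To": operator.ne,
    "Less Than": operator.lt,
    "Less Than or Equal To": operator.le,
}


def meets(value, method, expected):
    """Check a numeric result against a fixed expected value."""
    return COMPARATORS[method](value, expected)


def classify_fluctuation(rate, normal_max, warning_max):
    """Classify a fluctuation rate with an absolute-value comparison.

    Assumed layout (hypothetical, for this sketch only):
      |rate| <= normal_max                -> "normal"
      normal_max < |rate| <= warning_max  -> "warning"
      |rate| > warning_max                -> "critical"
    """
    if not 0 <= normal_max <= warning_max:
        raise ValueError("expected 0 <= normal_max <= warning_max")
    magnitude = abs(rate)
    if magnitude <= normal_max:
        return "normal"
    if magnitude <= warning_max:
        return "warning"
    return "critical"


print(meets(0, "Equal To", 0))
print(classify_fluctuation(-0.05, normal_max=0.10, warning_max=0.30))
print(classify_fluctuation(0.20, normal_max=0.10, warning_max=0.30))
print(classify_fluctuation(-0.45, normal_max=0.10, warning_max=0.30))
```

A Raise or Drop comparison would test only positive or only negative rates instead of the absolute value. That variant is omitted here for brevity.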

    Method 2: Custom template

    Note

    Before you use a custom template to create a rule, you must go to Quality Assets > Rule Template Library to create a custom rule template. For more information, see Create and manage custom rule templates.

    When you use a custom template, the system automatically displays its basic configurations, such as the FLAG parameter and validation SQL. You can specify a custom Rule Name and configure monitoring thresholds based on the rule type. For example, a numeric rule requires a normal threshold and a critical threshold, while a fluctuation-type rule also requires a warning threshold.

    Custom rule template parameters

    This section describes only the parameters that are unique to custom rule templates. For information about other parameters, see the system rule template parameter descriptions.

    Parameter

    Description

    FLAG parameter

    Defines the SET command to execute before the data quality check SQL runs.

    SQL

    Defines the complete SQL validation logic. The query must return a single numeric value in one row and one column.

    In the custom SQL, use square brackets to match the partition filter expression of the table. Example:

    SELECT count(*) FROM ${tableName} WHERE ds=$[yyyymmdd];
    Note
    • The system dynamically replaces the ${tableName} variable with the name of the monitored table.

    • For more information about how to configure partition filter expressions, see Appendix 2: Built-in partition filter expressions.

    • If you have created a quality monitor for the table and configure a rule in this way, the Data Range that you specified in the monitor settings no longer takes effect. The WHERE clause in this SQL statement determines which table partitions the rule checks.
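The requirement that the validation SQL return a single numeric value can be tried out locally before you save a template. The following Python sketch substitutes the two placeholders by plain string replacement and runs the result against an in-memory SQLite table. SQLite, the sample table, and both helper functions are stand-ins chosen for this example. The actual substitution and execution are performed by Data Quality on your engine.

```python
import re
import sqlite3


def render(sql_template, table_name, partition_value):
    """Fill in ${tableName} and $[yyyymmdd] by plain string replacement."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return (sql_template
            .replace("${tableName}", table_name)
            .replace("$[yyyymmdd]", partition_value))


def single_numeric_value(conn, sql):
    """Run the query and enforce the one-row, one-column, numeric contract."""
    rows = conn.execute(sql).fetchall()
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("query must return exactly one row and one column")
    value = rows[0][0]
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("query must return a numeric value")
    return value


# Hypothetical sample data standing in for a partitioned table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, ds TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "20240524"), (2, "20240524"), (3, "20240523")])

template = "SELECT count(*) FROM ${tableName} WHERE ds=$[yyyymmdd];"
sql = render(template, "orders", "20240524")
print(sql)
print(single_numeric_value(conn, sql))
```

A query such as SELECT id, ds FROM orders fails the check in this sketch because it returns several rows and columns, which is the kind of statement the single-value requirement rules out.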

    Method 3: Custom SQL

    This method allows you to customize the data quality validation logic for the table.

    Custom SQL parameters

    This section describes only the parameters that are unique to custom SQL. For information about other parameters, see the system rule template parameter descriptions.

    Parameter

    Description

    FLAG parameter

    Defines the SET command to execute before the data quality check SQL runs.

    SQL

    Defines the complete SQL validation logic. The query must return a single numeric value in one row and one column.

    In the custom SQL, use square brackets to match the partition filter expression of the table. Example:

    SELECT count(*) FROM <table_name> WHERE ds=$[yyyymmdd];
    Note
    • In your configuration, you must replace <table_name> with the actual name of the table. This SQL statement determines which table to monitor.

    • For more information about how to configure partition filter expressions, see Appendix 2: Built-in partition filter expressions.

    • If you have created a quality monitor for the table and configure a rule in this way, the Data Range that you specified in the monitor settings no longer takes effect. The WHERE clause in this SQL statement determines which table partitions the rule checks.

    Method 4: Custom script

    Custom script rules support hour- and minute-level data validation. For information about how to write script rules, see Using system rule templates. For example:

    - assertion: change 30 minutes ago for max(id) = 15
      name: 30-minute difference in max value of id field is 15

  3. (Optional) Add the configured rule to a quality monitor. For more information about quality monitors, see Step 2: Create a quality monitor.

    Note

    A quality rule can be triggered only after you add it to a quality monitor. You can select an existing quality monitor here, or select this quality rule in the Select Quality Rules step when you configure a quality monitor.

  4. Click OK.

Step 4: Test the rule execution

You can test the rules in a quality monitor in the following ways.

From the rule management tab

  1. On the Rule Management tab, under Monitor Perspective, find the quality monitor that you created and click Test Run.

  2. In the Test Run dialog box, confirm parameters such as Data Range and Scheduling Time, and then click Test Run. After Started is displayed, you can click View Details to view the detailed results of the test run.

From the quality monitoring tab

  1. On the Monitor tab, find the quality monitor that you created, and click Test in the Actions column.

  2. In the Test Run dialog box, confirm parameters such as Data Range and Scheduling Time, and then click Test Run. After Started is displayed, you can click View Details to view the detailed results of the test run.

Step 5: Modify alert subscriptions

You configured alert subscriptions in Step 2: Create a quality monitor. When a rule check meets an alert condition, the system notifies the specified recipients. To notify other users, modify the alert subscription in one of the following ways.

From the rule management tab

  1. On the Rule Management tab, under Monitor Perspective, find the quality monitor that you created and open the alert subscription page as shown in the following figure.

    image

  2. In the Alert Subscription dialog box, add a Notification Method and a Recipient, and then click Save in the Actions column. After you save the settings, you can add another subscription.

    The supported notification methods include Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, Enterprise WeChat Chatbot, Custom Webhook, and Telephone.

    Note
    • To use a chatbot, add a DingTalk, Lark, or WeCom chatbot, obtain its webhook URL, and then paste the URL into the alert subscription.

    • The Custom Webhook notification method is supported only in DataWorks Enterprise Edition. For information about the message format of an alert notification sent by using a Custom Webhook, see Appendix: Webhook message format.

    • If you select Email, Email and SMS, or Telephone as the notification method, you can set Recipient to Data Quality Monitoring Owner, Shift Schedule, or Scheduling Task Owner.

      • Data Quality Monitoring Owner: Alert notifications are sent to the Quality Monitoring Owner specified in the Basic Configurations section of the current quality monitor.

      • Shift Schedule: When an alert for a quality rule check is triggered by a scheduling node that is associated with a quality monitor, the system sends an alert notification to the on-duty user for the current day in the shift schedule.

      • Scheduling Task Owner: Alert notifications are sent to the owner of the scheduling node that is associated with the quality monitor.

From the quality monitoring tab

  1. On the Monitor tab, find the quality monitor that you created, and click More > Alert Subscription in the Actions column.

  2. In the Alert Subscription dialog box, add a Notification Method and a Recipient, and then click Save in the Actions column. After you save the settings, you can add another subscription.

    The supported notification methods include Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, Enterprise WeChat Chatbot, Custom Webhook, and Telephone.

    Note
    • To use a chatbot, add a DingTalk, Lark, or WeCom chatbot, obtain its webhook URL, and then paste the URL into the alert subscription.

    • The Custom Webhook notification method is supported only in DataWorks Enterprise Edition. For information about the message format of an alert notification sent by using a Custom Webhook, see Appendix: Webhook message format.

    • If you select Email, Email and SMS, or Telephone as the notification method, you can set Recipient to Data Quality Monitoring Owner, Shift Schedule, or Scheduling Task Owner.

      • Data Quality Monitoring Owner: Alert notifications are sent to the Quality Monitoring Owner specified in the Basic Configurations section of the current quality monitor.

      • Shift Schedule: When an alert for a quality rule check is triggered by a scheduling node that is associated with a quality monitor, the system sends an alert notification to the on-duty user for the current day in the shift schedule.

      • Scheduling Task Owner: Alert notifications are sent to the owner of the scheduling node that is associated with the quality monitor.

Next steps

After a quality monitor runs, you can go to Quality O&M in the left-side navigation pane and click Monitor and Running Records to view the table's quality check status and the complete records of its quality rule checks.

Appendix

Appendix 1: Formulas for fluctuation rate and variance

  • Formula for fluctuation rate: Fluctuation rate = (Sample value - Baseline value) / Baseline value

    • Sample value: The value of the sample collected on the current day. For example, for a 1-day fluctuation check of the table row count in an SQL task, the sample is the row count of the current day's partition.

    • Baseline value: The comparison value from historical samples.

    Note
    • For a 1-day fluctuation rate check on the table row count of an SQL task, the baseline value is the row count of the previous day's partition.

    • For a 7-day average fluctuation rate check on the table row count of an SQL task, the baseline value is the average row count over the previous 7 days.

  • Formula for variance fluctuation: (Current sample - Average of last N days) / Standard deviation

    Note

    You can use variance only for numeric types such as BIGINT and DOUBLE.
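The two formulas above can be worked through with a few lines of Python. This is a minimal sketch: the function names and sample numbers are made up for illustration, and the use of the population standard deviation (statistics.pstdev) is an assumption, because this topic does not state which standard deviation is used.

```python
from statistics import mean, pstdev


def fluctuation_rate(sample, baseline):
    """(Sample value - Baseline value) / Baseline value."""
    if baseline == 0:
        raise ValueError("baseline value must be non-zero")
    return (sample - baseline) / baseline


def variance_fluctuation(current, history):
    """(Current sample - Average of last N days) / Standard deviation."""
    spread = pstdev(history)  # assumption: population standard deviation
    if spread == 0:
        raise ValueError("history has no variation; the ratio is undefined")
    return (current - mean(history)) / spread


# 1-day fluctuation: today's row count against yesterday's.
print(fluctuation_rate(130, 100))

# 7-day average fluctuation: today's row count against the 7-day mean.
last_seven_days = [100, 110, 90, 100, 105, 95, 100]
print(fluctuation_rate(120, mean(last_seven_days)))

# Variance fluctuation over a short, made-up history.
print(variance_fluctuation(106, [98, 102, 98, 102]))
```

Both functions reject a zero denominator explicitly, because a baseline of zero rows or a perfectly flat history makes the ratio undefined.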

Appendix 2: Built-in partition filter expressions

The following example assumes this scenario:

  • The data timestamp (bizdate) is 20240524

  • The scheduling time is 10:30:00

Partition filter expression | Description | Example
--- | --- | ---
ds=$[yyyymmdd] | Checks the partition data of the current data timestamp. | 20240524
ds=$[yyyymmdd-1] | Checks the partition data from the day before the data timestamp. | 20240523
ds=$[yyyymmdd-7] | Checks the partition data from 7 days before the data timestamp (one week ago). | 20240517
ds=$[add_months(yyyymmdd,-1)] | Checks the partition data from the same day of the previous month as the data timestamp. | 20240424
ds=$[yyyymmddhh24miss] | Checks the partition for the current data timestamp, accurate to the second of the current scheduling time. | 20240524103000
ds=$[yyyymmdd]000000 | Checks the second-level partition data at midnight of the current data timestamp. | 20240524000000
ds=$[yyyymmddhh24miss-1/24] | Checks the second-level partition data from one hour before the scheduling time on the current data timestamp. | 20240524093000
ds=$[hh24miss-1/24] | (For hourly partitions) Checks the partition from one hour before the scheduling time. The format is usually hh0000. | 090000
ds=$[hh24miss-30/24/60] | (For minute-level partitions) Checks the partition from 30 minutes before the scheduling time. The format is usually hhmi00. | 100000
ds=$[yyyymmdd-1]/hour=$[hh24] | (For two-level partitions) Checks all hourly partition data from the day before the data timestamp. | All partitions from ds=20240523/hour=00 to ds=20240523/hour=23
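To see how the day-level and second-level expressions resolve, the following Python sketch reproduces several of the examples for the scenario above (data timestamp 20240524, scheduling time 10:30:00). It is not the DataWorks implementation: the function names are hypothetical, the date part is taken from the data timestamp and the time part from the scheduling time as an assumption that matches the examples, and only a subset of the expressions is supported (the hour-only, minute-only, and two-level forms are left out).

```python
import calendar
import re
from datetime import datetime, timedelta


def shift_months(moment, months):
    """Move a date by whole months, clamping the day to the month's length."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return moment.replace(year=year, month=month0 + 1,
                          day=min(moment.day, last_day))


def resolve_token(token, moment):
    """Resolve the text inside one $[...] placeholder (subset only)."""
    match = re.fullmatch(r"add_months\(yyyymmdd,(-?\d+)\)", token)
    if match:
        return shift_months(moment, int(match.group(1))).strftime("%Y%m%d")
    match = re.fullmatch(r"yyyymmdd(?:-(\d+))?", token)
    if match:
        days = int(match.group(1) or 0)
        return (moment - timedelta(days=days)).strftime("%Y%m%d")
    match = re.fullmatch(r"yyyymmddhh24miss(?:-(\d+)/24)?", token)
    if match:
        hours = int(match.group(1) or 0)
        return (moment - timedelta(hours=hours)).strftime("%Y%m%d%H%M%S")
    raise ValueError(f"unsupported expression: {token}")


def resolve(expression, moment):
    """Replace every $[...] placeholder in a partition filter expression."""
    return re.sub(r"\$\[([^\]]+)\]",
                  lambda m: resolve_token(m.group(1), moment), expression)


# Data timestamp 20240524 combined with scheduling time 10:30:00.
moment = datetime(2024, 5, 24, 10, 30, 0)
for expr in ["ds=$[yyyymmdd]", "ds=$[yyyymmdd-1]", "ds=$[yyyymmdd-7]",
             "ds=$[add_months(yyyymmdd,-1)]", "ds=$[yyyymmddhh24miss]",
             "ds=$[yyyymmdd]000000", "ds=$[yyyymmddhh24miss-1/24]"]:
    print(expr, "->", resolve(expr, moment))
```

Text outside the square brackets, such as the literal 000000 suffix, passes through unchanged, which is why ds=$[yyyymmdd]000000 resolves to a midnight partition.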