
DataWorks:Configure rules by template in batches

Last Updated:Mar 27, 2026

Data Quality provides preset table-level and field-level monitoring rule templates. Instead of building rules from scratch, select a template and apply it to multiple tables or fields at once. This is useful when you need consistent quality checks across many tables in a large data warehouse.

Supported data sources

Templates support monitoring rules for the following data sources: MaxCompute, E-MapReduce (EMR), Hologres, CDH Hive, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, StarRocks, MySQL, Lindorm, SQL Server, and Data Lake Formation (DLF).

Before you begin: Choose your path

Use this table to determine your configuration approach before you start.

| Decision | Option A | Option B |
| --- | --- | --- |
| What do you want to check? | An entire table (row counts, null rates, duplicates) | A specific field (value distribution, uniqueness, format) |
| Template type to select | Table-Level | Field-Level |
| Do you know your expected data range? | Yes — set thresholds manually | No — let the system learn from historical data |
| Comparison method to use | Manual Settings | Intelligent Dynamic Threshold |

Note: Intelligent Dynamic Threshold is available only for rules that are based on custom SQL statements, custom ranges, or dynamic thresholds.

How it works

Configuring monitoring rules from a template involves three stages:

  1. Select a template and configure rule parameters — Choose a Table-Level or Field-Level built-in template and set the comparison method and thresholds that define what "good data" looks like.

  2. Add tables or fields in batches — Select all the tables or fields you want the rule to cover.

  3. Associate the rule with a quality monitor — A quality monitor defines the data scope (for example, a specific partition of a partitioned table) and links the rule to a scheduling node that triggers checks automatically.
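The relationships among the three stages can be sketched as a simple data model. This is a conceptual illustration only; the class and field names below are invented for clarity and do not correspond to a real DataWorks API.

```python
# Conceptual sketch of the three-stage flow. All names here are illustrative,
# not part of any DataWorks API.
from dataclasses import dataclass, field

@dataclass
class MonitoringRule:
    template: str                # stage 1: a built-in table- or field-level template
    comparison_method: str       # stage 1: "manual" or "intelligent_dynamic"
    thresholds: dict             # stage 1: normal/orange/red thresholds
    targets: list = field(default_factory=list)  # stage 2: tables or fields added in batches

@dataclass
class QualityMonitor:
    data_range: str              # stage 3: data scope, e.g. a partition expression
    trigger: str                 # stage 3: "node_scheduling" or "manual"
    rules: list = field(default_factory=list)

# Stage 1: pick a template and set thresholds.
rule = MonitoringRule("table_row_count", "manual", {"red": 0})
# Stage 2: add multiple tables in one batch.
rule.targets += ["dwd_orders", "dwd_users"]
# Stage 3: associate the rule with a monitor that defines scope and trigger.
monitor = QualityMonitor("ds=$[yyyymmdd]", "node_scheduling")
monitor.rules.append(rule)
```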

Step 1: Open the Configure by Template page

  1. Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Governance > Data Quality. Select the target workspace from the drop-down list and click Go to Data Quality.

  2. In the left-side navigation pane, choose Configure Rules > Configure by Template. Data Quality displays the built-in Table-Level and Field-Level rule templates. Click Configure Monitoring Rules for the template you want to use.


Step 2: Configure rule properties

On the Batch Add Monitoring Rules page, configure the following settings.

Basic attributes

| Parameter | Description |
| --- | --- |
| Data Source Type | The data source type of the tables this rule will cover. See Supported data sources for the full list. |
| Rule Source | Displays Built-in Template and the selected template name. Read-only. For the full list of available templates, see View built-in rule templates. |
| Template | The selected template. |
| Rule Name | Auto-generated. Adjust the name suffix as needed. |

Advanced properties

Choose a comparison method

| Method | When to use | How thresholds work |
| --- | --- | --- |
| Manual Settings | You know the expected data range or fluctuation limits. | For numeric results, compare against a fixed expected value using a comparison operator; set a Normal Threshold (expected range) and a Red Threshold (critical breach). For fluctuation-based checks, use Absolute Value, Raise, or Drop as the comparison method and configure the Normal Threshold. |
| Intelligent Dynamic Threshold | You are unsure what threshold values to set. | The system determines a reasonable threshold automatically based on intelligent algorithms; no manual threshold values are required. Alerts trigger or tasks are blocked as soon as abnormal data is detected. Configure the Orange Threshold to define the range of anomalous but non-critical results. |

Note: Intelligent Dynamic Threshold is available only for rules that are based on custom SQL statements, custom ranges, or dynamic thresholds.

Configure the degree of importance

| Value | Effect when the red threshold is exceeded |
| --- | --- |
| Strong rules | The associated scheduling node is blocked by default, which prevents downstream tasks from running. |
| Weak rules | The associated scheduling node is not blocked. An alert is sent, but downstream tasks continue to run. |

Configure monitoring thresholds

  • If Comparison Method is set to Manual Settings:

    • Normal Threshold: The data quality check result meets the specified condition, meaning the data output is as expected.

    • Red Threshold: The data quality check result meets the specified condition, meaning the data output is not as expected.

    • For numeric results, compare against a fixed value using: Greater Than, Greater Than Or Equal To, Equal To, Not Equal To, Less Than, or Less Than Or Equal To.

    • For fluctuation-based checks, use Absolute Value, Raise, or Drop as the comparison method.

  • If Comparison Method is set to Intelligent Dynamic Threshold:

    • Orange Threshold: The data quality check result meets the specified condition, meaning the data is abnormal but your business is not affected.
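The Manual Settings checks described above can be sketched in code. This is a hypothetical illustration of the two check styles (fixed-value comparison and fluctuation comparison); the function names and the exact threshold semantics are assumptions, not the DataWorks implementation.

```python
# Hypothetical sketch of Manual Settings threshold checks.
# Names and semantics are illustrative, not the DataWorks implementation.
import operator

OPS = {
    "Greater Than": operator.gt,
    "Greater Than Or Equal To": operator.ge,
    "Equal To": operator.eq,
    "Not Equal To": operator.ne,
    "Less Than": operator.lt,
    "Less Than Or Equal To": operator.le,
}

def check_numeric(result, op_name, expected):
    """Fixed-value check: True means the check passes."""
    return OPS[op_name](result, expected)

def check_fluctuation(current, baseline, method, normal_limit):
    """Fluctuation check against a prior run.
    method: "Absolute Value" (either direction), "Raise" (increases only),
    or "Drop" (decreases only). normal_limit is a ratio, e.g. 0.1 for 10%."""
    change = (current - baseline) / baseline
    if method == "Absolute Value":
        return abs(change) <= normal_limit
    if method == "Raise":
        return change <= normal_limit       # only an increase can breach
    if method == "Drop":
        return -change <= normal_limit      # only a decrease can breach
    raise ValueError(f"unknown method: {method}")
```

For example, with a 10% Raise limit, a row count that grows from 100 to 120 fails the check, while the same growth passes under a Drop limit.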

Status: Set to Enable to allow the rule to run in production.

Important

Disabling a rule prevents it from running in both test and production environments: a disabled rule cannot run in test mode or be triggered by scheduling nodes.

Click Next to proceed to the Generate Monitoring Rule page.

Step 3: Add tables or fields

Based on the template type you selected, add the tables or fields you want to check.

Add tables (Table-Level Rule Template)

  1. Click Add Table. On the Batch Create page, select the tables you want to add.

    The list shows all tables matching the Data Source Type configured in the previous step. Filter by Table Name as needed.
  2. Click Confirm to add the selected tables to the Tables for Which You Want to Configure Rules list.

Add fields (Field-Level Rule Template)

  1. Click Add Fields. In the Select a field dialog box, select the table that contains the target field.

    The Tables to Be Selected area lists tables based on the Data Source Type configured in the previous step.
  2. After selecting a table, the Select Fields section shows all fields in that table. Filter by Field Name or Field Description as needed.


  3. Select the field and click Create. The field appears in the Fields for Which You Want to Configure Rules list.

Step 4: Associate a quality monitor

A quality monitor defines the data scope (the Data Range, such as a specific partition) and the trigger settings for the quality check. Associate each table or field with a quality monitor.

Batch configuration

  1. Select one or more tables or fields, then click Configure Monitor.


  2. Choose one of the following batch actions:

    • Automatically Associate: Associates the selected tables or fields with existing quality monitors automatically.

    • Disassociate: Removes the quality monitor association for the selected tables or fields.

    • Batch Add: Creates new quality monitor settings for the selected tables. Configure the following:

      | Configuration item | Description |
      | --- | --- |
      | Data Range | The partition or data scope to check. For non-partitioned tables, use a WHERE clause to limit the scope, or leave this blank to check all data. For partitioned tables, use the format Partition key=Partition value. The partition value can be a constant or a built-in partition filter expression. |
      | Trigger Method | Triggered by Node Scheduling in Production Environment: checks run automatically after the associated scheduling node completes in Operation Center. Dry-run nodes do not trigger checks. Triggered Manually: checks run only when triggered manually. |
      | Associated Scheduling Node | Available when Trigger Method is set to Triggered by Node Scheduling in Production Environment. Select the scheduling nodes to associate with the monitor. |
      | Running Resources | The compute resources used to run the rule. Defaults to the data source of the monitored table in the current workspace. If you select a different data source, make sure it can access the monitored table. |
    Important

    For non-MaxCompute tables with Triggered By Node Scheduling In Production Environment selected, do not associate scheduling nodes that run on the shared resource group for scheduling. Doing so may cause errors when the monitor runs.
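As an illustration of the Data Range formats described above, the values below show the general shape of constant partition values, built-in partition filter expressions, and WHERE clauses. The specific expressions are examples only; the full built-in expression syntax is defined by DataWorks.

```text
# Partitioned table: Partition key=Partition value
ds=20250101              # constant partition value
ds=$[yyyymmdd]           # built-in partition filter expression, resolved at run time

# Non-partitioned table: optional WHERE clause to limit the scope
gmt_modified >= '2025-01-01'
```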

Single-table configuration

  1. In the Quality Monitoring column next to the target table or field, select an existing quality monitor or click New Quality Monitoring to create one.


  2. If no monitor exists, click Create Monitor and configure the following:

    Basic configurations

    | Parameter | Description |
    | --- | --- |
    | Monitor Name | The name of the monitor. |
    | Quality Monitoring Owner | The owner of the monitor. When you configure alert subscriptions, you can set the owner as the recipient for the Email, Email and SMS, or Telephone channels. |
    | Monitored Object | The table to check. Defaults to the current table. |
    | Data Range | The partition or data scope to check. For non-partitioned tables, all data is checked by default. For partitioned tables, use the format Partition key=Partition value. The partition value can be a constant or a built-in partition filter expression. |
    | Monitoring Rule | The monitoring rules to associate with this monitor. If no rules exist yet, skip this field and add rules after you create the monitor. |

    Note

    For rules based on custom templates or custom SQL statements, Data Range has no effect. The partition is determined by the SQL statement in the rule.

    Running settings

    | Parameter | Description |
    | --- | --- |
    | Trigger Method | Same as batch configuration. |
    | Associated Scheduling Node | Same as batch configuration. |
    | Running Resources | Same as batch configuration. |

    Handling policies

    | Parameter | Description |
    | --- | --- |
    | Quality Issue Handling Policies | Blocks: if a data quality issue is detected, the scheduling node that generates the table is set to Failed, which blocks downstream nodes. Default: Strong rules · Red anomaly. Alert: sends alert notifications when an issue is detected. Defaults: Strong rules · Red anomaly, Strong rules · Orange exception, Strong rules · Check Failed, Weak rules · Red anomaly, Weak rules · Orange exception, Weak rules · Check Failed. |
    | Alert Method Configuration | Notification channels: Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, Enterprise WeChat Robot, Custom Webhook, or Telephone. For DingTalk, Lark, or WeChat chatbots, add the bot and copy the webhook URL to the Recipient field. Custom Webhook is available in DataWorks Enterprise Edition only; see Appendix: Webhook message format for the message format. When you use Email, Email and SMS, or Telephone, set the recipient to Data Quality Monitoring Owner (the owner set in Basic configurations), Shift Schedule (the on-duty person in the shift schedule), or Scheduling Task Owner (the owner of the associated scheduling node). |
  3. After creating the monitor, return to the batch monitoring rules page and click Refresh. Then select the monitor you just created in the Quality Monitoring column.


Step 5: Test and finalize

Click Generate Monitoring Rule to open the Verify Monitoring Rule page. From this page:

  • Test Run: Select one or more rules and click Test Run. In the dialog box, select a Data Timestamp (the simulated trigger time). The system calculates the partition values based on the specified time and Data Range, then checks whether the data meets the configured rule. After the test run completes, click Running Records in the Actions column to view results.


  • Subscriptions: Configure alert recipients and notification channels. See the alert method options in Step 4.

  • Associated Scheduling: Link rules to scheduling nodes. Select Use Recommended Running Mode to let the system automatically associate rules with the recommended nodes based on data lineage, or select Manually Specify Running Mode to associate rules with specific nodes yourself.

    Important

    A rule must be associated with a scheduling node to be triggered automatically.

  • View Rule Details: Click View Rule Details in the Actions column to open a rule's detail page, where you can modify, start, stop, or delete the rule and view logs.

  • Delete: Remove one or more selected rules.

After the test run succeeds and scheduling is associated, click Complete Check.
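The partition resolution that Test Run performs (combining the Data Timestamp with the Data Range expression) can be sketched as follows. This is an illustrative simplification that handles only a $[yyyymmdd]-style token with an optional negative day offset; the real built-in expression syntax is richer.

```python
# Illustrative sketch: resolving a partition filter expression from a
# data timestamp. Only a simple $[yyyymmdd] token with an optional day
# offset is handled; the actual DataWorks expression syntax is richer.
import re
from datetime import datetime, timedelta

def resolve_partition(expression, data_timestamp):
    """Replace $[yyyymmdd] or $[yyyymmdd-N] with the formatted date."""
    def repl(match):
        offset = int(match.group(1) or 0)
        return (data_timestamp + timedelta(days=offset)).strftime("%Y%m%d")
    return re.sub(r"\$\[yyyymmdd(-\d+)?\]", repl, expression)

# Checking yesterday's partition for a daily task:
resolve_partition("ds=$[yyyymmdd-1]", datetime(2023, 6, 25))  # → "ds=20230624"
```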

What's next

After the monitor runs, choose Quality O&M in the left-side navigation pane and click Monitor > Running Records to view the quality check status and complete rule check history for the monitored tables.

Appendix: Webhook message format

This section describes the message format and parameters for alert notifications sent using a Custom Webhook.

Sample message

{
  "detailUrl": "https://dqc-cn-zhangjiakou.data.aliyun.com/?defaultProjectId=3058#/jobDetail?envType=ODPS&projectName=yongxunQA_zhangbei_standard&tableName=sx_up_001&entityId=10878&taskId=16876941111958fa4ce0e0b5746379cd9bc67999d05f8&bizDate=1687536000000&executeTime=1687694111000",
  "datasourceName": "emr_test_01",
  "engineTypeName": "EMR",
  "projectName": "Project name",
  "dqcEntityQuality": {
    "entityName": "tb_auto_test",
    "actualExpression": "ds=20230625",
    "strongRuleAlarmNum": 1,
    "weakRuleAlarmNum": 0
  },
  "ruleChecks": [
    {
      "blockType": 0,
      "warningThreshold": 0.1,
      "property": "id",
      "tableName": "tb_auto_test",
      "comment": "Test a monitoring rule",
      "checkResultStatus": 2,
      "templateName": "Compare the Number of Unique Field Values Against Expectation",
      "checkerName": "fulx",
      "ruleId": 123421,
      "fixedCheck": false,
      "op": "",
      "upperValue": 22200,
      "actualExpression": "ds=20230625",
      "externalId": "123112232",
      "timeCost": "10",
      "trend": "up",
      "externalType": "CWF2",
      "bizDate": 1600704000000,
      "checkResult": 2,
      "matchExpression": "ds=$[yyyymmdd]",
      "checkerType": 0,
      "projectName": "auto_test",
      "beginTime": 1600704000000,
      "dateType": "YMD",
      "criticalThreshold": "0.6",
      "isPrediction": false,
      "ruleName": "Rule name",
      "checkerId": 7,
      "discreteCheck": true,
      "endTime": 1600704000000,
      "MethodName": "max",
      "lowerValue": 2344,
      "entityId": 12142421,
      "whereCondition": "type!='type2'",
      "expectValue": 90,
      "templateId": 5,
      "taskId": "16008552981681a0d6",
      "id": 234241453,
      "open": true,
      "referenceValue": [
        {
          "discreteProperty": "type1",
          "value": 20,
          "bizDate": "1600704000000",
          "singleCheckResult": 2,
          "threshold": 0.2
        }
      ],
      "sampleValue": [
        {
          "discreteProperty": "type2",
          "bizDate": "1600704000000",
          "value": 23
        }
      ]
    }
  ]
}
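On the receiving side, a webhook endpoint can triage a payload like the sample above. The sketch below uses the field names from the sample message; treating checkResult == 2 as a failed check and blocking on strongRuleAlarmNum > 0 are illustrative assumptions, not documented DataWorks semantics.

```python
# Minimal sketch of triaging a Custom Webhook payload on the receiving side.
# Field names follow the sample message above; the checkResult == 2 and
# strongRuleAlarmNum interpretations are assumptions for illustration.
import json

def triage(payload: str) -> dict:
    msg = json.loads(payload)
    quality = msg["dqcEntityQuality"]
    # Assumption: checkResult == 2 marks a failed rule check.
    failed = [c for c in msg.get("ruleChecks", []) if c.get("checkResult") == 2]
    return {
        "table": quality["entityName"],
        "partition": quality["actualExpression"],
        # Assumption: any strong-rule alarm should block downstream tasks.
        "should_block": quality["strongRuleAlarmNum"] > 0,
        "failed_rules": [c.get("ruleName") for c in failed],
    }
```

For the sample message above, this returns should_block=True because strongRuleAlarmNum is 1.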

Parameter description

Parameter names below match the keys in the sample message.

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| projectName | String | autotest | The name of the compute engine instance or data source that is monitored. |
| actualExpression | String | ds=20200925 | The partition in the monitored data source table. |
| ruleChecks | Array | | The list of validation results. |
| blockType | Integer | 1 | The rule strength. 1: strong rule. 0: weak rule. Strong rules block scheduling tasks when the critical threshold is exceeded. |
| warningThreshold | Float | 0.1 | The warning threshold, which indicates the allowable deviation from the expected value. |
| property | String | type | The column in the data source table that the rule checks. |
| tableName | String | dual | The name of the validated table. |
| comment | String | | The description of the validation rule. |
| checkResultStatus | Integer | 2 | The status of the check result. |
| templateName | String | Compare the Number of Unique Field Values Against Expectation | The name of the validation template. |
| checkerName | String | fulx | The name of the checker. |
| ruleId | Long | 123421 | The ID of the rule. |
| fixedCheck | Boolean | false | Indicates whether a fixed value is used for the check. true: fixed value. false: no fixed value. |
| op | String | > | The comparison operator. |
| upperValue | Float | 22200 | The predicted upper limit, automatically generated based on the configured threshold. |
| actualExpression | String | ds=20200925 | The actual partition in the data source table that was checked. |
| externalId | String | 123112232 | The ID of the node in the scheduling system. |
| timeCost | String | 10 | The duration of the verification task. |
| trend | String | up | The trend of the monitoring results. |
| externalType | String | CWF2 | The scheduling system to which the node ID maps. Only CWF is supported. |
| bizDate | Long | 1600704000000 | The data timestamp. For offline data, this is typically one day before the check runs. |
| checkResult | Integer | 2 | The verification result. |
| matchExpression | String | ds=$[yyyymmdd] | The partition filter expression. |
| checkerType | Integer | 0 | The type of the checker. |
| projectName | String | autotest | The name of the compute engine or data source for the data quality check. |
| beginTime | Long | 1600704000000 | The start time of the verification task. |
| dateType | String | YMD | The scheduling cycle type. YMD indicates yearly, monthly, and daily tasks. |
| criticalThreshold | Float | 0.6 | The critical threshold, which indicates the degree of deviation from the expected value. If a strong rule exceeds this threshold, scheduling tasks are blocked. |
| isPrediction | Boolean | false | Indicates whether the result is a prediction. true: prediction. false: not a prediction. |
| ruleName | String | | The name of the rule. |
| checkerId | Integer | 7 | The ID of the checker. |
| discreteCheck | Boolean | true | Indicates whether discrete validation is used. true: discrete. false: non-discrete. |
| endTime | Long | 1600704000000 | The end time of the verification task. |
| MethodName | String | max | The method used to collect sample data. Supported values: avg, count, sum, min, max, count_distinct, user_defined, table_count, table_size, table_dt_load_count, table_dt_refuseload_count, null_value, null_value/table_count, (table_count-count_distinct)/table_count, and table_count-count_distinct. |
| lowerValue | Float | 2344 | The predicted lower limit, automatically generated based on the configured threshold. |
| entityId | Long | 14534343 | The ID of the partition filter expression. |
| whereCondition | String | type!='type2' | The filter condition of the validation task. |
| expectValue | Float | 90 | The expected value. |
| templateId | Integer | 5 | The ID of the validation template. |
| taskId | String | 16008552981681a0d6**** | The ID of the verification task. |
| id | Long | 2231123 | The primary key ID. |
| referenceValue | Array | | The historical sample values. |
| discreteProperty | String | type1 | The values of the sample field after GROUP BY grouping. For example, grouping by Gender returns Male, Female, and null. |
| value | Float | 20 | The sample value. |
| bizDate | String | 1600704000000 | The data timestamp. For offline data, typically one day before the check runs. |
| singleCheckResult | Integer | 2 | The verification result of a single check. |
| threshold | Float | 0.2 | The threshold value. |
| sampleValue | Array | | The current sample values. |
| open | Boolean | true | Indicates whether the rule is enabled. |