When nodes fail, stall, or consume too many resources, you need to know immediately. Custom alert rules let you define the exact conditions that trigger a notification — whether a node errors out, runs too long, or never completes — and route that notification to the channel that works for your team.
Custom alert rules apply only to auto triggered node instances. Instances generated by test runs and data backfill operations are not monitored.
Limitations
| Limitation | Details |
|---|---|
| Who can modify rules | Only rule owners, tenant administrators, and Alibaba Cloud accounts |
| Phone call notifications | Supported only for mobile phone numbers in the Chinese mainland |
| Webhook URLs | Basic Edition supports DingTalk, Lark, and WeCom group-based webhook URLs. Enterprise Edition also supports custom webhook URLs. To configure a custom webhook, see Intelligent monitoring: Formats of alert messages sent by using a custom webhook, then submit a ticket to contact DataWorks technical support. |
| Professional Edition-only trigger conditions | Instances with Errors, Proportion of Instances with Errors, and Node Logs Contain Keywords require DataWorks Professional Edition or higher. See Features of DataWorks editions. |
Monitoring time ranges
The monitoring time range determines which instances DataWorks checks for each trigger condition. If an instance falls outside the applicable range, no alert fires even when the trigger condition is met.
| Monitoring time range | Trigger conditions |
|---|---|
| T (previous day's data timestamp) | Instance Generated, Fluctuation of Instance Count, Complete, Instances with Errors, Proportion of Instances with Errors, Node Logs Contain Keywords |
| T and T-1 | Incomplete, Incomplete in Cycle, Timed Out |
| T, T-1, and T-2 | Error, Error Persisting After Automatic Rerun of Node |
Where T = the previous day's data timestamp, T-1 = the day before, and T-2 = two days before.
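The table above can be read as a lookup from trigger condition to the data timestamps that are checked. The following Python sketch illustrates that reading only. The names `MONITORED_OFFSETS` and `monitored_data_timestamps` are hypothetical, and the sketch assumes that T is the calendar day before the day on which the check runs.

```python
from datetime import date, timedelta
from typing import List

# Day offsets relative to T, copied from the table above.
# 0 = T, 1 = T-1, 2 = T-2. The dictionary name is illustrative only.
MONITORED_OFFSETS = {
    "Instance Generated": (0,),
    "Fluctuation of Instance Count": (0,),
    "Complete": (0,),
    "Instances with Errors": (0,),
    "Proportion of Instances with Errors": (0,),
    "Node Logs Contain Keywords": (0,),
    "Incomplete": (0, 1),
    "Incomplete in Cycle": (0, 1),
    "Timed Out": (0, 1),
    "Error": (0, 1, 2),
    "Error Persisting After Automatic Rerun of Node": (0, 1, 2),
}


def monitored_data_timestamps(condition: str, today: date) -> List[date]:
    """Return the data timestamps checked for a trigger condition."""
    # Assumption: T is the previous day's data timestamp, that is, today minus one day.
    t = today - timedelta(days=1)
    return [t - timedelta(days=offset) for offset in MONITORED_OFFSETS[condition]]


if __name__ == "__main__":
    for d in monitored_data_timestamps("Error", date(2024, 5, 10)):
        print(d.isoformat())
```

Under these assumptions, an instance whose data timestamp is older than T-2 would not raise an Error alert, which matches the rule that instances outside the range are not checked.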
Go to the Rule Management page
Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Development and O&M > Operation Center. Select your workspace from the drop-down list, then click Go to Operation Center.
In the left-side navigation pane, choose Alarm > Rule Management.
You can also open the Auto Triggered Nodes page, select one or more nodes, and then choose Actions > Add Alert Rule to create a rule for those nodes. For details, see View and manage auto triggered tasks.
Create a custom alert rule
On the Rule Management page, click Create Custom Rule. The dialog box has four sections: Basic Information, Trigger Condition, Alert Details, and Alerting Frequency Control. After you configure all sections, click OK.
Basic information
| Parameter | Description |
|---|---|
| Rule name | The name of the custom alert rule. |
| Object type | The type of object to monitor. Valid values: Node, Baseline, Workspace, Workflow, Exclusive Resource Group for Scheduling, and Exclusive Resource Group for Data Integration. If you select Baseline, you can monitor only nodes in that baseline. To also monitor ancestor nodes, see Overview. |
| Rule object | The specific objects to monitor. Enter a name or ID, select the object from the list, then click Add. Maximum objects: Node (50), Baseline (5), Workflow (5), Workspace (1). |
| Add to whitelist | Nodes within the monitoring scope that you want to exclude from alerting. Required when Object type is Baseline, Workspace, or Workflow. Maximum 50 nodes. |
| Resource group name | The exclusive resource group to monitor. Required when Object type is Exclusive Resource Group for Scheduling or Exclusive Resource Group for Data Integration. |
Trigger condition
In custom alert rule logic, a node in the frozen state is considered complete.
Trigger conditions for Node, Baseline, Workspace, and Workflow:
| Trigger condition | Description |
|---|---|
| Complete | An alert fires when all monitored nodes finish successfully. If Object type is Node and multiple nodes are added, all nodes must complete before the alert fires. If Object type is Baseline or Workflow, all nodes in the baseline or workflow must complete. Not available when Object type is Workspace. For hourly nodes, the node must complete in all cycles before the alert fires. |
| Error | An alert fires each time an error occurs during a node run. For example, if the rule sends two alerts for each error and a node is rerun twice with an error in each rerun, four alerts are sent. To alert only after all automatic reruns fail, use Error Persisting After Automatic Rerun of Node instead. |
| Error Persisting After Automatic Rerun of Node | An alert fires if an error persists after the node has been automatically rerun. |
| Instances with Errors | An alert fires if the number of failed instances on the current day reaches a specified threshold. A failure includes both data quality check failures and code execution failures. Requires DataWorks Professional Edition or higher. |
| Proportion of Instances with Errors | An alert fires if the ratio of failed instances to total instances in the workspace on the current day exceeds a specified threshold. Requires DataWorks Professional Edition or higher. |
| Node Logs Contain Keywords | An alert fires if node run logs contain specified keywords on the current day. Requires DataWorks Professional Edition or higher. If you use an exclusive resource group created before August 24, submit a ticket to upgrade the resource group configuration before using this condition. |
| Instance Generated | Available only when Object type is Workspace. |
| Fluctuation of Instance Count | Available only when Object type is Workspace. DataWorks generates instances for the next day before 24:00 each day. If the generated count fluctuates significantly compared to the historical average for the workspace, an alert fires. |
| Incomplete | An alert fires if monitored nodes are still running at a specified point in time. For hourly or minute-level nodes, the system checks all cycles on the current day at the specified time. Example scenarios: (1) A node scheduled at 01:00 with an alert time of 02:00 — if the node is still running at 02:00, an alert fires. (2) A node scheduled hourly with an alert time of 12:00 — an alert fires every day. (3) A baseline with a completion deadline of 10:00 — if any node is still running at 10:00, an alert fires. |
| Incomplete in Cycle | An alert fires if a node is still running at the end of a specified cycle. Typically used for hourly nodes. For workflows, the system monitors nodes scheduled by day, hour, or minute based on the cycle number (N) you specify. If a node's total instance count is less than N, alerts for that node are suppressed. |
| Timed Out | An alert fires if a node is still running after a specified duration. Use this to set a maximum acceptable run time for a node. |
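The Incomplete in Cycle condition combines two rules from the table above: alerts are suppressed when a node has fewer than N instances, and a frozen node counts as complete. The following Python sketch is one possible reading of that logic. It does not describe how DataWorks implements it. The function name and the status strings (`success`, `frozen`, `running`) are hypothetical, and the sketch assumes that the check looks at the instance of cycle N when that cycle ends.

```python
from typing import Sequence

# Assumption: status names are illustrative. "frozen" is treated as complete,
# as stated for custom alert rule logic.
COMPLETE_STATES = {"success", "frozen"}


def incomplete_in_cycle_alert(statuses: Sequence[str], n: int) -> bool:
    """Decide whether an alert would fire for cycle number n (1-based).

    statuses[i] is the state of the instance for cycle i + 1 at the moment
    cycle n ends.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(statuses) < n:
        # Fewer than N instances in total: alerts for this node are suppressed.
        return False
    return statuses[n - 1] not in COMPLETE_STATES


if __name__ == "__main__":
    print(incomplete_in_cycle_alert(["success", "running", "running"], 2))
    print(incomplete_in_cycle_alert(["success", "frozen"], 2))
    print(incomplete_in_cycle_alert(["running"], 3))
```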
Trigger conditions for Exclusive Resource Group for Scheduling and Exclusive Resource Group for Data Integration:
| Trigger condition | Description |
|---|---|
| Resource Group Usage | An alert fires if resource group usage exceeds a specified percentage for a specified duration. Example: usage > 50% for 15 minutes. |
| Number of Instances Waiting for Resources in Resource Group | An alert fires if the number of waiting instances exceeds a specified count for a specified duration. Example: more than 10 instances waiting for 15 minutes. |
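Both resource group conditions follow the same pattern: a value must stay above a threshold for a continuous period. The following Python sketch shows that pattern for the example "usage > 50% for 15 minutes". The function name and the sample format are hypothetical, and the way DataWorks samples usage internally is not described in this topic.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple


def sustained_breach(
    samples: Sequence[Tuple[datetime, float]],
    threshold: float,
    duration: timedelta,
) -> bool:
    """Return True if the value stays above threshold for at least duration.

    samples must be ordered by timestamp.
    """
    breach_start: Optional[datetime] = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of a new breach
            if ts - breach_start >= duration:
                return True
        else:
            breach_start = None  # the breach was interrupted, start over
    return False


if __name__ == "__main__":
    start = datetime(2024, 5, 10, 9, 0)
    usage = [(start + timedelta(minutes=5 * i), v) for i, v in enumerate([60, 70, 55, 65])]
    print(sustained_breach(usage, threshold=50, duration=timedelta(minutes=15)))
```

The same function covers the waiting-instances condition if the samples hold instance counts instead of usage percentages.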
Alert details
DataWorks supports the following notification channels: email, SMS, phone call, DingTalk chatbot, and webhook URL.
| Notification method | Alert contact options | Notes |
|---|---|---|
| Mail, SMS, or Telephone | Node Owner, Shift Schedule, or Others | Click Check Contact Information to verify the phone number or email address. Phone call alerting requires DataWorks Professional Edition or higher, and is supported only for mobile numbers in the Chinese mainland. If you select Telephone, acknowledge the rate-limiting notice: the same user receives at most one phone call within 20 minutes, and additional alerts are downgraded to SMS. To send phone alerts to numbers outside the Chinese mainland, use a custom webhook instead. See Scheme to push monitoring and alerting information to a custom webhook. If you select Shift Schedule, DataWorks notifies only the main engineer for the first two alerts and notifies both the main and secondary engineer from the third alert onward. To set up a shift schedule first, see Create and manage a shift schedule. |
| DingTalk Chatbot or WebHook | Group members | Supports @all or specific member mentions. Click Send Test Message to verify the configuration. For DingTalk chatbot security settings, keywords must include DataWorks (case-sensitive). |
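Two behaviors in the table above are easier to follow as code: the phone call rate limit (at most one call per user within 20 minutes, with later alerts downgraded to SMS) and the shift schedule escalation (main engineer only for the first two alerts, both engineers from the third). The following Python sketch is a minimal model of those rules. All names are hypothetical, and how alerts exactly 20 minutes apart are handled is an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, List

CALL_WINDOW = timedelta(minutes=20)


class PhoneAlertRouter:
    """Tracks the last phone call per user and picks a channel for each alert."""

    def __init__(self) -> None:
        self._last_call: Dict[str, datetime] = {}

    def channel_for(self, user: str, now: datetime) -> str:
        last = self._last_call.get(user)
        if last is not None and now - last < CALL_WINDOW:
            return "sms"  # a call was already placed within the window
        self._last_call[user] = now
        return "phone"


def shift_recipients(alert_number: int, main: str, secondary: str) -> List[str]:
    """Return who is notified for the given alert (1-based) under a shift schedule."""
    return [main] if alert_number <= 2 else [main, secondary]


if __name__ == "__main__":
    router = PhoneAlertRouter()
    t0 = datetime(2024, 5, 10, 3, 0)
    print(router.channel_for("alice", t0))
    print(router.channel_for("alice", t0 + timedelta(minutes=10)))
    print(shift_recipients(3, "alice", "bob"))
```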
Alerting frequency control
Use these parameters to reduce alert noise during high-activity periods or overnight.
| Parameter | Description |
|---|---|
| Maximum Alerts | The maximum number of times an alert fires. Once this limit is reached, no further alerts are sent for the rule. |
| Minimum Alert Interval | The minimum time between consecutive alerts. |
| Alerting Do-Not-Disturb Period | The time window during which no alert notifications are sent. If a trigger condition is met during this window, DataWorks sends the notification when the window ends. Example: with a quiet window of 00:00–08:00, a node that times out at 03:00 triggers a notification at 08:00. |
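The do-not-disturb behavior can be sketched as a function that maps the time a condition is met to the time the notification is sent. The following Python example reproduces the 00:00 to 08:00 case from the table. The function name is hypothetical, and the handling of a window that crosses midnight is an assumption that this topic does not confirm.

```python
from datetime import datetime, time, timedelta


def delivery_time(triggered_at: datetime, quiet_start: time, quiet_end: time) -> datetime:
    """Return when a notification would be sent given a do-not-disturb window."""
    t = triggered_at.time()
    end_date = triggered_at.date()
    if quiet_start <= quiet_end:
        in_window = quiet_start <= t < quiet_end
    else:
        # Assumption: a window such as 22:00-06:00 wraps past midnight.
        in_window = t >= quiet_start or t < quiet_end
        if t >= quiet_start:
            end_date = end_date + timedelta(days=1)  # the window ends the next day
    if not in_window:
        return triggered_at  # outside the window: send immediately
    return datetime.combine(end_date, quiet_end)


if __name__ == "__main__":
    sent = delivery_time(datetime(2024, 5, 10, 3, 0), time(0, 0), time(8, 0))
    print(sent.isoformat())
```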
Manage rules
On the Rule Management page, use the Actions column to manage existing rules:
| Action | Description |
|---|---|
| View Details | View the rule configuration. |
| Enable or Disable | Enable or disable the rule. When enabled, alert details appear on the Alert Management page. See View alert details. |
| Delete | Permanently delete the rule. |
Only rule owners, tenant administrators, and Alibaba Cloud accounts can modify alert rules.
Set up DingTalk chatbot notifications
To route alerts to a DingTalk group, add a custom chatbot to the group and get its webhook URL.
Open the DingTalk group and click the Group Settings icon in the upper-right corner.
In the Group Settings panel, click Group Assistant.
In the Group Assistant panel, click Add Robot.
In the ChatBot dialog box, click the add icon.
In the Please choose which robot to add section, click Custom.
In the Robot details message, click Add.
In the Add Robot dialog box, configure the following parameters:
| Parameter | Description |
|---|---|
| Chatbot name | A name for the chatbot. |
| Add to Group | The DingTalk group for this chatbot. This cannot be changed after creation. |
| Custom Keywords | Messages are sent only if they contain at least one of the keywords you specify. Add DataWorks as a keyword (case-sensitive). You can add up to 10 keywords. |
Read the terms of service, select I have read and accepted <<DingTalk Custom Robot Service Terms of Service>>, then click Finished.
Copy the webhook URL of the chatbot, then click Finished.
Important: Keep the webhook URL confidential. If the webhook URL is leaked, your business is at risk.
On the Rule Management page, click Create Custom Rule. In the Create Custom Rule dialog box, set Alert Notification Method to DingTalk Chatbot, then paste the webhook URL in the Webhook URL field.
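Before you rely on the chatbot for alerts, you may want to post a message to it yourself to confirm that the keyword filter accepts it. The following Python sketch builds a text message in the JSON shape used by DingTalk custom chatbots and checks the case-sensitive DataWorks keyword locally. It is a manual check only and does not represent the message format that DataWorks itself sends. The function names are hypothetical, and the URL shown in the comment is a placeholder that you replace with the webhook URL you copied.

```python
import json
import urllib.request

KEYWORD = "DataWorks"  # case-sensitive, as configured in Custom Keywords


def build_text_message(content: str, at_all: bool = False) -> dict:
    """Build a DingTalk text message and verify that it contains the keyword."""
    if KEYWORD not in content:
        raise ValueError(f"message must contain the keyword {KEYWORD!r}")
    return {
        "msgtype": "text",
        "text": {"content": content},
        "at": {"isAtAll": at_all},
    }


def send(webhook_url: str, message: dict) -> str:
    """POST the message to the chatbot webhook and return the raw response body."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    msg = build_text_message("DataWorks test alert")
    print(json.dumps(msg))
    # Placeholder URL: replace with your own webhook URL and keep it secret.
    # send("https://oapi.dingtalk.com/robot/send?access_token=<your-token>", msg)
```

The `send` call is commented out so that the script can run without network access. Load the real URL from a secret store or an environment variable rather than hard-coding it.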
FAQ
How do I prevent excessive alert notifications?
Set the Maximum Alerts, Minimum Alert Interval, and Alerting Do-Not-Disturb Period parameters in the Alerting Frequency Control section. See Alerting frequency control in this topic.