Custom alert rules allow you to monitor the status of specified node instances based on your business requirements, so that you can identify and handle exceptions at the earliest opportunity. This topic describes how to create a custom alert rule on the Rule Management page, and how to add a DingTalk chatbot and obtain its webhook URL.

Limits

  • Custom alert rules take effect only on auto triggered node instances.
  • Custom alert rules support the following alert notification methods: email, text message, DingTalk chatbot, and webhook URL. Limits on the supported alert notification methods:
    • An alert notification can be sent by text message only in the following regions: Singapore (Singapore), Malaysia (Kuala Lumpur), and Germany (Frankfurt). If you want to use this notification method in other regions, submit a ticket to contact Alibaba Cloud DataWorks technical support.
    • An alert notification can be sent by using the webhook URL-based alerting feature only to Enterprise WeChat or Lark.
    Note DataWorks supports the webhook URL-based alerting feature only for DingTalk, Enterprise WeChat, and Lark. If you want to use this notification method for other services, submit a ticket to contact Alibaba Cloud DataWorks technical support.
  • DataWorks supports the webhook URL-based alerting feature. This feature has the following limits:
    • The webhook URL-based alerting feature is supported only in DataWorks Enterprise Edition and DataWorks Ultimate Edition.
    • Custom alert rules and baselines support the webhook URL-based alerting feature in the following regions: China (Shanghai), China (Chengdu), China (Zhangjiakou), China (Beijing), China (Hangzhou), China (Shenzhen), China (Hong Kong), Germany (Frankfurt), and Singapore (Singapore).

Create a custom alert rule

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where the required workspace resides, find the workspace, and then click Data Analytics.
  2. In the upper-left corner of the DataStudio page, click the icon and choose All Products > Task Operation > Operation Center.
  3. In the left-side navigation pane of the Operation Center page, choose Alarm > Rule Management.
  4. In the upper-right corner of the Rule Management page, click Create Custom Rule.
  5. In the Create Custom Rule dialog box, configure the parameters.
    Custom alert rule
    Section Parameter Description
    General Rule Name The name of the custom alert rule.
    Object Type The type of object that you want to monitor. Valid values: Node, Baseline, Workspace, Workflow, Exclusive Resource Groups for Scheduling, and Exclusive Resource Groups for Data Integration.
    Object This parameter is required only if you set the value of the Object Type parameter to Node, Baseline, Workspace, or Workflow. After you specify the name or ID of the object that you want to monitor, select the object from the drop-down list. Then, click the Add icon.
    Add to Whitelist This parameter is required only if you set the value of the Object Type parameter to Baseline, Workspace, or Workflow. Specify the name or ID of the object that you want to monitor, click the Add icon to add the object to the table that is displayed under the Object field, and then specify the name or ID of a node in the Add to Whitelist field. The nodes that you add to the whitelist are not monitored.
    Resource Group Name If you set the value of the Object Type parameter to Exclusive Resource Groups for Scheduling or Exclusive Resource Groups for Data Integration, you must select a resource group name.
    Trigger Condition Trigger Condition If you set the value of the Object Type parameter to Node, Baseline, Workspace, or Workflow, the Trigger Condition parameter has the following valid values:
    • Completed

      Node instances are monitored from the time when they start to run. When the node instances are successfully run, an alert is reported.

    • Uncompleted

      Node instances are monitored from the time when they start to run. If the node instances are still running at the specified point in time, an alert is reported. For example, a node instance is scheduled to run at 01:00, and you set the alert time to 02:00. If the node instance is still running at 02:00, an alert is reported.

    • Error

      Node instances are monitored from the time when they start to run. If an error occurs when the node instances are running, an alert is reported.

      If an error occurs for a node instance, an error icon is displayed in the General column on the Cycle Instance page under Cycle Task Maintenance in Operation Center.

    • Uncompleted in Cycle

      If node instances are still running at the end of the specified cycle, an alert is reported. In most cases, you can configure this trigger condition for node instances that are scheduled by hour.

      For example, Node A is scheduled to run every 2 hours, and each run takes 25 minutes. If Node A starts to run at 00:00 every day, the node runs 12 times within 24 hours. The first cycle starts at 00:00, the second cycle starts at 02:00, and so on until the twelfth cycle, which starts at 22:00. If the node runs as expected, the node instance in each cycle stops running by the specified point in time, such as 00:25 for the first cycle or 02:25 for the second cycle. If the node instance is still running at the specified point in time in a cycle, an alert is reported.
      Note You can configure the Uncompleted in Cycle trigger condition to monitor nodes in workflows.

      If the Trigger Condition parameter is set to Uncompleted in Cycle for workflows, the system monitors nodes that are scheduled by day, hour, or minute in the workflows based on the cycle number (N) that you specified. If the number of node instances for a node is less than the value of N, the system ignores the alerts reported for the node.

      For example, you set the cycle number to 3, and two nodes are configured in a workflow. The following examples show detailed alerting and monitoring information:
      • Node A is scheduled by hour: Node A is scheduled to run every 2 hours, and each run takes 25 minutes. If Node A starts to run at 00:00 every day, the node runs 12 times within 24 hours. The first cycle starts at 00:00, and the third cycle starts at 04:00. If the node runs as expected, the node instance in the third cycle stops running at 04:25. If the node instance in the third cycle is still running at 04:25, an alert is reported.
      • Node B is scheduled by minute: Node B is scheduled to run every 10 minutes, and each run takes 2 minutes. If Node B starts to run at 00:00 every day, the node runs six times within 1 hour. The first cycle starts at 00:00, and the third cycle starts at 00:20. If the node runs as expected, the node instance in the third cycle stops running at 00:22. If the node instance in the third cycle is still running at 00:22, an alert is reported.
    • Overtime

      Node instances are monitored from the time when they start to run. If the node instances are still running after the specified period ends, an alert is reported. In most cases, you can configure this trigger condition to monitor the duration of node instances.

    • The error persists after the node automatically reruns

      Node instances are monitored from the time when they start to run. If an error still occurs after the node instances are rerun, an alert is reported.

    Trigger Condition If you set the Object Type parameter to Exclusive Resource Groups for Scheduling or Exclusive Resource Groups for Data Integration, the Trigger Condition parameter has the following valid values:
    • Resource Group Usage: If the resource usage is greater than a specific percentage for a specific period of time, an alert is reported.

      For example, if the resource usage is greater than 50% for 15 minutes, an alert is reported.

    • Number of Instances Waiting for Resources in Resource Group: If the number of node instances that are waiting for resources is greater than a specific number for a specific period of time, an alert is reported.

      For example, if the number of node instances that are waiting for resources is greater than 10 for 15 minutes, an alert is reported.

    Alert Details Notification Method The method used to send alert notifications. Valid values: Email, SMS, DingTalk Chatbot, and WebHook. To receive alert notifications in a DingTalk group, add a DingTalk chatbot. For more information, see the "Send alert notifications to a DingTalk group" section of this topic. To send alert notifications to multiple DingTalk groups, add multiple webhook URLs.
    • If you set the Notification Method parameter to DingTalk Chatbot or WebHook, you can click Send Test Message in the Actions column in the table that appears to check whether an alert notification can be sent. If the alert contact does not receive the alert notification, troubleshoot the issue. For more information, see Intelligent monitoring.
    • The Recipient parameter is required if you set the Notification Method parameter to SMS or Email. After you specify the Recipient parameter, Check Contact Information is displayed on the right side of the corresponding value of this parameter. You can click Check Contact Information to check whether the mobile phone number or email address is correct.
    Notice Only the webhook URLs of DingTalk chatbots are supported.
    Recipient The user who receives alert notifications. Valid values: Node Owner, Varies According to Shift Schedule, and Others.
    Alerting Frequency Control Maximum Alerts The maximum number of times an alert is reported. After the number of times an alert is reported reaches this value, the alert is no longer reported.
    Minimum Alert Interval The minimum interval between two consecutive reports of the same alert.
    Quiet Hours The system does not send alert notifications during the period of time that is specified by this parameter.
    For example, you set Quiet Hours to the period from 00:00 to 08:00 and configure the Overtime, Error, and Uncompleted trigger conditions for a node. In this case, alert notifications are sent based on the following rules:
    • If the Uncompleted alert is triggered for the node before 08:00, the trigger time falls within the quiet hours. The system treats the node as successfully run during the quiet hours and does not send the Uncompleted alert notification.
    • If the node is still running after 08:00, or is still running after the period specified for the Overtime condition ends, the system sends the Uncompleted or Overtime alert notification.
  6. Click OK. An alert rule is created.
    On the Rule Management page, you can find the created alert rule and click View Details in the Actions column to view the details of the alert rule.
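
The timing arithmetic in the Uncompleted in Cycle examples in step 5 can be sketched in a few lines of Python. This is only an illustration of the examples in this topic, not how DataWorks evaluates rules internally. The function names are hypothetical, the date is arbitrary, and the sketch assumes that the point in time that is checked for a cycle equals the cycle start time plus the expected run time.

```python
from datetime import datetime, timedelta


def cycle_start(first_start: datetime, interval: timedelta, n: int) -> datetime:
    """Return the start time of the n-th cycle (1-based) of a recurring node."""
    if n < 1:
        raise ValueError("cycle number must be 1 or greater")
    return first_start + (n - 1) * interval


def cycle_check_time(first_start: datetime, interval: timedelta,
                     expected_runtime: timedelta, n: int) -> datetime:
    """Return the time by which the n-th cycle is expected to have finished.

    Assumption: the checked point in time is the cycle start plus the
    expected run time, as in the Node A and Node B examples.
    """
    return cycle_start(first_start, interval, n) + expected_runtime


def cycles_in(window: timedelta, interval: timedelta) -> int:
    """Return how many cycles start within the given time window."""
    return window // interval


midnight = datetime(2024, 1, 1, 0, 0)  # arbitrary date; only the time of day matters

# Node A: runs every 2 hours, each run takes 25 minutes, cycle number N = 3.
node_a = cycle_check_time(midnight, timedelta(hours=2), timedelta(minutes=25), 3)
# Node B: runs every 10 minutes, each run takes 2 minutes, cycle number N = 3.
node_b = cycle_check_time(midnight, timedelta(minutes=10), timedelta(minutes=2), 3)

print(node_a.strftime("%H:%M"))  # 04:25
print(node_b.strftime("%H:%M"))  # 00:22
print(cycles_in(timedelta(days=1), timedelta(hours=2)))      # 12
print(cycles_in(timedelta(hours=1), timedelta(minutes=10)))  # 6
```

If the node instance of the third cycle is still running at the printed time, the example in step 5 expects an alert to be reported.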

Send alert notifications to a DingTalk group

  1. Go to the DingTalk group to which you want the system to send alert notifications and click the Group Settings icon in the upper-right corner.
  2. In the Group Settings panel, click Group Assistant.
  3. In the Group Assistant panel, click Add Robot.
  4. In the ChatBot dialog box, click the icon for adding a DingTalk chatbot.
  5. In the Please choose which robot to add section, click Custom.
  6. In the Robot details dialog box, click Add.
  7. In the Add Robot dialog box, configure the parameters.
    Parameter Description
    Chatbot name The name of the custom chatbot.
    Add to Group The DingTalk group to which the chatbot is added. This group cannot be changed.
    Custom Keywords After you specify custom keywords, the chatbot sends only messages that contain at least one of the specified keywords. You must add DataWorks as a keyword. This keyword is case-sensitive.
    Note You can specify a maximum of 10 keywords. A message can be sent only if it contains at least one of the specified keywords.
  8. Read the terms of service, select I have read and accepted <<DingTalk Custom Robot Service Terms of Service>>, and then click Finished.
  9. After you complete the security settings, copy the webhook URL of the chatbot and click Finished.
    Notice Keep the webhook URL confidential. If the webhook URL is leaked, your business is at risk.
  10. Go to the Rule Management page, and click Create Custom Rule. In the Create Custom Rule dialog box, set the Notification Method parameter to DingTalk Chatbot, and paste the chatbot webhook URL that you copied from DingTalk in the Webhook Address column in the DingTalk Chatbot section.
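
Before you rely on the rule, you may want to check the chatbot yourself. The following Python sketch builds a text message that satisfies the DataWorks keyword requirement and posts it to the webhook URL. It is an illustration for manual testing only, not the mechanism that DataWorks uses to send alerts. The function names are hypothetical, and the message body assumes the text message format of DingTalk custom chatbots (`msgtype` and `text.content`).

```python
import json
import urllib.request

# Keyword configured in Custom Keywords for the chatbot. It is case-sensitive.
REQUIRED_KEYWORD = "DataWorks"


def build_text_payload(content: str, keyword: str = REQUIRED_KEYWORD) -> dict:
    """Build a text message body, rejecting content without the keyword."""
    if keyword not in content:
        raise ValueError(f"message must contain the keyword {keyword!r}")
    # Assumed message shape for a DingTalk custom chatbot text message.
    return {"msgtype": "text", "text": {"content": content}}


def send_test_message(webhook_url: str, content: str) -> dict:
    """POST the message to the chatbot webhook and return the parsed JSON reply."""
    body = json.dumps(build_text_payload(content)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

To try it, call `send_test_message` with the webhook URL that you copied in step 9 and a message such as "DataWorks test alert". A message that lacks the keyword, or that spells it as "dataworks", is rejected locally before any request is sent. Do not commit the webhook URL to source control.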