You can use custom alert rules to monitor the status or resource usage of specified nodes based on your business requirements. This helps you identify and handle exceptions at the earliest opportunity. This topic describes how to create a custom alert rule on the Rule Management page, and how to add a DingTalk chatbot and obtain its webhook URL.

Limits

  • Custom alert rules take effect only on auto triggered node instances.
  • Custom alert rules support the following alert notification methods: email, text message, DingTalk chatbot, and webhook URL. Limits on the supported alert notification methods:
    • An alert notification can be sent by using a text message only in the following regions: Singapore (Singapore), Malaysia (Kuala Lumpur), and Germany (Frankfurt). If you want to use this notification method in other regions, submit a ticket to contact Alibaba Cloud DataWorks technical support.
    • Webhook URL:
      • The webhook URL-based alerting feature is supported only in DataWorks Enterprise Edition and DataWorks Ultimate Edition.
      • The webhook URL-based alerting feature is supported in the following regions: China (Shanghai), China (Chengdu), China (Zhangjiakou), China (Beijing), China (Hangzhou), China (Shenzhen), China (Hong Kong), Germany (Frankfurt), and Singapore (Singapore).
      • An alert notification can be sent by using the webhook URL-based alerting feature only to Enterprise WeChat or Lark.
    Note DataWorks supports the webhook URL-based alerting feature only for DingTalk, Enterprise WeChat, and Lark. If you want to use this notification method for other services, submit a ticket.
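The region and edition limits above can be expressed as a small lookup. The following is a minimal sketch, not part of DataWorks: the helper name `is_method_available` and the method and edition labels are hypothetical, and the sketch assumes that email and DingTalk chatbot notifications have no region or edition limit because none is listed.

```python
# Regions and editions copied from the limits listed in this topic.
SMS_REGIONS = {
    "Singapore (Singapore)", "Malaysia (Kuala Lumpur)", "Germany (Frankfurt)",
}
WEBHOOK_REGIONS = {
    "China (Shanghai)", "China (Chengdu)", "China (Zhangjiakou)",
    "China (Beijing)", "China (Hangzhou)", "China (Shenzhen)",
    "China (Hong Kong)", "Germany (Frankfurt)", "Singapore (Singapore)",
}
WEBHOOK_EDITIONS = {"Enterprise", "Ultimate"}  # hypothetical edition labels


def is_method_available(method: str, region: str, edition: str) -> bool:
    """Hypothetical helper: True if the method can be used without a ticket."""
    if method == "sms":
        return region in SMS_REGIONS
    if method == "webhook":
        return region in WEBHOOK_REGIONS and edition in WEBHOOK_EDITIONS
    if method in {"email", "dingtalk"}:
        # Assumption: no region or edition limit is listed for these methods.
        return True
    raise ValueError(f"unknown notification method: {method}")
```

For example, `is_method_available("webhook", "China (Shanghai)", "Standard")` returns `False` because the webhook method requires Enterprise Edition or Ultimate Edition.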

Create a custom alert rule

  1. Go to the Operation Center page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides. Find your workspace, click the More icon in the Actions column, and then select Operation Center.
  2. In the left-side navigation pane, choose Alarm > Rule Management.
  3. On the page that appears, click Create Custom Rule.
  4. In the Create Custom Rule dialog box, configure the parameters.
    1. Configure parameters in the General section.
      Parameter Description
      Rule Name The name of the custom alert rule.
      Object Type The type of object that you want to monitor. Valid values: Node, Baseline, Workspace, Workflow, Exclusive Resource Groups for Scheduling, and Exclusive Resource Groups for Data Integration.
      Note If this parameter is set to Baseline, you can monitor only the status of nodes that are run on a specified baseline. If you also want to monitor the status of ancestor nodes of the nodes that are run on the specified baseline, see Overview.
      Object The object that you want to monitor.

      This parameter is required only if you set the Object Type parameter to Node, Baseline, Workspace, or Workflow. Enter the name or ID of the object that you want to monitor in the field, select the object that appears, and then click Add.

      Add to Whitelist The nodes that are in the monitoring scope but that you do not want to monitor.

      This parameter is required only if you set the Object Type parameter to Baseline, Workspace, or Workflow. To add a node to the whitelist, enter the name or ID of the node in the Add to Whitelist field and click Add. The nodes that you add to the whitelist are not monitored.

      Resource Group The name of the exclusive resource group that you want to monitor.

      If you set the Object Type parameter to Exclusive Resource Groups for Scheduling or Exclusive Resource Groups for Data Integration, you must select a resource group that you want to monitor from the Resource Group drop-down list.

    2. Configure parameters in the Trigger Condition section.
      Note In the logic of a custom alert rule, a node in the frozen state is considered complete.
      Object type Trigger condition Description
      Node, Baseline, Workspace, or Workflow
      Completed
      Nodes are monitored from the time when they start to run. When the nodes are successfully run, an alert is reported.
      • If the Object Type parameter is set to Baseline or Workflow, an alert is reported only after all nodes in the specified baseline or workflow are successfully run.
      • If the Object Type parameter is set to Node and multiple nodes are added, an alert is reported only after all nodes are complete.
      • If the Object Type parameter is set to Workspace, you cannot select Completed from the Trigger Condition drop-down list.
      Note For a node that is scheduled to run by hour, the node is considered complete only after the node is successfully run in all cycles.
      Uncompleted

      Nodes are monitored from the time when they start to run. If the nodes are still running at a specified point in time, an alert is reported.

      Note Alert rules of this trigger condition type are different from alert policies provided by using the intelligent baseline feature. The intelligent baseline feature can be used to detect an exception that prevents a node in a baseline from being complete on time. If an exception is detected, the system sends you an alert notification about the exception at the earliest opportunity. For more information, see Overview.
      Sample scenarios:
      • Scenario 1: A node is scheduled to run at 01:00, and you set the alert time to 02:00. If the node is still running at 02:00, an alert is reported.
      • Scenario 2: A node is scheduled to run every hour from 00:00 to 23:59. You set the alert time to 12:00. In this case, an alert is reported every day.
      • Scenario 3: You set the completion time for a baseline to 10:00. If a node in the baseline is still running at 10:00, an alert is reported.
      Note For a node that is scheduled to run by hour or minute, the system checks whether the node is complete at a specified point in time in all cycles on the current day.
      Error

      Nodes are monitored from the time when they start to run. If an error occurs when the nodes are running, an alert is reported.

      Note

      If an error occurs for a node instance, an error icon is displayed in the General column on the Cycle Instance page under Cycle Task Maintenance in Operation Center.

      • If the Object Type parameter is set to Baseline, Workspace, or Workflow, an alert is reported if an error occurs on a node in the specified baseline, workspace, or workflow.
      • An alert is reported each time an error occurs when a node is running. For example, you set the number of alerts reported for each error to 2. If a node is rerun twice and an error occurs during each rerun, a total of four alerts are reported.
      • If you want an alert to be reported only if an error persists after a node is automatically rerun, you can select The error persists after the node automatically reruns from the Trigger Condition drop-down list.
      Uncompleted in Cycle

      If nodes are still running at the end of a specified cycle, an alert is reported. In most cases, you can configure this trigger condition for node instances that are scheduled to run by hour.

      If the Trigger Condition parameter is set to Uncompleted in Cycle for workflows, the system monitors nodes that are scheduled to run by day, hour, or minute in the workflows based on the cycle number (N) that you specified. If the number of node instances for a node is less than N, the system ignores the alerts reported for the node.

      For example, you set the cycle number to 3, and two nodes are contained in a workflow. The following examples show detailed alerting and monitoring information:
      • Node A is scheduled to run every 2 hours, and each run takes 25 minutes. If Node A starts to run at 00:00 every day, the node runs 12 times within 24 hours. The first cycle starts at 00:00, and the third cycle starts at 04:00. If the node runs as expected, the node instance in the third cycle stops running at 04:25. If you set the trigger condition to Uncompleted in Cycle and set the alert time to 04:30 for the node instance in the third cycle, an alert is reported if the node instance in the third cycle is still running at 04:30.
      • Node B is scheduled to run every 10 minutes, and each run takes 2 minutes. If Node B starts to run at 00:00 every day, the node runs six times within 1 hour. The first cycle starts at 00:00, and the third cycle starts at 00:20. If the node runs as expected, the node instance in the third cycle stops running at 00:22. If you set the trigger condition to Uncompleted in Cycle and set the alert time to 00:23 for the node instance in the third cycle, an alert is reported if the node instance in the third cycle is still running at 00:23.
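The cycle arithmetic in the Node A and Node B examples can be checked with a short sketch. The function names `cycle_window` and `should_alert` are hypothetical and only illustrate the calculation. They are not DataWorks APIs, and the date is an arbitrary placeholder.

```python
from datetime import datetime, timedelta


def cycle_window(first_start: datetime, interval: timedelta,
                 duration: timedelta, cycle_number: int):
    """Return (start, expected_end) of the Nth cycle, counting from 1."""
    start = first_start + (cycle_number - 1) * interval
    return start, start + duration


def should_alert(still_running: bool, now: datetime, alert_time: datetime) -> bool:
    """Alert if the Nth-cycle instance is still running at the alert time."""
    return still_running and now >= alert_time


midnight = datetime(2024, 1, 1, 0, 0)  # arbitrary example date

# Node A: runs every 2 hours, 25 minutes per run, cycle number 3.
a_start, a_end = cycle_window(midnight, timedelta(hours=2), timedelta(minutes=25), 3)
print(a_start.strftime("%H:%M"), a_end.strftime("%H:%M"))  # 04:00 04:25

# Node B: runs every 10 minutes, 2 minutes per run, cycle number 3.
b_start, b_end = cycle_window(midnight, timedelta(minutes=10), timedelta(minutes=2), 3)
print(b_start.strftime("%H:%M"), b_end.strftime("%H:%M"))  # 00:20 00:22
```

In both examples, the alert time falls shortly after the expected end time, so an alert is reported only if the instance takes longer than expected.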
      Overtime
      Nodes are monitored from the time when they start to run. If the nodes are still running after a specified period ends, an alert is reported. In most cases, you can configure this trigger condition to monitor the duration of nodes.
      Note If a monitored node fails to run and remains in the failed state after the specified period ends, a timeout alert is reported.
      The error persists after the node automatically reruns
      Nodes are monitored from the time when they start to run. If an error persists after the nodes are rerun, an alert is reported.
      Note If you want an alert to be reported each time an error occurs when a node is running, you can set the trigger condition to Error.
      Instance Generated
      You can set the trigger condition to Instance Generated only when the Object Type parameter is set to Workspace.
      Fluctuation of Instance Number
      You can set the trigger condition to Fluctuation of Instance Number only when the Object Type parameter is set to Workspace. Before 24:00 every day, DataWorks generates the auto triggered node instances that are scheduled to run the next day. If the number of auto triggered node instances that are generated in the workspace fluctuates significantly compared with the historical average for the workspace, an alert is reported.
      Exclusive Resource Groups for Scheduling or Exclusive Resource Groups for Data Integration
      Resource Group Usage
      If the value of the Resource Group Usage parameter is greater than a specific percentage for a specific period of time, an alert is reported.

      Example: If the value of the Resource Group Usage parameter is greater than 50% for 15 minutes, an alert is reported.

      Number of Instances Waiting for Resources in Resource Group
      If the value of the Number of Instances Waiting for Resources in Resource Group parameter is greater than a specific number for a specific period of time, an alert is reported.

      Example: If the value of the Number of Instances Waiting for Resources in Resource Group parameter is greater than 10 for 15 minutes, an alert is reported.
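The two resource group conditions share the same shape: a metric stays above a threshold for a period of time. The following sketch illustrates that check under stated assumptions. The function `exceeded_for` and the sampled-metric input are hypothetical, and this topic does not describe how DataWorks samples or evaluates these metrics.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, float]  # (timestamp, metric value)


def exceeded_for(samples: List[Sample], threshold: float, period: timedelta) -> bool:
    """True if the metric stayed above `threshold` for at least `period`.

    Samples must be sorted by timestamp. Only the most recent unbroken run
    of above-threshold samples is considered.
    """
    run_start = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts  # a new above-threshold run begins here
        else:
            run_start = None  # the run is broken by a sample at or below the threshold
    if run_start is None or not samples:
        return False
    return samples[-1][0] - run_start >= period
```

With samples taken every 5 minutes, four consecutive readings above 50 (at 0, 5, 10, and 15 minutes) satisfy a 15-minute period. The same function applies to the number of waiting instances with a threshold of 10.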

    3. Configure parameters in the Alert Details section.
      Notification method Alert contact Description
      Email or SMS
      You can select Node Owner, Varies According to Shift Schedule, or Others for Recipient.
      • After you configure the Recipient parameter, Check Contact Information is displayed on the right side of the value of this parameter. You can click Check Contact Information to check whether the mobile phone number or email address is correct.
      • If you want to select Varies According to Shift Schedule for Recipient, you must configure a shift schedule first. For more information about how to configure a shift schedule, see Create and manage a shift schedule.
      DingTalk Chatbot or WebHook
      You can specify members in a group.
      • You can click Send Test Message in the Actions column to check whether an alert notification can be sent. If the alert contact does not receive the alert notification, troubleshoot the issue. For more information, see Intelligent monitoring.
      • DataWorks supports the webhook URL-based alerting feature only for DingTalk. If you want to use this notification method for other services, submit a ticket to contact Alibaba Cloud DataWorks technical support.
      • You can specify only keywords for the security configuration of a DingTalk chatbot. The keywords must contain DataWorks.
    4. Configure parameters in the Alerting Frequency Control section.
      Parameter Description
      Maximum Alerts The maximum number of times an alert is reported. If the number of times an alert is reported exceeds the specified threshold, the alert is no longer reported.
      Minimum Alert Interval The minimum interval at which an alert is reported.
      Quiet Hours The system does not send alert notifications during the period of time that is specified by this parameter.

      For example, you set the Trigger Condition parameter to Overtime, Error, or Uncompleted for a node and set the Quiet Hours parameter to the period of time from 00:00 to 08:00. In this case, the system does not send an alert notification during this period of time. If the node times out, an error occurs on the node, or the node is not complete at 08:00, the system sends an alert notification.
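The way the three frequency control parameters combine can be sketched as a single decision function. This is an illustration under stated assumptions, not the DataWorks implementation: the names `in_quiet_hours` and `may_send` are hypothetical, and the sketch assumes that the quiet period includes its start time and excludes its end time, which matches the 08:00 example above.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def in_quiet_hours(now: datetime, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); handles windows that cross midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def may_send(now: datetime, sent_count: int, last_sent: Optional[datetime],
             max_alerts: int, min_interval: timedelta,
             quiet_start: time, quiet_end: time) -> bool:
    """Apply Maximum Alerts, Quiet Hours, and Minimum Alert Interval in turn."""
    if sent_count >= max_alerts:
        return False  # Maximum Alerts reached
    if in_quiet_hours(now, quiet_start, quiet_end):
        return False  # inside Quiet Hours
    if last_sent is not None and now - last_sent < min_interval:
        return False  # Minimum Alert Interval not yet elapsed
    return True
```

With Quiet Hours set to 00:00 to 08:00, a call at 03:00 returns `False`, and a call at 08:00 returns `True` if the other two limits allow it.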

  5. Click OK. An alert rule is created.
    On the Rule Management page, you can click View Details, Disable, Enable, or Delete in the Actions column that corresponds to a rule to perform the related operation.
    • View Details: View basic information about the desired rule.
    • Enable or disable: Enable or disable a rule. You can enable a rule to monitor the status of a node for which the rule is configured. You can view alert details on the Alert Management page. For more information, see View alerts.
    • Delete: Delete a rule.

Example scenario: Send alert notifications to a DingTalk group

  1. Go to the DingTalk group to which you want the system to send alert notifications and click the Group Settings icon in the upper-right corner.
  2. In the Group Settings panel, click Group Assistant.
  3. In the Group Assistant panel, click Add Robot.
  4. In the ChatBot dialog box, click the Add a DingTalk chatbot icon.
  5. In the Please choose which robot to add section, click Custom.
  6. In the Robot details message, click Add.
  7. In the Add Robot dialog box, configure the parameters.
    Parameter Description
    Chatbot name The name of the custom chatbot.
    Add to Group The DingTalk group to which the chatbot is added. This group cannot be changed.
    Custom Keywords After you specify custom keywords, messages can be sent only if these messages contain at least one of the specified keywords. You must add DataWorks as a keyword. This keyword is case-sensitive.
    Note You can specify a maximum of 10 keywords.
  8. Read the terms of service, select I have read and accepted <<DingTalk Custom Robot Service Terms of Service>>, and then click Finished.
  9. After you complete the security settings, copy the webhook URL of the chatbot and click Finished.
    Notice Keep the webhook URL confidential. If the webhook URL is leaked, your business is at risk.
  10. Go to the Rule Management page and click Create Custom Rule. In the Create Custom Rule dialog box, set the Notification Method parameter to DingTalk Chatbot, and paste the chatbot webhook URL that you copied from DingTalk in the Webhook Address column in the DingTalk Chatbot section.
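Before you paste the webhook URL into DataWorks, you may want to confirm that the chatbot accepts messages. The following is a minimal sketch that posts a text message in the DingTalk custom chatbot format by using only the Python standard library. The helper names `build_text_message` and `send_test_message` are hypothetical. The sketch assumes that DataWorks is configured as the chatbot keyword, as described in the steps above.

```python
import json
import urllib.request

KEYWORD = "DataWorks"  # keyword configured in the chatbot security settings


def build_text_message(content: str) -> bytes:
    """Build a DingTalk text-message body; reject content without the keyword."""
    if KEYWORD not in content:  # the keyword check is case-sensitive
        raise ValueError(f"message must contain the keyword {KEYWORD!r}")
    payload = {"msgtype": "text", "text": {"content": content}}
    return json.dumps(payload).encode("utf-8")


def send_test_message(webhook_url: str, content: str) -> dict:
    """POST the message to the chatbot webhook and return the parsed reply."""
    request = urllib.request.Request(
        webhook_url,
        data=build_text_message(content),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

To use the sketch, call `send_test_message(webhook_url, "DataWorks alert test")` with the URL that you copied from DingTalk. Because the webhook URL must be kept confidential, load it from an environment variable or a secret store instead of hard-coding it.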