Custom alert rules allow you to monitor the status of specified node instances based on your business requirements, so that you can identify and handle exceptions as early as possible. This topic describes how to create a custom alert rule on the Rule Management page, and how to add a DingTalk chatbot and obtain its webhook URL.

Limits

  • Custom alert rules take effect only on auto-triggered node instances.
  • You can use the phone call alerting feature only in DataWorks Professional Edition or more advanced editions.
  • The webhook alerting feature has the following limits:
    • You can use the webhook alerting feature only in DataWorks Enterprise Edition or Ultimate Edition.
    • Custom alert rules and baselines support the webhook alerting feature only in the Germany (Frankfurt) and Singapore (Singapore) regions.
    • DataWorks allows you to use the webhook alerting feature to send alert notifications only to Enterprise WeChat and Feishu.

Create a custom alert rule

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Click the More icon in the upper-left corner and choose All Products > Task Operation > Operation Center. The Operation Center page appears.
  3. In the left-side navigation pane, choose Alarm > Rule Management.
  4. On the Rule Management page, click Create Custom Rule in the upper-right corner.
  5. In the Create Custom Rule dialog box, set the parameters that are described in the following table.
    Create Custom Rule
    Section | Parameter | Description
    General | Rule Name | The name of the custom alert rule.
    General | Object Type | The type of object that you want to monitor. Valid values: Node, Baseline, Workspace, Workflow, Exclusive Resource Groups for Scheduling, and Exclusive Resource Groups for Data Integration.
    General | Object | If you set the Object Type parameter to Node, Baseline, Workspace, or Workflow, you must specify one or more objects. Enter the name or ID of an object that you want to monitor, select the object from the drop-down list, and then click the Add icon.
    General | Resource Group Name | If you set the Object Type parameter to Exclusive Resource Groups for Scheduling or Exclusive Resource Groups for Data Integration, you must select a resource group name.
    Trigger Condition | Trigger Condition | If you set the Object Type parameter to Node, Baseline, Workspace, or Workflow, the valid values of the Trigger Condition parameter are:
    • Completed

      Node instances are monitored from the time when they start to run. When the node instances are successfully run, an alert is reported.

    • Uncompleted

      Node instances are monitored from the time when they start to run. If the node instances are still running at the specified point in time, an alert is reported. For example, a node instance is scheduled to run at 01:00, and you set the alert time to 02:00. If the node instance is still running at 02:00, an alert is reported.

    • Error

      Node instances are monitored from the time when they start to run. If an error occurs when the node instances are running, an alert is reported.

      If an error occurs for a node instance, an error icon is displayed in the General column on the Cycle Instance page under Cycle Task Maintenance in Operation Center.

    • Uncompleted in Cycle

      If node instances are still running at the end of the specified cycle, an alert is reported. In most cases, you can configure this trigger condition for node instances that are scheduled by hour.

      For example, Node A is scheduled to run every 2 hours, and each run takes 25 minutes. If Node A starts to run at 00:00 every day, the node runs 12 times within 24 hours. The first cycle starts at 00:00, the second cycle starts at 02:00, and this goes on until the twelfth cycle. The twelfth cycle starts at 22:00. If the node runs as expected, the node instance in each cycle stops running at the specified point in time, such as 00:25 or 02:25. If the node instance is still running at the specified point in time in a cycle, an alert is reported.
      Note: You can configure the Uncompleted in Cycle trigger condition to monitor nodes in workflows.

      If the Trigger Condition parameter is set to Uncompleted in Cycle for workflows, the system monitors nodes that are scheduled by day, hour, or minute in the workflows based on the cycle number (N) that you specified. If the number of node instances for a node is less than the value of N, the system ignores the alerts reported for the node.

      For example, you set the cycle number to 3, and two nodes are configured in a workflow. The following examples show detailed alerting and monitoring information:
      • Node A is scheduled by hour: Node A is scheduled to run every 2 hours, and each run takes 25 minutes. If Node A starts to run at 00:00 every day, the node runs 12 times within 24 hours. The first cycle starts at 00:00, and the third cycle starts at 04:00. If the node runs as expected, the node instance in the third cycle stops running at 04:25. If the node instance in the third cycle is still running at 04:25, an alert is reported.
      • Node B is scheduled by minute: Node B is scheduled to run every 10 minutes, and each run takes 2 minutes. If Node B starts to run at 00:00 every day, the node runs six times within 1 hour. The first cycle starts at 00:00, and the third cycle starts at 00:20. If the node runs as expected, the node instance in the third cycle stops running at 00:22. If the node instance in the third cycle is still running at 00:22, an alert is reported.
    • Overtime

      Node instances are monitored from the time when they start to run. If the node instances are still running after the specified period ends, an alert is reported. In most cases, you can configure this trigger condition to monitor the duration of node instances.

    • The error persists after the node automatically reruns

      Node instances are monitored from the time when they start to run. If an error still occurs after the node instances are rerun, an alert is reported.

    Trigger Condition | Trigger Condition | If you set the Object Type parameter to Exclusive Resource Groups for Scheduling or Exclusive Resource Groups for Data Integration, the valid values of the Trigger Condition parameter are:
    • Resource Group Usage: If the resource usage is greater than a specific percentage for a specific period of time, an alert is reported.

      For example, if the resource usage is greater than 50% for 15 minutes, an alert is reported.

    • Number of Instances Waiting for Resources in Resource Group: If the number of node instances that are waiting for resources is greater than a specific number for a specific period of time, an alert is reported.

      For example, if the number of node instances that are waiting for resources is greater than 10 for 15 minutes, an alert is reported.

    Alert Details | Notification Method | The methods used to send alert notifications. Valid values: Email, SMS, Phone, DingTalk Chatbot, and WebHook. You can add a DingTalk chatbot to receive alert notifications. For more information, see the "Send alert notifications to a DingTalk group" section of this topic. To send alert notifications to multiple DingTalk groups, add multiple webhook URLs.
    Notice
    • You can use the phone call alerting feature only in DataWorks Professional Edition or more advanced editions.
    • If you set the Notification Method parameter to Phone, DataWorks filters alarm calls in advance to prevent users from receiving alarm calls too frequently. A user can receive at most one alarm call within 20 minutes. Additional alarm calls are downgraded to SMS messages.
    • Only the webhook URLs of DingTalk chatbots are supported.
    Alert Details | Recipient | The user who receives alert notifications. Valid values: Node Owner, Varies According to Shift Schedule, and Others.
    Alerting Frequency Control | Maximum Alerts | The maximum number of times an alert is reported. After this threshold is reached, the alert is no longer reported.
    Alerting Frequency Control | Minimum Alert Interval | The minimum interval at which an alert is reported.
    Alerting Frequency Control | Quiet Hours | The specified period during which no alerts are reported.
  6. Click OK. An alert rule is created.
    On the Rule Management page, you can find the created alert rule and click View Details in the Actions column to view the details of the alert rule.
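
The Uncompleted in Cycle examples in the preceding table can be expressed as a short calculation. The following Python sketch is for illustration only and is not part of DataWorks. The function names cycle_start_times, cycle_checkpoint, and is_uncompleted are hypothetical, and the sketch assumes that the specified point in time in a cycle equals the cycle start time plus a fixed offset, as in the Node A and Node B examples.

```python
from datetime import datetime, timedelta


def cycle_start_times(first_start, interval, day_end):
    """Return the start time of every cycle from first_start up to (not including) day_end."""
    starts = []
    current = first_start
    while current < day_end:
        starts.append(current)
        current += interval
    return starts


def cycle_checkpoint(first_start, interval, cycle_number, offset):
    """Return the point in time at which the given cycle (1-based) is checked.

    Assumption: the checkpoint is the cycle start time plus a fixed offset.
    """
    if cycle_number < 1:
        raise ValueError("cycle_number must be 1 or greater")
    return first_start + interval * (cycle_number - 1) + offset


def is_uncompleted(finish_time, checkpoint):
    """Return True if the instance is still running at the checkpoint.

    finish_time is None while the instance has not finished.
    """
    return finish_time is None or finish_time > checkpoint


# Node A: runs every 2 hours from 00:00, checked 25 minutes into the third cycle.
day = datetime(2024, 1, 1)
node_a_starts = cycle_start_times(day, timedelta(hours=2), day + timedelta(days=1))
node_a_check = cycle_checkpoint(day, timedelta(hours=2), 3, timedelta(minutes=25))

# Node B: runs every 10 minutes from 00:00, checked 2 minutes into the third cycle.
node_b_check = cycle_checkpoint(day, timedelta(minutes=10), 3, timedelta(minutes=2))
```

With these inputs, node_a_starts contains 12 start times (the last at 22:00), node_a_check is 04:25, and node_b_check is 00:22, which matches the examples in the table. An instance for which is_uncompleted returns True at the checkpoint corresponds to the case in which an alert is reported.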

Send alert notifications to a DingTalk group

  1. Go to the DingTalk group to which you want to send alert notifications and click the Group Settings icon in the upper-right corner.
  2. In the Group Settings panel, click Group Assistant.
  3. In the Group Assistant panel, click Add Robot.
  4. In the ChatBot dialog box, click the Add icon.
  5. In the Please choose which robot to add section, click Custom.
  6. In the Robot details message, click Add.
  7. In the Add Robot dialog box, set the parameters that are described in the following table.
    Parameter | Description
    Chatbot name | The name of the custom chatbot.
    Add to Group | The DingTalk group to which the chatbot is added. This group cannot be changed.
    Custom Keywords | After you specify custom keywords, messages are sent only if these messages contain the specified keywords. We recommend that you specify the keyword DataWorks.
    Note: You can specify a maximum of 10 keywords. A message can be sent only if it contains at least one of the specified keywords.
  8. Read the terms of service, select I have read and accepted <<DingTalk Custom Robot Service Terms of Service>>, and then click Finished.
  9. After you complete the security settings, copy the webhook URL of the chatbot and click Finished.
    Notice
    • Save the copied webhook URL and paste it in the Webhook Address field when you create an alert rule.
    • Keep the webhook URL confidential. If the webhook URL is leaked, your business is at risk.
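
After you obtain the webhook URL, you may want to confirm that the chatbot accepts messages before you rely on it for alerts. The following Python sketch is a minimal illustration and is not DataWorks code. It assumes the plain-text message format of DingTalk custom chatbots (a JSON body in which msgtype is set to text). The names WEBHOOK_URL, KEYWORDS, contains_keyword, build_text_message, and send_test_message are hypothetical, and the local keyword check is a case-sensitive approximation of the Custom Keywords setting.

```python
import json
import urllib.request

# Placeholder: replace with the webhook URL that you copied for the chatbot.
# Keep the real URL out of source control.
WEBHOOK_URL = "<webhook URL of your chatbot>"

# Assumption: the chatbot was created with the recommended keyword.
KEYWORDS = ["DataWorks"]


def contains_keyword(text, keywords=KEYWORDS):
    """Return True if the text contains at least one configured keyword."""
    return any(keyword in text for keyword in keywords)


def build_text_message(content, keywords=KEYWORDS):
    """Build the JSON body of a plain-text chatbot message.

    Raises ValueError if the content contains none of the keywords, because
    such a message would not pass the Custom Keywords check.
    """
    if not contains_keyword(content, keywords):
        raise ValueError("message must contain at least one custom keyword")
    return {"msgtype": "text", "text": {"content": content}}


def send_test_message(webhook_url, content, timeout=10):
    """POST a plain-text message to the webhook URL and return the parsed response."""
    body = json.dumps(build_text_message(content)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (not executed here because it requires a real webhook URL):
# send_test_message(WEBHOOK_URL, "DataWorks test message")
```

The keyword check runs before the request is sent, so a message that lacks every configured keyword fails locally instead of being rejected by the chatbot.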