Fully managed Flink allows you to configure alert rules for running jobs. If an alert rule is triggered while a job is running, the system sends you an alert so that you can detect and handle exceptions promptly. This topic describes how to configure alert rules in the console of fully managed Flink.

Prerequisites

Application Real-Time Monitoring Service (ARMS) is activated. For more information, see Activate and upgrade ARMS.

Create a custom alert rule

  1. Log on to the Realtime Compute for Apache Flink console.
  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
  3. In the left-side navigation pane, click Deployments.
  4. Click the name of the desired job.
  5. Click the Alarm Configuration tab.
  6. Click the Alarm Rules tab.
  7. On the right side of the Alarm Rules tab, click Add Rule and select Custom Rule.
    You can also create alert rules from an alert template. To do this, select Create Rule by Template from the Add Rule drop-down list and perform subsequent steps. For more information about how to create an alert template, see Create an alert template.
  8. In the Create Rule panel, configure the alert rule parameters. The following table describes the alert rule parameters.
    Alert rule parameters
    Section Parameter Description
    Rule Name The name must be 3 to 64 characters in length, and can contain lowercase letters, digits, and underscores (_). The name must start with a letter.
    Description The description of the rule.
    Content The conditions that trigger an alert. After you create the conditions, fully managed Flink compares the values of specified metrics with the thresholds that are specified in the conditions at the interval you specify. If one of the conditions resolves to true, an alert is triggered.
    A condition consists of the following items:
    • Metrics:
      • Restart Count in 1 Minute: the number of times that the JobManager restarts jobs in one minute.
      • Checkpoint Count in 5 Minutes: the number of times that checkpointing succeeds in five minutes.
      • Emit Delay: the processing delay. This metric indicates the difference between the time when the data is generated and the time when the data leaves the source operator. Unit: seconds.
        Note The time when the data is generated depends on the timestamp that is recorded in the external system. If no timestamp is recorded in the external system, or if the timestamp that is recorded when data is written to the external system is incorrect, the value of the Emit Delay metric is invalid and cannot be used to determine the true processing delay.
      • IN RPS: the total number of records that are read by all source operators per second.
      • OUT RPS: the total number of records that are written by all sink operators per second.
    • Time Interval: the interval at which the metric is collected. After fully managed Flink obtains two values of a specified metric and calculates the interval at which the two values of the specified metric are collected, fully managed Flink compares the calculation result with the threshold that is specified in the conditions at the interval you specify.
    • Comparator: the greater-than-or-equal-to sign (>=) and the less-than-or-equal-to sign (<=) are supported.
    • Thresholds: the value with which the value of a specified metric is compared.
    Effective Time The time period during which the alert rule is effective. If you do not specify a time period, the alert rule is effective throughout the day. For example, you can specify a time period from 09:00 to 18:00.
    Alarm Rate You can set this parameter to a value from 1 minute to 24 hours.
    Notification Notification Valid values:
    • DingTalk
    • Email
    • SMS
    • Webhook
    Note You can specify the phone number, email address, and DingTalk ID for an alert contact.
    Contact Group The contact groups or contacts that you can add or edit. If you click Edit Contacts Group and Contacts, the Edit Contact Group dialog box appears. Then, you can add a webhook and configure a DingTalk chatbot to send alerts. For more information, see FAQ.
  9. Click OK.
    After you create an alert rule, the rule takes effect immediately. You can stop, edit, or delete the alert rule in the alert rule list.
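
The way the conditions in the Content parameter are evaluated can be sketched in a few lines of Python. This is an illustration only, not the implementation used by fully managed Flink: the Condition class, the metric keys such as restart_count_1m, and the sample values are hypothetical. Only the two supported comparators and the behavior that an alert is triggered when any condition is true come from the parameter descriptions above.

```python
import operator
from dataclasses import dataclass

# Only these two comparators are supported, per the Comparator item.
COMPARATORS = {">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class Condition:
    """One alert condition (hypothetical structure, for illustration only)."""
    metric: str        # made-up key, e.g. "restart_count_1m"
    comparator: str    # ">=" or "<="
    threshold: float

    def is_met(self, value: float) -> bool:
        # Compare the collected metric value with the threshold.
        return COMPARATORS[self.comparator](value, self.threshold)


def alert_triggered(conditions, metric_values):
    """Return True if at least one condition evaluates to true."""
    return any(
        c.is_met(metric_values[c.metric])
        for c in conditions
        if c.metric in metric_values
    )


conditions = [
    Condition("restart_count_1m", ">=", 1),     # any restart within 1 minute
    Condition("checkpoint_count_5m", "<=", 0),  # no successful checkpoint in 5 minutes
]

healthy = {"restart_count_1m": 0, "checkpoint_count_5m": 3}
restarting = {"restart_count_1m": 2, "checkpoint_count_5m": 3}

print(alert_triggered(conditions, healthy))     # False
print(alert_triggered(conditions, restarting))  # True
```

In this sketch, a metric that has no collected value is skipped rather than treated as a match. That choice is an assumption made for the example.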

Create an alert template

  1. Log on to the Realtime Compute for Apache Flink console.
  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
  3. In the left-side navigation pane, click Deployments.
  4. Click the name of the desired job.
  5. Click the Alarm Configuration tab.
  6. Click the Alarm Rules tab.
  7. On the right side of the Alarm Rules tab, click Add Rule and choose Create Rule by Template > Add Rule Template.
  8. In the Create Rule Template panel, configure the parameters. The following table describes the parameters.
    Alert template
    Section Parameter Description
    Rule Name The name must be 3 to 64 characters in length, and can contain lowercase letters, digits, and underscores (_). The name must start with a letter.
    Description The description of the rule.
    Content The conditions that trigger an alert. After you create the conditions, fully managed Flink compares the values of specified metrics with the thresholds that are specified in the conditions at the interval you specify. If one of the conditions resolves to true, an alert is triggered.
    A condition consists of the following items:
    • Metrics:
      • Restart Count in 1 Minute: the number of times that the JobManager restarts jobs in one minute.
      • Checkpoint Count in 5 Minutes: the number of times that checkpointing succeeds in five minutes.
      • Emit Delay: the processing delay. This metric indicates the difference between the time when the data is generated and the time when the data leaves the source operator. Unit: seconds.
        Note The time when the data is generated depends on the timestamp that is recorded in the external system. If no timestamp is recorded in the external system, or if the timestamp that is recorded when data is written to the external system is incorrect, the value of the Emit Delay metric is invalid and cannot be used to determine the true processing delay.
      • IN RPS: the total number of records that are read by all source operators per second.
      • OUT RPS: the total number of records that are written by all sink operators per second.
    • Time Interval: the interval at which the metric is collected. After fully managed Flink obtains two values of a specified metric and calculates the interval at which the two values of the specified metric are collected, fully managed Flink compares the calculation result with the threshold that is specified in the conditions at the interval you specify.
    • Comparator: the greater-than-or-equal-to sign (>=) and the less-than-or-equal-to sign (<=) are supported.
    • Thresholds: the value with which the value of a specified metric is compared.
    Effective Time The time period during which the alert rule is effective. If you do not specify a time period, the alert rule is effective throughout the day. For example, you can specify a time period from 09:00 to 18:00.
    Alarm Rate You can set this parameter to a value from 1 minute to 24 hours.
    Notification Notification Valid values:
    • DingTalk
    • Email
    • SMS
    • Webhook
    Note You can specify the phone number, email address, and DingTalk ID for an alert contact.
    Contact Group The contact groups or contacts that you can add or edit. If you click Edit Contacts Group and Contacts, the Edit Contact Group dialog box appears. Then, you can add a webhook and configure a DingTalk chatbot to send alerts. For more information, see FAQ.
  9. Click OK.
    After you create an alert template, you can edit or delete the template from the alert template list.
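
Before you enter values in the console, it can help to check them against the Name and Effective Time constraints described in the table above. The following Python sketch is one way to do that. The function names are hypothetical, and two points are assumptions rather than documented behavior: that the leading letter of the name must be lowercase (because only lowercase letters are allowed), and that the effective period is inclusive at both ends and does not cross midnight.

```python
import re
from datetime import time

# 3 to 64 characters: lowercase letters, digits, and underscores; starts with a letter.
# Assumption: the leading letter is a lowercase ASCII letter.
RULE_NAME = re.compile(r"[a-z][a-z0-9_]{2,63}")


def is_valid_rule_name(name: str) -> bool:
    """Check a rule name against the documented naming constraints."""
    return RULE_NAME.fullmatch(name) is not None


def in_effective_time(now: time, start: time, end: time) -> bool:
    """Check whether `now` falls within the effective period.

    Assumption: both ends are inclusive and the period does not cross midnight.
    """
    return start <= now <= end


print(is_valid_rule_name("emit_delay_high"))  # True
print(is_valid_rule_name("1st_rule"))         # False: starts with a digit
print(is_valid_rule_name("ab"))               # False: shorter than 3 characters

# The 09:00 to 18:00 period used as the example in the table.
print(in_effective_time(time(12, 30), time(9, 0), time(18, 0)))  # True
print(in_effective_time(time(20, 0), time(9, 0), time(18, 0)))   # False
```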

FAQ

  • How do I add a webhook?
    1. Click Edit Contacts Group and Contacts.
    2. In the Edit Contact Group dialog box, click the Webhook tab and click Add Webhook.
    3. In the Add Webhook dialog box, configure the parameters. The following table describes the parameters.
      Add a webhook
      Parameter Description
      Name Required. The name of the webhook that you want to add.
      URL Required. The webhook URL.
      Headers Optional. The request headers that store cookies and tokens. The format is key: value.
      Note Make sure that a space exists after the colon (:) between the key and the value.
      Params Optional. The request parameters that are in the key: value format.
      Note Make sure that a space exists after the colon (:) between the key and the value.
      Body Required. The request body that is used to store the POST request parameters and parameter data.

      You can use the $content placeholder in the request body. $content represents the actual alert message.

    4. Click OK.
  • How do I create a DingTalk chatbot to send alerts?
    1. Click Edit Contacts Group and Contacts.
    2. On the Contact tab, click Add Contact.
    3. In the Add Contact dialog box, enter the webhook URL of the chatbot in the DingTalk Robot field.
      You must create a DingTalk chatbot to obtain the webhook URL of the chatbot. For more information, see Add a custom DingTalk chatbot and obtain the webhook URL.
      Notice To ensure that you receive alerts from a DingTalk chatbot, select at least Custom Keywords in the Security Settings section of the Add Robot dialog box, and configure Alarm as a keyword.
    4. Click Submit.
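
To make the webhook parameters more concrete, the following Python sketch shows how Headers or Params entries in the key: value format could be parsed and how the $content placeholder in the Body could be replaced with an alert message. It is a rough illustration and not the console's actual processing: the function names, the header values, the JSON body shape, and the sample alert text are all hypothetical, and JSON-escaping the message before substitution is an assumption made so that the rendered body stays valid JSON.

```python
import json
from string import Template


def parse_pairs(lines):
    """Parse 'key: value' lines, requiring a space after the colon."""
    pairs = {}
    for line in lines:
        key, sep, value = line.partition(": ")
        if not sep or not key:
            raise ValueError(
                f"expected 'key: value' with a space after the colon: {line!r}"
            )
        pairs[key.strip()] = value.strip()
    return pairs


def render_body(body_template: str, alert_message: str) -> str:
    """Replace the $content placeholder with the alert message."""
    # Assumption: the body is JSON, so the message is JSON-escaped first.
    escaped = json.dumps(alert_message)[1:-1]
    return Template(body_template).safe_substitute(content=escaped)


# Hypothetical values, for illustration only.
headers = parse_pairs(["Content-Type: application/json", "X-Token: abc123"])
body = render_body('{"message": "$content"}', 'Job "etl" restarted')

print(headers["X-Token"])           # abc123
print(json.loads(body)["message"])  # Job "etl" restarted
```

An entry such as X-Token:abc123, which has no space after the colon, is rejected by parse_pairs in this sketch, which mirrors the note about the required space.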