Fully managed Flink allows you to configure alert rules for deployments that are running. If an alert rule is triggered when a deployment is running, the system sends you an alert to help you detect and handle exceptions at the earliest opportunity. This topic describes how to configure alert rules in the console of fully managed Flink.
Prerequisites
Application Real-Time Monitoring Service (ARMS) is activated. For more information, see Activate ARMS.
Limits
You cannot create custom alert rules for a deployment that is deployed in a session cluster.
Create a custom rule
Go to the Create Rule panel.
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Deployments. On the Deployments page, click the name of the desired deployment.
Click the Alarm tab.
On the Alarm tab, click the Alarm Rules tab. In the upper-right corner of the Alarm Rules tab, choose .
You can also create alert rules based on an alert template on the Configurations page. To create alert rules by using an alert template, click Create Rule by Template, click the name of the desired template, and perform the subsequent steps. For more information about how to create an alert template, see Create an alert template.
In the Create Rule panel, configure the alert rule parameters.
Section
Parameter
Description
Rule
Name
The name must be 3 to 64 characters in length, and can contain lowercase letters, digits, and underscores (_). The name must start with a letter.
Description
The description of the rule.
Content
The conditions that trigger an alert. After you create the conditions, fully managed Flink compares the values of specified metrics with the thresholds that are specified in the conditions at the interval you specify. If one of the conditions resolves to true, an alert is triggered.
A condition consists of the following items:
Metrics:
Restart Count in 1 Minute: the number of times that the JobManager restarts deployments in one minute.
Checkpoint Count in 5 Minutes: the number of times that checkpointing succeeds in five minutes.
Emit Delay: the processing delay. This parameter specifies the difference between the time when data is generated and the time when data leaves the source operator. Unit: seconds.
Important: The time when the data is generated depends on the timestamp that is recorded in the external system. If no timestamp is recorded in the external system or the timestamp that is recorded when data is written to the external system is incorrect, the value of the Emit Delay parameter is invalid and cannot be used to determine the true processing delay.
IN RPS: the number of input data records per second.
OUT RPS: the number of output data records per second.
Source Idle Time: the duration for which data is not processed in the source. Unit: milliseconds.
Job Failed: The deployment fails.
Time Interval: the length of the time window over which a metric is evaluated. Fully managed Flink compares the metric values that are collected within each time window with the specified threshold. If the values in the window meet the condition of the alert rule, an alert is triggered.
For example, if you set this parameter to 10 minutes, fully managed Flink evaluates the metric values that are collected in each 10-minute window against the specified threshold. The value that is used for the comparison depends on the Comparator parameter.
If you set the Comparator parameter to the greater-than-or-equal-to sign (>=), the maximum value of the metric in the 10-minute window is used. If the maximum value is greater than or equal to the threshold, an alert is triggered.
If you set the Comparator parameter to the less-than-or-equal-to sign (<=), the minimum value of the metric in the 10-minute window is used. If the minimum value is less than or equal to the threshold, an alert is triggered.
Comparator: The greater-than-or-equal-to sign (>=) and the less-than-or-equal-to sign (<=) are supported.
Thresholds: the value with which the value of the metric is compared.
Effective Time
The time period during which the alert rule is effective. If you do not specify a time period, the alert rule is effective throughout the day. For example, you can specify a time period from 09:00 to 18:00.
Alarm Rate
You can set this parameter to a value in a range from 1 minute to 24 hours.
Notification
Notification
Valid values:
DingTalk
Email
SMS
Webhook
Phone
You can specify the phone number, email address, and DingTalk ID for an alert contact.
Important: Make sure that the contact you added can receive alert notifications. Otherwise, alert notifications cannot be sent.
Notification object
The contacts to which alert notifications are sent. You can select multiple contacts. You can directly select or search for a contact. You must manage contacts before you select contacts.
To manage contacts, perform the following operations: Click Notification object management on the right side of Notification object. In the Edit Contact Group dialog box, click Edit in the Actions column on the Contact Group, Contact, Webhook, and DingTalk tabs separately, edit information, and then click Save.
For more information about how to create a webhook and add a DingTalk chatbot, see FAQ.
Alarm Noise Reduction
After you click Advanced Settings, you can turn on Alarm Noise Reduction.
After you turn on Alarm Noise Reduction, the system does not send alert notifications if a deployment quickly recovers from a short failover. For example, in cluster scheduling or automatic tuning scenarios, a deployment may fail over for a short period of time. The system sends alert notifications only when the specified threshold condition is continuously met.
No Data Warning
After you click Advanced Settings, you can turn on No Data Warning and specify the time period during which no data is generated.
After you turn on this switch, data that is monitored based on codeless tracking is reported. If no data is reported during the specified time period, the system sends an alert notification. In most cases, data stops being reported when an issue occurs, such as an exception of the JobManager, abnormal deployment cancellation, or an exception of the report trace.
Click OK.
After you create an alert rule, the rule is immediately effective. You can stop, edit, or delete the alert rule in the alert rule list.
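The way a condition is evaluated can be illustrated with a short Python sketch. This is not the implementation that fully managed Flink uses. The Condition class, the condition_met function, and the alert_triggered function are hypothetical names introduced for illustration. The sketch assumes that the metric values collected in one time interval are available as a list, and that an empty list is treated as "condition not met".

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Condition:
    """One alert condition. Field names are illustrative, not a console API."""
    metric: str        # for example "Restart Count in 1 Minute"
    comparator: str    # ">=" or "<="
    threshold: float


def condition_met(condition: Condition, window_values: Sequence[float]) -> bool:
    """Evaluate one condition over the metric values of one time interval."""
    if not window_values:
        # Assumption: a window without data never satisfies a condition.
        return False
    if condition.comparator == ">=":
        # The maximum value in the window is compared with the threshold.
        return max(window_values) >= condition.threshold
    if condition.comparator == "<=":
        # The minimum value in the window is compared with the threshold.
        return min(window_values) <= condition.threshold
    raise ValueError(f"unsupported comparator: {condition.comparator!r}")


def alert_triggered(conditions: Sequence[Condition],
                    windows: Dict[str, Sequence[float]]) -> bool:
    """An alert is triggered if any one of the conditions resolves to true."""
    return any(condition_met(c, windows.get(c.metric, [])) for c in conditions)
```

For example, a condition on Restart Count in 1 Minute with the comparator >= and a threshold of 3 is met by the window values [0, 1, 4, 2], because the maximum value 4 is greater than or equal to 3.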
Create an alert template
Go to the Create Rule Template panel.
You can use one of the following methods to go to the Create Rule Template panel:
Go to the Configurations page.
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Configurations.
On the Alarm Templates tab, click Add Alarm Template.
Go to the Deployments page.
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Deployments. On the Deployments page, click the name of the desired deployment.
Click the Alarm tab.
Click the Alarm Rules tab. On this tab, choose .
In the Create Rule Template panel, configure the parameters for the alert template.
Section
Parameter
Description
Rule
Name
The name must be 3 to 64 characters in length, and can contain lowercase letters, digits, and underscores (_). The name must start with a letter.
Description
The description of the rule.
Content
The conditions that trigger an alert. After you create the conditions, fully managed Flink compares the values of specified metrics with the thresholds that are specified in the conditions at the interval you specify. If one of the conditions resolves to true, an alert is triggered.
A condition consists of the following items:
Metrics:
Restart Count in 1 Minute: the number of times that the JobManager restarts deployments in one minute.
Checkpoint Count in 5 Minutes: the number of times that checkpointing succeeds in five minutes.
Emit Delay: the processing delay. This parameter specifies the difference between the time when data is generated and the time when data leaves the source operator. Unit: seconds.
Important: The time when the data is generated depends on the timestamp that is recorded in the external system. If no timestamp is recorded in the external system or the timestamp that is recorded when data is written to the external system is incorrect, the value of the Emit Delay parameter is invalid and cannot be used to determine the true processing delay.
IN RPS: the number of input data records per second.
OUT RPS: the number of output data records per second.
Source Idle Time: the duration for which data is not processed in the source. Unit: milliseconds.
Job Failed: The deployment fails.
Time Interval: the length of the time window over which a metric is evaluated. Fully managed Flink compares the metric values that are collected within each time window with the specified threshold. If the values in the window meet the condition of the alert rule, an alert is triggered.
For example, if you set this parameter to 10 minutes, fully managed Flink evaluates the metric values that are collected in each 10-minute window against the specified threshold. The value that is used for the comparison depends on the Comparator parameter.
If you set the Comparator parameter to the greater-than-or-equal-to sign (>=), the maximum value of the metric in the 10-minute window is used. If the maximum value is greater than or equal to the threshold, an alert is triggered.
If you set the Comparator parameter to the less-than-or-equal-to sign (<=), the minimum value of the metric in the 10-minute window is used. If the minimum value is less than or equal to the threshold, an alert is triggered.
Comparator: The greater-than-or-equal-to sign (>=) and the less-than-or-equal-to sign (<=) are supported.
Thresholds: the value with which the value of the metric is compared.
Effective Time
The time period during which the alert rule is effective. If you do not specify a time period, the alert rule is effective throughout the day. For example, you can specify a time period from 09:00 to 18:00.
Alarm Rate
You can set this parameter to a value in a range from 1 minute to 24 hours.
Notification
Notification
Valid values:
DingTalk
Email
SMS
Webhook
Phone
You can specify the phone number, email address, and DingTalk ID for an alert contact.
Important: Make sure that the contact you added can receive alert notifications. Otherwise, alert notifications cannot be sent.
Notification object
The contacts to which alert notifications are sent. You can select multiple contacts. You can directly select or search for a contact. You must manage contacts before you select contacts.
To manage contacts, perform the following operations: Click Notification object management on the right side of Notification object. In the Edit Contact Group dialog box, click Edit in the Actions column on the Contact Group, Contact, Webhook, and DingTalk tabs separately, edit information, and then click Save.
For more information about how to create a webhook and add a DingTalk chatbot, see FAQ.
Alarm Noise Reduction
After you click Advanced Settings, you can turn on Alarm Noise Reduction.
After you turn on Alarm Noise Reduction, the system does not send alert notifications if a deployment quickly recovers from a short failover. For example, in cluster scheduling or automatic tuning scenarios, a deployment may fail over for a short period of time. The system sends alert notifications only when the specified threshold condition is continuously met.
No Data Warning
After you click Advanced Settings, you can turn on No Data Warning and specify the time period during which no data is generated.
After you turn on this switch, data that is monitored based on codeless tracking is reported. If no data is reported during the specified time period, the system sends an alert notification. In most cases, data stops being reported when an issue occurs, such as an exception of the JobManager, abnormal deployment cancellation, or an exception of the report trace.
Click OK.
After you create an alert template, you can edit the template or delete the template from the alert template list.
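If you generate rule or template names in scripts before you enter them in the console, you can check them against the naming requirements of the Name parameter. The following Python sketch is one possible check. The is_valid_rule_name function is a hypothetical helper, and the sketch assumes that the leading letter must also be lowercase, which the parameter description does not state explicitly.

```python
import re

# 3 to 64 characters in total: one leading lowercase letter (assumption),
# followed by 2 to 63 lowercase letters, digits, or underscores.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{2,63}")


def is_valid_rule_name(name: str) -> bool:
    """Return True if the name satisfies the documented naming requirements."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

With this check, restart_alert_01 is accepted, whereas ab (too short), 1restart (starts with a digit), and Restart_Alert (contains uppercase letters) are rejected.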
FAQ
How do I add a webhook?
In the Create Rule Template panel or the Create Rule panel, click Notification object management.
In the Edit Contact Group dialog box, click the Webhook tab and click Add Webhook.
In the Add Webhook dialog box, configure the parameters. The following table describes the parameters.
Parameter
Description
Name
Required. The name of the webhook that you want to add.
URL
Required. The webhook URL.
Headers
Optional. The request headers that store cookies and tokens. The format is key: value.
Note: Make sure that a space follows the colon (:) between the key and the value.
Params
Optional. The request parameters that are in the key: value format.
Note: Make sure that a space follows the colon (:) between the key and the value.
Body
Required. The request body that is used to store the POST request parameters and parameter data.
You can use the $content placeholder in the request body. $content represents the actual alert message.
Click OK.
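To check your Headers, Params, and Body values before you enter them in the Add Webhook dialog box, you can use a sketch such as the following. It only mimics the documented formats and is not the parser that the console uses. The parse_pairs and render_body functions are hypothetical helpers, and the sketch assumes one key: value pair per line.

```python
from string import Template
from typing import Dict


def parse_pairs(text: str) -> Dict[str, str]:
    """Parse lines in the 'key: value' format into a dictionary."""
    pairs: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # A space must follow the colon between the key and the value.
        key, sep, value = line.partition(": ")
        if not sep or not key.strip():
            raise ValueError(
                f"expected 'key: value' with a space after the colon: {line!r}"
            )
        pairs[key.strip()] = value.strip()
    return pairs


def render_body(body_template: str, alert_message: str) -> str:
    """Replace the $content placeholder with the actual alert message."""
    # safe_substitute leaves any other $-prefixed text in the body untouched.
    return Template(body_template).safe_substitute(content=alert_message)
```

For example, parse_pairs("Content-Type: application/json") returns a dictionary with one entry, and parse_pairs("X-Token:abc") raises a ValueError because no space follows the colon. The sketch does not escape the alert message, so a message that contains quotation marks can produce an invalid JSON body. How the service escapes the message is not described in this topic.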
How do I add a DingTalk chatbot?
In the Create Rule Template panel or the Create Rule panel, click Notification object management.
On the DingTalk tab, click Add DingTalk.
In the Add DingTalk dialog box, configure the Name and URL parameters.
You must create a DingTalk chatbot to obtain the webhook URL of the chatbot. For more information, see Add a custom DingTalk chatbot and obtain the webhook URL.
Important: To ensure that you receive alerts from a DingTalk chatbot, select at least Custom Keywords in the Security Settings section of the Add Robot dialog box, and configure Alarm as a keyword.
Click Submit.
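If you want to test the chatbot yourself before you rely on it for alerts, the following Python sketch builds a text message body in the format that custom DingTalk chatbots accept. Fully managed Flink composes the real alert messages, so this sketch is only for manual testing. The build_dingtalk_text function is a hypothetical helper, and the case-sensitive keyword check is an assumption about how the Custom Keywords setting matches messages.

```python
import json


def build_dingtalk_text(content: str, keyword: str = "Alarm") -> bytes:
    """Build the JSON body of a text message for a custom DingTalk chatbot."""
    # With Custom Keywords enabled, a message that does not contain a
    # configured keyword is rejected by the chatbot.
    if keyword not in content:
        raise ValueError(f"message must contain the keyword {keyword!r}")
    payload = {"msgtype": "text", "text": {"content": content}}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")
```

You can send the returned bytes in a POST request to the webhook URL of the chatbot with the Content-Type header set to application/json.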