ApsaraMQ for Kafka:Configure alerts in Control Center

Last Updated:Mar 22, 2024

Control Center lets you detect anomalous events in monitoring data and configure alerts for them. You can specify email addresses so that you are notified of cluster faults as soon as they occur. An alert consists of a trigger and one or more actions. Each trigger is based on a metric, a condition, and a threshold value that together determine when the trigger fires. When the criteria are met, all actions that are associated with the trigger are executed. This topic describes how to configure alerts for ApsaraMQ for Confluent clusters in Control Center.

Alert metrics

Metrics

Type

Name

Description

Broker trigger

Bytes in

Number of bytes produced by a broker per second.

Bytes out

Number of bytes fetched from a broker per second. Internal replication traffic is not considered.

Fetch request latency

The latency of fetch requests to a broker at the median, 95th, 99th, or 99.9th percentile. Unit: milliseconds.

Production request count

The total number of production requests to a broker per minute.

Production request latency

The latency of production requests to a broker at the median, 95th, 99th, or 99.9th percentile. Unit: milliseconds.

Cluster trigger

Cluster down

Indicates whether a monitored cluster is shut down.

Leader election rate

The number of partition leader elections.

Offline topic partitions

The total number of topic partitions that are offline in a cluster. A topic partition can go offline if the brokers that hold its replicas are down, or if unclean leader election is disabled and no replica is in sync, so that none can be elected leader. Leaving the partition without a leader in the latter case can be desirable because it ensures that no messages are lost. When you create a trigger, set the corresponding parameter to a value greater than 0.

Unclean election count

The number of unclean partition leader elections in the cluster reported in the last interval. If unclean partition leader election is held among out-of-sync replicas, data loss may occur if messages were not synced before the loss of the former leader. Therefore, if the number of unclean elections is greater than 0, query the broker logs to determine why leaders were re-elected and search for warning or error messages. We recommend that you set the broker configuration parameter unclean.leader.election.enable to false. This way, a replica outside of the set of in-sync replicas is never elected leader. When you create a trigger, set the corresponding parameter to a value other than 0.

Under replicated topic partitions

The total number of topic partitions whose number of in-sync replicas is less than the replication factor. When you create a trigger, set the corresponding parameter to a value greater than 0.

ZK Disconnected

Indicates whether brokers can connect to ZooKeeper. Valid values:

  • Offline

  • Online

ZooKeeper expiration rate

The rate at which ZooKeeper session expirations occur in brokers.

Consumer group trigger

Average latency

The average latency of a consumer group. To monitor this metric, you must configure a Confluent Monitoring Interceptor for clients in the consumer group. Unit: milliseconds.

Consumer lag

How far behind consumer applications are while they consume messages from producer applications. The consumer lag is the difference between the end offset and the current offset.

Consumer lead

How far ahead consumer applications are while consuming from producer applications. The consumer lead is the difference between the current offset and the beginning offset. For example, a consumer at offset 15 in a partition that starts at offset 0 has a lead of 15. This alert metric indicates when consumption is close to the earliest available messages. This metric can be used to determine whether data loss occurred.

Consumption difference

The difference between the expected consumption value and the actual consumption value within a specific time period. For time periods that are very close to the present, a gap typically exists between the expected consumption value and the actual consumption value. This gap diminishes over time.

Maximum latency

The maximum latency of a consumer group. To monitor this metric, you must configure a Confluent Monitoring Interceptor for clients in the consumer group. Unit: milliseconds.

Topic trigger

Bytes in

The number of bytes that come into a topic per second.

Bytes out

The number of bytes that go out of a topic per second. Internal replication traffic is not considered.

Out of sync replica count

The total number of topic partition replicas in a cluster that are out of sync with the leader. This metric indicates the sum of partitions, which is the product of topic partitions and topic replication factor.

Production request count

The number of production requests to a topic in a cluster.

Under replicated topic partitions

The number of under-replicated topic partitions. A use case for this metric is to detect that a Kafka broker that holds a specific topic partition has crashed.
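
The consumer lag and consumer lead definitions in the preceding table reduce to simple offset arithmetic. The following Python sketch is for illustration only. The function names and the sample end offset of 40 are assumptions and are not part of Control Center.

```python
def consumer_lag(end_offset: int, current_offset: int) -> int:
    """Lag: how far the consumer is behind the newest message."""
    return end_offset - current_offset


def consumer_lead(current_offset: int, beginning_offset: int) -> int:
    """Lead: how far the consumer is ahead of the oldest available message."""
    return current_offset - beginning_offset


# Example from the table: a consumer at offset 15 in a partition that
# starts at offset 0 has a lead of 15.
lead = consumer_lead(current_offset=15, beginning_offset=0)

# Hypothetical end offset of 40 for the same partition.
lag = consumer_lag(end_offset=40, current_offset=15)

print(lead, lag)
```

A small lead means that the consumer is reading close to the oldest available messages. A large lag means that the consumer is falling behind the producers.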

Conditions

A trigger fires when the comparison between the detected value of a monitored metric and the threshold that you set evaluates to true. Valid comparison conditions:

  • Equal to

  • Greater than

  • Less than

  • Not equal to
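
The way a condition is evaluated can be sketched in a few lines of Python. This is a minimal illustration, not the Control Center implementation. The `trigger_fires` function and the sample metric readings are hypothetical.

```python
import operator

# Map the four condition names from the list above to comparison functions.
CONDITIONS = {
    "Equal to": operator.eq,
    "Greater than": operator.gt,
    "Less than": operator.lt,
    "Not equal to": operator.ne,
}


def trigger_fires(condition: str, detected_value: float, threshold: float) -> bool:
    """Return True if the detected value satisfies the condition against the threshold."""
    return CONDITIONS[condition](detected_value, threshold)


# Hypothetical readings: 2 under-replicated partitions checked with
# "Greater than 0", and 0 unclean elections checked with "Not equal to 0".
print(trigger_fires("Greater than", 2, 0))
print(trigger_fires("Not equal to", 0, 0))
```

In this sketch, the first check fires because 2 is greater than 0, and the second does not fire because the detected value equals the threshold.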

Create a trigger

  1. Log on to Control Center.

  2. In the top navigation bar, click the alerts icon.

  3. On the Overview page, click the Triggers tab and then click Add a trigger.

  4. On the New trigger page, specify the trigger name and trigger condition. Then, click Save.

    After the trigger is created, you can click the name of the trigger on the Triggers tab and then modify or delete the trigger in the lower part of the page that appears.

Create an action

  1. On the Overview page, click the Actions tab and then click Add an action.

  2. On the New action page, configure the parameters that are described in the following table.

    Parameter

    Description

    Action Name

    Specify the action name.

    Triggers

    Select a trigger.

    Action

    Select the action type, such as Send email.

    Subject

    Specify the email addresses of one or more alert contacts. This setting is required only if the Action parameter is set to Send email. Each time the action is executed, an email is sent to the specified email addresses. Separate multiple email addresses with commas (,).

    Max send rate

    The maximum rate at which the action is performed. This parameter must be used in combination with the Frequency parameter.

    For example, you can set this parameter to 1 and the Frequency parameter to Per day to send the alert once a day.

    Frequency

    This parameter must be used in combination with the Max send rate parameter. Valid values: Per hour, Per minute, Per 4 hours, Per 8 hours, and Per day. Default value: Per hour.

    For example, you can set this parameter to Per day and the Max send rate parameter to 1 to send the alert once a day.

  3. Click Save.

    After the action is created, you can click the name of the action on the Actions tab and then modify or delete the action on the page that appears.
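
The combined effect of the Max send rate and Frequency parameters can be sketched as a simple rate limiter. The following Python code is a hedged illustration of the semantics only. The `SendRateLimiter` class is hypothetical, and the sliding-window behavior is an assumption. Control Center may count notifications differently internally.

```python
from collections import deque

# Window length in seconds for each Frequency value listed above.
FREQUENCY_SECONDS = {
    "Per minute": 60,
    "Per hour": 3600,
    "Per 4 hours": 4 * 3600,
    "Per 8 hours": 8 * 3600,
    "Per day": 24 * 3600,
}


class SendRateLimiter:
    """Allow at most max_send_rate notifications per frequency window."""

    def __init__(self, max_send_rate: int, frequency: str = "Per hour"):
        self.max_send_rate = max_send_rate
        self.window = FREQUENCY_SECONDS[frequency]
        self.sent_at = deque()  # timestamps of notifications already sent

    def allow(self, now: float) -> bool:
        # Forget sends that have fallen out of the current window.
        while self.sent_at and now - self.sent_at[0] >= self.window:
            self.sent_at.popleft()
        if len(self.sent_at) < self.max_send_rate:
            self.sent_at.append(now)
            return True
        return False


# Max send rate 1 with Frequency "Per day": at most one alert per day.
limiter = SendRateLimiter(max_send_rate=1, frequency="Per day")
print(limiter.allow(now=0))          # first alert is sent
print(limiter.allow(now=3600))       # suppressed, still within the same day
print(limiter.allow(now=24 * 3600))  # one day later, sent again
```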

Pause and resume all alert actions

For maintenance or troubleshooting, you can pause all alert actions. Pausing and resuming do not change whether an individual action is enabled or disabled. While actions are paused, triggers whose conditions are met are ignored, and all enabled actions that are associated with those triggers are suppressed. After the actions are resumed, triggers fire and notifications are sent when the conditions are met. If you stop and restart ApsaraMQ for Confluent or Control Center, the paused actions resume and become active again.

Procedure

  1. On the Overview page, click the Actions tab.

  2. Turn on the Pause all actions switch.

  3. Read the message that appears and click Confirm.

    If you want to enable the actions again, repeat the preceding steps and turn off the Pause all actions switch.

Resume paused alert actions

  1. On the Overview page, click the Actions tab.

  2. Turn off the Pause all actions switch.

  3. Read the message that appears and click Confirm.

Disable or enable an alert action

When you create an action, it is enabled by default. If you do not want an action to be active, disable it. Pausing and resuming all actions does not change the disabled setting of an individual action, so resuming paused actions does not activate disabled actions.

Procedure

  1. On the Overview page, click the Actions tab.

  2. On the Actions tab, click the action that you want to manage.

  3. On the details page of the action, click Edit and turn off the Enabled switch.

    If you want to enable the action again, repeat the preceding steps and turn on the Enabled switch.
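
The interaction between the Pause all actions switch and the per-action enabled setting can be modeled as follows. This Python sketch is a hypothetical model for illustration. The `Action` and `AlertActions` classes and the action names are assumptions and do not correspond to a Control Center API.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    enabled: bool = True  # actions are enabled by default when created


class AlertActions:
    """Hypothetical model of the global pause switch and per-action enabled setting."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.paused = False

    def pause_all(self):
        self.paused = True

    def resume_all(self):
        # Resuming clears only the global pause. It does not enable
        # actions that were individually disabled.
        self.paused = False

    def actions_to_run(self):
        """Names of actions that would execute if their trigger fired now."""
        if self.paused:
            return []
        return [a.name for a in self.actions if a.enabled]


alerts = AlertActions([Action("email-oncall"), Action("email-team", enabled=False)])
alerts.pause_all()
print(alerts.actions_to_run())  # nothing runs while paused
alerts.resume_all()
print(alerts.actions_to_run())  # only the enabled action runs
```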

References

For more information about alert settings, see Control Center Alerts for Confluent Platform.