Application Real-Time Monitoring Service: Use Alibaba Cloud Managed Service for Prometheus to monitor self-managed Kafka clusters and Message Queue for Apache Kafka instances

Last Updated: Jan 12, 2024

This topic describes how to use Alibaba Cloud Managed Service for Prometheus to monitor Message Queue for Apache Kafka instances and self-managed Kafka clusters.

Challenges of using a self-managed Prometheus service to monitor Message Queue for Apache Kafka instances and self-managed Kafka clusters

If you use a self-managed Prometheus service to monitor Message Queue for Apache Kafka instances and self-managed Kafka clusters, you may need to handle the following challenges:

  1. For security and organizational reasons, you probably deploy your business across separate virtual private clouds (VPCs). If you want to use a self-managed Prometheus service to monitor your business, you must deploy the Prometheus service in each VPC. This increases deployment costs and O&M costs.

  2. You must configure Prometheus, Grafana, and Alertmanager in each independent self-managed monitoring system, which is complex and time-consuming.

  3. In some cases, the JMX agent of open source Apache Kafka consumes a large amount of CPU resources. This can degrade the performance of self-managed Kafka clusters.

  4. You cannot use the self-managed Prometheus service to monitor Message Queue for Apache Kafka instances. As a result, you cannot monitor your messaging clusters in a one-stop and centralized manner.

  5. If your self-managed Kafka cluster is deployed on Elastic Compute Service (ECS) instances, the self-managed Prometheus service cannot flexibly define and capture targets based on ECS tags because it lacks a service discovery mechanism for ECS. To implement a similar mechanism, you must write code in Golang that calls the POP API of Alibaba Cloud ECS and integrate the code into the open source Prometheus service. Then, you must compile, package, and deploy the modified Prometheus service. This process is complex and complicates version upgrades.

  6. The commonly used open source Grafana dashboards are not designed for specific services. You cannot customize monitoring metrics based on the principles and best practices of Apache Kafka.

  7. No alert template is available for monitoring Apache Kafka. You must configure alert rules on your own, which is labor-intensive and demands considerable expertise.

Comparison between a self-managed Prometheus service and Alibaba Cloud Managed Service for Prometheus

The following table compares a self-managed Prometheus service with Alibaba Cloud Managed Service for Prometheus in monitoring Message Queue for Apache Kafka instances and self-managed Kafka clusters.

| Item | Self-managed Prometheus service | Alibaba Cloud Managed Service for Prometheus |
| --- | --- | --- |
| Deployment costs and O&M costs | You must purchase ECS instances to deploy Prometheus, Grafana, and Alertmanager in multiple VPCs. This incurs high O&M costs. | Alibaba Cloud Managed Service for Prometheus is a fully managed service that is ready for immediate use and integrates Prometheus, Grafana, and Alertmanager. |
| Availability, performance, and storage capacity | Overall performance and availability are poor, and the storage capacity is small. | Overall performance and availability are high, and the storage capacity is large. |
| Exporter performance | In some cases, the JMX agent of open source Apache Kafka consumes a large amount of CPU resources. This can degrade the performance of self-managed Kafka clusters. | Alibaba Cloud Managed Service for Prometheus continuously optimizes the performance and improves the stability of the JMX agents of open source Apache Kafka. |
| Service discovery | Service discovery of ECS instances relies on open source static configurations or a third-party service registry. The service discovery process is complex and the O&M cost is high. | Alibaba Cloud Managed Service for Prometheus is compatible with open source service discovery features and provides aliyun_sd_configs. Similar to the LabelSelector for Kubernetes service discovery, you can use ECS tags to identify target ECS instances. This simplifies the configuration and O&M of service discovery. |
| Grafana dashboard | The Grafana dashboard displays only the collected metrics. You cannot customize the monitoring metrics based on the principles and best practices of Apache Kafka. | Alibaba Cloud Managed Service for Prometheus provides a professional dashboard template for monitoring Apache Kafka. You can use the dashboard to quickly and accurately understand the running status of the entire Apache Kafka process and troubleshoot issues. |
| Alert rule | No alert template is available for monitoring Apache Kafka. You must configure the alert rules yourself. | Alibaba Cloud Managed Service for Prometheus provides professional and flexible alert metric templates based on the best practices of monitoring Apache Kafka. You can configure alert rules on the GUI. |
| Unified service | The self-managed Prometheus service is deployed in multiple VPCs, and the service cannot be used to monitor Message Queue for Apache Kafka instances. As a result, you cannot monitor your messaging clusters in a one-stop and centralized manner. | Alibaba Cloud Managed Service for Prometheus is a fully managed service that is integrated into Message Queue for Apache Kafka. Message Queue for Apache Kafka provides a native overall monitoring system. |
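The tag-based target selection mentioned in the Service discovery row can be pictured with a small sketch. This is not the aliyun_sd_configs implementation or its configuration syntax. The `EcsInstance` class, the `select_targets` function, the tag names, and the port are hypothetical and only illustrate the idea that an instance becomes a scrape target when its tags match every pair in a selector, much like a Kubernetes LabelSelector.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EcsInstance:
    """Hypothetical, simplified view of an ECS instance."""
    instance_id: str
    private_ip: str
    tags: Dict[str, str] = field(default_factory=dict)


def select_targets(instances: List[EcsInstance],
                   selector: Dict[str, str],
                   port: int) -> List[str]:
    """Return "ip:port" targets for instances whose tags match every selector pair."""
    return [
        f"{inst.private_ip}:{port}"
        for inst in instances
        if all(inst.tags.get(key) == value for key, value in selector.items())
    ]


# Illustrative inventory; the IDs, IP addresses, and tags are made up.
instances = [
    EcsInstance("i-aaa", "10.0.0.1", {"app": "kafka", "env": "prod"}),
    EcsInstance("i-bbb", "10.0.0.2", {"app": "kafka", "env": "test"}),
    EcsInstance("i-ccc", "10.0.0.3", {"app": "mysql", "env": "prod"}),
]

targets = select_targets(instances, {"app": "kafka", "env": "prod"}, 9308)
print(targets)
```

With a selector of this kind, adding a broker to monitoring only requires tagging the new ECS instance. No scrape configuration has to be edited by hand.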

Use Alibaba Cloud Managed Service for Prometheus to monitor Message Queue for Apache Kafka

Alibaba Cloud Managed Service for Prometheus is integrated into Message Queue for Apache Kafka. The main metrics include:

  • The traffic of instances, groups, and topics

  • The message accumulation of groups and topics

  • The disk usage of instances

  • The rebalance metrics of groups

View Message Queue for Apache Kafka dashboards

Message Queue for Apache Kafka provides three monitoring dashboards for instances, groups, and topics. You can view data on the dashboards to understand the production and consumption of messages and quickly identify issues.

Instance dashboard

  1. Log on to the ApsaraMQ for Kafka console. In the left-side navigation pane, click Instances.

  2. Click the name of the Message Queue for Apache Kafka instance that you want to view. In the left-side navigation pane, click Prometheus Monitoring to view the monitoring data of the instance.

Consumer group dashboard

  1. Log on to the ApsaraMQ for Kafka console. In the left-side navigation pane, click Instances.

  2. Click the name of the Message Queue for Apache Kafka instance that you want to view. In the left-side navigation pane, click Groups. On the page that appears, click the ID of the group that you want to view and click the Prometheus Monitoring tab to view the monitoring data of the group.

Topic dashboard

  1. Log on to the ApsaraMQ for Kafka console. In the left-side navigation pane, click Instances.

  2. Click the name of the Message Queue for Apache Kafka instance that you want to view. In the left-side navigation pane, click Topics. On the page that appears, click the name of the topic that you want to view and click the Prometheus Monitoring tab to view the monitoring data of the topic.

Use Alibaba Cloud Managed Service for Prometheus to configure alert rules for Message Queue for Apache Kafka

  1. Log on to the ARMS console.
  2. In the left-side navigation pane, choose Prometheus Service > Prometheus Instances.
  3. Click the name of the Prometheus instance that you want to manage to go to the Integration Center page.

  4. Click the Cloud Service Self-monitoring Integration tab and click the Message Queue for Apache Kafka card in the Installed section. In the panel that appears, click the Alerts tab to view Prometheus alerts of Message Queue for Apache Kafka. Alibaba Cloud Managed Service for Prometheus provides 13 key alert metrics for Message Queue for Apache Kafka instances, groups, and topics. You can add alert rules based on your business requirements. For more information, see Create an alert rule for a Prometheus instance.

Use Alibaba Cloud Managed Service for Prometheus to monitor self-managed Kafka clusters

You can also use Alibaba Cloud Managed Service for Prometheus to monitor self-managed Kafka clusters that are deployed in an ECS environment or container service environment, such as Container Service for Kubernetes (ACK), Serverless Kubernetes (ASK), and registered clusters. Alibaba Cloud Managed Service for Prometheus provides the basic edition and advanced edition of Kafka application components:

  • Kafka (basic edition): Collects basic metrics such as the number of brokers, the topic partitions, and the consumer group lag. To use this edition, you do not need to configure or restart the Kafka broker.

  • Kafka (advanced edition): The JMX agent collects basic metrics and the important metrics of producers, brokers, consumers, and internal modules. These metrics let you monitor the entire lifecycle of Apache Kafka messages at an expert level of detail. To use this edition, you must start the JMX agent and restart the Kafka broker process.

When you use Alibaba Cloud Managed Service for Prometheus to monitor self-managed Kafka clusters, you must also focus on internal O&M metrics. You must store the important metrics of Kafka producers, brokers, consumers, and internal modules so that you can analyze and troubleshoot problems in each phase of message processing. We recommend that you use the advanced edition of the Kafka application component to understand the overall status of self-managed Kafka clusters.

Use the Kafka (basic edition) application component provided by Alibaba Cloud Managed Service for Prometheus to monitor self-managed Kafka clusters

Deploy the Kafka (basic edition) application component for self-managed Kafka clusters

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, click Integration Center. In the Application Components section, click + Add on the Kafka (Basic Edition) card and perform the following steps.

    1. In the STEP1 section, select the environment where you want to deploy the Kafka exporter.

    2. In the STEP2 section, select the Prometheus instance where you want the Kafka exporter to reside.

    3. On the Configuration tab in the STEP3 section, configure parameters and click OK. The following table describes the parameters.

      | Parameter | Description |
      | --- | --- |
      | Exporter Name | The unique name of the exporter. |
      | kafka address | The endpoint of the self-managed Kafka broker. Separate multiple broker addresses with commas (,) or semicolons (;). If your Kafka instance is deployed in a container service environment, you can enter the IP address or service address of the Kafka broker in this field. If your Kafka instance is deployed in an ECS environment, you can enter the IP address or domain name system (DNS) address of the broker in this field. |
      | Metrics scrape interval (seconds) | The interval at which you want the service to collect monitoring data. |
      | kafka version | The version number of the Kafka broker. The latest version is V3.2.0. |
      | SASL enabled | Specifies whether to enable the Simple Authentication and Security Layer (SASL) feature on the Apache Kafka broker. |
      | SASL username | This field is required if you enable SASL. |
      | SASL password | This field is required if you enable SASL. |
      | SASL mechanism | The SASL mechanism. The following authentication mechanisms are supported: PLAIN, SCRAM-SHA-512, and SCRAM-SHA-256. |
      | TLS enabled | Specifies whether to enable the Transport Layer Security (TLS) feature on the Apache Kafka broker. |
      | insecure skip TLS verify | Set this field to Enabled if TLS is enabled on the Kafka broker and a self-signed TLS certificate is used during authentication. |
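The constraints in the parameter table can be checked before you fill in the form. The following sketch is only an illustration under stated assumptions: the function names and the dictionary keys (`kafka_address`, `sasl_enabled`, and so on) are hypothetical and are not part of any ARMS API. It shows how a broker list separated by commas or semicolons might be split, and how the SASL requirements described above might be validated.

```python
import re

# Mechanisms listed in the parameter table above.
SUPPORTED_SASL_MECHANISMS = {"PLAIN", "SCRAM-SHA-512", "SCRAM-SHA-256"}


def split_broker_addresses(raw):
    """Split a broker list on commas or semicolons and drop empty entries."""
    return [part.strip() for part in re.split(r"[,;]", raw) if part.strip()]


def validate_exporter_settings(settings):
    """Return a list of problems found in a hypothetical settings dict."""
    problems = []
    if not split_broker_addresses(settings.get("kafka_address", "")):
        problems.append("kafka address is empty")
    if settings.get("sasl_enabled"):
        # Username and password are required only when SASL is enabled.
        for key in ("sasl_username", "sasl_password"):
            if not settings.get(key):
                problems.append(f"{key} is required when SASL is enabled")
        if settings.get("sasl_mechanism") not in SUPPORTED_SASL_MECHANISMS:
            problems.append("unsupported SASL mechanism")
    return problems


brokers = split_broker_addresses("10.0.0.1:9092, 10.0.0.2:9092;10.0.0.3:9092")
print(brokers)

problems = validate_exporter_settings({
    "kafka_address": "10.0.0.1:9092",
    "sasl_enabled": True,
    "sasl_username": "monitor",
    "sasl_mechanism": "GSSAPI",  # not in the supported list
})
print(problems)
```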

View the dashboards of self-managed Kafka clusters

  1. Log on to the ARMS console.
  2. In the left-side navigation pane, choose Prometheus Service > Prometheus Instances.
  3. Click the name of the Prometheus instance that you want to manage to go to the Integration Center page.

  4. Click the Cloud Service Self-monitoring Integration tab and click the Kafka (Basic Edition) card in the Installed section. In the panel that appears, click the Dashboards tab and click the diagram of the Grafana dashboard that you want to view.

    The dashboards of the Kafka (basic edition) application component display the following information:

    • The number of Kafka brokers.

    • The number of partitions in each topic.

    • The numbers of inbound messages, outbound messages, and accumulated messages in each topic.

    • The number of in-sync replicas (ISRs) in each topic.
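The number of accumulated messages is commonly understood as consumer lag: for each partition, the difference between the latest offset in the log and the offset that the consumer group has committed. The source does not state how the dashboard computes this value, so the following is a minimal sketch of that common definition. The `topic_lag` function and its inputs are hypothetical, and treating a partition without a committed offset as offset 0 is an assumption made only for this example.

```python
def topic_lag(log_end_offsets, committed_offsets):
    """Sum the per-partition lag of one consumer group on one topic.

    Both arguments map a partition number to an offset.
    """
    total = 0
    for partition, end_offset in log_end_offsets.items():
        # Assumption: a partition with no committed offset counts from 0.
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero so a stale end offset never yields negative lag.
        total += max(end_offset - committed, 0)
    return total


log_end_offsets = {0: 120, 1: 80, 2: 45}
committed_offsets = {0: 100, 1: 80}

lag = topic_lag(log_end_offsets, committed_offsets)
print(lag)  # 20 + 0 + 45 = 65
```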

Configure alert rules for self-managed Kafka clusters

On the Integration Center page of the Prometheus instance, click the Cloud Service Self-monitoring Integration tab. In the Installed section, click the Kafka (Basic Edition) card. In the panel that appears, click the Alerts tab to view the Prometheus alerts. You can add alert rules based on your business requirements. For more information, see Create an alert rule for a Prometheus instance.

Use the Kafka (advanced edition) application component provided by Alibaba Cloud Managed Service for Prometheus to monitor self-managed Kafka clusters

Deploy the Kafka (advanced edition) application component for self-managed Kafka clusters

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, click Integration Center. In the Application Components section, click + Add on the Kafka (Advanced Edition) card and perform the following steps.

    1. In the STEP1 section, select the environment where you want to deploy the Kafka exporter.

    2. In the STEP2 section, select the Prometheus instance where you want the Kafka exporter to reside.

    3. On the Configuration tab in the STEP3 section, configure parameters and click OK. The following table describes the parameters.

      | Parameter | Description |
      | --- | --- |
      | Instance name | The unique name of the exporter. |
      | Kafka instance name | The name of the Kafka instance that you want to monitor. You can specify an instance name on the dashboard to view the producer, broker, and consumer of a Kafka cluster. |
      | JMX Agent listening port | The listening port that is specified when the JMX agent is deployed. |
      | Metrics path | The HTTP path that is used by Prometheus to collect monitoring data from the JMX agent. Default value: /metrics. |
      | Metrics scrape interval (seconds) | The interval at which you want the service to collect monitoring data. |
      | Pod/ECS Label Key (service discovery) and Pod/ECS Label value | The key and value that are specified for the pod or ECS instance when the JMX agent is deployed. Prometheus uses this key-value pair for service discovery. |
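The listening port, metrics path, and label key-value pair together determine which endpoints are scraped. The sketch below shows one way to picture that relationship. It is not how Managed Service for Prometheus is implemented: the `scrape_url` and `discover` functions, the `kafka-jmx` label, port 7071, and the plain `http` scheme are all assumptions chosen for illustration.

```python
def scrape_url(host, port, metrics_path="/metrics"):
    """Build the URL a scraper would request; the http scheme is an assumption."""
    if not metrics_path.startswith("/"):
        metrics_path = "/" + metrics_path
    return f"http://{host}:{port}{metrics_path}"


def discover(hosts, label_key, label_value):
    """Return the IP addresses of hosts that carry the given label key-value pair."""
    return [h["ip"] for h in hosts if h["labels"].get(label_key) == label_value]


# Hypothetical pods or ECS instances; only the first two carry the label.
hosts = [
    {"ip": "10.0.1.11", "labels": {"kafka-jmx": "enabled"}},
    {"ip": "10.0.1.12", "labels": {"kafka-jmx": "enabled"}},
    {"ip": "10.0.1.13", "labels": {}},
]

urls = [scrape_url(ip, 7071) for ip in discover(hosts, "kafka-jmx", "enabled")]
print(urls)
```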

View the dashboard of self-managed Kafka clusters

  1. Log on to the ARMS console.
  2. In the left-side navigation pane, choose Prometheus Service > Prometheus Instances.
  3. Click the name of the Prometheus instance that you want to manage to go to the Integration Center page.

  4. Click the Cloud Service Self-monitoring Integration tab and click the Kafka (Advanced Edition) card in the Installed section. In the panel that appears, click the Dashboards tab and click the diagram of a Grafana dashboard that you want to view. The Kafka (advanced edition) application component provides dashboards based on instances and topics.

    • Instance dashboard

      The metrics of Kafka brokers:

      • Core metrics: the numbers of brokers, offline partitions, under-replicated partitions, and controllers, and information about CPUs and networks

      • Java Virtual Machine (JVM) metrics: the key information about the JVM memory and garbage collection (GC)

      • Partition metrics: the partition information, such as the partition quantity, ISR, unclean leader election, replica lag, offline partitions, and under-replicated partitions

      • Time metrics: the time metrics in the Produce, Request, and Fetch phases

      • Cluster traffic metrics: the overall traffic metrics of the cluster

      • Broker traffic metrics: the traffic details by broker

    • Topic dashboard

      The metrics of Apache Kafka topics:

      • Producer: the key metrics of the producer, including the message sending rate, message compression ratio, and message sending latency

      • Server (Kafka broker): the number of partitions in a topic, and the rates and traffic of inbound messages and outbound messages

      • Consumer: the message consumption rate, message consumption latency, and rebalance information

Configure alert rules for self-managed Kafka clusters

Log on to the Managed Service for Prometheus console. Click the name of the Prometheus instance you want to manage. On the Integration Center page that appears, click the Cloud Service Self-monitoring Integration tab. In the Installed section, click the Kafka (Advanced Edition) card. In the panel that appears, click the Alerts tab to view the Prometheus alerts.

  • Producer: Three alert metrics are provided, including the message sending failure rate, message sending duration, and message sending retry rate. You can use the metrics to identify exceptions on the producer.

  • Instance: Thirteen alert metrics are provided, including topics with excessive partitions, offline partitions, unclean leader election, under-replicated partitions, a decrease in effective brokers, the number of effective controllers, the number of rejected messages, numbers of inbound messages and outbound messages of the instance, and numbers of inbound messages and outbound messages of topics. You can use the metrics to identify exceptions on the broker.

  • Consumer: The alert metric for message accumulation is provided. You can use this metric to identify exceptions on consumption.

You can also add alert rules based on your business requirements. For more information, see Create an alert rule for a Prometheus instance.
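Two of the instance-level alerts listed above, offline partitions and under-replicated partitions, follow from standard Apache Kafka definitions: a partition is offline when it has no active leader, and under-replicated when its in-sync replica set is smaller than its assigned replica set. The following sketch classifies partitions on that basis. The `PartitionState` class and the sample data are hypothetical, and the sketch does not reproduce the alert templates that the service provides.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionState:
    """Hypothetical snapshot of one partition's replication state."""
    topic: str
    partition: int
    leader: int          # broker ID of the leader, -1 when there is none
    replicas: List[int]  # broker IDs assigned to the partition
    isr: List[int]       # broker IDs currently in sync


def classify(state: PartitionState) -> str:
    """Label a partition as offline, under-replicated, or healthy."""
    if state.leader < 0:
        return "offline"
    if len(state.isr) < len(state.replicas):
        return "under-replicated"
    return "healthy"


states = [
    PartitionState("orders", 0, leader=1, replicas=[1, 2, 3], isr=[1, 2, 3]),
    PartitionState("orders", 1, leader=2, replicas=[1, 2, 3], isr=[2]),
    PartitionState("orders", 2, leader=-1, replicas=[1, 2, 3], isr=[]),
]

summary = Counter(classify(s) for s in states)
print(dict(summary))
```

A nonzero count in either unhealthy category is the kind of condition that these alerts are meant to surface.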