Managed Service for Prometheus: Use ACK One registered clusters and Managed Service for Prometheus to implement multi-cloud Kubernetes cluster monitoring

Last Updated: Jan 22, 2025

This topic describes the current situation of Kubernetes cluster observability, the challenges for the observability of multi-cloud Kubernetes clusters, and the solutions to these challenges. This topic also provides an example to describe how to use Alibaba Cloud Managed Service for Prometheus and ACK One registered clusters to manage and monitor multi-cloud Kubernetes clusters.

Current situation of Kubernetes cluster observability

As a container management and orchestration tool, Kubernetes has become a common technical foundation in the cloud computing industry. Prometheus, having proven itself over successive generations of monitoring solutions, has become the standard solution for Kubernetes cluster monitoring.

Prometheus collects and stores metrics from the system layer, application layer, and business layer. Prometheus works with Grafana to display metrics and deliver alert events. The combination of Prometheus and Grafana allows you to collect, store, and display the monitoring metrics of Kubernetes clusters and configure alerting for them. It helps you identify issues, analyze their causes, and protect cloud-native applications.

You can use one of the following solutions to monitor Kubernetes clusters:

Solution 1: Build a monitoring system

You can use Prometheus and Grafana to build a monitoring system for your production environment. In the early stage, you must invest significant labor and make the parts of the monitoring system work together: metric collection, metric storage, metric display, dashboards, and alerting, including alert deduplication. In the later stage, you incur high O&M cost.

Solution 2: Use monitoring services provided by cloud service vendors

You can use a monitoring service provided by a cloud service vendor, such as Managed Service for Prometheus. Managed Service for Prometheus supports two billing methods: subscription and pay-as-you-go. This reduces the upfront cost of monitoring system setup and provides technical O&M support to reduce O&M cost.

Challenges for the observability of multi-cloud Kubernetes clusters

Enterprises are deploying more and more diversified and complex services on the cloud. In some scenarios, Kubernetes clusters may be used across cloud services or regions, and you must address the O&M challenges for multi-cloud Kubernetes clusters.

You can use one of the following solutions to monitor multi-cloud Kubernetes clusters:

Solution 1: Build a monitoring system based on a self-managed Prometheus system and Grafana

If you use this solution, you will face the following challenges:

  • To build a self-managed monitoring system, you must integrate functional modules such as collection, storage, display, and alerting in the early stage. In the later stage, you must assign more O&M personnel, which increases O&M cost.

  • The time series database (TSDB) of open source Prometheus uses SSD storage. Data is stored separately on individual nodes, which may result in data loss.

  • Bottlenecks exist in the collection capabilities of open source Prometheus. Because open source Prometheus runs as a single process, it does not support auto scaling. During peak hours, metric collection may become a performance bottleneck.

Solution 2: Use Prometheus services provided by cloud service vendors

If you use this solution, you will face the following challenges:

  • Multiple cloud service vendors: Different cloud service vendors provide different monitoring capabilities and access methods. This increases your learning cost.

  • Decentralized management: Different Prometheus services cannot be managed in a unified manner. This may lead to inefficient and chaotic management and duplicate O&M workloads. You may also be unable to identify business issues at the earliest opportunity.

In the preceding solutions, you cannot query or analyze scattered metrics or configure alerting for the metrics in a unified manner.

Benefits of Managed Service for Prometheus

To address the preceding challenges, ACK One registered clusters provide unified management capabilities for the Kubernetes clusters of third-party cloud service vendors. This helps you manage multi-cloud Kubernetes clusters in a unified manner. Managed Service for Prometheus provides a complete Kubernetes cluster monitoring system with metric collection, Grafana display, and alerting capabilities. It supports the pay-as-you-go and subscription billing methods, improves the monitoring efficiency of Kubernetes clusters, and reduces the O&M cost of self-managed monitoring services.

The combination of ACK One registered clusters and Managed Service for Prometheus helps you address the preceding challenges and monitor multi-cloud Kubernetes clusters in a simple and efficient manner. Managed Service for Prometheus and ACK One registered clusters deliver the following benefits:

  • Powerful capabilities: The combination of Managed Service for Prometheus and ACK One registered clusters can resolve the issues that exist in multi-cloud Kubernetes cluster monitoring, such as scattered management, difficulty in monitoring system construction, low O&M efficiency, inability to jointly query metrics, and scattered alerting. You can implement unified management, configuration, query, and alerting for multi-cloud Kubernetes cluster monitoring with improved efficiency at low O&M cost. This way, O&M teams can focus on business without doing repetitive work.

  • Lower cost: Managed Service for Prometheus provides basic-metric collection for free to meet the basic monitoring requirements on Kubernetes clusters. For small-scale Kubernetes clusters, you can use the pay-as-you-go billing method to monitor your business at minimum cost. For information about the pay-as-you-go billing method of Managed Service for Prometheus, see Pay-as-you-go. For large-scale Kubernetes clusters, you can use the subscription billing method. Compared with the pay-as-you-go billing method, the subscription billing method can effectively reduce the monitoring cost of large-scale clusters by about 67%.

  • Less resource usage: If you use Managed Service for Prometheus, you only need to deploy a lightweight agent in your Kubernetes cluster. The agent supports auto scaling. With 2 CPU cores and 4 GB of memory, the agent can collect 6 million metrics. The service discovery module of open source Prometheus puts heavy pressure on the API server of Kubernetes clusters. Managed Service for Prometheus has optimized the service discovery module to reduce this pressure. This minimizes resource usage, maximizes the collection of monitoring metrics from Kubernetes clusters, and protects your business.
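The billing comparison above can be turned into a rough estimate. The following is a minimal sketch, assuming the "about 67%" reduction quoted above applies uniformly to a monthly bill; the function name `subscription_cost` and the sample bill are hypothetical, and actual prices depend on the current price list.

```python
def subscription_cost(pay_as_you_go_cost: float, reduction: float = 0.67) -> float:
    """Estimate a subscription cost from a pay-as-you-go cost.

    `reduction` defaults to the approximate saving quoted in this topic
    (about 67%). It is an illustration, not a price list.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return pay_as_you_go_cost * (1.0 - reduction)


# Hypothetical monthly pay-as-you-go bill, in arbitrary currency units.
monthly_pay_as_you_go = 3000.0
monthly_subscription = subscription_cost(monthly_pay_as_you_go)

print(f"pay-as-you-go: {monthly_pay_as_you_go:.2f}")
print(f"subscription (estimated): {monthly_subscription:.2f}")
```

For a small cluster with a low bill, the absolute saving is small, which is consistent with the recommendation to use pay-as-you-go for small-scale clusters and subscription for large-scale clusters.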

Advantage 1: improved performance

| Item | Managed Service for Prometheus | Self-managed Prometheus system |
| --- | --- | --- |
| High availability | Managed Service for Prometheus provides high availability and supports horizontal scaling. You can deploy multiple replicas for the collection and storage components. | Self-managed Prometheus systems provide low availability and do not support horizontal scaling. You can run only one process at a time. |
| Data storage | Cloud-based storage has unlimited storage capacity. | The storage capacity is limited. |
| Data visualization | Grafana is built into the Application Real-Time Monitoring Service (ARMS) console and common monitoring templates are available out of the box. | You must deploy Grafana and configure dashboards on your own. |
| Alerting | Managed Service for Prometheus is integrated with the Alert Management sub-service of ARMS to improve alert efficiency and accuracy. | You must install the AlertManager plug-in on your own. |
| Collection performance of a single replica (2-core CPU, 4 GB of memory) | 6 million data points | 1 million data points |
| Data query performance (600 million time points) | 8 to 10 seconds | 180 seconds |
| Other capabilities | Managed Service for Prometheus provides pre-aggregation, downsampling, and GlobalView capabilities. | Not supported |
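To make the two performance rows of the comparison easier to read, the following sketch computes the ratios they imply. The constants are copied from the comparison above; the function names are hypothetical, and the result is only arithmetic on the published figures, not a benchmark.

```python
from typing import Tuple

# Figures from the comparison (single replica, 2-core CPU, 4 GB of memory).
MANAGED_POINTS = 6_000_000
SELF_MANAGED_POINTS = 1_000_000

# Query time over 600 million time points, in seconds.
MANAGED_QUERY_SECONDS = (8.0, 10.0)  # (fastest, slowest)
SELF_MANAGED_QUERY_SECONDS = 180.0


def collection_ratio() -> float:
    """How many times more data points a managed replica collects."""
    return MANAGED_POINTS / SELF_MANAGED_POINTS


def query_speedup_range() -> Tuple[float, float]:
    """Lowest and highest query speedup implied by the 8 to 10 second range."""
    fastest, slowest = MANAGED_QUERY_SECONDS
    return (SELF_MANAGED_QUERY_SECONDS / slowest,
            SELF_MANAGED_QUERY_SECONDS / fastest)


print(collection_ratio())     # 6.0
print(query_speedup_range())  # (18.0, 22.5)
```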

Advantage 2: aggregated Prometheus multi-cluster query

ARMS provides a virtual aggregation instance for multiple Prometheus instances or self-managed Prometheus clusters. The virtual aggregation instance can be used to query Prometheus metrics, manage Grafana data sources, and manage alerts in a unified manner.

  • To manage the scattered data of open source Prometheus, Managed Service for Prometheus allows you to configure multiple data source addresses in Grafana. Without this, data sources are isolated from each other, which makes it difficult to analyze the running status of applications in different regions around the world from an overall perspective.

  • You do not need to deploy Prometheus Server in each region or deploy a large number of Thanos components. You only need to use Remote Write to report data to Managed Service for Prometheus.

  • Managed Service for Prometheus provides global, distributed, stable, and high-performance query capabilities. Horizontal and vertical scaling can be implemented at any time for a large number of queries.

  • Aggregated Prometheus multi-cluster query can be implemented out of the box. You do not need to deploy any components in addition to Managed Service for Prometheus. This helps you reduce O&M cost.
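Conceptually, an aggregated query returns one result set in which each sample can still be traced back to its cluster. The following sketch illustrates that idea on the client side. It is not how the ARMS virtual aggregation instance is implemented. It assumes responses in the standard Prometheus HTTP API instant-query format; the function name `merge_vector_results`, the `cluster` label, and the cluster names are hypothetical.

```python
from typing import Dict, List


def merge_vector_results(responses: Dict[str, dict], label: str = "cluster") -> List[dict]:
    """Merge instant-query responses from several clusters into one list.

    Each input value is a parsed Prometheus HTTP API response of the form
    {"status": "success", "data": {"resultType": "vector", "result": [...]}}.
    Every merged sample gets an extra label that names its source cluster.
    """
    merged: List[dict] = []
    for cluster_name, response in responses.items():
        if response.get("status") != "success":
            continue  # skip clusters whose query failed
        data = response.get("data", {})
        if data.get("resultType") != "vector":
            continue  # this sketch handles instant vectors only
        for sample in data.get("result", []):
            metric = dict(sample["metric"])
            metric[label] = cluster_name
            timestamp, value = sample["value"]
            merged.append({
                "metric": metric,
                "timestamp": float(timestamp),
                "value": float(value),  # the API encodes values as strings
            })
    return merged


# Hypothetical responses from two clusters for the query `up`.
responses = {
    "tke-cluster": {"status": "success", "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "up", "job": "kubelet"}, "value": [1700000000, "1"]},
    ]}},
    "ack-cluster": {"status": "success", "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "up", "job": "kubelet"}, "value": [1700000000, "0"]},
    ]}},
}

merged = merge_vector_results(responses)
for sample in merged:
    print(sample["metric"]["cluster"], sample["metric"]["job"], sample["value"])
```

With a managed aggregation instance, this merging happens on the service side, so a single Grafana data source or alert rule can cover all clusters.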

Advantage 3: lightweight installation

Compared with open source Prometheus, Managed Service for Prometheus is easy to deploy. You only need to install a lightweight agent in your Kubernetes cluster. Backend storage can be hosted by the service, which saves cluster resources for your business workloads.

Advantage 4: integration of Managed Service for Grafana

Alibaba Cloud Managed Service for Grafana is a cloud-native O&M data visualization platform that provides maintenance-free and quick startup capabilities. Managed Service for Grafana provides the following benefits:

  • By default, the data sources of various Alibaba Cloud services, such as Managed Service for Prometheus and Simple Log Service, are integrated. Third-party data sources and user-created data sources are also supported. This allows you to quickly build integrated O&M dashboards.

  • Managed Service for Grafana provides exclusive instances, service-level agreement (SLA) assurance, and reliable O&M. Managed Service for Grafana also ensures high availability and elasticity of the monitoring system at lower maintenance cost.

  • Managed Service for Grafana supports Alibaba Cloud Single Sign-On (SSO) and self-managed account systems to implement fine-grained management of data sources and dashboards without compromising data security.

  • Managed Service for Grafana can resolve the following issues:

    • Difficulty in data aggregation: The monitoring data of various cloud services is difficult to aggregate and unify, which increases the difficulty of O&M.

    • Difficulty in O&M: The core metrics in the monitoring charts of various cloud services must be repeatedly configured.

    • Difficulty in alert management: The alert rules of various cloud services are scattered and difficult to manage.

  • Managed Service for Grafana can provide the following capabilities:

    • Default integrations: By default, Managed Service for Grafana is integrated with key Alibaba Cloud services, such as elastic computing services and database services.

    • Unified dashboards: A unified dashboard system is established across data sources to optimize visualized O&M.

    • Unified alerting: You can easily build an integrated alerting system to improve the efficiency of alert management.

Advantage 5: integration of Alert Management

By default, Managed Service for Prometheus is integrated with Alert Management. Alert Management has the following features:

  • Globalization

    • You can globalize alert rule templates to configure alerting for global events.

    • You can globalize contacts and notification policies by configuring simple settings.

  • Event collection with higher management efficiency

    • You can integrate Alert Management with common monitoring services of Alibaba Cloud. You can also integrate Alert Management with third-party monitoring services for centralized management.

    • Alert Management provides stable alert event handling capabilities. You can handle alert events 24/7.

    • Alert Management ensures low latency for handling a large number of alert events.

  • Timely and accurate alert notifications

    • You can configure notification policies and compress alert events. This reduces the O&M workloads.

    • You can select one or more notification methods based on the urgency of an alert. For example, you can send alert notifications to contacts by email, SMS, phone call, or DingTalk to remind the contacts to handle the alert.

    • You can configure an escalation policy to send notifications to contacts multiple times if an alert remains unhandled for a long period of time.

  • Efficient alert management

    • Contacts can use DingTalk to handle alerts anytime.

    • Alerts use a common format, which allows contacts to better analyze alerts.

    • Multiple contacts can work together through DingTalk to handle alerts.

  • Alert event reprocessing

    • You can use event processing flows to orchestrate simple procedures and process alert events that are reported by an alert source. This meets your specific requirements on event handling in various scenarios.

    • You can deduplicate, compress, denoise, and silence alerts that are reported by an alert source. This converges alerts and reduces alert storms.

  • Alert configuration management

    • Alert Management provides monitoring templates that contain common core metrics of Kubernetes clusters. Alert Management also provides the alert template feature to automatically generate and send alert templates. This way, you can configure multiple alerts at a time.

    • Alert Management provides a visualized alert configuration wizard and preview. You can view and precisely configure alert conditions and events in real time.

  • You can view alerting statistics, analyze alert handling results in real time, improve alert handling efficiency, and monitor business status.
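The deduplication, compression, and silencing described above can be pictured with a small sketch. This is a simplified illustration, not the Alert Management implementation. The alert shape (`name` plus `labels`), the function names, and the sample alerts are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def fingerprint(alert: dict) -> Tuple:
    """Identify an alert by its name and its full, order-independent label set."""
    return (alert["name"], tuple(sorted(alert["labels"].items())))


def converge(alerts: Iterable[dict], silenced_names: FrozenSet[str] = frozenset()) -> List[dict]:
    """Drop silenced alerts and compress duplicates into one entry with a count."""
    groups: Dict[Tuple, dict] = {}
    for alert in alerts:
        if alert["name"] in silenced_names:
            continue  # silenced: never reaches a contact
        key = fingerprint(alert)
        if key in groups:
            groups[key]["count"] += 1  # duplicate: compress into existing entry
        else:
            groups[key] = {
                "name": alert["name"],
                "labels": dict(alert["labels"]),
                "count": 1,
            }
    return list(groups.values())


# Hypothetical alert storm: the same pod alert fires three times.
incoming = [
    {"name": "PodCrashLooping", "labels": {"cluster": "tke-cluster", "pod": "web-0"}},
    {"name": "PodCrashLooping", "labels": {"pod": "web-0", "cluster": "tke-cluster"}},
    {"name": "PodCrashLooping", "labels": {"cluster": "tke-cluster", "pod": "web-0"}},
    {"name": "NodeDiskPressure", "labels": {"cluster": "tke-cluster", "node": "node-1"}},
    {"name": "TestAlert", "labels": {"cluster": "tke-cluster"}},
]

notifications = converge(incoming, silenced_names=frozenset({"TestAlert"}))
for item in notifications:
    print(item["name"], item["count"])
```

Five incoming events converge into two notifications here, which is the effect the text describes as reducing alert storms.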

Example: Monitor a multi-cloud Kubernetes cluster by using Managed Service for Prometheus

Prerequisites

The cluster is connected to Alibaba Cloud over the Internet or an internal network. For more information, see FAQ about registered clusters.

Step 1: Create an ACK One registered cluster

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. In the upper-right corner of the Clusters page, click Create Kubernetes Cluster.

  3. On the page that appears, click the ACK One Registered Cluster tab, set the required parameters, and click Create Cluster. For more information, see Register an external Kubernetes cluster.

    You can view the registered cluster on the Clusters page.

Step 2: Manage a multi-cloud Kubernetes cluster in the ACK One registered cluster

In this example, Tencent Kubernetes Engine (TKE) is used to describe how to manage a TKE cluster in an ACK One registered cluster, and capture and display metrics in Managed Service for Prometheus.

  1. On the Clusters page, find the registered cluster that you created in Step 1: Create an ACK One registered cluster and click Details in the Actions column.

  2. Click the Connection Information tab. On the Public Access tab, view the cluster credentials for connecting to the cluster over the Internet and click Copy.

  3. Log on to the Tencent Cloud TKE console. On the Clusters page, click the name of the TKE cluster. In the upper-right corner of the page, click Create Resource in YAML. In the dialog box that appears, paste the cluster credentials that you copied in the previous step into the editor, and click OK. Then, check the running status of the ack-cluster-agent Deployment. If the ack-cluster-agent Deployment is running as expected, the installation is successful.

  4. Log on to the ACK console. On the Clusters page, check the status of the ACK One registered cluster that you created in Step 1: Create an ACK One registered cluster. If the ACK One registered cluster is in the Running state, the TKE cluster is managed.
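If you prefer to check the agent from a terminal instead of the TKE console, the check in the steps above can be sketched with kubectl. This sketch assumes that kubectl is installed and points at the TKE cluster, that ack-cluster-agent is a Deployment, and that it runs in the kube-system namespace. The namespace and the function names are assumptions, so adjust them to match the YAML that you applied.

```python
import shutil
import subprocess
from typing import List


def rollout_status_command(deployment: str = "ack-cluster-agent",
                           namespace: str = "kube-system",  # assumed namespace
                           timeout_seconds: int = 120) -> List[str]:
    """Build a `kubectl rollout status` command for the agent Deployment."""
    return [
        "kubectl", "rollout", "status",
        f"deployment/{deployment}",
        "--namespace", namespace,
        f"--timeout={timeout_seconds}s",
    ]


def agent_is_ready(namespace: str = "kube-system") -> bool:
    """Return True if the rollout completes before the timeout."""
    if shutil.which("kubectl") is None:
        raise RuntimeError("kubectl was not found on PATH")
    result = subprocess.run(
        rollout_status_command(namespace=namespace),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Print the command so that you can review it before running it.
    print(" ".join(rollout_status_command()))
```

Calling `agent_is_ready()` runs the command against the current kubectl context. A return value of `True` corresponds to the Deployment running as expected.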

Step 3: Install the Prometheus agent (ack-arms-prometheus)

For more information, see Enable Managed Service for Prometheus for a registered cluster.

Step 4: View monitoring data

By default, Managed Service for Prometheus is integrated with Grafana dashboards to allow you to view monitoring data, such as the Deployment dashboard and DaemonSet dashboard. You can perform the following steps to view monitoring data on dashboards:

  1. Log on to the Prometheus console. In the left-side navigation pane, click Instances.

  2. Click the Prometheus instance that monitors the ACK One registered cluster created in Step 1: Create an ACK One registered cluster.

  3. In the left-side navigation pane, click Dashboards. On the Dashboards page, you can click the name of a dashboard to view detailed metrics.

Step 5: View alerts

By default, Managed Service for Prometheus enables the monitoring of core metrics for Kubernetes clusters, which avoids the errors that manual configuration can introduce. In addition, Managed Service for Prometheus provides a variety of alert templates that cover core metrics. You can use these alert templates based on your business requirements without writing PromQL. To view alerts, perform the following steps:

  1. Log on to the Prometheus console. In the left-side navigation pane, click Instances.

  2. Click the Prometheus instance that monitors the ACK One registered cluster created in Step 1: Create an ACK One registered cluster.

  3. In the left-side navigation pane, click Alert Rules. On the Prometheus Alert Rules page, view the alerts.

Activation

  • ACK One registered clusters: For information about how to activate an ACK One registered cluster, see Register an external Kubernetes cluster.

  • Managed Service for Prometheus: Managed Service for Prometheus provides the subscription billing method. Compared with the pay-as-you-go billing method, the subscription billing method saves at least 67% of your cost.