Managed Service for Prometheus

Multi-source metrics are aggregated to monitor the status of your business and services in real time.

A Complete Metric Monitoring System in the Cloud

Managed Service for Prometheus is a cloud-native metric monitoring platform that is compatible with the open source Prometheus ecosystem. Managed Service for Prometheus provides an all-in-one metric monitoring platform for different roles such as O&M engineers and site reliability engineers (SREs). It also provides monitoring and alerting services for various scenarios, such as custom business monitoring, application component monitoring, cloud service monitoring, container monitoring, and system monitoring.

Better Performance

Compared with open source Prometheus, Managed Service for Prometheus improves data collection performance by 20 times. It returns queries over long time ranges within seconds and supports global aggregate queries.


Managed Service for Prometheus provides fully managed services that reduce O&M costs by 90%. You are charged based on the number of metrics, and a large number of free metrics are provided.

Higher Availability

Multiple replicas can be deployed to support the scale-out of data collection components. Distributed storage in the cloud ensures data reliability.


Specialized Containers for Full-stack Monitoring and Integrated Metric Monitoring Dashboards

Covers business monitoring, application component monitoring, cloud service monitoring, container monitoring, and system monitoring.

Metrics for the System Layer

Provides basic metrics such as CPU utilization, memory usage, disk usage, and I/O.

Multi-cloud and Multi-cluster Metrics

Provides metrics for Alibaba Cloud clusters such as Kubernetes clusters and Elastic Compute Service (ECS) clusters, and hybrid cloud clusters.

Metrics for the Container Layer

Deeply integrates with Alibaba Cloud Container Service for Kubernetes (ACK), and provides basic metrics free of charge.

Metrics for Cloud Services

Provides metrics for cloud services such as databases, message queues, microservices, and storage.

Metrics for the Self-managed Application Component Layer

Provides metrics for self-managed application components such as middleware, databases, and message queues.

Metrics for Application Performance

Deeply integrates with the application monitoring and browser monitoring features of Application Real-Time Monitoring Service (ARMS) and complies with the OpenTelemetry standard.

Custom Business Metrics

Supports custom metrics such as the transaction rate, order placement rate, and click-through rate.

Improved Grafana Dashboards

Integrates various dashboards and alerts with a few clicks.

Preset Dashboard Templates for Common Cloud Services and Application Components

Provides data source configurations and preset dashboards for cloud services to display monitoring data in a unified manner.

Custom Extension of Grafana Plug-ins

Allows you to add Grafana plug-ins, visual templates, and data sources to meet personalized O&M and monitoring requirements.

Prometheus Dashboard Templates for Mainstream Cloud Services

Provides Prometheus Grafana dashboards for mainstream cloud services, such as ACK and Message Queue for Apache Kafka, to help O&M teams perform finer-grained metric monitoring.

More Secure Permission Management Systems

Supports Alibaba Cloud single sign-on (SSO) and Resource Access Management (RAM), and allows you to access services by using an Alibaba Cloud account or as a RAM user.

Enhanced Intelligent Alert Management

Improves collaboration efficiency during troubleshooting and increases alert accuracy.

Multi-channel Data Push

Supports more than 10 notification methods, such as Slack, DingTalk, emails, text messages, and webhooks, for timely message notifications.

Improved Alert Collaboration Efficiency

Allows you to collaborate on alert handling, trace alerts, and collect alert statistics through the alert management, alert recording, and alert history features, which are built on DingTalk-based ChatOps.

Intelligent Noise Reduction

Provides algorithm capabilities, such as anomaly detection and intelligent noise reduction, to reduce invalid alerts.

Root Cause Analysis of Alerts

Helps you analyze the root causes of alerts based on the application context to improve the troubleshooting efficiency.

Global Aggregate Queries Across Multiple Instances

Provides capabilities such as global aggregate queries, unified Grafana data sources, and centralized alerting.

Comparison between Alibaba Cloud Managed Service for Prometheus and open source Prometheus

| Item | Alibaba Cloud Managed Service for Prometheus | Self-managed Prometheus |
| --- | --- | --- |
| Resource purchase and system deployment | Resources are fully managed by Alibaba Cloud. | You must purchase resources and deploy systems on your own. |
| O&M costs | No O&M operations are required. | Routine O&M operations are required. |
| High availability | Multiple replicas can be deployed to support the scale-out of data collection and storage components. This provides high availability. | You can run only one process at a time. Scale-out is not supported for data collection and storage components. |
| Data access | You can integrate common cloud services, mainstream application components such as databases and middleware, and applications that are developed by using mainstream programming languages such as Java and Go. You can monitor middleware hosted on ECS clusters without the need to install an agent. | To collect the data of a component, you must create an exporter for the component. |
| Data storage | Cloud-based storage has unlimited storage capacity. | The storage capacity is limited. |
| Data visualization | Grafana is integrated into the console, and all kinds of common monitoring templates are available out of the box. | You must deploy Grafana and configure dashboards on your own. |
| Alert management | The alert center of ARMS is integrated with Managed Service for Prometheus to improve the alert efficiency and accuracy. | You must install the Alertmanager plug-in on your own. |
| Collection performance of a single replica (2 CPU cores, 4 GB memory) | 6 million data points. | 1 million data points. |
| Data query performance (0.6 billion points in time) | 8 to 10 seconds. | 180 seconds. |
| Security management | Managed Service for Prometheus is integrated with Alibaba Cloud security capabilities to meet the security and compliance requirements of enterprises. | Not supported. |
| Other capabilities | Managed Service for Prometheus supports pre-aggregation and downsampling capabilities. | Not supported. |
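
The two performance rows in the comparison can be turned into ratios with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the inputs are the figures quoted in the comparison, not new measurements.

```python
def throughput_ratio(managed_points: float, self_managed_points: float) -> float:
    """How many times more data points the managed replica collects."""
    return managed_points / self_managed_points


def latency_speedup(self_managed_seconds: float, managed_seconds: float) -> float:
    """How many times faster the managed query completes."""
    return self_managed_seconds / managed_seconds


if __name__ == "__main__":
    # Collection performance of a single replica: 6 million vs. 1 million data points.
    print(throughput_ratio(6_000_000, 1_000_000))
    # Query performance: 180 seconds vs. the 8 to 10 second range.
    print(latency_speedup(180, 10), latency_speedup(180, 8))
```

With the quoted figures, collection throughput differs by a factor of 6, and the query time differs by a factor of roughly 18 to 22.5 depending on which end of the 8 to 10 second range is used.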


Metrics for Cloud Services

Managed Service for Prometheus monitors cloud services, displays monitoring data on dashboards, supports data queries, and provides alerting services. This allows O&M teams to perform routine O&M operations in a convenient manner. Managed Service for Prometheus supports more than 50 cloud services and will support more cloud services in the future.

Metrics for the Container Layer

Managed Service for Prometheus is deeply integrated with ACK. You can enable Managed Service for Prometheus as a feature when you configure an ACK cluster. It provides components that monitor the performance of workloads, Ingresses, and CoreDNS. It also provides dedicated dashboards to monitor the metrics of container services, application components, cloud services, and applications, and displays monitoring data based on cloud resource tags.

Metrics for Application Performance

Managed Service for Prometheus is integrated with ARMS. It provides preset dashboards for application performance metrics and allows you to monitor the metrics with simple configurations. It also allows you to collect application performance data by using service discovery configurations.

Metrics for Multi-cloud Application Components

Managed Service for Prometheus provides solutions for installing exporters in third-party clouds and data centers, so that application components running there can also be monitored.

Custom Business Metrics

Managed Service for Prometheus allows you to customize business objective metrics, service quality metrics, technical component metrics, and basic resource metrics based on your business requirements. After data capture tasks are configured, all metrics are centrally displayed to improve data relevance.
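
As a rough illustration of what a custom business metric looks like to a Prometheus-compatible collector, the sketch below renders a counter in the Prometheus text exposition format using only the Python standard library. The metric name `orders_placed_total`, the `channel` label, and the helper functions are hypothetical examples, not part of the product. In practice an official Prometheus client library would normally handle this rendering and serve it over HTTP.

```python
from typing import Dict, List, Tuple

# One sample is a set of labels plus a numeric value.
Sample = Tuple[Dict[str, str], float]


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: List[Sample]) -> str:
    """Render one counter metric family in the Prometheus text format.

    Assumption: help_text contains no backslashes or newlines, so it is
    written without escaping.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        label_part = f"{{{pairs}}}" if pairs else ""
        lines.append(f"{name}{label_part} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical business metric: orders placed, split by sales channel.
    text = render_counter(
        "orders_placed_total",
        "Total orders placed.",
        [({"channel": "web"}, 42), ({"channel": "app"}, 17)],
    )
    print(text, end="")
```

A collector that scrapes an endpoint returning this text would see one time series per label combination, which is what allows a dashboard to break a business metric down by channel.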
