How to monitor multicloud container clusters by using Alibaba Cloud registered clusters and Prometheus

When Kubernetes clusters run across multiple cloud providers, monitoring becomes fragmented. Each cluster needs its own Prometheus server, Grafana instance, and AlertManager setup, making cross-cluster queries and unified alerting difficult. Managed Service for Prometheus and ACK One registered clusters solve this by providing a single monitoring stack for all your Kubernetes clusters, regardless of where they run.

Register external clusters (such as those on Tencent Cloud) through ACK One, install a lightweight Prometheus agent, and monitor all clusters from one place with unified metrics collection, Grafana dashboards, and alert management.

How it works

Register your external Kubernetes clusters (such as those running on Tencent Cloud) as ACK One registered clusters.
Install a lightweight Prometheus agent (ack-arms-prometheus) in each registered cluster.
The agent collects metrics and sends them to Managed Service for Prometheus through remote write.
Query, visualize, and alert on metrics from all clusters through a single Managed Service for Prometheus instance.
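
As a rough illustration of the last step, the sketch below builds an instant-query request and reads values out of the response. It assumes the instance exposes the standard Prometheus HTTP API path `/api/v1/query`; the base URL and the sample response are placeholders, not values from this guide.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; use the HTTP API address shown for your own instance.
BASE_URL = "https://prometheus.example.com"


def build_query_url(base_url: str, promql: str) -> str:
    """Return an instant-query URL for a Prometheus-compatible HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_samples(response_body: str) -> list[tuple[dict, float]]:
    """Parse an instant-vector response into (labels, value) pairs."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


# Made-up response in the standard Prometheus instant-vector format.
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "kubelet"},
             "value": [1700000000, "1"]},
        ],
    },
})

url = build_query_url(BASE_URL, "up")
samples = extract_samples(SAMPLE)
```

Sending the request (for example with `urllib.request`) and any authentication are left out, because they depend on how access to the instance is configured.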

Managed Service for Prometheus vs. self-managed Prometheus

Running self-managed Prometheus across multiple cloud providers introduces several operational challenges:

Single-process architecture. Open source Prometheus runs as a single process and cannot scale horizontally. During traffic spikes, metric collection becomes a bottleneck.
Limited storage. The built-in time series database (TSDB) stores data on the local disk of a single node, so capacity is limited and a node failure can cause data loss.
Fragmented monitoring. Each cluster requires its own Prometheus server, Grafana instance, and AlertManager. Cross-cluster queries require additional components such as Thanos.
High maintenance overhead. Building and maintaining every layer of the monitoring stack -- collection, storage, visualization, and alerting -- requires significant engineering effort.

Managed Service for Prometheus addresses these challenges with a fully hosted monitoring stack that includes built-in high availability, unlimited cloud storage, and cross-cluster query capabilities.

Performance comparison

Capability	Managed Service for Prometheus	Self-managed Prometheus
High availability	Horizontal scaling with multiple replicas for collection and storage	Single process, no horizontal scaling
Storage capacity	Unlimited cloud-based storage	Limited by local disk
Collection throughput (single replica: 2-core CPU, 4 GB memory)	6 million data points	1 million data points
Query performance (600 million time points)	8-10 seconds	180 seconds
Visualization	Built-in Grafana with pre-configured dashboards	Manual Grafana deployment and dashboard configuration
Alerting	Integrated Alert Management with templates and escalation policies	Manual AlertManager plug-in installation
Advanced features	Pre-aggregation, downsampling, GlobalView	Not available
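
The ratios implied by the table can be worked out directly. The snippet below uses only the figures from the table; the variable names are illustrative.

```python
# Figures taken from the comparison table above.
managed_points = 6_000_000       # data points, single 2-core / 4 GB replica
self_managed_points = 1_000_000  # data points, same replica size
managed_query_s = (8, 10)        # seconds, fastest and slowest quoted
self_managed_query_s = 180       # seconds

throughput_ratio = managed_points / self_managed_points
speedup_low = self_managed_query_s / managed_query_s[1]   # against the 10 s figure
speedup_high = self_managed_query_s / managed_query_s[0]  # against the 8 s figure

print(f"Collection throughput: {throughput_ratio:.0f}x")
print(f"Query speedup: {speedup_low:.0f}x to {speedup_high:.1f}x")
```

On these numbers, the managed service collects six times as many data points per replica and answers the quoted query roughly 18 to 22.5 times faster.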

Resource efficiency

The Prometheus agent is lightweight. A single agent replica with a 2-core CPU and 4 GB of memory can collect up to 6 million data points. The agent supports auto scaling and uses an optimized service discovery module that puts less pressure on the Kubernetes API server than open source Prometheus does.

Pricing

Tier	Description
Free	Basic-metric collection at no cost, covering standard Kubernetes monitoring needs
Pay-as-you-go	Suitable for small-scale clusters. For details, see Pay-as-you-go
Subscription	For large-scale clusters. Reduces monitoring costs by about 67% compared to pay-as-you-go

Aggregated multi-cluster query

Managed Service for Prometheus provides a virtual aggregation instance that spans multiple Prometheus instances and self-managed Prometheus clusters. The aggregation instance works out of the box with no additional component deployment required.

With this aggregation instance, you can:

Run unified cross-cluster queries without deploying Prometheus Server or Thanos components in each region. Report data through remote write instead.
Use centralized Grafana data sources instead of configuring separate data source addresses per region.
Scale queries elastically. Query workloads can be scaled horizontally or vertically at any time.
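
To make the cross-cluster query idea concrete, the sketch below totals an instant-vector result per cluster on the client side. It assumes that every series carries a `cluster` label identifying its source cluster; that label name, the cluster names, and the sample values are illustrative assumptions rather than documented behavior of the aggregation instance.

```python
from collections import defaultdict

# Server-side equivalent in PromQL, assuming a `cluster` label exists:
#   sum by (cluster) (kube_pod_status_phase{phase="Running"})


def total_by_label(result: list[dict], label: str = "cluster") -> dict[str, float]:
    """Sum instant-vector sample values, grouped by one label."""
    totals: dict[str, float] = defaultdict(float)
    for series in result:
        key = series["metric"].get(label, "unknown")
        totals[key] += float(series["value"][1])
    return dict(totals)


# Made-up query result covering two hypothetical clusters.
sample_result = [
    {"metric": {"cluster": "tke-prod", "namespace": "default"},
     "value": [1700000000, "12"]},
    {"metric": {"cluster": "tke-prod", "namespace": "kube-system"},
     "value": [1700000000, "8"]},
    {"metric": {"cluster": "ack-prod", "namespace": "default"},
     "value": [1700000000, "5"]},
]

per_cluster = total_by_label(sample_result)
```

In practice the aggregation is better done in PromQL, as in the comment, so that only one series per cluster is returned.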

Built-in Grafana integration

Managed Service for Grafana is a cloud-native visualization platform integrated by default with Managed Service for Prometheus and Simple Log Service.

Capability	Description
Pre-integrated data sources	Alibaba Cloud services such as elastic computing and database services are connected by default. Third-party and custom data sources are also supported
Managed infrastructure	Exclusive instances with Service Level Agreement (SLA) assurance and high availability, eliminating the need to maintain Grafana yourself
Access control	Alibaba Cloud Single Sign-On (SSO) and self-managed account systems for fine-grained management of data sources and dashboards
Unified dashboards	A single dashboard system across all data sources for streamlined operations
Unified alerting	Build an integrated alerting system across data sources to improve alert management efficiency

Alert Management integration

Managed Service for Prometheus is integrated with Alert Management by default. Alert Management provides:

Pre-built alert templates with common Kubernetes core metrics. Configure alerts without writing PromQL.
Multiple notification channels: email, SMS, phone call, and DingTalk.
Alert processing: deduplication, compression, denoising, and silencing to reduce alert storms.
Escalation policies that automatically re-notify contacts if alerts remain unhandled.
Event processing flows for custom alert handling logic.
Global alert rules that apply templates and notification policies across all regions.
Visual configuration wizard with real-time preview for alert conditions.
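
Deduplication, one of the processing steps listed above, can be pictured as collapsing alerts that share the same label set. The sketch below is a generic illustration of that idea and is not how Alert Management is implemented; the alert structure and field names are assumptions.

```python
def fingerprint(alert: dict) -> tuple:
    """Identify an alert by its sorted label set."""
    return tuple(sorted(alert["labels"].items()))


def deduplicate(alerts: list[dict]) -> list[dict]:
    """Keep the first alert per distinct label set and count the repeats."""
    merged: dict[tuple, dict] = {}
    for alert in alerts:
        key = fingerprint(alert)
        if key in merged:
            merged[key]["count"] += 1
        else:
            merged[key] = {**alert, "count": 1}
    return list(merged.values())


# Made-up alerts: the first two differ only in label order.
incoming = [
    {"labels": {"alertname": "PodCrashLooping", "pod": "api-0"}},
    {"labels": {"pod": "api-0", "alertname": "PodCrashLooping"}},
    {"labels": {"alertname": "NodeNotReady", "node": "node-1"}},
]

unique_alerts = deduplicate(incoming)
```

Three incoming alerts become two notifications here, the first carrying a repeat count of 2.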

Tutorial: Set up multicloud monitoring

This tutorial walks through registering a Tencent Kubernetes Engine (TKE) cluster with ACK One and monitoring it through Managed Service for Prometheus. The same process applies to clusters on other cloud providers.

Prerequisites

Before you begin, make sure that:

The external cluster can connect to Alibaba Cloud over the Internet or an internal network

For connectivity requirements, see FAQ about registered clusters.
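
A quick way to test basic reachability from a machine in the external cluster's network is a plain TCP connection attempt. The helper below is a minimal sketch: the function name is hypothetical, and the host and port to test must come from your own registered cluster's connection information.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_connect("<endpoint-host>", 443)` returns `False` if the endpoint is unreachable. A successful TCP connection only shows that the network path is open; it does not validate credentials or TLS.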

Step 1: Create an ACK One registered cluster

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click Create Kubernetes Cluster.
Click the ACK One Registered Cluster tab, configure the required parameters, and then click Create Cluster. For parameter details, see Register an external Kubernetes cluster.

After creation, the registered cluster appears on the Clusters page.

Step 2: Register the external cluster

On the Clusters page of the ACK console, find the registered cluster created in Step 1 and click Details in the Actions column.
Click the Connection Information tab. Click Obtain Temporary Kubeconfig or Obtain Long-term Kubeconfig based on your needs. On the Internal Access or Public Access tab, click Copy to copy the cluster credentials.
Log on to the Tencent Cloud TKE console. On the Clusters page, click the name of your TKE cluster. In the upper-right corner, click Create Resource in YAML. Paste the cluster credentials into the editor and click OK.
On the TKE Clusters page, check the status of the ack-cluster-agent Deployment. If it is running, the installation is successful.
Return to the ACK console. On the Clusters page, confirm the registered cluster status is Running. A Running status indicates the TKE cluster is successfully registered.
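
The agent check can also be scripted instead of done in the console. The sketch below reads a Deployment with `kubectl` and compares ready replicas against desired replicas. It assumes `kubectl` is configured for the TKE cluster and that the agent runs in the `kube-system` namespace; the namespace is an assumption, so adjust it if your cluster differs.

```python
import json
import subprocess


def deployment_is_ready(deployment: dict) -> bool:
    """Return True when every desired replica of the Deployment is ready."""
    desired = deployment.get("spec", {}).get("replicas", 1)
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    return desired > 0 and ready >= desired


def fetch_deployment(name: str, namespace: str) -> dict:
    """Read a Deployment object as JSON through kubectl."""
    completed = subprocess.run(
        ["kubectl", "get", "deployment", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


def agent_is_ready(namespace: str = "kube-system") -> bool:
    # "kube-system" is an assumed namespace, not taken from this guide.
    return deployment_is_ready(fetch_deployment("ack-cluster-agent", namespace))
```

Calling `agent_is_ready()` raises `subprocess.CalledProcessError` if the Deployment does not exist, which is a useful signal that the credentials from the previous step were not applied.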

Step 3: Install the Prometheus agent

Install the ack-arms-prometheus agent in your registered cluster. For detailed instructions, see Enable Managed Service for Prometheus for a registered cluster.

Step 4: View monitoring dashboards

Managed Service for Prometheus includes pre-configured Grafana dashboards such as Deployment and DaemonSet dashboards.

Log on to the ARMS console. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
Click the Prometheus instance associated with the registered cluster created in Step 1.
In the left-side navigation pane, click Dashboards. Click any dashboard name to view its metrics.

Step 5: Configure and view alerts

Managed Service for Prometheus automatically monitors Kubernetes core metrics and includes built-in alert templates. No PromQL is required for common alert rules.

Log on to the ARMS console. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
Click the Prometheus instance associated with the registered cluster created in Step 1.
In the left-side navigation pane, click Alert Rules. On the Prometheus Alert Rules page, review and customize alert rules based on your requirements.

Get started

ACK One registered clusters: To activate an ACK One registered cluster, see Register an external Kubernetes cluster.
Managed Service for Prometheus: Subscription pricing reduces monitoring costs by about 67% compared to pay-as-you-go. See Pay-as-you-go for pricing details.