Host monitoring for ECS with Managed Service for Prometheus

Prerequisites

Managed Service for Prometheus is activated. For more information, see Instance billing.
An ECS instance is created. For more information, see Create and manage an ECS instance by using the ECS console (express version).
Resource Center is activated. For more information, see Activate Resource Center.

Benefits of host monitoring

Host monitoring provides an automated observability solution for Alibaba Cloud ECS instances, covering host discovery, agent installation, metric scraping, and alerting.

Host monitoring supports Alibaba Cloud ECS instances, self-managed data center servers, and third-party cloud servers. For ECS instances, the service automatically installs exporters and generates scrape configurations, and a managed Prometheus agent handles metric scraping, storage, visualization, and alerting. For hosts outside Alibaba Cloud, which do not support automatic discovery, manually install the Alibaba Cloud agent to push data.

Benefit	Description
Near real-time host discovery	Adaptability: Auto-discovery detects dynamic changes in cloud resources, ensuring all running instances are promptly monitored. Versatility: Supports multiple discovery types, including Kubernetes service discovery and integration with other cloud services.
Near real-time agent installation	Plug-and-play: Automated exporter installation lets the system recognize new nodes and collect metrics without manual intervention. Comprehensive monitoring: Includes node-exporter, process-exporter, GPU-exporter, and middleware exporters for full-stack performance tracking.
Near real-time metric scraping	Simplified configuration: Automated config generation reduces manual work and ensures accurate metric scraping across all nodes and services. Flexibility: Adjustable configurations scale to complex monitoring environments. A host is integrated within 30–60 seconds of creation. Scrape intervals are adjustable from 1 to 60 seconds.
Serverless agent	Centralized management: A managed Prometheus agent unifies data collection and simplifies monitoring architecture. The collection pipeline is transparent to users. High efficiency: Abstracted monitoring complexity reduces misconfiguration risk and improves data accuracy.
Intelligent metric labels	Automatically extracts tags, resource groups, and regions from ECS instances and injects them as metric labels. Custom labels can be added for business, environment, and data source identification.
Large-scale data collection and storage	Supports large-scale integration with a hybrid model of dedicated and shared resources. Resources scale dynamically based on the number of integrated hosts. The storage system handles massive metric volumes with high-performance querying.
Comprehensive upstream and downstream monitoring data	End-to-end observability requires integrating monitoring data across dimensions to reflect the health of the entire application and service ecosystem. Covers the full stack from hardware to the application layer, including hosts, networks, dependent services, and external services like RDMA networks, OSS, and Redis.
Process-level monitoring	Tracks process performance and resource utilization on the OS, showing the health of applications running on the server. Captures CPU usage, memory, disk I/O, start time, open file handles, and thread count per process. Near-real-time feedback enables quick issue identification. Helps identify processes causing performance degradation, such as memory leaks, high CPU usage, or resource contention.
Expert-level Grafana dashboards included by default	Includes expert-curated Grafana dashboards: ECS Overview, ECS Detail, GPU Overview, GPU Detail, and Node Process. The dashboards provide a one-click, out-of-the-box host monitoring experience.
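
The label injection described in the Intelligent metric labels row can be pictured with a small sketch. It assumes that ECS tags arrive as a plain dictionary and that tag keys are sanitized into valid Prometheus label names (letters, digits, and underscores, not starting with a digit). The function name tags_to_labels, the built-in label names, and the sanitization scheme are illustrative assumptions, not the documented behavior of the service.

```python
import re

# Prometheus label names may contain only letters, digits, and underscores.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def tags_to_labels(tags, region, resource_group):
    """Merge ECS tags into a metric label set (hypothetical scheme)."""
    # Built-in labels come first; the names used here are assumptions.
    labels = {"region": region, "resource_group": resource_group}
    for key, value in tags.items():
        name = _INVALID_CHARS.sub("_", key)
        if not name:
            continue  # nothing usable left in the tag key
        if name[0].isdigit():
            name = "_" + name  # label names must not start with a digit
        if name in labels:
            continue  # do not let a tag overwrite an existing label
        labels[name] = value
    return labels
```

For example, a tag key such as app-name would become the label app_name under this scheme, and a tag named region would be skipped rather than overwrite the built-in label.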

Step 1: Integrate host monitoring data

Log on to the ARMS console. In the left-side navigation pane, click Integration Center.
On the Integration Center page, click Infrastructure in the left-side navigation pane, and then click Host Monitoring.
Note
- Managed Service for Prometheus uses Resource Center to discover VPCs and ECS instances. If Resource Center is not activated, the integration process guides you to activate it. For more information, see Activate Resource Center.
- Activating Resource Center is an asynchronous operation. If the status does not update, wait 10 to 20 seconds and click Redetect.

In the panel that appears, select the target VPC and configure the Configuration Information as described in the following table.

Parameter	Description
Node-exporter installation mode	Automatic Installation (Recommended): Installs node-exporter on selected ECS instances automatically. No manual steps required. Self-installation: Install node-exporter yourself.
Host discovery mode	Stain label selection: Blacklist mechanism. Instances with matching labels are excluded. By default, container monitoring service nodes are not scraped. Unconditional: Installs exporters on and collects monitoring metrics from all ECS hosts in the current VPC. Tag selection: Whitelist mechanism. Only instances with matching tags are integrated. IP CIDR selection: Integrates instances whose IP address falls within the specified CIDR block. Enter the VPC CIDR block to select all instances in the VPC. Instance ID: Specify instance IDs, separated by commas (,).
ECS stain label	Each stain label is a key-value pair. You can set multiple labels.
Collect textfile	Whether to scrape Prometheus metrics from a specified file.
Collect process status metrics	Collects process monitoring data by default.
Node-exporter service port	The default port is 9100.
Metric scrape interval (seconds)	Scrape interval. Default: 15 seconds.
Automatically configure security groups	Enabled by default.
Custom ECS tag injection	Specify an ECS tag key. The system injects the tag key-value pair into Prometheus metrics as labels.
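
The host discovery modes in the preceding table differ only in how they filter the instances of a VPC. The following sketch models that filtering under stated assumptions: the Instance class, the mode names, and the rule that a single matching key-value pair is enough for a label or tag match are all hypothetical and are used only to illustrate blacklist, whitelist, CIDR, and instance ID selection.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical, simplified view of an ECS instance."""
    instance_id: str
    private_ip: str
    tags: dict = field(default_factory=dict)


def matches_any(tags, selectors):
    """Return True if any selector key-value pair is present in tags."""
    return any(tags.get(key) == value for key, value in selectors.items())


def select_instances(instances, mode, *, labels=None, cidr=None, instance_ids=None):
    """Return the instances that a given discovery mode would integrate."""
    if mode == "unconditional":
        return list(instances)
    if mode == "stain_label":  # blacklist: matching instances are excluded
        return [i for i in instances if not matches_any(i.tags, labels or {})]
    if mode == "tag":  # whitelist: only matching instances are kept
        return [i for i in instances if matches_any(i.tags, labels or {})]
    if mode == "cidr":
        network = ipaddress.ip_network(cidr)
        return [i for i in instances if ipaddress.ip_address(i.private_ip) in network]
    if mode == "instance_id":
        wanted = {part.strip() for part in instance_ids.split(",") if part.strip()}
        return [i for i in instances if i.instance_id in wanted]
    raise ValueError(f"unknown discovery mode: {mode}")
```

Passing the CIDR block of the whole VPC in cidr mode selects every instance, which matches the behavior described for IP CIDR selection.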

Click OK. Wait 1 to 2 minutes for the integration to complete.

Note

If no data appears on the dashboard after integration, verify that the inbound rules of the ECS security group allow the 100.64.0.0/10 and 192.168.0.0/18 CIDR blocks to access ports 9100 and 9256, which are the default ports for node-exporter and process-exporter. If you use different ports, adjust the rules accordingly. For more information, see View security group rules.
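
The security group check in the preceding note can be expressed as a small offline test. The sketch below assumes that the inbound allow rules have already been exported as (source CIDR, lowest port, highest port) tuples. That tuple format and the function name missing_access are hypothetical. As a simplification, a required CIDR block counts as covered only if a single rule contains it entirely.

```python
import ipaddress

# CIDR blocks and default ports taken from the note above.
REQUIRED_SOURCES = ["100.64.0.0/10", "192.168.0.0/18"]
REQUIRED_PORTS = [9100, 9256]  # node-exporter, process-exporter


def missing_access(rules, sources=REQUIRED_SOURCES, ports=REQUIRED_PORTS):
    """Return the (source CIDR, port) pairs that no inbound rule covers.

    rules: iterable of (source_cidr, port_low, port_high) allow rules.
    """
    rules = list(rules)
    missing = []
    for source in sources:
        needed = ipaddress.ip_network(source)
        for port in ports:
            covered = any(
                needed.subnet_of(ipaddress.ip_network(rule_cidr))
                and low <= port <= high
                for rule_cidr, low, high in rules
            )
            if not covered:
                missing.append((source, port))
    return missing
```

An empty result means every required source and port pair is allowed. Any returned pair points to a rule that still needs to be added.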

Step 2: View monitoring dashboards

In the left-side navigation pane, click Integration Management.
On the Integration Management page, click the Integrated Environments tab, and select ECS Environment.
In the ECS Environment list, click the name of the target environment to open its details page.
On the Component Management tab, in the Addon Type section, click Dashboard to view the built-in Grafana dashboards.
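
The built-in dashboards are driven by node-exporter metrics, so the same data can also be explored with PromQL. The following sketch builds a CPU utilization expression from the standard node-exporter metric node_cpu_seconds_total and a request URL for the standard Prometheus HTTP API path /api/v1/query. The base URL is a placeholder. The actual query endpoint and authentication for your instance are not covered here and are not assumed by the code.

```python
from urllib.parse import urlencode


def cpu_utilization_query(window="5m"):
    """PromQL for per-instance CPU utilization in percent."""
    return (
        "100 - avg by (instance) "
        f'(rate(node_cpu_seconds_total{{mode="idle"}}[{window}])) * 100'
    )


def instant_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Placeholder endpoint; replace it with the HTTP API URL of your own instance.
url = instant_query_url("https://prometheus.example.invalid", cpu_utilization_query())
```

The helper only constructs the URL. Sending the request is left out because it depends on how access to the instance is configured.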

Step 3: Configure alerts

Log on to the Managed Service for Prometheus console. In the left-side navigation pane, click Integration Management.
On the Integration Management page, click the Integrated Environments tab, and select ECS Environment.
In the ECS Environment list, click the name of the target environment to open its details page.
On the Component Management tab, in the Addon Type section, click Alarm Rules to view the built-in alert rules.

Note

Built-in alert rules generate events but do not send notifications. To send notifications, click Edit to configure a notification method. You can also customize thresholds, duration, and content. For more information, see Create a Prometheus alert rule.
In Simple Mode, you can set the notification recipients, notification period, and repeat policy.
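
The threshold and duration settings interact the way they generally do in Prometheus alerting: a rule fires only after its condition has held continuously for the configured duration. The sketch below illustrates that interaction on a list of samples. The function name and the sample data are hypothetical, and the logic is a simplified model rather than the evaluation engine of the service.

```python
def first_firing_time(samples, threshold, duration):
    """Return the timestamp at which an alert would fire, or None.

    samples: (timestamp_seconds, value) pairs in ascending time order.
    The alert fires once value > threshold has held for `duration` seconds.
    """
    pending_since = None
    for timestamp, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = timestamp  # condition just became true
            if timestamp - pending_since >= duration:
                return timestamp
        else:
            pending_since = None  # condition broken, start over
    return None


# CPU utilization sampled every 15 seconds (illustrative values).
cpu = [(0, 50), (15, 95), (30, 96), (45, 40), (60, 91),
       (75, 92), (90, 93), (105, 94), (120, 95)]
fired_at = first_firing_time(cpu, threshold=90, duration=60)
```

In this data, the short spike at 15 to 30 seconds does not fire the alert because the value drops below the threshold at 45 seconds. A longer duration therefore suppresses brief spikes at the cost of later detection.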

