Managed Service for Prometheus: Host observability

Last Updated: Jun 02, 2026

Managed Service for Prometheus collects OS, process, and custom metrics from Linux and Windows ECS hosts using node-exporter, process-exporter, and text file scraping.

Prerequisites

Benefits of host monitoring

Host monitoring provides an automated observability solution for Alibaba Cloud ECS instances, covering host discovery, agent installation, metric scraping, and alerting.

Host monitoring supports Alibaba Cloud ECS instances, self-managed data center servers, and third-party cloud servers. For ECS instances, the service installs exporters and generates scrape configurations automatically. A managed Prometheus agent handles metric scraping, storage, visualization, and alerting. Hosts outside Alibaba Cloud are not discovered automatically. To monitor them, manually install the Alibaba Cloud agent, which pushes data to the service.

Benefit

Description

Near real-time host discovery

  • Adaptability: Auto-discovery detects dynamic changes in cloud resources, ensuring all running instances are promptly monitored.

  • Versatility: Supports multiple discovery types, including Kubernetes service discovery and integration with other cloud services.

Near real-time agent installation

  • Plug-and-play: Automated exporter installation lets the system recognize new nodes and collect metrics without manual intervention.

  • Comprehensive monitoring: Includes node-exporter, process-exporter, GPU-exporter, and middleware exporters for full-stack performance tracking.

Near real-time metric scraping

  • Simplified configuration: Automated config generation reduces manual work and ensures accurate metric scraping across all nodes and services.

  • Flexibility: Adjustable configurations scale to complex monitoring environments.

A host is integrated within 30–60 seconds of creation. Scrape intervals are adjustable from 1 to 60 seconds.

Serverless agent

  • Centralized management: A managed Prometheus agent unifies data collection and simplifies monitoring architecture. The collection pipeline is transparent to users.

  • High efficiency: Abstracted monitoring complexity reduces misconfiguration risk and improves data accuracy.

Intelligent metric labels

  • Automatically extracts tags, resource groups, and regions from ECS instances and injects them as metric labels.

  • Custom labels can be added for business, environment, and data source identification.

Large-scale data collection and storage

  • Supports large-scale integration with a hybrid model of dedicated and shared resources. Resources scale dynamically based on the number of integrated hosts.

  • The storage system handles massive metric volumes with high-performance querying.

Comprehensive upstream and downstream monitoring data

  • End-to-end observability requires integrating monitoring data across dimensions to reflect the health of the entire application and service ecosystem.

  • Covers the full stack from hardware to the application layer, including hosts, networks, dependent services, and external services like RDMA networks, OSS, and Redis.

Process-level monitoring

  • Tracks process performance and resource utilization on the OS, showing the health of applications running on the server.

  • Captures CPU usage, memory, disk I/O, start time, open file handles, and thread count per process. Near-real-time feedback enables quick issue identification.

  • Helps identify processes causing performance degradation, such as memory leaks, high CPU usage, or resource contention.

Expert-level Grafana dashboards included by default

  • Includes expert-curated Grafana dashboards: ECS Overview, ECS Detail, GPU Overview, GPU Detail, and Node Process.

  • One-click, out-of-the-box host monitoring experience.
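The "Intelligent metric labels" row above describes ECS attributes and custom labels being attached to every metric. The sketch below illustrates that idea only. The instance dictionary shape and the label names `region` and `resource_group` are assumptions, not the service's actual naming. The one firm constraint it encodes is that Prometheus label names may contain only letters, digits, and underscores and must not start with a digit, so tag keys such as `team-name` need sanitizing.

```python
import re

_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def to_label_name(tag_key):
    """Turn an arbitrary tag key into a valid Prometheus label name."""
    name = _INVALID_CHARS.sub("_", tag_key)
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def build_labels(instance, custom_labels=None):
    """Merge region, resource group, tags, and custom labels into one label set.

    `instance` is a hypothetical dict with the keys `region`,
    `resource_group`, and optionally `tags`.
    """
    labels = {
        "region": instance["region"],
        "resource_group": instance["resource_group"],
    }
    for key, value in instance.get("tags", {}).items():
        labels[to_label_name(key)] = value
    # Custom labels are applied last, so in this sketch they win on conflict.
    labels.update(custom_labels or {})
    return labels


example = build_labels(
    {"region": "cn-hangzhou", "resource_group": "rg-prod", "tags": {"team-name": "payments"}},
    {"env": "prod"},
)
print(example)
```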

Step 1: Integrate host monitoring data

  1. Log on to the ARMS console. In the left-side navigation pane, click Integration Center.

  2. On the Integration Center page, click Infrastructure in the left-side navigation pane, and then click Host Monitoring.

    Note
    • Managed Service for Prometheus uses Resource Center to discover VPCs and ECS instances. If Resource Center is not activated, the integration process guides you to activate it. For details, see Activate Resource Center.

    • Activating Resource Center is an asynchronous operation. If the status does not update, wait 10 to 20 seconds and click Redetect.

  3. In the panel that appears, select the target VPC and configure the Configuration Information as described in the following table.

    Parameter

    Description

    Node-exporter installation mode

    • Automatic Installation (Recommended): Installs node-exporter on selected ECS instances automatically. No manual steps required.

    • Self-installation: Install node-exporter yourself.

    Host discovery mode

    • Stain label selection: Blacklist mechanism. Instances with matching labels are excluded. By default, container monitoring service nodes are not scraped.

    • Unconditional: Install exporters and collect monitoring metrics from all ECS hosts in the current VPC.

    • Tag selection: Whitelist mechanism. Only instances with matching tags are integrated.

    • IP CIDR selection: Integrates instances whose IP falls within the specified CIDR block. Enter the VPC CIDR block to select all instances in the VPC.

    • Instance ID: Specify instance IDs, separated by commas (,).

    ECS stain label

    Each stain label is a key-value pair. You can set multiple labels.

    Collect textfile

    Whether to scrape Prometheus metrics from a specified file.

    Collect process status metrics

    Collects process monitoring data by default.

    Node-exporter service port

    The default port is 9100.

    Metric scrape interval (seconds)

    Scrape interval. Default: 15 seconds.

    Automatically configure security groups

    Enabled by default.

    Custom ECS tag injection

    Specify an ECS tag key. The system injects the tag key-value pair into Prometheus metrics as labels.

  4. Click OK. Wait 1 to 2 minutes for the integration to complete.
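The host discovery modes in the table above differ only in how they filter the ECS instances in the selected VPC. The sketch below models that filtering to make the blacklist and whitelist semantics concrete. It is an illustration, not the service's implementation: the mode names and the instance dictionary shape are hypothetical, stain labels are assumed to be compared against ECS tags, and a single matching key-value pair is assumed to count as a match.

```python
import ipaddress


def _matches_any(instance_tags, wanted):
    """True if the instance carries at least one of the wanted key-value pairs."""
    return any(instance_tags.get(key) == value for key, value in wanted.items())


def select_instances(instances, mode, *, stain_labels=None, tags=None,
                     cidr=None, instance_ids=None):
    """Return the instances that a given discovery mode would integrate."""
    if mode == "unconditional":
        return list(instances)
    if mode == "stain_label":
        # Blacklist: exclude instances that carry a matching stain label.
        stain = stain_labels or {}
        return [i for i in instances if not _matches_any(i["tags"], stain)]
    if mode == "tag":
        # Whitelist: keep only instances that carry a matching tag.
        wanted = tags or {}
        return [i for i in instances if _matches_any(i["tags"], wanted)]
    if mode == "cidr":
        network = ipaddress.ip_network(cidr)
        return [i for i in instances if ipaddress.ip_address(i["ip"]) in network]
    if mode == "instance_id":
        # IDs arrive as one comma-separated string, as in the console field.
        ids = {part.strip() for part in (instance_ids or "").split(",") if part.strip()}
        return [i for i in instances if i["id"] in ids]
    raise ValueError(f"unknown discovery mode: {mode}")


hosts = [
    {"id": "i-a", "ip": "10.0.1.5", "tags": {"env": "prod"}},
    {"id": "i-b", "ip": "10.0.2.7", "tags": {"role": "k8s-node"}},
    {"id": "i-c", "ip": "10.0.1.9", "tags": {}},
]
print([h["id"] for h in select_instances(hosts, "stain_label", stain_labels={"role": "k8s-node"})])
```

Note the asymmetry this makes visible: an untagged instance such as `i-c` is integrated under stain label selection but skipped under tag selection.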

Note

If no data appears on the dashboard after integration, verify that the inbound rules of the ECS security group allow the 100.64.0.0/10 and 192.168.0.0/18 CIDR blocks to access ports 9100 and 9256, the default ports for node-exporter and process-exporter. If you use different ports, adjust the rules accordingly. For details, see View security group rules.
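The security group check in the note can be done by hand in the console, but it is also simple to express as code. The following sketch takes a list of inbound allow rules and reports which required source and port combinations are not covered. The rule format (`cidr`, `port_min`, `port_max`) is a hypothetical simplification, not the ECS API response shape. It also assumes that a single rule must cover a source block entirely, so a block covered only by the union of several narrower rules is reported as missing.

```python
import ipaddress

# Values taken from the note above; adjust the ports if you changed the defaults.
REQUIRED_SOURCES = ("100.64.0.0/10", "192.168.0.0/18")
REQUIRED_PORTS = (9100, 9256)


def missing_rules(inbound_rules, sources=REQUIRED_SOURCES, ports=REQUIRED_PORTS):
    """Return the (source CIDR, port) pairs that no inbound allow rule covers."""
    missing = []
    for source in sources:
        needed = ipaddress.ip_network(source)
        for port in ports:
            covered = any(
                needed.subnet_of(ipaddress.ip_network(rule["cidr"]))
                and rule["port_min"] <= port <= rule["port_max"]
                for rule in inbound_rules
            )
            if not covered:
                missing.append((source, port))
    return missing


rules = [
    {"cidr": "100.64.0.0/10", "port_min": 9100, "port_max": 9100},
    {"cidr": "192.168.0.0/16", "port_min": 9000, "port_max": 9300},
]
print(missing_rules(rules))
```

In this example the first rule opens only port 9100 to 100.64.0.0/10, so process-exporter on port 9256 would be unreachable from that block.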

Step 2: View monitoring dashboards

  1. In the left-side navigation pane, click Integration Management.

  2. On the Integration Management page, click the Integrated Environments tab, and select ECS Environment.

  3. In the ECS Environment list, click the name of the target environment to open its details page.

  4. On the Component Management tab, in the Addon Type section, click Dashboard to view the built-in Grafana dashboards.

Step 3: Configure alerts

  1. Log on to the Managed Service for Prometheus console. In the left-side navigation pane, click Integration Management.

  2. On the Integration Management page, click the Integrated Environments tab, and select ECS Environment.

  3. In the ECS Environment list, click the name of the target environment to open its details page.

  4. On the Component Management tab, in the Addon Type section, click Alarm Rules to view the built-in alert rules.

Note
  • Built-in alert rules generate events but do not send notifications. To send notifications, click Edit to configure a notification method. You can also customize thresholds, duration, and content. For details, see Create a Prometheus alert rule.

  • In Simple Mode, you can set the notification recipients, notification period, and repeat policy.
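Threshold and duration interact: a rule with a duration does not fire on the first breaching sample, only after the condition has held continuously for that long. The sketch below simulates this, assuming the duration behaves like the `for` clause of alerting rules in open-source Prometheus. The function name and sample data are illustrative only.

```python
def firing_points(samples, threshold, duration_s):
    """Return the timestamps at which the alert is in the firing state.

    `samples` is a list of (timestamp_seconds, value) pairs in ascending
    time order. The alert fires once the value has stayed above
    `threshold` for at least `duration_s` seconds without interruption.
    """
    firing = []
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # first sample of this breach
            if timestamp - breach_start >= duration_s:
                firing.append(timestamp)
        else:
            breach_start = None  # any recovery resets the timer
    return firing


# CPU usage sampled every 15 seconds, with a 90% threshold held for 30 seconds.
cpu = [(0, 50), (15, 95), (30, 96), (45, 97), (60, 40), (75, 99)]
print(firing_points(cpu, threshold=90, duration_s=30))
```

A longer duration suppresses alerts on short spikes, at the cost of a later first notification.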

Grafana dashboard examples

ECS Overview dashboard

[Image: ECS Overview dashboard]

ECS Detail dashboard

[Image: ECS Detail dashboard]