
Application Real-Time Monitoring Service: HPA for Prometheus agents

Last Updated: Mar 11, 2026

When a Prometheus agent does not have enough replicas to handle the scrape workload, it runs out of memory and restarts repeatedly, which delays or loses monitoring data. Horizontal Pod Autoscaling (HPA) automatically adjusts the number of agent replicas to match the collection workload and prevent these failures.

How it works

After a Prometheus agent starts, it scrapes its targets to determine the number of time series, and then calculates the required number of replicas from the collection capacity of each replica. If data collection requires more than one replica, HPA switches the agent from single-replica mode to multi-replica mode and distributes target collection across worker replicas.
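The replica calculation described above amounts to a ceiling division. The following is a minimal sketch, not the agent's actual implementation: the function name is hypothetical, the per-replica capacity is assumed to equal the "Maximum metrics per agent" value listed under Scheduling limits, and the cap is the default maximum listed under Scaling behavior.

```python
# Assumed values borrowed from the tables on this page; the agent's real
# internal constants are not documented here.
SERIES_PER_REPLICA = 4_000_000  # assumption: "Maximum metrics per agent"
MAX_REPLICAS = 30               # default cap on automatic scale-out


def required_replicas(total_series: int,
                      series_per_replica: int = SERIES_PER_REPLICA,
                      max_replicas: int = MAX_REPLICAS) -> int:
    """Hypothetical helper: replicas needed to collect `total_series`."""
    if total_series < 0 or series_per_replica <= 0 or max_replicas < 1:
        raise ValueError("inputs must be non-negative and capacities positive")
    # Integer ceiling division: round up so no series is left unassigned.
    needed = (total_series + series_per_replica - 1) // series_per_replica
    # At least one replica always runs; never exceed the cap.
    return max(1, min(needed, max_replicas))
```

For example, under these assumptions 9,000,000 time series would need three replicas, because two replicas cover only 8,000,000.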

Single-replica mode

The master replica handles both target discovery and metric collection. HPA switches to multi-replica mode when either condition is met:

  • Memory usage of the master replica exceeds 75%.

  • A sudden surge in targets causes an out-of-memory (OOM) error.
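The two switch conditions can be expressed as a single predicate. This is an illustrative sketch only: the function and parameter names are hypothetical, and how the agent actually samples memory usage or detects an OOM event is not described on this page.

```python
MASTER_MEMORY_THRESHOLD = 0.75  # 75%, from the first condition above


def should_switch_to_multi_replica(master_memory_ratio: float,
                                   oom_detected: bool) -> bool:
    """Hypothetical check: True when either switch condition is met.

    master_memory_ratio is memory usage as a fraction of the limit (0.0-1.0).
    oom_detected is True if a target surge caused an OOM error.
    """
    return oom_detected or master_memory_ratio > MASTER_MEMORY_THRESHOLD
```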

Multi-replica mode

After the transition, responsibilities split between replica types:

| Replica type | Responsibility |
| --- | --- |
| Master | Discovers targets only. |
| Worker | Collects metrics from assigned targets. |

When any worker replica's memory usage exceeds 60%, HPA reassigns targets across workers and adds more worker replicas to rebalance the load.
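The page does not specify how targets are reassigned. One common way to spread load is a greedy "largest target first, least-loaded worker next" assignment, sketched below. Everything here is an assumption for illustration: the function names, the use of per-target series counts as the load measure, and the greedy strategy itself.

```python
import heapq

WORKER_MEMORY_THRESHOLD = 0.60  # 60%, from the paragraph above


def needs_rebalance(worker_memory_ratios: list[float]) -> bool:
    """True when any worker's memory usage exceeds the threshold."""
    return any(r > WORKER_MEMORY_THRESHOLD for r in worker_memory_ratios)


def assign_targets(target_series: dict[str, int],
                   worker_count: int) -> list[list[str]]:
    """Greedy sketch: give each target to the currently lightest worker.

    target_series maps a (hypothetical) target name to its series count.
    Returns one list of target names per worker.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    # Min-heap of (current load, worker index).
    heap = [(0, i) for i in range(worker_count)]
    assignment: list[list[str]] = [[] for _ in range(worker_count)]
    # Place the largest targets first; break ties by name for determinism.
    ordered = sorted(target_series.items(), key=lambda kv: (-kv[1], kv[0]))
    for target, series in ordered:
        load, idx = heapq.heappop(heap)
        assignment[idx].append(target)
        heapq.heappush(heap, (load + series, idx))
    return assignment
```

With targets of 5, 4, 3, and 2 series and two workers, this sketch places 7 series on each worker. A production scheduler would also need to weigh the cost of moving targets that are already being collected, which this example ignores.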

Scheduling limits

The multi-factor collaborative scheduling algorithm enforces these upper bounds:

| Limit | Value |
| --- | --- |
| Maximum targets per round × total metrics | 4 billion |
| Maximum memory usage per agent | 70% |
| Maximum metrics per agent | 4,000,000 |
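As a rough illustration, the three bounds can be checked together. This sketch assumes the first limit means the product of the number of targets in a scheduling round and the total metric count; the page does not define it more precisely. The function name and violation labels are hypothetical.

```python
MAX_TARGETS_TIMES_METRICS = 4_000_000_000  # 4 billion
MAX_AGENT_MEMORY_RATIO = 0.70              # 70%
MAX_METRICS_PER_AGENT = 4_000_000


def violated_limits(targets_per_round: int,
                    total_metrics: int,
                    agent_memory_ratio: float,
                    metrics_per_agent: int) -> list[str]:
    """Return labels of the scheduling limits that the inputs exceed."""
    violations = []
    # Assumed interpretation: product of targets per round and total metrics.
    if targets_per_round * total_metrics > MAX_TARGETS_TIMES_METRICS:
        violations.append("targets_x_metrics")
    if agent_memory_ratio > MAX_AGENT_MEMORY_RATIO:
        violations.append("agent_memory")
    if metrics_per_agent > MAX_METRICS_PER_AGENT:
        violations.append("metrics_per_agent")
    return violations
```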

Prerequisites

Before you begin, make sure that:

Scaling behavior

| Behavior | Detail |
| --- | --- |
| Maximum replicas | 30 (default cap). Automatic scale-out does not exceed this limit. |
| Automatic scale-in | Not supported. Removing replicas during active collection can cause data loss. Reduce the replica count manually through the console. |
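Taken together, these two rules mean the automatic replica count can only stay the same or grow, up to the cap. A minimal sketch of that behavior, with a hypothetical function name, might look like this:

```python
MAX_REPLICAS = 30  # default cap from the table above


def next_replica_count(current: int, required: int,
                       max_replicas: int = MAX_REPLICAS) -> int:
    """Scale-out-only sketch: never returns fewer replicas than `current`."""
    # Automatic scale-out stops at the cap.
    capped = min(required, max_replicas)
    # Automatic scale-in is not supported, so keep the current count
    # whenever fewer replicas would suffice.
    return max(current, capped)
```

In this sketch, a drop in workload leaves the count unchanged; lowering it remains a manual action in the console.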

Adjust the replica count

  1. Log on to the ARMS console.

  2. On the Instances page, click the name of the target Prometheus instance.

  3. In the left-side navigation pane, click Settings.

  4. On the Settings tab, click Replicas in the Actions column.

  5. In the dialog box, specify the desired number of replicas and click OK.


Verify the replica count

After you adjust the replica count, confirm that the change took effect and that monitoring continues to work correctly.

  1. Log on to the ARMS console.

  2. On the Instances page, click the name of the target Prometheus instance.

  3. In the left-side navigation pane, click Dashboards, then open the Prometheus Agent dashboard.

  4. On the dashboard, check the running status of the Prometheus agent, the time consumed to capture real-time and historical metrics, the number of targets captured, the amount of data sent, and resource usage. For a detailed explanation of each metric, see Self-monitoring dashboard of the Prometheus agent.

See also