Managed Service for Prometheus: HPA for Prometheus agents

Last Updated: Mar 11, 2026

When scrape targets grow faster than agent capacity, replicas run out of memory (OOM), restart, and cause data latency or data loss. Horizontal Pod Autoscaler (HPA) for Managed Service for Prometheus agents prevents this by automatically adding replicas based on real-time memory and workload metrics.

How it works

A Prometheus agent starts in single-replica mode and switches to multi-replica mode when the workload exceeds what one replica can handle. HPA then scales the number of worker replicas to keep memory pressure within safe thresholds.

Single-replica mode

The master replica handles both target service discovery and target scraping. When its memory usage reaches 75%, the agent automatically switches to multi-replica mode.

Note: If a single scrape job is too large, it can cause an OOM error on the master replica before the switch occurs.
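The switch rule can be sketched in a few lines of Python. This is an illustrative model only, not the agent's actual code: the names `Mode`, `next_mode`, and `master_memory_ratio` are hypothetical, and the sketch assumes the switch is one-way, since this page does not describe a return to single-replica mode.

```python
from enum import Enum

# Master memory usage that triggers the switch (75%, as stated above).
SWITCH_THRESHOLD = 0.75


class Mode(Enum):
    SINGLE_REPLICA = "single-replica"
    MULTI_REPLICA = "multi-replica"


def next_mode(current: Mode, master_memory_ratio: float) -> Mode:
    """Return the mode the agent should run in next.

    master_memory_ratio is the master's used memory divided by its limit.
    Assumption: once in multi-replica mode, the agent stays there.
    """
    if current is Mode.SINGLE_REPLICA and master_memory_ratio >= SWITCH_THRESHOLD:
        return Mode.MULTI_REPLICA
    return current
```

For example, `next_mode(Mode.SINGLE_REPLICA, 0.80)` yields `Mode.MULTI_REPLICA`, while a master at 50% stays in single-replica mode.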

Multi-replica mode

After the switch, the master handles only target service discovery while worker replicas handle target scraping.

When any worker's memory usage exceeds 60%, the agent:

  1. Reassigns scrape jobs across workers.

  2. Calculates the number of worker replicas needed to keep average memory usage at or below 60%.

  3. Triggers HPA to scale out.
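The replica calculation in step 2 can be approximated as follows. This is a rough sketch under stated assumptions, not the agent's actual algorithm: it assumes every worker has the same memory limit and that total memory demand stays constant and spreads evenly after reassignment. The function names and parameters are hypothetical.

```python
# Per-worker threshold and target average memory usage, in percent (from the text).
TARGET_MEMORY_PERCENT = 60


def needs_scheduling(used_bytes: list[int], limit_bytes: int) -> bool:
    """True when any worker's memory usage exceeds 60% of its limit."""
    return any(used * 100 > TARGET_MEMORY_PERCENT * limit_bytes for used in used_bytes)


def workers_needed(used_bytes: list[int], limit_bytes: int) -> int:
    """Smallest worker count that keeps average memory usage at or below 60%.

    Assumes identical limits and evenly spread, constant total demand.
    Never returns fewer workers than are currently running.
    """
    if not used_bytes or limit_bytes <= 0:
        raise ValueError("need at least one worker and a positive memory limit")
    total_used = sum(used_bytes)
    # Integer ceiling division avoids floating-point rounding at the boundary.
    needed = -(-total_used * 100 // (TARGET_MEMORY_PERCENT * limit_bytes))
    return max(needed, len(used_bytes))
```

With three workers each using 900 of 1000 units, the total demand of 2700 divided by a 600-unit budget per worker gives 4.5, so the sketch asks for 5 workers. With two workers at 700 and 500, the average is already 60%, so reassignment alone is enough and the count stays at 2.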

Scheduling limits

The multi-factor collaborative scheduling algorithm enforces the following limits per agent per scheduling round:

Limit                          Threshold
Total targets x total metrics  4 billion
Memory usage                   70%
Maximum metrics per agent      4,000,000
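To make the limits concrete, the sketch below checks one agent's figures for a scheduling round against the three thresholds. It is a hypothetical illustration: the `RoundStats` type and `violated_limits` function are invented here, and the sketch assumes a value exactly equal to a threshold is still within the limit, which this page does not specify.

```python
from dataclasses import dataclass

# Thresholds from the table above.
MAX_TARGETS_TIMES_METRICS = 4_000_000_000
MAX_MEMORY_PERCENT = 70
MAX_METRICS_PER_AGENT = 4_000_000


@dataclass(frozen=True)
class RoundStats:
    """Hypothetical per-agent figures for one scheduling round."""
    total_targets: int
    total_metrics: int
    memory_percent: float


def violated_limits(stats: RoundStats) -> list[str]:
    """Return the names of the limits that the given figures exceed."""
    violations = []
    if stats.total_targets * stats.total_metrics > MAX_TARGETS_TIMES_METRICS:
        violations.append("total targets x total metrics")
    if stats.memory_percent > MAX_MEMORY_PERCENT:
        violations.append("memory usage")
    if stats.total_metrics > MAX_METRICS_PER_AGENT:
        violations.append("maximum metrics per agent")
    return violations
```

An agent with 1,000 targets, 3,000,000 metrics, and 65% memory usage passes all three checks. Doubling the targets to 2,000 pushes the product to 6 billion and trips the first limit.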

Scaling boundaries

  • Maximum replicas: 30 scrape replicas per agent (default).

  • No automatic scale-in: The agent does not reduce replicas automatically, because scaling in can cause data loss.
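Taken together, the two boundaries amount to a clamp on the replica count: cap at the maximum, and never go below the current count. The sketch below models that behavior; `next_replica_count` and its parameters are hypothetical names, and the default of 30 is the default maximum stated above.

```python
# Default maximum number of scrape replicas per agent (from the text).
MAX_SCRAPE_REPLICAS = 30


def next_replica_count(current: int, needed: int,
                       max_replicas: int = MAX_SCRAPE_REPLICAS) -> int:
    """Apply the scaling boundaries to a computed replica count.

    The result is capped at max_replicas and never drops below the current
    count, because the agent does not scale in automatically.
    """
    capped = min(needed, max_replicas)
    return max(current, capped)
```

For instance, an agent with 28 replicas that computes a need for 45 scales out to 30, while an agent with 10 replicas that now needs only 4 stays at 10.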

Enable HPA

Upgrade the Prometheus Helm chart to version 1.0.0 or later. HPA is enabled automatically after the upgrade; no additional configuration is required.

For upgrade instructions, see .