Application Real-Time Monitoring Service: HPA of the Prometheus agent

Last Updated: Jan 31, 2024

If a Prometheus agent does not have enough replicas, it can repeatedly run out of memory and restart, which may delay data or cause data loss. Managed Service for Prometheus provides the Horizontal Pod Autoscaling (HPA) feature, which automatically adjusts the number of agent replicas to match the data collection workload.

HPA policy

After a Prometheus agent starts, it captures targets to obtain the number of time series, and then calculates the required number of replicas based on the collection capability of each replica. If data collection requires more than one replica, the number of replicas is automatically increased. When the HPA feature is used, the following policy applies:

  • When a Prometheus agent runs as a single replica, the master replica both discovers and captures targets. If the memory usage of the master replica exceeds 75%, the agent automatically switches to multi-replica mode. The agent also switches to multi-replica mode if capturing too many targets at one time causes an out of memory (OOM) error.

  • When a Prometheus agent runs as multiple replicas, the master replica only discovers targets, and the worker replicas capture them. If the memory usage of a worker replica exceeds 60%, the capture tasks are reassigned and the number of worker replicas is automatically increased to the required number calculated by the system.

    Note

    Based on the algorithm for multi-factor collaborative scheduling, the following limits apply to each agent: the number of targets captured per round multiplied by the total number of metrics cannot exceed 4 billion, the maximum memory usage is 70%, and the maximum number of metrics captured is 4,000,000.
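The two memory thresholds in the policy above can be expressed as a small decision function. The following Python code is only an illustrative sketch of the documented rules, not the agent's actual implementation. The names AgentState and needs_scale_out are hypothetical, and memory usage is assumed to be a fraction between 0 and 1.

```python
from dataclasses import dataclass

# Thresholds taken from the policy described above.
SINGLE_REPLICA_MEMORY_THRESHOLD = 0.75  # master replica in single-replica mode
WORKER_MEMORY_THRESHOLD = 0.60          # worker replica in multi-replica mode


@dataclass
class AgentState:
    """Hypothetical snapshot of an agent; not part of any real API."""
    replicas: int               # number of replicas currently running
    memory_usage: float         # fraction 0..1 (master if single, worker if multi)
    oom_occurred: bool = False  # True if the replica hit an OOM error


def needs_scale_out(state: AgentState) -> bool:
    """Return True if the documented policy would add replicas."""
    if state.replicas <= 1:
        # Single-replica mode: an OOM error or memory above 75% triggers
        # the switch to multi-replica mode.
        return state.oom_occurred or state.memory_usage > SINGLE_REPLICA_MEMORY_THRESHOLD
    # Multi-replica mode: worker memory above 60% triggers reassignment
    # of capture tasks and more worker replicas.
    return state.memory_usage > WORKER_MEMORY_THRESHOLD


print(needs_scale_out(AgentState(replicas=1, memory_usage=0.80)))  # True
print(needs_scale_out(AgentState(replicas=3, memory_usage=0.55)))  # False
```

The sketch covers only the trigger condition. The number of replicas to add is calculated by the system and is not modeled here.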

Enable the HPA feature

After you upgrade the Helm version to 1.0.0 or later, the HPA feature is automatically enabled for the Prometheus agent. For more information, see Upgrade the component version.

Adjust the number of replicas

Automatic scale-out does not increase the number of Prometheus agent replicas without limit. By default, the maximum number of agent replicas is 30. The Prometheus agent does not support automatic scale-in because scale-in may cause data loss. To adjust the number of agent replicas, perform the following steps:

  1. Log on to the Application Real-Time Monitoring Service (ARMS) console. On the Instances page, click the name of the Prometheus instance.

  2. In the left-side navigation pane, click Settings. On the page that appears, click the Settings tab. Click Replicas in the Actions column. In the dialog box that appears, specify the number of agent replicas and click OK.
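The scale-out limits described at the start of this section (a default maximum of 30 replicas and no automatic scale-in) can be sketched as a clamping rule. The following Python code is a minimal illustration under those two assumptions only. The function name next_replica_count and its parameters are hypothetical and do not correspond to any ARMS API.

```python
DEFAULT_MAX_REPLICAS = 30  # default upper limit stated in this topic


def next_replica_count(current: int, required: int,
                       max_replicas: int = DEFAULT_MAX_REPLICAS) -> int:
    """Return the replica count after one automatic scaling decision.

    current  -- replicas running now
    required -- replicas the system calculated as necessary
    """
    # Automatic scale-out never exceeds the configured maximum.
    capped = min(required, max_replicas)
    # Automatic scale-in is not supported, so the count never decreases.
    return max(current, capped)


print(next_replica_count(current=2, required=5))    # 5
print(next_replica_count(current=2, required=50))   # 30
print(next_replica_count(current=10, required=4))   # 10
```

In this model, the replica count can be lowered only by a manual adjustment, such as the console steps above.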

Verify the result

After you adjust the number of agent replicas, you can use the self-monitoring dashboard of the Prometheus agent to check whether the number of replicas has changed and whether monitoring works as expected. Perform the following steps:

  1. Log on to the ARMS console. On the Instances page, click the name of the Prometheus instance.

  2. In the left-side navigation pane, click Dashboards. Click the Prometheus Agent dashboard. On the page that appears, you can view the running status of the Prometheus agent, time consumed to capture real-time and historical metrics, number of targets captured, amount of data sent, and resource usage. For more information, see Self-monitoring dashboard of the Prometheus agent.

References

For information about how to view the status and number of agents, see Configure a Prometheus agent.