Prometheus agent release notes - Managed Service for Prometheus

2025

Prometheus agent

Version	Release date	Changes
v1.1.37	January 2026	Optimized the Endpoint rotation mechanism of o11y-addon-controller for scenarios such as Prometheus instance version upgrades. Added basic metrics: `cluster_pv_detail_num_total`, `cluster_pvc_detail_num_total`, `node_volume_capacity_bytes_total`, `node_volume_capacity_bytes_used`, and `node_volume_capacity_bytes_available`.
v1.1.36	December 2025	Upgraded gpu-exporter to v3.3.9-1.12.16-2.6.6.1-4-5b00472-aliyun to support L20A metrics.
v1.1.35	December 2025	Reduced ClusterRole permissions (release announcement). The managed agent supports EndpointSlice service discovery. Fixed an issue where removing global labels did not take effect. The managed agent supports in-cluster service access via domain names, including health checks. Public and internal Endpoints for API requests and data transmission can be configured with separate switches.
v1.1.34	December 2025	Fixed a security issue: Switched metadata to a secure, authenticated calling method.
v1.1.33	August 2025	Chore: Removed all pprof dependencies. Upgraded GPU Exporter to v3.3.9-1.12.16-2.6.6.1-2-473e40f-aliyun to be compatible with earlier driver versions.
v1.1.32	June 2025	Updated the GPU Exporter image version. Added self-monitoring metrics for the internal queue. The remote write timeout is now determined by the remote write configuration. The default value is 30s.
v1.1.31	Phased release starting from March 2025	Optimized large-scale target scheduling for faster task allocation. By default, service discovery is performed only for pods in the Running state. Optimized service discovery to reduce memory usage. Optimized log output to reduce duplicate logs. Added support for metrics related to APIServer API Priority and Fairness (APF).
v1.1.30	March 2025	Optimized the leader election logic between replicas in a multi-replica runtime. Fixed an issue where plaintext secrets were parsed incorrectly in some cases. Fixed an issue where the last collection task could not be stopped correctly when all collection configurations were deleted. Optimized the collection method for virtual-kubelet nodes to return only the metrics of the current node. Adjusted the GPU Exporter collection configuration to add the GPU Exporter pod's own information to labels that start with `source_` to avoid conflicts with labels in the timeline. Added a retry mechanism to prevent token refresh failures.
v1.1.27	January 2025	Adjusted the scheduling settings for workloads in edge clusters. Applied security hardening to some collection jobs in edge clusters. Adopted a more compatible cAdvisor service discovery mode to support container clusters with versions earlier than 1.20.0.
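
For orientation, the EndpointSlice service discovery added in v1.1.35 and the remote write timeout described for v1.1.32 can be sketched in open source Prometheus configuration syntax. This is only an illustration: it assumes the managed agent accepts upstream-style `kubernetes_sd_configs` and `remote_write` fields, which these notes do not confirm, and the job name and URL are placeholders.

```yaml
# Hypothetical fragment in upstream Prometheus syntax.
# The job name and URL are placeholders, not values from these release notes.
scrape_configs:
  - job_name: example-endpointslice        # placeholder job name
    kubernetes_sd_configs:
      - role: endpointslice                # discover targets from EndpointSlice objects
remote_write:
  - url: https://remote-write.example.invalid/api/v1/write   # placeholder endpoint
    remote_timeout: 30s                    # matches the default noted for v1.1.32
```

If no `remote_timeout` is set in the remote write configuration, the 30s default noted above would apply.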

2024

Prometheus agent

Version	Release date	Scope	Changes
v1.1.25	October 2024	Container environment	Added support for some new metrics from Node Exporter and Kube State Metrics. Added support for service discovery for Ingress v1. Adapted the cAdvisor data collection feature for Virtual Kubelet nodes. Added compatibility for exemplar format timelines in the OpenMetrics protocol. Fixed an issue where metric labels were not arranged in lexicographic order in some scenarios. Fixed an issue where collection configurations failed to update correctly in some scenarios. Resolved an issue where data was not collected correctly when different ServiceMonitors were configured for the same collection target.
v1.1.22	September 2024	Container environment	Added support for some new basic metrics from Node Exporter and Kube-state-metrics (KSM). Removed the `/aliyun` page on port 9335 of the arms-prom-admin service in the arms-prom namespace to meet security compliance requirements.
v1.1.20	May 2024	Container environment	[Collection] Fixed an issue where built-in collection jobs could not be custom-overwritten. [Collection] Added the `aliyun_prometheus_agent_hpa_max_limit` self-monitoring metric for the maximum number of replicas. [Collection] Improved runtime support for VPC-hosted scenarios. [Collection] Added a feature switch to enable metric reporting over HTTPS. [Collection] Added support for adaptive metric collection in ASM mTLS environments. [Collection] Fixed an issue where the metric preview URL failed if it contained invalid characters. [Collection] Fixed an issue where the program would stop working if a collection configuration tried to load a non-existent local CA certificate. [Collection] Added support for pushing self-monitoring metrics to regions such as Saudi Arabia. [Collection] Added a node name label to the metrics of the built-in Node Exporter collection job. [Collection] Disabled the registration capability for Prometheus storage instances. [Collection] Enabled metric bucketing convergence to work in multi-replica mode. [Management] A separate component now provides the Prometheus instance registration capability. The registration mechanism of the collection component is disabled by default. [Management] Provided the ability to install and uninstall the observability Integration Center component. [Management] Added support for enabling Container Monitoring Pro Edition. [Kube-State-Metrics] Upgraded the AutoScaling API to v2. [Kube-State-Metrics] Upgraded the CronJob and PodDisruptionBudget API versions to v1. [Kube-State-Metrics] Adjusted security policies.
v1.1.20	May 2024	Alibaba Cloud services	Reduced incremental metric latency to second-level in large-scale collection scenarios. Reduced metric collection activation time for newly integrated Alibaba Cloud services from minutes to seconds. Added the ability to inject custom-selected Alibaba Cloud service tags into metrics. Due to architectural adjustments, self-monitoring metrics related to the original Prometheus Agent are no longer delivered to user instances. These metrics are free of charge. If your alerts depend on self-monitoring metrics for non-Alibaba Cloud services, such as metrics that start with `aliyun_arms`, remove these dependencies before you upgrade. The `arms_instance_id` and `arms_instance_name` metrics, which existed in some earlier instance versions, are deprecated in this version. Due to architectural adjustments, querying the list of targets is no longer supported.
v1.1.19	March 2024	Container environment	Reduced initial metric collection latency for large-scale clusters. Optimized service discovery to reduce the impact of configuration changes across collection jobs. Added self-monitoring metrics for detecting data gaps from collection exceptions. Added flexible metric whitelist filtering configurations. Fixed a batch of collection exceptions that occurred in edge cases.

2023

Helm version number	Agent image version number	Changes	Release date	Impact of changes
v1.1.18	registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v4.0.0	Adjusted the resource requests and limits for Node Exporter, GPU Exporter, and other resources. The Node Exporter port number is now configurable. The default value remains 9100.	December 2023	This upgrade does not affect your services.
v1.1.17	registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v4.0.0	Added a cluster event collection task to support the Kubernetes deployment dashboard. Added self-monitoring metric instrumentation based on Service-Level Agreements (SLAs) for the SLA stability dashboard. Added support for BasicAuth authentication in ServiceMonitor. The secret must be in the same namespace as the ServiceMonitor. Added the Metrics Metadata feature to display the meaning of specific metrics. Added support for passing the Agent Chart version to the server-side. The server-side uses this version number to initialize or upgrade dashboards. Added a self-monitoring metric for RemoteWrite to count the time taken to send each data batch. Added self-monitoring metrics for basic metric collection errors and latency. Added self-monitoring metrics for business metric collection errors and latency. Optimized the default RemoteWrite `queue_config` parameters to `min_shards=10`, `max_samples_per_send=5000`, and `capacity=10000` to improve adaptability for large-scale clusters. Optimized the service discovery method for Container Storage Interface (CSI) collection jobs, mainly related to PersistentVolume (PV) collection. Optimized the `senderLoop` delivery frequency and modified the `syncWorkersSeries` frequency to reduce unnecessary disturbances. Streamlined some logs and optimized others to show more detailed time consumption in the scrape chain. Optimized basic metric collection jobs to have their own fixed collection period and timeout settings, instead of using the global configuration. This reduces unnecessary interference with basic metric collection. Optimized the interaction logic in Master-Slave multi-replica mode. The Master and workers, and workers among themselves, no longer affect each other, which improves stability. Optimized the Master's target delivery policy, saving about 30% of CPU and 40% of memory resource overhead and improving collection performance. Optimized `metrics_relabel` to reduce CPU usage by 70%. Optimized the Informer listener logic in multi-tenant scenarios, saving about 20% of CPU overhead. Optimized handling of occasional CoreDNS domain name resolution failures. The system now automatically switches to and uses a cached IP address, reducing reliance on real-time CoreDNS resolution and improving data sending stability. Optimized the logic for delivering collection configurations via `SendConfig` to improve delivery stability. Optimized the Master's pre-scraping strategy to save Master resources and improve its service discovery and target scheduling capabilities. Added adaptive handling for large single batches greater than 1 MB to reduce packet loss caused by backend limitations. Fixed an issue in `ScrapeLoop` where some collection targets could not be stopped, causing duplicate collection. Fixed an issue in multi-tenant scenarios where untimely updates to the pod label cache caused a single timeline to split into two. Fixed an issue where the Master occasionally failed to deliver targets to out-of-memory (OOM) or restarted replicas, causing some collection targets to be lost. Fixed issues with parsing secret types and transmitting headers in RemoteWrite. Fixed an issue where the `kubernetes-pods` shutdown operation occasionally failed to take effect. Fixed an issue where default global parameters and `external_labels` did not take effect. Custom modifications are now supported.	August 2023	This upgrade does not affect your services.
v1.1.15	registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v4.0.0	Adapted for Container Service for Kubernetes (ACK) v1.26 cluster versions.	May 2023	This upgrade does not affect your services.
v1.1.14	registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v4.0.0	Optimized resource consumption by about 30% for memory and 50% for CPU to improve collection capabilities. Further reduced the dependency on CoreDNS domain name resolution to improve data sending stability. ServiceMonitor supports Basic Authentication. Fixed an issue with parsing secret types in RemoteWrite. Added three self-monitoring instrumentation points. Metrics Metadata now displays the meaning of metrics. Added a collection task to gather cluster event metrics. Added a Multi-Master mechanism to handle service discovery and target scheduling for extra-large-scale clusters. This feature is disabled by default. Fixed more than three bugs.	May 2023	This upgrade does not affect your services.
v1.1.13	registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v4.0.0 registry.{REGION}.aliyuncs.com/acs/gpu-prometheus-exporter:v2.3.6-994eaf7-aliyun	Upgraded GPU-Exporter to v2.3.6-994eaf7-aliyun. Added support for ACK One registered clusters.	April 2023	This upgrade does not affect your services.
v1.1.12	registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v3.2.1 registry.{REGION}.aliyuncs.com/acs/gpu-prometheus-exporter:v2.3.6-fdb40f2-aliyun	Upgraded GPU-Exporter to v2.3.6-fdb40f2-aliyun. Optimized the pull speed for component images.	February 2023	This upgrade does not affect your services.
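
The ServiceMonitor BasicAuth support listed for v1.1.17 and v1.1.14 requires the secret to be in the same namespace as the ServiceMonitor. A minimal sketch of that layout follows. It assumes the agent follows the open source Prometheus Operator `ServiceMonitor` schema, which these notes do not spell out; every name, label, and credential is a placeholder.

```yaml
# Hypothetical example: all names, labels, and credentials are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: example-basic-auth
  namespace: example-ns              # same namespace as the ServiceMonitor below
type: Opaque
stringData:
  username: example-user
  password: example-password
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: example-ns              # must match the Secret's namespace
spec:
  selector:
    matchLabels:
      app: example-app               # placeholder Service label
  endpoints:
    - port: metrics                  # placeholder named Service port
      basicAuth:
        username:
          name: example-basic-auth   # Secret name
          key: username              # key inside the Secret
        password:
          name: example-basic-auth
          key: password
```

Keeping both objects in one namespace lets the credential references resolve by Secret name and key alone.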

2022

Version number	Registry Address	Changes	Release date	Impact of changes
v1.1.11	registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v3.2.1	Added a service degradation feature to prioritize the stability of the main collection link when Remote Write fails. Added support for modifying the Global Config in collection jobs. Enhanced Remote Write. When CoreDNS fails to resolve a domain name, the system automatically switches to sending data using a pre-cached IP address. Added support for configuring multiple sending addresses.	December 2022	This upgrade does not affect your services.
v1.1.9	registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v3.2.0	The agent now supports multiple CPU architectures, including amd64, arm, arm64, ppc64le, and s390x. Enhanced the agent's self-monitoring capabilities. Optimized the agent's memory garbage collection policy. Optimized the target scheduling policy for multiple replicas to prevent memory leaks in workers. Fixed an issue with agent memory degradation. Fixed a deadlock issue that occurred under boundary conditions in multi-replica mode. Added four new service discovery capabilities: IonOS, PuppetDB, Uyuni, and Vultr.	September 2022	This upgrade does not affect your services.
v1.1.7	arms-prom-operator:v3.1.0 gpu-prometheus-exporter:v2.3.6-2.0.0-0c0440f	Added GPU-Exporter metrics and dashboards to support GPU monitoring for a cluster.	July 2022	This upgrade does not affect your services.
v1.1.6	arms-prom-operator:v3.1.0 gpu-prometheus-exporter:v1.0.1-26c5321	Fixed a data collection issue in GPU-Exporter v1.x.	June 2022	This upgrade does not affect your services.
v1.1.5	registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v3.1.0	Added support for the Integration Center. Added support for extra-large-scale clusters (>10,000 nodes). You can configure synchronization for ServiceMonitor and PodMonitor resources that are created outside the Managed Service for Prometheus console. Added support for configuring declarative service discovery for ServiceMonitors and PodMonitors that are not created in the Managed Service for Prometheus console. The upper limit for the number of agent Horizontal Pod Autoscaler (HPA) replicas is now configurable as a parameter. Added support for editing some fields of Prometheus basic metric jobs. Added support for online validation of configuration files for ServiceMonitor, PodMonitor, and Prometheus.yaml. Optimized CPU and memory resource usage and system stability.	May 2022	This upgrade does not affect your services.
v1.1.4		Applied security hardening to node-exporter. Fixed a volume mount issue in gpu-exporter.	April 2022	This upgrade does not affect your services.
v1.1.3		Added compatibility for v1.22 clusters.	February 2022	This upgrade does not affect your services.
v1.1.2		Upgraded kube-state-metrics to v2.3.0-755434c-aliyun.	January 2022	This upgrade does not affect your services.

2020

Helm version number	Agent image version number	Feature overview	Release date	Impact of changes
v0.1.5	arms-prom-operator:v0.1	Added support for Alibaba Cloud Container Service for Kubernetes v1.18 clusters. Added support for pulling images from internal network addresses in a region.	October 2020	This upgrade does not affect your services.
v0.1.4	arms-prom-operator:v0.1	Provided out-of-the-box Kubernetes container monitoring, including pod, node, and resource monitoring, mainly for the Kubernetes container runtime in which applications run. Provided UI-based monitoring for nine common components, such as MySQL, Redis, Kafka, ZooKeeper, and Nginx, mainly for the middleware that applications depend on. Provided a fully managed Prometheus system, including Prometheus.yaml collection rules, Grafana dashboards, and an alerting system, to support migration from self-managed Prometheus to Alibaba Cloud. Fixed an authentication access bug.	July 2020	This upgrade does not affect your services.
v0.1.3	arms-prom-operator:v0.1	Added resource limits for the agent.	April 2020	This upgrade does not affect your services.

2019

Helm version number	Agent image version number	Feature overview	Release date	Impact of changes
v0.1.2	arms-prom-operator:v0.1	Initial release.	August 2019	This upgrade does not affect your services.