Managed Service for Prometheus: Prometheus probe version release notes

Last Updated: Mar 21, 2025

This topic lists the release notes for each version of the Prometheus probe.

2025

Prometheus probe

Prometheus Probe Version | Release Date | Change Content

v1.1.31

Phased release starting from March 2025

  • Optimize the scheduling of large numbers of Targets to accelerate task assignment.

  • By default, service discovery is performed only for pods in the Running state.

  • Optimize service discovery to reduce memory usage.

  • Optimize log output to reduce duplicate logs.
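
The default of discovering only pods in the Running state can be pictured with a minimal sketch. It assumes pod objects shaped like the Kubernetes Pod API (metadata.name, status.phase). The running_pods helper and the sample pod names are hypothetical and do not represent the probe's actual implementation.

```python
# Hypothetical illustration: keep only pods whose phase is "Running".
# The dictionaries mirror the Kubernetes Pod API shape (metadata.name, status.phase).

def running_pods(pods):
    """Return the pods that would be eligible for service discovery."""
    return [p for p in pods if p.get("status", {}).get("phase") == "Running"]


pods = [
    {"metadata": {"name": "web-0"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "job-1"}, "status": {"phase": "Succeeded"}},
    {"metadata": {"name": "web-1"}, "status": {"phase": "Pending"}},
]

eligible = [p["metadata"]["name"] for p in running_pods(pods)]
print(eligible)
```

In this sketch, completed and pending pods are skipped, so they would not produce collection targets.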

v1.1.30

March 2025

  • Optimize the leader election logic among replicas during runtime.

  • Fix the issue of plaintext key parsing errors in some cases.

  • Fix the issue where the last collection task cannot stop properly when all collection configurations are deleted.

  • Optimize the collection method for virtual-kubelet nodes so that only metrics of the current node are returned.

  • Adjust the GPU Exporter collection configuration so that GPU Exporter pod information is added to labels prefixed with source_, avoiding conflicts with the labels of existing time series.

  • Add retries to token refresh to avoid refresh failures.

v1.1.27

January 2025

  • Adjust scheduling settings for edge cluster workloads.

  • Enhance security for some collection jobs in edge clusters.

  • Adopt a more compatible mode for cAdvisor service discovery to support container clusters below version 1.20.0.

2024

Prometheus probe

Prometheus Probe Version Number | Release Date | Collection Metric Content | Change Content

v1.1.25

October 2024

Container Environment

  • Add support for some new metrics of Node Exporter and Kube State Metrics.

  • Support service discovery for Ingress v1.

  • Adapt cAdvisor data collection for Virtual Kubelet nodes.

  • Add compatibility for time series in the OpenMetrics exemplar format.

  • Fix the issue where metric labels are not sorted in lexicographic order in some scenarios.

  • Fix the issue where collection configurations are not updated properly in some scenarios.

  • Resolve the issue where the same collection target is not collected correctly when different Service Monitor configurations are used.
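
As an illustration of the label-ordering fix above, the following minimal sketch shows why sorting labels lexicographically by name matters: two label sets with the same content but different insertion order should map to the same series identity. The canonical_series helper and the sample metric are hypothetical and are not taken from the probe's code.

```python
# Hypothetical illustration: render a series identity with labels sorted by name.

def canonical_series(name, labels):
    """Return 'name{k1="v1",k2="v2"}' with labels in lexicographic key order."""
    ordered = sorted(labels.items())  # label names are unique, so this sorts by name
    body = ",".join(f'{k}="{v}"' for k, v in ordered)
    return f"{name}{{{body}}}"


a = canonical_series("http_requests_total", {"pod": "web-0", "job": "api"})
b = canonical_series("http_requests_total", {"job": "api", "pod": "web-0"})
print(a)
print(a == b)
```

Without a stable order, the same labels could be treated as two different series downstream.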

v1.1.22

September 2024

Container Environment

  • Add support for some basic metrics of Node Exporter and Kube State Metrics (KSM).

  • Remove the /aliyun page on port 9335 of the arms-prom-admin service in the arms-prom namespace to meet security compliance requirements.

v1.1.20

May 2024

Container Environment

  • [Collection] Fix the issue where built-in collection jobs cannot be customized and overwritten.

  • [Collection] Add self-monitoring metric aliyun_prometheus_agent_hpa_max_limit for the maximum number of replicas.

  • [Collection] Improve support for VPC hosting scenarios.

  • [Collection] Support reporting metrics over HTTPS, enabled through a feature switch.

  • [Collection] Support adaptive metric collection in ASM mTLS environments.

  • [Collection] Fix the issue where the metric preview URL fails due to abnormal characters.

  • [Collection] Fix the issue where the program fails when a collection configuration references a local CA certificate that does not exist.

  • [Collection] Add self-monitoring metric push for regions such as Alibaba Gov Cloud, Alibaba Finance Cloud, and Saudi Arabia.

  • [Collection] Add node name tags to Node Exporter metrics in built-in collection jobs.

  • [Collection] Disable the registration capability of Prometheus storage instances.

  • [Collection] Support bucketing metric convergence in multi-replica mode.

  • [Control] Provide independent components for Prometheus instance registration capability. The registration mechanism of collection components is disabled by default.

  • [Control] Provide installation and uninstallation capabilities for the observability access center component.

  • [Control] Support enabling Container Monitoring Pro Edition.

  • [Kube-State-Metrics] Upgrade AutoScaling API to v2.

  • [Kube-State-Metrics] Upgrade CronJob and PodDisruptionBudget API versions to v1.

  • [Kube-State-Metrics] Adjust security policies.

Alibaba Cloud Service

  • Provide more timely data processing capabilities. In large-scale data collection scenarios, the metric delay increment is reduced to seconds.

  • When a new cloud service is connected, metric collection now takes effect within seconds instead of minutes.

  • Add the ability to inject user-selected cloud service tags into metrics.

  • Due to architectural adjustments, the original self-monitoring metrics related to Prometheus Agent are no longer delivered to user instances (these metrics are free of charge). If your alerts rely on self-monitoring metrics that do not come from cloud services (such as metrics starting with aliyun_arms), remove the dependency on these metrics before upgrading.

  • Some old version instances contain arms_instance_id and arms_instance_name in metrics, which are deprecated in this version.

  • Due to architectural adjustments, the Targets list query is no longer provided.

v1.1.19

March 2024

Container Environment

  • Reduce the delay of metric collection when a large-scale cluster is connected for the first time.

  • Optimize service discovery mode to reduce the impact of different collection job configuration changes.

  • Enrich self-monitoring metrics to identify data incompleteness issues caused by collection anomalies.

  • Support more flexible configuration of metric whitelists for trimming collected metrics.

  • Fix a batch of issues related to collection anomalies in edge cases.

2023

Helm Version Number | Agent Image Version Number | Change Content | Release Date | Change Impact

v1.1.18

registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v4.0.0

  • Adjust the resource requests and limits of components such as Node Exporter and GPU Exporter.

  • Support configuring the Node Exporter port number. The default value remains 9100.

December 2023

This update has no negative impact on workloads.

v1.1.17

registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v4.0.0

  • Add cluster event collection tasks to support Kubernetes deployment dashboards.

  • Add SLA-based self-monitoring metric instrumentation to supply data for the SLA stability dashboard.

  • Add ServiceMonitor support for BasicAuth authentication. The Secret must be in the same namespace as the ServiceMonitor.

  • Add Metrics Metadata capability to display specific metric meanings.

  • Add support for passing Agent Chart version to the server. The server initializes or upgrades the dashboard based on the version number.

  • Add RemoteWrite self-monitoring metrics to calculate the time consumed to send data in each batch.

  • Add self-monitoring metrics for basic metric collection errors and collection delays.

  • Add self-monitoring metrics for business metric collection errors and delays.

  • Optimize RemoteWrite default parameter queue_config settings to min_shards=10, max_samples_per_send=5000, capacity=10000 to enhance large-scale cluster adaptability.

  • Optimize the service discovery method of the CSI collection job, mainly for PV-related collection.

  • Optimize senderLoop distribution frequency and modify syncWorkersSeries frequency to reduce unnecessary disturbances.

  • Simplify some logs and show collection pipeline latency in more detail in others.

  • Use fixed collection interval and collection timeout settings for basic metric collection jobs instead of the Global configuration, to reduce unnecessary interference with basic metric collection.

  • Optimize the multi-replica Master-Worker mode so that the Master and Workers, and Workers among themselves, no longer affect each other, improving stability.

  • Optimize Master distribution Targets strategy to save approximately 30% of CPU and 40% of memory resource overhead, improving collection performance.

  • Optimize metrics_relabel to reduce CPU usage by 70%.

  • Optimize informer listening logic in multitenancy scenarios to save approximately 20% of CPU overhead in multitenancy scenarios.

  • Handle occasional CoreDNS domain name resolution failures by automatically switching to a cached IP address and continuing to use it. This weakens the dependency on real-time CoreDNS resolution and improves data sending stability.

  • Optimize SendConfig distribution collection configuration logic to improve distribution stability.

  • Optimize Master pre-fetch strategy to save Master resource overhead and improve Master service discovery and Targets scheduling capabilities.

  • Automatically split single batches larger than 1 MB to reduce data loss caused by backend size restrictions.

  • Fix the issue where some collection targets cannot stop in ScrapeLoop, causing duplicate collection.

  • Fix the issue where the label cache of pods in multitenancy scenarios is not updated in time, causing one time series to split into two.

  • Fix the issue where the Master occasionally assigns Targets incorrectly to replicas that ran out of memory (OOM) or restarted, causing some collection Targets to be lost.

  • Fix the issue of parsing Secret type and transmission Header in RemoteWrite.

  • Fix the issue where Kubernetes-pods shutdown operation occasionally does not take effect.

  • Fix the issue where Global default parameters and external_labels do not take effect, and support custom modification.
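
To make the RemoteWrite queue_config defaults listed above more concrete, the following minimal sketch works out what they imply for buffering. It assumes the agent follows the upstream Prometheus remote_write semantics, where capacity is counted per shard and max_samples_per_send is the batch size; the release note does not confirm this, so treat the derived numbers as illustrative.

```python
# Defaults quoted in the v1.1.17 release note.
queue_config = {
    "min_shards": 10,
    "max_samples_per_send": 5000,
    "capacity": 10000,
}

# Assumption: as in upstream Prometheus, capacity is the per-shard buffer size.
min_buffered_samples = queue_config["min_shards"] * queue_config["capacity"]

# Number of full batches one shard can hold before it blocks.
batches_per_shard = queue_config["capacity"] // queue_config["max_samples_per_send"]

print(min_buffered_samples)  # 100000
print(batches_per_shard)     # 2
```

Under these assumptions, even at the minimum shard count the agent can buffer 100,000 samples, with room for two full batches per shard.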

August 2023

This update has no negative impact on workloads.

v1.1.15

registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v4.0.0

The Helm chart is compatible with Container Service for Kubernetes (ACK) clusters that run Kubernetes 1.26.

May 2023

This update has no negative impact on workloads.

v1.1.14

registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v4.0.0

  • Optimize memory consumption by approximately 30% and CPU consumption by approximately 50% to enhance collection capability.

  • Further reduce dependency on CoreDNS domain name resolution to improve data sending stability.

  • ServiceMonitor supports BasicAuth authentication.

  • Fix the issue of parsing Secret type in RemoteWrite.

  • Add three self-monitoring instrumentation points.

  • Metrics Metadata displays metric meanings.

  • Add collection tasks for collecting cluster event metrics.

  • Add a multi-Master mechanism to handle service discovery and Targets scheduling for ultra-large-scale clusters. Disabled by default.

  • Fix more than three bugs.

This update has no negative impact on workloads.

v1.1.13

  • registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v4.0.0

  • registry.{REGION}.aliyuncs.com/acs/gpu-prometheus-exporter:v2.3.6-994eaf7-aliyun

  • Upgrade GPU Exporter to v2.3.6-994eaf7-aliyun.

  • Support ACK One registered cluster.

April 2023

This update has no negative impact on workloads.

v1.1.12

  • registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v3.2.1

  • registry.{REGION}.aliyuncs.com/acs/gpu-prometheus-exporter:v2.3.6-fdb40f2-aliyun

  • Upgrade GPU Exporter to v2.3.6-fdb40f2-aliyun.

  • Image pulling is accelerated.

February 2023

This update has no negative impact on workloads.

2022

Version Number | Image Address | Change Content | Release Date | Change Impact

v1.1.11

  • registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v3.2.1

  • Add service degradation feature to prioritize main link collection stability when Remote Write fails.

  • Support modification of Global Config in collection jobs.

  • Enhance Remote Write. When CoreDNS fails to resolve a domain name, automatically switch to sending with a pre-cached IP.

  • Remote Write supports configuring multiple sending addresses.

December 2022

This update has no negative impact on workloads.

v1.1.9

  • registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v3.2.0

  • Agent supports multiple CPU architectures, including AMD64, ARM, ARM64, ppc64le, and s390x.

  • Enhance Agent self-monitoring capability.

  • Optimize Agent memory garbage collection strategy.

  • Optimize multi-replica Target scheduling policy to avoid Worker memory leaks.

  • Fix Agent memory degradation issue.

  • Fix deadlock issue in boundary conditions under multi-replica state.

  • Add four service discovery capabilities, including IONOS, PuppetDB, Uyuni, and Vultr.

September 2022

This update has no negative impact on workloads.

v1.1.7

  • arms-prom-operator:v3.1.0

  • gpu-prometheus-exporter:v2.3.6-2.0.0-0c0440f

Support the latest GPU Exporter metrics and dashboards. For more information, see the referenced document.

July 2022

This update has no negative impact on workloads.

v1.1.6

  • arms-prom-operator:v3.1.0

  • gpu-prometheus-exporter:v1.0.1-26c5321

Fix the data collection issue of GPU Exporter v1.x.

June 2022

This update has no negative impact on workloads.

v1.1.5

  • registry.{REGION}.aliyuncs.com/acs/arms-prometheus-agent:v3.1.0

  • Support integration center.

  • Support ultra-large-scale clusters (>10,000 nodes).

  • Support synchronization of ServiceMonitors and PodMonitors that are not created in the Managed Service for Prometheus console.

  • Support declarative service discovery configuration for ServiceMonitors and PodMonitors that are not created in the Managed Service for Prometheus console.

  • Support parameterized configuration of the maximum number of agent HPA replicas.

  • Support editing some fields of Prometheus basic metric jobs.

  • Support online verification of ServiceMonitor, PodMonitor, and Prometheus.yaml related configuration files.

  • Optimize CPU and memory resource usage and system stability.

May 2022

This update has no negative impact on workloads.

v1.1.4

  • The security of node-exporter is enhanced.

  • The issues that occur when you mount volumes to gpu-exporter are fixed.

April 2022

This update has no negative impact on workloads.

v1.1.3

This image version is compatible with ACK clusters that run Kubernetes 1.22.

February 2022

This update has no negative impact on workloads.

v1.1.2

Upgrade kube-state-metrics to version v2.3.0-755434c-aliyun.

January 2022

This update has no negative impact on workloads.

2020

Helm Version Number | Agent Image Version Number | Function Overview | Release Date | Change Impact

v0.1.5

arms-prom-operator:v0.1

  • Support ACK clusters that run Kubernetes 1.18.

  • Support pulling images from the internal network address of the region.

October 2020

This update has no negative impact on workloads.

v0.1.4

arms-prom-operator:v0.1

  • Out-of-the-box Kubernetes container monitoring, including pod monitoring, node monitoring, and resource monitoring, mainly used to monitor the container runtime environment of applications.

  • Console-based (GUI) component monitoring for nine common components such as MySQL, Redis, Kafka, ZooKeeper, and Nginx, mainly used to monitor the middleware that applications depend on.

  • A fully managed Prometheus system, including Prometheus.yaml collection rules, Grafana dashboards, and an alerting system, for migrating self-managed Prometheus deployments to Alibaba Cloud.

  • Fix an authentication access bug.

July 2020

This update has no negative impact on workloads.

v0.1.3

arms-prom-operator:v0.1

Add agent resource usage limits.

April 2020

This update has no negative impact on workloads.

2019

Helm Version Number | Agent Image Version Number | Function Overview | Release Date | Change Impact

v0.1.2

arms-prom-operator:v0.1

ack-arms-prometheus is released.

August 2019

This update has no negative impact on workloads.