Cloud Monitor: Release notes

Last Updated: Dec 18, 2025

This topic describes the release notes for the CloudMonitor agent.

4.0.0.1

Release date: 2025-10-17

New features

  1. GPU:

    • Adapted to NVIDIA driver versions 580 and later. In these driver versions, the `power_draw` parameter is renamed to `instant_power_draw`. Before this adaptation, the rename caused the metric to report no data.

  2. Introduced the pluggable feature. This feature allows tasks to run in separate processes to improve the overall stability of the agent and enable fine-grained resource allocation.

  3. Added support for Server Name Indication (SNI) when accessing HTTPS for agent versions 3.5.11 and later.
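
The GPU adaptation in item 1 can be pictured as a field-name fallback. The following is a minimal sketch, assuming the reading comes from an XML fragment shaped like the samples below. The fragment layout, the sample values, and the helper name `read_power_draw` are illustrative assumptions, not the agent's actual code.

```python
import xml.etree.ElementTree as ET

def read_power_draw(gpu_xml):
    """Return the power reading text from a GPU XML fragment, or None."""
    root = ET.fromstring(gpu_xml)
    # Prefer the newer element name, then fall back to the older one.
    for tag in ("instant_power_draw", "power_draw"):
        node = root.find(f".//{tag}")
        if node is not None and node.text:
            return node.text.strip()
    return None

# Hypothetical fragments: one with the older name, one with the newer name.
OLD_SAMPLE = "<gpu><readings><power_draw>61.20 W</power_draw></readings></gpu>"
NEW_SAMPLE = "<gpu><readings><instant_power_draw>61.20 W</instant_power_draw></readings></gpu>"

print(read_power_draw(OLD_SAMPLE))
print(read_power_draw(NEW_SAMPLE))
```

Checking both names lets the same collector work across driver versions instead of reporting no data when one name is absent.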

Optimizations

  1. Reduced the startup memory usage on Windows from over 150 MB to under 10 MB. This was achieved by optimizing the stacktrace library, which previously increased application memory usage by 1 MB.

  2. Optimized the PowerShell installation script. Fixed a warning that occurred when restoring the agent.properties file during installation on non-Alibaba Cloud ECS instances.

Fixed issues

  1. Fixed an issue where the agent failed to start when the PdhOpenQuery (win32) function failed.

  2. Fixed an issue on Alibaba Cloud Linux 4 where the agent failed because the /var/run directory does not exist. The path is now replaced with /var.

4.0.0

Release date: 2025-09-23

New features

  1. GPU:

    • Added support for AMD GPUs. Metrics are collected using amd-smi. You must install this tool manually because it is not included in the AMD driver package.

    • Added support for Iluvatar CoreX GPUs.

    • For NVIDIA GPUs with driver versions 535 or later, the `-pm 1` parameter is no longer called. Calling this parameter could cause unexpected GPU behavior.

  2. Added support for retrieving the serial numbers of NVMe disks.

  3. Added support for Structured Process Language (SPL) structured logs.

  4. Added a new metric:

    • sys_fs (for Linux only), located in the acs/host namespace.

  5. Added support for four new regions:

    • Mexico (na-south-1)

    • US (Atlanta) (us-southeast-1)

    • China (Ulanqab) Apsara Stack Enterprise (cn-wulanchabu-acdr-1)

    • China (Ulanqab) General Industry Cloud (cn-wulanchabu-gic-1)

  6. Upgraded the installation script to version 1.16:

    • Preserves previous configurations (agent.properties, agent.json, and accesskey.properties) during installation. This prevents configuration loss after reinstallation or an upgrade.

    • Deletes the downloaded installation package before exiting if no upgrade is needed because the version is the same.

    • The Argus agent is no longer installed on Cloud Phone by default. To install it, you must include the `ENABLE_WUYING=True` variable.

    • Added support for sovereign clouds.

    • Added support for security hardening-only mode.
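
The configuration-preserving behavior of installation script 1.16 can be sketched as a backup-and-restore step around the install. This is only an illustration under assumptions: the three files are assumed to sit directly in one directory, and `do_install` is a hypothetical callable that stands in for the real installer.

```python
import shutil
import tempfile
from pathlib import Path

# File names taken from the release note above.
PRESERVED = ("agent.properties", "agent.json", "accesskey.properties")

def reinstall_preserving_config(install_dir, do_install):
    """Run do_install(install_dir) and restore any existing config files."""
    install_dir = Path(install_dir)
    kept = []
    with tempfile.TemporaryDirectory() as backup:
        # Back up every preserved file that currently exists.
        for name in PRESERVED:
            src = install_dir / name
            if src.is_file():
                shutil.copy2(src, Path(backup) / name)
                kept.append(name)
        do_install(install_dir)  # hypothetical installer step; may overwrite files
        # Put the previous configuration back over the fresh defaults.
        for name in kept:
            shutil.copy2(Path(backup) / name, install_dir / name)
    return kept
```

Files that did not exist before the install are left as the installer wrote them, so a first-time installation is unaffected.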

Optimizations

  1. Added manufacturer information to the argusagent_service. This helps you manage whitelists.

  2. For Linux file system monitoring, the agent now attempts to read /proc/mounts if /etc/mtab does not exist.

  3. Enhanced security hardening. Added support for the security hardening-only mode for the ECS MetaServer.

  4. Improved the stability of ping operations.

  5. If the ping target is a domain name, the IP address is re-resolved each time a ping task starts. This prevents inaccurate ping results caused by expired IP addresses.

  6. For self-monitoring on Windows, the default threshold for the number of open files is set to 999,999. This metric fluctuates widely on Windows, making it difficult to set a reasonable threshold.

  7. Enabled NoProxy mode when accessing the MetaServer.

  8. Added the `main_ip` and `host_name` fields to the `acs_host/system.process.agent` data. These fields help distinguish between instances in case of an instanceId conflict caused by a MetaServer error. This lets you quickly locate the abnormal machine.
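
The mount-table fallback in item 2 can be sketched as follows. The function names are hypothetical, and the parser assumes the usual whitespace-separated layout in which the first three fields are device, mount point, and file system type.

```python
import os

def pick_mounts_file(candidates=("/etc/mtab", "/proc/mounts"), exists=os.path.exists):
    """Return the first candidate path that exists, or None."""
    for path in candidates:
        if exists(path):
            return path
    return None

def parse_mounts(text):
    """Parse mount-table lines into (device, mount_point, fs_type) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank, commented, or malformed lines.
        if len(fields) >= 3 and not fields[0].startswith("#"):
            entries.append((fields[0], fields[1], fields[2]))
    return entries

sample = "/dev/vda1 / ext4 rw,relatime 0 0\ntmpfs /run tmpfs rw,nosuid 0 0\n"
print(parse_mounts(sample))
```

Passing `exists` as a parameter keeps the selection logic testable on machines where neither file is present.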

Fixed issues

  1. Removed the `-pm 1` call for NVIDIA GPU driver versions 535 and later.

  2. Fixed an issue where the agent was stuck in the `starting...` state because it was blocked by the `docker ps` command. The agent now uses lazy loading: it skips detection of Pouch and Docker environments at startup and detects them only when needed. The `docker ps` call may still block features that depend on it, such as log collection, but it no longer affects the collection of basic metrics.

  3. All keywords from group process monitoring are converted to lowercase.

  4. Fixed a memory access violation issue on Windows.

  5. Added support for the new output format of Ascend GPUs.

  6. Added support for source ports and bidirectional probing for Telnet.

  7. Fixed an issue where using `std::ifstream` to read a non-existent file on some Linux distributions caused a SIGABRT signal, leading to an abnormal end of the agent.
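
The lazy loading described in item 2 can be illustrated with a small wrapper that postpones a slow probe until a feature first asks for it. The class name and the `probe` callable are hypothetical. In a real agent the probe might run a command such as `docker ps`, which is why the first caller can still block.

```python
class ContainerRuntime:
    """Defers a possibly slow container probe until first use."""

    def __init__(self, probe):
        self._probe = probe   # callable returning True or False; may block
        self._result = None   # None means "not probed yet"

    def available(self):
        """Probe on the first call only, then reuse the cached answer."""
        if self._result is None:
            self._result = bool(self._probe())
        return self._result

calls = []

def fake_probe():
    calls.append("probe")
    return True

runtime = ContainerRuntime(fake_probe)  # construction does not probe
print(len(calls))
print(runtime.available(), runtime.available(), len(calls))
```

Because construction never invokes the probe, startup and basic metric collection proceed even when the container runtime is unresponsive.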

Important

As of version 4.0.0, 32-bit Linux is no longer supported.

3.5.12

Release date: 2024-10-09

New features

  • Added support for new regions:

    • Hangzhou Apsara Stack Enterprise KS01.

    • China (Chengdu) Ant Financial Cloud (cn-chengdu-ant).

  • Removed the India (Mumbai) (ap-south-1) region.

  • Changed the task scheduling clock from the system clock to a monotonic clock. This prevents timing issues caused by system time adjustments.

  • Added the `argusagent tool curl` command. This command probes a target address and prints the request and response exchange to help with on-site troubleshooting.

    /usr/local/cloudmonitor/bin/argusagent tool curl --help
    
    Usage: argusagent tool curl [options] url
    Allowed options:
      -h [ --help ]               Print this help message
      -X [ --request ] arg (=GET) Specifies a custom request method to use.
      --url arg                   Target url.
      -d [ --data ] arg           Only for POST, http body.
      -H [ --header ] arg         Extra header to use.
      -m [ --max-time ] arg (=30) Maximum time in seconds that you allow the whole operation to take.
      -x [ --proxy ] arg          Use the specified proxy, format: [protocol://]host[:port].
      --proxy-user arg            Specify the user name to use for proxy authentication.
      --proxy-pass arg            Specify the password to use for proxy authentication.
      --proxy-http2               Negotiate HTTP version 2 with an HTTPS proxy. The proxy might still only offer HTTP/1 and
                                  then curl sticks to using that version. This has no effect for any other kinds of
                                  proxies.
      --json arg                  Json object config, this will ignore all other options.
      --json-file arg             Json object config file, this will ignore all other options.
      --task-id arg               Detect once of http task with taskId

  • Added the `-e GetTopTasks` parameter. This lets you dynamically view the time consumption of the top 20 tasks at runtime.

    Linux

    # The path on CoreOS is /opt/cloudmonitor/bin/argusagent.
    /usr/local/cloudmonitor/bin/argusagent -e GetTopTasks

    Windows

    "C:\Program Files\Alibaba\cloudmonitor\bin\argusagent.exe" -e GetTopTasks

  • Added proxy information to logs for heartbeats and metric reporting. This clarifies that the agent is not necessarily using the public network.

  • Added tagging support for process monitoring.

  • Added a self-monitoring metric: the agent automatically restarts if no basic metrics are collected for two consecutive minutes.

  • GPU: Added support for Ascend and Hygon GPUs on Linux.

  • Upgraded the installation script to version 1.13.

    • Lowered the script dependency from bash to sh. This improves compatibility and allows the agent to be installed on a wider range of systems, such as Android.

    • Added support for custom proxies during manual installation. Previously, custom proxies could only be used to download the installation script.

    • Added a self-check after download. The old version is uninstalled only if the new installation package is valid. This prevents failures caused by incomplete downloads (empty packages) from tools like wget.

    • Optimized installation logs to make them more readable and easier to use for troubleshooting.

    • Added support for installing from a local package file (`-packageFile`) on Windows. This skips the download process.

    • Fixed a bug that occurred when using a proxy to install the agent on a third-party host. The bug caused the installation package to be downloaded again without the proxy after a successful initial download.

    • Added support for Cloud Phone. This feature is intended for the Cloud Phone product, not for end users.
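
The move from the system clock to a monotonic clock for task scheduling, listed among the new features above, can be sketched in a few lines. The `IntervalTask` class is a hypothetical illustration, not the agent's scheduler. It relies on `time.monotonic`, which does not jump when the system time is adjusted.

```python
import time

class IntervalTask:
    """Runs on a fixed interval measured with a monotonic clock."""

    def __init__(self, interval_s, clock=time.monotonic):
        self._interval = interval_s
        self._clock = clock
        self._next_due = clock() + interval_s

    def due(self):
        """Return True once per elapsed interval and schedule the next run."""
        now = self._clock()
        if now < self._next_due:
            return False
        # Advance from the previous deadline so small delays do not accumulate.
        self._next_due += self._interval
        return True

task = IntervalTask(15)
print(task.due())  # the first interval has not elapsed yet
```

A scheduler based on wall-clock time could skip or repeat runs after an NTP correction or a manual time change. Deadlines measured on a monotonic clock are not affected by such adjustments.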

Fixed issues

  • Fixed an issue where the argusagent service was not started correctly as a service during installation.

  • Fixed an issue where availability monitoring did not support multiple headers.

  • Fixed an issue on Linux where the `hostname -i` command returned multiple IP addresses.

  • Fixed a compatibility issue that prevented the use of both hosts and URIs when creating a Telnet task through an API.

  • Fixed an issue in non-ECS mode where the `accesskey.properties` file in the `bin` directory was not recognized.

  • Fixed an issue on Linux where `udevadm` was called repeatedly if a disk did not have a serial number.

  • Fixed an issue where the HTTPS/2 proxy setting did not take effect.

  • Fixed uneven task scheduling for ping tasks when the packet loss rate was high. The issue was caused by an incorrect timing assumption in the three-timer algorithm. The logic is now event-driven, triggered by both packet receipt and timeouts.

  • Fixed an issue where old tasks were occasionally not cleared when an availability monitoring task was updated.

  • Fixed an occasional SIGSEGV error that occurred when updating availability monitoring tasks.

  • Fixed an issue on Windows where memory metrics could not be obtained if performance counter data was abnormal or missing.

  • Fixed a memory leak on Windows. The value returned by `CommandLineToArgvW` is now correctly released using the `LocalFree` function instead of `GlobalFree`.

3.5.11

Release date: 2024-03-25

New features

  • Added official support for Windows x64.

  • Added support for IPv6.

  • Added support for HTTP/2.

  • Added support for macOS and FreeBSD, based on the Sigar library.

  • Expanded proxy support to seven protocols: HTTP, HTTPS, HTTPS/2, SOCKS4, SOCKS4A, SOCKS5, and SOCKS5h.

  • Added the following GPU enhancements:

    • Added support for C:\Windows\System32\nvidia-smi.exe.

    • You can now install a GPU while the agent is running. This means the GPU can be installed after the agent.

    • Added support for GPU data collection based on the dynamic-link library (libnvml). This makes data collection more secure and faster.

      Note

      You must manually enable the dynamic-link library by setting nvidia.nvml.enabled=true. This avoids potential freezes that can occur when enabling it by command on some systems.

  • The effective time for availability probing now supports cron expressions.

  • Prometheus data collection now supports authentication using HTTP headers.

  • Added support for the following regions:

    • cn-wuhan-lr: Wuhan Local Region.

    • cn-qingdao-acdr-ut-1: Qingdao Haier Dedicated Cloud.

  • Removed the following four metrics:

    • system.udp

    • system.task: the number of system processes or threads

    • memory.swap: the swap partition on Linux

    • system.cpuCore: metrics for each CPU core
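
Header-based authentication for Prometheus data collection, listed among the new features above, amounts to attaching credentials to the scrape request. The sketch below shows the general idea from the client side. The endpoint address and token are placeholders, and the way the agent itself is configured with these headers is not shown here.

```python
import urllib.request

def build_scrape_request(url, headers):
    """Build a GET request for a metrics endpoint with extra HTTP headers."""
    return urllib.request.Request(url, headers=dict(headers), method="GET")

req = build_scrape_request(
    "http://127.0.0.1:9100/metrics",             # hypothetical exporter address
    {"Authorization": "Bearer EXAMPLE_TOKEN"},   # placeholder credential
)
print(req.get_method(), req.full_url)
```

The request could then be sent with `urllib.request.urlopen(req, timeout=5)` against an endpoint that expects this header.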

Fixed issues

  • Fixed an issue on Windows where only 4 GB of memory usage was reported for a process even if its memory usage exceeded 4 GB.

  • Fixed an issue where domain name resolution would hang for more than 20 seconds on some systems.

  • Fixed an issue where some Prometheus metrics failed to parse.

  • Fixed an issue where log collection consumed high CPU resources.

  • Fixed an issue in availability monitoring where a failed ping would stop subsequent probes.

  • Fixed an issue where the host serial number could contain a line feed character.

  • Fixed an issue where running multiple probes for the same Telnet task could cause the ArgusAgent to crash.

  • Fixed an issue with non-standard SOCKS5 support.

  • Fixed an issue on Windows where the WMIC tool could not be found.

  • Fixed an issue where the agent failed to start because std::locale("") was not supported.

  • Fixed multiple potential memory leaks.

  • Fixed a SIGSEGV error caused by the invalidation of `localTimeCache` when the main function exits.

    Note

    Before this fix, a core dump file was generated during a normal exit.

Performance optimizations

  • Improved stability. The agent stops monitoring processes when the number of system processes exceeds a configurable threshold (default is 5,000). This prevents excessive resource consumption.

  • When you upgrade a plugin, the installation package is automatically downloaded and then deleted after a successful installation.

  • Improved the compatibility of JSON configuration files. The agent now supports C-style comments, trailing commas, and non-standard UTF-8 encoding.

  • For public cloud log collection, the JSON parser now supports non-JSON prefixes and suffixes.

  • Changed the disk data collection timer from the system clock to the hardware clock. This prevents timing errors caused by system clock adjustments.

  • For disk data collection, the mount_point/dir_name string, which is a concatenation of all mounted disk directories, is limited to a maximum length of 2,048 bytes.

    Note

    You can adjust this limit using the `agent.resource.dirName.limit` parameter. The default value is 2,048 bytes. If you set a value less than 1,024 bytes, it defaults to 1,024 bytes.
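
The limit rule in the preceding note can be expressed as a small clamp. The first function follows the stated default and minimum. The second is only an assumption about how a concatenated string might be kept within the limit: the separator and the choice to drop whole entries are illustrative, not documented agent behavior.

```python
DEFAULT_LIMIT = 2048  # default for agent.resource.dirName.limit, in bytes
MIN_LIMIT = 1024      # smaller configured values fall back to this

def effective_limit(configured=None):
    """Apply the default and the 1,024-byte floor to a configured value."""
    if configured is None:
        return DEFAULT_LIMIT
    return max(int(configured), MIN_LIMIT)

def join_dir_names(mount_points, limit, sep=","):
    """Concatenate mount points, stopping before the byte limit is exceeded."""
    joined = ""
    for mount_point in mount_points:
        candidate = mount_point if not joined else joined + sep + mount_point
        if len(candidate.encode("utf-8")) > limit:
            break
        joined = candidate
    return joined

print(effective_limit(), effective_limit(512), effective_limit(4096))
print(join_dir_names(["/", "/data", "/mnt/disk1"], effective_limit()))
```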

3.5.10

Release date: 2023-09-08

New features

  • The moduleTask.json file now supports disabling features. You no longer need to delete a feature to disable it.

  • Added the `argusagent tool top` command. This lets you sort and display the top N (`-n N`) processes based on the number of open files (`-by fd`), memory usage (`-by mem`), or CPU utilization (`-by cpu`).

  • Added support for disabling GPU data collection.

  • Added support for dynamic GPU data collection while the agent is running.

  • Added support for dynamically changing the number of CPU cores while the agent is running.

  • Added support for non-Alibaba Cloud hosts that are managed by Cloud Assistant.
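
The ranking performed by the `top` tool above can be pictured as a sort on one chosen key followed by a cut at N entries. The sketch below mirrors the `fd`, `mem`, and `cpu` keys from the release note. The record layout and the sample values are invented for illustration.

```python
SORT_KEYS = ("fd", "mem", "cpu")

def top_processes(processes, by, n):
    """Return the n processes with the largest value for the chosen key."""
    if by not in SORT_KEYS:
        raise ValueError(f"unsupported sort key: {by}")
    return sorted(processes, key=lambda p: p[by], reverse=True)[:n]

# Hypothetical process records.
sample_processes = [
    {"pid": 101, "name": "nginx", "fd": 120, "mem": 64.0, "cpu": 3.5},
    {"pid": 202, "name": "mysqld", "fd": 310, "mem": 512.0, "cpu": 12.0},
    {"pid": 303, "name": "sshd", "fd": 12, "mem": 8.0, "cpu": 0.1},
]

for proc in top_processes(sample_processes, by="mem", n=2):
    print(proc["pid"], proc["name"], proc["mem"])
```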

Note

When the agent exits abnormally, it generates a minidump file. When the process starts again, it reports the minidump file to CloudMonitor. The file is used for backend analysis and bug fixing to improve agent stability.

Fixed issues

  • Fixed an issue where data failed to be reported in specific time zones.

  • Fixed an issue where the number of open files for the top five processes was abnormal.

  • Fixed a precision loss issue in reported data when a process ID exceeded 1,000,000.

  • Fixed an issue on Windows where argusagent could not be started or a required dynamic-link library was missing after the agent was installed on hosts of some existing users.

  • Fixed an issue with reading CRLF characters in configuration files on Windows.

  • Fixed an issue where the first collection of the process count was skipped, resulting in a value of 0.

Performance optimizations

  • Optimized process monitoring performance to reduce the frequency of unexpected agent exits.

  • Increased the installation success rate of the agent on the Windows operating system.

  • When resource limits are exceeded, the agent now collects the top 10 consumers of a resource (such as CPU) and enumerates the call stacks of all threads. This helps analyze the agent's resource usage.

3.5.8

Release date: 2022-06-30

New features

  • Added a feature to distribute and store files.

  • Added metrics for network packet loss, error rate, and the number of zombie processes.

  • Added metrics for device usage and swap usage.

Fixed issues

  • Reduced the concatenation length of dir_name to 512 bytes.

  • Fixed an issue on Windows where the `GetUptime` function was called redundantly by the system module.

  • Fixed a memory leak issue caused by the IphlpapiGetTcpTable function.

  • Fixed an issue where the cpu.total metric was incorrectly calculated as cpuPercent.combined instead of 1-cpu.idle.

  • Fixed an issue where ping probes did not correctly check if the destination IP address for sending data matched the source IP address for receiving data. Also fixed an ICMP serial number collision issue.

  • Fixed an issue on Windows where the argusagent service failed to start if its path in the registry (`imagePath`) contained spaces.
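
The `cpu.total` fix above can be illustrated with a short calculation. The sketch assumes that a combined figure sums only the user, system, nice, and wait fractions. That definition and the sample numbers are assumptions for illustration. The corrected formula, `1 - cpu.idle`, also counts the remaining non-idle states.

```python
def cpu_total(idle):
    """Total CPU utilization as the complement of the idle fraction."""
    return 1.0 - idle

def cpu_combined(user, system, nice, wait):
    """Assumed 'combined' figure that leaves out other non-idle states."""
    return user + system + nice + wait

# Hypothetical fractions of one sampling interval (they sum to 1.0).
sample = {"user": 0.20, "system": 0.10, "nice": 0.00, "wait": 0.05,
          "irq": 0.02, "soft_irq": 0.03, "idle": 0.60}

combined = cpu_combined(sample["user"], sample["system"], sample["nice"], sample["wait"])
total = cpu_total(sample["idle"])
print(round(combined, 2), round(total, 2))
```

With these sample values the two figures differ by the interrupt time, which shows how the choice of formula changes the reported utilization.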

3.5.7

Release date: 2022-04-30

New feature

Added support for TCP metrics.

Fixed issues

  • Fixed an issue where excessively large data was reported due to disk directory concatenation.

  • Fixed an issue where the proxy for the Alibaba Cloud International Website did not poll.

  • Fixed an issue on Win32 where file associations for .py files were changed.

3.5.5

Release date: 2021-12-30

New feature

Added support for log collection.

Fixed issue

None

3.5.4

Release date: 2021-12-16

New features

  • Added support for Chinese process names for process collection on Windows.

  • Added support for Chinese usernames on Windows.

Fixed issues

  • Fixed an issue where the container service occasionally reported an incorrect number of CPU cores.

  • Fixed an IP address resolution error on Windows.

  • Fixed an issue where the CloudMonitor agent occasionally quit when collecting the number of threads for a process.

  • Fixed an issue where ping probes in availability monitoring did not work correctly.

  • Fixed an issue where the scheduling interval for availability probing was occasionally longer than the configured interval.

3.5.3

Release date: 2021-09-10

New features

  • Added support for data collection using Exporters.

  • HTTP availability monitoring tasks now support redirection for the HTTPS protocol.

  • HTTP availability monitoring tasks now support more SSL ciphers.

  • By default, HTTP availability monitoring tasks now behave the same as the curl tool and include headers such as `user_agent`.

Fixed issues

  • Fixed an occasional calculation error for TCP connection metrics on hosts with IPv6 connections.

  • Improved the precision of the disk usage metric, which is now reported with decimal places instead of as an integer.

  • Fixed an issue where the agent was occasionally disabled when collecting the `cred` metric for a process.

3.5.2

Release date: 2021-06-30

New features

  • The monitoring frequency of availability monitoring tasks can now be adjusted.

  • Optimized the usability of availability monitoring. For example, local logs are now more comprehensive and standardized.

  • Added the Uptime metric, which shows the system's running time since its last startup.

Fixed issue

Fixed issues such as missing dynamic-link libraries for the Windows version of the CloudMonitor agent.

3.4.10

Release date: 2021-03-11

New feature

None

Fixed issue

Fixed an issue where the agent failed to read the AccessKey path after being connected to a non-Alibaba Cloud host.

3.4.9

Release date: 2021-01-05

New feature

Added support for the SOCKS5 proxy.

Fixed issue

Fixed an issue where certain dynamic-link libraries were missing on non-Alibaba Cloud hosts that run Windows Server 2012 or earlier.

3.4.8

Release date: 2020-11-17

New feature

None

Fixed issue

Fixed an issue where availability monitoring could not correctly parse URLs.

3.4.7

Release date: 2020-07-27

New features

  • Added support for the following metrics: disk I/O and per-core CPU utilization.

  • Reduced resource consumption: The agent was refactored to consume fewer hardware resources and have less impact on system load during metric collection.

  • Introduced a self-protection mechanism. The agent automatically exits if its system resource consumption exceeds limits due to high system load or other reasons.

Fixed issue

Fixed issues with the Go version of the CloudMonitor agent, such as automatically changing the system time.

Earlier versions

For more information, see earlier versions.