CloudMonitor:Metrics

Last Updated:Dec 12, 2023

Host metrics include operating system metrics and Elastic Compute Service (ECS) basic metrics. Operating system metrics are collected every 15 seconds. ECS basic metrics are collected every minute.

Note

The data of operating system metrics that are collected by the CloudMonitor agent may be inconsistent with the data of ECS basic metrics due to the following causes:

  • Different monitoring frequencies: The value that is displayed in a monitoring chart is the average of the metric data that is collected within a statistical period. The statistical period is 15 seconds for operating system metrics and 1 minute for ECS basic metrics. If metric data fluctuates significantly, the longer period smooths out short peaks, so the peak values of ECS basic metrics are lower than those of operating system metrics.

  • Different monitored objects: The network traffic data that is collected based on ECS basic metrics is used for billing. The data does not include the free traffic between ECS instances and Server Load Balancer (SLB) instances. The network traffic data that is collected by the CloudMonitor agent is the actual network traffic of each network interface card (NIC). Therefore, the network traffic that the CloudMonitor agent reports can be greater than the network traffic that is reported by ECS basic metrics, and it can exceed the bandwidth or traffic quota that you purchased.
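The effect of the two statistical periods can be illustrated with a small sketch. The sample values and the helper name `window_averages` are hypothetical; the only point is that averaging the same data over a longer window flattens a short spike.

```python
from statistics import mean

def window_averages(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

# Hypothetical CPU utilization values, one per 15-second period,
# covering one minute with a single short spike.
raw_15s = [10.0, 90.0, 10.0, 10.0]

per_15s = window_averages(raw_15s, 1)  # one data point per 15-second period
per_60s = window_averages(raw_15s, 4)  # one data point per 1-minute period

print(max(per_15s))  # 90.0
print(per_60s)       # [30.0]
```

The 15-second series still shows the 90% spike, while the 1-minute average reports 30%.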

Operating system metrics

  • CPU metrics

    • Windows

      The system calls the NtQuerySystemInformation function in ntdll.dll to obtain the CPU time that is consumed by each process or thread. The system calls this function twice to obtain the CPU utilization of each process or thread that runs in the period between two calls.

    • Linux

      You can check the output of the top command to view information about the metrics that are described in the following table.

    | Metric | Description | Unit | MetricName | Dimensions | Statistics | Remarks for Linux |
    | --- | --- | --- | --- | --- | --- | --- |
    | (Agent)cpu.idle | The CPU idle time in percentage. | % | cpu_idle | userId and instanceId | Maximum, Minimum, and Average | The percentage of the CPU idle time to the total CPU time. |
    | (Agent)cpu.system | The CPU utilization of the kernel. | % | cpu_system | userId and instanceId | Maximum, Minimum, and Average | The CPU utilization for a system context switch. A high value indicates that excessive processes or threads run on the host. |
    | (Agent)cpu.user | The CPU utilization of user processes. | % | cpu_user | userId and instanceId | Maximum, Minimum, and Average | The CPU utilization of user processes. |
    | (Agent)cpu.wait | The percentage of the CPU that waits for I/O operations to complete. | % | cpu_wait | userId and instanceId | Maximum, Minimum, and Average | A high value indicates frequent I/O operations. |
    | (Agent)cpu.other | The percentage of the CPU that is occupied by other operations. | % | cpu_other | userId and instanceId | Maximum, Minimum, and Average | Calculation method: CPU utilization of low-priority processes + CPU utilization of SoftIrq + CPU utilization of Irq + CPU utilization of Stolen. |
    | (Agent)cpu.total | The percentage of the CPU that is occupied. | % | cpu_total | userId and instanceId | Maximum, Minimum, and Average | Calculation method: 100% - (Agent)cpu.idle. |
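The CPU metrics above can be sketched as the difference between two cumulative CPU-time samples, in the same spirit as the two-call approach described for Windows. This is a minimal illustration, not the agent's implementation. It assumes the field names of the first line of /proc/stat, and it assumes that "low-priority processes" corresponds to the `nice` field and "Stolen" to `steal`. The function name `cpu_percentages` is hypothetical.

```python
def cpu_percentages(prev, curr):
    """Return CPU percentages for the interval between two samples.

    `prev` and `curr` map /proc/stat-style field names to cumulative ticks.
    """
    delta = {name: curr[name] - prev[name] for name in curr}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    pct = {name: 100.0 * ticks / total for name, ticks in delta.items()}
    return {
        "cpu_idle": pct["idle"],
        "cpu_system": pct["system"],
        "cpu_user": pct["user"],
        "cpu_wait": pct["iowait"],
        # Assumption: nice = low-priority processes, steal = Stolen.
        "cpu_other": pct["nice"] + pct["softirq"] + pct["irq"] + pct["steal"],
        "cpu_total": 100.0 - pct["idle"],
    }

# Hypothetical cumulative tick counts from two consecutive samples.
prev = {"user": 100, "nice": 0, "system": 50, "idle": 800,
        "iowait": 20, "irq": 0, "softirq": 5, "steal": 0}
curr = {"user": 160, "nice": 5, "system": 70, "idle": 900,
        "iowait": 30, "irq": 0, "softirq": 10, "steal": 0}

result = cpu_percentages(prev, curr)
print(result["cpu_user"], result["cpu_other"], result["cpu_total"])
```

In this example 200 ticks elapse between the samples, of which 100 are idle, so `cpu_total` is 50%.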

  • Memory metrics

    • Windows

      The system calls the GlobalMemoryStatusEx function in kernel32.dll to obtain the current physical and virtual memory usage in a 32-bit Windows operating system.

    • Linux

      You can check the output of the free command to view information about the metrics described in the following table. The free command obtains memory information from the /proc/meminfo file.

    | Metric | Description | Unit | MetricName | Dimensions | Statistics | Remarks for Linux |
    | --- | --- | --- | --- | --- | --- | --- |
    | (Agent)memory.total.space | The total size of memory. | Byte | memory_totalspace | userId and instanceId | Maximum, Minimum, and Average | The total size of memory on the host. Data source: the value of the MemTotal parameter in the /proc/meminfo file. |
    | (Agent)memory.free.space | The size of available memory. | Byte | memory_freespace | userId and instanceId | Maximum, Minimum, and Average | The size of available memory in the system. Data source: the value of the MemFree parameter in the /proc/meminfo file. |
    | (Agent)memory.used.space | The size of used memory. | Byte | memory_usedspace | userId and instanceId | Maximum, Minimum, and Average | The size of used memory in the system. Calculation method: Total size of memory - Size of available memory. |
    | (Agent)memory.actualused.space | The size of memory that is consumed by users. | Byte | memory_actualusedspace | userId and instanceId | Maximum, Minimum, and Average | Calculation method: If the MemAvailable parameter exists in the /proc/meminfo file: Total size of memory - Value of MemAvailable. If the MemAvailable parameter does not exist: Size of used memory - Size of memory that is used by buffers - Size of cached memory. Note: The calculation result is more accurate in CentOS 7.2, Ubuntu 16.04, or their later versions that use the latest Linux kernel. For more information about the MemAvailable parameter, see commit. |
    | (Agent)memory.free.utilization | The percentage of available memory. | % | memory_freeutilization | userId and instanceId | Maximum, Minimum, and Average | Calculation method: If the MemAvailable parameter exists in the /proc/meminfo file: Value of MemAvailable/Total size of memory × 100%. If the MemAvailable parameter does not exist: (Total size of memory - Value of actualused)/Total size of memory × 100%. |
    | (Agent)memory.used.utilization | The memory usage. | % | memory_usedutilization | userId and instanceId | Maximum, Minimum, and Average | Calculation method: If the MemAvailable parameter exists in the /proc/meminfo file: (Total size of memory - Value of MemAvailable)/Total size of memory × 100%. If the MemAvailable parameter does not exist: (Total size of memory - Size of available memory - Size of memory that is used by buffers - Size of cached memory)/Total size of memory × 100%. |
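The memory formulas above can be sketched as follows. This is an illustrative reading of the calculation methods, not the agent's source code. The helper names `parse_meminfo` and `memory_metrics` and the sample values are hypothetical. The sketch assumes that "size of memory that is used by buffers" and "size of cached memory" map to the `Buffers` and `Cached` fields of /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in bytes."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        value = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info

def memory_metrics(info):
    """Apply the documented formulas, with and without MemAvailable."""
    total = info["MemTotal"]
    free = info["MemFree"]
    used = total - free
    if "MemAvailable" in info:
        actual_used = total - info["MemAvailable"]
    else:
        # Assumption: Buffers and Cached are the buffer and cache sizes.
        actual_used = used - info.get("Buffers", 0) - info.get("Cached", 0)
    return {
        "memory_totalspace": total,
        "memory_freespace": free,
        "memory_usedspace": used,
        "memory_actualusedspace": actual_used,
        "memory_freeutilization": 100.0 * (total - actual_used) / total,
        "memory_usedutilization": 100.0 * actual_used / total,
    }

# Hypothetical /proc/meminfo excerpt.
sample = """MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    6000000 kB
Buffers:          200000 kB
Cached:          3000000 kB
"""

metrics = memory_metrics(parse_meminfo(sample))
print(metrics["memory_usedutilization"])  # 25.0
```

On a real Linux host, the same functions can be fed the contents of /proc/meminfo instead of the sample string.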

  • Metrics of average system loads

    • Windows

      These metrics are unavailable for hosts that run Windows.

    • Linux

      You can check the output of the top command to view information about the metrics that are described in the following table. A higher value of a metric indicates more running processes.

    | Metric | Description | Unit | MetricName | Dimensions | Statistics |
    | --- | --- | --- | --- | --- | --- |
    | (Agent)load.1m | The average system load within the previous minute. | None | load_1m | userId and instanceId | Maximum, Minimum, and Average |
    | (Agent)load.5m | The average system load within the previous 5 minutes. | None | load_5m | userId and instanceId | Maximum, Minimum, and Average |
    | (Agent)load.15m | The average system load within the previous 15 minutes. | None | load_15m | userId and instanceId | Maximum, Minimum, and Average |
    | (Agent)load.1m.percore | The average system load per CPU core within the previous minute. | None | load_per_core_1m | userId and instanceId | Maximum, Minimum, and Average |
    | (Agent)load.5m.percore | The average system load per CPU core within the previous 5 minutes. | None | load_per_core_5m | userId and instanceId | Maximum, Minimum, and Average |
    | (Agent)load.15m.percore | The average system load per CPU core within the previous 15 minutes. | None | load_per_core_15m | userId and instanceId | Maximum, Minimum, and Average |
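A per-core load value can be approximated by dividing a load average by the number of CPU cores. The document does not state how the agent derives the per-core metrics, so the division below is an assumption, and `load_per_core` is a hypothetical helper.

```python
import os

def load_per_core(load, cores):
    """Divide a load average by the number of CPU cores (assumed method)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores

# A load of 4.0 on an 8-core host is 0.5 per core.
print(load_per_core(4.0, 8))  # 0.5

# On Linux, the live 1-, 5-, and 15-minute load averages can be read
# with os.getloadavg(); the call is unavailable on Windows.
if hasattr(os, "getloadavg"):
    try:
        cores = os.cpu_count() or 1
        print([load_per_core(value, cores) for value in os.getloadavg()])
    except OSError:
        pass  # load average could not be obtained on this system
```

A per-core value that stays above 1 suggests that more processes are runnable than the cores can serve.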

  • Disk metrics

    • Windows

      The system calls the GetDiskFreeSpaceExA function in Kernel32.dll to obtain the used disk space, disk usage, free disk space, and total disk space. The system calls the RegConnectRegistryA function to connect to the HKEY_PERFORMANCE_DATA entry in the registry. Then, the system calls the RegQueryValueExA function to query the disk information in HKEY_PERFORMANCE_DATA, including the read count, write count, read bytes, written bytes, read time, write time, and disk active time.

    • Linux

      You can check the output of the df command to view information about the metrics for disk and inode usage. You can check the output of the iostat command to view information about the metrics for disk reads and writes.

    | Metric | Description | Unit | MetricName | Dimensions | Statistics |
    | --- | --- | --- | --- | --- | --- |
    | Host.diskusage.used | The disk space in use. | Byte | diskusage_used | userId, instanceId, and device | Maximum, Minimum, and Average |
    | Host.diskusage.utilization | The disk usage. | % | diskusage_utilization | userId, instanceId, and device | Maximum, Minimum, and Average |
    | Host.diskusage.free | The size of available disk space for regular users and superusers. | Byte | diskusage_free | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)disk.usage.avail_device | The size of available disk space for regular users. | Byte | diskusage_avail | userId, instanceId, and device | Maximum, Minimum, and Average |
    | Host.diskusage.total | The size of the total disk space. | Byte | diskusage_total | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)disk.read.bps_device | The number of bytes that are read from the disk per second. | Byte/s | disk_readbytes | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)disk.write.bps_device | The number of bytes that are written to the disk per second. | Byte/s | disk_writebytes | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)disk.read.iops_device | The number of read requests that the disk receives per second. | Requests/s | disk_readiops | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)disk.write.iops_device | The number of write requests that the disk receives per second. | Requests/s | disk_writeiops | userId, instanceId, and device | Maximum, Minimum, and Average |
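The difference between diskusage_free and diskusage_avail can be illustrated with the fields that `os.statvfs` exposes on Linux. This sketch is an assumption about how such values are typically derived, not the agent's implementation. The utilization formula follows the convention that df uses for its Use% column. `FsStat` and `disk_usage_metrics` are hypothetical names.

```python
from collections import namedtuple

# Field names mirror a subset of os.statvfs_result.
FsStat = namedtuple("FsStat", "f_frsize f_blocks f_bfree f_bavail")

def disk_usage_metrics(st):
    """Derive disk usage values from statvfs-style block counts."""
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize    # includes blocks reserved for the superuser
    avail = st.f_bavail * st.f_frsize  # blocks available to regular users
    used = total - free
    # Assumption: same ratio as df's Use% column, used / (used + avail).
    denominator = used + avail
    utilization = 100.0 * used / denominator if denominator else 0.0
    return {
        "diskusage_total": total,
        "diskusage_free": free,
        "diskusage_avail": avail,
        "diskusage_used": used,
        "diskusage_utilization": utilization,
    }

# Hypothetical file system: 1000 blocks of 4096 bytes, 50 blocks reserved.
usage = disk_usage_metrics(FsStat(f_frsize=4096, f_blocks=1000, f_bfree=400, f_bavail=350))
print(usage["diskusage_free"], usage["diskusage_avail"])
```

On a Linux host, `disk_usage_metrics(os.statvfs("/"))` accepts the real result because it exposes the same attribute names.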

  • File system metric

    • Windows

      The metric is unavailable for hosts that run Windows.

    • Linux

      You can check the output of the df command to view information about the metric described in the following table.

    | Metric | Description | Unit | MetricName | Dimensions | Statistics | Remarks for Linux |
    | --- | --- | --- | --- | --- | --- | --- |
    | (Agent)fs.inode.utilization_device | The inode usage. | % | fs_inodeutilization | userId, instanceId, and device | Maximum, Minimum, and Average | Linux operating systems use inode numbers rather than file names to identify files. If inodes are used up, you cannot create files even if the disk space is sufficient. Therefore, the system must monitor the inode usage. The number of inodes indicates the number of files. A large number of small files can cause a high inode usage. |
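Inode usage can be sketched as the share of inodes that are no longer free. The formula is an assumption that matches the IUse% column of `df -i`; the document does not spell out the agent's calculation, and `inode_utilization` is a hypothetical helper.

```python
def inode_utilization(total_inodes, free_inodes):
    """Return the percentage of inodes in use (assumed formula)."""
    if total_inodes == 0:
        return 0.0  # some file systems do not report inode counts
    return 100.0 * (total_inodes - free_inodes) / total_inodes

# Hypothetical file system with 1,000,000 inodes, 250,000 still free.
print(inode_utilization(1_000_000, 250_000))  # 75.0
```

On Linux, the two inputs correspond to the `f_files` and `f_ffree` fields returned by `os.statvfs`.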

  • Network metrics

    • Windows

      The system calls the GetAdaptersAddresses function in Iphlpapi.dll to obtain the addresses of NICs on the host. The system calls the GetIfTable function to obtain the data of metrics for each interface, for example, the number of bits that an interface receives and sends per second, number of packets that an interface receives and sends per second, and number of error packets that an interface receives and sends.

    • Linux

      • You can check the output of the ss command to view information about the TCP connection metric.

        Note

        TCP connections represent the connections that are established between ECS instances and clients over TCP.

        By default, the CloudMonitor agent collects the following data about TCP connections in different states: TCP_TOTAL, ESTABLISHED, and NOT_ESTABLISHED. TCP_TOTAL indicates the total number of connections. ESTABLISHED indicates the number of established connections. NOT_ESTABLISHED indicates the number of connections that are not in the ESTABLISHED state.

      • You can check the output of the iftop command to view information about the network metrics.

    | Metric | Description | Unit | MetricName | Dimensions | Statistics |
    | --- | --- | --- | --- | --- | --- |
    | (Agent)network.in.rate_device | The number of bits that the NIC receives per second. This is the downstream bandwidth of the NIC. | bit/s | networkin_rate | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)network.out.rate_device | The number of bits that the NIC sends per second. This is the upstream bandwidth of the NIC. | bit/s | networkout_rate | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)network.in.packages_device | The number of packets that the NIC receives per second. | Count/s | networkin_packages | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)network.out.packages_device | The number of packets that the NIC sends per second. | Count/s | networkout_packages | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)network.in.errorpackages_device | The number of inbound error packets that the driver detects. | Count/s | networkin_errorpackages | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)network.out.errorpackages_device | The number of outbound error packets that the driver detects. | Count/s | networkout_errorpackages | userId, instanceId, and device | Maximum, Minimum, and Average |
    | (Agent)network.tcp.connection_state | The number of TCP connections in each state. These connection states include LISTEN, SYN_SENT, ESTABLISHED, SYN_RECV, FIN_WAIT1, CLOSE_WAIT, FIN_WAIT2, LAST_ACK, TIME_WAIT, CLOSING, and CLOSED. | Count | net_tcpconnection | userId, instanceId, and state | Maximum, Minimum, and Average |
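The three default TCP counters described in the note above (TCP_TOTAL, ESTABLISHED, and NOT_ESTABLISHED) can be derived from a list of per-connection states. This is a minimal sketch with a hypothetical helper name, `summarize_tcp_states`. It takes state names as written in this document; the ss command prints abbreviated forms such as ESTAB, which would need to be mapped first.

```python
from collections import Counter

def summarize_tcp_states(states):
    """Aggregate per-connection TCP states into the three default counters."""
    per_state = Counter(states)
    total = sum(per_state.values())
    established = per_state.get("ESTABLISHED", 0)
    return {
        "TCP_TOTAL": total,
        "ESTABLISHED": established,
        "NOT_ESTABLISHED": total - established,
    }

# Hypothetical snapshot of connection states on a host.
snapshot = ["LISTEN", "ESTABLISHED", "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT"]
summary = summarize_tcp_states(snapshot)
print(summary)
```

For this snapshot the summary reports 5 connections in total, 2 established and 3 not established.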

  • Process metrics

    • Windows

      • Query

        The system calls the OpenProcess function in Kernel32.dll to obtain the handle of a process. The system calls the GetProcessTimes function twice to obtain the CPU time consumed by the process, and then calculates the CPU utilization of the process in the interval between the two calls. The system calls the RegConnectRegistryA function to connect to the HKEY_PERFORMANCE_DATA entry in the registry. Then, the system calls the RegQueryValueExA function to query the process information in HKEY_PERFORMANCE_DATA, including the process ID, parent process ID, priority, virtual memory, resident memory, shared memory, number of files that the process opens, thread count, page faults, read bytes, and written bytes.

      • Count the number of processes that match the specified keyword

        • The system calls the OpenProcess function to obtain the handle of a process. The system calls the NtQueryInformationProcess function in ntdll.dll to obtain RTL_USER_PROCESS_PARAMETERS of the process. The system calls the ReadProcessMemory function to obtain the arguments and root path of the process from the command line information. This way, the system can obtain the directory of the process.

        • The system calls the OpenProcessToken function to obtain the handle of a token. The system calls the GetTokenInformation function to obtain the token information. The system calls the LookupAccountSid function to obtain the username and user group of the process.

        • The system matches the directory, username, and user group of the process with the keyword. If the process information matches the keyword, the system increases the value of Host.process.number by 1.

    • Linux

      • You can check the output of the top command to view information about the CPU utilization and memory usage of processes. The CPU utilization indicates the consumption of multi-core CPUs.

      • You can check the output of the lsof command to view information about the Host.process.openfile metric.

      • You can check the output of the ps aux | grep '<Keyword>' command to view information about the Host.process.number metric.

    | Metric | Description | Unit | MetricName | Dimensions | Statistics | Remarks |
    | --- | --- | --- | --- | --- | --- | --- |
    | (Agent)process.cpu_pid | The CPU utilization of a process. | % | process.cpu | userId, instanceId, name, and pid | Average | You cannot configure alert rules for this metric. |
    | (Agent)process.memory_pid | The memory usage of a process. | % | process.memory | userId, instanceId, name, and pid | Average | You cannot configure alert rules for this metric. |
    | (Agent)process.openfile_pid | The number of files that are opened by a process. | Count | process.openfile | userId, instanceId, name, and pid | Average | You cannot configure alert rules for this metric. |
    | (Agent)process.count_processname | The number of processes that match the specified keyword. | Count | process.number | userId, instanceId, and processName | Average | You cannot configure alert rules for this metric. |
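Counting processes by keyword, as the `ps aux | grep '<Keyword>'` check suggests, can be sketched as a match over process command lines. The sketch uses plain substring matching, which is a simplification of grep's pattern matching, and it operates on a supplied list rather than on live process data. `count_matching_processes` and the sample command lines are hypothetical.

```python
def count_matching_processes(command_lines, keyword):
    """Count the command lines that contain the keyword as a substring."""
    return sum(1 for command in command_lines if keyword in command)

# Hypothetical command lines, such as those in the last column of `ps aux`.
commands = [
    "/usr/sbin/nginx -g daemon off;",
    "nginx: worker process",
    "/usr/bin/python3 app.py",
    "sshd: root@pts/0",
]

print(count_matching_processes(commands, "nginx"))  # 2
```

When you run the shell pipeline by hand, the grep process itself can appear in the output, so the manual count can be one higher than the number of matching workload processes.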

Basic metrics

If your hosts are ECS instances, you can collect the following metrics without the need to install the CloudMonitor agent. CloudMonitor collects data of these metrics every minute.

| Metric | Description | Unit | MetricName | Dimensions | Statistics |
| --- | --- | --- | --- | --- | --- |
| (ECS)CPUUtilization | The CPU utilization. | % | CPUUtilization | userId and instanceId | Maximum, Minimum, and Average |
| (ECS)InternetInRate(Classic Network) | The average rate of inbound Internet traffic. | bit/s | InternetInRate | userId and instanceId | Maximum, Minimum, and Average |
| (ECS)IntranetInRate | The average rate of inbound internal network traffic. | bit/s | IntranetInRate | userId and instanceId | Maximum, Minimum, and Average |
| (ECS)InternetOutRate(Classic Network) | The average rate of outbound Internet traffic. | bit/s | InternetOutRate | userId and instanceId | Maximum, Minimum, and Average |
| (ECS)IntranetOutRate | The average rate of outbound traffic over the internal network. | bit/s | IntranetOutRate | userId and instanceId | Maximum, Minimum, and Average |
| (ECS)DiskReadBPS | The number of bytes that are read from the system disk per second. | Byte/s | DiskReadBPS | userId and instanceId | Maximum, Minimum, and Average |
| (ECS)DiskWriteBPS | The number of bytes that are written to the system disk per second. | Byte/s | DiskWriteBPS | userId and instanceId | Maximum, Minimum, and Average |
| (ECS)DiskReadIOPS | The number of read operations that are performed on the system disks per second. | Requests/s | DiskReadIOPS | userId and instanceId | Maximum, Minimum, and Average |
| (ECS)DiskWriteIOPS | The number of write operations that are performed on the system disks per second. | Requests/s | DiskWriteIOPS | userId and instanceId | Average, Minimum, and Maximum |
| (ECS)InternetInRate_IP | The inbound bandwidth from the Internet. | bit/s | VPC_PublicIP_InternetInRate | userId, instanceId, and ip | Maximum, Minimum, and Average |
| (ECS)InternetOutRate_IP | The outbound bandwidth to the Internet. | bit/s | VPC_PublicIP_InternetOutRate | userId, instanceId, and ip | Maximum, Minimum, and Average |
| (ECS)InternetOutRatePercent_IP | The utilization of the outbound bandwidth to the Internet. | % | VPC_PublicIP_InternetOutRate_Percent | userId, instanceId, and ip | Average |
| (ECS)InternetIn(Classic Network) | The inbound traffic over the Internet. | Byte | InternetIn | userId and instanceId | Average, Minimum, Maximum, and Sum |
| (ECS)InternetOut(Classic Network) | The outbound traffic over the Internet. | Byte | InternetOut | userId and instanceId | Maximum, Minimum, and Average |
| (ECS)IntranetInRate | The inbound traffic over the internal network. | Byte | IntranetInRate | userId and instanceId | Maximum, Minimum, and Average |