Host metrics include operating system metrics and Elastic Compute Service (ECS) basic metrics. Operating system metrics are collected every 15 seconds. ECS basic metrics are collected every minute.

Note: Operating system metrics that are collected by the CloudMonitor agent may be inconsistent with ECS basic metrics for the following reasons:
  • Different monitoring frequencies: Each point in a monitoring chart is the average of the metric data that is collected within a statistical period. The statistical period is 15 seconds for operating system metrics and 1 minute for ECS basic metrics. If metric data fluctuates significantly, the longer statistical period smooths out short peaks, so the peak values of ECS basic metrics can be smaller than the peak values of operating system metrics.
  • Different monitored objects: The network traffic data that is collected based on ECS basic metrics is used for billing. The data does not include the free traffic between ECS instances and Server Load Balancer (SLB) instances. The network traffic data that is collected by the CloudMonitor agent is the actual network traffic of each network interface card (NIC). Therefore, the traffic that the CloudMonitor agent reports can be greater than the traffic that ECS basic metrics report, and it may exceed the bandwidth or traffic quota that you purchased.
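
The effect of the statistical period can be illustrated with a small sketch. The sample values and the 5-second sampling interval are hypothetical and are chosen only to show how a short burst that is visible in 15-second averages is flattened in a 1-minute average. They do not describe how CloudMonitor samples data internally.

```python
from statistics import mean


def period_averages(samples, samples_per_period):
    """Average consecutive groups of raw samples, one value per statistical period."""
    return [
        mean(samples[i:i + samples_per_period])
        for i in range(0, len(samples), samples_per_period)
    ]


# Hypothetical CPU utilization samples (%), one every 5 seconds for one minute.
raw = [10, 10, 10, 90, 90, 90, 10, 10, 10, 10, 10, 10]

per_15s = period_averages(raw, 3)    # four 15-second periods
per_60s = period_averages(raw, 12)   # one 1-minute period

print("15-second averages:", per_15s)
print("1-minute average:  ", per_60s)
```

With these values, one of the 15-second points shows the burst at 90%, while the single 1-minute point shows only 30%.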

Operating system metrics

  • CPU metrics
    • Linux

      You can check the output of the top command to view information about the metrics that are described in the following table.

    • Windows

      The system calls the NtQuerySystemInformation function in ntdll.dll to obtain the CPU time that is consumed by each process or thread. The system calls this function twice to obtain the CPU utilization of each process or thread that runs in the period between two calls.

    Metric Description Unit MetricName Dimensions Statistics Remarks for Linux
    (Agent)cpu.idle The CPU idle time in percentage. % cpu_idle userId and instanceId Maximum, Minimum, and Average The percentage of the CPU idle time in the total CPU time.
    (Agent)cpu.system The CPU utilization of the kernel. % cpu_system userId and instanceId Maximum, Minimum, and Average The CPU utilization for a system context switch. A high value indicates that excessive processes or threads run on the host.
    (Agent)cpu.user The CPU utilization of user processes. % cpu_user userId and instanceId Maximum, Minimum, and Average The CPU utilization of user processes.
    (Agent)cpu.wait The percentage of the CPU that waits for I/O operations to complete. % cpu_wait userId and instanceId Maximum, Minimum, and Average A high value indicates frequent I/O operations.
    (Agent)cpu.other The percentage of the CPU that is occupied by other operations. % cpu_other userId and instanceId Maximum, Minimum, and Average Calculation method: CPU utilization of low-priority processes + CPU utilization of SoftIrq + CPU utilization of Irq + CPU utilization of Stolen.
    (Agent)cpu.total The percentage of the CPU that is occupied. % cpu_total userId and instanceId Maximum, Minimum, and Average Calculation method: CPU utilization of user processes + CPU utilization of the kernel + CPU utilization of low-priority processes + CPU utilization of I/O wait.
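
The following sketch shows one way to reproduce the Linux calculations in the preceding table from two snapshots of the aggregate cpu line in /proc/stat, which is the file that the top command reads. The function names and the sample tick values are illustrative assumptions, and the field order follows the proc(5) manual page. The implementation of the CloudMonitor agent may differ.

```python
from typing import Dict, List, Sequence

# Order of the first eight counters on the "cpu" line of /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_snapshot(path: str = "/proc/stat") -> List[int]:
    """Read the aggregate CPU tick counters (Linux only)."""
    with open(path) as f:
        parts = f.readline().split()
    return [int(v) for v in parts[1:1 + len(FIELDS)]]


def cpu_percentages(first: Sequence[int], second: Sequence[int]) -> Dict[str, float]:
    """Convert two tick snapshots into the percentages described in the table."""
    delta = {name: b - a for name, a, b in zip(FIELDS, first, second)}
    elapsed = sum(delta.values())
    if elapsed <= 0:
        raise ValueError("the second snapshot must be taken after the first")
    pct = {name: 100.0 * ticks / elapsed for name, ticks in delta.items()}
    return {
        "cpu_idle": pct["idle"],
        "cpu_system": pct["system"],
        "cpu_user": pct["user"],
        "cpu_wait": pct["iowait"],
        # Low-priority (nice) processes + SoftIrq + Irq + Stolen.
        "cpu_other": pct["nice"] + pct["softirq"] + pct["irq"] + pct["steal"],
        # User processes + kernel + low-priority processes + I/O wait.
        "cpu_total": pct["user"] + pct["system"] + pct["nice"] + pct["iowait"],
    }


# Hypothetical snapshots taken a short interval apart.
before = (1000, 0, 500, 8000, 100, 0, 0, 0)
after = (1300, 50, 600, 8400, 150, 20, 30, 50)
result = cpu_percentages(before, after)
print(result)
```

On a Linux host, you can pass two results of read_cpu_snapshot() that are taken a few seconds apart instead of the hypothetical tuples.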
  • Memory metrics
    • Linux

      You can check the output of the free command to view information about the metrics described in the following table. The free command obtains memory information from the /proc/meminfo file.

    • Windows

      The system calls the GlobalMemoryStatusEx function in kernel32.dll to obtain the current physical and virtual memory usage in a 32-bit Windows operating system.

    Metric Description Unit MetricName Dimensions Statistics Remarks for Linux
    (Agent)memory.total.space The total size of memory. Byte memory_totalspace userId and instanceId Maximum, Minimum, and Average The total size of memory on the host. Data source: the value of the MemTotal parameter in the /proc/meminfo file.
    (Agent)memory.free.space The size of available memory. Byte memory_freespace userId and instanceId Maximum, Minimum, and Average The size of available memory in the system. Data source: the value of the MemFree parameter in the /proc/meminfo file.
    (Agent)memory.used.space The size of used memory. Byte memory_usedspace userId and instanceId Maximum, Minimum, and Average The size of used memory in the system. Calculation method: Total size of memory - Size of available memory.

    (Agent)memory.actualused.space The size of memory that is consumed by users. Byte memory_actualusedspace userId and instanceId Maximum, Minimum, and Average Calculation method:
    • If the MemAvailable parameter exists in the /proc/meminfo file, the following formula is used for calculation: Total size of memory - Value of MemAvailable.
    • If the MemAvailable parameter does not exist in the /proc/meminfo file, the following formula is used for calculation: Size of used memory - Size of memory that is used by buffers - Size of cached memory.
    Note: The calculation result is more accurate in CentOS 7.2, Ubuntu 16.04, and later versions, which use newer Linux kernels. For more information about the MemAvailable parameter, see the Linux kernel commit that introduced it.
    (Agent)memory.free.utilization The percentage of available memory. % memory_freeutilization userId and instanceId Maximum, Minimum, and Average Calculation method:
    • If the MemAvailable parameter exists in the /proc/meminfo file, the following formula is used for calculation: Value of MemAvailable/Total size of memory × 100%.
    • If MemAvailable does not exist in the /proc/meminfo file, the following formula is used for calculation: (Total size of memory - Value of actualused)/Total size of memory × 100%.
    (Agent)memory.used.utilization The memory usage. % memory_usedutilization userId and instanceId Maximum, Minimum, and Average Calculation method:
    • If the MemAvailable parameter exists in the /proc/meminfo file, the following formula is used for calculation: (Total size of memory - Value of MemAvailable)/Total size of memory × 100%.
    • If the MemAvailable parameter does not exist in the /proc/meminfo file, the following formula is used for calculation: (Total size of memory - Size of available memory - Size of memory that is used by buffers - Size of cached memory)/Total size of memory × 100%.
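
The memory calculations in the preceding table, including the fallback for kernels without MemAvailable, can be sketched as follows. The function names and the sample /proc/meminfo content are illustrative assumptions. The sketch assumes that Buffers and Cached are the /proc/meminfo fields behind "memory that is used by buffers" and "cached memory".

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of values in bytes."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        value = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024  # /proc/meminfo reports sizes in units of 1024 bytes
        info[key.strip()] = value
    return info


def memory_metrics(info):
    """Apply the calculation methods from the table to parsed meminfo values."""
    total = info["MemTotal"]
    free = info["MemFree"]
    used = total - free
    if "MemAvailable" in info:
        actual_used = total - info["MemAvailable"]
    else:
        # Fallback for older kernels that do not report MemAvailable.
        actual_used = used - info.get("Buffers", 0) - info.get("Cached", 0)
    return {
        "memory_totalspace": total,
        "memory_freespace": free,
        "memory_usedspace": used,
        "memory_actualusedspace": actual_used,
        "memory_freeutilization": 100.0 * (total - actual_used) / total,
        "memory_usedutilization": 100.0 * actual_used / total,
    }


# Hypothetical /proc/meminfo excerpt.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    6000000 kB
Buffers:          500000 kB
Cached:          4000000 kB
"""

with_available = memory_metrics(parse_meminfo(SAMPLE))

# Simulate an older kernel by dropping the MemAvailable line.
legacy_info = parse_meminfo(SAMPLE)
del legacy_info["MemAvailable"]
without_available = memory_metrics(legacy_info)

print(with_available["memory_usedutilization"])
print(without_available["memory_usedutilization"])
```

On a Linux host, you can pass the content of /proc/meminfo to parse_meminfo() instead of the sample text.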
  • Metrics of average system loads
    • Linux

      You can check the output of the top command to view information about the metrics that are described in the following table. A higher value indicates that more processes are running or waiting to run.

    • Windows

      The metrics are unavailable for hosts that run Windows.

    Metric Description Unit MetricName Dimensions Statistics
    (Agent)load.1m The average system load within the previous minute. None load_1m userId and instanceId Maximum, Minimum, and Average
    (Agent)load.5m The average system load within the previous 5 minutes. None load_5m userId and instanceId Maximum, Minimum, and Average
    (Agent)load.15m The average system load within the previous 15 minutes. None load_15m userId and instanceId Maximum, Minimum, and Average
    (Agent)load.1m.percore The average system load per CPU core within the previous minute. None load_per_core_1m userId and instanceId Maximum, Minimum, and Average
    (Agent)load.5m.percore The average system load per CPU core within the previous 5 minutes. None load_per_core_5m userId and instanceId Maximum, Minimum, and Average
    (Agent)load.15m.percore The average system load per CPU core within the previous 15 minutes. None load_per_core_15m userId and instanceId Maximum, Minimum, and Average
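
The per-core metrics can be approximated with the following sketch. It assumes that the per-core value is the system load average divided by the number of logical CPU cores. The table does not state the exact formula, so treat this as an assumption. The function names are illustrative.

```python
import os


def load_per_core(load_averages, cpu_cores):
    """Divide each load average by the number of logical CPU cores."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return tuple(load / cpu_cores for load in load_averages)


def current_load_per_core():
    """Read the 1-, 5-, and 15-minute load averages of this host (Unix only)."""
    return load_per_core(os.getloadavg(), os.cpu_count() or 1)


# Hypothetical load averages on a 4-core host.
example = load_per_core((2.0, 1.0, 0.5), 4)
print(example)
```

A per-core value close to or above 1 suggests that, on average, each core has at least one process running or waiting to run.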
  • Disk metrics
    • Linux

      You can check the output of the df command to view information about the metrics for disk and inode usage. You can check the output of the iostat command to view information about the metrics for disk reads and writes.

    • Windows

      The system calls the GetDiskFreeSpaceExA function in Kernel32.dll to obtain the used disk space, disk usage, free disk space, and total disk space. The system calls the RegConnectRegistryA function to connect to the HKEY_PERFORMANCE_DATA entry in the registry. Then, the system calls the RegQueryValueExA function to query the disk information in HKEY_PERFORMANCE_DATA, including the read count, write count, read bytes, written bytes, read time, write time, and disk active time.

    Metric Description Unit MetricName Dimensions Statistics
    Host.diskusage.used The disk space in use. Byte diskusage_used userId, instanceId, and device Maximum, Minimum, and Average
    Host.diskusage.utilization The disk usage. % diskusage_utilization userId, instanceId, and device Maximum, Minimum, and Average
    Host.diskusage.free The size of available disk space for regular users and superusers. Byte diskusage_free userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)disk.usage.avail_device The size of available disk space for regular users. Byte diskusage_avail userId, instanceId, and device Maximum, Minimum, and Average
    Host.diskusage.total The size of the total disk space. Byte diskusage_total userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)disk.read.bps_device The number of bytes that are read from the disk per second. Byte/s disk_readbytes userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)disk.write.bps_device The number of bytes that are written to the disk per second. Byte/s disk_writebytes userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)disk.read.iops_device The number of read requests that the disk receives per second. Requests/s disk_readiops userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)disk.write.iops_device The number of write requests that the disk receives per second. Requests/s disk_writeiops userId, instanceId, and device Maximum, Minimum, and Average
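
The difference between the free space for all users and the space that is available to regular users can be seen in the file system statistics that the df command relies on. The following sketch derives the disk usage metrics from a statvfs result. The function name and the sample values are illustrative, and the utilization formula follows the convention of the df command (used space divided by used space plus space available to regular users), which is an assumption about how the metric is computed.

```python
from types import SimpleNamespace


def disk_usage_metrics(st):
    """Derive disk usage values from an os.statvfs()-style result."""
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize    # includes blocks reserved for the superuser
    avail = st.f_bavail * st.f_frsize  # blocks that regular users can allocate
    used = total - free
    usable = used + avail
    return {
        "diskusage_total": total,
        "diskusage_used": used,
        "diskusage_free": free,
        "diskusage_avail": avail,
        "diskusage_utilization": 100.0 * used / usable if usable else 0.0,
    }


# Hypothetical file system: 1,000 blocks of 4 KiB, 400 free, 350 available to regular users.
sample = SimpleNamespace(f_frsize=4096, f_blocks=1000, f_bfree=400, f_bavail=350)
metrics = disk_usage_metrics(sample)
print(metrics)
```

On a Linux host, you can pass os.statvfs("/") instead of the sample object to inspect a mounted file system.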
  • File system metric
    • Linux

      You can check the output of the df command to view information about the metric described in the following table.

    • Windows

      The metric is unavailable for hosts that run Windows.

    Metric Description Unit MetricName Dimensions Statistics Remarks for Linux
    (Agent)fs.inode.utilization_device The inode usage. % fs_inodeutilization userId, instanceId, and device Maximum, Minimum, and Average Linux operating systems use inode numbers rather than file names to identify files. If inodes are used up, you cannot create files even if the disk space is sufficient. Therefore, the system must monitor the inode usage. The number of inodes indicates the number of files. A large number of small files can cause a high inode usage.
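
The inode usage can be estimated from the same file system statistics. The following sketch assumes that the inode usage is the share of allocated inodes in the total number of inodes. The exact formula of the agent is not stated in the table, and the function name is illustrative.

```python
def inode_utilization(total_inodes, free_inodes):
    """Return the percentage of inodes that are in use."""
    if total_inodes == 0:
        return 0.0  # some file systems do not report an inode limit
    return 100.0 * (total_inodes - free_inodes) / total_inodes


# Hypothetical file system with 1,000 inodes, 250 of which are free.
print(inode_utilization(1000, 250))
```

On a Linux host, you can call os.statvfs() on a mount point and pass its f_files and f_ffree fields to this function.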
  • Network metrics
    • Linux
      • You can check the output of the ss command to view information about the TCP connection metric.
        Note TCP connections represent the connections that are established between ECS instances and clients over TCP.

        By default, the CloudMonitor agent collects the following data about TCP connections in different states: TCP_TOTAL, ESTABLISHED, and NON_ESTABLISHED. TCP_TOTAL indicates the total number of connections. ESTABLISHED indicates the number of established connections. NON_ESTABLISHED indicates the number of connections that are not in the ESTABLISHED state.

      • You can check the output of the iftop command to view information about the network metrics.
    • Windows

      The system calls the GetAdaptersAddresses function in Iphlpapi.dll to obtain the addresses of NICs on the host. The system calls the GetIfTable function to obtain the data of metrics for each interface, for example, the number of bits that an interface receives and sends per second, number of packets that an interface receives and sends per second, and number of error packets that an interface receives and sends.

    Metric Description Unit MetricName Dimensions Statistics
    (Agent)network.in.rate_device The number of bits that the NIC receives per second. This is the downstream bandwidth of the NIC. bit/s networkin_rate userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)network.out.rate_device The number of bits that the NIC sends per second. This is the upstream bandwidth of the NIC. bit/s networkout_rate userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)network.in.packages_device The number of packets that the NIC receives per second. Count/s networkin_packages userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)network.out.packages_device The number of packets that the NIC sends per second. Count/s networkout_packages userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)network.in.errorpackages_device The number of inbound error packets that the driver detects. Count/s networkin_errorpackages userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)network.out.errorpackages_device The number of outbound error packets that the driver detects. Count/s networkout_errorpackages userId, instanceId, and device Maximum, Minimum, and Average
    (Agent)network.tcp.connection_state The number of TCP connections in each state. These connection states include LISTEN, SYN_SENT, ESTABLISHED, SYN_RECV, FIN_WAIT1, CLOSE_WAIT, FIN_WAIT2, LAST_ACK, TIME_WAIT, CLOSING, and CLOSED. Count net_tcpconnection userId, instanceId, and state Maximum, Minimum, and Average
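
The TCP_TOTAL, ESTABLISHED, and NON_ESTABLISHED values can be derived from the output of the ss command. The following sketch counts the states in text that has the shape of `ss -tan` output. The function name and the sample output are illustrative. Whether the agent includes listening sockets in TCP_TOTAL is not stated in this topic, so the sketch simply counts every row.

```python
from collections import Counter


def summarize_tcp_states(ss_output):
    """Count TCP sockets by state from `ss -tan`-style text."""
    states = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "State":
            continue  # skip blank lines and the header row
        states[fields[0]] += 1
    total = sum(states.values())
    established = states.get("ESTAB", 0)  # ss abbreviates ESTABLISHED as ESTAB
    return {
        "TCP_TOTAL": total,
        "ESTABLISHED": established,
        "NON_ESTABLISHED": total - established,
    }


# Hypothetical `ss -tan` output.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:22          0.0.0.0:*
ESTAB      0      0      10.0.0.5:22         10.0.0.9:51234
ESTAB      0      0      10.0.0.5:443        10.0.0.7:40112
TIME-WAIT  0      0      10.0.0.5:443        10.0.0.8:40990
"""

summary = summarize_tcp_states(SAMPLE)
print(summary)
```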
  • Process metrics
    • Linux
      • You can check the output of the top command to view information about the CPU utilization and memory usage of processes. The CPU utilization reflects usage across the CPU cores of a multi-core host.
      • You can check the output of the lsof command to view information about the Host.process.openfile metric.
      • You can check the output of the ps aux | grep '<Keyword>' command to view information about the Host.process.number metric.
    • Windows
      • Query

        The system calls the OpenProcess function in Kernel32.dll to obtain the handle of a process. The system calls the GetProcessTimes function twice to obtain the CPU time that is consumed by the process and then calculates the CPU utilization of the process in the interval between the two calls. The system calls the RegConnectRegistryA function to connect to the HKEY_PERFORMANCE_DATA entry in the registry. Then, the system calls the RegQueryValueExA function to query the process information in HKEY_PERFORMANCE_DATA, including the process ID, parent process ID, priority, virtual memory, resident memory, shared memory, number of files that the process opens, thread count, page faults, read bytes, and written bytes.

      • Count the number of processes that match the specified keyword
        • The system calls the OpenProcess function to obtain the handle of a process. The system calls the NtQueryInformationProcess function in ntdll.dll to obtain RTL_USER_PROCESS_PARAMETERS of the process. The system calls the ReadProcessMemory function to obtain the arguments and root path of the process from the command line information. This way, the system can obtain the directory of the process.
        • The system calls the OpenProcessToken function to obtain the handle of a token. The system calls the GetTokenInformation function to obtain the token information. The system calls the LookupAccountSid function to obtain the username and user group of the process.
        • The system matches the directory, username, and user group of the process with the keyword. If the process information matches the keyword, the system increases the value of Host.process.number by 1.
    Metric Description Unit MetricName Dimensions Statistics Remarks
    (Agent)process.cpu_pid The CPU utilization of a process. % process.cpu userId, instanceId, name, and pid Average You cannot configure alert rules for this metric.
    (Agent)process.memory_pid The memory usage of a process. % process.memory userId, instanceId, name, and pid Average You cannot configure alert rules for this metric.
    (Agent)process.openfile_pid The number of files that are opened by a process. Count process.openfile userId, instanceId, name, and pid Average You cannot configure alert rules for this metric.
    (Agent)process.count_processname The number of processes that match the specified keyword. Count process.number userId, instanceId, and processName Average You cannot configure alert rules for this metric.
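
On Linux, the keyword match behind the process count can be sketched by scanning `ps aux` output, which is comparable to running ps aux | grep '<Keyword>'. The function names and the sample output are illustrative, and the simple substring match over the whole line is an assumption about how the keyword is applied.

```python
import subprocess


def read_ps_output():
    """Return the output of `ps aux` on this host (Linux or other Unix systems)."""
    return subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout


def count_matching_processes(ps_output, keyword):
    """Count the processes whose `ps aux` line contains the keyword."""
    lines = ps_output.splitlines()[1:]  # skip the header row
    return sum(1 for line in lines if keyword in line)


# Hypothetical `ps aux` output.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1 169000 11000 ?        Ss   09:00   0:01 /sbin/init
www-data   812  0.3  1.2 250000 98000 ?        S    09:01   0:12 nginx: worker process
www-data   813  0.2  1.1 250000 97000 ?        S    09:01   0:10 nginx: worker process
"""

print(count_matching_processes(SAMPLE, "nginx"))
```

Because the sketch reads the `ps aux` output directly, no grep process appears in the list and inflates the count. The keyword is matched against the whole line, so it can also match the username column.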

Basic metrics

If your hosts are ECS instances, you can collect the following metrics without the need to install the CloudMonitor agent. CloudMonitor collects data of these metrics every minute.
Metric Description Unit MetricName Dimensions Statistics
(ECS)CPUUtilization The CPU utilization. % CPUUtilization userId and instanceId Maximum, Minimum, and Average
(ECS)InternetInRate(Classic Network) The average rate of inbound Internet traffic. bit/s InternetInRate userId and instanceId Maximum, Minimum, and Average
(ECS)IntranetInRate The average rate of inbound internal network traffic. bit/s IntranetInRate userId and instanceId Maximum, Minimum, and Average
(ECS)InternetOutRate(Classic Network) The average rate of outbound Internet traffic. bit/s InternetOutRate userId and instanceId Maximum, Minimum, and Average
(ECS)IntranetOutRate The average rate of outbound traffic over the internal network. bit/s IntranetOutRate userId and instanceId Maximum, Minimum, and Average
(ECS)DiskReadBPS The number of bytes that are read from the system disk per second. Byte/s DiskReadBPS userId and instanceId Maximum, Minimum, and Average
(ECS)DiskWriteBPS The number of bytes that are written to the system disk per second. Byte/s DiskWriteBPS userId and instanceId Maximum, Minimum, and Average
(ECS)DiskReadIOPS The number of read operations that are performed on the system disks per second. Count/s DiskReadIOPS userId and instanceId Maximum, Minimum, and Average
(ECS)DiskWriteIOPS The number of write operations that are performed on the system disks per second. Count/s DiskWriteIOPS userId and instanceId Average, Minimum, and Maximum
(ECS)InternetInRate_IP The inbound bandwidth from the Internet. bit/s VPC_PublicIP_InternetInRate userId, instanceId, and ip Maximum, Minimum, and Average
(ECS)InternetOutRate_IP The outbound bandwidth to the Internet. bit/s VPC_PublicIP_InternetOutRate userId, instanceId, and ip Maximum, Minimum, and Average
(ECS)InternetOutRatePercent_IP The utilization of the outbound bandwidth to the Internet. % VPC_PublicIP_InternetOutRate_Percent userId, instanceId, and ip Average
(ECS)InternetIn(Classic Network) The inbound traffic over the Internet. Byte InternetIn userId and instanceId Average, Minimum, Maximum, and Sum
(ECS)InternetOut(Classic Network) The outbound traffic over the Internet. Byte InternetOut userId and instanceId Maximum, Minimum, and Average
(ECS)IntranetIn The inbound traffic over the internal network. Byte IntranetIn userId and instanceId Maximum, Minimum, and Average