Metrics for hosts include operating system metrics and Elastic Compute Service (ECS) basic metrics. Operating system metrics are collected every 15 seconds. ECS basic metrics are collected every minute.

Note The data of operating system metrics that are collected by the CloudMonitor agent may be inconsistent with the data of ECS basic metrics due to the following causes:
  • Different monitoring frequencies: The monitoring data that is displayed in a monitoring chart is the average value of the metric data that is collected in a statistical period. The statistical period is 15 seconds for operating system metrics and 1 minute for ECS basic metrics. If metric data fluctuates sharply, the longer statistical period smooths out short spikes, so the peak values of ECS basic metrics are smaller than those of operating system metrics.
  • Different monitored objects: The network traffic data that is collected based on ECS basic metrics is used for billing. The data does not include the free traffic between ECS instances and Server Load Balancer (SLB) instances. The network traffic data that is collected by the CloudMonitor agent is the actual network traffic of each network interface card (NIC). Therefore, the network traffic data that is collected by the CloudMonitor agent is greater than the network traffic data that is collected based on ECS basic metrics. In this case, the data that the CloudMonitor agent collects may also be greater than the purchased bandwidth or traffic quota.
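
The effect of the different statistical periods can be illustrated with a small sketch. The readings below are hypothetical values sampled every 5 seconds; they are not produced by CloudMonitor. The sketch only shows why a 1-minute average hides a spike that 15-second averages still reveal.

```python
from statistics import mean


def window_averages(samples, samples_per_window):
    """Average consecutive, non-overlapping windows of raw samples."""
    return [
        mean(samples[i:i + samples_per_window])
        for i in range(0, len(samples), samples_per_window)
    ]


# Hypothetical CPU readings (%), one every 5 seconds for one minute.
raw = [10, 10, 10, 10, 10, 10, 90, 90, 90, 10, 10, 10]

per_15s = window_averages(raw, 3)    # four 15-second averages
per_60s = window_averages(raw, 12)   # one 1-minute average

print("15-second averages:", per_15s)
print("1-minute average:", per_60s)
```

The highest 15-second average reflects the spike, whereas the single 1-minute average spreads the same spike over the whole minute.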

Operating system metrics

  • CPU metrics
    • Linux

      You can check the output of the top command to understand the metrics that are described in the following table.

    • Windows

      Call the NtQuerySystemInformation function in ntdll.dll to obtain the CPU time consumed by each process or thread. Call this function twice and compare the two results to obtain the CPU utilization of each process or thread in the interval between the two calls.

    Metric | Description | Unit | MetricName | Dimensions | Statistics | Remarks for Linux
    (Agent)cpu.idle | The percentage of time that the CPU is idle. | % | cpu_idle | userId and instanceId | Maximum, Minimum, and Average | The CPU idle time as a percentage of the total CPU time.
    (Agent)cpu.system | The CPU utilization of the kernel. | % | cpu_system | userId and instanceId | Maximum, Minimum, and Average | The CPU utilization of kernel-mode work such as context switches. A high value indicates that too many processes or threads are running on the host.
    (Agent)cpu.user | The CPU utilization of user processes. | % | cpu_user | userId and instanceId | Maximum, Minimum, and Average | The CPU utilization of user processes.
    (Agent)cpu.wait | The percentage of CPU time spent waiting for I/O operations to complete. | % | cpu_wait | userId and instanceId | Maximum, Minimum, and Average | A high value indicates frequent I/O operations.
    (Agent)cpu.other | The percentage of CPU time consumed by other operations. | % | cpu_other | userId and instanceId | Maximum, Minimum, and Average | Calculation method: CPU utilization of Nice + CPU utilization of SoftIrq + CPU utilization of Irq + CPU utilization of Stolen.
    (Agent)cpu.total | The total CPU utilization. | % | cpu_total | userId and instanceId | Maximum, Minimum, and Average | Calculation method: 100% - cpu_idle.
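
The relationship between these CPU metrics can be sketched from two samples of the aggregate `cpu` line in /proc/stat, which is also where the top command reads its data. This is a minimal illustration, not the agent's implementation: the function name `cpu_percentages` and the sample lines are hypothetical, and it assumes the standard field order of /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal).

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def cpu_percentages(prev_line, curr_line):
    """Derive CPU percentages from two samples of the 'cpu' line of /proc/stat."""
    prev = dict(zip(FIELDS, map(int, prev_line.split()[1:9])))
    curr = dict(zip(FIELDS, map(int, curr_line.split()[1:9])))
    # Counters are cumulative, so only the difference between samples matters.
    delta = {name: curr[name] - prev[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("the two samples must be taken at different times")
    pct = {name: 100.0 * value / total for name, value in delta.items()}
    return {
        "cpu_idle": pct["idle"],
        "cpu_system": pct["system"],
        "cpu_user": pct["user"],
        "cpu_wait": pct["iowait"],
        # Nice + SoftIrq + Irq + Stolen, as in the table above.
        "cpu_other": pct["nice"] + pct["softirq"] + pct["irq"] + pct["steal"],
        "cpu_total": 100.0 - pct["idle"],
    }


# Hypothetical samples taken a few seconds apart.
before = "cpu 1000 10 500 8000 200 5 15 20 0 0"
after = "cpu 1300 20 600 8530 230 10 25 35 0 0"

metrics = cpu_percentages(before, after)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

On a live Linux host, the two lines could be read from the first line of /proc/stat at the start and end of the sampling interval.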
  • Memory metrics
    • Linux

      You can check the output of the free command to understand the metrics described in the following table. The free command obtains memory information from the /proc/meminfo file.

    • Windows

      Call the GlobalMemoryStatusEx function in kernel32.dll to obtain the current physical and virtual memory usage in a 32-bit Windows operating system.

    Metric | Description | Unit | MetricName | Dimensions | Statistics | Remarks for Linux
    (Agent)memory.total.space | The total amount of memory. | byte | memory_totalspace | userId and instanceId | Maximum, Minimum, and Average | The total memory amount of the host. Data source: the value of the MemTotal parameter in the /proc/meminfo file.
    (Agent)memory.free.space | The amount of available memory. | byte | memory_freespace | userId and instanceId | Maximum, Minimum, and Average | The amount of available memory in the system. Data source: the value of the MemFree parameter in the /proc/meminfo file.
    (Agent)memory.used.space | The amount of memory in use. | byte | memory_usedspace | userId and instanceId | Maximum, Minimum, and Average | The amount of used memory in the system. Calculation method: Total amount of memory - Amount of available memory.
    (Agent)memory.actualused.space | The amount of memory that is used by users. | byte | memory_actualusedspace | userId and instanceId | Maximum, Minimum, and Average | Calculation method: If the MemAvailable parameter exists in the /proc/meminfo file, Total amount of memory - Value of MemAvailable. If the MemAvailable parameter does not exist, Amount of memory that is used - Amount of memory that is used by buffers - Amount of cached memory. Note: The calculation result is more accurate in CentOS 7.2, Ubuntu 16.04, or their later versions that use the new Linux kernel. For more information about MemAvailable, see commit.
    (Agent)memory.free.utilization | The percentage of available memory. | % | memory_freeutilization | userId and instanceId | Maximum, Minimum, and Average | Calculation method: If the MemAvailable parameter exists in the /proc/meminfo file, Value of MemAvailable/Total amount of memory × 100%. If the MemAvailable parameter does not exist, (Total amount of memory - Value of actualused)/Total amount of memory × 100%.
    (Agent)memory.used.utilization | The memory usage. | % | memory_usedutilization | userId and instanceId | Maximum, Minimum, and Average | Calculation method: If the MemAvailable parameter exists in the /proc/meminfo file, (Total amount of memory - Value of MemAvailable)/Total amount of memory × 100%. If the MemAvailable parameter does not exist, (Total amount of memory - Total amount of available memory - Amount of memory that is used by buffers - Amount of cached memory)/Total amount of memory × 100%.
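
The calculation methods for the memory metrics can be expressed as a short sketch that works on the text of /proc/meminfo. The function names `parse_meminfo` and `memory_metrics` and the sample values are hypothetical; the sketch follows the formulas stated above and is not the agent's source code.

```python
def parse_meminfo(text):
    """Parse the text of /proc/meminfo into a dict of values in bytes."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Sizes in /proc/meminfo are labeled kB and are multiples of 1024 bytes.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def memory_metrics(info):
    """Apply the documented formulas, with and without MemAvailable."""
    total = info["MemTotal"]
    free = info["MemFree"]
    used = total - free
    if "MemAvailable" in info:
        actual_used = total - info["MemAvailable"]
    else:
        # Fallback for older kernels that do not report MemAvailable.
        actual_used = used - info.get("Buffers", 0) - info.get("Cached", 0)
    return {
        "memory_totalspace": total,
        "memory_freespace": free,
        "memory_usedspace": used,
        "memory_actualusedspace": actual_used,
        "memory_freeutilization": 100.0 * (total - actual_used) / total,
        "memory_usedutilization": 100.0 * actual_used / total,
    }


# Hypothetical /proc/meminfo excerpt.
sample = """MemTotal:        8000000 kB
MemFree:         2000000 kB
MemAvailable:    5000000 kB
Buffers:          500000 kB
Cached:          2000000 kB
"""

with_available = memory_metrics(parse_meminfo(sample))
old_kernel = {k: v for k, v in parse_meminfo(sample).items() if k != "MemAvailable"}
without_available = memory_metrics(old_kernel)

print(with_available["memory_usedutilization"])
print(without_available["memory_usedutilization"])
```

The two results differ because MemAvailable is the kernel's own estimate of reclaimable memory, while the fallback only subtracts buffers and cached memory.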
  • Metrics of average system loads
    • Linux

      You can check the output of the top command to understand the metrics described in the following table. A higher value of a metric indicates a busier system.

    • Windows

      The metrics of average system loads are unavailable for hosts that run Windows.

    Metric | Description | Unit | MetricName | Dimensions | Statistics
    (Agent)load.1m | The average system load in the last minute. | N/A | load_1m | userId and instanceId | Maximum, Minimum, and Average
    (Agent)load.5m | The average system load in the last 5 minutes. | N/A | load_5m | userId and instanceId | Maximum, Minimum, and Average
    (Agent)load.15m | The average system load in the last 15 minutes. | N/A | load_15m | userId and instanceId | Maximum, Minimum, and Average
    (Agent)load.1m.percore | The average system load per CPU core in the last minute. | N/A | load_per_core_1m | userId and instanceId | Maximum, Minimum, and Average
    (Agent)load.5m.percore | The average system load per CPU core in the last 5 minutes. | N/A | load_per_core_5m | userId and instanceId | Maximum, Minimum, and Average
    (Agent)load.15m.percore | The average system load per CPU core in the last 15 minutes. | N/A | load_per_core_15m | userId and instanceId | Maximum, Minimum, and Average
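
The per-core variants can be approximated by dividing each load average by the number of CPU cores. The sketch below assumes that this simple division is what "per CPU core" means; the helper name `load_per_core` is hypothetical.

```python
import os


def load_per_core(load_averages, cores):
    """Divide the 1-, 5-, and 15-minute load averages by the core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return tuple(load / cores for load in load_averages)


# Fixed example: a 4-core host with load averages 2.0, 1.0, and 0.5.
example = load_per_core((2.0, 1.0, 0.5), 4)
print(example)

# On Linux, the live values are available through os.getloadavg().
if hasattr(os, "getloadavg"):
    try:
        print(load_per_core(os.getloadavg(), os.cpu_count() or 1))
    except OSError:
        print("load averages are unavailable on this host")
```

A per-core value close to or above 1 suggests that, on average, every core has at least one runnable task.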
  • Disk metrics
    • Linux

      You can check the output of the df command to understand the metrics about disk and inode usage. You can check the output of the iostat command to understand metrics about disk reads and writes.

    • Windows

      Call the GetDiskFreeSpaceExA function in Kernel32.dll to obtain the used disk space, disk usage, free disk space, and total disk space. Call the RegConnectRegistryA function to connect to the HKEY_PERFORMANCE_DATA entry in the registry. Then, call the RegQueryValueExA function to query the disk information in HKEY_PERFORMANCE_DATA, including the read count, write count, read bytes, written bytes, read time, write time, and disk active time.

    Metric | Description | Unit | MetricName | Dimensions | Statistics
    (Agent)disk.usage.used_device | The disk space in use. | byte | diskusage_used | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)disk.usage.utilization_device | The disk usage. | % | diskusage_utilization | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)disk.usage.free_device | The available disk space for regular users and superusers. | byte | diskusage_free | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)disk.usage.avail_device | The amount of available disk space for regular users. | byte | diskusage_avail | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)disk.usage.total_device | The total disk space. | byte | diskusage_total | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)disk.read.bps_device | The number of bytes that are read from the disk per second. | byte/s | disk_readbytes | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)disk.write.bps_device | The number of bytes that are written to the disk per second. | byte/s | disk_writebytes | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)disk.read.iops_device | The number of read requests that the disk receives per second. | request/s | disk_readiops | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)disk.write.iops_device | The number of write requests that the disk receives per second. | request/s | disk_writeiops | userId, instanceId, and device | Maximum, Minimum, and Average
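
The difference between diskusage_free and diskusage_avail comes from blocks that the file system reserves for superusers. The sketch below derives the space metrics from the block counts that `os.statvfs` returns on Linux. The function names are hypothetical, and the utilization formula (used / (used + avail), which is how df computes Use%) is an assumption about how the agent calculates diskusage_utilization.

```python
import os


def usage_from_counts(frsize, blocks, bfree, bavail):
    """Derive disk space metrics from statvfs-style block counts."""
    total = blocks * frsize
    free = bfree * frsize      # free blocks, including those reserved for superusers
    avail = bavail * frsize    # free blocks that regular users can allocate
    used = total - free
    denominator = used + avail
    utilization = 100.0 * used / denominator if denominator else 0.0
    return {
        "diskusage_total": total,
        "diskusage_used": used,
        "diskusage_free": free,
        "diskusage_avail": avail,
        "diskusage_utilization": utilization,
    }


def disk_usage(path):
    """Read the block counts of the file system that contains path."""
    st = os.statvfs(path)
    return usage_from_counts(st.f_frsize, st.f_blocks, st.f_bfree, st.f_bavail)


# Fixed example: 1,000 blocks of 4 KiB, 400 free, 50 of them reserved.
example = usage_from_counts(4096, 1000, 400, 350)
print(example)

# Live values on a Unix-like host.
if hasattr(os, "statvfs"):
    print(disk_usage("/"))
```

Because reserved blocks are excluded from the denominator, the utilization computed this way can reach 100% for regular users while diskusage_free is still greater than zero.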
  • File system metric
    • Linux

      You can check the output of the df command to understand the metric described in the following table.

    • Windows

      The file system metric is unavailable for hosts that run Windows.

    Metric | Description | Unit | MetricName | Dimensions | Statistics | Remarks for Linux
    (Agent)fs.inode.utilization_device | The inode usage. | % | fs_inodeutilization | userId, instanceId, and device | Maximum, Minimum, and Average | Linux operating systems use inode numbers rather than file names to identify files. After all inodes are used up, you cannot create new files even if disk space is still available. Therefore, the inode usage must be monitored. The number of used inodes indicates the number of files. A large number of small files can cause a high inode usage.
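
Inode usage can be checked independently of disk space. The following sketch computes it from the inode counts that `os.statvfs` reports, which correspond to the IUse% column of `df -i`. The function names are hypothetical, and the formula (used inodes / total inodes) is an assumption about how fs_inodeutilization is derived.

```python
import os


def inode_utilization(total_inodes, free_inodes):
    """Return the percentage of inodes in use."""
    if total_inodes <= 0:
        # Some file systems allocate inodes dynamically and report a total of 0.
        return 0.0
    return 100.0 * (total_inodes - free_inodes) / total_inodes


def inode_utilization_of(path):
    """Read the inode counts of the file system that contains path."""
    st = os.statvfs(path)
    return inode_utilization(st.f_files, st.f_ffree)


# Fixed example: 1,000 inodes in total, 250 still free.
example = inode_utilization(1000, 250)
print(example)

# Live value on a Unix-like host.
if hasattr(os, "statvfs"):
    print(inode_utilization_of("/"))
```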
  • Network metrics
    • Linux
      • You can check the output of the ss command to understand the TCP connection metrics.
        Note TCP connections represent connections that are established to ECS instances over TCP.

        By default, the CloudMonitor agent collects the following data about TCP connections in different states: TCP_TOTAL, ESTABLISHED, and NON_ESTABLISHED. TCP_TOTAL indicates the total number of connections. ESTABLISHED indicates the number of established connections. NON_ESTABLISHED indicates the number of connections that are not in the established state.

      • You can check the output of the iftop command to understand the network metrics.
    • Windows

      Call the GetAdaptersAddresses function in Iphlpapi.dll to obtain the addresses of NICs on the host. Call the GetIfTable function to obtain the data of metrics for each interface, including the number of bits that an interface receives and sends per second, number of packets that an interface receives and sends per second, and number of error packets that an interface receives and sends.

    Metric | Description | Unit | MetricName | Dimensions | Statistics
    (Agent)network.in.rate_IP | The number of bits that the NIC receives per second. This is the inbound bandwidth of the NIC. | bit/s | networkin_rate | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)network.out.rate_IP | The number of bits that the NIC sends per second. This is the outbound bandwidth of the NIC. | bit/s | networkout_rate | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)network.in.packages_IP | The number of packets that the NIC receives per second. | packet/s | networkin_packages | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)network.out.packages_IP | The number of packets that the NIC sends per second. | packet/s | networkout_packages | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)network.in.errorpackages_IP | The number of inbound error packets that the driver detects. | packet/s | networkin_errorpackages | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)network.out.errorpackages_IP | The number of outbound error packets that the driver detects. | packet/s | networkout_errorpackages | userId, instanceId, and device | Maximum, Minimum, and Average
    (Agent)network.tcp.connection_state | The number of TCP connections in each state. These connection states include LISTEN, SYN_SENT, ESTABLISHED, SYN_RECV, FIN_WAIT1, CLOSE_WAIT, FIN_WAIT2, LAST_ACK, TIME_WAIT, CLOSING, and CLOSED. | count | net_tcpconnection | userId, instanceId, and state | Maximum, Minimum, and Average
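
The three default TCP counters (TCP_TOTAL, ESTABLISHED, and NON_ESTABLISHED) can be reproduced from /proc/net/tcp, which the ss command also reflects. The sketch below is an illustration under assumptions: the function name and sample text are hypothetical, the state codes are the ones the Linux kernel uses in the `st` column, and only IPv4 sockets are covered (IPv6 sockets are listed in /proc/net/tcp6).

```python
from collections import Counter

# Hexadecimal state codes from the 'st' column of /proc/net/tcp.
# The kernel names state 07 TCP_CLOSE; it is shown here as CLOSED.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSED", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def summarize_tcp(proc_net_tcp_text):
    """Count TCP sockets per state and derive the three default counters."""
    per_state = Counter()
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with an index such as '0:'; the header row does not.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        per_state[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    total = sum(per_state.values())
    established = per_state["ESTABLISHED"]
    summary = {
        "TCP_TOTAL": total,
        "ESTABLISHED": established,
        "NON_ESTABLISHED": total - established,
    }
    return per_state, summary


# Hypothetical excerpt of /proc/net/tcp (trailing columns omitted).
sample = """  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:0CEA 00000000:0000 0A 00000000:00000000
   1: 0F00000A:0016 0200000A:C350 01 00000000:00000000
   2: 0F00000A:0016 0300000A:C351 01 00000000:00000000
   3: 0F00000A:D431 0400000A:01BB 06 00000000:00000000
"""

per_state, summary = summarize_tcp(sample)
print(dict(per_state))
print(summary)
```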
  • Process metrics
    • Linux
      • You can check the output of the top command to understand the CPU utilization and memory usage of processes. The CPU utilization indicates the consumption of multi-core CPUs.
      • You can check the output of the lsof command to understand the Host.process.openfile metric.
      • You can check the output of the ps aux | grep '<Keyword>' command to understand the Host.process.number metric.
    • Windows
      • Query process information

        Call the OpenProcess function in Kernel32.dll to obtain the handle of a process. Call the GetProcessTimes function twice to obtain the CPU time consumed by the process, and then calculate the CPU utilization of the process in the interval between the two calls. Call the RegConnectRegistryA function to connect to the HKEY_PERFORMANCE_DATA entry in the registry. Then, call the RegQueryValueExA function to query the process information in HKEY_PERFORMANCE_DATA, including the process ID, parent process ID, priority, virtual memory, resident memory, shared memory, number of files that the process opens, thread count, page faults, read bytes, and written bytes.

      • Count the number of processes that match the specified keyword
        • Call the OpenProcess function to obtain the handle of a process. Call the NtQueryInformationProcess function in ntdll.dll to obtain RTL_USER_PROCESS_PARAMETERS of the process. Call the ReadProcessMemory function to obtain the arguments and root path of the process from the command line information. This way, you can obtain the directory of the process.
        • Call the OpenProcessToken function to obtain the handle of a token. Call the GetTokenInformation function to obtain the token information. Call the LookupAccountSid function to obtain the username and user group of the process.
        • Match the directory, username, and user group of the process with the keyword. If the process information matches the keyword, increase the value of Host.process.number by 1.
    Metric | Description | Unit | MetricName | Dimensions | Statistics | Remarks
    (Agent)process.cpu_pid | The CPU utilization of a process. | % | process.cpu | userId, instanceId, name, and pid | Average | You cannot configure alert rules for this metric.
    (Agent)process.memory_pid | The memory usage of a process. | % | process.memory | userId, instanceId, name, and pid | Average | You cannot configure alert rules for this metric.
    (Agent)process.openfile_pid | The number of files that are opened by a process. | file | process.openfile | userId, instanceId, name, and pid | Average | You cannot configure alert rules for this metric.
    (Agent)process.count_processname | The number of processes that match the specified keyword. | process | process.number | userId, instanceId, and processName | Average | You cannot configure alert rules for this metric.
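
The keyword-based process count on Linux can be approximated by scanning the command lines under /proc, which is similar in spirit to `ps aux | grep '<Keyword>'`. The helper names below are hypothetical, and the sketch matches only the command line, whereas the ps output also contains the username. It is a simplified illustration rather than the agent's matching logic.

```python
import os


def list_command_lines(proc_root="/proc"):
    """Collect the command line of every process visible under proc_root."""
    command_lines = []
    if not os.path.isdir(proc_root):
        return command_lines
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are process IDs
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as handle:
                raw = handle.read()
        except OSError:
            continue  # the process exited or is not readable
        if raw:
            # Arguments are separated by NUL bytes in /proc/<pid>/cmdline.
            text = raw.replace(b"\0", b" ").decode(errors="replace").strip()
            command_lines.append(text)
    return command_lines


def count_matching_processes(command_lines, keyword):
    """Count the command lines that contain the keyword."""
    return sum(1 for command in command_lines if keyword in command)


# Fixed example with hypothetical command lines.
example_lines = [
    "nginx: master process /usr/sbin/nginx",
    "nginx: worker process",
    "/usr/sbin/sshd -D",
]
example_count = count_matching_processes(example_lines, "nginx")
print(example_count)

# Live count on a Linux host.
print(count_matching_processes(list_command_lines(), "python"))
```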

Basic metrics

If your hosts are ECS instances, the CloudMonitor agent is not required to collect the following metrics. CloudMonitor collects data of these metrics every minute.
Metric | Description | Unit | MetricName | Dimensions | Statistics
(ECS)CPUUtilization | The CPU utilization. | % | CPUUtilization | userId and instanceId | Maximum, Minimum, and Average
(ECS)InternetInRate | The average rate of inbound traffic over the Internet. | bit/s | InternetInRate | userId and instanceId | Maximum, Minimum, and Average
(ECS)IntranetInRate | The average rate of inbound traffic over the internal network. | bit/s | IntranetInRate | userId and instanceId | Maximum, Minimum, and Average
(ECS)InternetOutRate | The average rate of outbound traffic over the Internet. | bit/s | InternetOutRate | userId and instanceId | Maximum, Minimum, and Average
(ECS)IntranetOutRate | The average rate of outbound traffic over the internal network. | bit/s | IntranetOutRate | userId and instanceId | Maximum, Minimum, and Average
(ECS)DiskReadBPS | The number of bytes that are read from the system disk per second. | byte/s | DiskReadBPS | userId and instanceId | Maximum, Minimum, and Average
(ECS)DiskWriteBPS | The number of bytes that are written to the system disk per second. | byte/s | DiskWriteBPS | userId and instanceId | Maximum, Minimum, and Average
(ECS)DiskReadIOPS | The number of read operations performed on the system disk per second. | count/s | DiskReadIOPS | userId and instanceId | Maximum, Minimum, and Average
(ECS)DiskWriteIOPS | The number of write operations performed on the system disk per second. | count/s | DiskWriteIOPS | userId and instanceId | Maximum, Minimum, and Average
(ECS)InternetInRate_IP | The inbound bandwidth from the Internet. | bit/s | VPC_PublicIP_InternetInRate | userId, instanceId, and ip | Maximum, Minimum, and Average
(ECS)InternetOutRate_IP | The outbound bandwidth to the Internet. | bit/s | VPC_PublicIP_InternetOutRate | userId, instanceId, and ip | Maximum, Minimum, and Average
(ECS)InternetOutRatePercent_IP | The utilization of the outbound bandwidth to the Internet. | % | VPC_PublicIP_InternetOutRate_Percent | userId, instanceId, and ip | Maximum, Minimum, and Average
(ECS)InternetIn | The inbound traffic over the Internet. | byte | InternetIn | userId and instanceId | Maximum, Minimum, Average, and Sum
(ECS)InternetOut | The outbound traffic over the Internet. | byte | InternetOut | userId and instanceId | Maximum, Minimum, and Average
(ECS)IntranetIn | The inbound traffic over the internal network. | byte | IntranetIn | userId and instanceId | Maximum, Minimum, and Average
(ECS)IntranetOut | The outbound traffic over the internal network. | byte | IntranetOut | userId and instanceId | Average