Metrics for hosts include operating system metrics and Elastic Compute Service (ECS) basic metrics. Operating system metrics are collected every 15 seconds. ECS basic metrics are collected every minute.

Note The data of operating system metrics that are collected by the CloudMonitor agent may be different from the data of ECS basic metrics. The differences are attributed to the following causes:
  • Different data collection frequencies: The monitoring data that is displayed in a monitoring chart is the average value of the metric data that is collected in a statistical period. ECS basic metrics are collected every minute. Operating system metrics are collected every 15 seconds. If metric data fluctuates significantly, the values of ECS basic metrics may be lower than the values of operating system metrics.
  • Different monitored objects: The network traffic data that is collected based on ECS basic metrics is used for billing. The data does not include the free traffic between ECS instances and Server Load Balancer (SLB) instances. The network traffic data that is collected by the CloudMonitor agent is the actual network traffic of each network interface card (NIC). Therefore, the amount of network traffic that is collected by the CloudMonitor agent can be greater than the amount that is collected based on ECS basic metrics, and it can also exceed the purchased bandwidth or traffic quota.
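The effect of the collection frequency can be illustrated with a minimal Python sketch. The readings are made-up values, and the sketch assumes that each collection is a point-in-time sample; it is not how CloudMonitor is implemented.

```python
from statistics import mean

def period_average(samples):
    """Average of all samples that fall into one statistical period."""
    return mean(samples)

# Hypothetical CPU utilization (%) within one minute, sampled every 15 seconds.
fifteen_second_samples = [10.0, 90.0, 10.0, 10.0]

# Hypothetical single reading for the same minute at a one-minute frequency.
# In this made-up case the reading happens to miss the short spike.
one_minute_samples = [10.0]

os_metric = period_average(fifteen_second_samples)
basic_metric = period_average(one_minute_samples)
print(os_metric, basic_metric)
```

With these values, the 15-second series captures the spike and yields a higher average than the single one-minute reading.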

Operating system metrics

  • CPU metrics
    • Linux

      For more information about the metrics that are described in the following table, see the description of the top command.

    • Windows

      You can call the NtQuerySystemInformation function of the ntdll.dll file to query the CPU time of each process or thread. The function is called twice, and the difference between the two results gives the CPU utilization of each process or thread in the period between the two calls.

    Metric Description Unit MetricName Dimensions Statistics Remarks for Linux
    Host.cpu.idle The CPU idle time in percentage. % cpu_idle userId and instanceId Maximum, Minimum, and Average The percentage of the CPU idle time on the total CPU time.
    Host.cpu.system The CPU utilization of the kernel. % cpu_system userId and instanceId Maximum, Minimum, and Average The CPU utilization for a system context switch. A high value indicates that excessive processes or threads run on the host.
    Host.cpu.user The current CPU utilization of user processes. % cpu_user userId and instanceId Maximum, Minimum, and Average The CPU utilization of user processes.
    Host.cpu.iowait The percentage of CPU time that is spent waiting for I/O operations to complete. % cpu_wait userId and instanceId Maximum, Minimum, and Average A high value indicates frequent I/O operations.
    Host.cpu.other The percentage of the CPU utilization for other operations. % cpu_other userId and instanceId Maximum, Minimum, and Average Calculation method: CPU utilization of Nice + CPU utilization of SoftIrq + CPU utilization of Irq + CPU utilization of Stolen.
    Host.cpu.total The percentage of the current CPU utilization. % cpu_total userId and instanceId Maximum, Minimum, and Average The sum of the preceding utilization values, that is, all preceding metrics except Host.cpu.idle. This metric is commonly used for alerting.
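The calculations in the preceding table can be sketched in Python from two samples of the aggregate `cpu` line in /proc/stat. This is a minimal sketch, not the agent's implementation: the function name and sample lines are hypothetical, and it assumes that cpu_total excludes idle time.

```python
def cpu_percentages(prev_line, curr_line):
    """Compute CPU percentages from two 'cpu' lines of /proc/stat."""
    fields = ("user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal")
    prev = dict(zip(fields, map(int, prev_line.split()[1:9])))
    curr = dict(zip(fields, map(int, curr_line.split()[1:9])))

    # CPU time spent in each state between the two samples.
    delta = {name: curr[name] - prev[name] for name in fields}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("the two samples must be taken at different times")

    pct = {name: 100.0 * value / total for name, value in delta.items()}
    # cpu_other = Nice + SoftIrq + Irq + Stolen, as in the table.
    other = pct["nice"] + pct["softirq"] + pct["irq"] + pct["steal"]
    return {
        "cpu_idle": pct["idle"],
        "cpu_system": pct["system"],
        "cpu_user": pct["user"],
        "cpu_wait": pct["iowait"],
        "cpu_other": other,
        # Assumption: the total is the sum of all non-idle values.
        "cpu_total": pct["system"] + pct["user"] + pct["iowait"] + other,
    }

# Made-up samples in the /proc/stat column order.
sample_prev = "cpu 100 0 50 800 50 0 0 0 0 0"
sample_curr = "cpu 200 10 100 1600 70 5 5 10 0 0"
cpu_result = cpu_percentages(sample_prev, sample_curr)
print(cpu_result)
```

On a Linux host, the two input lines could be read from /proc/stat a few seconds apart.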
  • Memory metrics
    • Linux

      For more information about the metrics that are described in the following table, see the description of the free command. The free command queries the required information from the /proc/meminfo file.

    • Windows

      You can call the GlobalMemoryStatusEx function of the kernel32.dll file to query the current physical and virtual memory usage in a 32-bit Windows operating system.

    Metric Description Unit MetricName Dimensions Statistics Remarks for Linux
    Host.mem.total The total amount of memory. Byte memory_totalspace userId and instanceId Maximum, Minimum, and Average The total amount of memory of the host. Data source: the value of the MemTotal parameter in the /proc/meminfo file.
    Host.mem.free The amount of available memory. Byte memory_freespace userId and instanceId Maximum, Minimum, and Average The amount of available memory in the system. Data source: the value of the MemFree parameter in the /proc/meminfo file.
    Host.mem.used The amount of memory that is used. Byte memory_usedspace userId and instanceId Maximum, Minimum, and Average The amount of memory that is used by the system. Calculation method: Total amount of memory - Amount of available memory.

    Host.mem.actualused The amount of memory that is used by users. Byte memory_actualusedspace userId and instanceId Maximum, Minimum, and Average Calculation method:
    • If the MemAvailable parameter exists in the /proc/meminfo file, the following formula is used for calculation: Total amount of memory - Value of MemAvailable.
    • If the MemAvailable parameter does not exist in the /proc/meminfo file, the following formula is used for calculation: Amount of memory that is used - Amount of memory that is used by buffers - Amount of cached memory.
    Note The calculation result is more accurate in CentOS 7.2 and Ubuntu 16.04 or later systems that use the latest Linux kernel. For more information about the MemAvailable parameter, see commit.
    Host.mem.freeutilization The percentage of available memory. % memory_freeutilization userId and instanceId Maximum, Minimum, and Average Calculation method:
    • If the MemAvailable parameter exists in the /proc/meminfo file, the following formula is used for calculation: Value of MemAvailable/Total amount of memory × 100%.
    • If the MemAvailable parameter does not exist in the /proc/meminfo file, the following formula is used for calculation: (Total amount of memory - Value of Host.mem.actualused)/Total amount of memory × 100%.
    Host.mem.usedutilization The memory usage. % memory_usedutilization userId and instanceId Maximum, Minimum, and Average Calculation method:
    • If the MemAvailable parameter exists in the /proc/meminfo file, the following formula is used for calculation: (Total amount of memory - Value of MemAvailable)/Total amount of memory × 100%.
    • If the MemAvailable parameter does not exist in the /proc/meminfo file, the following formula is used for calculation: (Total amount of memory - Amount of available memory - Amount of memory that is used by buffers - Amount of cached memory)/Total amount of memory × 100%.
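The formulas in the preceding table, including the fallback when MemAvailable is missing, can be expressed as the following Python sketch. The function names and sample values are hypothetical, and the sketch assumes that the buffer and cache amounts come from the Buffers and Cached parameters in /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse the content of /proc/meminfo into a dict of byte values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # /proc/meminfo reports sizes in kB
        info[key.strip()] = value
    return info

def memory_metrics(info):
    """Apply the calculation methods from the memory metrics table."""
    total = info["MemTotal"]
    free = info["MemFree"]
    used = total - free
    if "MemAvailable" in info:
        actual_used = total - info["MemAvailable"]
    else:
        # Fallback when the kernel does not expose MemAvailable.
        actual_used = used - info.get("Buffers", 0) - info.get("Cached", 0)
    return {
        "memory_totalspace": total,
        "memory_freespace": free,
        "memory_usedspace": used,
        "memory_actualusedspace": actual_used,
        "memory_freeutilization": 100.0 * (total - actual_used) / total,
        "memory_usedutilization": 100.0 * actual_used / total,
    }

# Made-up /proc/meminfo excerpts, with and without MemAvailable.
with_available = "MemTotal: 8000 kB\nMemFree: 2000 kB\nMemAvailable: 6000 kB\nBuffers: 500 kB\nCached: 1500 kB"
without_available = "MemTotal: 8000 kB\nMemFree: 2000 kB\nBuffers: 500 kB\nCached: 1500 kB"
new_kernel = memory_metrics(parse_meminfo(with_available))
old_kernel = memory_metrics(parse_meminfo(without_available))
print(new_kernel["memory_usedutilization"], old_kernel["memory_usedutilization"])
```

The two sample inputs give different results for the same host, which shows why the calculation is more accurate when MemAvailable is present.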
  • Metrics of average system loads
    • Linux

      For more information about the metrics that are described in the following table, see the description of the top command. A higher value indicates a busier system.

    • Windows

      The metrics of average system loads are unavailable for Windows hosts.

    Metric Description Unit MetricName Dimensions Statistics
    Host.load1 The average system load in the last minute. N/A load_1m userId and instanceId Maximum, Minimum, and Average
    Host.load5 The average system load in the last 5 minutes. N/A load_5m userId and instanceId Maximum, Minimum, and Average
    Host.load15 The average system load in the last 15 minutes. N/A load_15m userId and instanceId Maximum, Minimum, and Average
    host.loadpercore1 The average system load per CPU core in the last minute. N/A load_per_core_1m userId and instanceId Maximum, Minimum, and Average
    host.loadpercore5 The average system load per CPU core in the last 5 minutes. N/A load_per_core_5m userId and instanceId Maximum, Minimum, and Average
    host.loadpercore15 The average system load per CPU core in the last 15 minutes. N/A load_per_core_15m userId and instanceId Maximum, Minimum, and Average
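As a rough illustration of the per-core metrics, the following Python sketch divides the load averages by the number of CPU cores. The division is an assumption based on the metric descriptions, and the function name and sample input are hypothetical.

```python
def load_metrics(loadavg_text, cpu_count):
    """Derive load metrics from the content of /proc/loadavg."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    # The first three fields are the 1-, 5-, and 15-minute load averages.
    load1, load5, load15 = (float(x) for x in loadavg_text.split()[:3])
    return {
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
        # Assumption: the per-core value is the load divided by the core count.
        "load_per_core_1m": load1 / cpu_count,
        "load_per_core_5m": load5 / cpu_count,
        "load_per_core_15m": load15 / cpu_count,
    }

# Made-up /proc/loadavg content for a host with 4 cores.
load_result = load_metrics("2.00 1.00 0.50 1/234 5678", 4)
print(load_result)
```

On a Linux host, the inputs could come from the /proc/loadavg file and `os.cpu_count()`.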
  • Disk metrics
    • Linux

      For more information about the metrics for disk and inode usage, see the description of the df command. For more information about the metrics for disk reads and writes, see the description of the iostat command.

    • Windows

      You can call the GetDiskFreeSpaceExA function of the Kernel32.dll file to query the used disk space, disk usage, free disk space, and total disk space. You can call the RegConnectRegistryA function to connect to the HKEY_PERFORMANCE_DATA key of the registry. Then, you can call the RegQueryValueExA function to query the disk information in the HKEY_PERFORMANCE_DATA key. The disk information includes the read count, write count, read bytes, written bytes, read time, write time, and disk active time.

    Metric Description Unit MetricName Dimensions Statistics
    Host.diskusage.used The amount of disk space that is used. Byte diskusage_used userId, instanceId, and device Maximum, Minimum, and Average
    Host.disk.utilization The disk usage. % diskusage_utilization userId, instanceId, and device Maximum, Minimum, and Average
    Host.diskusage.free The amount of available disk space for regular users and superusers. Byte diskusage_free userId, instanceId, and device Maximum, Minimum, and Average
    Host.diskusage.avail The amount of available disk space for regular users. Byte diskusage_avail userId, instanceId, and device Maximum, Minimum, and Average
    Host.diskusage.total The total amount of disk space. Byte diskusage_total userId, instanceId, and device Maximum, Minimum, and Average
    Host.disk.readbytes The number of bytes that are read from the disk per second. Byte/s disk_readbytes userId, instanceId, and device Maximum, Minimum, and Average
    Host.disk.writebytes The number of bytes that are written to the disk per second. Byte/s disk_writebytes userId, instanceId, and device Maximum, Minimum, and Average
    Host.disk.readiops The number of read requests that the disk receives per second. count/s disk_readiops userId, instanceId, and device Maximum, Minimum, and Average
    Host.disk.writeiops The number of write requests that the disk receives per second. count/s disk_writeiops userId, instanceId, and device Maximum, Minimum, and Average
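The difference between diskusage_free and diskusage_avail can be shown with the fields that `os.statvfs()` returns on Linux. This is a minimal sketch with hypothetical names and made-up values. It assumes that the usage percentage follows the df convention of used/(used + available), which may differ from the calculation that the agent uses.

```python
from collections import namedtuple

# Stand-in for the os.statvfs() fields used here (same attribute names).
FsStat = namedtuple("FsStat", "f_frsize f_blocks f_bfree f_bavail")

def disk_usage_metrics(st):
    """Compute disk space metrics from statvfs-style fields."""
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize    # free for regular users and superusers
    avail = st.f_bavail * st.f_frsize  # free for regular users only
    used = total - free
    # Assumption: usage is computed as df does, used / (used + avail).
    usable = used + avail
    utilization = 100.0 * used / usable if usable else 0.0
    return {
        "diskusage_total": total,
        "diskusage_used": used,
        "diskusage_free": free,
        "diskusage_avail": avail,
        "diskusage_utilization": utilization,
    }

# Made-up file system: 1,000 blocks of 4,096 bytes, 100 blocks reserved for root.
disk_result = disk_usage_metrics(FsStat(4096, 1000, 400, 300))
print(disk_result)
```

On a Linux host, the result of `os.statvfs("/")` can be passed to the function directly because it exposes the same attribute names.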
  • File system metric
    • Linux

      For more information about the metric that is described in the following table, see the description of the df command.

    • Windows

      The file system metric is unavailable for Windows hosts.

    Metric Description Unit MetricName Dimensions Statistics Remarks for Linux
    Host.fs.inode The inode usage. % fs_inodeutilization userId, instanceId, and device Maximum, Minimum, and Average Linux uses inode numbers instead of file names to identify files. We recommend that you monitor this metric. If all inodes are used up, you cannot create new files even if the system still has available disk space. The number of used inodes reflects the number of files. A large number of small files can cause high inode usage.
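Inode usage can be approximated from the total and free inode counts that `os.statvfs()` reports as f_files and f_ffree. The following Python sketch uses a hypothetical function name and made-up numbers, and the exact formula that the agent uses is an assumption.

```python
def inode_utilization(f_files, f_ffree):
    """Return inode usage in percent from total and free inode counts."""
    if f_files == 0:
        # Some file systems do not report inode counts.
        return 0.0
    return 100.0 * (f_files - f_ffree) / f_files

# Made-up file system with 1,000 inodes, 250 of which are still free.
inode_pct = inode_utilization(1000, 250)
print(inode_pct)
```

A value that approaches 100 means that new files can no longer be created, regardless of the free disk space.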
  • Network metrics
    • Linux
      • For more information about the TCP connection metric that is described in the following table, see the description of the ss command.
        By default, the CloudMonitor agent collects the following TCP connection states: TCP_TOTAL, ESTABLISHED, and NON_ESTABLISHED. TCP_TOTAL indicates the total number of connections. ESTABLISHED indicates the number of established connections. NON_ESTABLISHED indicates the number of connections that are not in the established state. To collect the number of connections in each state, perform the following steps:
        1. Use the root user to log on to the host where the CloudMonitor agent resides.
        2. In the cloudmonitor/config/conf.properties file, change the value of netstat.tcp.disable to false.
        3. Restart the CloudMonitor agent.

          For more information, see How do I restart the CloudMonitor agent?

      • For more information about the network metrics that are described in the following table, see the description of the iftop command.
    • Windows

      The system calls the GetAdaptersAddresses function of the Iphlpapi.dll file to query the addresses of NICs on the host. Then, the system calls the GetIfTable function to query the data of the metrics for each interface. These metrics include the number of bits that an interface receives and sends per second, number of packets that an interface receives and sends per second, and number of error packets that an interface receives and sends.

    Metric Description Unit MetricName Dimensions Statistics
    Host.netin.rate The number of bits that are received by the NIC per second. The value indicates the inbound bandwidth of the NIC. bit/s networkin_rate userId, instanceId, and device Maximum, Minimum, and Average
    Host.netout.rate The number of bits that are sent by the NIC per second. The value indicates the outbound bandwidth of the NIC. bit/s networkout_rate userId, instanceId, and device Maximum, Minimum, and Average
    Host.netin.packages The number of packets that are received by the NIC per second. count/s networkin_packages userId, instanceId, and device Maximum, Minimum, and Average
    Host.netout.packages The number of packets that are sent by the NIC per second. count/s networkout_packages userId, instanceId, and device Maximum, Minimum, and Average
    Host.netin.errorpackage The number of inbound error packets that are detected by the driver. count/s networkin_errorpackages userId, instanceId, and device Maximum, Minimum, and Average
    Host.netout.errorpackages The number of outbound error packets that are detected by the driver. count/s networkout_errorpackages userId, instanceId, and device Maximum, Minimum, and Average
    Host.tcpconnection The number of TCP connections in each state. These connection states include LISTEN, SYN_SENT, ESTABLISHED, SYN_RECV, FIN_WAIT1, CLOSE_WAIT, FIN_WAIT2, LAST_ACK, TIME_WAIT, CLOSING, and CLOSED. Count net_tcpconnection userId, instanceId, and state Maximum, Minimum, and Average
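The relationship between the default TCP states and the per-state counts can be sketched in Python as follows. The function name and the sample list of states are hypothetical; on a real host the states would come from a tool such as ss.

```python
from collections import Counter

def tcp_connection_metrics(states):
    """Summarize a list of TCP connection states."""
    per_state = Counter(states)
    total = sum(per_state.values())
    established = per_state.get("ESTABLISHED", 0)
    summary = {
        "TCP_TOTAL": total,
        "ESTABLISHED": established,
        # Every connection that is not in the established state.
        "NON_ESTABLISHED": total - established,
    }
    return summary, dict(per_state)

# Made-up connection states observed on a host.
observed = ["LISTEN", "ESTABLISHED", "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT"]
tcp_summary, tcp_per_state = tcp_connection_metrics(observed)
print(tcp_summary)
print(tcp_per_state)
```

The summary corresponds to the three states that are collected by default, and the per-state dictionary corresponds to the detailed states that are collected after netstat.tcp.disable is set to false.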
  • Process metrics
    • Linux
      • For more information about the CPU utilization and memory usage of processes, see the description of the top command. The CPU utilization includes the CPU utilization of all cores.
      • For more information about the Host.process.openfile metric, see the description of the lsof command.
      • For more information about the Host.process.number metric, see the description of the ps aux | grep '<Keyword>' command.
    • Windows
      • Query

        The system calls the OpenProcess function of the Kernel32.dll file to obtain the handle of a process. To calculate the CPU utilization of the process in a period of time, the system calls the GetProcessTimes function twice and uses the difference between the two results to obtain the CPU utilization of the process in the period between the two calls. The system calls the RegConnectRegistryA function to connect to the HKEY_PERFORMANCE_DATA key of the registry. Then, the system calls the RegQueryValueExA function to query the process information in the HKEY_PERFORMANCE_DATA key. The information includes the process ID, parent process ID, priority, virtual memory, resident memory, shared memory, number of files that are opened by the process, thread count, page faults, read bytes, and written bytes.

      • Calculate
        • The system calls the OpenProcess function to obtain the handle of a process. Then, the system calls the NtQueryInformationProcess function of the ntdll.dll file to obtain the RTL_USER_PROCESS_PARAMETERS parameter of the process. Lastly, the system calls the ReadProcessMemory function to obtain the arguments and root path of the process from the command line information. The preceding operations allow you to obtain the directory of the process.
        • The system calls the OpenProcessToken function to obtain the handle of a token. Then, the system calls the GetTokenInformation function to obtain the token information. Lastly, the system calls the LookupAccountSid function to obtain the username and user group of the process.
        • The system compares the arguments and root path of each process with the directory path, username, and user group of the process. Each time the specified keyword is matched, the system increments the value of Host.process.number by 1.
    Metric Description Unit MetricName Dimensions Statistics Remarks
    Host.process.cpu The CPU utilization of a process. % process.cpu userId, instanceId, name, and pid Average You cannot configure alert rules for this metric.
    Host.process.memory The memory usage of a process. % process.memory userId, instanceId, name, and pid Average You cannot configure alert rules for this metric.
    Host.process.openfile The number of files that are opened by a process. File process.openfile userId, instanceId, name, and pid Average You cannot configure alert rules for this metric.
    Host.process.number The number of processes that match the specified keyword. Process process.number userId, instanceId, and processName Average You cannot configure alert rules for this metric.
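The keyword matching behind Host.process.number can be sketched in Python. This is an illustration only: the function name, the dictionary keys, and the process list are hypothetical, and the sketch assumes a plain substring match against the user name and command line, similar to ps aux | grep '<Keyword>'.

```python
def count_matching_processes(processes, keyword):
    """Count the processes whose user name or command line contains the keyword."""
    count = 0
    for proc in processes:
        # Hypothetical keys: "user" and "cmdline".
        searchable = " ".join([proc.get("user", ""), proc.get("cmdline", "")])
        if keyword in searchable:
            count += 1
    return count

# Made-up process list.
sample_processes = [
    {"pid": 1, "user": "root", "cmdline": "/sbin/init"},
    {"pid": 812, "user": "www", "cmdline": "/usr/sbin/nginx -g daemon off;"},
    {"pid": 813, "user": "www", "cmdline": "nginx: worker process"},
    {"pid": 990, "user": "app", "cmdline": "/usr/bin/java -jar service.jar"},
]
nginx_count = count_matching_processes(sample_processes, "nginx")
print(nginx_count)
```

Because the match is a substring match in this sketch, a short keyword can match more processes than intended.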

Basic metrics

If your hosts are ECS instances, the CloudMonitor agent is not required to collect the following metrics. CloudMonitor collects these metrics every minute.
Metric Description Unit MetricName Dimensions Statistics
ECS.CPUUtilization The CPU utilization. % CPUUtilization userId and instanceId Maximum, Minimum, and Average
ECS.InternetInRate The average rate of inbound traffic over the Internet in the classic network. bit/s InternetInRate userId and instanceId Maximum, Minimum, and Average
ECS.IntranetInRate The average rate of inbound traffic over the private network. bit/s IntranetInRate userId and instanceId Maximum, Minimum, and Average
ECS.InternetOutRate The average rate of outbound traffic over the Internet in the classic network. bit/s InternetOutRate userId and instanceId Maximum, Minimum, and Average
ECS.IntranetOutRate The average rate of outbound traffic over the private network. bit/s IntranetOutRate userId and instanceId Maximum, Minimum, and Average
ECS.SystemDiskReadbps The number of bytes that are read from the system disk per second. Byte/s DiskReadBPS userId and instanceId Maximum, Minimum, and Average
ECS.SystemDiskWritebps The number of bytes that are written to the system disk per second. Byte/s DiskWriteBPS userId and instanceId Maximum, Minimum, and Average
ECS.SystemDiskReadOps The number of reads from the system disk per second. count/s DiskReadIOPS userId and instanceId Maximum, Minimum, and Average
ECS.SystemDiskWriteOps The number of writes to the system disk per second. count/s DiskWriteIOPS userId and instanceId Maximum, Minimum, and Average
ECS.VPC_PublicIP_InternetInRate The bandwidth of inbound traffic over the Internet for a virtual private cloud (VPC). bit/s VPC_PublicIP_InternetInRate userId, instanceId, and ip Maximum, Minimum, and Average
ECS.VPC_PublicIP_InternetOutRate The bandwidth of outbound traffic over the Internet for a VPC. bit/s VPC_PublicIP_InternetOutRate userId, instanceId, and ip Maximum, Minimum, and Average
ECS.VPC_PublicIP_InternetOutRate_Percent The bandwidth usage of outbound traffic over the Internet for a VPC. % VPC_PublicIP_InternetOutRate_Percent userId, instanceId, and ip Maximum, Minimum, and Average
ECS.InternetIn The inbound traffic over the Internet. Byte InternetIn userId and instanceId Maximum, Minimum, Average, and Sum
ECS.InternetOut The outbound traffic over the Internet. Byte InternetOut userId and instanceId Maximum, Minimum, and Average
ECS.IntranetIn The inbound traffic over the private network. Byte IntranetIn userId and instanceId Maximum, Minimum, and Average
ECS.IntranetOut The outbound traffic over the private network. Byte IntranetOut userId and instanceId Average