Host monitoring metrics are divided into agent-collected metrics and ECS basic metrics. Agent-collected metrics are collected every 15 seconds, and ECS basic metrics are collected every minute.

Note
The ECS basic metric data may differ from the operating system (OS) metric data, mainly for the following reasons:
  • Different statistical frequencies: Metric charts show the average of the values collected during each measurement period. The statistical frequency of basic monitoring is one minute, whereas that of OS monitoring is 15 seconds. When metric data fluctuates sharply, the peaks in basic metric data are lower than those in OS metric data because averaging over the longer period flattens them.
  • Different statistical perspectives: The network traffic data in basic monitoring is billing data and does not include the unbilled network traffic between ECS and Server Load Balancer, whereas the network traffic statistics in OS monitoring record the actual network traffic of each network adapter. Therefore, the network data in OS monitoring can be greater than that in basic monitoring (that is, the agent-collected data can be greater than the actual purchased bandwidth or traffic quota).
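
The effect of the different statistical frequencies can be illustrated with a small sketch. The sample values and the downsample helper below are hypothetical, and the sketch assumes that both views average the same underlying signal; it only shows why a one-minute average hides a short spike that 15-second data still shows.

```python
from statistics import mean

# Hypothetical CPU usage samples (%), one every 15 seconds, covering two minutes.
samples_15s = [10.0, 90.0, 20.0, 40.0, 30.0, 30.0, 30.0, 30.0]


def downsample(values, group_size):
    """Average consecutive groups of `group_size` samples."""
    return [mean(values[i:i + group_size]) for i in range(0, len(values), group_size)]


# Four 15-second samples make up one 1-minute data point.
per_minute = downsample(samples_15s, 4)

print(max(samples_15s))  # highest value at 15-second granularity
print(max(per_minute))   # highest value at 1-minute granularity
```

The 90% spike in the 15-second data is averaged with its neighbors, so the one-minute series never reaches it.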

Agent-collected metrics

  • CPU metrics

    You can refer to the Linux top command to understand the meaning of the metrics.

    Metric | Definition | Unit | Remarks
    Host.cpu.idle | Percentage of CPU that is currently idle | % | Share of CPU capacity that is currently idle.
    Host.cpu.system | Percentage of CPU currently used by kernel space | % | Measures the consumption resulting from system context switching. A high value indicates that many processes or threads are running on the server.
    Host.cpu.user | Percentage of CPU currently used by user space | % | CPU consumption by user processes.
    Host.cpu.iowait | Percentage of CPU currently waiting for I/O operations | % | A high value indicates frequent I/O operations.
    Host.cpu.other | Percentage of CPU used for other purposes | % | Other consumption, calculated as (nice + softirq + irq + stolen).
    Host.cpu.totalUsed | Percentage of total CPU currently consumed | % | The sum of the CPU consumption above, usually used for alarm purposes.
  • Memory metrics

    You can refer to the Linux free command to understand the meaning of the metrics.

    Metric | Definition | Unit | Remarks
    Host.mem.total | Total memory | Bytes | Total memory of the server.
    Host.mem.used | Amount of used memory | Bytes | Memory used by user programs + buffers + cached, where buffers is the memory used for buffers and cached is the memory used for the system cache.
    Host.mem.actualused | Memory actually used by the user | Bytes | Calculated as (used - buffers - cached).
    Host.mem.free | Amount of remaining memory | Bytes | Calculated as (total memory - used memory).
    Host.mem.freeutilization | Percentage of remaining memory | % | Calculated as (remaining memory / total memory * 100).
    Host.mem.usedutilization | Memory usage | % | Calculated as (actual used / total * 100).
  • Average system load metrics

    You can refer to the Linux top command to understand the meaning of the metrics. A higher value indicates a busier system.

    Metric | Definition | Unit
    Host.load1 | Average system load over the past 1 minute. Not available on Windows. | None
    Host.load5 | Average system load over the past 5 minutes. Not available on Windows. | None
    Host.load15 | Average system load over the past 15 minutes. Not available on Windows. | None
  • Disk metrics
    • For disk usage and inode usage, refer to the Linux df command.
    • For disk read/write metrics, refer to the Linux iostat command.
    Metric | Definition | Unit
    Host.diskusage.used | Used storage space on the disk | Bytes
    Host.disk.utilization | Disk usage | %
    Host.diskusage.free | Remaining storage space on the disk | Bytes
    Host.diskusage.total | Total storage space on the disk | Bytes
    Host.disk.readbytes | Number of bytes read from the disk per second | Bytes/s
    Host.disk.writebytes | Number of bytes written to the disk per second | Bytes/s
    Host.disk.readiops | Number of read requests to the disk per second | times/s
    Host.disk.writeiops | Number of write requests to the disk per second | times/s
  • File system metrics
    Metric | Definition | Unit | Remarks
    Host.fs.inode | Inode usage. Unix/Linux systems use inodes to identify files. Even if the disk is not full, no new files can be created on it once all inodes have been allocated. Not available on Windows. | % | The number of inodes in use reflects the number of files in the file system. A large number of small files can cause high inode usage.
  • Network metrics
    • For details about the network metrics, refer to the Linux iftop command. For the collection of TCP connections, refer to the Linux ss command.
    • By default, statistics on the number of TCP connections are collected as TCP_TOTAL (total connections), ESTABLISHED (normally established connections), and NON_ESTABLISHED (connections not in the established state). To obtain the number of connections in each state, follow this procedure:
      • Linux

        Set netstat.tcp.disable in the cloudmonitor/config/conf.properties configuration file to false to enable data collection. Restart the Agent after you modify the configuration.

      • Windows

        Set netstat.tcp.disable in the C:\”Program\Alibaba\cloudmonitor\config configuration file to false to enable data collection. Restart the Agent after you modify the configuration.

    Metric | Definition | Unit
    Host.netin.rate | Number of bits received by the network adapter per second, that is, the uplink bandwidth of the network adapter. | bits/s
    Host.netout.rate | Number of bits sent by the network adapter per second, that is, the downlink bandwidth of the network adapter. | bits/s
    Host.netin.packages | Number of packets received by the network adapter per second. | packets/s
    Host.netout.packages | Number of packets sent by the network adapter per second. | packets/s
    Host.netin.errorpackage | Number of incoming error packets detected by the driver. | packets/s
    Host.netout.errorpackages | Number of outgoing error packets detected by the driver. | packets/s
    Host.tcpconnection | Number of TCP connections in various states, including LISTEN, SYN_SENT, ESTABLISHED, SYN_RECV, FIN_WAIT1, CLOSE_WAIT, FIN_WAIT2, LAST_ACK, TIME_WAIT, CLOSING, and CLOSED. |
  • Process metrics
    • For details regarding process-specific CPU usage and memory usage, refer to the Linux top command. The CPU usage covers consumption across multiple cores.
    • For details about Host.process.openfile, refer to the Linux lsof command.
    • For details about Host.process.number, refer to the Linux ps aux |grep 'keyword' command.
    Metric | Definition | Unit
    Host.process.cpu | CPU usage of a process. | %
    Host.process.memory | Memory usage of a process. | %
    Host.process.openfile | Number of files opened by a process. | Files
    Host.process.number | Number of processes that match the specified keyword. | Processes
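
The calculation formulas given for the CPU and memory metrics above can be sketched in a few lines. This is an illustrative sketch, not the Agent's implementation: the function names and input parameters are hypothetical, and it assumes that Host.cpu.totalUsed sums the system, user, iowait, and other consumption (idle excluded).

```python
def cpu_derived(user, system, iowait, nice, softirq, irq, stolen):
    """Derive the aggregate CPU metrics from per-category percentages."""
    # Host.cpu.other: (nice + softirq + irq + stolen), as stated in the CPU table.
    other = nice + softirq + irq + stolen
    # Assumption: "the sum of the CPU consumption above" excludes idle.
    total_used = user + system + iowait + other
    return {"Host.cpu.other": other, "Host.cpu.totalUsed": total_used}


def mem_derived(total, used, buffers, cached):
    """Derive the memory metrics from values like those reported by free (bytes)."""
    actual_used = used - buffers - cached   # Host.mem.actualused
    free = total - used                     # Host.mem.free
    return {
        "Host.mem.actualused": actual_used,
        "Host.mem.free": free,
        "Host.mem.freeutilization": free / total * 100,
        "Host.mem.usedutilization": actual_used / total * 100,
    }


# Hypothetical input values, chosen only for illustration.
cpu = cpu_derived(user=20, system=10, iowait=5, nice=1, softirq=2, irq=1, stolen=1)
mem = mem_derived(total=8000, used=6000, buffers=500, cached=1500)
print(cpu)
print(mem)
```

Note that Host.mem.usedutilization is based on the actually used memory, so it does not add up to 100% with Host.mem.freeutilization whenever buffers and cache are in use.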

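The default TCP connection statistics described under the network metrics can be pictured as a roll-up of per-state counts. The following sketch is an assumption about how the three default values relate to the per-state values, not the Agent's actual code; summarize_tcp and its input list are hypothetical.

```python
from collections import Counter


def summarize_tcp(states):
    """Roll a list of TCP connection states up into the three default values."""
    per_state = Counter(states)
    total = sum(per_state.values())
    established = per_state.get("ESTABLISHED", 0)
    summary = {
        "TCP_TOTAL": total,
        "ESTABLISHED": established,
        # Assumption: every connection that is not ESTABLISHED counts here.
        "NON_ESTABLISHED": total - established,
    }
    return summary, per_state


# Hypothetical connection states, for example as read from the output of ss.
observed = ["ESTABLISHED", "ESTABLISHED", "TIME_WAIT", "LISTEN", "CLOSE_WAIT"]
summary, per_state = summarize_tcp(observed)
print(summary)
print(dict(per_state))
```

The per-state counts correspond to what Host.tcpconnection reports once netstat.tcp.disable is set to false.
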
ECS metrics

If your host is an ECS instance, the following metrics are provided without agent installation once you purchase the instance. The collection granularity is one minute.

Metric | Definition | Unit
ECS.CPUUtilization | CPU usage. | %
ECS.InternetInRate | Average rate of Internet inbound traffic. | bits/s
ECS.IntranetInRate | Average rate of intranet inbound traffic. | bits/s
ECS.InternetOutRate | Average rate of Internet outbound traffic. | bits/s
ECS.IntranetOutRate | Average rate of intranet outbound traffic. | bits/s
ECS.SystemDiskReadbps | Number of bytes read from the system disk per second. | Bytes/s
ECS.SystemDiskWritebps | Number of bytes written to the system disk per second. | Bytes/s
ECS.SystemDiskReadOps | Number of times data is read from the system disk per second. | times/s
ECS.SystemDiskWriteOps | Number of times data is written to the system disk per second. | times/s
ECS.InternetIn | Internet inbound traffic. | Bytes
ECS.InternetOut | Internet outbound traffic. | Bytes
ECS.IntranetIn | Intranet inbound traffic. | Bytes
ECS.IntranetOut | Intranet outbound traffic. | Bytes
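
The traffic metrics are reported in bytes and the rate metrics in bits per second, so comparing the two requires a unit conversion. The sketch below assumes that a traffic value is the total for one collection period of 60 seconds and that the matching rate is the average over that same period; the table does not state this relationship explicitly, and the function name and sample value are hypothetical.

```python
def average_rate_bits_per_s(traffic_bytes, period_s=60):
    """Convert traffic in bytes over one period into an average rate in bits/s."""
    # Assumption: period_s matches the one-minute collection granularity.
    return traffic_bytes * 8 / period_s


# Hypothetical ECS.InternetIn value for one collection period.
internet_in_bytes = 7_500_000
rate = average_rate_bits_per_s(internet_in_bytes)
print(rate)              # average rate in bits/s
print(rate / 1_000_000)  # same rate in Mbit/s
```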