Host monitoring metrics are divided into agent-collected metrics and ECS basic metrics. Agent-collected metrics are collected every 15 seconds, and ECS basic metrics are collected every minute.
Note: ECS basic metric data may be inconsistent with the metric data reported by the operating system (OS).
- CPU metrics
You can refer to the Linux top command to understand the meaning of the metrics.
| Metric | Definition | Unit | Remark |
| --- | --- | --- | --- |
| Host.cpu.idle | Percentage of CPU that is currently idle | % | Share of CPU time in which the CPU is idle. |
| Host.cpu.system | Percentage of CPU currently used by kernel space | % | This metric measures the consumption caused by system context switching. A high value indicates that many processes or threads are running on the server. |
| Host.cpu.user | Percentage of CPU currently used by user space | % | CPU consumption by user processes. |
| Host.cpu.iowait | Percentage of CPU currently waiting for I/O operations | % | A relatively high value indicates frequent I/O operations. |
| Host.cpu.other | Other CPU usage percentage | % | Other consumption, calculated as (nice + softirq + irq + stolen). |
| Host.cpu.totalUsed | Percentage of total CPU currently consumed | % | The sum of the CPU consumption above, usually used for alarm purposes. |
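
The two derived CPU metrics can be illustrated with a small sketch. The formula for Host.cpu.other comes from the table above. The formula for Host.cpu.totalUsed is only an assumed reading of "the sum of the CPU consumption above" (user + system + iowait + other). The function names and sample values are hypothetical and are not part of the agent.

```python
def cpu_other(nice: float, softirq: float, irq: float, stolen: float) -> float:
    """Host.cpu.other as described in the table: nice + softirq + irq + stolen."""
    return nice + softirq + irq + stolen


def cpu_total_used(user: float, system: float, iowait: float, other: float) -> float:
    """Assumed reading of Host.cpu.totalUsed: the sum of all non-idle categories."""
    return user + system + iowait + other


# Hypothetical sample percentages for one collection interval.
other = cpu_other(nice=0.5, softirq=0.25, irq=0.25, stolen=1.0)
total = cpu_total_used(user=20.0, system=5.0, iowait=3.0, other=other)
print(f"Host.cpu.other={other}%  Host.cpu.totalUsed={total}%")
```
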
- Memory metrics
You can refer to the Linux free command to understand the meaning of the metrics.
| Metric | Definition | Unit | Description |
| --- | --- | --- | --- |
| Host.mem.total | Total memory | Bytes | Total server memory. |
| Host.mem.used | Amount of used memory | Bytes | Memory used by user programs + buffers + cached, where buffers is the memory used for buffers and cached is the memory used for the system cache. |
| Host.mem.actualused | Memory actually used by the user | Bytes | Calculated as (used - buffers - cached). |
| Host.mem.free | Amount of memory remaining | Bytes | Calculated as (total memory - used memory). |
| Host.mem.freeutilization | Percentage of memory remaining | % | Calculated as (remaining memory / total memory * 100). |
| Host.mem.usedutilization | Memory usage | % | Calculated as (actualused / total * 100). |
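
The formulas in the memory table can be worked through as a minimal sketch. The function name and the sample byte counts are hypothetical. Only the arithmetic is taken from the table.

```python
def memory_metrics(total: int, used: int, buffers: int, cached: int) -> dict:
    """Derive the calculated memory metrics from raw byte counts."""
    actual_used = used - buffers - cached      # Host.mem.actualused
    free = total - used                        # Host.mem.free
    return {
        "Host.mem.actualused": actual_used,
        "Host.mem.free": free,
        "Host.mem.freeutilization": free / total * 100,
        "Host.mem.usedutilization": actual_used / total * 100,
    }


# Hypothetical sample values, in bytes.
sample = memory_metrics(total=8000, used=6000, buffers=500, cached=1500)
for name, value in sample.items():
    print(name, value)
```

Note that Host.mem.freeutilization and Host.mem.usedutilization do not have to add up to 100: the first is based on used memory including buffers and cache, while the second is based on actualused, which excludes them.
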
- Average system load metrics
You can refer to the Linux top command to understand the meaning of the metrics. The higher the value, the busier the system.
| Metric | Definition | Unit |
| --- | --- | --- |
| Host.load1 | Average system load over the past 1 minute. Windows operating systems do not have this metric. | None |
| Host.load5 | Average system load over the past 5 minutes. Windows operating systems do not have this metric. | None |
| Host.load15 | Average system load over the past 15 minutes. Windows operating systems do not have this metric. | None |
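
To see the same three values on your own host, a sketch like the following reads them through Python's standard os.getloadavg call. This is only an illustration of the 1, 5, and 15 minute load averages, not the way the agent collects them. The function name load_averages is hypothetical.

```python
import os


def load_averages():
    """Return (load1, load5, load15), or None where no load average is available."""
    getloadavg = getattr(os, "getloadavg", None)
    if getloadavg is None:
        # For example on Windows, which has no load average.
        return None
    try:
        return getloadavg()
    except OSError:
        # Raised when the load average cannot be obtained.
        return None


result = load_averages()
if result is None:
    print("Load averages are not available on this operating system.")
else:
    load1, load5, load15 = result
    print(f"load1={load1} load5={load5} load15={load15}")
```
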
- Disk metrics
- For disk usage and inode usage, refer to the Linux df command.
- For disk read/write metrics, refer to the Linux iostat command.
| Metric | Definition | Unit |
| --- | --- | --- |
| Host.diskusage.used | Used storage space on the disk | Bytes |
| Host.disk.utilization | Disk usage | % |
| Host.diskusage.free | Remaining storage space on the disk | Bytes |
| Host.diskusage.total | Total storage space on the disk | Bytes |
| Host.disk.readbytes | Number of bytes read from the disk per second | Bytes/s |
| Host.disk.writebytes | Number of bytes written to the disk per second | Bytes/s |
| Host.disk.readiops | Number of read requests to the disk per second | Times/s |
| Host.disk.writeiops | Number of write requests to the disk per second | Times/s |
- File system metrics
| Metric | Definition | Unit | Description |
| --- | --- | --- | --- |
| Host.fs.inode | Inode usage. UNIX/Linux systems use inode numbers to identify files. Even when the disk is not full, no new files can be created once all inodes have been allocated. Windows operating systems do not have this metric. | % | The number of inodes in use represents the number of files in the file system. A large number of small files can cause high inode usage. |
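
As a rough illustration of inode usage as a percentage, the sketch below computes used inodes divided by total inodes. It assumes that this matches the way the metric is derived, which the table does not state. The function names are hypothetical. os.statvfs is a standard Python call that is available on UNIX-like systems only.

```python
import os


def inode_usage_percent(total_inodes: int, free_inodes: int) -> float:
    """Percentage of inodes in use; 0.0 for file systems that report no inodes."""
    if total_inodes == 0:
        return 0.0
    return (total_inodes - free_inodes) / total_inodes * 100


def inode_usage_for_path(path: str) -> float:
    """Read the inode counters for the file system containing path (UNIX only)."""
    st = os.statvfs(path)
    return inode_usage_percent(st.f_files, st.f_ffree)


# Hypothetical counters: 1000 inodes in total, 250 still free.
print(inode_usage_percent(1000, 250))
```
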
- Network metrics
- For network traffic metrics, refer to the Linux iftop command. For the collection of TCP connections, refer to the Linux ss command.
- By default, statistics are collected on the number of TCP connections as TCP_TOTAL (total connections), ESTABLISHED (normally established connections), and NON_ESTABLISHED (connections not in the established state). To obtain the number of connections in each state, use the following procedure:
- Linux: Set netstat.tcp.disable in the cloudmonitor/config/conf.properties configuration file to false to enable data collection. Restart the agent after you modify the configuration.
- Windows: Set netstat.tcp.disable in the C:\”Program\Alibaba\cloudmonitor\config configuration file to false to enable data collection. Restart the agent after you modify the configuration.
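
The configuration change described above could be scripted along these lines. This is a minimal sketch that assumes the configuration file uses plain key=value lines. The function name enable_tcp_state_collection is hypothetical. The sketch only rewrites text that you pass in, so reading and writing the actual file and restarting the agent remain separate steps.

```python
def enable_tcp_state_collection(properties_text: str) -> str:
    """Return the properties text with netstat.tcp.disable set to false."""
    key = "netstat.tcp.disable"
    updated = []
    found = False
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Match "key=value" or "key = value"; leave comment lines untouched.
        if not stripped.startswith("#") and stripped.split("=", 1)[0].strip() == key:
            updated.append(f"{key}=false")
            found = True
        else:
            updated.append(line)
    if not found:
        # Assumption: appending the key is acceptable when it is absent.
        updated.append(f"{key}=false")
    return "\n".join(updated) + "\n"


# Hypothetical file content used only for demonstration.
before = "some.other.key=1\nnetstat.tcp.disable=true\n"
print(enable_tcp_state_collection(before))
```
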
| Metric | Definition | Unit |
| --- | --- | --- |
| Host.netin.rate | Number of bits received by the network adapter per second, that is, the uplink bandwidth of the network adapter. | bits/s |
| Host.netout.rate | Number of bits sent by the network adapter per second, that is, the downlink bandwidth of the network adapter. | bits/s |
| Host.netin.packages | Number of packets received by the network adapter per second. | packets/s |
| Host.netout.packages | Number of packets sent by the network adapter per second. | packets/s |
| Host.netin.errorpackage | Number of incoming error packets detected by the driver. | packets/s |
| Host.netout.errorpackages | Number of outgoing error packets detected by the driver. | packets/s |
| Host.tcpconnection | Number of TCP connections in various states, including LISTEN, SYN_SENT, ESTABLISHED, SYN_RECV, FIN_WAIT1, CLOSE_WAIT, FIN_WAIT2, LAST_ACK, TIME_WAIT, CLOSING, and CLOSED. | |
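
The relationship between per-state TCP counts and the three default statistics (TCP_TOTAL, ESTABLISHED, and NON_ESTABLISHED) can be sketched as follows. The function name and the sample counts are hypothetical, and the sketch assumes that every state, including LISTEN, counts toward TCP_TOTAL, which the documentation does not spell out.

```python
def summarize_tcp_states(state_counts: dict) -> dict:
    """Collapse per-state connection counts into the three default statistics."""
    total = sum(state_counts.values())
    established = state_counts.get("ESTABLISHED", 0)
    return {
        "TCP_TOTAL": total,
        "ESTABLISHED": established,
        "NON_ESTABLISHED": total - established,
    }


# Hypothetical per-state counts, such as those obtained with per-state collection enabled.
sample_states = {"LISTEN": 4, "ESTABLISHED": 10, "TIME_WAIT": 6}
summary = summarize_tcp_states(sample_states)
print(summary)
```
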
- Process metrics
- For details about process-specific CPU usage and memory usage, refer to the Linux top command. Process CPU usage reflects consumption across multiple CPU cores.
- For details about Host.process.openfile, refer to the Linux lsof command.
- For details about Host.process.number, refer to the Linux ps aux | grep 'keyword' command.
| Metric | Definition | Unit |
| --- | --- | --- |
| Host.process.cpu | CPU usage of a process. | % |
| Host.process.memory | Memory usage of a process. | % |
| Host.process.openfile | Number of files opened by a process. | Files |
| Host.process.number | Number of processes that match the specified keyword. | Processes |
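
The idea behind Host.process.number, counting the processes whose command line contains a keyword, can be sketched in a few lines. This mirrors the ps aux | grep 'keyword' approach but is not the agent's implementation. The function name and the sample process list are hypothetical.

```python
def count_matching_processes(ps_lines, keyword: str) -> int:
    """Count the process-list lines that contain the keyword."""
    return sum(1 for line in ps_lines if keyword in line)


# Hypothetical lines in the style of `ps aux` output (header row omitted).
sample_ps = [
    "root       101  0.0  0.1  nginx: master process /usr/sbin/nginx",
    "www-data   102  0.2  0.3  nginx: worker process",
    "www-data   103  0.1  0.3  nginx: worker process",
    "mysql      210  1.5  9.8  /usr/sbin/mysqld",
]
print(count_matching_processes(sample_ps, "nginx"))
```

When you run the shell pipeline by hand, the grep process itself can appear in the result, so the manual count may be one higher than the number of matching processes.
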
If your host is an ECS instance, the following metrics are available without installing the agent as soon as you purchase the instance. The collection granularity is one minute.
| Metric | Definition | Unit |
| --- | --- | --- |
| ECS.InternetInRate | Average rate of Internet inbound traffic. | bits/s |
| ECS.IntranetInRate | Average rate of intranet inbound traffic. | bits/s |
| ECS.InternetOutRate | Average rate of Internet outbound traffic. | bits/s |
| ECS.IntranetOutRate | Average rate of intranet outbound traffic. | bits/s |
| ECS.SystemDiskReadbps | Number of bytes read from the system disk per second. | Bytes/s |
| ECS.SystemDiskWritebps | Number of bytes written to the system disk per second. | Bytes/s |
| ECS.SystemDiskReadOps | Number of times data is read from the system disk per second. | times/s |
| ECS.SystemDiskWriteOps | Number of times data is written to the system disk per second. | times/s |
| ECS.InternetIn | Internet inbound traffic. | bytes |
| ECS.InternetOut | Internet outbound traffic. | bytes |
| ECS.IntranetIn | Intranet inbound traffic. | bytes |
| ECS.IntranetOut | Intranet outbound traffic. | bytes |