You can use Prometheus Monitoring for Alibaba Cloud to monitor all internal and external storage resources in your cluster. Internal storage resources include RootFS and ephemeral storage. External storage resources include disks, NAS, and OSS persistent volumes (PVs). Container storage monitoring also provides multi-dimensional basic storage metrics at no cost.
Internal cluster storage resource monitoring
Internal cluster storage resources include RootFS and ephemeral storage.
RootFS: The root file system of a container. It provides the runtime environment and contains all the data and configurations that the container needs at runtime. You can monitor RootFS usage, such as the amount of space used, the usage rate, and the space distribution.
Ephemeral storage: Temporary storage for a running container. It primarily holds temporary files and caches that the container generates at runtime. You can monitor ephemeral storage usage, such as the amount of space used, the usage rate, and the space distribution.
RootFS metrics
If your cluster runs Kubernetes 1.22 or later, you must upgrade the Container Storage Interface (CSI) plug-in to v1.28.3-eb95171-aliyun or later. For the component changelog, see csi-plugin. To upgrade, see Upgrade the CSI component.
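Each metric group in this topic has its own minimum csi-plugin version. The following Python sketch compares version strings so that you can check an installed version against those minimums. It assumes that the numeric prefix (for example, 1.28.3 in v1.28.3-eb95171-aliyun) determines ordering and ignores the commit hash and the -aliyun suffix. The function names and the MINIMUM_CSI_VERSION mapping are hypothetical; only the version values come from this topic.

```python
from typing import Tuple


def parse_csi_version(version: str) -> Tuple[int, ...]:
    """Extract the numeric part of a csi-plugin version string.

    Assumption: versions look like v<major>.<minor>.<patch>[.<build>][-<hash>-aliyun],
    and only the numeric prefix matters for ordering.
    """
    core = version.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Return True if `installed` is numerically at least `required`."""
    a, b = parse_csi_version(installed), parse_csi_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that 1.28.3 and 1.28.3.0 compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Hypothetical mapping of metric groups to the minimum versions quoted in this topic.
MINIMUM_CSI_VERSION = {
    "rootfs": "v1.28.3-eb95171-aliyun",  # required for Kubernetes 1.22 or later
    "ephemeral_storage": "v1.28.3-eb95171-aliyun",
    "disk_pv": "v1.18.8.46-afb19e46-aliyun",
    "nas_pv": "v1.18.8.46-afb19e46-aliyun",
    "oss_pv": "v1.22.14-820d8870-aliyun",
    "fuse_mount_target": "v1.32.2",
}
```

For example, under this assumption meets_minimum("v1.30.1-0000000-aliyun", MINIMUM_CSI_VERSION["oss_pv"]) returns True. The installed version string in that call is made up for illustration.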
Metric | Type | Description |
container_fs_limit_bytes | Gauge | Total RootFS space. Unit: bytes. |
container_fs_usage_bytes | Gauge | Used RootFS space. Unit: bytes. |
container_fs_available_bytes | Gauge | Available RootFS space. Unit: bytes. |
container_fs_inodes_total | Gauge | Total number of inodes in RootFS. |
container_fs_inodes_used | Gauge | Number of used inodes in RootFS. |
container_fs_inodes_free | Gauge | Number of available inodes in RootFS. |
container_fs_reads_bytes_total | Gauge | Total bytes read from RootFS. |
container_fs_writes_bytes_total | Gauge | Total bytes written to RootFS. |
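The RootFS usage rate is not exposed as a separate metric, so it has to be derived from the table. The following Python sketch assumes that you have already fetched one value per metric for a single container. It divides container_fs_usage_bytes by container_fs_limit_bytes for the space usage rate, and container_fs_inodes_used by container_fs_inodes_total for the inode usage rate. The function names are hypothetical, and the PromQL string assumes that both series carry identical label sets so that the division matches them one to one.

```python
from typing import Dict, Optional

# Hypothetical PromQL for the same ratio; assumes both series carry identical labels.
ROOTFS_USAGE_RATIO_PROMQL = "container_fs_usage_bytes / container_fs_limit_bytes"


def usage_ratio(used: float, total: float) -> Optional[float]:
    """Return used / total, or None when the total is zero or negative."""
    if total <= 0:
        return None
    return used / total


def rootfs_summary(sample: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Derive space and inode usage rates from one container's RootFS sample.

    `sample` maps RootFS metric names to their current values.
    """
    return {
        "space_usage_ratio": usage_ratio(
            sample["container_fs_usage_bytes"], sample["container_fs_limit_bytes"]
        ),
        "inode_usage_ratio": usage_ratio(
            sample["container_fs_inodes_used"], sample["container_fs_inodes_total"]
        ),
    }
```

Tracking the inode rate alongside the space rate matters because a file system can run out of inodes while it still has free space, for example when an application writes many small files.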
Ephemeral storage metrics
The following metrics depend on the csi-plugin component. The CSI component must be v1.28.3-eb95171-aliyun or later. For the component changelog, see csi-plugin. To upgrade the component, see Upgrade the CSI component.
Metric | Type | Description |
ephemeral_storage_pod_limit_bytes | Gauge | Total ephemeral storage space for the pod. Unit: bytes. |
ephemeral_storage_pod_usage_bytes | Gauge | Used ephemeral storage space for the pod. Unit: bytes. |
ephemeral_storage_pod_available_bytes | Gauge | Available ephemeral storage space for the pod. Unit: bytes. |
ephemeral_storage_pod_inodes_total | Gauge | Total number of inodes in the ephemeral storage of the pod. |
ephemeral_storage_pod_inodes_used | Gauge | Number of used inodes in the ephemeral storage of the pod. |
ephemeral_storage_pod_inodes_free | Gauge | Number of available inodes in the ephemeral storage of the pod. |
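The kubelet can evict a pod whose ephemeral storage usage exceeds its limit, so it is useful to flag pods that approach their limit. The following Python sketch assumes that you have already collected ephemeral_storage_pod_usage_bytes and ephemeral_storage_pod_limit_bytes into dictionaries keyed by a pod identifier such as namespace/name. The function name, the key format, and the 0.8 default threshold are illustrative choices, not values defined by the CSI component.

```python
from typing import Dict, List, Tuple


def pods_near_ephemeral_limit(
    usage_bytes: Dict[str, float],
    limit_bytes: Dict[str, float],
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (pod, usage_ratio) pairs at or above `threshold`, highest first.

    Pods without a positive limit are skipped because no ratio can be computed.
    """
    flagged = []
    for pod, used in usage_bytes.items():
        limit = limit_bytes.get(pod, 0.0)
        if limit <= 0:
            continue
        ratio = used / limit
        if ratio >= threshold:
            flagged.append((pod, ratio))
    # Most critical pods first.
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged
```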
External cluster storage resource monitoring
Disk persistent volume metrics
The following metrics depend on the csi-plugin component. The CSI component must be v1.18.8.46-afb19e46-aliyun or later. For the component changelog, see csi-plugin. To upgrade the component, see Upgrade the CSI component.
Metric | Type | Description |
node_volume_capacity_bytes_total | Counter | Total disk space. Unit: bytes. |
node_volume_capacity_bytes_used | Counter | Used disk space. Unit: bytes. |
node_volume_capacity_bytes_available | Counter | Available disk space. Unit: bytes. |
node_volume_inodes_total | Counter | Total number of inodes on the disk. |
node_volume_inodes_used | Counter | Number of used inodes on the disk. |
node_volume_inodes_available | Counter | Number of available inodes on the disk. |
node_volume_read_bytes_total | Counter | Total bytes successfully read. |
node_volume_read_completed_total | Counter | Total number of successful read operations. |
node_volume_read_merged_total | Counter | Number of read operations merged by the kernel. |
node_volume_read_time_milliseconds_total | Counter | Total time spent on read operations. Unit: milliseconds. |
node_volume_write_bytes_total | Counter | Total bytes successfully written. |
node_volume_write_completed_total | Counter | Total number of successful write operations. |
node_volume_write_merged_total | Counter | Number of write operations merged by the kernel. |
node_volume_write_time_milliseconds_total | Counter | Total time spent on write operations. Unit: milliseconds. |
node_volume_io_now | Gauge | Number of I/O operations in progress. |
node_volume_io_time_seconds_total | Counter | Total time spent on I/O operations. Unit: seconds. |
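Most disk PV metrics are cumulative counters, so throughput, IOPS, and latency come from the difference between two scrapes. The following Python sketch assumes two snapshots of the same volume taken interval_s seconds apart. It treats a decrease as a counter reset, similar to how PromQL rate() handles resets. It also estimates how busy the device was from node_volume_io_time_seconds_total, assuming that this counter behaves like the I/O time field in Linux /proc/diskstats. The function names are hypothetical.

```python
from typing import Dict

# Cumulative counters from the disk PV table that this sketch differentiates.
DISK_COUNTERS = (
    "node_volume_read_bytes_total",
    "node_volume_read_completed_total",
    "node_volume_read_time_milliseconds_total",
    "node_volume_write_bytes_total",
    "node_volume_write_completed_total",
    "node_volume_write_time_milliseconds_total",
    "node_volume_io_time_seconds_total",
)


def counter_increase(prev: float, curr: float) -> float:
    """Increase of a monotonic counter; a drop is treated as a reset to zero."""
    return curr - prev if curr >= prev else curr


def disk_stats(prev: Dict[str, float], curr: Dict[str, float], interval_s: float) -> Dict[str, float]:
    """Derive per-second disk PV statistics from two scrapes `interval_s` seconds apart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    d = {name: counter_increase(prev[name], curr[name]) for name in DISK_COUNTERS}
    reads = d["node_volume_read_completed_total"]
    writes = d["node_volume_write_completed_total"]
    return {
        "read_bytes_per_s": d["node_volume_read_bytes_total"] / interval_s,
        "write_bytes_per_s": d["node_volume_write_bytes_total"] / interval_s,
        "read_iops": reads / interval_s,
        "write_iops": writes / interval_s,
        # Average time per completed operation, in milliseconds (0.0 when idle).
        "avg_read_latency_ms": d["node_volume_read_time_milliseconds_total"] / reads if reads else 0.0,
        "avg_write_latency_ms": d["node_volume_write_time_milliseconds_total"] / writes if writes else 0.0,
        # Share of the interval during which I/O was in progress.
        "io_busy_ratio": d["node_volume_io_time_seconds_total"] / interval_s,
    }
```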
NAS persistent volume metrics
The following metrics depend on the csi-plugin component. The CSI component must be v1.18.8.46-afb19e46-aliyun or later. For the component changelog, see csi-plugin. To upgrade the component, see Upgrade the CSI component.
Metric | Type | Description |
node_volume_capacity_bytes_total | Counter | Total space of the NAS PV. Unit: bytes. |
node_volume_capacity_bytes_used | Counter | Used space of the NAS PV. Unit: bytes. |
node_volume_capacity_bytes_available | Counter | Available space of the NAS PV. Unit: bytes. |
node_volume_read_bytes_total | Counter | Total bytes successfully read. |
node_volume_read_sent_bytes_total | Counter | Total bytes sent for network requests during read operations. |
node_volume_read_completed_total | Counter | Total number of successful read operations. |
node_volume_read_transmissions_total | Counter | Total number of network requests for read operations. |
node_volume_read_timeouts_total | Counter | Total number of timeouts for read operations. |
node_volume_read_time_milliseconds_total | Counter | Total time spent on read operations. Unit: milliseconds. |
node_volume_read_queue_time_milliseconds_total | Counter | Total queue time for read operations before network transmission. Unit: milliseconds. |
node_volume_read_rtt_time_milliseconds_total | Counter | Total time spent waiting for server responses during read operations. Unit: milliseconds. |
node_volume_write_bytes_total | Counter | Total bytes successfully written. |
node_volume_write_recv_bytes_total | Counter | Total bytes received from network requests during write operations. |
node_volume_write_completed_total | Counter | Total number of successful write operations. |
node_volume_write_transmissions_total | Counter | Total number of network requests for write operations. |
node_volume_write_timeouts_total | Counter | Total number of timeouts for write operations. |
node_volume_write_time_milliseconds_total | Counter | Total time spent on write operations. Unit: milliseconds. |
node_volume_write_queue_time_milliseconds_total | Counter | Total queue time for write operations before network transmission. Unit: milliseconds. |
node_volume_write_rtt_time_milliseconds_total | Counter | Total time spent waiting for server responses during write operations. Unit: milliseconds. |
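The NAS counters split operation time into queueing before network transmission and waiting for the server response. Comparing the two can help distinguish client-side congestion from server or network latency. The following Python sketch computes per-operation averages for one direction (read or write) from two scrapes of the same mount. It divides the time counters by completed operations and reports timeouts per network request. The function name and result keys are hypothetical.

```python
from typing import Dict


def _increase(prev: Dict[str, float], curr: Dict[str, float], name: str) -> float:
    """Counter increase between two scrapes; a drop is treated as a reset."""
    before, after = prev.get(name, 0.0), curr[name]
    return after - before if after >= before else after


def nas_breakdown(prev: Dict[str, float], curr: Dict[str, float], direction: str = "read") -> Dict[str, float]:
    """Average NAS PV timings per completed operation for "read" or "write"."""
    if direction not in ("read", "write"):
        raise ValueError("direction must be 'read' or 'write'")
    p = f"node_volume_{direction}_"
    ops = _increase(prev, curr, p + "completed_total")
    requests = _increase(prev, curr, p + "transmissions_total")

    def per(value: float, count: float) -> float:
        # Avoid division by zero when nothing happened during the interval.
        return value / count if count else 0.0

    return {
        "avg_total_ms": per(_increase(prev, curr, p + "time_milliseconds_total"), ops),
        "avg_queue_ms": per(_increase(prev, curr, p + "queue_time_milliseconds_total"), ops),
        "avg_rtt_ms": per(_increase(prev, curr, p + "rtt_time_milliseconds_total"), ops),
        "timeouts_per_request": per(_increase(prev, curr, p + "timeouts_total"), requests),
    }
```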
OSS persistent volume metrics
The following metrics depend on the csi-plugin component. The CSI component must be v1.22.14-820d8870-aliyun or later. For the component changelog, see csi-plugin. To upgrade the component, see Upgrade the CSI component.
Metric | Type | Description |
node_volume_inode_bytes_total_counter | Counter | Total number of inodes. |
node_volume_inode_bytes_used_counter | Counter | Number of used inodes. |
node_volume_inode_bytes_available_counter | Counter | Number of available inodes. |
node_volume_read_bytes_total_counter | Counter | Total bytes successfully read. |
node_volume_read_completed_total_counter | Counter | Total number of successful read operations. |
node_volume_read_time_milliseconds_total_counter | Counter | Total time spent on read operations. Unit: milliseconds. |
node_volume_write_bytes_total_counter | Counter | Total bytes successfully written. |
node_volume_write_completed_total_counter | Counter | Total number of successful write operations. |
node_volume_write_time_milliseconds_total_counter | Counter | Total time spent on write operations. Unit: milliseconds. |
node_volume_posix_mkdir_total_counter | Counter | Total number of POSIX mkdir operations. |
node_volume_posix_rmdir_total_counter | Counter | Total number of POSIX rmdir operations. |
node_volume_posix_opendir_total_counter | Counter | Total number of POSIX opendir operations. |
node_volume_posix_readdir_total_counter | Counter | Total number of POSIX readdir operations. |
node_volume_posix_read_total_counter | Counter | Total number of POSIX read operations. |
node_volume_posix_write_total_counter | Counter | Total number of POSIX write operations. |
node_volume_posix_flush_total_counter | Counter | Total number of POSIX flush operations. |
node_volume_posix_fsync_total_counter | Counter | Total number of POSIX fsync operations. |
node_volume_posix_release_total_counter | Counter | Total number of POSIX release operations. |
node_volume_posix_create_total_counter | Counter | Total number of POSIX create operations. |
node_volume_posix_open_total_counter | Counter | Total number of POSIX open operations. |
node_volume_posix_access_total_counter | Counter | Total number of POSIX access operations. |
node_volume_posix_rename_total_counter | Counter | Total number of POSIX rename operations. |
node_volume_posix_chown_total_counter | Counter | Total number of POSIX chown operations. |
node_volume_posix_chmod_total_counter | Counter | Total number of POSIX chmod operations. |
node_volume_posix_truncate_total_counter | Counter | Total number of POSIX truncate operations. |
node_volume_oss_put_object_total_counter | Counter | Total number of OSS PutObject requests. |
node_volume_oss_get_object_total_counter | Counter | Total number of OSS GetObject requests. |
node_volume_oss_head_object_total_counter | Counter | Total number of OSS HeadObject requests. |
node_volume_oss_delete_object_total_counter | Counter | Total number of OSS DeleteObject requests. |
node_volume_oss_post_object_total_counter | Counter | Total number of OSS PostObject requests. |
node_volume_hot_spot_read_file_top | Gauge | Hot spot files for read operations. |
node_volume_hot_spot_write_file_top | Gauge | Hot spot files for write operations. |
node_volume_hot_spot_head_file_top | Gauge | Hot spot files for head operations. |
node_volume_backend_read_bytes_total_counter | Counter | Total bytes successfully read from the OSS backend. |
node_volume_backend_write_bytes_total_counter | Counter | Total bytes successfully written to the OSS backend. |
node_volume_backend_read_completed_total_counter | Counter | Total number of successful read operations on the OSS backend. |
node_volume_backend_write_completed_total_counter | Counter | Total number of successful write operations on the OSS backend. |
node_volume_backend_read_time_milliseconds_total_counter | Counter | Total time spent on read operations on the OSS backend. Unit: milliseconds. |
node_volume_backend_write_time_milliseconds_total_counter | Counter | Total time spent on write operations on the OSS backend. Unit: milliseconds. |
node_volume_backend_posix_getattr_total_counter | Counter | Total number of POSIX getattr operations on the OSS backend. |
node_volume_backend_posix_getmode_total_counter | Counter | Total number of POSIX getmode operations on the OSS backend. |
node_volume_backend_posix_access_total_counter | Counter | Total number of POSIX access operations on the OSS backend. |
node_volume_backend_posix_lookup_total_counter | Counter | Total number of POSIX lookup operations on the OSS backend. |
node_volume_backend_posix_mknod_total_counter | Counter | Total number of POSIX mknod operations on the OSS backend. |
node_volume_backend_posix_remove_total_counter | Counter | Total number of POSIX remove operations on the OSS backend. |
node_volume_backend_posix_setattr_total_counter | Counter | Total number of POSIX setattr operations on the OSS backend. |
node_volume_backend_posix_link_total_counter | Counter | Total number of POSIX link operations on the OSS backend. |
node_volume_backend_posix_readlink_total_counter | Counter | Total number of POSIX readlink operations on the OSS backend. |
node_volume_backend_posix_statfs_total_counter | Counter | Total number of POSIX statfs operations on the OSS backend. |
node_volume_backend_posix_rename_total_counter | Counter | Total number of POSIX rename operations on the OSS backend. |
node_volume_backend_posix_readdir_total_counter | Counter | Total number of POSIX readdir operations on the OSS backend. |
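With this many per-operation counters, ranking operations by rate is a quick way to see what an application actually does on an OSS PV. The following Python sketch does this for the front-end POSIX counters (node_volume_posix_*_total_counter) from two scrapes. It deliberately ignores the node_volume_backend_posix_* counters, which you could rank the same way with a different pattern. The function name and the regular expression are assumptions based only on the metric names in the table.

```python
import re
from typing import Dict, List, Tuple

# Front-end POSIX counters, for example node_volume_posix_mkdir_total_counter.
# Backend counters (node_volume_backend_posix_...) do not match this pattern.
POSIX_COUNTER = re.compile(r"^node_volume_posix_(\w+?)_total_counter$")


def top_posix_ops(
    prev: Dict[str, float], curr: Dict[str, float], interval_s: float, limit: int = 3
) -> List[Tuple[str, float]]:
    """Return up to `limit` (operation, ops_per_second) pairs, busiest first."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = []
    for name, value in curr.items():
        match = POSIX_COUNTER.match(name)
        if match is None:
            continue
        before = prev.get(name, 0.0)
        increase = value - before if value >= before else value  # reset-aware
        rates.append((match.group(1), increase / interval_s))
    # Sort by rate (descending), then by name so that ties are deterministic.
    rates.sort(key=lambda item: (-item[1], item[0]))
    return rates[:limit]
```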
FUSE mount target metrics
The following metrics are being released in phases and depend on the csi-plugin component. The component must be v1.32.2 or later. For the component changelog, see csi-plugin. To upgrade the component, see Upgrade the CSI component.
Metric | Type | Description |
node_volume_mount_retry_count | Counter | The number of retries triggered when CSI creates a Filesystem in Userspace (FUSE) mount target due to a mount timeout or an execution fault. A continuously increasing value usually indicates a mount problem, which causes the related application pods to remain in the |
node_volume_mount_point_failover_count | Counter | The total number of times a FUSE mount target successfully performed a failover and recovered to a healthy state after a disconnection caused by events such as a client process crash. This metric is supported by only some client types. For unsupported clients, this value is always |
node_volume_mount_point_status | Gauge | The real-time health status of the mount target. |
node_volume_last_fuse_client_exit_reason | Gauge | The timestamp and reason for the last unexpected exit of the FUSE client process. For example, |
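The description of node_volume_mount_retry_count notes that a continuously increasing value usually indicates a mount problem. One way to turn that into a check is to require an increase in every one of the last few scrape intervals. The following Python sketch assumes that you already have the counter's recent values for one mount target in chronological order. The function name and the default window of three intervals are arbitrary, hypothetical choices, not thresholds recommended by the CSI component.

```python
from typing import Sequence


def retries_keep_rising(samples: Sequence[float], window: int = 3) -> bool:
    """Return True if node_volume_mount_retry_count rose in each of the last `window` intervals.

    `samples` holds successive values of the counter for one mount target, oldest first.
    """
    if window < 1 or len(samples) < window + 1:
        # Not enough history to judge a sustained increase.
        return False
    recent = samples[-(window + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

In PromQL, a roughly comparable alert condition might be increase(node_volume_mount_retry_count[10m]) > 0, although that fires on any increase within the window rather than on a sustained one.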