Container Service for Kubernetes (ACK) uses Prometheus Monitoring for Alibaba Cloud to collect storage metrics from all storage resources in your cluster — internal resources like RootFS and ephemeral storage, and external persistent volumes (PVs) backed by disk, NAS, or OSS. All metrics are available at no additional cost.
Prerequisites
Before using storage metrics, make sure the csi-plugin component meets the minimum version requirement for each storage type:
| Storage type | Minimum csi-plugin version | Additional requirement |
|---|---|---|
| RootFS | v1.28.3-eb95171-aliyun | Kubernetes 1.22 or later |
| Ephemeral storage | v1.28.3-eb95171-aliyun | — |
| Disk PV | v1.18.8.46-afb19e46-aliyun | — |
| NAS PV | v1.18.8.46-afb19e46-aliyun | — |
| OSS PV | v1.22.14-820d8870-aliyun | — |
| FUSE mount target | v1.32.2 | Phased release |
For the csi-plugin changelog, see csi-plugin. To upgrade, see Upgrade the CSI component.
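If you track csi-plugin versions in a script, you can check the table above mechanically. The following Python sketch is illustrative only. It assumes that comparing the dotted numeric prefix of a tag orders releases correctly; for example, it uses 1.18.8.46 from v1.18.8.46-afb19e46-aliyun and ignores the commit-hash suffix. It does not account for the phased release of FUSE mount target metrics. The function names are hypothetical.

```python
import re

# Minimum csi-plugin version per storage type, copied from the prerequisites table.
MIN_VERSIONS = {
    "RootFS": "v1.28.3-eb95171-aliyun",
    "Ephemeral storage": "v1.28.3-eb95171-aliyun",
    "Disk PV": "v1.18.8.46-afb19e46-aliyun",
    "NAS PV": "v1.18.8.46-afb19e46-aliyun",
    "OSS PV": "v1.22.14-820d8870-aliyun",
    "FUSE mount target": "v1.32.2",
}


def numeric_version(tag: str) -> tuple[int, ...]:
    """Extract the dotted numeric prefix: 'v1.18.8.46-afb19e46-aliyun' -> (1, 18, 8, 46)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supported_storage_types(installed: str) -> list[str]:
    """Storage types whose minimum csi-plugin version is met by the installed tag."""
    current = numeric_version(installed)
    # Tuples compare element by element, so (1, 22, 14) >= (1, 18, 8, 46).
    return [kind for kind, minimum in MIN_VERSIONS.items()
            if current >= numeric_version(minimum)]
```

For example, supported_storage_types("v1.22.14-820d8870-aliyun") returns Disk PV, NAS PV, and OSS PV. That version is below the minimum for RootFS and ephemeral storage.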
Internal storage metrics
Internal storage resources are attached to the container runtime itself. ACK monitors two types:
- RootFS: The root filesystem for the container runtime. It includes all the data and configurations required by the container runtime.
- Ephemeral storage: Temporary storage for the container runtime. It primarily stores temporary files and cache generated by the container runtime.
RootFS metrics
All RootFS metrics are of type Gauge — values can increase or decrease and reflect the current state at the time of measurement.
| Metric | Type | Description |
|---|---|---|
| container_fs_limit_bytes | Gauge | Total RootFS space. Unit: bytes. |
| container_fs_usage_bytes | Gauge | Used RootFS space. Unit: bytes. |
| container_fs_available_bytes | Gauge | Available RootFS space. Unit: bytes. |
| container_fs_inodes_total | Gauge | Total number of inodes in RootFS. |
| container_fs_inodes_used | Gauge | Number of used inodes in RootFS. |
| container_fs_inodes_free | Gauge | Number of available inodes in RootFS. |
| container_fs_reads_bytes_total | Gauge | Total bytes read from RootFS. |
| container_fs_writes_bytes_total | Gauge | Total bytes written to RootFS. |
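Either space or inodes can run out first, so it helps to check both. The sketch below turns one scrape of the RootFS gauges into usage ratios. It assumes you have already read the values from Prometheus for a single container and that the totals are nonzero. The function names are hypothetical.

```python
def rootfs_ratios(limit_bytes: float, usage_bytes: float, available_bytes: float,
                  inodes_total: float, inodes_used: float) -> dict[str, float]:
    """Turn one scrape of the RootFS gauges into ratios between 0 and 1.

    limit_bytes and inodes_total must be nonzero.
    """
    return {
        # container_fs_usage_bytes / container_fs_limit_bytes
        "bytes_used": usage_bytes / limit_bytes,
        # 1 - container_fs_available_bytes / container_fs_limit_bytes; this can be
        # higher than bytes_used if some space is neither used nor available
        "bytes_unavailable": 1 - available_bytes / limit_bytes,
        # container_fs_inodes_used / container_fs_inodes_total
        "inodes_used": inodes_used / inodes_total,
    }


def closest_to_exhaustion(ratios: dict[str, float]) -> tuple[str, float]:
    """Return the ratio nearest to 1, because bytes and inodes can each run out first."""
    name = max(ratios, key=lambda key: ratios[key])
    return name, ratios[name]
```

Consider a RootFS with 50% of its bytes used but 90% of its inodes used. It is closer to exhaustion on inodes, which typically happens when a workload creates many small files.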
Ephemeral storage metrics
All ephemeral storage metrics are of type Gauge.
| Metric | Type | Description |
|---|---|---|
| ephemeral_storage_pod_limit_bytes | Gauge | Total ephemeral storage space for the pod. Unit: bytes. |
| ephemeral_storage_pod_usage_bytes | Gauge | Used ephemeral storage space for the pod. Unit: bytes. |
| ephemeral_storage_pod_available_bytes | Gauge | Available ephemeral storage space for the pod. Unit: bytes. |
| ephemeral_storage_pod_inodes_total | Gauge | Total number of inodes in the ephemeral storage of the pod. |
| ephemeral_storage_pod_inodes_used | Gauge | Number of used inodes in the ephemeral storage of the pod. |
| ephemeral_storage_pod_inodes_free | Gauge | Number of available inodes in the ephemeral storage of the pod. |
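A usage ratio shows how full a pod's ephemeral storage is now, but not how soon it will fill. The sketch below extrapolates linearly from two samples. It is a deliberately simplified stand-in for PromQL's predict_linear(), which fits a regression over a whole range of samples. The function name is hypothetical, and the same approach could be applied to the inode gauges.

```python
from typing import Optional


def seconds_until_full(t0: float, usage0: float, t1: float, usage1: float,
                       available1: float) -> Optional[float]:
    """Linearly extrapolate when a pod's ephemeral storage runs out of space.

    t0, t1: scrape timestamps in seconds (t1 later than t0)
    usage0, usage1: ephemeral_storage_pod_usage_bytes at t0 and t1
    available1: ephemeral_storage_pod_available_bytes at t1
    Returns None when usage is flat or shrinking.
    """
    if t1 <= t0:
        raise ValueError("t1 must be later than t0")
    growth_per_second = (usage1 - usage0) / (t1 - t0)
    if growth_per_second <= 0:
        return None
    return available1 / growth_per_second
```

Suppose usage grows from 1,000 to 1,600 bytes over 60 seconds (10 bytes per second) and 6,000 bytes are still available. The function then returns 600.0, which means about ten minutes at the current rate.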
External storage metrics
External storage resources are PVs provisioned from disk, NAS, or OSS. ACK exposes separate metric sets for each PV type, reflecting the different I/O characteristics of each backend.
Disk PV metrics
| Metric | Type | Description |
|---|---|---|
| node_volume_capacity_bytes_total | Counter | Total disk space. Unit: bytes. |
| node_volume_capacity_bytes_used | Counter | Used disk space. Unit: bytes. |
| node_volume_capacity_bytes_available | Counter | Available disk space. Unit: bytes. |
| node_volume_inodes_total | Counter | Total number of inodes on the disk. |
| node_volume_inodes_used | Counter | Number of used inodes on the disk. |
| node_volume_inodes_available | Counter | Number of available inodes on the disk. |
| node_volume_read_bytes_total | Counter | Total bytes successfully read. |
| node_volume_read_completed_total | Counter | Total number of successful read operations. |
| node_volume_read_merged_total | Counter | Number of read operations merged by the kernel. |
| node_volume_read_time_milliseconds_total | Counter | Total time spent on read operations. Unit: milliseconds. |
| node_volume_write_bytes_total | Counter | Total bytes successfully written. |
| node_volume_write_completed_total | Counter | Total number of successful write operations. |
| node_volume_write_merged_total | Counter | Number of write operations merged by the kernel. |
| node_volume_write_time_milliseconds_total | Counter | Total time spent on write operations. Unit: milliseconds. |
| node_volume_io_now | Gauge | Number of I/O operations in progress. |
| node_volume_io_time_seconds_total | Counter | Total time spent on I/O operations. Unit: seconds. |
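The disk PV counters are cumulative, so throughput, IOPS, and latency come from the difference between two scrapes. The sketch below assumes these counters follow the semantics of the Linux /proc/diskstats fields that tools such as iostat read. It omits the merged counters for brevity, and the dataclass and function names are hypothetical. In PromQL you would typically apply rate() over a range instead of picking two samples by hand.

```python
from dataclasses import dataclass


@dataclass
class DiskCounters:
    """Cumulative disk PV counters from one scrape."""
    read_bytes: float       # node_volume_read_bytes_total
    read_completed: float   # node_volume_read_completed_total
    read_time_ms: float     # node_volume_read_time_milliseconds_total
    write_bytes: float      # node_volume_write_bytes_total
    write_completed: float  # node_volume_write_completed_total
    write_time_ms: float    # node_volume_write_time_milliseconds_total
    io_time_s: float        # node_volume_io_time_seconds_total


def _per_op(total: float, ops: float) -> float:
    """Average per operation, or 0.0 when no operation completed."""
    return total / ops if ops > 0 else 0.0


def disk_stats(prev: DiskCounters, cur: DiskCounters, interval_s: float) -> dict[str, float]:
    """Derive iostat-style figures from two scrapes taken interval_s seconds apart."""
    reads = cur.read_completed - prev.read_completed
    writes = cur.write_completed - prev.write_completed
    return {
        "read_iops": reads / interval_s,
        "write_iops": writes / interval_s,
        "read_bytes_per_s": (cur.read_bytes - prev.read_bytes) / interval_s,
        "write_bytes_per_s": (cur.write_bytes - prev.write_bytes) / interval_s,
        # Average milliseconds per completed operation (comparable to iostat r_await/w_await).
        "read_await_ms": _per_op(cur.read_time_ms - prev.read_time_ms, reads),
        "write_await_ms": _per_op(cur.write_time_ms - prev.write_time_ms, writes),
        # Share of the interval during which I/O was in progress (comparable to iostat %util / 100).
        "utilization": (cur.io_time_s - prev.io_time_s) / interval_s,
    }
```

For example, 100 reads with 250 ms of total read time over a 10-second interval give 10 read IOPS and an average read latency of 2.5 ms.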
NAS PV metrics
NAS PV metrics include network-layer details — sent/received bytes, transmission counts, timeouts, and round-trip times — that reflect the NFS protocol behavior underlying NAS access.
| Metric | Type | Description |
|---|---|---|
| node_volume_capacity_bytes_total | Counter | Total space of the NAS PV. Unit: bytes. |
| node_volume_capacity_bytes_used | Counter | Used space of the NAS PV. Unit: bytes. |
| node_volume_capacity_bytes_available | Counter | Available space of the NAS PV. Unit: bytes. |
| node_volume_read_bytes_total | Counter | Total bytes successfully read. |
| node_volume_read_sent_bytes_total | Counter | Total bytes sent for network requests during read operations. |
| node_volume_read_completed_total | Counter | Total number of successful read operations. |
| node_volume_read_transmissions_total | Counter | Total number of network requests for read operations. |
| node_volume_read_timeouts_total | Counter | Total number of timeouts for read operations. |
| node_volume_read_time_milliseconds_total | Counter | Total time spent on read operations. Unit: milliseconds. |
| node_volume_read_queue_time_milliseconds_total | Counter | Total queue time for read operations before network transmission. Unit: milliseconds. |
| node_volume_read_rtt_time_milliseconds_total | Counter | Total time spent waiting for server responses during read operations. Unit: milliseconds. |
| node_volume_write_bytes_total | Counter | Total bytes successfully written. |
| node_volume_write_recv_bytes_total | Counter | Total bytes received from network requests during write operations. |
| node_volume_write_completed_total | Counter | Total number of successful write operations. |
| node_volume_write_transmissions_total | Counter | Total number of network requests for write operations. |
| node_volume_write_timeouts_total | Counter | Total number of timeouts for write operations. |
| node_volume_write_time_milliseconds_total | Counter | Total time spent on write operations. Unit: milliseconds. |
| node_volume_write_queue_time_milliseconds_total | Counter | Total queue time for write operations before network transmission. Unit: milliseconds. |
| node_volume_write_rtt_time_milliseconds_total | Counter | Total time spent waiting for server responses during write operations. Unit: milliseconds. |
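The NAS PV counters make it possible to split the average latency of an operation into time queued on the client and time spent waiting for the server. The sketch below works on counter increases over one interval, for either reads or writes. It makes two assumptions about how ACK derives these metrics:

- The total time includes both the queue time and the round-trip time, as in the per-operation statistics of the Linux NFS client.
- Network requests in excess of completed operations are retries.

The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NfsOpDelta:
    """Increase of one direction's NAS PV counters (read or write) over an interval."""
    completed: float      # node_volume_{read,write}_completed_total
    transmissions: float  # node_volume_{read,write}_transmissions_total
    timeouts: float       # node_volume_{read,write}_timeouts_total
    time_ms: float        # node_volume_{read,write}_time_milliseconds_total
    queue_ms: float       # node_volume_{read,write}_queue_time_milliseconds_total
    rtt_ms: float         # node_volume_{read,write}_rtt_time_milliseconds_total


def breakdown(d: NfsOpDelta) -> dict[str, float]:
    """Split average per-operation latency into queueing and server round trips."""
    if d.completed <= 0:
        return {}
    avg_total = d.time_ms / d.completed
    avg_queue = d.queue_ms / d.completed
    avg_rtt = d.rtt_ms / d.completed
    return {
        "avg_total_ms": avg_total,
        "avg_queue_ms": avg_queue,
        "avg_rtt_ms": avg_rtt,
        # Time not explained by queueing or the round trip (assumed client-side
        # overhead), clamped at zero.
        "avg_other_ms": max(avg_total - avg_queue - avg_rtt, 0.0),
        # Network requests beyond one per completed operation, assumed to be retries.
        "extra_transmissions_per_op": max(d.transmissions - d.completed, 0.0) / d.completed,
        "timeouts_per_op": d.timeouts / d.completed,
    }
```

If the average RTT dominates, the server or the network is the likely bottleneck. If queue time dominates, requests are waiting on the client before they are sent.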
OSS PV metrics
OSS PV metrics cover three layers:

- The POSIX interface that the mount exposes to the application.
- The OSS API requests sent to OSS.
- The read/write and POSIX metadata operations performed against the OSS backend.

Hot spot metrics additionally identify the files that receive the most operations.
Inode and basic read/write metrics
| Metric | Type | Description |
|---|---|---|
| node_volume_inode_bytes_total_counter | Counter | Total number of inodes in the OSS PV. |
| node_volume_inode_bytes_used_counter | Counter | Number of used inodes in the OSS PV. |
| node_volume_inode_bytes_available_counter | Counter | Number of available inodes in the OSS PV. |
| node_volume_read_bytes_total_counter | Counter | Total bytes successfully read. |
| node_volume_read_completed_total_counter | Counter | Total number of successful read operations. |
| node_volume_read_time_milliseconds_total_counter | Counter | Total time spent on read operations. Unit: milliseconds. |
| node_volume_write_bytes_total_counter | Counter | Total bytes successfully written. |
| node_volume_write_completed_total_counter | Counter | Total number of successful write operations. |
| node_volume_write_time_milliseconds_total_counter | Counter | Total time spent on write operations. Unit: milliseconds. |
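Averages for these cumulative counters should be computed from their increase over a window. If the process that exposes a counter restarts, the counter can drop back toward zero. A naive last-minus-first subtraction then produces a negative or misleading value. The sketch below shows reset-aware accumulation. Whether and when the OSS PV counters reset is an assumption here, and the function names are hypothetical.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase of a cumulative counter across successive scrapes.

    A drop between two samples is treated as a counter reset: the post-reset value
    is counted as new increase, as Prometheus does in rate() and increase(), but
    without their extrapolation to the edges of the query range.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def avg_latency_ms(time_ms_samples: list[float], completed_samples: list[float]) -> float:
    """Average latency per operation over the window.

    Typical inputs: node_volume_read_time_milliseconds_total_counter samples and
    node_volume_read_completed_total_counter samples.
    Returns 0.0 if no operation completed.
    """
    ops = counter_increase(completed_samples)
    return counter_increase(time_ms_samples) / ops if ops > 0 else 0.0
```

For example, the samples 100, 150, 20, 50 contain one reset. Their increase is 50 + 20 + 30 = 100, not the -50 that last-minus-first would give.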
POSIX operation metrics
| Metric | Type | Description |
|---|---|---|
| node_volume_posix_mkdir_total_counter | Counter | Total number of POSIX mkdir operations. |
| node_volume_posix_rmdir_total_counter | Counter | Total number of POSIX rmdir operations. |
| node_volume_posix_opendir_total_counter | Counter | Total number of POSIX opendir operations. |
| node_volume_posix_readdir_total_counter | Counter | Total number of POSIX readdir operations. |
| node_volume_posix_read_total_counter | Counter | Total number of POSIX read operations. |
| node_volume_posix_write_total_counter | Counter | Total number of POSIX write operations. |
| node_volume_posix_flush_total_counter | Counter | Total number of POSIX flush operations. |
| node_volume_posix_fsync_total_counter | Counter | Total number of POSIX fsync operations. |
| node_volume_posix_release_total_counter | Counter | Total number of POSIX release operations. |
| node_volume_posix_create_total_counter | Counter | Total number of POSIX create operations. |
| node_volume_posix_open_total_counter | Counter | Total number of POSIX open operations. |
| node_volume_posix_access_total_counter | Counter | Total number of POSIX access operations. |
| node_volume_posix_rename_total_counter | Counter | Total number of POSIX rename operations. |
| node_volume_posix_chown_total_counter | Counter | Total number of POSIX chown operations. |
| node_volume_posix_chmod_total_counter | Counter | Total number of POSIX chmod operations. |
| node_volume_posix_truncate_total_counter | Counter | Total number of POSIX truncate operations. |
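The POSIX counters show what kind of work an application asks of the mount. The sketch below computes the share of calls in an interval that are not read or write, such as open, readdir, or rename. A high share points to a metadata-heavy workload. On a FUSE mount backed by object storage, such calls may require requests to OSS, so this ratio can be worth checking when file access is slow. The data/non-data split is a simplification chosen for illustration, and the names are hypothetical.

```python
# POSIX operations from the table above; only read and write count as data transfer.
DATA_OPS = {"read", "write"}
POSIX_OPS = {
    "mkdir", "rmdir", "opendir", "readdir", "read", "write", "flush", "fsync",
    "release", "create", "open", "access", "rename", "chown", "chmod", "truncate",
}


def metric_name(op: str) -> str:
    """Map an operation to its counter, e.g. 'rename' -> node_volume_posix_rename_total_counter."""
    if op not in POSIX_OPS:
        raise KeyError(op)
    return f"node_volume_posix_{op}_total_counter"


def non_data_share(increases: dict[str, float]) -> float:
    """Fraction of POSIX calls in an interval that were neither read nor write.

    increases maps an operation name to its counter increase over the interval.
    """
    total = sum(increases.values())
    if total == 0:
        return 0.0
    other = sum(v for op, v in increases.items() if op not in DATA_OPS)
    return other / total
```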
OSS API operation metrics
| Metric | Type | Description |
|---|---|---|
| node_volume_oss_put_object_total_counter | Counter | Total number of OSS put operations. |
| node_volume_oss_get_object_total_counter | Counter | Total number of OSS get operations. |
| node_volume_oss_head_object_total_counter | Counter | Total number of OSS head operations. |
| node_volume_oss_delete_object_total_counter | Counter | Total number of OSS delete operations. |
| node_volume_oss_post_object_total_counter | Counter | Total number of OSS post operations. |
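To see which OSS API dominates the traffic a mount generates, convert the cumulative API counters into per-second rates. The sketch below assumes no counter reset between the two scrapes and uses hypothetical function names. The HEAD-to-GET ratio is only a rough heuristic for spotting metadata-heavy access: HEAD requests return object metadata without the object body.

```python
# API names as they appear in node_volume_oss_<api>_object_total_counter.
OSS_APIS = ("put", "get", "head", "delete", "post")


def api_rates(prev: dict[str, float], cur: dict[str, float],
              interval_s: float) -> dict[str, float]:
    """Requests per second for each OSS API between two scrapes interval_s seconds apart."""
    return {api: (cur[api] - prev[api]) / interval_s for api in OSS_APIS}


def head_to_get_ratio(rates: dict[str, float]) -> float:
    """HEAD requests per GET request; a high value suggests metadata-heavy access."""
    if rates["get"] > 0:
        return rates["head"] / rates["get"]
    return float("inf") if rates["head"] > 0 else 0.0
```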
Hot spot metrics
Hot spot metrics are of type Gauge and report which files are receiving the most operations at any given time.
| Metric | Type | Description |
|---|---|---|
| node_volume_hot_spot_read_file_top | Gauge | Hot spot file for read operations. |
| node_volume_hot_spot_write_file_top | Gauge | Hot spot file for write operations. |
| node_volume_hot_spot_head_file_top | Gauge | Hot spot file for head operations. |
OSS backend metrics
| Metric | Type | Description |
|---|---|---|
| node_volume_backend_read_bytes_total_counter | Counter | Total bytes successfully read from the OSS backend. |
| node_volume_backend_write_bytes_total_counter | Counter | Total bytes successfully written to the OSS backend. |
| node_volume_backend_read_completed_total_counter | Counter | Total number of successful read operations on the OSS backend. |
| node_volume_backend_write_completed_total_counter | Counter | Total number of successful write operations on the OSS backend. |
| node_volume_backend_read_time_milliseconds_total_counter | Counter | Total time spent on read operations on the OSS backend. Unit: milliseconds. |
| node_volume_backend_write_time_milliseconds_total_counter | Counter | Total time spent on write operations on the OSS backend. Unit: milliseconds. |
| node_volume_backend_posix_getattr_total_counter | Counter | Total number of POSIX getattr operations on the OSS backend. |
| node_volume_backend_posix_getmode_total_counter | Counter | Total number of POSIX getmode operations on the OSS backend. |
| node_volume_backend_posix_access_total_counter | Counter | Total number of POSIX access operations on the OSS backend. |
| node_volume_backend_posix_lookup_total_counter | Counter | Total number of POSIX lookup operations on the OSS backend. |
| node_volume_backend_posix_mknod_total_counter | Counter | Total number of POSIX mknod operations on the OSS backend. |
| node_volume_backend_posix_remove_total_counter | Counter | Total number of POSIX remove operations on the OSS backend. |
| node_volume_backend_posix_setattr_total_counter | Counter | Total number of POSIX setattr operations on the OSS backend. |
| node_volume_backend_posix_link_total_counter | Counter | Total number of POSIX link operations on the OSS backend. |
| node_volume_backend_posix_readlink_total_counter | Counter | Total number of POSIX readlink operations on the OSS backend. |
| node_volume_backend_posix_statfs_total_counter | Counter | Total number of POSIX statfs operations on the OSS backend. |
| node_volume_backend_posix_rename_total_counter | Counter | Total number of POSIX rename operations on the OSS backend. |
| node_volume_backend_posix_readdir_total_counter | Counter | Total number of POSIX readdir operations on the OSS backend. |
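Comparing the application-facing byte counters with the backend byte counters over the same interval shows how much data moved to or from OSS per byte the application read or wrote. The interpretation in the sketch is hedged:

- A ratio above 1 could come from behavior such as read-ahead.
- A ratio below 1 could come from data served without a fresh backend read, such as from a cache.

Which mechanisms apply depends on the FUSE client and its configuration. The function names and the tolerance band are hypothetical.

```python
from typing import Optional


def amplification(app_bytes: float, backend_bytes: float) -> Optional[float]:
    """Ratio of backend bytes to application bytes over the same interval.

    app_bytes: increase of node_volume_read_bytes_total_counter (or the write counter)
    backend_bytes: increase of node_volume_backend_read_bytes_total_counter (or write)
    Returns None when the application moved no data.
    """
    if app_bytes <= 0:
        return None
    return backend_bytes / app_bytes


def describe(ratio: Optional[float], tolerance: float = 0.1) -> str:
    """Rough reading of the ratio; tolerance is an arbitrary band around 1."""
    if ratio is None:
        return "idle"
    if ratio > 1 + tolerance:
        return "backend moved more data than the application requested"
    if ratio < 1 - tolerance:
        return "application was served more data than the backend moved"
    return "roughly one-to-one"
```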
FUSE mount target metrics
These metrics are in a phased release and require csi-plugin v1.32.2 or later.
FUSE mount target metrics track the health and stability of Filesystem in Userspace (FUSE) mount targets managed by CSI. Use these metrics to detect mount failures. They also help you distinguish a pod that cannot start because its mount failed from a mount target that lost its connection while the storage backend stayed healthy.
| Metric | Type | Description |
|---|---|---|
| node_volume_mount_retry_count | Counter | Number of retries triggered when CSI creates a FUSE mount target and encounters a mount timeout or an execution fault. A continuously increasing value indicates a mount problem, which causes the related application pods to remain in the ContainerCreating state. |
| node_volume_mount_point_failover_count | Counter | Total number of times a FUSE mount target successfully failed over and recovered to a healthy state after a disconnection, for example, after the client process crashed. Only some client types support this metric; unsupported clients always report 0. |
| node_volume_mount_point_status | Gauge | Real-time health status of the mount target. 0 = healthy: mounted with a normal connection, so application pods can access data through the PV. 1 = unhealthy: the mount failed, the connection is broken, or the mount target is otherwise abnormal, so pods may remain in the ContainerCreating state or experience abnormal I/O. |
| node_volume_last_fuse_client_exit_reason | Gauge | Timestamp and reason for the last unexpected exit of the FUSE client process. For example, 2025-11-06T07:19:32Z:: signal: killed indicates that the process received a kill signal. |
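The sketch below parses an exit-reason string in the format shown in the table. It also combines the mount status gauge and the retry counter into a single check. A Gauge sample carries only a number, so the reason string is presumably exposed as a label. This section does not say which label holds it, so the function takes the string directly. Treating any retry increase as worth attention simplifies the guidance that a continuously increasing retry count indicates a mount problem. All names are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ExitReason(NamedTuple):
    when: datetime
    reason: str


def parse_exit_reason(value: str) -> ExitReason:
    """Parse strings such as '2025-11-06T07:19:32Z:: signal: killed'."""
    stamp, sep, reason = value.partition("::")
    if not sep:
        raise ValueError(f"no '::' separator in {value!r}")
    when = datetime.strptime(stamp.strip(), "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return ExitReason(when, reason.strip())


def mount_needs_attention(status: int, retry_increase: float) -> bool:
    """Flag a mount target for follow-up.

    status: node_volume_mount_point_status (0 = healthy, 1 = unhealthy)
    retry_increase: growth of node_volume_mount_retry_count over a recent window
    """
    return status == 1 or retry_increase > 0
```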