Cloud Parallel File Storage: Data monitoring

Last Updated: Oct 30, 2025

You can view the capacity and performance information of a CPFS for Lingjun file system to understand its storage usage, read/write throughput, and read/write IOPS. By setting alert rules for important metrics, you can receive prompt notifications about exceptions and handle them quickly. This topic describes the metrics that CPFS for Lingjun supports and how to configure alert rules for them.

Background information

CloudMonitor is a service that monitors Alibaba Cloud resources and Internet applications. You can use CloudMonitor to monitor metrics of various cloud resources and set alerts for specific metrics. This gives you a complete picture of your resource usage and application status on Alibaba Cloud and lets you handle faults promptly so that your services run smoothly. For more information, see the "What is CloudMonitor?" topic.

Retention policy of monitoring data

Monitoring data is retained for 90 days. After the retention period expires, the monitoring data is automatically cleared. The retention period starts when data is generated.
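The retention rule above means that a query can only reach back 90 days from the current time. The following Python sketch shows the arithmetic. The function names are illustrative and are not part of any CloudMonitor SDK.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # retention period stated in this topic


def earliest_available(now: datetime) -> datetime:
    """Return the oldest timestamp whose monitoring data is still retained."""
    return now - timedelta(days=RETENTION_DAYS)


def clamp_query_start(requested_start: datetime, now: datetime) -> datetime:
    """Move a requested start time forward if it falls outside the retention window."""
    return max(requested_start, earliest_available(now))


now = datetime(2025, 10, 30, tzinfo=timezone.utc)
print(earliest_available(now).date())  # 2025-08-01
```

Clamping the start time before you query avoids asking for data that has already been cleared.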

Monitoring metrics

CPFS for Lingjun supports comprehensive monitoring of file system capacity, instance performance, and client performance through CloudMonitor. Two sets of monitoring metrics are available: a new version (recommended) and an old version. The new metrics address issues in the old version, such as inconsistent naming and unclear structure, and offer improved usability and maintainability.

  • New customers: You can use the new metrics directly.

  • Existing customers: You can continue to use the old metrics to ensure business continuity. However, we recommend that you gradually migrate to the new version.

Important

If you are an existing customer who wants to switch to the new metrics, you must first test them in a test environment.

New version metrics (recommended)

The new monitoring metrics are currently available in the following region: China (Beijing).

Capacity monitoring

Type

Metric

Metric name

Unit

Description

File system - Standard

BmStdCapacity

Total storage capacity of a standard CPFS for Lingjun file system

Bytes (B)

The total storage space of the file system.

BmStdCapacityUsed

Data usage of a standard CPFS for Lingjun file system

Bytes (B)

The amount of data that is currently used by the file system.

BmStdInodeLimit

Maximum number of files in a standard CPFS for Lingjun file system

Count

The maximum total number of files and directories that the file system can hold.

BmStdInodeAlloc

Number of allocated files in a standard CPFS for Lingjun file system

Count

The total number of files and directories that are currently allocated (created) in the file system.

BmStdInodeUsed

Number of used files in a standard CPFS for Lingjun file system

Count

The total number of files and directories that are currently used in the file system.

File system - Large

Large-specification file systems are available only to specific users. If you are not a user of a large-specification file system, ignore the related metrics.

BmLargeCapacity

Total storage capacity of a large CPFS for Lingjun file system

Bytes (B)

The total storage space of the file system.

BmLargeCapacityUsed

Data usage of a large CPFS for Lingjun file system

Bytes (B)

The amount of data that is currently used by the file system.

BmLargeInodeLimit

Maximum number of files in a large CPFS for Lingjun file system

Count

The maximum total number of files and directories that the file system can hold.

BmLargeInodeAlloc

Number of allocated files in a large CPFS for Lingjun file system

Count

The total number of files and directories that are currently allocated (created) in the file system.

BmLargeInodeUsed

Number of used files in a large CPFS for Lingjun file system

Count

The total number of files and directories that are currently used in the file system.

Fileset - Standard

BmStdFsetCapacityLimit

Capacity quota of a standard CPFS for Lingjun fileset

Bytes (B)

The maximum capacity quota set for a single fileset.

BmStdFsetCapacityUsed

Used capacity of a standard CPFS for Lingjun fileset

Bytes (B)

The capacity that is currently used by a single fileset.

BmStdFsetInodeLimit

File count quota of a standard CPFS for Lingjun fileset

Count

The maximum quota for the number of files and directories set for a single fileset.

BmStdFsetInodeAlloc

Number of pre-allocated files in a standard CPFS for Lingjun fileset

Count

The total number of files and directories that are currently pre-allocated for a single fileset.

BmStdFsetInodeUsed

Number of used files in a standard CPFS for Lingjun fileset

Count

The number of files and directories that are currently used by a single fileset.

Fileset - Large

Large-specification file systems are available only to specific users. If you are not a user of a large-specification file system, ignore the related metrics.

BmLargeFsetCapacityLimit

Capacity quota of a large CPFS for Lingjun fileset

Bytes (B)

The maximum available capacity set for a single fileset.

BmLargeFsetCapacityUsed

Used capacity of a large CPFS for Lingjun fileset

Bytes (B)

The amount of data that is currently used by a single fileset.

BmLargeFsetInodeLimit

File count quota of a large CPFS for Lingjun fileset

Count

The maximum total number of files and directories that can be held in a single fileset.

BmLargeFsetInodeAlloc

Number of pre-allocated files in a large CPFS for Lingjun fileset

Count

The total number of files and directories that are currently allocated (created) for a single fileset.

BmLargeFsetInodeUsed

Number of used files in a large CPFS for Lingjun fileset

Count

The total number of files and directories that are currently used by a single fileset.
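A common use of the capacity metrics above is to derive utilization percentages that can serve as alert thresholds. The following Python sketch assumes that you have already fetched the latest data point of each metric into a dictionary. The `sample` values are illustrative, and treating `BmStdInodeUsed` divided by `BmStdInodeLimit` as the file-count utilization is an assumption, not a documented formula.

```python
def usage_percent(used: float, limit: float) -> float:
    """Return used/limit as a percentage, or 0.0 if the limit is not positive."""
    if limit <= 0:
        return 0.0
    return 100.0 * used / limit


def capacity_report(datapoints: dict) -> dict:
    """Derive utilization ratios from one sample of standard file system metrics."""
    return {
        "capacity_pct": usage_percent(
            datapoints["BmStdCapacityUsed"], datapoints["BmStdCapacity"]
        ),
        "inode_pct": usage_percent(
            datapoints["BmStdInodeUsed"], datapoints["BmStdInodeLimit"]
        ),
    }


# Illustrative values only: a 100 TiB file system with 25 TiB in use.
sample = {
    "BmStdCapacity": 100 * 1024**4,
    "BmStdCapacityUsed": 25 * 1024**4,
    "BmStdInodeLimit": 1_000_000,
    "BmStdInodeUsed": 400_000,
}
print(capacity_report(sample))  # {'capacity_pct': 25.0, 'inode_pct': 40.0}
```

The same helper can be applied to the large-specification and fileset metrics by swapping in the corresponding metric names.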

Performance monitoring

Type

Metric

Metric name

Unit

Description

File system - Standard

BmStdReadThroughput

Read throughput of a standard CPFS for Lingjun file system

Bytes/s

The average read throughput of the file system in bytes per second during a statistical period.

BmStdWriteThroughput

Write throughput of a standard CPFS for Lingjun file system

Bytes/s

The average write throughput of the file system in bytes per second during a statistical period.

BmStdReadIops

Read IOPS of a standard CPFS for Lingjun file system

Count/s (IOPS)

The average number of read operations per second for the file system during a statistical period.

BmStdWriteIops

Write IOPS of a standard CPFS for Lingjun file system

Count/s (IOPS)

The average number of write operations per second for the file system during a statistical period.

BmStdReadLatency

Read latency of a standard CPFS for Lingjun file system

ms

The average read latency of the file system during a statistical period.

BmStdWriteLatency

Write latency of a standard CPFS for Lingjun file system

ms

The average write latency of the file system during a statistical period.

BmStdMetaQps

Metadata QPS of a standard CPFS for Lingjun file system

Count/s (IOPS)

The average number of metadata requests per second for the file system during a statistical period.

BmStdMetaLatency

Metadata latency of a standard CPFS for Lingjun file system

ms

The average latency of metadata operations for the file system during a statistical period.

File system - Large

Large-specification file systems are available only to specific users. If you are not a user of a large-specification file system, ignore the related metrics.

BmLargeReadThroughput

Read throughput of a large CPFS for Lingjun file system

Bytes/s

The average read throughput of the file system in bytes per second during a statistical period.

BmLargeWriteThroughput

Write throughput of a large CPFS for Lingjun file system

Bytes/s

The average write throughput of the file system in bytes per second during a statistical period.

BmLargeReadIops

Read IOPS of a large CPFS for Lingjun file system

Count/s (IOPS)

The average number of read operations per second for the file system during a statistical period.

BmLargeWriteIops

Write IOPS of a large CPFS for Lingjun file system

Count/s (IOPS)

The average number of write operations per second for the file system during a statistical period.

BmLargeReadLatency

Read latency of a large CPFS for Lingjun file system

ms

The average read latency of the file system during a statistical period.

BmLargeWriteLatency

Write latency of a large CPFS for Lingjun file system

ms

The average write latency of the file system during a statistical period.

BmLargeMetaQps

Metadata operation QPS of a large CPFS for Lingjun file system

Count/s (IOPS)

The average number of metadata requests per second for the file system during a statistical period.

BmLargeMetaLatency

Metadata operation latency of a large CPFS for Lingjun file system

Microsecond (μs)

The average latency of metadata operations for the file system during a statistical period.

Client

ClientReadThroughput

Read throughput of a CPFS for Lingjun client

Bytes/s

The average read throughput in bytes per second for the client during a statistical period.

ClientWriteThroughput

Write throughput of a CPFS for Lingjun client

Bytes/s

The average write throughput in bytes per second for the client during a statistical period.

ClientReadIops

Read IOPS of a CPFS for Lingjun client

Count/s (IOPS)

The average number of read operations per second for the client during a statistical period.

ClientWriteIops

Write IOPS of a CPFS for Lingjun client

Count/s (IOPS)

The average number of write operations per second for the client during a statistical period.

ClientReadLatency

Average read latency of a CPFS for Lingjun client

Microsecond (μs)

The average read latency for the client during a statistical period.

ClientWriteLatency

Average write latency of a CPFS for Lingjun client

Microsecond (μs)

The average write latency for the client during a statistical period.

ClientMetaLatency

Metadata latency of a CPFS for Lingjun client

ms

The average latency for a client to complete a single metadata operation.

ClientMetaQps

Metadata QPS of a CPFS for Lingjun client

Count/s (IOPS)

The average number of metadata requests per second for the client during a statistical period.

Connections

VpcClientCount

Number of CPFS for Lingjun clients connected through a VPC

Count

The total number of clients connected to the file system through a VPC.

RdmaClientCount

Number of CPFS for Lingjun clients connected through RDMA

Count

The total number of clients connected to the file system through RDMA.
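The latency metrics in the table above do not all share one unit: some are listed in milliseconds and others in microseconds. If you compare them side by side, it can help to normalize them first. The following Python sketch copies the units from the table. The helper name `to_milliseconds` is illustrative.

```python
# Units copied from the performance monitoring table ("us" stands for microseconds).
LATENCY_UNITS = {
    "BmStdReadLatency": "ms",
    "BmStdWriteLatency": "ms",
    "BmStdMetaLatency": "ms",
    "BmLargeReadLatency": "ms",
    "BmLargeWriteLatency": "ms",
    "BmLargeMetaLatency": "us",
    "ClientReadLatency": "us",
    "ClientWriteLatency": "us",
    "ClientMetaLatency": "ms",
}


def to_milliseconds(metric: str, value: float) -> float:
    """Convert a latency data point to milliseconds based on the metric's listed unit."""
    unit = LATENCY_UNITS[metric]  # raises KeyError for metrics that are not latencies
    return value / 1000.0 if unit == "us" else value


print(to_milliseconds("ClientReadLatency", 1500))  # 1.5
print(to_milliseconds("BmStdReadLatency", 2.0))    # 2.0
```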

Note
  • The elastic file client is a client installed by the CPFS team on compute nodes. It connects the compute nodes to the CPFS for Lingjun file system.

  • You can view client performance only in the CloudMonitor console or by calling CloudMonitor API operations. For more information, see View CPFS performance monitoring or View CPFS performance monitoring.

  • If you use a CPFS for Lingjun file system on ECS or PAI Lingjun AI Computing Service (single-tenant) resources, the hostname is the hostname of the node.

  • If you use a CPFS for Lingjun file system on PAI general computing resources or Lingjun resources, the hostname is the pod ID of the task.

  • For more information about the new monitoring metrics, see CloudMonitor Metric Query.
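To read these metrics programmatically, you would typically call the CloudMonitor DescribeMetricList operation. The following Python sketch only assembles the request parameters and makes no network call. The parameter names (`Namespace`, `MetricName`, `Period`, `StartTime`, `EndTime`, `Dimensions`) follow that operation as commonly documented. The namespace, the dimension key, the 60-second period, and the file system ID are placeholders and assumptions that you should verify in CloudMonitor Metric Query.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical placeholders: look up the real values in CloudMonitor Metric Query.
NAMESPACE = "<namespace-for-cpfs-for-lingjun>"
DIMENSION_KEY = "<file-system-dimension-key>"


def build_metric_query(metric_name, file_system_id, end, minutes=60, period_seconds=60):
    """Assemble DescribeMetricList-style parameters for one metric and one file system."""
    start = end - timedelta(minutes=minutes)
    return {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Period": str(period_seconds),  # assumed period; check what the metric supports
        "StartTime": str(int(start.timestamp() * 1000)),  # Unix time in milliseconds
        "EndTime": str(int(end.timestamp() * 1000)),
        "Dimensions": json.dumps([{DIMENSION_KEY: file_system_id}]),
    }


params = build_metric_query(
    "BmStdReadThroughput",
    "bmcpfs-example",  # hypothetical file system ID
    datetime(2025, 10, 30, 12, 0, tzinfo=timezone.utc),
)
print(params["MetricName"], params["StartTime"], params["EndTime"])
```

You can pass the resulting dictionary to whichever CloudMonitor SDK or signed HTTP client you use.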

Old version metrics

Capacity monitoring

Type

Metric

Metric name

Unit

Description

File system

CPFSCapacity

Total storage space

Bytes

The total storage space of the file system during a statistical period.

CPFSCapacityUsed

Data volume

Bytes

The amount of data that is actually used by the file system during a statistical period.

CPFSInode Limit

Maximum number of files

Count

The maximum number of files that can be used by the file system during a statistical period.

CPFSInode Alloc

Number of allocated files

Count

The number of files that are allocated by the file system during a statistical period.

CPFSInode Used

Number of used files

Count

The number of files that are used by the file system during a statistical period.

Fileset

BMCPFSFsetCapacityLimit

Fileset allocated capacity

Bytes

The maximum storage space that a fileset can use to write data. After the quota is reached, no more data can be written.

BMCPFSFsetCapacityUsed

Fileset used capacity

Bytes

The storage space that is actually used by the fileset.

BMCPFSFsetInodeLimit

Number of files allocated by fileset

Count

The maximum number of files and directories that a fileset can contain. After the quota is reached, no more files or directories can be created.

BMCPFSFsetInodeUsed

Number of files used by fileset

Count

The number of files that are actually used by the fileset.

Performance monitoring

Type

Metric

Metric name

Unit

Description

File system

ThruputRead

Read throughput

Bytes/s

The average read throughput of the file system in bytes per second during a statistical period.

ThruputWrite

Write throughput

Bytes/s

The average write throughput of the file system in bytes per second during a statistical period.

IopsRead

Read IOPS

Count/s

The average number of read operations per second for the file system during a statistical period.

IopsWrite

Write IOPS

Count/s

The average number of write operations per second for the file system during a statistical period.

Dataflow

ThroughputImport

Import throughput

Bytes/s

The average throughput in bytes per second for a dataflow import task during a statistical period.

ThroughputExport

Export throughput

Bytes/s

The average throughput in bytes per second for a dataflow export task during a statistical period.

QPSImportMeta

Import metadata QPS

Count/s

The average number of metadata requests per second for a dataflow import task during a statistical period.

QPSExportMeta

Export metadata QPS

Count/s

The average number of metadata requests per second for a dataflow export task during a statistical period.

IOPSImport

Import IOPS

Count/s

The average number of I/O operations per second for a dataflow import task during a statistical period.

IOPSEXport

Export IOPS

Count/s

The average number of I/O operations per second for a dataflow export task during a statistical period.

LatencyImport

Import latency

μs

The average latency of a dataflow import task during a statistical period.

LatencyExport

Export latency

μs

The average latency of a dataflow export task during a statistical period.

Client

ClientReadIops

Client read IOPS

Count/s

The average number of read operations per second for the client during a statistical period.

ClientWriteIops

Client write IOPS

Count/s

The average number of write operations per second for the client during a statistical period.

ClientReadLatency

Client average read latency

μs

The average read latency for the client during a statistical period.

ClientWriteLatency

Client average write latency

μs

The average write latency for the client during a statistical period.

ClientReadThroughput

Client read throughput

Bytes/s

The average read throughput in bytes per second for the client during a statistical period.

ClientWriteThroughput

Client write throughput

Bytes/s

The average write throughput in bytes per second for the client during a statistical period.

Note
  • The elastic file client is a client installed by the CPFS team on compute nodes. It connects the compute nodes to the CPFS for Lingjun file system.

  • You can view client performance only in the CloudMonitor console or by calling CloudMonitor API operations. For more information, see View CPFS performance monitoring or View CPFS performance monitoring.

  • If you use a CPFS for Lingjun file system on ECS or PAI Lingjun AI Computing Service (single-tenant) resources, the hostname is the hostname of the node.

  • If you use a CPFS for Lingjun file system on PAI general computing resources or Lingjun resources, the hostname is the pod ID of the task.

  • For more information about the old monitoring metrics, see CloudMonitor Metric Query.

Alert rule description

In the CloudMonitor console, you can set alert rules for different metrics. If a metric for a resource meets the specified alert condition, CloudMonitor automatically sends an alert notification. The alert levels, notification mechanisms, and alert condition are described below.

Alert levels and notification mechanisms:

  • Critical: phone call, text message, email, and DingTalk Robot

  • Warning: text message, email, and DingTalk Robot

  • Info: email and DingTalk Robot

Alert condition (applies to all alert levels): The average value of the metric meets the specified judgment condition for N consecutive statistical periods. Set the value of N based on the alert level.

Note

The alert condition varies based on the selected metric type. The condition displayed on the interface prevails.
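The alert condition described above (the average value meets the judgment condition for N consecutive statistical periods) can be illustrated with a short Python sketch. This is only a model of the rule as stated in this topic, not CloudMonitor's implementation. The function name, the comparator strings, and the sample values are assumptions.

```python
import operator

# Supported judgment conditions for this sketch.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def should_alert(period_averages, comparator, threshold, n):
    """Return True if the most recent n period averages all meet the condition."""
    if n <= 0 or len(period_averages) < n:
        return False  # not enough statistical periods observed yet
    compare = COMPARATORS[comparator]
    return all(compare(value, threshold) for value in period_averages[-n:])


# Capacity utilization (%) averaged per statistical period, oldest first.
print(should_alert([70, 85, 90, 95], ">=", 80, 3))  # True
print(should_alert([85, 70, 90, 95], ">=", 80, 3))  # False
```

A larger N makes a rule less sensitive to short spikes, which is why N is chosen per alert level.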

References