Fluid data acceleration runs as a set of control plane components and JindoRuntime cache nodes in your cluster. Without built-in observability, diagnosing performance bottlenecks or cache inefficiencies requires manual log inspection. By integrating Fluid with Managed Service for Prometheus through the Application Real-Time Monitoring Service (ARMS) console, you get pre-built dashboards that surface real-time metrics for the Fluid control plane and the JindoRuntime cache system.
Prerequisites
Before you begin, ensure that you have:
- Managed Service for Prometheus enabled for a Container Service for Kubernetes (ACK) cluster or ACK Serverless cluster. For more information, see Managed Service for Prometheus.
- The cloud-native AI suite deployed and Fluid data acceleration enabled. For more information, see Deploy the cloud-native AI suite.
- ack-fluid 0.9.7 or later installed in the cluster, which is required to use all features of the Fluid control plane dashboard.
- ack-fluid 1.0.11 or later installed in the cluster, which is required to use all features of the Fluid JindoRuntime cache system dashboard.
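
The two minimum versions above are easy to misjudge when compared as strings, because "1.0.9" sorts after "1.0.11" alphabetically. The following is a minimal sketch of a numeric comparison, assuming the installed ack-fluid version is a plain dotted string such as `1.0.9`. The helper names are hypothetical and are not part of Fluid or ACK.

```python
# Minimum ack-fluid versions taken from the prerequisites above.
MIN_VERSIONS = {
    "Fluid Control Plane": (0, 9, 7),
    "Fluid JindoRuntime Dashboard": (1, 0, 11),
}


def parse_version(text):
    """Turn '1.0.11' or 'v1.0.11' into (1, 0, 11) for numeric comparison."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def fully_supported_dashboards(installed):
    """Return the dashboards whose minimum version the installed release meets."""
    version = parse_version(installed)
    return [name for name, minimum in MIN_VERSIONS.items() if version >= minimum]


# 1.0.9 is newer than 0.9.7 but older than 1.0.11.
print(fully_supported_dashboards("1.0.9"))  # ['Fluid Control Plane']
```

Comparing tuples of integers avoids the string-ordering pitfall. Version strings with suffixes (for example, a build tag) would need extra parsing that this sketch does not attempt.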
Limitations
The cache system dashboard supports only cache runtime components of the JindoRuntime type (JindoCache engine).
Step 1: Enable Prometheus scraping for Fluid
- Log on to the ARMS console.
- In the left-side navigation pane, click Integration Center. In the AI section, click the Fluid card.
- In the Select a Kubernetes cluster section, select the target cluster. If the page shows that Fluid is already installed, skip the steps in this section.
- In the Configuration Information section, configure the following parameters, then click OK.

  | Parameter | Description |
  |---|---|
  | Name | (Not required) A unique name for the Fluid exporter. Leave it blank if not needed. |
  | metrics collection interval (seconds) | The interval at which the service collects monitoring data. |

- To verify the integration, go to Integration Management in the left-side navigation pane. On the Integrated Addons tab, click the Fluid card. On the Environments tab, click View Details in the Actions column to confirm the Fluid component and alert rules are configured.
Step 2: View the Fluid dashboards
Two dashboards are available after integration:
- Fluid Control Plane: monitoring data for the Fluid control plane components
- Fluid JindoRuntime Dashboard: monitoring data for the JindoRuntime cache system
You can access the dashboards from the ACK console or the ARMS console.
View dashboards from the ACK console (recommended)
- Log on to the ACK console. In the left-side navigation pane, click Clusters.
- On the Clusters page, click the cluster where Fluid is installed. In the left-side pane, choose Operations > Prometheus Monitoring.
- On the Prometheus Monitoring page, choose Others > Fluid Control Plane to open the control plane dashboard.
- On the Prometheus Monitoring page, choose Others > Fluid JindoRuntime Dashboard to open the JindoRuntime cache system dashboard.
View dashboards from the ARMS console
- Log on to the ARMS console.
- In the left-side navigation pane, click Integration Management. On the Query Dashboards tab, select Fluid from the component drop-down list.
- Click Fluid Control Plane to view the control plane dashboard, or click Fluid JindoRuntime Dashboard to view the JindoRuntime cache system dashboard.
Dashboard reference
Fluid control plane dashboard
The control plane dashboard shows the status and performance of Fluid control plane components. For a full list of metrics and their descriptions, see Fluid dashboard parameters.
| Section | What you can monitor |
|---|---|
| Component running status | Number of running Fluid control plane pods, restart count, and restart timestamps |
| Fluid Controller Detailed Indicators | Controller workload (busy/idle), processing failures, and Kubernetes API request volume |
| Fluid Webhook Detailed Indicators | Webhook resource usage, request throughput, and request processing latency |
| Resource usage | CPU and memory usage per control plane component, network transmit rate, and network receive rate |
[Figures: Component running status; Fluid Controller Detailed Indicators; Fluid Webhook Detailed Indicators; Resource usage]
Fluid JindoRuntime cache system dashboard
The JindoRuntime cache system dashboard shows the health and performance of the distributed cache. For a full list of metrics and their descriptions, see Fluid dashboard parameters.
| Section | What you can monitor |
|---|---|
| Dataset Overview | Healthy pod counts for the master, worker, and FUSE components, and resource configuration per pod |
| Cache System Metrics | Cache usage, cache hit ratio, aggregated bandwidth, and QPS for file metadata operations |
| FUSE Metrics (via CSI) | I/O per FUSE pod, metadata operation latency and QPS, and read/write latency and QPS for FUSE clients mounted through the Fluid CSI plug-in |
| FUSE Metrics (via Sidecar) | Metadata operation latency and QPS, and read/write latency and QPS for FUSE clients mounted through the Fluid FUSE sidecar |
[Figures: Dataset Overview; Cache System Metrics; FUSE Metrics (via CSI); FUSE Metrics (via Sidecar)]
Metrics reference
Fluid control plane metrics
The following metrics are exposed by the Fluid control plane components.
| Metric | Type | Description |
|---|---|---|
| dataset_ufs_total_size | Gauge | Total size of the data mounted by existing Dataset objects in the cluster |
| dataset_ufs_file_num | Gauge | Number of files in the data mounted by existing Dataset objects in the cluster |
| runtime_setup_error_total | Counter | Number of runtime startup failures during controller reconciliation |
| runtime_sync_healthcheck_error_total | Counter | Number of runtime health check failures during controller reconciliation |
| controller_runtime_reconcile_time_seconds_bucket | Histogram | Duration of the reconciliation process |
| controller_runtime_reconcile_errors_total | Counter | Number of reconciliation failures |
| controller_runtime_reconcile_total | Counter | Total number of reconciliations per controller, labeled by result |
| controller_runtime_max_concurrent_reconciles | Gauge | Maximum number of concurrent reconciliations supported by the controller |
| controller_runtime_active_workers | Gauge | Number of active reconciliations in the controller |
| workqueue_adds_total | Counter | Number of Add events processed by the controller workqueue |
| workqueue_depth | Gauge | Current length of the controller workqueue |
| workqueue_queue_duration_seconds_bucket | Histogram | Time a pending object has been waiting in the controller workqueue |
| workqueue_work_duration_seconds_bucket | Histogram | Duration distribution of tasks completed by the controller |
| workqueue_unfinished_work_seconds | Gauge | Total duration of all tasks currently being processed in the workqueue |
| workqueue_longest_running_processor_seconds | Gauge | Longest time the controller has spent processing a single task |
| rest_client_requests_total | Counter | HTTP request count, broken down by status code, method, and host |
| rest_client_request_duration_seconds_bucket | Histogram | HTTP response latency, broken down by verb and URL |
| controller_runtime_webhook_requests_in_flight | Gauge | Number of requests currently being processed by the webhook |
| controller_runtime_webhook_requests_total | Counter | Total number of requests processed by the webhook |
| controller_runtime_webhook_latency_seconds_bucket | Histogram | Request processing latency of the webhook |
| process_cpu_seconds_total | Counter | Total CPU time consumed by the process |
| process_resident_memory_bytes | Gauge | Resident memory used by the process |
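
The `*_bucket` histogram metrics in this table store cumulative counts per upper bound, so a latency percentile has to be estimated from them. The sketch below shows one common way to do that, using linear interpolation inside the bucket that contains the target rank, similar in spirit to the PromQL `histogram_quantile` function. The bucket boundaries and counts are made-up sample values, not data from a real cluster, and the function is an illustration rather than the dashboards' actual query.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` mirrors the `le` label of a *_bucket series and must end
    with the +Inf bucket. Values are interpolated linearly inside the
    bucket that contains the target rank.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Hypothetical cumulative counts for controller_runtime_reconcile_time_seconds_bucket.
sample = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
print(histogram_quantile(0.9, sample))
```

The estimate is only as fine-grained as the bucket boundaries: any rank that lands in the `+Inf` bucket is reported as the largest finite bound, so very slow reconciliations are under-reported.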
JindoCache server metrics
The following metrics are exposed by the JindoCache server components.
| Metric | Type | Description |
|---|---|---|
| jindocache_server_total_stsnodes_num | Gauge | Number of alive worker replicas in the distributed cache system |
| jindocache_server_total_disk_cap | Gauge | Maximum disk cache capacity (including RAM disks such as tmpfs) |
| jindocache_server_total_used_disk_cap | Gauge | Disk cache used (including RAM disks such as tmpfs) |
| jindocache_server_total_mem_cap | Gauge | Maximum RAM cache capacity |
| jindocache_server_total_used_mem_cap | Gauge | RAM cache used |
| jindocache_server_total_used_rocksdb_cap | Gauge | RocksDB storage used in the distributed cache system |
| jindocache_server_backend_read_bytes_total | Gauge | Total bytes read from the underlying storage system (cache miss). Unit: bytes |
| jindocache_server_backend_read_time_total | Gauge | Total duration of reads from the underlying storage system. Unit: microseconds |
| jindocache_server_backend_readop_num_total | Gauge | Total number of reads from the underlying storage system, which equals the number of blocks in the JindoCache |
| jindocache_server_backend_read_bytes_time_total_window | Gauge | Duration of reads from the underlying storage system within a 1-minute window. Unit: microseconds |
| jindocache_server_backend_read_bytes_total_window | Gauge | Bytes read from the underlying storage system within a 1-minute window. Unit: bytes |
| jindocache_server_remote_read_bytes_total | Gauge | Total bytes served by remote cache hits (data and application on different nodes). Unit: bytes |
| jindocache_server_remote_read_time_total | Gauge | Total duration of remote cache hits. Unit: microseconds |
| jindocache_server_remote_readop_num_total | Gauge | Total number of remote cache hits |
| jindocache_server_remote_read_bytes_time_total_window | Gauge | Duration of remote cache hits within a 1-minute window. Unit: microseconds |
| jindocache_server_remote_read_bytes_total_window | Gauge | Bytes served by remote cache hits within a 1-minute window. Unit: bytes |
| jindocache_server_local_read_bytes_total | Gauge | Total bytes served by local cache hits (data and application on the same node). Unit: bytes |
| jindocache_server_local_read_time_total | Gauge | Total duration of local cache hits. Unit: microseconds |
| jindocache_server_local_readop_num_total | Gauge | Total number of local cache hits |
| jindocache_server_local_read_bytes_time_total_window | Gauge | Duration of local cache hits within a 1-minute window. Unit: microseconds |
| jindocache_server_local_read_bytes_total_window | Gauge | Bytes served by local cache hits within a 1-minute window. Unit: bytes |
| jindocache_server_ns_filelet_op_count_total | Gauge | Total file metadata operations on the JindoCache master component (getAttr and listStatus) |
| jindocache_server_ns_filelet_op_time_total | Gauge | Total duration of file metadata operations on the JindoCache master component |
| jindocache_server_ns_get_attr_op_total | Gauge | Number of getAttr operations on the JindoCache master component |
| jindocache_server_ns_get_attr_time_total | Gauge | Duration of getAttr operations on the JindoCache master component |
| jindocache_server_ns_get_attr_fallback_op_total | Gauge | Number of times the JindoCache master component read file metadata from the underlying storage system |
| jindocache_server_ns_list_status_op_total | Gauge | Number of listStatus operations on the JindoCache master component |
| jindocache_server_ns_list_status_time_total | Gauge | Duration of listStatus operations on the JindoCache master component |
| jindocache_server_ns_list_status_fallback_op_total | Gauge | Number of times the JindoCache master component read directory listings from the underlying storage system |
| jindocache_server_dist_get_attr_op_num_total | Gauge | Number of getAttr operations on the JindoCache client side |
| jindocache_server_dist_get_attr_time_total | Gauge | Duration of getAttr operations on the JindoCache client side |
| jindocache_server_dist_list_dir_op_num_total | Gauge | Number of listStatus operations on the JindoCache client side |
| jindocache_server_dist_list_dir_time_total | Gauge | Duration of listStatus operations on the JindoCache client side |
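
Because backend reads correspond to cache misses while local and remote reads correspond to cache hits, the byte counters above can be combined into a byte-level hit ratio, and each `*_time_total` value can be divided by its matching `*_readop_num_total` value to get a mean latency in microseconds. The following is a minimal sketch of that arithmetic. It assumes this is a reasonable way to derive the ratio, since the exact formula used by the dashboard is not stated here, and the input numbers are made up.

```python
def cache_hit_ratio(local_bytes, remote_bytes, backend_bytes):
    """Fraction of read bytes served from cache (local or remote hits).

    Returns None when no bytes were read, to avoid dividing by zero.
    """
    total = local_bytes + remote_bytes + backend_bytes
    if total == 0:
        return None
    return (local_bytes + remote_bytes) / total


def mean_read_latency_us(time_total_us, op_total):
    """Mean duration per read in microseconds, or None if there were no reads."""
    if op_total == 0:
        return None
    return time_total_us / op_total


# Hypothetical samples of the 1-minute *_read_bytes_total_window metrics.
print(cache_hit_ratio(local_bytes=600, remote_bytes=300, backend_bytes=100))  # 0.9
# Hypothetical samples of *_read_time_total and *_readop_num_total.
print(mean_read_latency_us(time_total_us=5000, op_total=100))  # 50.0
```

Feeding in the `*_total_window` metrics gives a ratio for the most recent minute, while the cumulative `*_total` metrics give a lifetime ratio that reacts slowly to recent changes.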
JindoCache FUSE client metrics
The following metrics are exposed by the Jindo FUSE client.
| Metric | Type | Description |
|---|---|---|
| jindo_fuse_open_count | Gauge | Number of open operations |
| jindo_fuse_open_latency | Gauge | P50 latency of open operations |
| jindo_fuse_open_latency_80 | Gauge | P80 latency of open operations |
| jindo_fuse_open_latency_90 | Gauge | P90 latency of open operations |
| jindo_fuse_open_latency_99 | Gauge | P99 latency of open operations |
| jindo_fuse_open_latency_999 | Gauge | P99.9 latency of open operations |
| jindo_fuse_open_latency_9999 | Gauge | P99.99 latency of open operations |
| jindo_fuse_getattr_count | Gauge | Number of getAttr operations |
| jindo_fuse_getattr_latency | Gauge | P50 latency of getAttr operations |
| jindo_fuse_getattr_latency_80 | Gauge | P80 latency of getAttr operations |
| jindo_fuse_getattr_latency_90 | Gauge | P90 latency of getAttr operations |
| jindo_fuse_getattr_latency_99 | Gauge | P99 latency of getAttr operations |
| jindo_fuse_getattr_latency_999 | Gauge | P99.9 latency of getAttr operations |
| jindo_fuse_getattr_latency_9999 | Gauge | P99.99 latency of getAttr operations |
| jindo_fuse_readdir_count | Gauge | Number of readDir operations |
| jindo_fuse_readdir_latency | Gauge | P50 latency of readDir operations |
| jindo_fuse_readdir_latency_80 | Gauge | P80 latency of readDir operations |
| jindo_fuse_readdir_latency_90 | Gauge | P90 latency of readDir operations |
| jindo_fuse_readdir_latency_99 | Gauge | P99 latency of readDir operations |
| jindo_fuse_readdir_latency_999 | Gauge | P99.9 latency of readDir operations |
| jindo_fuse_readdir_latency_9999 | Gauge | P99.99 latency of readDir operations |
| jindo_fuse_read_count | Gauge | Number of read operations |
| jindo_fuse_read_latency | Gauge | P50 latency of read operations |
| jindo_fuse_read_latency_80 | Gauge | P80 latency of read operations |
| jindo_fuse_read_latency_90 | Gauge | P90 latency of read operations |
| jindo_fuse_read_latency_99 | Gauge | P99 latency of read operations |
| jindo_fuse_read_latency_999 | Gauge | P99.9 latency of read operations |
| jindo_fuse_read_latency_9999 | Gauge | P99.99 latency of read operations |
| jindo_fuse_write_count | Gauge | Number of write operations |
| jindo_fuse_write_latency | Gauge | P50 latency of write operations |
| jindo_fuse_write_latency_80 | Gauge | P80 latency of write operations |
| jindo_fuse_write_latency_90 | Gauge | P90 latency of write operations |
| jindo_fuse_write_latency_99 | Gauge | P99 latency of write operations |
| jindo_fuse_write_latency_999 | Gauge | P99.9 latency of write operations |
| jindo_fuse_write_latency_9999 | Gauge | P99.99 latency of write operations |
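
The FUSE latency metrics follow a regular naming pattern: the unsuffixed name is the P50 value, and the suffixes `_80`, `_90`, `_99`, `_999`, and `_9999` select higher percentiles. The helper below is a small sketch that builds a metric name from an operation and a percentile, based only on the names listed in the table above. The function name is hypothetical.

```python
# Operations and percentile suffixes as they appear in the table above.
FUSE_OPERATIONS = ("open", "getattr", "readdir", "read", "write")
PERCENTILE_SUFFIXES = {
    "50": "",
    "80": "_80",
    "90": "_90",
    "99": "_99",
    "99.9": "_999",
    "99.99": "_9999",
}


def fuse_latency_metric(operation, percentile="50"):
    """Return the jindo_fuse latency metric name for an operation and percentile."""
    if operation not in FUSE_OPERATIONS:
        raise ValueError(f"unknown FUSE operation: {operation!r}")
    if percentile not in PERCENTILE_SUFFIXES:
        raise ValueError(f"unsupported percentile: {percentile!r}")
    return f"jindo_fuse_{operation}_latency{PERCENTILE_SUFFIXES[percentile]}"


print(fuse_latency_metric("read", "99.9"))  # jindo_fuse_read_latency_999
print(fuse_latency_metric("open"))          # jindo_fuse_open_latency
```

Percentiles are passed as strings so that "99.9" and "99.99" map to their suffixes without floating-point comparison.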