Container Service for Kubernetes: Enable Managed Service for Prometheus for the Fluid component

Last Updated: Mar 26, 2026

Fluid data acceleration runs as a set of control plane components and JindoRuntime cache nodes in your cluster. Without built-in observability, diagnosing performance bottlenecks or cache inefficiencies requires manual log inspection. By integrating Fluid with Managed Service for Prometheus through the Application Real-Time Monitoring Service (ARMS) console, you get pre-built dashboards that surface real-time metrics for the Fluid control plane and the JindoRuntime cache system.

Prerequisites

Before you begin, ensure that you have:

  • Managed Service for Prometheus enabled for a Container Service for Kubernetes (ACK) cluster or ACK Serverless cluster. For more information, see Managed Service for Prometheus.

  • The cloud-native AI suite deployed and Fluid data acceleration enabled. For more information, see Deploy the cloud-native AI suite.

    • To use all features of the Fluid control plane dashboard, install ack-fluid 0.9.7 or later in the cluster.

    • To use all features of the Fluid JindoRuntime cache system dashboard, install ack-fluid 1.0.11 or later.

Limitations

The cache system dashboard supports only cache runtime components of the JindoRuntime type (JindoCache engine).

Step 1: Enable Prometheus scraping for Fluid

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, click Integration Center. In the AI section, click the Fluid card.

  3. In the Select a Kubernetes cluster section, select the target cluster. If the page shows that Fluid is already installed, skip the steps in this section.

  4. In the Configuration Information section, configure the following parameters, then click OK.

    | Parameter | Description |
    | --- | --- |
    | Name (Not required) | A unique name for the Fluid exporter. Leave it blank if not needed. |
    | metrics collection interval (seconds) | The interval at which the service collects monitoring data. |

  5. To verify the integration, go to Integration Management in the left-side navigation pane. On the Integrated Addons tab, click the Fluid card. On the Environments tab, click View Details in the Actions column to confirm the Fluid component and alert rules are configured.
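
Besides the console check, the scraped data can be spot-checked through the Prometheus HTTP API. The following is a minimal sketch that only builds the instant-query URL. It assumes your Prometheus instance exposes the standard `/api/v1/query` endpoint under its HTTP API address. The `BASE_URL` value and the helper name `build_instant_query_url` are hypothetical and not taken from this document.

```python
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return the Prometheus HTTP API URL for an instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical endpoint: replace it with the HTTP API address of your own
# Prometheus instance.
BASE_URL = "https://prometheus.example.com"

# Counts the series of one Fluid control plane metric. A non-empty result
# suggests that the Fluid controllers are being scraped.
url = build_instant_query_url(BASE_URL, "count(controller_runtime_reconcile_total)")
print(url)
```

Send the resulting URL with any HTTP client that carries the authentication your instance requires.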

Step 2: View the Fluid dashboards

Two dashboards are available after integration:

  • Fluid Control Plane: monitoring data for the Fluid control plane components

  • Fluid JindoRuntime Dashboard: monitoring data for the JindoRuntime cache system

You can access the dashboards from the ACK console or the ARMS console.

View dashboards from the ACK console (recommended)

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the cluster where Fluid is installed. In the left-side pane, choose Operations > Prometheus Monitoring.

  3. On the Prometheus Monitoring page, choose Others > Fluid Control Plane to open the control plane dashboard.

  4. On the Prometheus Monitoring page, choose Others > Fluid JindoRuntime Dashboard to open the JindoRuntime cache system dashboard.

View dashboards from the ARMS console

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, click Integration Management. On the Query Dashboards tab, select Fluid from the component drop-down list.

  3. Click Fluid Control Plane to view the control plane dashboard, or click Fluid JindoRuntime Dashboard to view the JindoRuntime cache system dashboard.

Dashboard reference

Fluid control plane dashboard

The control plane dashboard shows the status and performance of Fluid control plane components. For a full list of metrics and their descriptions, see Fluid dashboard parameters.

| Section | What you can monitor |
| --- | --- |
| Component running status | Number of running Fluid control plane pods, restart count, and restart timestamps |
| Fluid Controller Detailed Indicators | Controller workload (busy/idle), processing failures, and Kubernetes API request volume |
| Fluid Webhook Detailed Indicators | Webhook resource usage, request throughput, and request processing latency |
| Resource usage | CPU and memory usage per control plane component, network transmit rate, and network receive rate |

Screenshots of each section:

  • Component running status

  • Fluid Controller Detailed Indicators

  • Fluid Webhook Detailed Indicators

  • Resource usage

Fluid JindoRuntime cache system dashboard

The JindoRuntime cache system dashboard shows the health and performance of the distributed cache. For a full list of metrics and their descriptions, see Fluid dashboard parameters.

| Section | What you can monitor |
| --- | --- |
| Dataset Overview | Healthy pod counts for the master, worker, and FUSE components, and resource configuration per pod |
| Cache System Metrics | Cache usage, cache hit ratio, aggregated bandwidth, and QPS for file metadata operations |
| FUSE Metrics (via CSI) | I/O per FUSE pod, metadata operation latency and QPS, and read/write latency and QPS for FUSE clients mounted through the Fluid CSI plug-in |
| FUSE Metrics (via Sidecar) | Metadata operation latency and QPS, and read/write latency and QPS for FUSE clients mounted through the Fluid FUSE sidecar |

Screenshots of each section:

  • Dataset Overview

  • Cache System Metrics

  • FUSE Metrics (via CSI)

  • FUSE Metrics (via Sidecar)

Metrics reference

Fluid control plane metrics

The following metrics are exposed by the Fluid control plane components.

| Metric | Type | Description |
| --- | --- | --- |
| dataset_ufs_total_size | Gauge | Total size of the datasets mounted to existing Dataset objects in the cluster |
| dataset_ufs_file_num | Gauge | Total number of files in the datasets mounted to existing Dataset objects in the cluster |
| runtime_setup_error_total | Counter | Number of runtime startup failures during controller reconciliation |
| runtime_sync_healthcheck_error_total | Counter | Number of runtime health check failures during controller reconciliation |
| controller_runtime_reconcile_time_seconds_bucket | Histogram | Duration of the reconciliation process |
| controller_runtime_reconcile_errors_total | Counter | Number of reconciliation failures |
| controller_runtime_reconcile_total | Counter | Total number of reconciliations per controller, broken down by result |
| controller_runtime_max_concurrent_reconciles | Gauge | Maximum number of concurrent reconciliations supported by the controller |
| controller_runtime_active_workers | Gauge | Number of active reconciliations in the controller |
| workqueue_adds_total | Counter | Number of Add events processed by the controller workqueue |
| workqueue_depth | Gauge | Current length of the controller workqueue |
| workqueue_queue_duration_seconds_bucket | Histogram | Time that an object waits in the controller workqueue before it is processed |
| workqueue_work_duration_seconds_bucket | Histogram | Duration distribution of tasks completed by the controller |
| workqueue_unfinished_work_seconds | Gauge | Total duration of all tasks currently being processed in the workqueue |
| workqueue_longest_running_processor_seconds | Gauge | Longest time the controller has spent processing a single task |
| rest_client_requests_total | Counter | HTTP request count, broken down by status code, method, and host |
| rest_client_request_duration_seconds_bucket | Histogram | HTTP response latency, broken down by verb and URL |
| controller_runtime_webhook_requests_in_flight | Gauge | Number of requests currently being processed by the webhook |
| controller_runtime_webhook_requests_total | Counter | Total number of requests processed by the webhook |
| controller_runtime_webhook_latency_seconds_bucket | Histogram | Request processing latency of the webhook |
| process_cpu_seconds_total | Counter | Total CPU time consumed by the process |
| process_resident_memory_bytes | Gauge | Resident memory used by the process |
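
The `_bucket` metrics above are cumulative Prometheus histograms, so a latency percentile has to be estimated from bucket counts. The following is a minimal sketch of the linear interpolation idea behind the PromQL `histogram_quantile` function. The bucket values in the usage lines are made up for illustration, and the function is a simplified approximation rather than the exact logic the dashboards use.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls into the +Inf bucket: report the highest
                # finite upper bound instead of interpolating.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Illustrative values for controller_runtime_reconcile_time_seconds_bucket:
# (upper bound "le" in seconds, cumulative observation count).
example_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (float("inf"), 100)]
print(histogram_quantile(0.95, example_buckets))
```

The estimate is only as precise as the bucket boundaries, which is why a percentile read from a histogram is an approximation.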

JindoCache server metrics

The following metrics are exposed by the JindoCache server components.

| Metric | Type | Description |
| --- | --- | --- |
| jindocache_server_total_stsnodes_num | Gauge | Number of alive worker replicas in the distributed cache system |
| jindocache_server_total_disk_cap | Gauge | Maximum disk cache capacity (including RAM disks such as tmpfs) |
| jindocache_server_total_used_disk_cap | Gauge | Disk cache used (including RAM disks such as tmpfs) |
| jindocache_server_total_mem_cap | Gauge | Maximum RAM cache capacity |
| jindocache_server_total_used_mem_cap | Gauge | RAM cache used |
| jindocache_server_total_used_rocksdb_cap | Gauge | RocksDB storage used in the distributed cache system |
| jindocache_server_backend_read_bytes_total | Gauge | Total bytes read from the underlying storage system (cache miss). Unit: bytes |
| jindocache_server_backend_read_time_total | Gauge | Total duration of reads from the underlying storage system. Unit: microseconds |
| jindocache_server_backend_readop_num_total | Gauge | Total number of reads from the underlying storage system, which equals the number of blocks in the JindoCache |
| jindocache_server_backend_read_bytes_time_total_window | Gauge | Duration of reads from the underlying storage system within a 1-minute window. Unit: microseconds |
| jindocache_server_backend_read_bytes_total_window | Gauge | Bytes read from the underlying storage system within a 1-minute window. Unit: bytes |
| jindocache_server_remote_read_bytes_total | Gauge | Total bytes served by remote cache hits (data and application on different nodes). Unit: bytes |
| jindocache_server_remote_read_time_total | Gauge | Total duration of remote cache hits. Unit: microseconds |
| jindocache_server_remote_readop_num_total | Gauge | Total number of remote cache hits |
| jindocache_server_remote_read_bytes_time_total_window | Gauge | Duration of remote cache hits within a 1-minute window. Unit: microseconds |
| jindocache_server_remote_read_bytes_total_window | Gauge | Bytes served by remote cache hits within a 1-minute window. Unit: bytes |
| jindocache_server_local_read_bytes_total | Gauge | Total bytes served by local cache hits (data and application on the same node). Unit: bytes |
| jindocache_server_local_read_time_total | Gauge | Total duration of local cache hits. Unit: microseconds |
| jindocache_server_local_readop_num_total | Gauge | Total number of local cache hits |
| jindocache_server_local_read_bytes_time_total_window | Gauge | Duration of local cache hits within a 1-minute window. Unit: microseconds |
| jindocache_server_local_read_bytes_total_window | Gauge | Bytes served by local cache hits within a 1-minute window. Unit: bytes |
| jindocache_server_ns_filelet_op_count_total | Gauge | Total file metadata operations on the JindoCache master component (getAttr and listStatus) |
| jindocache_server_ns_filelet_op_time_total | Gauge | Total duration of file metadata operations on the JindoCache master component |
| jindocache_server_ns_get_attr_op_total | Gauge | Number of getAttr operations on the JindoCache master component |
| jindocache_server_ns_get_attr_time_total | Gauge | Duration of getAttr operations on the JindoCache master component |
| jindocache_server_ns_get_attr_fallback_op_total | Gauge | Number of times the JindoCache master component read file metadata from the underlying storage system |
| jindocache_server_ns_list_status_op_total | Gauge | Number of listStatus operations on the JindoCache master component |
| jindocache_server_ns_list_status_time_total | Gauge | Duration of listStatus operations on the JindoCache master component |
| jindocache_server_ns_list_status_fallback_op_total | Gauge | Number of times the JindoCache master component read directory listings from the underlying storage system |
| jindocache_server_dist_get_attr_op_num_total | Gauge | Number of getAttr operations on the JindoCache client side |
| jindocache_server_dist_get_attr_time_total | Gauge | Duration of getAttr operations on the JindoCache client side |
| jindocache_server_dist_list_dir_op_num_total | Gauge | Number of listStatus operations on the JindoCache client side |
| jindocache_server_dist_list_dir_time_total | Gauge | Duration of listStatus operations on the JindoCache client side |
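
The read counters above can be combined into derived figures such as a byte-level cache hit ratio. The following is a minimal sketch under one assumption: a hit is any byte served by a local or remote cache read, and a miss is any byte read from the underlying storage system. The dashboard may compute its cache hit ratio differently, and the function names are hypothetical.

```python
def cache_hit_ratio(local_bytes: float, remote_bytes: float, backend_bytes: float) -> float:
    """Share of bytes served from cache (local or remote) rather than from the backend.

    Inputs correspond to jindocache_server_local_read_bytes_total,
    jindocache_server_remote_read_bytes_total, and
    jindocache_server_backend_read_bytes_total.
    """
    total = local_bytes + remote_bytes + backend_bytes
    if total == 0:
        return 0.0  # nothing has been read yet
    return (local_bytes + remote_bytes) / total


def average_latency_us(time_total_us: float, op_total: float) -> float:
    """Mean duration per operation in microseconds, for example
    jindocache_server_backend_read_time_total divided by
    jindocache_server_backend_readop_num_total."""
    if op_total == 0:
        return 0.0
    return time_total_us / op_total


# Illustrative sample values, not real measurements.
print(cache_hit_ratio(local_bytes=600, remote_bytes=200, backend_bytes=200))
print(average_latency_us(time_total_us=5000, op_total=10))
```

For a recent view instead of a lifetime average, the same ratio can be applied to the `_window` variants, which cover a 1-minute window.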

JindoCache FUSE client metrics

The following metrics are exposed by the Jindo FUSE client.

| Metric | Type | Description |
| --- | --- | --- |
| jindo_fuse_open_count | Gauge | Number of open operations |
| jindo_fuse_open_latency | Gauge | P50 latency of open operations |
| jindo_fuse_open_latency_80 | Gauge | P80 latency of open operations |
| jindo_fuse_open_latency_90 | Gauge | P90 latency of open operations |
| jindo_fuse_open_latency_99 | Gauge | P99 latency of open operations |
| jindo_fuse_open_latency_999 | Gauge | P99.9 latency of open operations |
| jindo_fuse_open_latency_9999 | Gauge | P99.99 latency of open operations |
| jindo_fuse_getattr_count | Gauge | Number of getAttr operations |
| jindo_fuse_getattr_latency | Gauge | P50 latency of getAttr operations |
| jindo_fuse_getattr_latency_80 | Gauge | P80 latency of getAttr operations |
| jindo_fuse_getattr_latency_90 | Gauge | P90 latency of getAttr operations |
| jindo_fuse_getattr_latency_99 | Gauge | P99 latency of getAttr operations |
| jindo_fuse_getattr_latency_999 | Gauge | P99.9 latency of getAttr operations |
| jindo_fuse_getattr_latency_9999 | Gauge | P99.99 latency of getAttr operations |
| jindo_fuse_readdir_count | Gauge | Number of readDir operations |
| jindo_fuse_readdir_latency | Gauge | P50 latency of readDir operations |
| jindo_fuse_readdir_latency_80 | Gauge | P80 latency of readDir operations |
| jindo_fuse_readdir_latency_90 | Gauge | P90 latency of readDir operations |
| jindo_fuse_readdir_latency_99 | Gauge | P99 latency of readDir operations |
| jindo_fuse_readdir_latency_999 | Gauge | P99.9 latency of readDir operations |
| jindo_fuse_readdir_latency_9999 | Gauge | P99.99 latency of readDir operations |
| jindo_fuse_read_count | Gauge | Number of read operations |
| jindo_fuse_read_latency | Gauge | P50 latency of read operations |
| jindo_fuse_read_latency_80 | Gauge | P80 latency of read operations |
| jindo_fuse_read_latency_90 | Gauge | P90 latency of read operations |
| jindo_fuse_read_latency_99 | Gauge | P99 latency of read operations |
| jindo_fuse_read_latency_999 | Gauge | P99.9 latency of read operations |
| jindo_fuse_read_latency_9999 | Gauge | P99.99 latency of read operations |
| jindo_fuse_write_count | Gauge | Number of write operations |
| jindo_fuse_write_latency | Gauge | P50 latency of write operations |
| jindo_fuse_write_latency_80 | Gauge | P80 latency of write operations |
| jindo_fuse_write_latency_90 | Gauge | P90 latency of write operations |
| jindo_fuse_write_latency_99 | Gauge | P99 latency of write operations |
| jindo_fuse_write_latency_999 | Gauge | P99.9 latency of write operations |
| jindo_fuse_write_latency_9999 | Gauge | P99.99 latency of write operations |
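
The FUSE latency metrics encode the percentile in the name suffix: no suffix means P50, and `_999` means P99.9. The following is a minimal sketch that decodes a metric name into its operation and percentile, based only on the naming pattern visible in the table above. The helper name `describe_latency_metric` is hypothetical.

```python
import re

# Suffix-to-percentile mapping as listed in the table above.
_SUFFIX_TO_PERCENTILE = {
    None: "P50",
    "80": "P80",
    "90": "P90",
    "99": "P99",
    "999": "P99.9",
    "9999": "P99.99",
}

_LATENCY_NAME = re.compile(
    r"^jindo_fuse_(open|getattr|readdir|read|write)_latency(?:_(\d+))?$"
)


def describe_latency_metric(name: str) -> tuple[str, str]:
    """Return (operation, percentile) for a jindo_fuse_*_latency metric name."""
    match = _LATENCY_NAME.match(name)
    if match is None:
        raise ValueError(f"not a FUSE latency metric: {name}")
    operation, suffix = match.groups()
    if suffix not in _SUFFIX_TO_PERCENTILE:
        raise ValueError(f"unknown percentile suffix in: {name}")
    return operation, _SUFFIX_TO_PERCENTILE[suffix]


print(describe_latency_metric("jindo_fuse_read_latency"))
print(describe_latency_metric("jindo_fuse_readdir_latency_9999"))
```

A helper like this can be useful when grouping the metrics by operation for a custom panel or an alert rule.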

What's next