Container Service for Kubernetes: Enable monitoring for the Fluid control plane components

Last Updated: Sep 06, 2023

Fluid is a Kubernetes-native engine that orchestrates and accelerates distributed datasets. It serves data-intensive applications, such as big data and AI applications, in cloud-native scenarios. Fluid provides an application-oriented dataset abstraction, pluggable and scalable data engines, automated data operations, data acceleration, and independence from the underlying runtime platform. You can install the Fluid monitoring component on a Prometheus instance of Prometheus Service with a few clicks and then use the out-of-the-box dashboards that Prometheus Service provides to monitor Fluid. This topic describes how to enable Prometheus Service monitoring for the Fluid control plane components.

Prerequisites

  • Prometheus Service is enabled for your Container Service for Kubernetes (ACK) or ACK Serverless cluster. For more information, see Enable Managed Service for Prometheus.

  • The cloud-native AI suite is deployed and Fluid data acceleration is enabled. The version of the ack-fluid component is 0.9.7 or later. For more information, see Deploy the cloud-native AI suite.
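The version requirement above can be checked with a short script. The following is a minimal sketch, assuming that the ack-fluid version is reported as a dotted numeric string such as 0.9.7, optionally with a leading "v" or a "-suffix". The function names are hypothetical, and how you obtain the installed version string is outside the scope of the sketch.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '0.9.7' into a tuple of ints for comparison."""
    # Assumption: drop a leading 'v' and ignore anything after the first '-'.
    core = version.strip().lstrip("vV").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(installed: str, minimum: str = "0.9.7") -> bool:
    """Return True if the installed version is the minimum version or later."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '1.0' compares equal to '1.0.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.9.10" sorts before "0.9.7", although 0.9.10 is the later version.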

Limits

  • You can install the Fluid monitoring component only on Prometheus instances whose type is Prometheus for Container Service.

  • You can monitor only Fluid control plane components, such as the Fluid controllers and Fluid webhook.

Install the Fluid monitoring component

Use the integration center of Prometheus Service

  1. Log on to the Prometheus Service console.

  2. In the left-side navigation pane, click Monitoring List. On the Prometheus Service page, click the name of the Prometheus instance that you want to manage to navigate to the Integration Center page.

  3. Integrate the Fluid component with Prometheus Service.

    • If this is the first time you install the Fluid monitoring component, click + Install on the Fluid card in the uninstalled section of the Integration Center page.

    • If the Fluid monitoring component is already installed on your Prometheus instance, skip the preceding step.

  4. Configure the parameters in the STEP2 section and click OK.

    | Parameter | Description |
    | --- | --- |
    | Exporter Name | The unique name of the Fluid exporter. |
    | Metrics scrape interval (seconds) | The interval at which monitoring data is collected. |

    • You can view the monitoring metrics on the Metrics tab in the STEP2 section.

    • The installed Fluid monitoring component is displayed in the Installed section of the Integration Center page. Click the Fluid monitoring component. In the panel that appears, you can view Targets, Metrics, Dashboards, Alerts, Service Discovery Configurations, and Exporter.

Use the integration center of ARMS

  1. Log on to the Application Real-Time Monitoring Service (ARMS) console.

  2. In the left-side navigation pane, click Integration Center. Then, click + Install on the Fluid card in the Application Components section.

    If the Fluid monitoring component is already installed on your Prometheus instance, skip the preceding step.

  3. In the upper-right part of the Monitor Fluid panel, select the region where the Kubernetes cluster is deployed.

  4. In the STEP2 section, select the Kubernetes cluster.

  5. In the STEP3 section, configure the parameters and click OK.

    | Parameter | Description |
    | --- | --- |
    | Exporter Name | The unique name of the Fluid exporter. |
    | Metrics scrape interval (seconds) | The interval at which monitoring data is collected. |

    After the Fluid monitoring component is installed, the Fluid card displays Installed 1 Exporter. Click the Fluid card. In the panel that appears, you can view Targets, Metrics, Dashboards, Alerts, Service Discovery Configurations, and Exporter.

View the Fluid dashboard

View the Fluid dashboard from the ACK console (recommended)

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.

  2. On the Clusters page, click the ACK or ACK Serverless cluster in which the Fluid monitoring component is installed. In the left-side navigation pane, choose Operations > Prometheus Monitoring.

  3. On the Prometheus Monitoring page, choose Others > Fluid Control Plane to view the monitoring data.

    In the Fluid dashboard, you can view detailed information about the Fluid control plane components, such as the status of the components, the Fluid controller processing time, the QPS of the Fluid webhook, the request processing latency, and the resource usage of each component. For more information, see Panels.

    • In the Component running status section, you can view the number of Fluid control plane pods that are in the Running state, the number of restarts, and the time of each restart.

    • In the Fluid Controller Detailed Indicators section, you can check whether the Fluid controllers are busy and view information about processing failures and Kubernetes API requests.

    • In the Fluid Webhook Detailed Indicators section, you can view the resource usage of the Fluid webhook, the number of processed requests, and the request processing latency.

    • In the Resource usage section, you can view the resource usage of each Fluid control plane component, the network transmit rate, and the network receive rate.
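Values such as the QPS of the Fluid webhook are derived from counters that only ever increase. The following is a minimal sketch of that derivation, assuming a list of (timestamp in seconds, counter value) samples. The function name and the sample values are hypothetical. The rate() function in Prometheus also extrapolates to the edges of the query window, which this sketch does not attempt.

```python
def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a counter over (timestamp, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again at 0,
        # so the whole current value counts as new increase.
        increase += curr - prev if curr >= prev else curr
    return increase / elapsed


# Hypothetical request-counter samples taken 30 seconds apart.
qps = counter_rate([(0.0, 100.0), (30.0, 160.0), (60.0, 220.0)])
```

Handling the reset case keeps a pod restart from showing up as a large negative rate.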

View the Fluid dashboard from the integration center

On the Integration Center page of Prometheus Service or the Integration Center page of ARMS, click the Fluid card. Then, click the Dashboards tab and click Fluid Control Plane at the bottom of the panel to view the monitoring data.

In the Fluid dashboard, you can view detailed information about the Fluid control plane components, such as the status of the components, the Fluid controller processing time, the QPS of the Fluid webhook, the request processing latency, and the resource usage of each component. For more information, see Panels.

  • In the Component running status section, you can view the number of Fluid control plane pods that are in the Running state, the number of restarts, and the time of each restart.

  • In the Fluid Controller Detailed Indicators section, you can check whether the Fluid controllers are busy and view information about processing failures and Kubernetes API requests.

  • In the Fluid Webhook Detailed Indicators section, you can view the resource usage of the Fluid webhook, the number of processed requests, and the request processing latency.

  • In the Resource usage section, you can view the resource usage of each Fluid control plane component, the network transmit rate, and the network receive rate.
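Latency figures such as the request processing latency are typically estimated from histogram buckets rather than measured per request. The following is a minimal sketch of such an estimate, assuming cumulative (upper bound, count) buckets whose last upper bound is infinite, as in Prometheus histograms. It interpolates linearly within the bucket that contains the target rank. The function name and bucket boundaries are hypothetical, and the PromQL histogram_quantile function handles more edge cases than this sketch.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be greater than 0 and at most 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last with an infinite upper bound")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, so there is no quantile to report
    rank = q * total
    # Assumption: the lowest bucket starts at 0, which suits latency values.
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls in the open-ended bucket: report the highest finite bound.
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    raise AssertionError("unreachable: the last bucket always contains the rank")


# Hypothetical latency buckets in seconds: 50 requests took at most 0.1 s, and so on.
latency_buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 99.0), (math.inf, 100.0)]
p95 = estimate_quantile(0.95, latency_buckets)
```

Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the real latencies.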

Metrics

The following table describes the monitoring metrics for the Fluid control plane components.

| Metric | Type | Description |
| --- | --- | --- |
| dataset_ufs_total_size | Gauge | The total size of the data that is mounted to the existing Dataset objects in the current cluster. |
| dataset_ufs_file_num | Gauge | The number of files in the data that is mounted to the existing Dataset objects in the current cluster. |
| runtime_setup_error_total | Counter | The number of runtime startup failures that occur when the controller reconciles. |
| runtime_sync_healthcheck_error_total | Counter | The number of runtime health check failures that occur when the controller reconciles. |
| controller_runtime_reconcile_time_seconds_bucket | Histogram | The duration of the reconciliation process. |
| controller_runtime_reconcile_errors_total | Counter | The number of reconciliation failures. |
| controller_runtime_reconcile_total | Counter | The total number of reconciliations, partitioned by result. |
| controller_runtime_max_concurrent_reconciles | Gauge | The maximum number of concurrent reconciliations supported by the controller. |
| controller_runtime_active_workers | Gauge | The number of controller workers that are currently processing reconciliations. |
| workqueue_adds_total | Counter | The number of add events handled by the controller workqueue. |
| workqueue_depth | Gauge | The current length of the controller workqueue. |
| workqueue_queue_duration_seconds_bucket | Histogram | The time that an object waits in the controller workqueue before it is processed. |
| workqueue_work_duration_seconds_bucket | Histogram | The time that the controller takes to process an object from the workqueue. |
| workqueue_unfinished_work_seconds | Gauge | The total time that the tasks currently in progress in the controller workqueue have been processing. |
| workqueue_longest_running_processor_seconds | Gauge | The time for which the longest-running task in the controller workqueue has been processing. |
| rest_client_requests_total | Counter | The number of HTTP requests, partitioned by status code, method, and host. |
| rest_client_request_duration_seconds_bucket | Histogram | The HTTP request latency, partitioned by verb and URL. |
| controller_runtime_webhook_requests_in_flight | Gauge | The number of requests that are being processed by the webhook. |
| controller_runtime_webhook_requests_total | Counter | The total number of requests that are processed by the webhook. |
| controller_runtime_webhook_latency_seconds_bucket | Histogram | The request processing latency of the webhook. |
| process_cpu_seconds_total | Counter | The total CPU time, in seconds, consumed by the process. |
| process_resident_memory_bytes | Gauge | The resident memory size of the process, in bytes. |
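The metrics above can be combined into PromQL expressions for ad hoc queries or custom alerts. The following is a minimal sketch that assembles a few such expressions as strings. The helper names and the 5-minute default window are assumptions, label selectors are omitted, and these expressions are not necessarily the ones that the built-in Fluid Control Plane dashboard uses.

```python
def rate_expr(metric: str, window: str = "5m") -> str:
    """Per-second rate of a counter, summed over all series."""
    return f"sum(rate({metric}[{window}]))"


def reconcile_error_ratio_expr(window: str = "5m") -> str:
    """Rate of reconcile errors divided by the rate of the reconcile counter."""
    errors = rate_expr("controller_runtime_reconcile_errors_total", window)
    total = rate_expr("controller_runtime_reconcile_total", window)
    return f"{errors} / {total}"


def quantile_expr(bucket_metric: str, q: float = 0.99, window: str = "5m") -> str:
    """Quantile estimate from a histogram's _bucket series, grouped by the le label."""
    return f"histogram_quantile({q}, sum by (le) (rate({bucket_metric}[{window}])))"


# Example: 99th percentile of the webhook request processing latency.
webhook_p99 = quantile_expr("controller_runtime_webhook_latency_seconds_bucket")
```

A rate window of several scrape intervals is usually needed for rate() to have enough samples, so adjust the window if you change the scrape interval of the exporter.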
