The cloud-controller-manager bridges Kubernetes components and cloud service providers through the Kubernetes API. This document describes the metrics exposed by cloud-controller-manager, explains each monitoring dashboard panel, and provides guidance for diagnosing common metric anomalies.
Prerequisites
Before you begin, ensure that you have completed the setup described in View the monitoring dashboards for control plane components.
Metrics
The following table lists the metrics for cloud-controller-manager. Histogram metrics use bucket-based sampling and are queried with histogram_quantile() in PromQL.
| Metric | Type | Description |
|---|---|---|
| ccm_slb_latency_ms | Histogram | Classical Load Balancer (CLB) synchronization delay. Unit: ms. Bucket thresholds: {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000}. |
| ccm_node_latency_ms | Histogram | Node synchronization delay. Unit: ms. Same bucket thresholds as ccm_slb_latency_ms. |
| ccm_route_latency_ms | Histogram | Route synchronization delay. Unit: ms. Same bucket thresholds as ccm_slb_latency_ms. |
| workqueue_adds_total | Counter | Number of Adds events processed by the workqueue. |
| workqueue_depth | Gauge | Current length of the workqueue. A sustained high value indicates the controller cannot process tasks in time, leading to task accumulation. |
| workqueue_queue_duration_seconds_bucket | Histogram | Time a task spends waiting in the workqueue before processing. Unit: seconds. Bucket thresholds: {10^-8, 10^-7, 10^-6, 10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 1, 10}. |
| memory_utilization_byte | Gauge | Memory usage. Unit: bytes. |
| cpu_utilization_core | Gauge | CPU capacity in use. Unit: core. |
| rest_client_requests_total | Counter | Number of HTTP requests to kube-apiserver, broken down by status code, method, and host. |
| rest_client_request_duration_seconds_bucket | Histogram | HTTP response latency for requests to kube-apiserver, broken down by verb and URL. |
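To make the bucket-based estimation concrete, the following Python sketch approximates how histogram_quantile() derives a quantile from cumulative bucket counts: it finds the bucket that contains the target rank and interpolates linearly inside it. This is a simplified illustration, not the Prometheus implementation. The function name and the sample counts are hypothetical, and only the bucket bounds 100, 200, and 300 are taken from the ccm_slb_latency_ms thresholds.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must contain a final (math.inf, total_count) entry, which
    mirrors the le="+Inf" bucket of a Prometheus histogram.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket and a +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, so no quantile

    rank = q * total
    # Index of the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)

    if idx == len(buckets) - 1:
        # The rank falls into the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]

    upper, count = buckets[idx]
    # Assumption: the first bucket starts at 0 (latencies are non-negative).
    lower, prev_count = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    in_bucket = count - prev_count
    if in_bucket == 0:
        return upper
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - prev_count) / in_bucket


# Hypothetical cumulative counts for three of the ccm_slb_latency_ms buckets.
sample = [(100, 10), (200, 30), (300, 40), (math.inf, 40)]
median_ms = histogram_quantile(0.5, sample)
```

Because the result is interpolated, its accuracy is limited by the bucket width. With the thresholds listed above, a quantile that falls between 1000 ms and 1500 ms can be off by up to 500 ms, and any value beyond the largest finite bucket is reported as that bucket's upper bound.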
The following metrics are deprecated. Remove any alerts and monitoring rules that depend on them.
- cpu_utilization_ratio: CPU utilization
- memory_utilization_ratio: Memory usage
Dashboard usage guide
The dashboards are built from component metrics and Prometheus Query Language (PromQL) queries. Each dashboard focuses on a specific operational concern.
CCM
The CCM dashboard visualizes synchronization latency for the three resource types that cloud-controller-manager manages: routes, nodes, and CLB instances.
| Panel name | PromQL | Description |
|---|---|---|
| Route Synchronization Delay | histogram_quantile($quantile, sum(rate(ccm_route_latencies_duration_milliseconds_bucket[$interval])) by (verb, le)) | Route synchronization delay. Unit: ms. |
| Node Synchronization Delay | histogram_quantile($quantile, sum(rate(ccm_node_latencies_duration_milliseconds_bucket[$interval])) by (verb, le)) | Node synchronization delay. Unit: ms. |
| CLB (Classical Load Balancer) Synchronization Delay | histogram_quantile($quantile, sum(rate(ccm_slb_latencies_duration_milliseconds_bucket[$interval])) by (verb, le)) | CLB synchronization delay. Unit: ms. |
Queue
The Queue dashboard shows the workqueue state: how fast tasks are arriving, how many are pending, and how long tasks wait before processing.
| Panel name | PromQL | Description |
|---|---|---|
| Workqueue Enqueue Rate | sum(rate(workqueue_adds_total{job="ack-cloud-controller-manager"}[$interval])) by (name) | Rate of Adds events entering the workqueue during the specified interval. |
| Workqueue Depth | workqueue_depth{job="ack-cloud-controller-manager"} | Current length of the workqueue. |
| Workqueue Processing Delay | histogram_quantile($quantile, sum(rate(workqueue_queue_duration_seconds_bucket{job="ack-cloud-controller-manager"}[$interval])) by (name, le)) | Time events spend waiting in the workqueue before processing. |
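The Queue panels wrap counters such as workqueue_adds_total in rate(...[$interval]). As a rough illustration of what that means, the Python sketch below turns raw counter samples into a per-second rate and treats any drop in the counter as a reset. It is a deliberate simplification: Prometheus also extrapolates to the window boundaries, which this sketch does not attempt. The function name and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Return the average per-second increase of a counter.

    `samples` is a list of (timestamp_seconds, counter_value) pairs sorted
    by time. Returns None when fewer than two samples are available or when
    the samples do not span a positive amount of time.
    """
    if len(samples) < 2:
        return None

    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # The counter went down, so it was reset (for example, after a
            # restart). Count everything accumulated since the reset.
            increase += value
        else:
            increase += value - previous
        previous = value

    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


# Hypothetical samples: 120 adds over 60 seconds.
adds_per_second = simple_rate([(0, 100), (30, 160), (60, 220)])
```

A steady enqueue rate combined with a growing Workqueue Depth suggests that processing, not arrival, is the bottleneck.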
Resources
The Resources dashboard tracks CPU and memory consumption for the cloud-controller-manager process.
| Panel name | PromQL | Description |
|---|---|---|
| Memory Usage | memory_utilization_byte{container="cloud-controller-manager"} | Memory usage. Unit: bytes. |
| CPU Usage | cpu_utilization_core{container="cloud-controller-manager"}*1000 | CPU capacity in use. Unit: millicore. |
Kube API
The Kube API dashboard shows the rate and distribution of HTTP requests that cloud-controller-manager sends to kube-apiserver, broken down by HTTP status code range.
| Panel name | PromQL | Description |
|---|---|---|
| Kube API Request QPS | | Queries per second (QPS) of HTTP requests that cloud-controller-manager sends to kube-apiserver, broken down by method and HTTP status code. |
Common metric anomalies
CLB synchronization delay
| Status | Threshold | Cause | Action |
|---|---|---|---|
| Normal | CLB (Classical Load Balancer) Synchronization Delay ≤ 10s | — | — |
| Abnormal | CLB (Classical Load Balancer) Synchronization Delay > 10s | CLB synchronization is taking too long. | Check for anomalous activity in the service. |
Workqueue depth
| Status | Threshold | Cause | Action |
|---|---|---|---|
| Normal | Workqueue Depth < 10 | — | — |
| Abnormal | Workqueue Depth > 10 | The workqueue holds more items than the controller can process in time. | Reduce the frequency of changes to nodes, Pods, and Services in the cluster. |
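The two thresholds above can also be checked programmatically. The Python sketch below parses a response in the JSON format returned by the Prometheus HTTP API for an instant query and flags every series whose value exceeds a limit. It makes no network calls. The function name, the constant names, the sample response, and the workqueue names in it are all hypothetical, and the 10 s CLB limit is expressed as 10000 ms on the assumption that the panel value is in milliseconds.

```python
import json

# Limits taken from the anomaly tables above.
CLB_SYNC_DELAY_LIMIT_MS = 10_000  # 10 s, assuming the panel reports ms
WORKQUEUE_DEPTH_LIMIT = 10


def abnormal_series(response_text, limit):
    """Return (labels, value) for every series whose value exceeds `limit`.

    `response_text` is the JSON body of a Prometheus instant query whose
    result type is "vector".
    """
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")

    flagged = []
    for series in payload["data"]["result"]:
        _timestamp, raw_value = series["value"]  # sample values are strings
        value = float(raw_value)
        if value > limit:
            flagged.append((series["metric"], value))
    return flagged


# Made-up response for the Workqueue Depth query; not real cluster data.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"name": "service"}, "value": [1700000000, "3"]},
            {"metric": {"name": "node"}, "value": [1700000000, "27"]},
        ],
    },
})

deep_queues = abnormal_series(SAMPLE_RESPONSE, WORKQUEUE_DEPTH_LIMIT)
```

The same function can be applied to the result of the CLB synchronization delay query by passing CLB_SYNC_DELAY_LIMIT_MS as the limit.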
What's next
For metrics, dashboard usage, and anomaly guidance for other control plane components, see: