Kubernetes clusters use etcd as the persistent backing store for cluster state and metadata. As a distributed key-value store, etcd provides strong consistency and high availability (HA) for cluster data. This topic describes the metrics of the etcd component, explains how to use its monitoring dashboard, and analyzes common metric anomalies.
Before you begin
Access the dashboard
For more information, see View the monitoring dashboard for control plane components.
Metric list
Metrics expose the status and parameters of a component. The following table lists the metrics of the etcd component.
Metric | Type | Description |
cpu_utilization_core | Gauge | CPU usage. Unit: cores. |
etcd_server_has_leader | Gauge | etcd uses the Raft consensus algorithm. In Raft, one member of the cluster is elected as the leader (primary node), and the other members become followers (secondary nodes). The leader periodically sends heartbeats to the followers to maintain cluster stability. This metric indicates whether the etcd member currently has a leader: 1 if a leader exists, 0 otherwise. |
etcd_server_is_leader | Gauge | Indicates whether the etcd member is the leader: 1 if it is the leader, 0 otherwise. |
etcd_server_leader_changes_seen_total | Counter | The cumulative number of leader changes that the etcd member has observed since it started. |
etcd_mvcc_db_total_size_in_bytes | Gauge | The total size of the etcd member database (DB), that is, the space physically allocated to the DB file. Unit: bytes. |
etcd_mvcc_db_total_size_in_use_in_bytes | Gauge | The size of the etcd member DB that is logically in use. Unit: bytes. |
etcd_disk_backend_commit_duration_seconds_bucket | Histogram | The latency of backend commits in etcd. This is the time it takes for data changes to be written to the storage backend and successfully committed. The bucket thresholds are |
etcd_debugging_mvcc_keys_total | Gauge | The total number of keys stored in etcd. |
etcd_server_proposals_committed_total | Gauge | etcd uses the Raft consensus algorithm. In Raft, any action that attempts to change the system state is submitted as a proposal. This metric indicates the number of proposals that have been successfully committed to the Raft log in etcd. |
etcd_server_proposals_applied_total | Gauge | The number of proposals that have been successfully applied or executed. |
etcd_server_proposals_pending | Gauge | The number of proposals that are pending. |
etcd_server_proposals_failed_total | Counter | The number of proposals that have failed. |
memory_utilization_byte | Gauge | Memory usage. Unit: bytes. |
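Because etcd_mvcc_db_total_size_in_bytes counts the space allocated to the DB file and etcd_mvcc_db_total_size_in_use_in_bytes counts only the space holding live data, the difference between them approximates space that compaction has freed inside the file but that the file still occupies until the member is defragmented. The following Python sketch computes that difference for one member. The function name and the sample values are hypothetical, and the sketch assumes you have already read both gauge values.

```python
def db_fragmentation(total_size_bytes: float, in_use_bytes: float) -> tuple:
    """Return (reclaimable_bytes, reclaimable_ratio) for one etcd member.

    total_size_bytes: value of etcd_mvcc_db_total_size_in_bytes
    in_use_bytes:     value of etcd_mvcc_db_total_size_in_use_in_bytes
    """
    if total_size_bytes <= 0:
        raise ValueError("total_size_bytes must be positive")
    # Space allocated in the DB file that no longer holds live data.
    reclaimable = max(total_size_bytes - in_use_bytes, 0)
    return reclaimable, reclaimable / total_size_bytes


# Hypothetical sample values for one member, not real measurements.
total = 2 * 1024**3      # 2 GiB allocated
in_use = 512 * 1024**2   # 512 MiB in use
free_bytes, ratio = db_fragmentation(total, in_use)
print(f"reclaimable: {free_bytes / 1024**2:.0f} MiB ({ratio:.0%})")
# prints "reclaimable: 1536 MiB (75%)"
```

A persistently high ratio suggests that much of the DB file is free space rather than data.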
The following resource utilization metrics are deprecated. Remove any alerts or monitoring that depend on these metrics.
cpu_utilization_ratio: CPU utilization.
memory_utilization_ratio: Memory utilization.
Dashboard guide
The dashboard is built from component metrics and related Prometheus Query Language (PromQL) queries. The following sections describe the observability display and features of the dashboard.
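If you want to run the dashboard queries outside the console, one option is the Prometheus HTTP API, which evaluates a PromQL expression at a single point in time through GET /api/v1/query. The following Python sketch assumes a reachable, unauthenticated Prometheus-compatible endpoint. The base URL and the function names are hypothetical, and a managed monitoring service may require a different endpoint or authentication headers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> dict:
    """Map each series' 'instance' label to its sample value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    values = {}
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "<none>")
        # Each sample is [unix_timestamp, "value as a string"].
        values[instance] = float(series["value"][1])
    return values


def query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run an instant query and return {instance: value}."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(resp.read().decode("utf-8"))
```

For example, query("http://prometheus.example:9090", 'etcd_server_has_leader{job="etcd"}') would return one value per etcd member, keyed by its instance label.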
Observability Display

Feature Analysis
Name | PromQL | Description |
Etcd Health Status | | |
Leader Changes In The Last Day | changes(etcd_server_leader_changes_seen_total{job="etcd"}[1d]) | The number of leader changes observed by each etcd member over the last day. |
Memory Usage | memory_utilization_byte{container="etcd"} | Memory usage. Unit: bytes. |
CPU Usage | cpu_utilization_core{container="etcd"}*1000 | CPU usage. Unit: millicores. |
Disk Size | etcd_mvcc_db_total_size_in_bytes | The total size of the etcd backend DB. Unit: bytes. |
Disk Size | etcd_mvcc_db_total_size_in_use_in_bytes | The actual size in use of the etcd backend DB. Unit: bytes. |
Total Key-value Pairs | etcd_debugging_mvcc_keys_total | The total number of key-value (KV) pairs in the etcd cluster. |
Backend Commit Latency | histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket{job="etcd"}[5m])) by (instance, le)) | The 99th percentile (P99) backend commit latency of each etcd member, calculated over a 5-minute window. Backend commit latency is the time required to persist committed changes to the etcd backend database. |
Raft Proposal Status | rate(etcd_server_proposals_failed_total{job="etcd"}[1m]) | The per-second rate of failed Raft proposals, averaged over the last minute. |
Raft Proposal Status | etcd_server_proposals_pending{job="etcd"} | The current number of pending Raft proposals. |
Raft Proposal Status | etcd_server_proposals_committed_total{job="etcd"} - etcd_server_proposals_applied_total{job="etcd"} | The difference between the numbers of committed and applied Raft proposals, that is, the number of proposals that have been committed but not yet applied. |
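Two details of the PromQL functions used above affect how the panels read. rate() returns a per-second value, so a failed-proposal rate is a number of failures per second. changes() counts how often consecutive samples differ, so two leader elections between the same pair of scrapes count as one change. The following Python sketch models both functions on a list of (timestamp, value) samples. It is a simplification, not the Prometheus implementation, and the function names and sample data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase between the first and last sample in the window.

    Simplified: PromQL rate() also handles counter resets and
    extrapolates to the boundaries of the range window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def count_changes(samples: List[Sample]) -> int:
    """Number of times a sample differs from the previous one, like changes()."""
    return sum(1 for (_, prev), (_, cur) in zip(samples, samples[1:]) if cur != prev)


# Hypothetical failed-proposal counter scraped every 15 s over one minute.
window = [(0, 10), (15, 10), (30, 13), (45, 13), (60, 16)]
print(simple_rate(window))    # 0.1 failures per second (6 per minute)
print(count_changes(window))  # 2
```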
Common metric anomalies
Etcd Health Status
Normal case | Abnormal case | Description of anomaly |
All three etcd members have a leader, and one of them is the leader. This means | A single member is abnormal. | The abnormal member has |
More than one member is abnormal. | Multiple members have Also, check if any member has |
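The normal case in the first row can be expressed as a check over the per-member values of etcd_server_has_leader and etcd_server_is_leader: every member sees a leader, and exactly one member reports itself as the leader. The following Python sketch assumes you have already collected both values for each member. The class, function, and member names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemberStatus:
    name: str        # hypothetical member name, for example the instance label
    has_leader: int  # etcd_server_has_leader: 1 if the member sees a leader
    is_leader: int   # etcd_server_is_leader: 1 if the member is the leader


def check_health(members: List[MemberStatus]) -> List[str]:
    """Return detected problems; an empty list matches the normal case."""
    problems = []
    without_leader = [m.name for m in members if m.has_leader != 1]
    if without_leader:
        problems.append("no leader seen by: " + ", ".join(without_leader))
    leaders = [m.name for m in members if m.is_leader == 1]
    if len(leaders) != 1:
        problems.append(f"expected exactly 1 leader, found {len(leaders)}")
    return problems


healthy = [MemberStatus("etcd-0", 1, 1), MemberStatus("etcd-1", 1, 0), MemberStatus("etcd-2", 1, 0)]
print(check_health(healthy))  # []
```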
Backend Commit Latency
Normal case | Abnormal case | Description of anomaly |
The P99 latency is between a few milliseconds and tens of milliseconds. | The latency stays at hundreds of milliseconds or even seconds. | This usually indicates a disk I/O problem. |
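The Backend Commit Latency panel derives the P99 with histogram_quantile(), which locates the bucket that contains the requested rank and interpolates linearly between that bucket's bounds. The following Python sketch reproduces that interpolation on cumulative (upper bound, count) pairs. In the dashboard query, the inputs are per-second rates summed by instance and le rather than raw counts, but the interpolation is the same. The bucket bounds and counts below are hypothetical and do not claim to match etcd's actual bucket layout.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Follows the linear interpolation of PromQL histogram_quantile(): buckets
    are sorted by upper bound ("le") and the last one is (+Inf, total).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First finite bucket whose cumulative count reaches the rank.
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        # The rank lies in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    lower, below = (0.0, 0.0) if b == 0 else buckets[b - 1]
    # Assume observations are spread evenly inside the selected bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical cumulative counts of commit latencies, in seconds.
latency_buckets = [
    (0.001, 0), (0.002, 10), (0.004, 80), (0.008, 95),
    (0.016, 98), (0.032, 100), (math.inf, 100),
]
print(f"P99 = {histogram_quantile(0.99, latency_buckets) * 1000:.0f} ms")
```

With these sample values the estimate is 24 ms, which falls in the normal range described in the table above.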
Raft Proposal Anomalies
Normal case | Abnormal case | Description of anomaly |
The rate of failed Raft proposals is 0. | The rate of failed Raft proposals is greater than 0. | Some Raft proposals failed. If the rate remains high, further investigation is required. |
The number of pending Raft proposals is 0. | The number of pending Raft proposals is greater than 0. | Submitted Raft proposals are backing up, usually because proposals are being applied slowly. Analyze this together with the backend commit latency. |
The difference between the numbers of committed and applied Raft proposals is 0. | The difference between the numbers of committed and applied proposals is greater than 0. | etcd is under high pressure, typically from a large volume of client requests, so applying committed proposals lags behind committing them. If this value exceeds 5000, etcd rejects subsequent requests and returns
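The three rows above can be combined into one check per member. The following Python sketch assumes you have already obtained the failed-proposal rate, the pending count, and the committed and applied totals for a member. The function name is hypothetical, and the 5000 threshold is the one stated in this table.

```python
MAX_COMMIT_APPLY_GAP = 5000  # threshold stated in this topic


def assess_proposals(failed_rate: float, pending: float,
                     committed_total: float, applied_total: float) -> list:
    """Compare one member's Raft proposal metrics with the normal cases."""
    findings = []
    if failed_rate > 0:
        findings.append(f"proposals failing at {failed_rate:g}/s")
    if pending > 0:
        findings.append(f"{pending:g} proposals pending; check backend commit latency")
    # Proposals committed to the Raft log but not yet applied.
    gap = committed_total - applied_total
    if gap > MAX_COMMIT_APPLY_GAP:
        findings.append(f"commit/apply gap {gap:g} exceeds {MAX_COMMIT_APPLY_GAP}; requests may be rejected")
    elif gap > 0:
        findings.append(f"commit/apply gap {gap:g}")
    return findings


# Hypothetical values for one member.
print(assess_proposals(failed_rate=0, pending=0, committed_total=120_000, applied_total=120_000))
```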
References
For more information about the metrics, dashboard guides, and common metric anomalies for other control plane components, see kube-apiserver component metrics, kube-scheduler component metrics, Component metrics, and cloud-controller-manager component metrics.