This topic describes the metrics supported by etcd, provides usage notes for the dashboards of etcd, and suggests how to troubleshoot common metric anomalies.
Metrics
Metrics can indicate the status and parameter settings of a component. The following table describes the metrics supported by etcd.
Metric | Type | Description |
cpu_utilization_core | Gauge | The CPU usage. Unit: vCores. |
cpu_utilization_ratio | Gauge | CPU utilization = Number of used vCores/Total number of vCores. Unit: %. |
etcd_server_has_leader | Gauge | Indicates whether the etcd member has a leader. Valid values: 1 (the member has a leader) and 0 (the member does not have a leader). |
etcd_server_is_leader | Gauge | Indicates whether the etcd member is the leader. Valid values: 1 (the member is the leader) and 0 (the member is not the leader). |
etcd_server_leader_changes_seen_total | Counter | The cumulative number of leader changes that the etcd member has seen. |
etcd_mvcc_db_total_size_in_bytes | Gauge | The physically allocated size of the etcd member DB. Unit: bytes. |
etcd_mvcc_db_total_size_in_use_in_bytes | Gauge | The size of the etcd member DB that is logically in use. Unit: bytes. |
etcd_disk_backend_commit_duration_seconds_bucket | Histogram | The etcd backend commit delay. Buckets: |
etcd_debugging_mvcc_keys_total | Gauge | The total number of etcd keys. |
etcd_server_proposals_committed_total | Gauge | The total number of raft proposals committed. |
etcd_server_proposals_applied_total | Gauge | The total number of raft proposals applied. |
etcd_server_proposals_pending | Gauge | The current number of pending raft proposals. |
etcd_server_proposals_failed_total | Counter | The total number of failed raft proposals. |
memory_utilization_byte | Gauge | The memory usage. Unit: bytes. |
memory_utilization_ratio | Gauge | Memory utilization = Amount of used memory/Total amount of memory. Unit: %. |
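The two ratio metrics in the table are defined by the same formula (used amount divided by total amount, expressed as a percentage). The following is a minimal sketch of that arithmetic. The function name `utilization_ratio` and the sample values are hypothetical and are not taken from a real cluster.

```python
def utilization_ratio(used: float, total: float) -> float:
    """Return utilization as a percentage: used / total * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total * 100.0


# Hypothetical sample values for illustration only.
cpu_ratio = utilization_ratio(used=0.5, total=2.0)  # vCores
mem_ratio = utilization_ratio(used=1 * 1024**3, total=4 * 1024**3)  # bytes

print(f"cpu_utilization_ratio    = {cpu_ratio:.1f}%")
print(f"memory_utilization_ratio = {mem_ratio:.1f}%")
```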
Usage notes for dashboards
Dashboards are generated based on metrics and Prometheus Query Language (PromQL). The following sections describe the observability and features of the dashboards of etcd.
Observability
Features
Dashboard | PromQL | Description |
Etcd Cluster Healthy |
|
|
Leader Changes for Latest Day | changes(etcd_server_leader_changes_seen_total{job="etcd"}[1d]) | The number of leader changes within the previous day. |
Mem Usage | memory_utilization_byte{container="etcd"} | The memory usage. Unit: bytes. |
CPU Usage | cpu_utilization_core{container="etcd"}*1000 | The CPU usage. Unit: millicores. |
Mem Usage Rate | memory_utilization_ratio{container="etcd"} | The memory utilization. Unit: percentage. |
CPU Usage Rate | cpu_utilization_ratio{container="etcd"} | The CPU utilization. Unit: percentage. |
DB Size |
|
|
kv total | etcd_debugging_mvcc_keys_total | The total number of key-value pairs in the etcd cluster. |
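Backend Commit Delay | histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket{job="etcd"}[5m])) by (instance, le)) | The 99th percentile of the DB backend commit delay of each instance, calculated over the previous 5 minutes. Unit: seconds. |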
Raft Proposals Status |
|
|
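The Backend Commit Delay query relies on the PromQL function histogram_quantile, which estimates a quantile from cumulative histogram buckets by linear interpolation inside the bucket that contains the target rank. The following is a simplified sketch of that estimation, not the Prometheus implementation. The bucket bounds and counts are hypothetical sample data, and edge cases that Prometheus handles (for example, fewer than two buckets) are omitted.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) pairs.

    buckets must be sorted by upper bound and end with a +Inf bucket.
    The lower bound of the first bucket is assumed to be 0.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the
                # highest finite upper bound instead.
                return prev_bound
            # Linear interpolation inside the matching bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative bucket counts (upper bound in seconds, count).
sample = [
    (0.001, 0), (0.002, 10), (0.004, 60),
    (0.008, 90), (0.016, 100), (math.inf, 100),
]
print(f"p99 is roughly {histogram_quantile(0.99, sample):.4f} s")
```

Because the result is interpolated, its precision is limited by the bucket width around the target rank.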
Common metric anomalies
Etcd Cluster Healthy
Normal | Abnormal | Anomaly description |
All three etcd members have a leader and one of the etcd members must be a leader. This means that | One etcd member is abnormal. | This means that |
Multiple etcd members are abnormal. | This means that Check whether |
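The health rule in the preceding table can be sketched as a small classifier over the etcd_server_has_leader and etcd_server_is_leader values of each member. This sketch assumes that a value of 1 means "yes" for both metrics and that an abnormal member is one that reports no leader. The `MemberStatus` type, the function name, and the returned labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemberStatus:
    name: str
    has_leader: int  # value of etcd_server_has_leader (assumed: 1 = yes)
    is_leader: int   # value of etcd_server_is_leader (assumed: 1 = yes)


def classify_cluster(members):
    """Classify cluster health from per-member leader metrics."""
    without_leader = [m.name for m in members if m.has_leader != 1]
    leaders = [m.name for m in members if m.is_leader == 1]
    if not without_leader and len(leaders) == 1:
        return "normal"
    if len(without_leader) == 1:
        return "one member abnormal"
    if len(without_leader) > 1:
        return "multiple members abnormal"
    # Every member sees a leader, but zero or several claim leadership.
    return "unexpected leader count"


# Hypothetical three-member cluster.
cluster = [
    MemberStatus("etcd-a", has_leader=1, is_leader=1),
    MemberStatus("etcd-b", has_leader=1, is_leader=0),
    MemberStatus("etcd-c", has_leader=1, is_leader=0),
]
print(classify_cluster(cluster))
```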
Backend Commit Delay
Normal | Abnormal | Anomaly description |
The metric indicates a delay of tens of milliseconds. | The metric indicates a delay of hundreds of milliseconds or even several seconds for a period of time. | Disk reads and writes are abnormal. |
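The rule above flags the metric only when the delay stays high "for a period of time". One way to sketch that check is to look for a run of consecutive high samples. The 0.1-second threshold and the run length of 3 samples are assumptions chosen for illustration, because the table does not give exact values.

```python
def sustained_high_latency(p99_seconds, threshold=0.1, min_consecutive=3):
    """Return True if the delay exceeds threshold for at least
    min_consecutive samples in a row.

    threshold and min_consecutive are assumed values, not documented limits.
    """
    run = 0
    for value in p99_seconds:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical p99 samples in seconds, one per scrape interval.
samples = [0.02, 0.03, 0.25, 0.40, 1.20, 0.03]
print(sustained_high_latency(samples))
```

Requiring several consecutive samples avoids flagging a single short spike.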
Raft Proposals Status
Normal | Abnormal | Anomaly description |
The number of failed raft proposals per minute is 0. | The number of failed raft proposals per minute is greater than 0. | Raft proposals failed. If a large number of raft proposals failed, troubleshoot the issue. |
The number of pending raft proposals is 0. | The number of pending raft proposals is greater than 0. | A large number of raft proposals are pending because raft proposals are applied slowly. Check the Backend Commit Delay metric and troubleshoot the issue. |
The difference between the number of committed raft proposals and the number of applied raft proposals is 0. | The difference between the number of committed raft proposals and the number of applied raft proposals is greater than 0. | etcd is overwhelmed by client requests. If the difference is greater than 5000, etcd denies subsequent requests and returns the |
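The three rows of the preceding table can be combined into one check, sketched below. The function name, parameter names, and message strings are hypothetical. The inputs correspond to the failed, pending, committed, and applied proposal metrics, and the value 5000 is the limit stated in the table.

```python
def raft_proposal_findings(failed_per_minute, pending, committed, applied,
                           deny_limit=5000):
    """Return a list of findings for the Raft Proposals Status rules."""
    findings = []
    if failed_per_minute > 0:
        findings.append("raft proposals failed")
    if pending > 0:
        findings.append("raft proposals pending: check Backend Commit Delay")
    backlog = committed - applied
    if backlog > deny_limit:
        findings.append("apply backlog above limit: etcd denies new requests")
    elif backlog > 0:
        findings.append("apply backlog: etcd is overwhelmed by client requests")
    return findings


# Hypothetical readings for illustration only.
print(raft_proposal_findings(failed_per_minute=0, pending=0,
                             committed=1200, applied=1200))
print(raft_proposal_findings(failed_per_minute=2, pending=4,
                             committed=9000, applied=3000))
```

An empty list corresponds to the Normal column in all three rows.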