Monitor kube-apiserver - Container Service for Kubernetes

This topic describes the metrics supported by kube-apiserver, provides usage notes for the dashboards of kube-apiserver, and suggests how to troubleshoot common metric anomalies.

Metrics

Metrics can indicate the status and parameter settings of a component. The following table describes the metrics supported by kube-apiserver.

Metric	Type	Description
apiserver_request_duration_seconds_bucket	Histogram	The latency between a request sent from a client and a response returned by kube-apiserver. This metric displays the response latency of kube-apiserver when handling different types of requests. Requests are classified based on verbs, groups, versions, resources, subresources, scopes, components, and clients. Histogram buckets: `0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, and 60`. Unit: seconds.
apiserver_request_total	Counter	The numbers of different types of requests received by kube-apiserver. Requests are classified based on verbs, groups, versions, resources, scopes, components, HTTP content types, HTTP codes, and clients.
apiserver_request_no_resourceversion_list_total	Counter	The number of LIST requests that do not include the ResourceVersion parameter. Requests are classified based on groups, versions, resources, scopes, and clients. This metric is used to check whether an excessive number of LIST requests of the quorum read type are sent to kube-apiserver. This can help optimize client behavior.
apiserver_current_inflight_requests	Gauge	The number of requests that are being processed by kube-apiserver. The requests are classified into ReadOnly and Mutating requests.
apiserver_dropped_requests_total	Counter	The number of requests dropped due to throttling. A request is dropped if the HTTP status code `429 'Try again later'` is returned.
apiserver_admission_controller_admission_duration_seconds_bucket	Histogram	The admission controller latency. The histogram is identified by the admission controller name, operation (CREATE, UPDATE, or CONNECT), API resource, operation type (validate or admit), and request denial (true or false). Buckets: `0.005, 0.025, 0.1, 0.5, and 2.5`. Unit: seconds.
apiserver_admission_webhook_admission_duration_seconds_bucket	Histogram	The admission webhook latency. The histogram is identified by the admission controller name, operation (CREATE, UPDATE, or CONNECT), API resource, operation type (validate or admit), and request denial (true or false). Buckets: `0.005, 0.025, 0.1, 0.5, and 2.5`. Unit: seconds.
apiserver_admission_webhook_admission_duration_seconds_count	Counter	The number of requests processed by the admission webhook. The counter is identified by the admission controller name, operation (CREATE, UPDATE, or CONNECT), API resource, operation type (validate or admit), and request denial (true or false).
cpu_utilization_core	Gauge	The CPU usage. Unit: vCores.
cpu_utilization_ratio	Gauge	CPU utilization = Number of used vCores/Total number of vCores. Unit: %.
memory_utilization_byte	Gauge	The memory usage. Unit: bytes.
memory_utilization_ratio	Gauge	Memory utilization = Amount of used memory/Total amount of memory. Unit: %.
up	Gauge	The availability of kube-apiserver. 1: kube-apiserver is available. 0: kube-apiserver is unavailable.
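
The `_bucket` metrics in the table are cumulative: the series for each upper bound (`le`) counts every request that finished within that bound, so each count includes all smaller buckets. The following is a minimal sketch of that bookkeeping, not the kube-apiserver implementation. The function name `cumulative_buckets` and the sample latencies are hypothetical, and only the first five bucket bounds of `apiserver_request_duration_seconds_bucket` are used to keep the example short.

```python
from bisect import bisect_left

# First few upper bounds (seconds) listed for
# apiserver_request_duration_seconds_bucket; shortened for readability.
BOUNDS = [0.05, 0.1, 0.15, 0.2, 0.25]


def cumulative_buckets(latencies, bounds=BOUNDS):
    """Return {le: count}, where each count includes all smaller buckets."""
    per_bucket = [0] * (len(bounds) + 1)  # the last slot is the +Inf bucket
    for value in latencies:
        # bisect_left returns the index of the first bound >= value,
        # which is the smallest bucket that satisfies value <= le.
        per_bucket[bisect_left(bounds, value)] += 1

    result, running = {}, 0
    for le, count in zip(bounds + [float("inf")], per_bucket):
        running += count
        result[le] = running
    return result


# Hypothetical request latencies in seconds.
counts = cumulative_buckets([0.03, 0.07, 0.12, 0.12, 0.4])
for le, count in counts.items():
    print(f"le={le}: {count}")
```

Because the counts are cumulative, the `+Inf` bucket always equals the total number of observed requests.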

Usage notes for dashboards

Dashboards are generated based on metrics and Prometheus Query Language (PromQL). The following sections describe the kube-apiserver dashboards for key metrics, cluster-level summary, resource analysis, QPS and latency, admission controller and webhook, and client summary.

In most cases, these dashboards are used in the following sequence:

View the key metrics dashboards to quickly check cluster performance statistics. For more information, see Key metrics.
View the cluster-level summary dashboards to analyze the response latency of kube-apiserver, the number of requests that are being processed by kube-apiserver, and whether request throttling is triggered. For more information, see Cluster-level summary.
View the resource analysis dashboards to check the resource usage of the managed components. For more information, see Resource analysis.
View the QPS and latency dashboards to analyze the QPS and response time in multiple dimensions. For more information, see QPS and latency.
View the admission controller and webhook dashboards to analyze the QPS and response time of the admission controller and webhook. For more information, see Admission controller and webhook.
View the client summary dashboards to analyze the client QPS in multiple dimensions. For more information, see Client summary.

Filters

Multiple filters are displayed above the dashboards. You can use the following filters to filter requests sent to kube-apiserver based on verbs and resources, modify the quantile, and change the PromQL sampling interval.

To filter requests by verb or resource, use the verb or resource filter. To change the quantile, use the quantile filter. For example, if you select 0.9, the dashboards display the P90 value of a metric, which is the value that 90% of the samples do not exceed. A value of 0.9 (P90) helps eliminate the impact of long-tail samples, which are only a small portion of the total samples. A value of 0.99 (P99) includes long-tail samples.

The following filters are used to set the time period and update interval.
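
To make the quantile filter concrete, the following sketch estimates a quantile from cumulative histogram buckets by linear interpolation, which is the approach that the PromQL `histogram_quantile` function takes. It is a simplified illustration rather than the Prometheus source code: the function is a hypothetical stand-in, the bucket counts are invented, and it assumes at least one finite bucket followed by a `+Inf` bucket.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: list of (le, cumulative_count) sorted by le, ending with +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")

    rank = q * total  # how many samples lie at or below the quantile
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if le == float("inf"):
                # The quantile lies beyond the largest finite bound, so the
                # best available estimate is that bound itself.
                return buckets[-2][0]
            # Interpolate linearly inside the bucket that contains the rank.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count


# Invented counts: 100 requests, 2 of them slower than 1 second.
example = [(0.1, 50), (0.5, 90), (1.0, 98), (float("inf"), 100)]
p90 = histogram_quantile(0.9, example)
p99 = histogram_quantile(0.99, example)
print(p90, p99)
```

In this example the P99 rank falls into the `+Inf` bucket, so the estimate is capped at the largest finite bound. The same rule is why a quantile of `apiserver_request_duration_seconds_bucket` never exceeds 60 seconds.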

Key metrics

Metric	PromQL	Description
API QPS	sum(irate(apiserver_request_total[$interval]))	The QPS of kube-apiserver.
Read Request Success Rate	sum(irate(apiserver_request_total{code=~"20.*",verb=~"GET|LIST"}[$interval]))/sum(irate(apiserver_request_total{verb=~"GET|LIST"}[$interval]))	The success rate of read requests sent to kube-apiserver.
Write request success rate	sum(irate(apiserver_request_total{code=~"20.*",verb!~"GET|LIST|WATCH|CONNECT"}[$interval]))/sum(irate(apiserver_request_total{verb!~"GET|LIST|WATCH|CONNECT"}[$interval]))	The success rate of write requests sent to kube-apiserver.
Number of read requests processed	sum(apiserver_current_inflight_requests{request_kind="readOnly"})	The number of read requests that are being processed by kube-apiserver.
Number of write requests processed	sum(apiserver_current_inflight_requests{request_kind="mutating"})	The number of write requests that are being processed by kube-apiserver.
Request Limit Rate	sum(irate(apiserver_dropped_requests_total[$interval]))	The number of requests dropped per second.
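
The key metrics above combine two steps: `irate` turns a counter into a per-second rate based on the last two samples in the window, and the success rate divides the rate of `20x` responses by the rate of all responses. The following sketch reproduces both steps on invented data. The helper names, timestamps, and values are hypothetical and are not part of any Prometheus client library.

```python
def irate(samples):
    """Per-second rate from the last two samples of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    """
    if len(samples) < 2:
        return None
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    if t2 <= t1:
        return None
    # If the counter dropped, treat it as a reset and count from zero.
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)


def success_rate(rate_by_code):
    """Share of requests answered with a 20x code (mirrors code=~"20.*")."""
    total = sum(rate_by_code.values())
    if total == 0:
        return None
    ok = sum(rate for code, rate in rate_by_code.items() if code.startswith("20"))
    return ok / total


# Invented samples of apiserver_request_total scraped every 15 seconds.
qps = irate([(0, 100), (15, 130), (30, 190)])
# Invented per-code request rates for read requests.
read_success = success_rate({"200": 9.0, "201": 0.5, "404": 0.5})
print(qps, read_success)
```

Because `irate` looks only at the two most recent samples, it reacts quickly to spikes but can look noisy over short sampling intervals.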

Cluster-level summary

Metric	PromQL	Description
GET read request delay P[0.9]	histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb="GET",resource!="",subresource!~"log|proxy"}[$interval])) by (pod, verb, resource, subresource, scope, le))	The response latency of GET requests based on the following dimensions: API server pods, verbs (GET), resources (such as ConfigMaps, pods, and leases), subresources, and scopes (such as namespace or cluster).
LIST read request delay P[0.9]	histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb="LIST"}[$interval])) by (pod_name, verb, resource, scope, le))	The response latency of LIST requests based on the following dimensions: API server pods, verbs (LIST), resources (such as ConfigMaps, pods, and leases), and scopes (such as namespace or cluster).
Write request delay P[0.9]	histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb!~"GET|WATCH|LIST|CONNECT"}[$interval])) by (cluster, pod_name, verb, resource, scope, le))	The response latency of mutating requests based on the following dimensions: API server pods, verbs (all verbs except GET, WATCH, LIST, and CONNECT), resources (such as ConfigMaps, pods, and leases), and scopes (such as namespace or cluster).
Number of read requests processed	apiserver_current_inflight_requests{request_kind="readOnly"}	The number of read requests that are being processed by kube-apiserver.
Number of write requests processed	apiserver_current_inflight_requests{request_kind="mutating"}	The number of write requests that are being processed by kube-apiserver.
Request Limit Rate	sum(irate(apiserver_dropped_requests_total{request_kind="readOnly"}[$interval])) by (name); sum(irate(apiserver_dropped_requests_total{request_kind="mutating"}[$interval])) by (name)	Whether kube-apiserver triggers request throttling. No data or 0 indicates that request throttling is not triggered.

Resource analysis

Metric	PromQL	Description
Memory Usage	memory_utilization_byte{container="kube-apiserver"}	The memory usage of kube-apiserver. Unit: bytes.
CPU Usage	cpu_utilization_core{container="kube-apiserver"}*1000	The CPU usage of kube-apiserver. Unit: millicores.
Memory Usage	memory_utilization_ratio{container="kube-apiserver"}	The memory utilization of kube-apiserver. Unit: %.
CPU Usage	cpu_utilization_ratio{container="kube-apiserver"}	The CPU utilization of kube-apiserver. Unit: %.
Number of resource objects	max by(resource)(apiserver_storage_objects); max by(resource)(etcd_object_counts)	The number of resource objects of each type that are managed by Kubernetes. The metric name varies based on the Kubernetes version used by your ACK cluster: If your ACK cluster uses Kubernetes 1.22 or later, the metric name is apiserver_storage_objects. If your ACK cluster uses Kubernetes 1.22 or earlier, the metric name is etcd_object_counts.

QPS and latency

Metric	PromQL	Description
Analyze QPS [All] P[0.9] by Verb dimension	sum(irate(apiserver_request_total{verb=~"$verb"}[$interval]))by(verb)	The QPS calculated based on verbs.
Analyze QPS [All] P[0.9] by Verb Resource dimension	sum(irate(apiserver_request_total{verb=~"$verb",resource=~"$resource"}[$interval]))by(verb,resource)	The QPS calculated based on verbs and resources.
Analyze request latency by Verb dimension [All] P[0.9]	histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb=~"$verb", verb!~"WATCH|CONNECT",resource!=""}[$interval])) by (le,verb))	The response latency calculated based on verbs.
Analyze request latency by Verb Resource dimension [All] P[0.9]	histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb=~"$verb", verb!~"WATCH|CONNECT", resource=~"$resource",resource!=""}[$interval])) by (le,verb,resource))	The response latency calculated based on verbs and resources.
Read request QPS [5m] for non-2xx return values	sum(irate(apiserver_request_total{verb=~"GET|LIST",resource=~"$resource",code!~"2.*"}[$interval])) by (verb,resource,code)	The QPS of read requests that are answered with status codes other than 2xx.
QPS [5m] for write requests with non-2xx return values	sum(irate(apiserver_request_total{verb!~"GET|LIST|WATCH",verb=~"$verb",resource=~"$resource",code!~"2.*"}[$interval])) by (verb,resource,code)	The QPS of write requests that are answered with status codes other than 2xx.
Apiserver to Etcd request latency [5m]	histogram_quantile($quantile, sum(irate(etcd_request_duration_seconds_bucket[$interval])) by (le,operation,type,instance))	The latency of requests from kube-apiserver to etcd.

Admission controller and webhook

Metric	PromQL	Description
Admission controller delay [admit]	histogram_quantile($quantile, sum by(operation, name, le, type, rejected) (irate(apiserver_admission_controller_admission_duration_seconds_bucket{type="admit"}[$interval])) )	Statistics about the admit type admission controller, the operations performed, whether the operations are denied, and the duration of the operations. Buckets: `0.005, 0.025, 0.1, 0.5, and 2.5`. Unit: seconds.
Admission Controller Delay [validate]	histogram_quantile($quantile, sum by(operation, name, le, type, rejected) (irate(apiserver_admission_controller_admission_duration_seconds_bucket{type="validate"}[$interval])) )	Statistics about the validate type admission controller, the operations performed, whether the operations are denied, and the duration of the operations. Buckets: `0.005, 0.025, 0.1, 0.5, and 2.5`. Unit: seconds.
Admission Webhook delay [admit]	histogram_quantile($quantile, sum by(operation, name, le, type, rejected) (irate(apiserver_admission_webhook_admission_duration_seconds_bucket{type="admit"}[$interval])) )	Statistics about the admit type admission webhook, the operations performed, whether the operations are denied, and the duration of the operations. Buckets: `0.005, 0.025, 0.1, 0.5, and 2.5`. Unit: seconds.
Admission Webhook Delay [validating]	histogram_quantile($quantile, sum by(operation, name, le, type, rejected) (irate(apiserver_admission_webhook_admission_duration_seconds_bucket{type="validating"}[$interval])) )	Statistics about the validate type admission webhook, the operations performed, whether the operations are denied, and the duration of the operations. Buckets: `0.005, 0.025, 0.1, 0.5, and 2.5`. Unit: seconds.
Admission Webhook Request QPS	sum(irate(apiserver_admission_webhook_admission_duration_seconds_count[$interval]))by(name,operation,type,rejected)	The QPS of the admission webhook.

Client summary

Metric	PromQL	Description
Analyze QPS by Client dimension	sum(irate(apiserver_request_total{client!=""}[$interval])) by (client)	The QPS statistics based on clients. This can help you analyze the clients that access kube-apiserver and the relevant QPS values.
Analyze QPS by Verb Resource Client dimension	sum(irate(apiserver_request_total{client!="",verb=~"$verb", resource=~"$resource"}[$interval]))by(verb,resource,client)	The QPS statistics based on verbs, resources, and clients.
Analyze LIST request QPS by Verb Resource Client dimension (no resourceVersion field)	sum(irate(apiserver_request_no_resourceversion_list_total[$interval]))by(resource,client)	The QPS of LIST requests (without the resourceVersion field) based on verbs, resources, and clients. You can analyze and optimize the LIST operations performed by clients based on the LIST requests sent to kube-apiserver and the LIST requests that retrieve data from etcd.

Common metric anomalies

Success rate of read/write requests

Normal	Abnormal	Description	Suggestion
The values of Read Request Success Rate and Write request success rate are close to 100%.	The values of Read Request Success Rate and Write request success rate are low. For example, the success rates are lower than 90%.	A large number of requests are answered with non-2xx status codes.	Check Read request QPS [5m] for non-2xx return values and QPS [5m] for write requests with non-2xx return values to identify the request types and resources for which kube-apiserver returns non-2xx status codes. For example, a GET/deployment 404 series indicates that GET requests for Deployments are answered with the status code 404, which decreases the Read Request Success Rate. Then, check whether the requests that are answered with non-2xx status codes are necessary.
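
The suggestion above amounts to ranking the non-2xx series by QPS and looking at the top entries first. The following sketch shows that ranking on invented data. The field names (`verb`, `resource`, `code`, `qps`) and the helper `top_non_2xx` are hypothetical and only mirror the labels that the non-2xx dashboards group by.

```python
def top_non_2xx(series, limit=3):
    """Return the busiest (verb, resource, code, qps) entries with non-2xx codes."""
    # Mirrors the code!~"2.*" matcher: keep codes that do not start with "2".
    failing = [s for s in series if not s["code"].startswith("2")]
    failing.sort(key=lambda s: s["qps"], reverse=True)
    return [(s["verb"], s["resource"], s["code"], s["qps"]) for s in failing[:limit]]


# Invented per-series request rates.
series = [
    {"verb": "GET", "resource": "deployments", "code": "404", "qps": 12.0},
    {"verb": "GET", "resource": "pods", "code": "200", "qps": 80.0},
    {"verb": "LIST", "resource": "configmaps", "code": "403", "qps": 1.5},
    {"verb": "GET", "resource": "leases", "code": "200", "qps": 30.0},
]
for entry in top_non_2xx(series):
    print(entry)
```

Here the `GET deployments 404` series dominates, so it is the first candidate to review with the owner of the client that sends those requests.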

Latency of GET/LIST requests and latency of write requests

Normal	Abnormal	Description	Suggestion
The values of GET read request delay P[0.9], LIST read request delay P[0.9], and Write request delay P[0.9] vary based on the amount of cluster resources and the cluster size. Therefore, no specific thresholds can be used to identify anomalies. Any value is acceptable as long as your workloads are not adversely affected. For example, if the number of clients that access a specific type of resource increases, the latency of LIST requests increases. In most cases, GET read request delay P[0.9] and Write request delay P[0.9] are shorter than 1 second, and LIST read request delay P[0.9] is shorter than 5 seconds.	GET read request delay P[0.9] and Write request delay P[0.9] are longer than 1 second. LIST read request delay P[0.9] is longer than 5 seconds.	Check whether the response latency increases because of the admission webhook or the increase in clients that access the resources.	Check GET read request delay P[0.9], LIST read request delay P[0.9], and Write request delay P[0.9] to identify the request types and resources that have high response latency. Note: Pod access requests `POST pod/exec` and log retrieval requests `GET pod/log` create persistent connections, so the response latency of these requests is longer than 60 seconds. The upper limit of the `apiserver_request_duration_seconds_bucket` metric is 60 seconds, and response latency values longer than 60 seconds are reported as 60 seconds. Therefore, you can ignore these requests when you analyze requests. Refer to Admission webhook latency and check whether the response latency of kube-apiserver increases because the admission webhook processes requests slowly.

Number of in-flight read/write requests and dropped requests

Normal	Abnormal	Description	Suggestion
In most cases, if the values of Number of read requests processed and Number of write requests processed are less than 100 and Request Limit Rate is 0, no anomaly occurs.	Number of read requests processed and Number of write requests processed are greater than 100. Request Limit Rate is greater than 0.	The request queue is full. Check whether the issue is caused by temporary request spikes or the admission webhook. If the number of pending requests exceeds the length of the queue, kube-apiserver triggers request throttling and Request Limit Rate exceeds 0. As a result, the stability of the cluster is affected.	Check QPS and latency and Client summary. Check whether the top requests are necessary. If the requests are generated by workloads, check whether you can reduce the number of similar requests. Refer to Admission webhook latency and check whether kube-apiserver triggers request throttling because the admission webhook processes requests slowly. If Request Limit Rate remains above 0, contact Alibaba Cloud technical support.

Memory/CPU usage

Normal	Abnormal	Description	Suggestion
The value of Memory Usage is lower than 80% and the value of CPU Usage is lower than 90%.	The values of Memory Usage and CPU Usage exceed 90%.	kube-apiserver is overwhelmed and the resource usage is approaching the upper limit. If memory resources are insufficient, kube-apiserver may encounter an out of memory (OOM) error. If CPU resources are insufficient, kube-apiserver may respond slowly and encounter exceptions.	Check QPS and latency and Client summary. Check whether the top requests are necessary. If these requests are generated by workloads, check whether you can reduce the number of similar requests. If the cluster maintains a large amount of Kubernetes resources, such as ConfigMaps, Secrets, and persistent volume claims (PVCs), kube-apiserver may not have sufficient resources. Refer to Best practices for accessing control plane components and optimize the cluster access mode.

Admission webhook latency

Normal	Abnormal	Description	Suggestion
The value of Admission Webhook Delay is shorter than 0.5 seconds.	The value of Admission Webhook Delay remains above 0.5 seconds.	If the admission webhook cannot respond promptly, the response latency of kube-apiserver increases. Check whether the admission webhook works as expected.	Analyze the admission webhook log and check whether the admission webhook works as expected. If you no longer need the admission webhook, uninstall it.
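
The rules of thumb in the preceding tables can be collected into a simple first-pass check, sketched below. The thresholds are the approximate values quoted in this topic and are not hard limits, because acceptable values depend on the cluster size and workloads. The key names and the `find_anomalies` helper are hypothetical.

```python
# Rule-of-thumb thresholds taken from the anomaly tables in this topic.
# "min" means values below the limit are suspicious; "max" means values above it.
THRESHOLDS = {
    "read_success_rate": ("min", 0.90),
    "write_success_rate": ("min", 0.90),
    "get_latency_seconds": ("max", 1.0),
    "list_latency_seconds": ("max", 5.0),
    "write_latency_seconds": ("max", 1.0),
    "read_inflight_requests": ("max", 100),
    "write_inflight_requests": ("max", 100),
    "dropped_requests_per_second": ("max", 0),
    "memory_utilization_percent": ("max", 90),
    "cpu_utilization_percent": ("max", 90),
    "webhook_latency_seconds": ("max", 0.5),
}


def find_anomalies(observed):
    """Return the names of observed values that cross their threshold."""
    findings = []
    for name, (kind, limit) in THRESHOLDS.items():
        if name not in observed:
            continue  # skip dashboards that were not sampled
        value = observed[name]
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            findings.append(name)
    return findings


# Invented dashboard readings.
readings = {
    "read_success_rate": 0.85,
    "list_latency_seconds": 2.0,
    "dropped_requests_per_second": 0.2,
}
print(find_anomalies(readings))
```

A flagged value is a prompt to open the corresponding dashboard and follow the suggestion in the table, not proof of a fault.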