From 00:00:00 on November 12, 2024 (UTC+8), Managed Service for Prometheus will change the basic metrics about container clusters monitored in Alibaba Cloud. This topic describes the new basic metrics.
By default, only the metrics described in this topic are collected.
If you are using metrics that are no longer collected by default, add them as custom metrics. For more information, see Manage the custom collection rules of ACK environments. You are charged for custom metrics. For information about the billing, see Instance billing.
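To find out which of the metrics you use would need a custom collection rule, you can compare the metric names referenced by your dashboards and alert rules against the default list in this topic. The following Python sketch illustrates the idea. The names `DEFAULT_METRICS`, `metrics_in_use`, and `metrics_needing_custom_rules` are hypothetical, and the default set shown is only a small excerpt of the tables in this topic.

```python
# Hypothetical excerpt of the default metric list described in this topic.
DEFAULT_METRICS = {
    "container_cpu_usage_seconds_total",
    "container_memory_working_set_bytes",
    "apiserver_request_total",
    "up",
}


def metrics_needing_custom_rules(metrics_in_use, default_metrics=DEFAULT_METRICS):
    """Return the sorted metric names that are used but not collected by default."""
    return sorted(set(metrics_in_use) - set(default_metrics))


# Illustrative input: names gathered from your own dashboards and alert rules.
metrics_in_use = [
    "container_cpu_usage_seconds_total",
    "container_fs_reads_bytes_total",  # not part of the excerpt above
    "up",
]
print(metrics_needing_custom_rules(metrics_in_use))
```

Any name that this comparison returns is a candidate for a custom collection rule.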
cAdvisor (job name: _arms/kubelet/cadvisor)
Metric | Description |
container_cpu_usage_seconds_total | The total CPU time consumed by the container in seconds. |
container_fs_usage_bytes | The number of bytes used by the container file system. |
container_memory_cache | The memory cache size of the container in bytes. |
container_memory_usage_bytes | The amount of memory used by the container in bytes. |
container_memory_working_set_bytes | The memory working set size (WSS) of the container in bytes. |
container_network_receive_bytes_total | The total network traffic received by the container in bytes. |
container_network_transmit_bytes_total | The total network traffic transmitted by the container in bytes. |
container_scrape_error | The number of container metric scraping errors. |
DCGM_CUSTOM_CONTAINER_CP_ALLOCATED | The ratio of the GPU computing power allocated to the container to the total computing power of the GPU. The value ranges from 0 to 1. In exclusive GPU mode or in shared GPU mode in which the container requests only GPU memory, the value of this metric is 0, which indicates that the allocation of GPU computing power is unlimited. For example, if a GPU provides a total of 100 compute units (CUs) of GPU computing power and allocates 30 CUs to a container, the ratio of the GPU computing power allocated to the container is calculated by using the following formula: 30/100 = 0.3. |
DCGM_CUSTOM_CONTAINER_MEM_ALLOCATED | The amount of GPU memory allocated to the container. |
DCGM_CUSTOM_DEV_FB_ALLOCATED | The ratio of the allocated GPU memory to the total memory of the GPU. The value ranges from 0 to 1. |
DCGM_CUSTOM_DEV_FB_TOTAL | The total memory of the GPU. |
DCGM_CUSTOM_DEV_HEALTH | The health status of the GPU. |
DCGM_CUSTOM_PROCESS_DECODE_UTIL | The decoder utilization of GPU threads. |
DCGM_CUSTOM_PROCESS_ENCODE_UTIL | The encoder utilization of GPU threads. |
DCGM_CUSTOM_PROCESS_MEM_COPY_UTIL | The memory copy utilization of GPU threads. |
DCGM_CUSTOM_PROCESS_MEM_USED | The amount of GPU memory used by GPU threads. |
DCGM_CUSTOM_PROCESS_SM_UTIL | The streaming multiprocessor (SM) utilization of GPU threads. |
DCGM_CUSTOM_PROF_MEM_BANDWIDTH_USED | The GPU memory bandwidth used. |
DCGM_CUSTOM_PROF_TENS_TFPS_USED | The tensor core utilization. |
DCGM_FI_DEV_DEC_UTIL | The decoder utilization. |
DCGM_FI_DEV_ENC_UTIL | The encoder utilization. |
DCGM_FI_DEV_FB_FREE | The amount of free frame buffer memory. |
DCGM_FI_DEV_FB_USED | The amount of used frame buffer memory. The value of this metric is the same as the value of Memory-Usage returned by the nvidia-smi command. |
DCGM_FI_DEV_GPU_TEMP | The GPU temperature. |
DCGM_FI_DEV_GPU_UTIL | The GPU utilization within a cycle of 1 second or 1/6 second. The cycle varies based on the GPU model. A cycle is a period of time during which one or more kernel functions remain active. This metric only indicates that one or more kernel functions are occupying GPU resources. The metric does not display detailed GPU usage information. |
DCGM_FI_DEV_MEM_CLOCK | The memory clock speed. |
DCGM_FI_DEV_MEM_COPY_UTIL | The memory bandwidth utilization. For example, the maximum memory bandwidth of NVIDIA V100 is 900 GB/s. If the memory bandwidth used is 450 GB/s, the memory bandwidth utilization is 50%. |
DCGM_FI_DEV_POWER_USAGE | The power usage. |
DCGM_FI_DEV_SM_CLOCK | The SM clock speed. |
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION | The total energy consumed since the driver was last loaded. |
DCGM_FI_DEV_XID_ERRORS | The last XID error that occurred within a period of time. |
DCGM_FI_PROF_DRAM_ACTIVE | The fraction of cycles during which the device memory interface is sending or receiving data. The value is an average value within a time interval rather than an instantaneous value. A larger value of this metric indicates higher device memory utilization. If the value is 1 (100%), a DRAM command is executed every cycle within the entire interval. In practice, the peak value of the metric is about 0.8 (80%). If the value of this metric is 0.2 (20%), 20% of the cycles within the time interval are spent reading from or writing to device memory. |
DCGM_FI_PROF_NVLINK_RX_BYTES | The TX rate of NVLink and the RX rate of NVLink. The bytes transmitted or received exclude the header. The value is an average value within a time interval rather than an instantaneous value. For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum NVLink Gen2 bandwidth is 25 GB/s per direction per link. |
DCGM_FI_PROF_NVLINK_TX_BYTES | The total number of bytes sent through NVLink. |
DCGM_FI_PROF_PCIE_RX_BYTES | The TX rate of PCIe and the RX rate of PCIe. The bytes transmitted or received include both the header and payload. The value is an average value within a time interval rather than an instantaneous value. For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum PCIe Gen3 bandwidth is 985 MB/s per lane. |
DCGM_FI_PROF_PCIE_TX_BYTES | The TX rate of PCIe and the RX rate of PCIe. The bytes transmitted or received include both the header and payload. The value is an average value within a time interval rather than an instantaneous value. For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum PCIe Gen3 bandwidth is 985 MB/s per lane. |
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE | The cycle fraction for the Tensor (HMMA/IMMA) pipe being in the Active state. The value is an average value within a time interval rather than an instantaneous value. A larger value of this metric indicates higher tensor core utilization. If the value is 1 (100%), a Tensor instruction is issued every cycle within the entire interval. One instruction completes in two cycles. If the value of this metric is 0.2 (20%), one of the following conditions may exist: The tensor core utilization of 20% of the SMs within the time interval is 100%. The tensor core utilization of all SMs within the time interval is 20%. The tensor core utilization of all SMs within 20% of the time interval is 100%. Other conditions. |
DCGM_FI_PROF_SM_ACTIVE | The ratio of cycles during which at least one warp on an SM remains active. The value is an average of all SMs and does not vary with the number of warps included in the thread block. A warp is considered active after it is scheduled and resources are allocated to it. An active warp may be computing or not computing, for example, waiting for memory requests. If the value of this metric drops below 0.5, the GPU utilization is low. To ensure high GPU utilization, make sure that the value is greater than 0.8. Assume that a GPU has N SMs. If a kernel function that uses N thread blocks runs on all SMs throughout a time interval, the value of this metric is 1 (100%). If a kernel function that uses N/5 thread blocks runs throughout a time interval, the value of this metric is 0.2 (20%). If a kernel function that uses N thread blocks runs during 20% of a time interval, the value of this metric is 0.2 (20%). |
machine_cpu_cores | The number of CPU cores on the machine. |
node_exporter_build_info | The build information about the node exporter. |
nvidia_gpu_duty_cycle | The percentage of time over the past sample period during which the NVIDIA GPU was occupied. |
nvidia_gpu_memory_total_bytes | The total memory of the NVIDIA GPU in bytes. |
nvidia_gpu_memory_used_bytes | The memory used by the NVIDIA GPU in bytes. |
nvidia_gpu_num_devices | The number of NVIDIA GPUs. |
nvidia_gpu_power_usage_milliwatts | The power consumption of the NVIDIA GPU in milliwatts. |
nvidia_gpu_temperature_celsius | The temperature of the NVIDIA GPU in °C. |
rdma_service_monitor_local_ack_timeout_err | The number of timeout errors that occurred in the remote direct memory access (RDMA) network. |
rdma_service_monitor_out_of_seq | The number of out-of-order packets in the RDMA network. |
rdma_service_monitor_packet_seq_err | The number of out-of-order packet errors in the RDMA network. |
rdma_service_monitor_rx_bytes | The throughput received over the RDMA network in bytes. |
rdma_service_monitor_rx_packets | The number of packets received over the RDMA network. |
rdma_service_monitor_tx_bytes | The throughput sent over the RDMA network in bytes. |
rdma_service_monitor_tx_packets | The number of packets sent over the RDMA network. |
up | The connectivity of metric collection. |
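Several GPU metrics in the preceding table are ratios, and the worked examples in their descriptions can be reproduced with simple arithmetic. The following Python sketch recomputes them. The helper names `ratio` and `sm_active` are hypothetical, and `sm_active` is only the simplified model used in the DCGM_FI_PROF_SM_ACTIVE description, not how the GPU measures the value.

```python
def ratio(part, total):
    """Return part / total as a fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


def sm_active(busy_sm_fraction, busy_time_fraction):
    """Simplified model: share of SMs with an active warp multiplied by
    the share of the time interval during which they stay active."""
    return busy_sm_fraction * busy_time_fraction


# DCGM_CUSTOM_CONTAINER_CP_ALLOCATED: 30 of 100 compute units are allocated.
cp_allocated = ratio(30, 100)

# DCGM_FI_DEV_MEM_COPY_UTIL: 450 GB/s used out of a 900 GB/s maximum.
mem_copy_util_percent = ratio(450, 900) * 100

# DCGM_FI_PROF_SM_ACTIVE: N/5 thread blocks for the whole interval, and
# N thread blocks for 20% of the interval, lead to the same value.
one_fifth_of_sms = sm_active(1 / 5, 1.0)
one_fifth_of_time = sm_active(1.0, 0.2)

print(cp_allocated, mem_copy_util_percent, one_fifth_of_sms, one_fifth_of_time)
```

The last two results are equal, which is why a DCGM_FI_PROF_SM_ACTIVE value of 0.2 alone cannot tell you whether few SMs were busy all the time or all SMs were busy for a short time.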
ACK ControlPlane APIServer (control plane components for ACK Pro clusters: APIServer, ETCD, Scheduler, Kube Controller Manager, and Cloud Controller Manager; control plane component for ACK dedicated clusters: APIServer) (job name: apiserver)
Metric | Description |
aggregator_discovery_aggregation_count_total | The count of discovery aggregations performed by the aggregator. |
aggregator_openapi_v2_regeneration_count | The number of regenerations based on OpenAPI 2.0. |
aggregator_openapi_v2_regeneration_duration | The amount of time consumed for regenerations based on OpenAPI 2.0. |
aggregator_unavailable_apiservice | The APIServices that are unavailable to the aggregator. |
aggregator_unavailable_apiservice_count | The count of APIServices that are unavailable to the aggregator. |
aggregator_unavailable_apiservice_total | The total number of APIServices that are unavailable to the aggregator. |
aliyun_prometheus_agent_append_duration_seconds | The time that the Prometheus agent spends appending samples, in seconds. |
aliyun_prometheus_agent_job_discovery_status | The job status that is discovered by the Prometheus agent. |
aliyun_prometheus_agent_scrapes_by_target_total | The total number of target scrapes performed by the Prometheus agent. |
aliyun_prometheus_agent_target_info | The information about targets scraped by the Prometheus agent. |
apiextensions_apiserver_validation_ratcheting_seconds_bucket | The distribution of incremental time intervals for validation in seconds in the APIServer. |
apiextensions_apiserver_validation_ratcheting_seconds_count | The count of incremental time intervals for validation in seconds in the APIServer. |
apiextensions_apiserver_validation_ratcheting_seconds_sum | The sum of incremental time intervals for validation in seconds in the APIServer. |
apiextensions_openapi_v2_regeneration_count | The number of API extension regenerations based on OpenAPI 2.0. |
apiextensions_openapi_v3_regeneration_count | The number of API extension regenerations based on OpenAPI 3.0. |
apiserver_accepted_listall_requests_total | The total number of ListAll requests accepted by the APIServer. |
apiserver_admission_controller_admission_duration_seconds_bucket | The distribution of APIServer admission controller durations in seconds. |
apiserver_admission_controller_admission_duration_seconds_count | The count of APIServer admission controller durations in seconds. |
apiserver_admission_controller_admission_duration_seconds_sum | The sum of APIServer admission controller durations in seconds. |
apiserver_admission_step_admission_duration_seconds_bucket | The distribution of APIServer admission step durations in seconds. |
apiserver_admission_step_admission_duration_seconds_count | The count of APIServer admission step durations in seconds. |
apiserver_admission_step_admission_duration_seconds_sum | The sum of APIServer admission step durations in seconds. |
apiserver_admission_step_admission_duration_seconds_summary | The summary of APIServer admission step durations in seconds. |
apiserver_admission_step_admission_duration_seconds_summary_count | The summary count of APIServer admission step durations in seconds. |
apiserver_admission_step_admission_duration_seconds_summary_sum | The summary total of APIServer admission step durations in seconds. |
apiserver_admission_webhook_admission_duration_seconds_bucket | The distribution of APIServer admission webhook durations in seconds. |
apiserver_admission_webhook_admission_duration_seconds_count | The count of APIServer admission webhook durations in seconds. |
apiserver_admission_webhook_admission_duration_seconds_sum | The sum of APIServer admission webhook durations in seconds. |
apiserver_admission_webhook_fail_open_count | The count of times that the APIServer admission webhook is configured as fail open. |
apiserver_admission_webhook_rejection_count | The count of requests rejected by the APIServer admission webhook. |
apiserver_admission_webhook_request_total | The total number of requests to the APIServer admission webhook. |
apiserver_audit_error_total | The total number of APIServer audit errors. |
apiserver_audit_event_total | The total number of APIServer audit events. |
apiserver_audit_level_total | The total number of APIServer audit levels. |
apiserver_audit_requests_rejected_total | The total number of rejected APIServer requests. |
apiserver_authorization_decisions_total | The total number of authorization decisions made by the APIServer. |
apiserver_cache_list_fetched_objects_total | The total number of objects obtained by the APIServer cache list. |
apiserver_cache_list_returned_objects_total | The total number of objects returned by the APIServer cache list. |
apiserver_cache_list_total | The total number of operations performed by the APIServer cache list. |
apiserver_cacher_received_events | The number of events received by the APIServer cache. |
apiserver_cacher_sended_events_latency_milliseconds_bucket | The distribution of APIServer event sending latencies in milliseconds. |
apiserver_cacher_sended_events_latency_milliseconds_count | The count of APIServer event sending latencies in milliseconds. |
apiserver_cacher_sended_events_latency_milliseconds_sum | The total of APIServer event sending latencies in milliseconds. |
apiserver_cacher_watcher_channel_length | The watcher channel length of the APIServer cache. |
apiserver_cel_compilation_duration_seconds_bucket | The distribution of APIServer Common Expression Language (CEL) compilation latencies in seconds. |
apiserver_cel_compilation_duration_seconds_count | The count of APIServer CEL compilations. |
apiserver_cel_compilation_duration_seconds_sum | The total time consumed for APIServer CEL compilations in seconds. |
apiserver_cel_evaluation_duration_seconds_bucket | The distribution of APIServer CEL evaluation latencies in seconds. |
apiserver_cel_evaluation_duration_seconds_count | The count of APIServer CEL evaluations. |
apiserver_cel_evaluation_duration_seconds_sum | The total of APIServer CEL evaluation latencies in seconds. |
apiserver_client_certificate_expiration_seconds_bucket | The distribution of remaining seconds until APIServer client certificate expiration. |
apiserver_client_certificate_expiration_seconds_count | The count of remaining seconds until APIServer client certificate expiration. |
apiserver_client_certificate_expiration_seconds_sum | The total remaining seconds until APIServer client certificate expiration. |
apiserver_clusterip_repair_ip_errors_total | The total number of ClusterIP errors detected by the repair loop of the APIServer. |
apiserver_clusterip_repair_reconcile_errors_total | The total number of reconcile errors in the ClusterIP repair loop of the APIServer. |
apiserver_conversion_webhook_duration_seconds_bucket | The distribution of APIServer conversion webhook latencies in seconds. |
apiserver_conversion_webhook_duration_seconds_count | The count of APIServer conversion webhook calls. |
apiserver_conversion_webhook_duration_seconds_sum | The total of APIServer conversion webhook latencies in seconds. |
apiserver_conversion_webhook_request_total | The total number of APIServer conversion webhook requests. |
apiserver_crd_conversion_webhook_duration_seconds_bucket | The distribution of APIServer Custom Resource Definition (CRD) conversion webhook latencies in seconds. |
apiserver_crd_conversion_webhook_duration_seconds_count | The count of APIServer CRD conversion webhook calls. |
apiserver_crd_conversion_webhook_duration_seconds_sum | The total of APIServer CRD conversion webhook latencies in seconds. |
apiserver_crd_webhook_conversion_duration_seconds_bucket | The distribution of APIServer CRD webhook conversion latencies in seconds. |
apiserver_crd_webhook_conversion_duration_seconds_count | The count of APIServer CRD webhook conversions. |
apiserver_crd_webhook_conversion_duration_seconds_sum | The total of APIServer CRD webhook conversion latencies in seconds. |
apiserver_created_watchers | The number of watchers created by the APIServer. |
apiserver_current_inflight_requests | The number of requests that are being processed by the APIServer. |
apiserver_current_inqueue_requests | The maximum number of queued requests in the APIServer. |
apiserver_dropped_requests_total | The total number of requests dropped by the APIServer. |
apiserver_encryption_config_controller_automatic_reload_failures_total | The number of times that the encryption configuration controller of the APIServer failed to be automatically reloaded. |
apiserver_encryption_config_controller_automatic_reload_success_total | The number of times that the encryption configuration controller of the APIServer was automatically reloaded. |
apiserver_envelope_encryption_dek_cache_fill_percent | The percentage of APIServer envelope encryption Data Encryption Key (DEK) cache filled. |
apiserver_error_watchers | The number of watchers in the Error state in the APIServer. |
apiserver_flowcontrol_current_executing_requests | The number of requests being processed by APIServer rate limiting. |
apiserver_flowcontrol_current_executing_seats | The number of seats occupied by APIServer rate limiting. |
apiserver_flowcontrol_current_inqueue_requests | The number of requests pending in queues in the APF system. |
apiserver_flowcontrol_current_inqueue_seats | The number of seats pending in APIServer rate limiting queues. |
apiserver_flowcontrol_current_limit_seats | The number of seats limited by APIServer rate limiting. |
apiserver_flowcontrol_current_r | The current R value of APIServer rate limiting. |
apiserver_flowcontrol_demand_seats_average | The average number of seats requested by APIServer rate limiting. |
apiserver_flowcontrol_demand_seats_bucket | The distribution of seats requested by APIServer rate limiting. |
apiserver_flowcontrol_demand_seats_count | The count of seats requested by APIServer rate limiting. |
apiserver_flowcontrol_demand_seats_high_watermark | The high watermark of seats requested by APIServer rate limiting. |
apiserver_flowcontrol_demand_seats_smoothed | The smoothed value of seats requested by APIServer rate limiting. |
apiserver_flowcontrol_demand_seats_stdev | The standard deviation of seats requested by APIServer rate limiting. |
apiserver_flowcontrol_demand_seats_sum | The sum of seats requested by APIServer rate limiting. |
apiserver_flowcontrol_dispatch_r | The scheduling R value of APIServer rate limiting. |
apiserver_flowcontrol_dispatched_requests_total | The total number of requests scheduled by APIServer rate limiting. |
apiserver_flowcontrol_latest_s | The recent S value bounds of APIServer rate limiting. |
apiserver_flowcontrol_lower_limit_seats | The lower bound of seats in APIServer rate limiting. |
apiserver_flowcontrol_next_discounted_s_bounds | The next discounted S value bounds of APIServer rate limiting. |
apiserver_flowcontrol_next_s_bounds | The next S value bounds of APIServer rate limiting. |
apiserver_flowcontrol_nominal_limit_seats | The nominal upper bound of seats in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_request_count_samples_bucket | The distribution of priority level request samples in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_request_count_samples_count | The count of priority level request samples in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_request_count_samples_sum | The sum of priority level request samples in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_request_count_watermarks_bucket | The distribution of watermark levels for priority level request samples in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_request_count_watermarks_count | The count of watermark levels for priority level request samples in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_request_count_watermarks_sum | The sum of watermark levels for priority level request samples in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_request_utilization_bucket | The distribution of request utilization samples by priority level in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_request_utilization_count | The count of request utilization samples by priority level in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_request_utilization_sum | The sum of request utilization by priority level in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_seat_count_samples_bucket | The distribution of seat samples for priority level in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_seat_count_samples_count | The count of seat samples for priority level in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_seat_count_samples_sum | The sum of seat samples for priority level in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_seat_count_watermarks_bucket | The distribution of watermark levels for seat samples in APIServer rate limiting by priority level. |
apiserver_flowcontrol_priority_level_seat_count_watermarks_count | The count of watermark levels for seat samples in APIServer rate limiting by priority level. |
apiserver_flowcontrol_priority_level_seat_count_watermarks_sum | The sum of watermark levels for seat samples in APIServer rate limiting by priority level. |
apiserver_flowcontrol_priority_level_seat_utilization_bucket | The distribution of seat utilization samples by priority level in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_seat_utilization_count | The count of seat utilization samples by priority level in APIServer rate limiting. |
apiserver_flowcontrol_priority_level_seat_utilization_sum | The sum of seat utilization by priority level in APIServer rate limiting. |
apiserver_flowcontrol_read_vs_write_current_requests_bucket | The distribution of current read/write requests in APIServer rate limiting. |
apiserver_flowcontrol_read_vs_write_current_requests_count | The count of current read/write requests in APIServer rate limiting. |
apiserver_flowcontrol_read_vs_write_current_requests_sum | The sum of current read/write requests in APIServer rate limiting. |
apiserver_flowcontrol_read_vs_write_request_count_samples_bucket | The distribution of read/write request count samples in APIServer rate limiting. |
apiserver_flowcontrol_read_vs_write_request_count_samples_count | The count of read/write request count samples in APIServer rate limiting. |
apiserver_flowcontrol_read_vs_write_request_count_samples_sum | The sum of read/write request count samples in APIServer rate limiting. |
apiserver_flowcontrol_read_vs_write_request_count_watermarks_bucket | The distribution of read/write request count watermarks in APIServer rate limiting. |
apiserver_flowcontrol_read_vs_write_request_count_watermarks_count | The count of read/write request count watermarks in APIServer rate limiting. |
apiserver_flowcontrol_read_vs_write_request_count_watermarks_sum | The sum of read/write request count watermarks in APIServer rate limiting. |
apiserver_flowcontrol_rejected_requests_total | The total number of requests rejected by APIServer rate limiting. |
apiserver_flowcontrol_request_concurrency_in_use | The count of concurrent requests in APIServer rate limiting. |
apiserver_flowcontrol_request_concurrency_limit | The concurrent request limit in APIServer rate limiting. |
apiserver_flowcontrol_request_dispatch_no_accommodation_total | The total number of requests that could not be accommodated by the scheduling of APIServer rate limiting. |
apiserver_flowcontrol_request_execution_seconds_bucket | The distribution of request latencies in seconds in APIServer rate limiting. |
apiserver_flowcontrol_request_execution_seconds_count | The count of request latencies in seconds in APIServer rate limiting. |
apiserver_flowcontrol_request_execution_seconds_sum | The sum of request latencies in seconds in APIServer rate limiting. |
apiserver_flowcontrol_request_queue_length_after_enqueue_bucket | The distribution of request queue lengths after enqueuing in APIServer rate limiting. |
apiserver_flowcontrol_request_queue_length_after_enqueue_count | The count of request queue lengths after enqueuing in APIServer rate limiting. |
apiserver_flowcontrol_request_queue_length_after_enqueue_sum | The sum of request queue lengths after enqueuing in APIServer rate limiting. |
apiserver_flowcontrol_request_wait_duration_seconds_bucket | The distribution of request waiting durations in seconds in APIServer rate limiting. |
apiserver_flowcontrol_request_wait_duration_seconds_count | The count of request waiting durations in seconds in APIServer rate limiting. |
apiserver_flowcontrol_request_wait_duration_seconds_sum | The sum of request waiting durations in seconds in APIServer rate limiting. |
apiserver_flowcontrol_seat_fair_frac | The fair share ratios determined by the APIServer during the last borrowing adjustment period. |
apiserver_flowcontrol_target_seats | The target number of seats in APIServer rate limiting. |
apiserver_flowcontrol_upper_limit_seats | The upper bound of seats in APIServer rate limiting. |
apiserver_flowcontrol_watch_count_samples_bucket | The distribution of observed samples in APIServer rate limiting. |
apiserver_flowcontrol_watch_count_samples_count | The count of observed samples in APIServer rate limiting. |
apiserver_flowcontrol_watch_count_samples_sum | The sum of observed samples in APIServer rate limiting. |
apiserver_flowcontrol_work_estimated_seats_bucket | The distribution of estimated seats in APIServer rate limiting. |
apiserver_flowcontrol_work_estimated_seats_count | The count of estimated seats in APIServer rate limiting. |
apiserver_flowcontrol_work_estimated_seats_sum | The sum of estimated seats in APIServer rate limiting. |
apiserver_init_events_total | The total number of initialization events in the APIServer. |
apiserver_kube_aggregator_x509_insecure_sha1_total | The number of requests using insecure Secure Hash Algorithm 1 (SHA1) signatures. |
apiserver_kube_aggregator_x509_missing_san_total | The total number of x509 certificates missing Subject Alternative Names (SANs) in APIServer kube-aggregator. |
apiserver_longrunning_gauge | The gauge of long-running requests in the APIServer. |
apiserver_longrunning_requests | The long-running requests in the APIServer. |
apiserver_nodeport_repair_reconcile_errors_total | The total number of NodePort repair reconcile errors in the APIServer. |
apiserver_realtime_watchers | The number of real-time watchers in the APIServer. |
apiserver_registered_watchers | The number of registered watchers in the APIServer. |
apiserver_request_aborts_total | The total number of aborted APIServer requests. |
apiserver_request_body_size_bytes_bucket | The distribution of APIServer request body sizes in bytes. |
apiserver_request_body_size_bytes_count | The count of APIServer request body sizes in bytes. |
apiserver_request_body_size_bytes_sum | The sum of APIServer request body sizes in bytes. |
apiserver_request_count | The number of APIServer requests. |
apiserver_request_duration_seconds_bucket | The distribution of APIServer request latencies in seconds. |
apiserver_request_duration_seconds_count | The count of APIServer request latencies in seconds. |
apiserver_request_duration_seconds_sum | The sum of APIServer request latencies in seconds. |
apiserver_request_filter_duration_seconds_bucket | The distribution of request filter latencies in seconds. |
apiserver_request_filter_duration_seconds_count | The count of request filter latencies in seconds. |
apiserver_request_filter_duration_seconds_sum | The sum of request filter latencies in seconds. |
apiserver_request_latencies_summary | The summary of APIServer request latencies. |
apiserver_request_no_resourceversion_list_total | The total number of unversioned LIST requests. |
apiserver_request_post_timeout_total | The total number of timed out POST requests. |
apiserver_request_sli_duration_seconds_bucket | The distribution of Service Level Indicator (SLI) request latencies in seconds. |
apiserver_request_sli_duration_seconds_count | The count of SLI request latencies in seconds. |
apiserver_request_sli_duration_seconds_sum | The sum of SLI request latencies in seconds. |
apiserver_request_slo_duration_seconds_bucket | The distribution of Service Level Objective (SLO) request latencies in seconds. |
apiserver_request_slo_duration_seconds_count | The count of SLO request latencies in seconds. |
apiserver_request_slo_duration_seconds_sum | The sum of SLO request latencies in seconds. |
apiserver_request_terminations_total | The total number of terminated API requests. |
apiserver_request_timestamp_comparison_time_bucket | The distribution of time spent in timestamp comparison of API requests. |
apiserver_request_timestamp_comparison_time_count | The count of API request samples for timestamp comparison. |
apiserver_request_timestamp_comparison_time_sum | The sum of time spent in timestamp comparison of API requests. |
apiserver_request_total | The total number of API requests. |
apiserver_requested_deprecated_apis | The count of APIServer requests for deprecated APIs. |
apiserver_response_sizes_bucket | The distribution of response body sizes of API requests. |
apiserver_response_sizes_count | The count of response body sizes of API requests. |
apiserver_response_sizes_sum | The sum of response body sizes of API requests. |
apiserver_selfrequest_total | The total number of APIServer self-requests. |
apiserver_storage_data_key_generation_duration_seconds_bucket | The distribution of time consumed by the APIServer to generate data keys in seconds. |
apiserver_storage_data_key_generation_duration_seconds_count | The count of time consumed by the APIServer to generate data keys in seconds. |
apiserver_storage_data_key_generation_duration_seconds_sum | The sum of time consumed by the APIServer to generate data keys in seconds. |
apiserver_storage_data_key_generation_failures_total | The total number of data key generation failures. |
apiserver_storage_db_total_size_in_bytes | The total size of APIServer databases in bytes. |
apiserver_storage_decode_errors_total | The total number of decoding errors in the APIServer. |
apiserver_storage_envelope_transformation_cache_misses_total | The total number of envelope transformation cache misses in the APIServer. |
apiserver_storage_events_received_total | The total number of events received by the APIServer. |
apiserver_storage_list_evaluated_objects_total | The total number of evaluated objects in the APIServer storage list. |
apiserver_storage_list_fetched_objects_total | The total number of objects obtained by the APIServer storage list. |
apiserver_storage_list_returned_objects_total | The total number of objects returned by the APIServer storage list. |
apiserver_storage_list_total | The total number of operations performed by the APIServer storage list. |
apiserver_storage_objects | The number of objects stored in the APIServer. |
apiserver_storage_size_bytes | The total size of objects stored in the APIServer. |
apiserver_terminated_watchers_total | The total number of watchers terminated by the APIServer. |
apiserver_tls_handshake_errors_total | The total number of requests with Transport Layer Security (TLS) handshake errors in the APIServer. |
apiserver_too_large_resourceversion_errors | The total number of requests whose resource version is too large in the APIServer. |
apiserver_watch_cache_events_dispatched_total | The total number of events dispatched by the watch cache of the APIServer. |
apiserver_watch_cache_events_received_total | The total number of events received by the watch cache of the APIServer. |
apiserver_watch_cache_initializations_total | The total number of watch cache initializations in the APIServer. |
apiserver_watch_cache_read_wait_seconds_bucket | The distribution of watch cache read waiting durations in seconds observed by the APIServer. |
apiserver_watch_cache_read_wait_seconds_count | The count of watch cache read waiting durations in seconds observed by the APIServer. |
apiserver_watch_cache_read_wait_seconds_sum | The sum of watch cache read waiting durations in seconds observed by the APIServer. |
apiserver_watch_cache_watch_cache_initializations_total | The total number of watch cache initializations in the APIServer. |
apiserver_watch_events_sizes_bucket | The distribution of sizes of events observed by the APIServer. |
apiserver_watch_events_sizes_count | The count of sizes of events observed by the APIServer. |
apiserver_watch_events_sizes_sum | The sum of sizes of events observed by the APIServer. |
apiserver_watch_events_total | The total number of events observed by the APIServer. |
apiserver_webhooks_x509_insecure_sha1_total | The number of requests using insecure SHA1 signatures. |
apiserver_webhooks_x509_missing_san_total | The total number of missing SANs in APIServer webhooks. |
authenticated_user_requests | The total number of authenticated user requests. |
authentication_attempts | The number of authentication attempts. |
authentication_duration_seconds_bucket | The distribution of authentication durations in seconds. |
authentication_duration_seconds_count | The count of authentication durations in seconds. |
authentication_duration_seconds_sum | The sum of authentication durations in seconds. |
authentication_token_cache_active_fetch_count | The count of active fetches for the authentication token cache. |
authentication_token_cache_fetch_total | The total number of times the authentication token was retrieved from the cache. |
authentication_token_cache_request_duration_seconds_bucket | The distribution of request durations in seconds for authentication token cache. |
authentication_token_cache_request_duration_seconds_count | The count of request durations in seconds for authentication token cache. |
authentication_token_cache_request_duration_seconds_sum | The sum of request durations in seconds for authentication token cache. |
authentication_token_cache_request_total | The total number of requests for authentication token cache. |
authorization_attempts_total | The total number of authorization attempts. |
authorization_duration_seconds_bucket | The distribution of authorization durations in seconds. |
authorization_duration_seconds_count | The count of authorization durations in seconds. |
authorization_duration_seconds_sum | The sum of authorization durations in seconds. |
cardinality_enforcement_unexpected_categorizations_total | The total number of unexpected categorizations during cardinality enforcement. |
count | The count details. |
cpu_utilization_core | The CPU utilization of the core. |
disabled_metric_total | The total number of disabled metrics. |
disabled_metrics_total | The total number of disabled metrics. |
etcd_bookmark_counts | The number of ETCD bookmarks. |
etcd_db_total_size_in_bytes | The total size of ETCD databases in bytes. |
etcd_lease_object_counts_bucket | The distribution of objects attached to a single ETCD lease. |
etcd_lease_object_counts_count | The count of objects attached to a single ETCD lease. |
etcd_lease_object_counts_sum | The sum of objects attached to a single ETCD lease. |
etcd_object_counts | The number of ETCD objects. |
etcd_request_duration_seconds_bucket | The distribution of ETCD request latencies in seconds. |
etcd_request_duration_seconds_count | The count of ETCD request latencies in seconds. |
etcd_request_duration_seconds_sum | The sum of ETCD request latencies in seconds. |
etcd_request_errors_total | The total number of failed ETCD requests. |
etcd_requests_total | The total number of ETCD requests. |
etcd_watcher_channel_length | The channel length of the ETCD watcher. |
etcd_watcher_received_events | The number of events received by the ETCD watcher. |
etcd_watcher_sended_events_latency_milliseconds_bucket | The distribution of event sending latencies of the ETCD watcher in milliseconds. |
etcd_watcher_sended_events_latency_milliseconds_count | The count of event sending latencies of the ETCD watcher in milliseconds. |
etcd_watcher_sended_events_latency_milliseconds_sum | The sum of event sending latencies of the ETCD watcher in milliseconds. |
field_validation_request_duration_seconds_bucket | The distribution of field validation request latencies in seconds. |
field_validation_request_duration_seconds_count | The count of field validation request latencies in seconds. |
field_validation_request_duration_seconds_sum | The sum of field validation request latencies in seconds. |
get_token_count | The number of obtained tokens. |
get_token_fail_count | The number of token obtaining failures. |
grpc_client_handled_total | The total number of requests handled by the gRPC client. |
grpc_client_msg_received_total | The total number of messages received by the gRPC client. |
grpc_client_msg_sent_total | The total number of messages sent by the gRPC client. |
grpc_client_started_total | The total number of gRPC client startups. |
http_request_duration_microseconds | The HTTP request latency in microseconds. |
http_request_size_bytes | The HTTP request size in bytes. |
http_requests_total | The total number of HTTP requests. |
http_response_size_bytes | The HTTP response body size in bytes. |
job | The job name. |
job_instance_mode | The job instance mode. |
kube_apiserver_clusterip_allocator_allocated_ips | Kubernetes APIServer: The number of allocated cluster IP addresses. |
kube_apiserver_clusterip_allocator_allocation_errors_total | Kubernetes APIServer: The total number of errors that occurred in cluster IP address allocations. |
kube_apiserver_clusterip_allocator_allocation_total | Kubernetes APIServer: The total number of cluster IP address allocations. |
kube_apiserver_clusterip_allocator_available_ips | Kubernetes APIServer: The number of available cluster IP addresses. |
kube_apiserver_nodeport_allocator_allocated_ports | Kubernetes APIServer: The number of allocated node ports. |
kube_apiserver_nodeport_allocator_allocation_errors_total | Kubernetes APIServer: The total number of errors that occurred in node port allocations. |
kube_apiserver_nodeport_allocator_allocation_total | Kubernetes APIServer: The total number of node port allocations. |
kube_apiserver_nodeport_allocator_available_ports | Kubernetes APIServer: The number of available node ports. |
kube_apiserver_pod_logs_backend_tls_failure_total | Kubernetes APIServer: The total number of pod/log requests that failed due to TLS verification errors. |
kube_apiserver_pod_logs_insecure_backend_total | Kubernetes APIServer: The total number of insecure pod/log requests. |
kube_apiserver_pod_logs_pods_logs_backend_tls_failure_total | Kubernetes APIServer: The total number of pod/log requests that failed due to TLS verification errors. |
kube_apiserver_pod_logs_pods_logs_insecure_backend_total | Kubernetes APIServer: The total number of insecure pod/log requests. |
kubelet_container_log_filesystem_used_bytes | Kubelet: The space of the file system used by container logs in bytes. |
kubelet_node_name | Kubelet: The node name. |
kubelet_pleg_relist_duration_seconds_bucket | Kubelet: The distribution of PLEG relisting durations in seconds. |
kubelet_pod_worker_duration_seconds_bucket | Kubelet: The distribution of pod worker durations (the time taken to sync a single pod) in seconds. |
kubelet_volume_stats_available_bytes | Kubelet: The number of available bytes in the volume. |
kubelet_volume_stats_capacity_bytes | Kubelet: The volume capacity in bytes. |
kubelet_volume_stats_inodes | Kubelet: The total number of inodes in the volume. |
kubelet_volume_stats_inodes_free | Kubelet: The number of free inodes in the volume. |
kubelet_volume_stats_inodes_used | Kubelet: The number of used inodes in the volume. |
kubelet_volume_stats_used_bytes | Kubelet: The number of used bytes in the volume. |
kubernetes_build_info | The Kubernetes build information. |
kubernetes_feature_enabled | Indicates whether a Kubernetes feature is enabled. |
last_list_all_response_size_in_bytes | The total size of all response bodies in the recent list in bytes. |
memory_utilization_byte | The used memory in bytes. |
node_authorizer_graph_actions_duration_seconds_bucket | Node authorizer: The distribution of graph operation durations in seconds. |
node_authorizer_graph_actions_duration_seconds_count | Node authorizer: The count of graph operation durations in seconds. |
node_authorizer_graph_actions_duration_seconds_sum | Node authorizer: The sum of graph operation durations in seconds. |
pod_security_evaluations_total | The total number of pod security evaluations. |
pod_security_exemptions_total | The total number of pod security exemptions. |
registered_metric_total | The total number of registered metrics. |
registered_metrics_total | The total number of registered metrics. |
rest_client_exec_plugin_certificate_rotation_age_bucket | REST client plug-in: The distribution of certificate rotation ages in seconds. |
rest_client_exec_plugin_certificate_rotation_age_count | REST client plug-in: The count of certificate rotation ages in seconds. |
rest_client_exec_plugin_certificate_rotation_age_sum | REST client plug-in: The sum of certificate rotation ages in seconds. |
rest_client_exec_plugin_ttl_seconds | REST client plug-in: The time to live (TTL) of the certificate in seconds. |
rest_client_request_duration_seconds_bucket | The distribution of REST client request durations in seconds. |
rest_client_request_duration_seconds_count | The count of REST client request durations in seconds. |
rest_client_request_duration_seconds_sum | The sum of REST client request durations in seconds. |
rest_client_request_latency_seconds_bucket | The distribution of REST client request latencies in seconds. |
rest_client_request_size_bytes_bucket | The distribution of REST client request-body sizes in bytes. |
rest_client_request_size_bytes_count | The count of REST client request-body sizes in bytes. |
rest_client_request_size_bytes_sum | The sum of REST client request-body sizes in bytes. |
rest_client_requests_total | The number of REST client requests. |
rest_client_response_size_bytes_bucket | The distribution of REST client response-body sizes in bytes. |
rest_client_response_size_bytes_count | The count of REST client response-body sizes in bytes. |
rest_client_response_size_bytes_sum | The sum of REST client response-body sizes in bytes. |
rest_client_transport_cache_entries | The number of transport entries of the REST client. |
rest_client_transport_create_calls_total | The total number of transport creation calls of the REST client. |
scheduler_pending_pods | Scheduler: The number of pods to be scheduled. |
scheduler_pod_scheduling_attempts_bucket | Scheduler: The distribution of pod scheduling attempts. |
scheduler_scheduler_cache_size | The scheduler cache size. |
serviceaccount_invalid_legacy_auto_token_uses_total | The total number of uses of invalid legacy automatic service account tokens. |
serviceaccount_legacy_auto_token_uses_total | The total number of uses of legacy automatic service account tokens. |
serviceaccount_legacy_manual_token_uses_total | The total number of uses of legacy manual service account tokens. |
serviceaccount_legacy_tokens_total | The total number of legacy service account tokens. |
serviceaccount_stale_tokens_total | The total number of stale service account tokens. |
serviceaccount_valid_tokens_total | The total number of valid service account tokens. |
ssh_tunnel_open_count | The number of opened Secure Shell (SSH) tunnels. |
ssh_tunnel_open_fail_count | The number of SSH tunnels that failed to be opened. |
up | The connectivity of metric collection. |
watch_cache_capacity | The capacity of the watch cache. |
watch_cache_capacity_decrease_total | The total number of capacity decreases of the watch cache. |
watch_cache_capacity_increase_total | The total number of capacity increases of the watch cache. |
workqueue_adds_total | The total number of additions to the work queue. |
workqueue_depth | The work queue depth. |
workqueue_longest_running_processor_seconds | The longest running processor time in the work queue in seconds. |
workqueue_queue_duration_seconds_bucket | The distribution of queueing durations in the work queue in seconds. |
workqueue_queue_duration_seconds_count | The count of queueing durations in the work queue in seconds. |
workqueue_queue_duration_seconds_sum | The sum of queueing durations in the work queue in seconds. |
workqueue_retries_total | The total number of retries in the work queue. |
workqueue_unfinished_work_seconds | The duration of unfinished work in the work queue in seconds. |
workqueue_work_duration_seconds_bucket | The distribution of work durations in the work queue in seconds. |
workqueue_work_duration_seconds_count | The count of work durations in the work queue in seconds. |
workqueue_work_duration_seconds_sum | The sum of work durations in the work queue in seconds. |
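Many metrics in the preceding table come in groups of three with the suffixes _bucket, _sum, and _count. The three series together describe one histogram. The following Python sketch illustrates how the series relate: the mean is the sum divided by the count, and a quantile can be estimated from the cumulative bucket counts. The function names and sample values are hypothetical. The linear interpolation only approximates what the PromQL histogram_quantile function does and is not the implementation used by Managed Service for Prometheus.

```python
import math


def histogram_mean(total_sum, total_count):
    """Average observation: the _sum series divided by the _count series."""
    return total_sum / total_count if total_count else math.nan


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    `buckets` must be sorted by upper bound and end with the +Inf bucket,
    which mirrors the `le` label of a _bucket series.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical samples for a duration histogram (upper bounds in seconds).
sample_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
mean_seconds = histogram_mean(30.0, 100)
p95_seconds = estimate_quantile(0.95, sample_buckets)
```

With these sample values, 95 of 100 observations must fall at or below the estimate, so the result lies inside the bucket that ends at 1.0 second.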
Node Exporter (job name: node-exporter)
Metric | Description |
aliyun_prometheus_agent_append_duration_seconds | The duration of the Prometheus agent append operations in seconds. |
aliyun_prometheus_agent_job_discovery_status | The discovery status of the Prometheus agent collection jobs. |
aliyun_prometheus_agent_scrapes_by_target_total | The total number of scrapes by the Prometheus agent per target. |
aliyun_prometheus_agent_target_info | The target information of the Prometheus agent. |
job | The job name. |
node_boot_time_seconds | The time at which the node was last booted, as a UNIX timestamp in seconds. |
node_context_switches_total | The total number of context switches on the node. |
node_cpu_seconds_total | The total CPU time consumed on the node. |
node_disk_io_now | The current disk I/O of the node. |
node_disk_io_time_seconds_total | The total disk I/O duration of the node in seconds. |
node_disk_io_time_weighted_seconds_total | The total weighted disk I/O time of the node in seconds. |
node_disk_read_bytes_total | The total number of bytes read from the disk of the node. |
node_disk_read_time_seconds_total | The total disk read time of the node in seconds. |
node_disk_reads_completed_total | The total number of complete disk reads of the node. |
node_disk_reads_merged_total | The total number of merged disk reads of the node. |
node_disk_write_time_seconds_total | The total disk write time of the node in seconds. |
node_disk_writes_completed_total | The total number of complete disk writes of the node. |
node_disk_writes_merged_total | The total number of merged disk writes of the node. |
node_disk_written_bytes_total | The total number of bytes written to the disk of the node. |
node_exporter_build_info | The build information of the node exporter. |
node_filefd_allocated | The number of allocated file descriptors of the node. |
node_filefd_maximum | The maximum number of file descriptors of the node. |
node_filesystem_avail_bytes | The available bytes of the node file system. |
node_filesystem_free_bytes | The amount of free space in the file system of the node in bytes. |
node_filesystem_size_bytes | The total size of the file system of the node in bytes. |
node_intr_total | The total interrupts on the node. |
node_load1 | The 1-minute load on the node. |
node_load15 | The 15-minute load on the node. |
node_load5 | The 5-minute load on the node. |
node_memory_MemAvailable_bytes | The size of available memory on the node (in bytes). |
node_memory_MemFree_bytes | The size of free memory on the node (in bytes). |
node_memory_MemTotal_bytes | The total size of memory on the node (in bytes). |
node_memory_Slab_bytes | The size of Slab memory on the node (in bytes). |
node_memory_SReclaimable_bytes | The size of SReclaimable memory on the node (in bytes). |
node_netstat_Tcp_InErrs | The number of TCP receive errors. |
node_netstat_Tcp_InSegs | The number of TCP segments received. |
node_netstat_Tcp_OutSegs | The number of TCP segments sent. |
node_netstat_Tcp_PassiveOpens | The number of passive TCP connections opened. |
node_netstat_Tcp_RetransSegs | The number of TCP segments retransmitted. |
node_network_receive_bytes_total | The total number of bytes received cumulatively. |
node_network_receive_drop_total | The total number of packets dropped while receiving. |
node_network_receive_errs_total | The total number of receive errors. |
node_network_receive_packets_total | The total number of packets received. |
node_network_transmit_bytes_total | The total number of bytes sent cumulatively. |
node_network_transmit_drop_total | The total number of packets sent but dropped. |
node_network_transmit_errs_total | The total number of send errors. |
node_network_transmit_packets_total | The total number of packets sent. |
node_network_up | Indicates whether the network interface is enabled. |
node_processes_max_processes | The maximum number of processes. |
node_processes_max_threads | The maximum number of threads. |
node_processes_pids | The number of process IDs. |
node_processes_state | The distribution of process states. |
node_processes_threads | The number of threads. |
node_schedstat_running_seconds_total | The total time that the CPUs spent running processes in seconds, as reported by the scheduler statistics. |
node_sockstat_TCP_alloc | The number of TCP sockets allocated. |
node_sockstat_TCP_inuse | The number of TCP sockets in use. |
node_sockstat_TCP_mem | The amount of memory used by TCP sockets. |
node_sockstat_TCP_mem_bytes | The number of bytes of memory used by TCP sockets. |
node_sockstat_TCP_tw | The number of TCP sockets in the TIME_WAIT state. |
node_time_zone_offset_seconds | The time zone offset in seconds. |
node_timex_offset_seconds | The time offset in seconds. |
node_timex_sync_status | The synchronization status of the clock. |
node_uname_info | The system information (uname). |
node_vmstat_pgfault | The number of page faults in VM statistics. |
node_vmstat_pgmajfault | The number of major page faults in VM statistics. |
node_vmstat_pgpgin | The number of page ins in VM statistics. |
node_vmstat_pgpgout | The number of page outs in VM statistics. |
up | The connectivity of metric collection. |
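Most Node Exporter metrics whose names end with _total are cumulative counters, so they are usually read as a rate between two samples rather than as raw values. Gauges such as node_memory_MemAvailable_bytes can be combined directly. The following Python sketch shows both calculations under simple assumptions. The function names and sample values are hypothetical, and the reset handling is a simplification of what the PromQL rate function does.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Per-second increase between two samples of a cumulative counter."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # A drop means the counter was reset (for example, after a node reboot),
        # so treat the current value as the increase since the reset.
        delta = curr_value
    return delta / elapsed


def memory_utilization(mem_total_bytes, mem_available_bytes):
    """Fraction of node memory in use, from MemTotal and MemAvailable."""
    return 1.0 - mem_available_bytes / mem_total_bytes


# Hypothetical samples of node_network_receive_bytes_total taken 60 seconds apart.
receive_bytes_per_second = counter_rate(1000, 0, 4000, 60)

# Hypothetical values of node_memory_MemTotal_bytes and node_memory_MemAvailable_bytes.
used_fraction = memory_utilization(8_000_000_000, 2_000_000_000)
```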
kube-state-metrics (job name: _kube-state-metrics)
Metric | Description |
kube_configmap_info | The information about the ConfigMap. |
kube_cronjob_annotations | The annotations of the Kubernetes CronJob. |
kube_cronjob_created | The creation time of the Kubernetes CronJob. |
kube_cronjob_info | The information about the Kubernetes CronJob. |
kube_cronjob_labels | The labels of the Kubernetes CronJob. |
kube_cronjob_metadata_resource_version | The metadata resource version of the Kubernetes CronJob. |
kube_cronjob_next_schedule_time | The next schedule time of the Kubernetes CronJob. |
kube_cronjob_spec_failed_job_history_limit | The failed job history limit of the Kubernetes CronJob. |
kube_cronjob_spec_starting_deadline_seconds | The starting deadline seconds of the Kubernetes CronJob. |
kube_cronjob_spec_successful_job_history_limit | The successful job history limit of the Kubernetes CronJob. |
kube_cronjob_spec_suspend | The suspend status of the Kubernetes CronJob. |
kube_cronjob_status_active | The number of active jobs of the Kubernetes CronJob. |
kube_cronjob_status_last_schedule_time | The last schedule time of the Kubernetes CronJob. |
kube_cronjob_status_last_successful_time | The last successful execution time of the Kubernetes CronJob. |
kube_daemonset_created | The creation time of the Kubernetes DaemonSet. |
kube_daemonset_status_current_number_scheduled | The current number of scheduled nodes for the Kubernetes DaemonSet. |
kube_daemonset_status_desired_number_scheduled | The desired number of scheduled nodes for the Kubernetes DaemonSet. |
kube_daemonset_status_number_available | The number of available nodes in the Kubernetes DaemonSet. |
kube_daemonset_status_number_misscheduled | The number of nodes that run a pod of the Kubernetes DaemonSet but are not supposed to. |
kube_daemonset_status_number_ready | The number of ready nodes in the Kubernetes DaemonSet. |
kube_daemonset_status_number_unavailable | The number of unavailable nodes in the Kubernetes DaemonSet. |
kube_daemonset_status_updated_number_scheduled | The number of updated scheduled nodes in the Kubernetes DaemonSet. |
kube_daemonset_updated_number_scheduled | The number of updated scheduled nodes in the Kubernetes DaemonSet. |
kube_deployment_created | The creation time of the Kubernetes Deployment. |
kube_deployment_labels | The labels of the Kubernetes Deployment. |
kube_deployment_metadata_generation | The metadata generation of the Kubernetes Deployment. |
kube_deployment_spec_replicas | The number of replicas specified in the Kubernetes Deployment. |
kube_deployment_spec_strategy_rollingupdate_max_unavailable | The maximum number of unavailable pods during rolling update of the Kubernetes Deployment. |
kube_deployment_status_observed_generation | The observed generation of the Kubernetes Deployment. |
kube_deployment_status_replicas | The total number of replicas in the Kubernetes Deployment. |
kube_deployment_status_replicas_available | The number of available replicas in the Kubernetes Deployment. |
kube_deployment_status_replicas_ready | The number of ready replicas in the Kubernetes Deployment. |
kube_deployment_status_replicas_unavailable | The number of unavailable replicas in the Kubernetes Deployment. |
kube_deployment_status_replicas_updated | The number of updated replicas in the Kubernetes Deployment. |
kube_horizontalpodautoscaler_info | The information about the Kubernetes HorizontalPodAutoscaler. |
kube_horizontalpodautoscaler_labels | The labels of the Kubernetes HorizontalPodAutoscaler. |
kube_horizontalpodautoscaler_metadata_generation | The metadata generation of the Kubernetes HorizontalPodAutoscaler. |
kube_horizontalpodautoscaler_spec_max_replicas | The maximum number of replicas specified in the Kubernetes HorizontalPodAutoscaler. |
kube_horizontalpodautoscaler_spec_min_replicas | The minimum number of replicas specified in the Kubernetes HorizontalPodAutoscaler. |
kube_horizontalpodautoscaler_spec_target_metric | The target metrics of the Kubernetes HorizontalPodAutoscaler. |
kube_horizontalpodautoscaler_status_condition | The status conditions of the Kubernetes HorizontalPodAutoscaler. |
kube_horizontalpodautoscaler_status_current_replicas | The current number of replicas in the Kubernetes HorizontalPodAutoscaler. |
kube_horizontalpodautoscaler_status_desired_replicas | The desired number of replicas in the Kubernetes HorizontalPodAutoscaler. |
kube_hpa_labels | The labels of the Kubernetes HorizontalPodAutoscaler. |
kube_hpa_metadata_generation | The metadata generation of the Kubernetes HorizontalPodAutoscaler. |
kube_hpa_spec_max_replicas | The maximum number of replicas specified in the Kubernetes HorizontalPodAutoscaler. |
kube_hpa_spec_min_replicas | The minimum number of replicas specified in the Kubernetes HorizontalPodAutoscaler. |
kube_hpa_spec_target_metric | The target metrics of the Kubernetes HorizontalPodAutoscaler. |
kube_hpa_status_condition | The status conditions of the Kubernetes HorizontalPodAutoscaler. |
kube_hpa_status_current_replicas | The current number of replicas in the Kubernetes HorizontalPodAutoscaler. |
kube_hpa_status_desired_replicas | The desired number of replicas in the Kubernetes HorizontalPodAutoscaler. |
kube_ingress_info | The information about the Ingress. |
kube_job_created | The creation time of the Job. |
kube_job_failed | The total number of failures for the job. |
kube_job_info | The information about the Job. |
kube_job_spec_completions | The desired number of successfully completed pods for the Job. |
kube_job_status_active | The number of actively running pods of the Job. |
kube_job_status_failed | The number of failed pods of the Job. |
kube_job_status_succeeded | The number of successfully completed pods of the Job. |
kube_namespace_created | The creation time of the namespace. |
kube_namespace_labels | The labels of the namespace. |
kube_namespace_status_phase | The phase of the namespace status. |
kube_node_info | The information about the node. |
kube_node_labels | The labels of the node. |
kube_node_spec_taint | The taint configurations of the node. |
kube_node_spec_unschedulable | The unschedulable flag of the node. |
kube_node_status_allocatable | The allocatable resources of the node. |
kube_node_status_allocatable_cpu_cores | The allocatable CPU cores of the node. |
kube_node_status_allocatable_memory_bytes | The allocatable memory bytes of the node. |
kube_node_status_allocatable_pods | The allocatable number of Pods on the node. |
kube_node_status_capacity | The capacity of the node. |
kube_node_status_capacity_cpu_cores | The capacity CPU cores of the node. |
kube_node_status_capacity_memory_bytes | The capacity memory bytes of the node. |
kube_node_status_capacity_pods | The capacity number of Pods on the node. |
kube_node_status_condition | The status conditions of the node. |
kube_persistentvolume_status_phase | The phase of the PersistentVolume (PV) status. |
kube_persistentvolumeclaim_info | The information about the PersistentVolumeClaim (PVC). |
kube_persistentvolumeclaim_resource_requests_storage_bytes | The storage resource request of the PVC. |
kube_persistentvolumeclaim_status_phase | The phase of the PVC status. |
kube_pod_completion_time | The completion time of the Pod. |
kube_pod_container_info | The information about the Pod container. |
kube_pod_container_resource_limits | The resource limit of the Pod container. |
kube_pod_container_resource_limits_cpu_cores | The CPU core limit of the Pod container. |
kube_pod_container_resource_limits_memory_bytes | The memory byte limit of the Pod container. |
kube_pod_container_resource_requests | The resource requests of the Pod container. |
kube_pod_container_resource_requests_cpu_cores | The CPU core requests of the Pod container. |
kube_pod_container_resource_requests_memory_bytes | The memory byte requests of the Pod container. |
kube_pod_container_status_last_terminated_reason | The last termination reason of the Pod container. |
kube_pod_container_status_ready | The ready status of the Pod container. |
kube_pod_container_status_restarts_total | The total number of restarts for the Pod container. |
kube_pod_container_status_running | The running status of the Pod container. |
kube_pod_container_status_terminated | The terminated status of the Pod container. |
kube_pod_container_status_terminated_reason | The termination reason of the Pod container. |
kube_pod_container_status_waiting | The waiting status of the Pod container. |
kube_pod_container_status_waiting_reason | The waiting reason of the Pod container. |
kube_pod_created | The creation time of the Pod. |
kube_pod_deletion_timestamp | The deletion timestamp of the Pod. |
kube_pod_info | The information about the Pod. |
kube_pod_labels | The labels of the Pod. |
kube_pod_owner | The owner of the Pod. |
kube_pod_start_time | The start time of the Pod. |
kube_pod_status_container_ready_time | The container ready time of the Pod status. |
kube_pod_status_initialized_time | The initialization completion time of the Pod status. |
kube_pod_status_phase | The phase of the Pod status. |
kube_pod_status_ready | The ready status of the Pod. |
kube_pod_status_ready_time | The ready time of the Pod. |
kube_pod_status_reason | The reason for the Pod status. |
kube_pod_status_scheduled_time | The scheduling time of the Pod. |
kube_pod_status_unschedulable | The unschedulable flag of the Pod. |
kube_replicaset_owner | The owner of the ReplicaSet. |
kube_replicaset_status_ready_replicas | The number of ready replicas in the ReplicaSet. |
kube_resource_relationship | The relationships between resources. |
kube_resourcequota | The resource quota. |
kube_resourcequota_created | The creation time of the resource quota. |
kube_secret_info | The information about the secret. |
kube_service_info | The information about the service. |
kube_service_spec_type | The type specification of the service. |
kube_service_status_load_balancer_ingress | The load balancer ingress information of the service status. |
kube_statefulset_created | The creation time of the StatefulSet. |
kube_statefulset_metadata_generation | The metadata generation of the StatefulSet. |
kube_statefulset_replicas | The number of replicas in the StatefulSet. |
kube_statefulset_status_replicas | The number of replicas reported in the status of the StatefulSet. |
kube_statefulset_status_replicas_available | The number of available replicas reported in the status of the StatefulSet. |
kube_statefulset_status_replicas_ready | The number of ready replicas reported in the status of the StatefulSet. |
kube_statefulset_status_replicas_updated | The number of updated replicas reported in the status of the StatefulSet. |
rest_client_requests_total | The number of REST client requests. |
up | The connectivity of metric collection. |
workqueue_adds_total | The total number of additions to the work queue. |
workqueue_depth | The work queue depth. |
workqueue_queue_duration_seconds_bucket | The distribution of queue duration in seconds for the work queue. |
kube-events (job name: _arms/kube-event)
Metric | Description |
aliyun_prometheus_agent_append_duration_seconds | The duration of the Prometheus agent append operations in seconds. |
aliyun_prometheus_agent_job_discovery_status | The discovery status of the Prometheus agent collection jobs. |
aliyun_prometheus_agent_scrape_custom_error | The number of custom collection errors of the Prometheus agent. |
aliyun_prometheus_agent_scrapes_by_target_total | The total number of scrapes by the Prometheus agent per target. |
aliyun_prometheus_agent_target_info | The target information of the Prometheus agent. |
eventer_events_error_total | The total number of event processing errors. |
eventer_events_normal_total | The total number of normal events. |
eventer_events_warning_total | The total number of warning events. |
eventer_exporter_duration_milliseconds_count | The count of samples for exporter duration in milliseconds. |
eventer_exporter_duration_milliseconds_sum | The sum of exporter duration in milliseconds. |
eventer_manager_last_time_seconds | The last operation time of the event manager in seconds. |
eventer_scraper_duration_milliseconds_count | The count of scraper duration in milliseconds. |
eventer_scraper_duration_milliseconds_sum | The sum of scraper duration in milliseconds. |
eventer_scraper_events_total_number | The total number of events scraped. |
eventer_scraper_last_time_seconds | The last execution time of the scraper in seconds. |
up | The connectivity of metric collection. |
CoreDNS (job name: arms-ack-coredns)
Metric | Description |
aliyun_prometheus_agent_append_duration_seconds | The duration of the Prometheus agent append operations in seconds. |
aliyun_prometheus_agent_job_discovery_status | The discovery status of the Prometheus agent collection jobs. |
aliyun_prometheus_agent_scrape_custom_error | The number of custom collection errors of the Prometheus agent. |
aliyun_prometheus_agent_scrapes_by_target_total | The total number of scrapes by the Prometheus agent per target. |
aliyun_prometheus_agent_target_info | The target information of the Prometheus agent. |
coredns_autopath_success_count_total | The total number of successful automatic path resolutions in CoreDNS. |
coredns_autopath_success_total | The total number of successful automatic path resolutions in CoreDNS. |
coredns_build_info | The build information of CoreDNS. |
coredns_cache_drops_total | The total number of cache drops in CoreDNS. |
coredns_cache_entries | The number of cache entries in CoreDNS. |
coredns_cache_evictions_total | The total number of cache evictions in CoreDNS. |
coredns_cache_hits_total | The total number of cache hits in CoreDNS. |
coredns_cache_misses_total | The total number of cache misses in CoreDNS. |
coredns_cache_requests_total | The total number of cache requests in CoreDNS. |
coredns_cache_size | The size of the cache in CoreDNS. |
coredns_dns_do_requests_total | The total number of DNS DO requests in CoreDNS. |
coredns_dns_request_count_total | The total count of DNS requests in CoreDNS. |
coredns_dns_request_duration_seconds_bucket | The distribution of DNS request durations in seconds in CoreDNS. |
coredns_dns_request_duration_seconds_count | The count of DNS request durations in seconds in CoreDNS. |
coredns_dns_request_duration_seconds_sum | The sum of DNS request durations in seconds in CoreDNS. |
coredns_dns_request_size_bytes_bucket | The distribution of DNS request sizes in bytes in CoreDNS. |
coredns_dns_request_size_bytes_count | The count of DNS request sizes in bytes in CoreDNS. |
coredns_dns_request_size_bytes_sum | The sum of DNS request sizes in bytes in CoreDNS. |
coredns_dns_request_type_count_total | The total count of DNS request types in CoreDNS. |
coredns_dns_requests_total | The total number of DNS requests in CoreDNS. |
coredns_dns_response_rcode_count_total | The total count of DNS response codes in CoreDNS. |
coredns_dns_response_size_bytes_bucket | The distribution of DNS response sizes in bytes in CoreDNS. |
coredns_dns_response_size_bytes_count | The count of DNS response sizes in bytes in CoreDNS. |
coredns_dns_response_size_bytes_sum | The sum of DNS response sizes in bytes in CoreDNS. |
coredns_dns_responses_total | The total number of DNS responses in CoreDNS. |
coredns_forward_conn_cache_hits_total | The total number of cache hits for forwarded connections in CoreDNS. |
coredns_forward_conn_cache_misses_total | The total number of cache misses for forwarded connections in CoreDNS. |
coredns_forward_healthcheck_broken_total | The total number of times that all upstream servers were unhealthy when CoreDNS forwarded a request. |
coredns_forward_healthcheck_failure_count_total | The total count of health check failures for forwarded connections in CoreDNS. |
coredns_forward_healthcheck_failures_total | The total number of health check failures for forwarded connections in CoreDNS. |
coredns_forward_max_concurrent_rejects_total | The total number of forwarded requests rejected in CoreDNS because the maximum number of concurrent requests was reached. |
coredns_forward_request_count_total | The total count of forwarded requests in CoreDNS. |
coredns_forward_request_duration_seconds_bucket | The distribution of forwarded request durations in seconds in CoreDNS. |
coredns_forward_request_duration_seconds_count | The count of forwarded request durations in seconds in CoreDNS. |
coredns_forward_request_duration_seconds_sum | The sum of forwarded request durations in seconds in CoreDNS. |
coredns_forward_requests_total | The total number of forwarded requests in CoreDNS. |
coredns_forward_response_rcode_count_total | The total count of forwarded response codes in CoreDNS. |
coredns_forward_responses_total | The total number of forwarded responses in CoreDNS. |
coredns_forward_sockets_open | The number of open sockets for forwarded connections in CoreDNS. |
coredns_health_request_duration_seconds_bucket | The distribution of health check request durations in seconds in CoreDNS. |
coredns_health_request_duration_seconds_count | The count of health check request durations in seconds in CoreDNS. |
coredns_health_request_duration_seconds_sum | The sum of health check request durations in seconds in CoreDNS. |
coredns_health_request_failures_total | The total number of health check request failures in CoreDNS. |
coredns_hosts_entries | The number of host entries in CoreDNS. |
coredns_hosts_reload_timestamp_seconds | The timestamp of the last host reload in CoreDNS in seconds. |
coredns_kubernetes_dns_programming_duration_seconds_bucket | The distribution of Kubernetes DNS programming durations in seconds in CoreDNS. |
coredns_kubernetes_dns_programming_duration_seconds_count | The count of Kubernetes DNS programming durations in seconds in CoreDNS. |
coredns_kubernetes_dns_programming_duration_seconds_sum | The sum of Kubernetes DNS programming durations in seconds in CoreDNS. |
coredns_local_localhost_requests_total | The total number of localhost requests in CoreDNS. |
coredns_panic_count_total | The total number of panics in CoreDNS. |
coredns_panics_total | The total count of panics in CoreDNS. |
coredns_plugin_enabled | The enabling status of CoreDNS plugins. |
coredns_reload_failed_total | The total number of reload failures in CoreDNS. |
coredns_reload_version_info | The version information of CoreDNS reloads. |
coredns_template_matches_total | The total number of template matches in CoreDNS. |
up | The connectivity of metric collection. |
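The `_bucket` series in the table above, such as coredns_dns_request_duration_seconds_bucket, store cumulative counts per upper bound (the `le` label). A percentile is estimated from those counts rather than stored directly. The following Python snippet is a minimal sketch of such an estimate that uses linear interpolation inside a bucket, in the spirit of the PromQL histogram_quantile function but not a copy of its implementation. The function name, the lower bound of 0 for the first bucket, and the sample bucket values are assumptions made for illustration.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs, one per `le`
    value, and is assumed to include the +Inf bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # assumed lower bound of the first bucket
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation between the bucket's lower and upper bounds.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts for request durations in seconds.
sample = [(0.001, 10), (0.01, 60), (0.1, 90), (math.inf, 100)]
print(estimate_quantile(0.5, sample))   # about 0.0082
print(estimate_quantile(0.99, sample))  # 0.1
```

The estimate is only as precise as the bucket boundaries, so a result that lands in the +Inf bucket is reported as the highest finite bound.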
CSI clusters (job name: k8s-csi-cluster-pv)
Metric | Description |
alibaba_cloud_storage_operator_build_info | The build information about the Alibaba Cloud storage operator. |
aliyun_prometheus_agent_append_duration_seconds | The duration of the Prometheus agent append operations in seconds. |
aliyun_prometheus_agent_job_discovery_status | The discovery status of the Prometheus agent collection jobs. |
aliyun_prometheus_agent_scrape_custom_error | The number of custom collection errors of the Prometheus agent. |
aliyun_prometheus_agent_scrapes_by_target_total | The total number of scrapes by the Prometheus agent per target. |
aliyun_prometheus_agent_target_info | The target information of the Prometheus agent. |
cluster_pv_detail_num_total | The total number of detailed PV information in the cluster. |
cluster_pv_status_num_total | The total number of PV states in the cluster. |
cluster_pvc_detail_num_total | The total number of detailed PVC information in the cluster. |
cluster_pvc_status_num_total | The total number of PVC states in the cluster. |
cluster_scrape_collector_duration_seconds | The duration of the cluster scrape collector in seconds. |
cluster_scrape_collector_success | The number of successful scrapes by the cluster collector. |
up | The connectivity of metric collection. |
CSI nodes (job name: k8s-csi-node-pv)
Metric | Description |
alibaba_cloud_csi_driver_build_info | The build information about the Container Storage Interface (CSI) driver. |
aliyun_prometheus_agent_append_duration_seconds | The duration of the Prometheus agent append operations in seconds. |
aliyun_prometheus_agent_job_discovery_status | The discovery status of the Prometheus agent collection jobs. |
aliyun_prometheus_agent_scrape_custom_error | The number of custom collection errors of the Prometheus agent. |
aliyun_prometheus_agent_scrapes_by_target_total | The total number of scrapes by the Prometheus agent per target. |
aliyun_prometheus_agent_target_info | The target information of the Prometheus agent. |
cluster_scrape_collector_duration_seconds | The duration of the cluster scrape collector in seconds. |
cluster_scrape_collector_success | The number of successful scrapes by the cluster collector. |
container_fs_available_bytes | The available bytes of the container file system. |
container_fs_inodes_free | The number of available inodes in the container file system. |
container_fs_inodes_total | The total number of inodes in the container file system. |
container_fs_inodes_used | The number of used inodes in the container file system. |
container_fs_limit_bytes | The limit of bytes in the container file system. |
container_fs_usage_bytes | The used bytes in the container file system. |
ephemeral_storage_pod_available_bytes | The available bytes of ephemeral storage for the pod. |
ephemeral_storage_pod_inodes_free | The number of available inodes in the ephemeral storage of the pod. |
ephemeral_storage_pod_inodes_total | The total number of inodes in the ephemeral storage of the pod. |
ephemeral_storage_pod_inodes_used | The number of used inodes in the ephemeral storage of the pod. |
ephemeral_storage_pod_limit_bytes | The limit of bytes for the ephemeral storage of the pod. |
ephemeral_storage_pod_usage_bytes | The used bytes in the ephemeral storage of the pod. |
node_volume_backend_posix_access_total_counter | The total counter for Portable Operating System Interface (POSIX) access to the node volume backend. |
node_volume_backend_posix_getattr_total_counter | The total counter for POSIX getattr calls to the node volume backend. |
node_volume_backend_posix_getmode_total_counter | The total counter for POSIX getmode operations to the node volume backend. |
node_volume_backend_posix_link_total_counter | The total counter for POSIX link operations to the node volume backend. |
node_volume_backend_posix_lookup_total_counter | The total counter for POSIX lookup operations to the node volume backend. |
node_volume_backend_posix_mknod_total_counter | The total counter for POSIX mknod operations to the node volume backend. |
node_volume_backend_posix_readdir_total_counter | The total counter for POSIX readdir operations to the node volume backend. |
node_volume_backend_posix_readlink_total_counter | The total counter for POSIX readlink operations to the node volume backend. |
node_volume_backend_posix_remove_total_counter | The total counter for POSIX remove operations to the node volume backend. |
node_volume_backend_posix_rename_total_counter | The total counter for POSIX rename operations to the node volume backend. |
node_volume_backend_posix_setattr_total_counter | The total counter for POSIX setattr operations to the node volume backend. |
node_volume_backend_posix_statfs_total_counter | The total counter for POSIX statfs operations to the node volume backend. |
node_volume_backend_read_bytes_total_counter | The total counter for bytes read from the node volume backend. |
node_volume_backend_read_completed_total_counter | The total number of completed read requests to the node volume backend. |
node_volume_backend_read_time_milliseconds_total_counter | The total milliseconds spent on reads to the node volume backend. |
node_volume_backend_write_bytes_total_counter | The total number of bytes written to the node volume backend. |
node_volume_backend_write_completed_total_counter | The total number of completed write requests to the node volume backend. |
node_volume_backend_write_time_milliseconds_total_counter | The total milliseconds spent on writes to the node volume backend. |
node_volume_capacity_bytes_available | The available capacity of the node volume in bytes. |
node_volume_capacity_bytes_available_counter | The available capacity of the node volume in bytes (counter). |
node_volume_capacity_bytes_total | The total capacity of the node volume in bytes. |
node_volume_capacity_bytes_total_counter | The total capacity of the node volume in bytes (counter). |
node_volume_capacity_bytes_used | The used capacity of the node volume in bytes. |
node_volume_capacity_bytes_used_counter | The used capacity of the node volume in bytes (counter). |
node_volume_hot_spot_head_file_top | The top hot spot files in the node volume. |
node_volume_hot_spot_read_file_top | The top files read in the node volume hot spots. |
node_volume_hot_spot_write_file_top | The top files written in the node volume hot spots. |
node_volume_inode_bytes_available_counter | The counter for available inode bytes in the node volume. |
node_volume_inode_bytes_total_counter | The counter for total inode bytes in the node volume. |
node_volume_inode_bytes_used_counter | The counter for used inode bytes in the node volume. |
node_volume_inodes_available | The number of available inodes in the node volume. |
node_volume_inodes_total | The total number of inodes in the node volume. |
node_volume_inodes_used | The number of used inodes in the node volume. |
node_volume_io_now | The current I/O count in the node volume. |
node_volume_io_time_seconds_total | The total seconds spent on I/O in the node volume. |
node_volume_oss_delete_object_total_counter | The total counter for Object Storage Service (OSS) object deletions in the node volume. |
node_volume_oss_get_object_total_counter | The total counter for OSS object gets in the node volume. |
node_volume_oss_head_object_total_counter | The total counter for OSS object HEADs (object metadata queries) in the node volume. |
node_volume_oss_post_object_total_counter | The total counter for OSS object POSTs in the node volume. |
node_volume_oss_put_object_total_counter | The total counter for OSS object PUTs in the node volume. |
node_volume_posix_access_total_counter | The total counter for POSIX accesses in the node volume. |
node_volume_posix_chmod_total_counter | The total counter for POSIX chmod operations in the node volume. |
node_volume_posix_chown_total_counter | The total counter for POSIX chown operations in the node volume. |
node_volume_posix_create_total_counter | The total counter for POSIX creations in the node volume. |
node_volume_posix_flush_total_counter | The total counter for POSIX flushes in the node volume. |
node_volume_posix_fsync_total_counter | The total counter for POSIX fsyncs in the node volume. |
node_volume_posix_mkdir_total_counter | The total counter for POSIX mkdir operations in the node volume. |
node_volume_posix_open_total_counter | The total counter for POSIX opens in the node volume. |
node_volume_posix_opendir_total_counter | The total counter for POSIX opendir operations in the node volume. |
node_volume_posix_read_total_counter | The total counter for POSIX reads in the node volume. |
node_volume_posix_readdir_total_counter | The total counter for POSIX readdir operations in the node volume. |
node_volume_posix_release_total_counter | The total counter for POSIX releases in the node volume. |
node_volume_posix_rename_total_counter | The total counter for POSIX renames in the node volume. |
node_volume_posix_rmdir_total_counter | The total counter for POSIX rmdir operations in the node volume. |
node_volume_posix_truncate_total_counter | The total counter for POSIX truncate operations in the node volume. |
node_volume_posix_write_total_counter | The total counter for POSIX writes in the node volume. |
node_volume_read_bytes_total | The total number of bytes read from the node volume. |
node_volume_read_bytes_total_counter | The total number of bytes read from the node volume (counter). |
node_volume_read_completed_total | The total number of completed read requests to the node volume. |
node_volume_read_completed_total_counter | The total number of completed read requests to the node volume (counter). |
node_volume_read_merged_total | The total number of merged read operations in the node volume. |
node_volume_read_queue_time_milliseconds_total | The total milliseconds spent on read queue in the node volume. |
node_volume_read_rtt_time_milliseconds_total | The total milliseconds spent on read round-trip time in the node volume. |
node_volume_read_sent_bytes_total | The total number of bytes sent during reads in the node volume. |
node_volume_read_time_milliseconds_total | The total milliseconds spent on reads in the node volume. |
node_volume_read_time_milliseconds_total_counter | The total milliseconds spent on reads in the node volume (counter). |
node_volume_read_timeouts_total | The total number of read timeouts in the node volume. |
node_volume_read_transmissions_total | The total number of read transmissions in the node volume. |
node_volume_vg_free_bytes | The free bytes in the volume group (VG) of the node volume. |
node_volume_vg_size_bytes | The total bytes in the VG of the node volume. |
node_volume_write_bytes_total | The total number of bytes written to the node volume. |
node_volume_write_bytes_total_counter | The total number of bytes written to the node volume (counter). |
node_volume_write_completed_total | The total number of completed write requests to the node volume. |
node_volume_write_completed_total_counter | The total number of completed write requests to the node volume (counter). |
node_volume_write_merged_total | The total number of merged write operations in the node volume. |
node_volume_write_queue_time_milliseconds_total | The total milliseconds spent on write queue in the node volume. |
node_volume_write_recv_bytes_total | The total number of bytes received during writes in the node volume. |
node_volume_write_rtt_time_milliseconds_total | The total milliseconds spent on write round-trip time in the node volume. |
node_volume_write_time_milliseconds_total | The total milliseconds spent on writes in the node volume. |
node_volume_write_time_milliseconds_total_counter | The total milliseconds spent on writes in the node volume (counter). |
node_volume_write_timeouts_total | The total number of write timeouts in the node volume. |
node_volume_write_transmissions_total | The total number of write transmissions in the node volume. |
up | The connectivity of metric collection. |
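The read and write counters in the table above are cumulative, so they are most useful as differences between two samples. The following Python snippet is a minimal sketch that derives read throughput, read operations per second, and average read latency from node_volume_read_bytes_total, node_volume_read_completed_total, and node_volume_read_time_milliseconds_total. It assumes that the counters did not reset between the samples. The function name, the dictionary layout, and the sample values are hypothetical.

```python
def volume_read_stats(prev, curr, interval_seconds):
    """Derive read statistics from two samples of cumulative node volume counters.

    `prev` and `curr` map metric names to counter values. Assumption: the
    counters did not reset between the two samples.
    """
    delta_bytes = curr["node_volume_read_bytes_total"] - prev["node_volume_read_bytes_total"]
    delta_ops = curr["node_volume_read_completed_total"] - prev["node_volume_read_completed_total"]
    delta_ms = (curr["node_volume_read_time_milliseconds_total"]
                - prev["node_volume_read_time_milliseconds_total"])
    return {
        "bytes_per_second": delta_bytes / interval_seconds,
        "reads_per_second": delta_ops / interval_seconds,
        # Average latency is undefined when no read completed in the window.
        "avg_latency_ms": delta_ms / delta_ops if delta_ops > 0 else None,
    }


# Hypothetical samples taken 60 seconds apart.
before = {
    "node_volume_read_bytes_total": 1_000_000,
    "node_volume_read_completed_total": 500,
    "node_volume_read_time_milliseconds_total": 2_000,
}
after = {
    "node_volume_read_bytes_total": 5_000_000,
    "node_volume_read_completed_total": 1_500,
    "node_volume_read_time_milliseconds_total": 7_000,
}
stats = volume_read_stats(before, after, 60)
print(stats["avg_latency_ms"])  # 5.0
```

The write counters (node_volume_write_bytes_total, node_volume_write_completed_total, and node_volume_write_time_milliseconds_total) can be handled the same way.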
GPU-Exporter (job name: gpu-exporter)
Metric | Description |
DCGM_CUSTOM_ALLOCATE_MODE | The mode in which the node runs. A value of 0 indicates that no GPU Pods are running on the node. A value of 1 indicates that the GPU Pods on the current node run in an exclusive GPU mode. A value of 2 indicates that the GPU Pods on the current node run in a shared GPU mode. |
DCGM_CUSTOM_CONTAINER_CP_ALLOCATED | The ratio of the GPU computing power allocated to the container to the total computing power of the GPU. The value ranges from 0 to 1. In exclusive GPU mode or in shared GPU mode in which the container requests only GPU memory, the value of this metric is 0, which indicates that the allocation of GPU computing power is unlimited. For example, if a GPU provides a total of 100 compute units (CUs) of GPU computing power and allocates 30 CUs to a container, the ratio of the GPU computing power allocated to the container is calculated by using the following formula: 30/100 = 0.3. |
DCGM_CUSTOM_CONTAINER_MEM_ALLOCATED | The amount of GPU memory allocated to the container. |
DCGM_CUSTOM_DEV_FB_ALLOCATED | The ratio of the allocated GPU memory to the total memory of the GPU. The value ranges from 0 to 1. |
DCGM_CUSTOM_DEV_FB_TOTAL | The total memory of the GPU. |
DCGM_CUSTOM_ILLEGAL_PROCESS_DECODE_UTIL | The illegal process decode utilization. |
DCGM_CUSTOM_ILLEGAL_PROCESS_ENCODE_UTIL | The illegal process encode utilization. |
DCGM_CUSTOM_ILLEGAL_PROCESS_MEM_COPY_UTIL | The memory copy utilization of illegal processes. |
DCGM_CUSTOM_ILLEGAL_PROCESS_MEM_USED | The memory used by illegal processes. |
DCGM_CUSTOM_ILLEGAL_PROCESS_SM_UTIL | The SM utilization of illegal processes. |
DCGM_CUSTOM_PROCESS_DECODE_UTIL | The decoder utilization of GPU threads. |
DCGM_CUSTOM_PROCESS_ENCODE_UTIL | The encoder utilization of GPU threads. |
DCGM_CUSTOM_PROCESS_MEM_COPY_UTIL | The memory copy utilization of GPU threads. |
DCGM_CUSTOM_PROCESS_MEM_USED | The amount of GPU memory used by GPU threads. |
DCGM_CUSTOM_PROCESS_SM_UTIL | The SM utilization of GPU threads. |
DCGM_FI_DEV_APP_MEM_CLOCK | The memory application clock speed. |
DCGM_FI_DEV_APP_SM_CLOCK | The SM application clock speed. |
DCGM_FI_DEV_BAR1_FREE | The remaining Base Address Register 1 (BAR1). |
DCGM_FI_DEV_BAR1_TOTAL | The total size of device BAR1. |
DCGM_FI_DEV_BAR1_USED | The used BAR1. |
DCGM_FI_DEV_BOARD_LIMIT_VIOLATION | The time of the violation due to board limitations. |
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS | The reasons for clock throttling. |
DCGM_FI_DEV_COUNT | The number of devices. |
DCGM_FI_DEV_DEC_UTIL | The decoder utilization. |
DCGM_FI_DEV_ENC_UTIL | The encoder utilization. |
DCGM_FI_DEV_FB_FREE | The amount of free frame buffer memory. |
DCGM_FI_DEV_FB_USED | The amount of used frame buffer memory. The value of this metric is the same as the value of Memory-Usage returned by the nvidia-smi command. |
DCGM_FI_DEV_GPU_TEMP | The GPU temperature. |
DCGM_FI_DEV_GPU_UTIL | The GPU utilization within a cycle of 1 second or 1/6 second. The cycle varies based on the GPU model. A cycle is a period of time during which one or more kernel functions remain active. This metric only indicates that one or more kernel functions are occupying GPU resources. The metric does not display detailed GPU usage information. |
DCGM_FI_DEV_LOW_UTIL_VIOLATION | The time of the violation due to low utilization. |
DCGM_FI_DEV_MEM_CLOCK | The memory clock speed. |
DCGM_FI_DEV_MEM_COPY_UTIL | The memory bandwidth utilization. For example, the maximum memory bandwidth of NVIDIA V100 is 900 GB/s. If the memory bandwidth used is 450 GB/s, the memory bandwidth utilization is 50%. |
DCGM_FI_DEV_MEMORY_TEMP | The memory temperature. |
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL | The total NVLink bandwidth. |
DCGM_FI_DEV_PCIE_REPLAY_COUNTER | The PCIe replay counter. |
DCGM_FI_DEV_POWER_USAGE | The power usage. |
DCGM_FI_DEV_POWER_VIOLATION | The time of the violation due to power limitations. |
DCGM_FI_DEV_PSTATE | The performance state (P-state) of the device. |
DCGM_FI_DEV_RELIABILITY_VIOLATION | The time of the violation due to board reliability. |
DCGM_FI_DEV_RETIRED_DBE | The number of pages retired due to double bit errors. |
DCGM_FI_DEV_RETIRED_PENDING | The number of pages to be retired. These pages are marked as unavailable due to errors in the GPU memory. |
DCGM_FI_DEV_RETIRED_SBE | The number of pages retired due to single bit errors. |
DCGM_FI_DEV_SM_CLOCK | The SM clock speed. |
DCGM_FI_DEV_SYNC_BOOST_VIOLATION | The time of the violation due to sync boost limitations. |
DCGM_FI_DEV_THERMAL_VIOLATION | The time of the violation due to thermal limitations. |
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION | The total energy consumed since the driver was last loaded. |
DCGM_FI_DEV_VIDEO_CLOCK | The video clock speed. |
DCGM_FI_DEV_XID_ERRORS | The last XID error that occurred within a period of time. |
DCGM_FI_PROF_DRAM_ACTIVE | The fraction of cycles during which the device memory interface is active sending data to or receiving data from device memory. The value is an average value within a time interval rather than an instantaneous value. A larger value of this metric indicates higher device memory utilization. If the value is 1 (100%), a DRAM command is executed every cycle within the entire interval. In practice, the peak value of the metric is about 0.8 (80%). If the value of this metric is 0.2 (20%), 20% of the cycles within the time interval are spent reading from or writing to device memory. |
DCGM_FI_PROF_GR_ENGINE_ACTIVE | The percentage of time that the Graphics or Compute engines were active within a time interval. The value indicates the average across all Graphics and Compute engines. A Graphics or Compute engine is considered active when a Graphics or Compute context is bound to a thread and the Graphics or Compute context is in a busy state. |
DCGM_FI_PROF_NVLINK_RX_BYTES | The TX rate of NVLink and the RX rate of NVLink. The bytes transmitted or received exclude the header. The value is an average value within a time interval rather than an instantaneous value. For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum NVLink Gen2 bandwidth is 25 GB/s per direction per link. |
DCGM_FI_PROF_NVLINK_TX_BYTES | The total number of bytes sent through NVLink. |
DCGM_FI_PROF_PCIE_RX_BYTES | The TX rate of PCIe and the RX rate of PCIe. The bytes transmitted or received include both the header and payload. The value is an average value within a time interval rather than an instantaneous value. For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum PCIe Gen3 bandwidth is 985 MB/s per lane. |
DCGM_FI_PROF_PCIE_TX_BYTES | The TX rate of PCIe and the RX rate of PCIe. The bytes transmitted or received include both the header and payload. The value is an average value within a time interval rather than an instantaneous value. For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum PCIe Gen3 bandwidth is 985 MB/s per lane. |
DCGM_FI_PROF_PIPE_FP16_ACTIVE | The fraction of cycles during which the FP16 (half-precision) pipeline was active. The value is an average value within a time interval rather than an instantaneous value. A higher value indicates higher utilization of the FP16 cores. A value of 1 (100%) means that an FP16 instruction was executed every two cycles throughout the entire time interval (for example, on Volta-type cards). If the value of this metric is 0.2 (20%), one of the following conditions may exist: The FP16 core utilization of 20% of the SMs within the time interval is 100%. The FP16 core utilization of all SMs within the time interval is 20%. The FP16 core utilization of all SMs within 20% of the time interval is 100%. Other conditions. |
DCGM_FI_PROF_PIPE_FP32_ACTIVE | The fraction of cycles during which the FMA (Fused Multiply-Add) pipeline was active. The FMA operations include both FP32 (single-precision) and integer operations. The value is an average value within a time interval rather than an instantaneous value. A higher value indicates higher utilization of the FP32 cores. A value of 1 (100%) means that an FP32 instruction was executed every two cycles throughout the entire time interval (for example, on Volta-type cards). If the value of this metric is 0.2 (20%), one of the following conditions may exist: The FP32 core utilization of 20% of the SMs within the time interval is 100%. The FP32 core utilization of all SMs within the time interval is 20%. The FP32 core utilization of all SMs within 20% of the time interval is 100%. Other conditions. |
DCGM_FI_PROF_PIPE_FP64_ACTIVE | The fraction of cycles during which the FP64 (double-precision) pipeline was active. The value is an average value within a time interval rather than an instantaneous value. A higher value indicates higher utilization of the FP64 cores. A value of 1 (100%) means that an FP64 instruction was executed every four cycles throughout the entire time interval (for example, on Volta-type cards). If the value of this metric is 0.2 (20%), one of the following conditions may exist: The FP64 core utilization of 20% of the SMs within the time interval is 100%. The FP64 core utilization of all SMs within the time interval is 20%. The FP64 core utilization of all SMs within 20% of the time interval is 100%. Other conditions. |
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE | The cycle fraction for the Tensor (HMMA/IMMA) pipe being in the Active state. The value is an average value within a time interval rather than an instantaneous value. A larger value of this metric indicates higher tensor core utilization. If the value is 1 (100%), a Tensor instruction is issued every cycle within the entire interval. One instruction completes in two cycles. If the value of this metric is 0.2 (20%), one of the following conditions may exist: The tensor core utilization of 20% of the SMs within the time interval is 100%. The tensor core utilization of all SMs within the time interval is 20%. The tensor core utilization of all SMs within 20% of the time interval is 100%. Other conditions. |
DCGM_FI_PROF_SM_ACTIVE | The ratio of cycles during which at least one warp on an SM remains active. The value is an average of all SMs. The value does not vary with the number of warps included in the thread block. When a warp is scheduled and resources are allocated to the warp, the warp is considered active. In this case, the status of the warp may be Computing or not Computing; for example, it may be waiting for memory requests or in another non-Computing state. If the value of this metric drops below 0.5, the GPU utilization is low. To ensure high GPU utilization, make sure that the value is greater than 0.8. Assume that a GPU has N SMs. If a kernel function uses N thread blocks that run on all SMs throughout a time interval, the value of this metric is 1 (100%). If a kernel function uses N/5 thread blocks throughout a time interval, the value of this metric is 0.2. If a kernel function uses N thread blocks but runs during only 20% of the cycles within a time interval, the value of this metric is 0.2. |
DCGM_FI_PROF_SM_OCCUPANCY | The ratio of warps resident on an SM to the maximum number of warps that can reside on that SM, averaged over all SMs within a time interval. A higher occupancy does not necessarily indicate higher GPU utilization. Only in workloads where GPU memory bandwidth is the limiting factor (see DCGM_FI_PROF_DRAM_ACTIVE) does a higher occupancy indicate more effective GPU utilization. |
nvidia_gpu_allocated_num_devices | The number of allocated GPU devices. Warning: This metric will be deprecated in the future. |
nvidia_gpu_memory_allocated_bytes | The full memory of GPU devices. Warning: This metric will be deprecated in the future and replaced by DCGM_CUSTOM_DEV_FB_ALLOCATED. |
nvidia_gpu_sharing_memory | The memory allocated for GPU sharing. Warning: This metric will be deprecated in the future and replaced by DCGM_CUSTOM_DEV_FB_ALLOCATED. |
up | The connectivity of metric collection. |
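Several descriptions in the table above include small worked calculations: the allocated computing power ratio for DCGM_CUSTOM_CONTAINER_CP_ALLOCATED, the memory bandwidth utilization for DCGM_FI_DEV_MEM_COPY_UTIL, and the per-lane PCIe Gen3 bandwidth for the PCIe metrics. The following Python snippet is a minimal sketch that reproduces that arithmetic. The function names are hypothetical, and the 16-lane link in the last call is an assumption for illustration, not a value from this topic.

```python
def allocated_compute_ratio(allocated_cus, total_cus):
    """Ratio of the compute units allocated to a container to the GPU total."""
    return allocated_cus / total_cus


def memory_bandwidth_utilization(used_gb_per_s, max_gb_per_s):
    """Fraction of the maximum memory bandwidth that is in use."""
    return used_gb_per_s / max_gb_per_s


def average_rate(transferred_bytes, interval_seconds):
    """Average transfer rate over an interval, regardless of burstiness."""
    return transferred_bytes / interval_seconds


def pcie_gen3_max_bandwidth_mb_per_s(lanes):
    """Theoretical PCIe Gen3 bandwidth, using 985 MB/s per lane."""
    return 985 * lanes


print(allocated_compute_ratio(30, 100))        # 0.3
print(memory_bandwidth_utilization(450, 900))  # 0.5
print(average_rate(10**9, 1))                  # 1000000000.0 bytes per second
print(pcie_gen3_max_bandwidth_mb_per_s(16))    # 15760 (assumed x16 link)
```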
Cost-Exporter (job name: alibaba-cloud-cost-exporter)
Metric | Description |
deducted_by_cash_coupons | The bill discount amount for the current instance. |
deducted_by_prepaid_card | The prepaid card discount amount for the current instance. |
invoice_discount | The discount amount for the current instance. |
list_price | The unit price for the current instance. |
node_current_price | The actual price of the current node. |
node_payAsYouGo_price | The pay-as-you-go price of the current node. |
node_payByPeriod_price | The subscription price of the current node. |
node_spot_price | The spot price of the current node. |
outstanding_amount | The outstanding amount for the current instance. |
payent_amount | The cash payment amount for the current instance. |
pretax_amount | The payable amount for the current instance. |
pretax_gross_amount | The original amount for the current instance. |
usage | The resource usage for the current instance. |
up | The connectivity of metric collection. |
Ingress (job name: arms-ack-ingress, ingress-ask-default)
Metric | Description |
aliyun_prometheus_agent_append_duration_seconds | The duration of the Prometheus agent append operations in seconds. |
aliyun_prometheus_agent_job_discovery_status | The discovery status of the Prometheus agent collection jobs. |
aliyun_prometheus_agent_scrape_custom_error | The number of custom collection errors of the Prometheus agent. |
aliyun_prometheus_agent_scrapes_by_target_total | The total number of scrapes by the Prometheus agent per target. |
aliyun_prometheus_agent_target_info | The target information of the Prometheus agent. |
nginx_ingress_controller_admission_config_size | The size of the NGINX Ingress controller Admission Config. |
nginx_ingress_controller_admission_render_duration | The rendering duration of the NGINX Ingress controller Admission Config. |
nginx_ingress_controller_admission_render_ingresses | The number of Ingresses rendered by the NGINX Ingress controller. |
nginx_ingress_controller_admission_roundtrip_duration | The round-trip processing duration of the NGINX Ingress controller. |
nginx_ingress_controller_admission_tested_duration | The testing duration of the NGINX Ingress controller. |
nginx_ingress_controller_admission_tested_ingresses | The number of Ingresses tested by the NGINX Ingress controller. |
nginx_ingress_controller_build_info | The build information of the NGINX Ingress controller. |
nginx_ingress_controller_bytes_sent_bucket | The distribution of total bytes sent by the NGINX Ingress controller. |
nginx_ingress_controller_bytes_sent_count | The count of total bytes sent by the NGINX Ingress controller. |
nginx_ingress_controller_bytes_sent_sum | The sum of total bytes sent by the NGINX Ingress controller. |
nginx_ingress_controller_check_errors | The number of check errors in the NGINX Ingress controller. |
nginx_ingress_controller_check_success | The number of successful checks in the NGINX Ingress controller. |
nginx_ingress_controller_config_hash | The configuration hash of the NGINX Ingress controller. |
nginx_ingress_controller_config_last_reload_successful | The success status of the last configuration reload in the NGINX Ingress controller. |
nginx_ingress_controller_config_last_reload_successful_timestamp_seconds | The timestamp of the last successful configuration reload in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_connect_duration_seconds_bucket | The distribution of connection durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_connect_duration_seconds_count | The count of connection durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_connect_duration_seconds_sum | The sum of connection durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_errors | The number of errors in the NGINX Ingress controller. |
nginx_ingress_controller_header_duration_seconds_bucket | The distribution of header processing durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_header_duration_seconds_count | The count of header processing durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_header_duration_seconds_sum | The sum of header processing durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_ingress_upstream_latency_seconds | The upstream latency in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_ingress_upstream_latency_seconds_count | The count of upstream latencies in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_ingress_upstream_latency_seconds_sum | The sum of upstream latencies in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_leader_election_status | The leader election status of the NGINX Ingress controller. |
nginx_ingress_controller_nginx_process_connections | The number of connections in the nginx process of the NGINX Ingress controller. |
nginx_ingress_controller_nginx_process_connections_total | The total number of connections in the nginx process of the NGINX Ingress controller. |
nginx_ingress_controller_nginx_process_cpu_seconds_total | The total CPU time in seconds consumed by the nginx process in the NGINX Ingress controller. |
nginx_ingress_controller_nginx_process_num_procs | The number of nginx processes in the NGINX Ingress controller. |
nginx_ingress_controller_nginx_process_oldest_start_time_seconds | The start time of the oldest nginx process in the NGINX Ingress controller, as a UNIX timestamp in seconds. |
nginx_ingress_controller_nginx_process_read_bytes_total | The total number of bytes read by the nginx process in the NGINX Ingress controller. |
nginx_ingress_controller_nginx_process_requests_total | The total number of requests processed by the nginx process in the NGINX Ingress controller. |
nginx_ingress_controller_nginx_process_resident_memory_bytes | The resident memory size in bytes of the nginx process in the NGINX Ingress controller. |
nginx_ingress_controller_nginx_process_virtual_memory_bytes | The virtual memory size in bytes of the nginx process in the NGINX Ingress controller. |
nginx_ingress_controller_nginx_process_write_bytes_total | The total number of bytes written by the nginx process in the NGINX Ingress controller. |
nginx_ingress_controller_orphan_ingress | The number of orphaned Ingresses in the NGINX Ingress controller. |
nginx_ingress_controller_request_duration_seconds_bucket | The distribution of request durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_request_duration_seconds_count | The count of request durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_request_duration_seconds_sum | The sum of request durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_request_size_bucket | The distribution of request sizes in the NGINX Ingress controller. |
nginx_ingress_controller_request_size_count | The count of request sizes in the NGINX Ingress controller. |
nginx_ingress_controller_request_size_sum | The sum of request sizes in the NGINX Ingress controller. |
nginx_ingress_controller_requests | The total number of requests in the NGINX Ingress controller. |
nginx_ingress_controller_response_duration_seconds_bucket | The distribution of response durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_response_duration_seconds_count | The count of response durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_response_duration_seconds_sum | The sum of response durations in the NGINX Ingress controller in seconds. |
nginx_ingress_controller_response_size_bucket | The distribution of response sizes in the NGINX Ingress controller. |
nginx_ingress_controller_response_size_count | The count of response sizes in the NGINX Ingress controller. |
nginx_ingress_controller_response_size_sum | The sum of response sizes in the NGINX Ingress controller. |
nginx_ingress_controller_ssl_certificate_info | The SSL certificate information in the NGINX Ingress controller. |
nginx_ingress_controller_ssl_expire_time_seconds | The expiration time of the SSL certificate in the NGINX Ingress controller, as a UNIX timestamp in seconds. |
nginx_ingress_controller_success | The number of successes in the NGINX Ingress controller. |
up | The connectivity of metric collection. |
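Several metrics in the preceding table come in _bucket, _count, and _sum triples, which is the standard Prometheus histogram layout: _bucket holds cumulative observation counts per upper bound (the le label), _count is the number of observations, and _sum is their total. The following Python sketch shows how an average and an approximate quantile can be derived from such a triple. The sample values are made up for illustration and the function names are hypothetical. The interpolation is similar in spirit to the PromQL histogram_quantile function, but is not a reimplementation of it.

```python
from typing import Dict


def average_from_histogram(total_sum: float, total_count: float) -> float:
    """Mean observation = _sum / _count; returns 0.0 when nothing was observed."""
    return total_sum / total_count if total_count else 0.0


def estimate_quantile(q: float, buckets: Dict[float, float]) -> float:
    """Estimate the q-quantile from cumulative bucket counts keyed by upper bound (le).

    Linearly interpolates inside the first bucket whose cumulative count
    reaches the target rank, assuming the lowest bucket starts at 0.
    """
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound in bounds:
        count = buckets[bound]
        if count >= rank:
            # No interpolation is possible into the +Inf bucket or an empty bucket.
            if bound == float("inf") or count == prev_count:
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Made-up sample of nginx_ingress_controller_request_duration_seconds_bucket values.
    sample_buckets = {0.1: 50.0, 0.5: 90.0, 1.0: 100.0, float("inf"): 100.0}
    print(average_from_histogram(total_sum=25.0, total_count=100.0))
    print(estimate_quantile(0.95, sample_buckets))
```

In practice these series are counters, so the calculation is usually applied to their increase over a time window rather than to the raw values.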
Koordinator (job name: kube-system, koordlet-metrics-podmonitor, or koord-manager-metrics-service)
Metric | Description |
aliyun_prometheus_agent_append_duration_seconds | The duration of the Prometheus agent append operations in seconds. |
aliyun_prometheus_agent_scrapes_by_target_total | The total number of scrapes by the Prometheus agent per target. |
aliyun_prometheus_agent_target_info | The target information of the Prometheus agent. |
koord_manager_recommender_recommendation_workload_target | The resource specifications recommended for the workload by the resource profiling feature. |
koordlet_container_resource_limits | The resource limits of the container. |
koordlet_container_resource_requests | The resource requests of the container. |
koordlet_node_priority_resource_reclaimable | The reclaimable resources of the node by priority. |
koordlet_node_resource_allocatable | The allocatable resources of the node. |
slo_manager_recommender_recommendation_workload_target | The resource specifications that are recommended based on the workload by the resource profiling feature. This metric is discontinued. |
up | The connectivity of metric collection. |
ETCD (job name: etcd)
Metric | Description |
aliyun_prometheus_agent_append_duration_seconds | The duration of the Prometheus agent append operations in seconds. |
aliyun_prometheus_agent_job_discovery_status | The discovery status of the Prometheus agent collection jobs. |
aliyun_prometheus_agent_scrape_custom_error | The number of custom collection errors of the Prometheus agent. |
aliyun_prometheus_agent_scrapes_by_target_total | The total number of scrapes by the Prometheus agent per target. |
aliyun_prometheus_agent_target_info | The target information of the Prometheus agent. |
etcd_cluster_version | The version of the cluster. |
etcd_debugging_auth_revision | The authentication revision number for ETCD debugging. |
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket | The distribution of ETCD debugging disk backend commit rebalance duration in seconds. |
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_count | The count of ETCD debugging disk backend commit rebalance duration in seconds. |
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_sum | The sum of ETCD debugging disk backend commit rebalance duration in seconds. |
etcd_debugging_disk_backend_commit_spill_duration_seconds_bucket | The distribution of ETCD debugging disk backend commit spill duration in seconds. |
etcd_debugging_disk_backend_commit_spill_duration_seconds_count | The count of ETCD debugging disk backend commit spill duration in seconds. |
etcd_debugging_disk_backend_commit_spill_duration_seconds_sum | The sum of ETCD debugging disk backend commit spill duration in seconds. |
etcd_debugging_disk_backend_commit_write_duration_seconds_bucket | The distribution of ETCD debugging disk backend commit write duration in seconds. |
etcd_debugging_disk_backend_commit_write_duration_seconds_count | The count of ETCD debugging disk backend commit write duration in seconds. |
etcd_debugging_disk_backend_commit_write_duration_seconds_sum | The sum of ETCD debugging disk backend commit write duration in seconds. |
etcd_debugging_lease_granted_total | The total number of lease grants in ETCD debugging. |
etcd_debugging_lease_renewed_total | The total number of lease renewals in ETCD debugging. |
etcd_debugging_lease_revoked_total | The total number of lease revocations in ETCD debugging. |
etcd_debugging_lease_ttl_total_bucket | The distribution of lease TTLs in ETCD debugging. |
etcd_debugging_lease_ttl_total_count | The count of lease TTLs in ETCD debugging. |
etcd_debugging_lease_ttl_total_sum | The sum of lease TTLs in ETCD debugging. |
etcd_debugging_mvcc_compact_revision | The compaction revision number for ETCD debugging MVCC. |
etcd_debugging_mvcc_current_revision | The current revision number for ETCD debugging MVCC. |
etcd_debugging_mvcc_db_compaction_keys_total | The total number of keys compacted in the ETCD debugging MVCC database. |
etcd_debugging_mvcc_db_compaction_last | The last compaction time for the ETCD debugging MVCC database. |
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket | The distribution of MVCC database compaction pause durations in milliseconds for ETCD debugging. |
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_count | The count of MVCC database compaction pause durations in milliseconds for ETCD debugging. |
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_sum | The sum of MVCC database compaction pause durations in milliseconds for ETCD debugging. |
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket | The distribution of MVCC database compaction total durations in milliseconds for ETCD debugging. |
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_count | The count of MVCC database compaction total durations in milliseconds for ETCD debugging. |
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_sum | The sum of MVCC database compaction total durations in milliseconds for ETCD debugging. |
etcd_debugging_mvcc_db_total_size_in_bytes | The total size of the MVCC database in bytes for ETCD debugging. |
etcd_debugging_mvcc_delete_total | The total number of delete operations in ETCD debugging MVCC. |
etcd_debugging_mvcc_events_total | The total number of events in ETCD debugging. |
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket | The distribution of MVCC index compaction pause durations in milliseconds for ETCD debugging. |
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_count | The count of MVCC index compaction pause durations in milliseconds for ETCD debugging. |
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_sum | The sum of MVCC index compaction pause durations in milliseconds for ETCD debugging. |
etcd_debugging_mvcc_keys_total | The total number of keys in ETCD debugging MVCC. |
etcd_debugging_mvcc_pending_events_total | The total number of pending events in ETCD debugging MVCC. |
etcd_debugging_mvcc_put_total | The total number of put operations in ETCD debugging MVCC. |
etcd_debugging_mvcc_range_total | The total number of range queries in ETCD debugging MVCC. |
etcd_debugging_mvcc_slow_watcher_total | The total number of slow watchers in ETCD debugging. |
etcd_debugging_mvcc_total_put_size_in_bytes | The total size of MVCC puts in bytes for ETCD debugging. |
etcd_debugging_mvcc_txn_total | The total number of MVCC transactions in ETCD debugging. |
etcd_debugging_mvcc_watch_stream_total | The total number of watch streams in ETCD debugging. |
etcd_debugging_mvcc_watcher_total | The total number of watchers in ETCD debugging. |
etcd_debugging_server_lease_expired_total | The total number of expired leases in ETCD debugging. |
etcd_debugging_snap_save_marshalling_duration_seconds_bucket | The distribution of snapshot save marshalling durations in seconds for ETCD debugging. |
etcd_debugging_snap_save_marshalling_duration_seconds_count | The count of snapshot save marshalling durations in seconds for ETCD debugging. |
etcd_debugging_snap_save_marshalling_duration_seconds_sum | The sum of snapshot save marshalling durations in seconds for ETCD debugging. |
etcd_debugging_snap_save_total_duration_seconds_bucket | The distribution of snapshot save durations in seconds for ETCD debugging. |
etcd_debugging_snap_save_total_duration_seconds_count | The count of snapshot save durations in seconds for ETCD debugging. |
etcd_debugging_snap_save_total_duration_seconds_sum | The sum of snapshot save durations in seconds for ETCD debugging. |
etcd_debugging_store_expires_total | The total number of expired items in ETCD debugging storage. |
etcd_debugging_store_reads_total | The total number of reads in ETCD debugging storage. |
etcd_debugging_store_watch_requests_total | The total number of watch requests in ETCD debugging storage. |
etcd_debugging_store_watchers | The total number of watchers in ETCD debugging storage. |
etcd_debugging_store_writes_total | The total number of writes in ETCD debugging storage. |
etcd_disk_backend_commit_duration_seconds_bucket | The distribution of disk backend commit durations in seconds for ETCD. |
etcd_disk_backend_commit_duration_seconds_count | The count of disk backend commit durations in seconds for ETCD. |
etcd_disk_backend_commit_duration_seconds_sum | The sum of disk backend commit durations in seconds for ETCD. |
etcd_disk_backend_defrag_duration_seconds_bucket | The distribution of disk backend defragmentation durations in seconds for ETCD. |
etcd_disk_backend_defrag_duration_seconds_count | The count of disk backend defragmentation durations in seconds for ETCD. |
etcd_disk_backend_defrag_duration_seconds_sum | The sum of disk backend defragmentation durations in seconds for ETCD. |
etcd_disk_backend_snapshot_duration_seconds_bucket | The distribution of disk backend snapshot durations in seconds for ETCD. |
etcd_disk_backend_snapshot_duration_seconds_count | The count of disk backend snapshot durations in seconds for ETCD. |
etcd_disk_backend_snapshot_duration_seconds_sum | The sum of disk backend snapshot durations in seconds for ETCD. |
etcd_disk_defrag_inflight | The number of ongoing disk defragmentations in ETCD. |
etcd_disk_wal_fsync_duration_seconds_bucket | The distribution of WAL sync durations in seconds for ETCD disk. |
etcd_disk_wal_fsync_duration_seconds_count | The count of WAL sync durations in seconds for ETCD disk. |
etcd_disk_wal_fsync_duration_seconds_sum | The sum of WAL sync durations in seconds for ETCD disk. |
etcd_disk_wal_write_bytes_total | The total number of bytes written to the WAL in ETCD disk. |
etcd_grpc_proxy_cache_hits_total | The total number of cache hits in the ETCD gRPC proxy. |
etcd_grpc_proxy_cache_keys_total | The total number of cache keys in the ETCD gRPC proxy. |
etcd_grpc_proxy_cache_misses_total | The total number of cache misses in the ETCD gRPC proxy. |
etcd_grpc_proxy_events_coalescing_total | The total number of event coalescings in the ETCD gRPC proxy. |
etcd_grpc_proxy_watchers_coalescing_total | The total number of watcher coalescings in the ETCD gRPC proxy. |
etcd_mvcc_db_open_read_transactions | The number of open read transactions in the ETCD MVCC database. |
etcd_mvcc_db_total_size_in_bytes | The total size of the MVCC database in bytes for ETCD. |
etcd_mvcc_db_total_size_in_use_in_bytes | The total size in use of the MVCC database in bytes for ETCD. |
etcd_mvcc_delete_total | The total number of deletes in ETCD MVCC. |
etcd_mvcc_hash_duration_seconds_bucket | The distribution of MVCC hash durations in seconds for ETCD. |
etcd_mvcc_hash_duration_seconds_count | The count of MVCC hash durations in seconds for ETCD. |
etcd_mvcc_hash_duration_seconds_sum | The sum of MVCC hash durations in seconds for ETCD. |
etcd_mvcc_hash_rev_duration_seconds_bucket | The distribution of MVCC hash revision durations in seconds for ETCD. |
etcd_mvcc_hash_rev_duration_seconds_count | The count of MVCC hash revision durations in seconds for ETCD. |
etcd_mvcc_hash_rev_duration_seconds_sum | The sum of MVCC hash revision durations in seconds for ETCD. |
etcd_mvcc_put_total | The total number of put operations in ETCD MVCC. |
etcd_mvcc_range_total | The total number of range queries in ETCD MVCC. |
etcd_mvcc_txn_total | The total number of MVCC transactions in ETCD. |
etcd_network_active_peers | The number of active peers in the ETCD network. |
etcd_network_client_grpc_received_bytes_total | The total number of bytes received by the ETCD network client via gRPC. |
etcd_network_client_grpc_sent_bytes_total | The total number of bytes sent by the ETCD network client via gRPC. |
etcd_network_disconnected_peers_total | The total number of disconnected peers in the ETCD network. |
etcd_network_peer_received_bytes_total | The total number of bytes received by the ETCD network peer. |
etcd_network_peer_received_failures_total | The total number of receive failures in the ETCD network peer. |
etcd_network_peer_round_trip_time_seconds_bucket | The distribution of round trip times for the ETCD network peer in seconds. |
etcd_network_peer_round_trip_time_seconds_count | The count of round trip times for the ETCD network peer in seconds. |
etcd_network_peer_round_trip_time_seconds_sum | The sum of round trip times for the ETCD network peer in seconds. |
etcd_network_peer_sent_bytes_total | The total number of bytes sent by the ETCD network peer. |
etcd_network_peer_sent_failures_total | The total number of send failures by the ETCD network peer. |
etcd_network_server_stream_failures_total | The total number of stream failures in the ETCD network server. |
etcd_network_snapshot_receive_inflights_total | The number of concurrent snapshot receive requests in the ETCD network. |
etcd_network_snapshot_receive_success | The number of successful snapshot receives in the ETCD network. |
etcd_network_snapshot_receive_total_duration_seconds_bucket | The distribution of snapshot receive durations in seconds for the ETCD network. |
etcd_network_snapshot_receive_total_duration_seconds_count | The count of snapshot receive durations in seconds for the ETCD network. |
etcd_network_snapshot_receive_total_duration_seconds_sum | The sum of snapshot receive durations in seconds for the ETCD network. |
etcd_network_snapshot_send_inflights_total | The number of concurrent snapshot send requests in the ETCD network. |
etcd_network_snapshot_send_success | The number of successful snapshot sends in the ETCD network. |
etcd_network_snapshot_send_total_duration_seconds_bucket | The distribution of snapshot send durations in seconds for the ETCD network. |
etcd_network_snapshot_send_total_duration_seconds_count | The count of snapshot send durations in seconds for the ETCD network. |
etcd_network_snapshot_send_total_duration_seconds_sum | The sum of snapshot send durations in seconds for the ETCD network. |
etcd_server_apply_duration_seconds_bucket | The distribution of apply durations in seconds for the ETCD server. |
etcd_server_apply_duration_seconds_count | The count of apply durations in seconds for the ETCD server. |
etcd_server_apply_duration_seconds_sum | The sum of apply durations in seconds for the ETCD server. |
etcd_server_client_requests_total | The total number of client requests to the ETCD server. |
etcd_server_go_version | The Go version of the ETCD server. |
etcd_server_has_leader | Indicates whether a leader exists in the ETCD server. |
etcd_server_health_failures | The number of health check failures in the ETCD server. |
etcd_server_health_success | The number of successful health checks in the ETCD server. |
etcd_server_heartbeat_send_failures_total | The total number of heartbeat send failures in the ETCD server. |
etcd_server_id | The ID of the ETCD server. |
etcd_server_is_leader | Indicates whether the ETCD server is a leader. |
etcd_server_is_learner | Indicates whether the ETCD server is a learner. |
etcd_server_leader_changes_seen_total | The total number of leader changes witnessed by the ETCD server. |
etcd_server_learner_promote_successes | The number of successful learner promotions in the ETCD server. |
etcd_server_proposals_applied_total | The total number of applied proposals in the ETCD server. |
etcd_server_proposals_committed_total | The total number of committed proposals in the ETCD server. |
etcd_server_proposals_failed_total | The total number of failed proposals in the ETCD server. |
etcd_server_proposals_pending | The current number of pending proposals in the ETCD server. |
etcd_server_quota_backend_bytes | The backend storage quota in bytes for the ETCD server. |
etcd_server_read_indexes_failed_total | The total number of read index failures in the ETCD server. |
etcd_server_slow_apply_total | The total number of slow apply operations in the ETCD server. |
etcd_server_slow_read_indexes_total | The total number of slow read indexes in the ETCD server. |
etcd_server_snapshot_apply_in_progress_total | The total number of snapshots being applied in the ETCD server. |
etcd_server_version | The version of the ETCD server. |
etcd_snap_db_fsync_duration_seconds_bucket | The distribution of ETCD snapshot database fsync durations in seconds. |
etcd_snap_db_fsync_duration_seconds_count | The count of ETCD snapshot database fsync durations in seconds. |
etcd_snap_db_fsync_duration_seconds_sum | The sum of ETCD snapshot database fsync durations in seconds. |
etcd_snap_db_save_total_duration_seconds_bucket | The distribution of ETCD snapshot database save durations in seconds. |
etcd_snap_db_save_total_duration_seconds_count | The count of ETCD snapshot database save durations in seconds. |
etcd_snap_db_save_total_duration_seconds_sum | The sum of ETCD snapshot database save durations in seconds. |
etcd_snap_fsync_duration_seconds_bucket | The distribution of ETCD snapshot fsync durations in seconds. |
etcd_snap_fsync_duration_seconds_count | The count of ETCD snapshot fsync durations in seconds. |
etcd_snap_fsync_duration_seconds_sum | The sum of ETCD snapshot fsync durations in seconds. |
grpc_server_handled_total | The total number of requests handled by the gRPC server. |
grpc_server_msg_received_total | The total number of messages received by the gRPC server. |
grpc_server_msg_sent_total | The total number of messages sent by the gRPC server. |
grpc_server_started_total | The total number of RPCs started on the gRPC server. |
os_fd_limit | The file descriptor limit of the operating system. |
os_fd_used | The number of file descriptors used by the operating system. |
up | The connectivity of metric collection. |
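The database size metrics in the preceding table are often read together. As an illustrative sketch, the following Python snippet derives two ratios from etcd_mvcc_db_total_size_in_bytes, etcd_mvcc_db_total_size_in_use_in_bytes, and etcd_server_quota_backend_bytes: the share of the backend quota that is consumed, and the share of the allocated database that is not in use. The sketch assumes that the total size is the physically allocated size and the in-use size is the logically used size, so their difference approximates the space that defragmentation could free. The function name and sample values are hypothetical.

```python
from typing import Dict


def db_usage_ratios(total_size: float, size_in_use: float, quota: float) -> Dict[str, float]:
    """Return quota consumption and the unused share of the allocated database.

    total_size  -> value of etcd_mvcc_db_total_size_in_bytes
    size_in_use -> value of etcd_mvcc_db_total_size_in_use_in_bytes
    quota       -> value of etcd_server_quota_backend_bytes
    """
    quota_used = total_size / quota if quota else 0.0
    unused_share = (total_size - size_in_use) / total_size if total_size else 0.0
    return {"quota_used": quota_used, "unused_share": unused_share}


if __name__ == "__main__":
    gib = 2 ** 30
    # Made-up sample values: 2 GiB allocated, 1.5 GiB in use, 8 GiB quota.
    print(db_usage_ratios(total_size=2 * gib, size_in_use=1.5 * gib, quota=8 * gib))
```

A high quota_used value together with a high unused_share value suggests that much of the consumed quota is allocated but unused space.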
Scheduler (job name: ack-scheduler)
Metric | Description |
aggregator_discovery_aggregation_count_total | The count of discovery aggregations performed by the aggregator. |
aliyun_prometheus_agent_append_duration_seconds | The duration of the Prometheus agent append operations in seconds. |
aliyun_prometheus_agent_job_discovery_status | The discovery status of the Prometheus agent collection jobs. |
aliyun_prometheus_agent_scrape_custom_error | The number of custom collection errors of the Prometheus agent. |
aliyun_prometheus_agent_scrapes_by_target_total | The total number of scrapes by the Prometheus agent per target. |
aliyun_prometheus_agent_target_info | The target information of the Prometheus agent. |
apiserver_audit_event_total | The total number of APIServer audit events. |
apiserver_audit_requests_rejected_total | The total number of APIServer audit request rejections. |
apiserver_client_certificate_expiration_seconds_bucket | The distribution of remaining seconds until APIServer client certificate expiration. |
apiserver_client_certificate_expiration_seconds_count | The count of remaining seconds until APIServer client certificate expiration. |
apiserver_client_certificate_expiration_seconds_sum | The sum of remaining seconds until APIServer client certificate expiration. |
apiserver_delegated_authn_request_duration_seconds_bucket | The distribution of delegated authentication request durations in seconds for the APIServer. |
apiserver_delegated_authn_request_duration_seconds_count | The count of delegated authentication request durations in seconds for the APIServer. |
apiserver_delegated_authn_request_duration_seconds_sum | The sum of delegated authentication request durations in seconds for the APIServer. |
apiserver_delegated_authn_request_total | The total number of delegated authentication requests for the APIServer. |
apiserver_delegated_authz_request_duration_seconds_bucket | The distribution of delegated authorization request durations in seconds for the APIServer. |
apiserver_delegated_authz_request_duration_seconds_count | The count of delegated authorization request durations in seconds for the APIServer. |
apiserver_delegated_authz_request_duration_seconds_sum | The sum of delegated authorization request durations in seconds for the APIServer. |
apiserver_delegated_authz_request_total | The total number of delegated authorization requests to the API server. |
apiserver_encryption_config_controller_automatic_reload_failures_total | The total number of automatic reload failures for the APIServer encryption configuration controller. |
apiserver_encryption_config_controller_automatic_reload_success_total | The total number of successful automatic reloads for the APIServer encryption configuration controller. |
apiserver_envelope_encryption_dek_cache_fill_percent | The fill percentage of the envelope encryption data encryption key (DEK) cache for the APIServer. |
apiserver_storage_data_key_generation_duration_seconds_bucket | The distribution of data key generation durations in seconds for the APIServer storage. |
apiserver_storage_data_key_generation_duration_seconds_count | The count of data key generation durations in seconds for the APIServer storage. |
apiserver_storage_data_key_generation_duration_seconds_sum | The sum of data key generation durations in seconds for the APIServer storage. |
apiserver_storage_data_key_generation_failures_total | The total number of data key generation failures for the APIServer storage. |
apiserver_storage_envelope_transformation_cache_misses_total | The total number of envelope transformation cache misses for the APIServer storage. |
apiserver_webhooks_x509_insecure_sha1_total | The total count of insecure SHA1 usage in X509 certificates for APIServer Webhooks. |
apiserver_webhooks_x509_missing_san_total | The total count of missing SANs in X509 certificates for APIServer Webhooks. |
authenticated_user_requests | The number of authenticated user requests. |
authentication_attempts | The number of authentication attempts. |
authentication_duration_seconds_bucket | The distribution of authentication durations in seconds. |
authentication_duration_seconds_count | The count of authentication durations in seconds. |
authentication_duration_seconds_sum | The sum of authentication durations in seconds. |
authentication_token_cache_active_fetch_count | The count of active fetches for the authentication token cache. |
authentication_token_cache_fetch_total | The total number of fetches for the authentication token cache. |
authentication_token_cache_request_duration_seconds_bucket | The distribution of request durations in seconds for the authentication token cache. |
authentication_token_cache_request_duration_seconds_count | The count of request durations in seconds for the authentication token cache. |
authentication_token_cache_request_duration_seconds_sum | The sum of request durations in seconds for the authentication token cache. |
authentication_token_cache_request_total | The total number of requests for the authentication token cache. |
authorization_attempts_total | The total number of authorization attempts. |
authorization_duration_seconds_bucket | The distribution of authorization durations in seconds. |
authorization_duration_seconds_count | The count of authorization durations in seconds. |
authorization_duration_seconds_sum | The sum of authorization durations in seconds. |
cardinality_enforcement_unexpected_categorizations_total | The total number of unexpected categorizations during cardinality enforcement. |
kubernetes_build_info | The Kubernetes build information. |
kubernetes_feature_enabled | The Kubernetes enabled features. |
leader_election_master_status | The master status of leader election. |
registered_metric_total | The total number of registered metrics. |
registered_metrics_total | The total number of registered metrics. |
rest_client_exec_plugin_certificate_rotation_age_bucket | The distribution of certificate rotation age for REST client exec plugin. |
rest_client_exec_plugin_certificate_rotation_age_count | The count of certificate rotation age for REST client exec plugin. |
rest_client_exec_plugin_certificate_rotation_age_sum | The sum of certificate rotation age for REST client exec plugin. |
rest_client_rate_limiter_duration_seconds_bucket | The distribution of rate limiter durations in seconds for REST client. |
rest_client_rate_limiter_duration_seconds_count | The count of rate limiter durations in seconds for REST client. |
rest_client_rate_limiter_duration_seconds_sum | The sum of rate limiter durations in seconds for REST client. |
rest_client_request_duration_seconds_bucket | The distribution of request durations in seconds for REST client. |
rest_client_request_duration_seconds_count | The count of request durations in seconds for REST client. |
rest_client_request_duration_seconds_sum | The sum of request durations in seconds for REST client. |
rest_client_request_retries_total | The total number of request retries for REST client. |
rest_client_request_size_bytes_bucket | The distribution of request sizes in bytes for REST client. |
rest_client_request_size_bytes_count | The count of request sizes in bytes for REST client. |
rest_client_request_size_bytes_sum | The sum of request sizes in bytes for REST client. |
rest_client_requests_total | The total number of requests for REST client. |
rest_client_response_size_bytes_bucket | The distribution of response sizes in bytes for REST client. |
rest_client_response_size_bytes_count | The count of response sizes in bytes for REST client. |
rest_client_response_size_bytes_sum | The sum of response sizes in bytes for REST client. |
rest_client_transport_cache_entries | The number of transport cache entries for REST client. |
rest_client_transport_create_calls_total | The total number of transport create calls for REST client. |
scheduler_binding_duration_seconds_bucket | The distribution of binding durations in seconds for the scheduler. |
scheduler_binding_duration_seconds_count | The count of binding durations in seconds for the scheduler. |
scheduler_binding_duration_seconds_sum | The sum of binding durations in seconds for the scheduler. |
scheduler_e2e_scheduling_duration_seconds_bucket | The distribution of end-to-end scheduling durations for the scheduler. |
scheduler_e2e_scheduling_duration_seconds_count | The count of end-to-end scheduling durations for the scheduler. |
scheduler_e2e_scheduling_duration_seconds_sum | The sum of end-to-end scheduling durations for the scheduler. |
scheduler_framework_extension_point_duration_seconds_bucket | The distribution of extension point durations for the scheduler framework. |
scheduler_framework_extension_point_duration_seconds_count | The count of extension point durations for the scheduler framework. |
scheduler_framework_extension_point_duration_seconds_sum | The sum of extension point durations for the scheduler framework. |
scheduler_goroutines | The number of goroutines for the scheduler. |
scheduler_pending_pods | The number of pending pods for the scheduler. |
scheduler_plugin_evaluation_total | The total number of plugin evaluations for the scheduler. |
scheduler_plugin_execution_duration_seconds_bucket | The distribution of execution durations in seconds for the scheduler plugins. |
scheduler_plugin_execution_duration_seconds_count | The count of execution durations in seconds for the scheduler plugins. |
scheduler_plugin_execution_duration_seconds_sum | The sum of execution durations in seconds for the scheduler plugins. |
scheduler_pod_preemption_victims_bucket | The distribution of preemption victims for the scheduler. |
scheduler_pod_preemption_victims_count | The count of preemption victims for the scheduler. |
scheduler_pod_preemption_victims_sum | The sum of preemption victims for the scheduler. |
scheduler_pod_scheduling_attempts_bucket | The distribution of pod scheduling attempts for the scheduler. |
scheduler_pod_scheduling_attempts_count | The count of pod scheduling attempts for the scheduler. |
scheduler_pod_scheduling_attempts_sum | The sum of pod scheduling attempts for the scheduler. |
scheduler_pod_scheduling_duration_seconds_bucket | The distribution of pod scheduling durations in seconds for the scheduler. |
scheduler_pod_scheduling_duration_seconds_count | The count of pod scheduling durations in seconds for the scheduler. |
scheduler_pod_scheduling_duration_seconds_sum | The sum of pod scheduling durations in seconds for the scheduler. |
scheduler_pod_scheduling_sli_duration_seconds_bucket | The distribution of SLI durations for pod scheduling. |
scheduler_pod_scheduling_sli_duration_seconds_count | The count of SLI durations for pod scheduling. |
scheduler_pod_scheduling_sli_duration_seconds_sum | The sum of SLI durations for pod scheduling. |
scheduler_preemption_attempts_total | The total number of preemption attempts for the scheduler. |
scheduler_preemption_victims_bucket | The distribution of preemption victims for the scheduler. |
scheduler_preemption_victims_count | The count of preemption victims for the scheduler. |
scheduler_preemption_victims_sum | The sum of preemption victims for the scheduler. |
scheduler_queue_incoming_pods_total | The total number of incoming pods for the scheduler. |
scheduler_schedule_attempts_total | The total number of scheduling attempts for the scheduler. |
scheduler_scheduler_cache_size | The scheduler cache size. |
scheduler_scheduler_goroutines | The number of goroutines for the scheduler. |
scheduler_scheduling_algorithm_duration_seconds_bucket | The distribution of scheduling algorithm durations in seconds. |
scheduler_scheduling_algorithm_duration_seconds_count | The count of scheduling algorithm durations in seconds. |
scheduler_scheduling_algorithm_duration_seconds_sum | The sum of scheduling algorithm durations in seconds. |
scheduler_scheduling_algorithm_predicate_evaluation_seconds_bucket | The distribution of predicate evaluation durations in seconds for the scheduling algorithm. |
scheduler_scheduling_algorithm_predicate_evaluation_seconds_count | The count of predicate evaluation durations in seconds for the scheduling algorithm. |
scheduler_scheduling_algorithm_predicate_evaluation_seconds_sum | The sum of predicate evaluation durations in seconds for the scheduling algorithm. |
scheduler_scheduling_algorithm_preemption_evaluation_seconds_bucket | The distribution of preemption evaluation durations in seconds for the scheduling algorithm. |
scheduler_scheduling_algorithm_preemption_evaluation_seconds_count | The count of preemption evaluation durations in seconds for the scheduling algorithm. |
scheduler_scheduling_algorithm_preemption_evaluation_seconds_sum | The sum of preemption evaluation durations in seconds for the scheduling algorithm. |
scheduler_scheduling_algorithm_priority_evaluation_seconds_bucket | The distribution of priority evaluation durations in seconds for the scheduling algorithm. |
scheduler_scheduling_algorithm_priority_evaluation_seconds_count | The count of priority evaluation durations in seconds for the scheduling algorithm. |
scheduler_scheduling_algorithm_priority_evaluation_seconds_sum | The sum of priority evaluation durations in seconds for the scheduling algorithm. |
scheduler_scheduling_attempt_duration_seconds_bucket | The distribution of scheduling attempt durations. |
scheduler_scheduling_attempt_duration_seconds_count | The count of scheduling attempt durations. |
scheduler_scheduling_attempt_duration_seconds_sum | The sum of scheduling attempt durations. |
scheduler_scheduling_duration_seconds | The distribution of scheduling durations in seconds. |
scheduler_scheduling_duration_seconds_count | The count of scheduling durations in seconds. |
scheduler_scheduling_duration_seconds_sum | The sum of scheduling durations in seconds. |
scheduler_total_preemption_attempts | The total number of preemption attempts by the scheduler. |
scheduler_unschedulable_pods | The number of pods that the scheduler cannot schedule. |
scheduler_volume_scheduling_duration_seconds_bucket | The distribution of volume scheduling durations in seconds. |
scheduler_volume_scheduling_duration_seconds_count | The count of volume scheduling durations in seconds. |
scheduler_volume_scheduling_duration_seconds_sum | The sum of volume scheduling durations in seconds. |
scheduler_volume_scheduling_stage_error_total | The number of errors that are returned during volume scheduling. |
scrape_duration_seconds | The scrape duration in seconds. |
scrape_samples_post_metric_relabeling | The number of scraped samples after metric relabeling. |
scrape_samples_scraped | The number of scraped samples. |
scrape_series_added | The number of new series added during the scrape. |
up | Whether the scrape target is reachable. A value of 1 indicates that the last scrape succeeded, and a value of 0 indicates that it failed. |
workqueue_adds_total | The total number of additions to the work queue. |
workqueue_depth | The work queue depth. |
workqueue_longest_running_processor_seconds | The time in seconds that the longest-running processor of the work queue has been running. |
workqueue_queue_duration_seconds_bucket | The distribution of queue durations in seconds for the work queue. |
workqueue_queue_duration_seconds_count | The count of queue durations in seconds for the work queue. |
workqueue_queue_duration_seconds_sum | The sum of queue durations in seconds for the work queue. |
workqueue_retries_total | The total number of retries in the work queue. |
workqueue_unfinished_work_seconds | The total time in seconds that in-progress work items in the work queue have been running without completing. |
workqueue_work_duration_seconds_bucket | The distribution of work durations in seconds for the work queue. |
workqueue_work_duration_seconds_count | The count of work durations in seconds for the work queue. |
workqueue_work_duration_seconds_sum | The sum of work durations in seconds for the work queue. |
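Many metrics in this table come in `_bucket`, `_count`, and `_sum` triples, which together describe one histogram. The following minimal Python sketch shows how such a triple can be read: the average is `_sum` divided by `_count`, and a quantile can be estimated by linear interpolation inside the cumulative buckets, similar in spirit to the PromQL `histogram_quantile` function. The function names and the sample values are hypothetical and only serve as an illustration. They are not part of Managed Service for Prometheus.

```python
import math


def average(sum_value, count_value):
    """Mean observation: the _sum value divided by the _count value."""
    return sum_value / count_value if count_value else math.nan


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    `buckets` mirrors the `le` label and value of the _bucket series and
    must include the +Inf bucket. The estimate interpolates linearly inside
    the bucket that contains the target rank.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The target falls into the +Inf bucket: report the
                # highest finite upper bound instead of infinity.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up sample for a *_duration_seconds histogram (illustrative only).
sample_buckets = [(0.1, 2), (0.5, 6), (1.0, 9), (math.inf, 10)]
sample_sum = 4.0
sample_count = 10

mean_seconds = average(sample_sum, sample_count)
median_seconds = estimate_quantile(0.5, sample_buckets)
p99_seconds = estimate_quantile(0.99, sample_buckets)
```

In practice, the same calculation is usually applied to the per-second rate of each series over a time window rather than to the raw cumulative counters, so that the result reflects recent behavior.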