Managed Service for Prometheus: Container cluster basic metrics

Last Updated: Dec 16, 2024

This topic describes the basic metrics that Managed Service for Prometheus provides for container clusters.

Note
  • Managed Service for Prometheus charges fees based on the amount of observable data written or the number of data reports. Metrics are classified as either basic metrics or custom metrics. Any metric that is not a basic metric is a custom metric. Basic metrics are free of charge. Custom metrics have been billed since January 6, 2020.

  • Managed Service for Prometheus will change the basic metrics provided for container clusters from 00:00:00 on November 12, 2024 (UTC+8). The following tables describe the new basic metrics.

By default, only the metrics listed in this topic are collected as basic metrics for container clusters.

Metrics that fall outside the scope of this topic are considered custom metrics and are subject to charges. For more information, see Billing overview.
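The basic/custom split described above can be sketched as a simple lookup. This is a minimal illustration, not the billing logic of the service: the `BASIC_METRICS` table holds only a small excerpt of the metrics listed in this topic, the helper name `classify_metric` is hypothetical, and matching on the job name together with the metric name is an assumption.

```python
# Illustrative excerpt of the basic metrics in this topic, keyed by job name.
# The real list is much longer; extend it from the tables below as needed.
BASIC_METRICS = {
    "_arms/kubelet/cadvisor": {
        "container_cpu_usage_seconds_total",
        "container_memory_working_set_bytes",
        "DCGM_FI_DEV_GPU_UTIL",
        "up",
    },
    "apiserver": {
        "apiserver_request_total",
        "apiserver_request_duration_seconds_bucket",
    },
}


def classify_metric(job: str, metric: str) -> str:
    """Return "basic" if the metric is listed for the job, otherwise "custom".

    Assumption: a metric counts as basic only when it appears under the
    job it is listed for in this topic.
    """
    return "basic" if metric in BASIC_METRICS.get(job, set()) else "custom"


if __name__ == "__main__":
    print(classify_metric("apiserver", "apiserver_request_total"))
    print(classify_metric("apiserver", "my_app_requests_total"))
```

A check like this can help estimate which series in a scrape configuration are likely to be billed before the configuration is rolled out.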

cAdvisor (job name: _arms/kubelet/cadvisor)

Metric

Description

container_cpu_usage_seconds_total

The total CPU time consumed by the container in seconds.

container_fs_usage_bytes

The number of bytes used by the container file system.

container_memory_cache

The memory cache size of the container in bytes.

container_memory_usage_bytes

The amount of memory used by the container in bytes.

container_memory_working_set_bytes

The memory working set size (WSS) of the container in bytes.

container_network_receive_bytes_total

The total network traffic received by the container in bytes.

container_network_transmit_bytes_total

The total network traffic transmitted by the container in bytes.

container_scrape_error

The number of container metric scraping errors.

DCGM_CUSTOM_CONTAINER_CP_ALLOCATED

The ratio of the GPU computing power allocated to the container to the total computing power of the GPU. The value ranges from 0 to 1. In exclusive GPU mode or in shared GPU mode in which the container requests only GPU memory, the value of this metric is 0, which indicates that the allocation of GPU computing power is unlimited. For example, if a GPU provides a total of 100 compute units (CUs) of GPU computing power and allocates 30 CUs to a container, the ratio of the GPU computing power allocated to the container is calculated by using the following formula: 30/100 = 0.3.

DCGM_CUSTOM_CONTAINER_MEM_ALLOCATED

The amount of GPU memory allocated to the container.

DCGM_CUSTOM_DEV_FB_ALLOCATED

The ratio of the allocated GPU memory to the total memory of the GPU. The value ranges from 0 to 1.

DCGM_CUSTOM_DEV_FB_TOTAL

The total memory of the GPU.

DCGM_CUSTOM_DEV_HEALTH

The health status of the GPU.

DCGM_CUSTOM_PROCESS_DECODE_UTIL

The decoder utilization of GPU threads.

DCGM_CUSTOM_PROCESS_ENCODE_UTIL

The encoder utilization of GPU threads.

DCGM_CUSTOM_PROCESS_MEM_COPY_UTIL

The memory copy utilization of GPU threads.

DCGM_CUSTOM_PROCESS_MEM_USED

The amount of GPU memory used by GPU threads.

DCGM_CUSTOM_PROCESS_SM_UTIL

The streaming multiprocessor (SM) utilization of GPU threads.

DCGM_CUSTOM_PROF_MEM_BANDWIDTH_USED

The GPU memory bandwidth used.

DCGM_CUSTOM_PROF_TENS_TFPS_USED

The tensor core utilization.

DCGM_FI_DEV_DEC_UTIL

The decoder utilization.

DCGM_FI_DEV_ENC_UTIL

The encoder utilization.

DCGM_FI_DEV_FB_FREE

The amount of free frame buffer memory.

DCGM_FI_DEV_FB_USED

The amount of used frame buffer memory. The value of this metric is the same as the value of Memory-Usage returned by the nvidia-smi command.

DCGM_FI_DEV_GPU_TEMP

The GPU temperature.

DCGM_FI_DEV_GPU_UTIL

The GPU utilization within a cycle of 1 second or 1/6 second. The cycle varies based on the GPU model. A cycle is a period of time during which one or more kernel functions remain active. This metric only indicates that one or more kernel functions are occupying GPU resources. The metric does not display detailed GPU usage information.

DCGM_FI_DEV_MEM_CLOCK

The memory clock speed.

DCGM_FI_DEV_MEM_COPY_UTIL

The memory bandwidth utilization. For example, the maximum memory bandwidth of NVIDIA V100 is 900 GB/s. If the memory bandwidth used is 450 GB/s, the memory bandwidth utilization is 50%.

DCGM_FI_DEV_POWER_USAGE

The power usage.

DCGM_FI_DEV_SM_CLOCK

The SM clock speed.

DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION

The total energy consumed since the driver was last loaded.

DCGM_FI_DEV_XID_ERRORS

The last XID error that occurred within a period of time.

DCGM_FI_PROF_DRAM_ACTIVE

The cycle fraction for memory bandwidth utilization when sending data to device memory or receiving data from device memory.

The value is an average value within a time interval rather than an instantaneous value.

A larger value of this metric indicates higher device memory utilization.

If the value is 1 (100%), a DRAM command is executed every cycle within the entire interval. In practice, the peak value is about 0.8 (80%).

If the value of this metric is 0.2 (20%), 20% of the cycles within the time interval are spent reading from or writing to device memory.

DCGM_FI_PROF_NVLINK_RX_BYTES

The TX rate of NVLink and the RX rate of NVLink. The bytes transmitted or received exclude the header.

The value is an average value within a time interval rather than an instantaneous value.

For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum NVLink Gen2 bandwidth is 25 GB/s per direction per link.

DCGM_FI_PROF_NVLINK_TX_BYTES

The total number of bytes sent through NVLink.

DCGM_FI_PROF_PCIE_RX_BYTES

The TX rate of PCIe and the RX rate of PCIe. The bytes transmitted or received include both the header and payload.

The value is an average value within a time interval rather than an instantaneous value.

For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum PCIe Gen3 bandwidth is 985 MB/s per lane.

DCGM_FI_PROF_PCIE_TX_BYTES

The TX rate of PCIe and the RX rate of PCIe. The bytes transmitted or received include both the header and payload.

The value is an average value within a time interval rather than an instantaneous value.

For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum PCIe Gen3 bandwidth is 985 MB/s per lane.

DCGM_FI_PROF_PIPE_TENSOR_ACTIVE

The cycle fraction for the Tensor (HMMA/IMMA) pipe being in the Active state.

The value is an average value within a time interval rather than an instantaneous value.

A larger value of this metric indicates higher tensor core utilization.

If the value is 1 (100%), a Tensor instruction is issued every cycle within the entire interval. One instruction completes in two cycles.

If the value of this metric is 0.2 (20%), one of the following conditions may exist:

The tensor core utilization of 20% of the SMs within the time interval is 100%.

The tensor core utilization of all SMs within the time interval is 20%.

The tensor core utilization of all SMs within 20% of the time interval is 100%.

Other conditions.

DCGM_FI_PROF_SM_ACTIVE

The ratio of cycles during which at least one warp on an SM is active. The value is an average across all SMs and does not vary with the number of warps per thread block.

A warp is considered active once it is scheduled and resources are allocated to it. An active warp may be computing or may be in a non-computing state, such as waiting for a memory request.

A value below 0.5 indicates low GPU utilization. To ensure high GPU utilization, keep the value above 0.8.

For example, assume that a GPU has N SMs:

If a kernel function uses N thread blocks that run on all SMs for the entire time interval, the value is 1 (100%).

If a kernel function uses N/5 thread blocks for the entire time interval, the value is 0.2.

If a kernel function uses N thread blocks but runs for only 20% of the cycles in the time interval, the value is 0.2.

machine_cpu_cores

The number of CPU cores on the machine.

machine_memory_bytes

The machine memory in bytes.

node_exporter_build_info

The build information about the node exporter.

nvidia_gpu_duty_cycle

The percentage of time over the past sample period during which the NVIDIA GPU was occupied.

nvidia_gpu_memory_total_bytes

The total memory of the NVIDIA GPU in bytes.

nvidia_gpu_memory_used_bytes

The memory used by the NVIDIA GPU in bytes.

nvidia_gpu_num_devices

The number of NVIDIA GPUs.

nvidia_gpu_power_usage_milliwatts

The power consumption of the NVIDIA GPU in milliwatts.

nvidia_gpu_temperature_celsius

The temperature of the NVIDIA GPU in °C.

rdma_service_monitor_local_ack_timeout_err

The number of timeout errors that occurred in the remote direct memory access (RDMA) network.

rdma_service_monitor_out_of_seq

The number of out-of-order packets in the RDMA network.

rdma_service_monitor_packet_seq_err

The number of out-of-order packet errors in the RDMA network.

rdma_service_monitor_rx_bytes

The throughput received over the RDMA network in bytes.

rdma_service_monitor_rx_packets

The number of packets received over the RDMA network.

rdma_service_monitor_tx_bytes

The throughput sent over the RDMA network in bytes.

rdma_service_monitor_tx_packets

The number of packets sent over the RDMA network.

up

Whether the scrape target is reachable. The value is 1 if the most recent scrape succeeded and 0 if it failed.
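The worked numbers in the GPU metric descriptions above can be reproduced with a few lines of arithmetic. The following is a minimal sketch: the helper names `ratio`, `average_rate`, and `sm_active` are hypothetical, a decimal gigabyte (10^9 bytes) is assumed, and the SM model (share of busy SMs multiplied by share of busy cycles) is a simplification of the DCGM_FI_PROF_SM_ACTIVE examples.

```python
def ratio(part: float, whole: float) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


def average_rate(total_bytes: float, interval_seconds: float) -> float:
    """Average throughput over an interval, in bytes per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return total_bytes / interval_seconds


def sm_active(busy_sms: int, total_sms: int, busy_cycle_fraction: float) -> float:
    """Simplified model: share of busy SMs times share of cycles they are busy."""
    return ratio(busy_sms, total_sms) * busy_cycle_fraction


GB = 10**9  # assumption: decimal gigabyte

# DCGM_CUSTOM_CONTAINER_CP_ALLOCATED: 30 of 100 compute units allocated.
cp_allocated = ratio(30, 100)  # 0.3

# DCGM_FI_DEV_MEM_COPY_UTIL: 450 GB/s used out of a 900 GB/s maximum.
mem_copy_util_percent = ratio(450, 900) * 100  # 50.0

# DCGM_FI_PROF_*_BYTES: 1 GB moved within 1 second, steadily or in one burst.
steady = average_rate(sum([GB // 10] * 10), 1.0)
bursty = average_rate(sum([GB] + [0] * 9), 1.0)

# DCGM_FI_PROF_SM_ACTIVE with N = 80 SMs (N is an arbitrary example value).
n = 80
one_fifth_of_sms = sm_active(n // 5, n, 1.0)
all_sms_fifth_of_cycles = sm_active(n, n, 0.2)

if __name__ == "__main__":
    print(cp_allocated, mem_copy_util_percent)
    print(steady == bursty)
    print(one_fifth_of_sms, all_sms_fifth_of_cycles)
```

Both transfer patterns yield the same average rate because the profiling metrics report an average over the interval rather than an instantaneous value.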

ACK control plane APIServer (job name: apiserver)

Control plane components for ACK Pro clusters: APIServer, etcd, Scheduler, Kube Controller Manager, and Cloud Controller Manager. Control plane component for ACK dedicated clusters: APIServer.

Metric

Description

aggregator_discovery_aggregation_count_total

The count of discovery aggregations performed by the aggregator.

aggregator_openapi_v2_regeneration_count

The number of regenerations based on OpenAPI 2.0.

aggregator_openapi_v2_regeneration_duration

The amount of time consumed for regenerations based on OpenAPI 2.0.

aggregator_unavailable_apiservice

The APIServices that are unavailable to the aggregator.

aggregator_unavailable_apiservice_count

The count of APIServices that are unavailable to the aggregator.

aggregator_unavailable_apiservice_total

The total number of APIServices that are unavailable to the aggregator.

aliyun_prometheus_agent_append_duration_seconds

The time that the Prometheus agent spends appending scraped samples, in seconds.

aliyun_prometheus_agent_job_discovery_status

The job status that is discovered by the Prometheus agent.

aliyun_prometheus_agent_scrapes_by_target_total

The total number of target scrapes performed by the Prometheus agent.

aliyun_prometheus_agent_target_info

The information about targets scraped by the Prometheus agent.

apiextensions_apiserver_validation_ratcheting_seconds_bucket

The distribution of incremental time intervals for validation in seconds in the APIServer.

apiextensions_apiserver_validation_ratcheting_seconds_count

The count of incremental time intervals for validation in seconds in the APIServer.

apiextensions_apiserver_validation_ratcheting_seconds_sum

The sum of incremental time intervals for validation in seconds in the APIServer.

apiextensions_openapi_v2_regeneration_count

The number of API extension regenerations based on OpenAPI 2.0.

apiextensions_openapi_v3_regeneration_count

The number of API extension regenerations based on OpenAPI 3.0.

apiserver_accepted_listall_requests_total

The total number of ListAll requests accepted by the APIServer.

apiserver_admission_controller_admission_duration_seconds_bucket

The distribution of APIServer admission controller durations in seconds.

apiserver_admission_controller_admission_duration_seconds_count

The count of APIServer admission controller durations in seconds.

apiserver_admission_controller_admission_duration_seconds_sum

The sum of APIServer admission controller durations in seconds.

apiserver_admission_step_admission_duration_seconds_bucket

The distribution of APIServer admission step durations in seconds.

apiserver_admission_step_admission_duration_seconds_count

The count of APIServer admission step durations in seconds.

apiserver_admission_step_admission_duration_seconds_sum

The sum of APIServer admission step durations in seconds.

apiserver_admission_step_admission_duration_seconds_summary

The summary of APIServer admission step durations in seconds.

apiserver_admission_step_admission_duration_seconds_summary_count

The summary count of APIServer admission step durations in seconds.

apiserver_admission_step_admission_duration_seconds_summary_sum

The summary total of APIServer admission step durations in seconds.

apiserver_admission_webhook_admission_duration_seconds_bucket

The distribution of APIServer admission webhook durations in seconds.

apiserver_admission_webhook_admission_duration_seconds_count

The count of APIServer admission webhook durations in seconds.

apiserver_admission_webhook_admission_duration_seconds_sum

The sum of APIServer admission webhook durations in seconds.

apiserver_admission_webhook_fail_open_count

The count of times that the APIServer admission webhook is configured as fail open.

apiserver_admission_webhook_rejection_count

The count of requests rejected by the APIServer admission webhook.

apiserver_admission_webhook_request_total

The total number of requests to the APIServer admission webhook.

apiserver_audit_error_total

The total number of APIServer audit errors.

apiserver_audit_event_total

The total number of APIServer audit events.

apiserver_audit_level_total

The total number of APIServer audit levels.

apiserver_audit_requests_rejected_total

The total number of APIServer requests rejected because of an error in the audit logging backend.

apiserver_authorization_decisions_total

The total number of authorization decisions made by the APIServer.

apiserver_cache_list_fetched_objects_total

The total number of objects obtained by the APIServer cache list.

apiserver_cache_list_returned_objects_total

The total number of objects returned by the APIServer cache list.

apiserver_cache_list_total

The total number of operations performed by the APIServer cache list.

apiserver_cacher_received_events

The number of events received by the APIServer cache.

apiserver_cacher_sended_events_latency_milliseconds_bucket

The distribution of APIServer event sending latencies in milliseconds.

apiserver_cacher_sended_events_latency_milliseconds_count

The count of APIServer event sending latencies in milliseconds.

apiserver_cacher_sended_events_latency_milliseconds_sum

The total of APIServer event sending latencies in milliseconds.

apiserver_cacher_watcher_channel_length

The watcher channel length of the APIServer cache.

apiserver_cel_compilation_duration_seconds_bucket

The distribution of APIServer Common Expression Language (CEL) compilation latencies in seconds.

apiserver_cel_compilation_duration_seconds_count

The count of APIServer CEL compilations.

apiserver_cel_compilation_duration_seconds_sum

The total time consumed for APIServer CEL compilations in seconds.

apiserver_cel_evaluation_duration_seconds_bucket

The distribution of APIServer CEL evaluation latencies in seconds.

apiserver_cel_evaluation_duration_seconds_count

The count of APIServer CEL evaluations.

apiserver_cel_evaluation_duration_seconds_sum

The total of APIServer CEL evaluation latencies in seconds.

apiserver_client_certificate_expiration_seconds_bucket

The distribution of remaining seconds until APIServer client certificate expiration.

apiserver_client_certificate_expiration_seconds_count

The count of remaining seconds until APIServer client certificate expiration.

apiserver_client_certificate_expiration_seconds_sum

The total remaining seconds until APIServer client certificate expiration.

apiserver_clusterip_repair_ip_errors_total

The total number of ClusterIP errors fixed by the APIServer.

apiserver_clusterip_repair_reconcile_errors_total

The total number of ClusterIP reconcile errors fixed by the APIServer.

apiserver_conversion_webhook_duration_seconds_bucket

The distribution of APIServer conversion webhook latencies in seconds.

apiserver_conversion_webhook_duration_seconds_count

The count of APIServer conversion webhook calls.

apiserver_conversion_webhook_duration_seconds_sum

The total of APIServer conversion webhook latencies in seconds.

apiserver_conversion_webhook_request_total

The total number of APIServer conversion webhook requests.

apiserver_crd_conversion_webhook_duration_seconds_bucket

The distribution of APIServer Custom Resource Definition (CRD) conversion webhook latencies in seconds.

apiserver_crd_conversion_webhook_duration_seconds_count

The count of APIServer CRD conversion webhook calls.

apiserver_crd_conversion_webhook_duration_seconds_sum

The total of APIServer CRD conversion webhook latencies in seconds.

apiserver_crd_webhook_conversion_duration_seconds_bucket

The distribution of APIServer CRD webhook conversion latencies in seconds.

apiserver_crd_webhook_conversion_duration_seconds_count

The count of APIServer CRD webhook conversions.

apiserver_crd_webhook_conversion_duration_seconds_sum

The total of APIServer CRD webhook conversion latencies in seconds.

apiserver_created_watchers

The number of watchers created by the APIServer.

apiserver_current_inflight_requests

The number of requests that are being processed by the APIServer.

apiserver_current_inqueue_requests

The maximum number of queued requests in the APIServer.

apiserver_dropped_requests_total

The total number of requests dropped by the APIServer.

apiserver_encryption_config_controller_automatic_reload_failures_total

The number of times that the encryption configuration controller of the APIServer failed to be automatically reloaded.

apiserver_encryption_config_controller_automatic_reload_success_total

The number of times that the encryption configuration controller of the APIServer was automatically reloaded.

apiserver_envelope_encryption_dek_cache_fill_percent

The percentage of APIServer envelope encryption Data Encryption Key (DEK) cache filled.

apiserver_error_watchers

The number of watchers in the Error state in the APIServer.

apiserver_flowcontrol_current_executing_requests

The number of requests being processed by APIServer rate limiting.

apiserver_flowcontrol_current_executing_seats

The number of seats occupied by APIServer rate limiting.

apiserver_flowcontrol_current_inqueue_requests

The number of requests pending in queues in the API Priority and Fairness (APF) system.

apiserver_flowcontrol_current_inqueue_seats

The number of seats pending in APIServer rate limiting queues.

apiserver_flowcontrol_current_limit_seats

The number of seats limited by APIServer rate limiting.

apiserver_flowcontrol_current_r

The current R value of APIServer rate limiting.

apiserver_flowcontrol_demand_seats_average

The average number of seats requested by APIServer rate limiting.

apiserver_flowcontrol_demand_seats_bucket

The distribution of seats requested by APIServer rate limiting.

apiserver_flowcontrol_demand_seats_count

The count of seats requested by APIServer rate limiting.

apiserver_flowcontrol_demand_seats_high_watermark

The high watermark of seats requested by APIServer rate limiting.

apiserver_flowcontrol_demand_seats_smoothed

The smoothed value of seats requested by APIServer rate limiting.

apiserver_flowcontrol_demand_seats_stdev

The standard deviation of seats requested by APIServer rate limiting.

apiserver_flowcontrol_demand_seats_sum

The sum of seats requested by APIServer rate limiting.

apiserver_flowcontrol_dispatch_r

The scheduling R value of APIServer rate limiting.

apiserver_flowcontrol_dispatched_requests_total

The total number of requests scheduled by APIServer rate limiting.

apiserver_flowcontrol_latest_s

The recent S value bounds of APIServer rate limiting.

apiserver_flowcontrol_lower_limit_seats

The lower bound of seats in APIServer rate limiting.

apiserver_flowcontrol_next_discounted_s_bounds

The next discounted S value bounds of APIServer rate limiting.

apiserver_flowcontrol_next_s_bounds

The next S value bounds of APIServer rate limiting.

apiserver_flowcontrol_nominal_limit_seats

The nominal upper bound of seats in APIServer rate limiting.

apiserver_flowcontrol_priority_level_request_count_samples_bucket

The distribution of priority level request samples in APIServer rate limiting.

apiserver_flowcontrol_priority_level_request_count_samples_count

The count of priority level request samples in APIServer rate limiting.

apiserver_flowcontrol_priority_level_request_count_samples_sum

The sum of priority level request samples in APIServer rate limiting.

apiserver_flowcontrol_priority_level_request_count_watermarks_bucket

The distribution of watermark levels for priority level request samples in APIServer rate limiting.

apiserver_flowcontrol_priority_level_request_count_watermarks_count

The count of watermark levels for priority level request samples in APIServer rate limiting.

apiserver_flowcontrol_priority_level_request_count_watermarks_sum

The sum of watermark levels for priority level request samples in APIServer rate limiting.

apiserver_flowcontrol_priority_level_request_utilization_bucket

The distribution of request utilization samples by priority level in APIServer rate limiting.

apiserver_flowcontrol_priority_level_request_utilization_count

The count of request utilization samples by priority level in APIServer rate limiting.

apiserver_flowcontrol_priority_level_request_utilization_sum

The sum of request utilization by priority level in APIServer rate limiting.

apiserver_flowcontrol_priority_level_seat_count_samples_bucket

The distribution of seat samples for priority level in APIServer rate limiting.

apiserver_flowcontrol_priority_level_seat_count_samples_count

The count of seat samples for priority level in APIServer rate limiting.

apiserver_flowcontrol_priority_level_seat_count_samples_sum

The sum of seat samples for priority level in APIServer rate limiting.

apiserver_flowcontrol_priority_level_seat_count_watermarks_bucket

The distribution of watermark levels for seat samples in APIServer rate limiting by priority level.

apiserver_flowcontrol_priority_level_seat_count_watermarks_count

The count of watermark levels for seat samples in APIServer rate limiting by priority level.

apiserver_flowcontrol_priority_level_seat_count_watermarks_sum

The sum of watermark levels for seat samples in APIServer rate limiting by priority level.

apiserver_flowcontrol_priority_level_seat_utilization_bucket

The distribution of seat utilization samples by priority level in APIServer rate limiting.

apiserver_flowcontrol_priority_level_seat_utilization_count

The count of seat utilization samples by priority level in APIServer rate limiting.

apiserver_flowcontrol_priority_level_seat_utilization_sum

The sum of seat utilization by priority level in APIServer rate limiting.

apiserver_flowcontrol_read_vs_write_current_requests_bucket

The distribution of current read/write requests in APIServer rate limiting.

apiserver_flowcontrol_read_vs_write_current_requests_count

The count of current read/write requests in APIServer rate limiting.

apiserver_flowcontrol_read_vs_write_current_requests_sum

The sum of current read/write requests in APIServer rate limiting.

apiserver_flowcontrol_read_vs_write_request_count_samples_bucket

The distribution of read/write request count samples in APIServer rate limiting.

apiserver_flowcontrol_read_vs_write_request_count_samples_count

The count of read/write request count samples in APIServer rate limiting.

apiserver_flowcontrol_read_vs_write_request_count_samples_sum

The sum of read/write request count samples in APIServer rate limiting.

apiserver_flowcontrol_read_vs_write_request_count_watermarks_bucket

The distribution of read/write request count watermarks in APIServer rate limiting.

apiserver_flowcontrol_read_vs_write_request_count_watermarks_count

The count of read/write request count watermarks in APIServer rate limiting.

apiserver_flowcontrol_read_vs_write_request_count_watermarks_sum

The sum of read/write request count watermarks in APIServer rate limiting.

apiserver_flowcontrol_rejected_requests_total

The total number of requests rejected by APIServer rate limiting.

apiserver_flowcontrol_request_concurrency_in_use

The count of concurrent requests in APIServer rate limiting.

apiserver_flowcontrol_request_concurrency_limit

The concurrent request limit in APIServer rate limiting.

apiserver_flowcontrol_request_dispatch_no_accommodation_total

The total number of requests that could not be accommodated by the scheduling of APIServer rate limiting.

apiserver_flowcontrol_request_execution_seconds_bucket

The distribution of request latencies in seconds in APIServer rate limiting.

apiserver_flowcontrol_request_execution_seconds_count

The count of request latencies in seconds in APIServer rate limiting.

apiserver_flowcontrol_request_execution_seconds_sum

The sum of request latencies in seconds in APIServer rate limiting.

apiserver_flowcontrol_request_queue_length_after_enqueue_bucket

The distribution of request queue lengths after enqueuing in APIServer rate limiting.

apiserver_flowcontrol_request_queue_length_after_enqueue_count

The count of request queue lengths after enqueuing in APIServer rate limiting.

apiserver_flowcontrol_request_queue_length_after_enqueue_sum

The sum of request queue lengths after enqueuing in APIServer rate limiting.

apiserver_flowcontrol_request_wait_duration_seconds_bucket

The distribution of request waiting durations in seconds in APIServer rate limiting.

apiserver_flowcontrol_request_wait_duration_seconds_count

The count of request waiting durations in seconds in APIServer rate limiting.

apiserver_flowcontrol_request_wait_duration_seconds_sum

The sum of request waiting durations in seconds in APIServer rate limiting.

apiserver_flowcontrol_seat_fair_frac

The fair share ratios determined by the APIServer during the last borrowing adjustment period.

apiserver_flowcontrol_target_seats

The target number of seats in APIServer rate limiting.

apiserver_flowcontrol_upper_limit_seats

The upper bound of seats in APIServer rate limiting.

apiserver_flowcontrol_watch_count_samples_bucket

The distribution of observed samples in APIServer rate limiting.

apiserver_flowcontrol_watch_count_samples_count

The count of observed samples in APIServer rate limiting.

apiserver_flowcontrol_watch_count_samples_sum

The sum of observed samples in APIServer rate limiting.

apiserver_flowcontrol_work_estimated_seats_bucket

The distribution of estimated seats in APIServer rate limiting.

apiserver_flowcontrol_work_estimated_seats_count

The count of estimated seats in APIServer rate limiting.

apiserver_flowcontrol_work_estimated_seats_sum

The sum of estimated seats in APIServer rate limiting.

apiserver_init_events_total

The total number of initialization events in the APIServer.

apiserver_kube_aggregator_x509_insecure_sha1_total

The number of requests using insecure Secure Hash Algorithm 1 (SHA1) signatures.

apiserver_kube_aggregator_x509_missing_san_total

The total number of x509 certificates missing Subject Alternative Names (SANs) in APIServer kube-aggregator.

apiserver_longrunning_gauge

The gauge of long-running requests in the APIServer.

apiserver_longrunning_requests

The long-running requests in the APIServer.

apiserver_nodeport_repair_reconcile_errors_total

The total number of reconcile errors in the NodePort repair loop of the APIServer.

apiserver_realtime_watchers

The number of real-time watchers in the APIServer.

apiserver_registered_watchers

The number of registered watchers in the APIServer.

apiserver_request_aborts_total

The total number of APIServer requests that were aborted.

apiserver_request_body_size_bytes_bucket

The distribution of APIServer request body sizes in bytes.

apiserver_request_body_size_bytes_count

The count of APIServer request body sizes in bytes.

apiserver_request_body_size_bytes_sum

The sum of APIServer request body sizes in bytes.

apiserver_request_count

The number of APIServer requests.

apiserver_request_duration_seconds_bucket

The distribution of APIServer request latencies in seconds.

apiserver_request_duration_seconds_count

The count of APIServer request latencies in seconds.

apiserver_request_duration_seconds_sum

The sum of APIServer request latencies in seconds.

apiserver_request_filter_duration_seconds_bucket

The distribution of request filter latencies in seconds.

apiserver_request_filter_duration_seconds_count

The count of request filter latencies in seconds.

apiserver_request_filter_duration_seconds_sum

The sum of request filter latencies in seconds.

apiserver_request_latencies_summary

The summary of APIServer request latencies.

apiserver_request_no_resourceversion_list_total

The total number of LIST requests that do not specify a resource version.

apiserver_request_post_timeout_total

The total number of timed out POST requests.

apiserver_request_sli_duration_seconds_bucket

The distribution of Service Level Indicator (SLI) request latencies in seconds.

apiserver_request_sli_duration_seconds_count

The count of SLI request latencies in seconds.

apiserver_request_sli_duration_seconds_sum

The sum of SLI request latencies in seconds.

apiserver_request_slo_duration_seconds_bucket

The distribution of Service Level Objective (SLO) request latencies in seconds.

apiserver_request_slo_duration_seconds_count

The count of SLO request latencies in seconds.

apiserver_request_slo_duration_seconds_sum

The sum of SLO request latencies in seconds.

apiserver_request_terminations_total

The total number of terminated API requests.

apiserver_request_timestamp_comparison_time_bucket

The distribution of time spent in timestamp comparison of API requests.

apiserver_request_timestamp_comparison_time_count

The count of API request samples for timestamp comparison.

apiserver_request_timestamp_comparison_time_sum

The sum of time spent in timestamp comparison of API requests.

apiserver_request_total

The total number of API requests.

apiserver_requested_deprecated_apis

The count of APIServer requests for deprecated APIs.

apiserver_response_sizes_bucket

The distribution of response body sizes of API requests.

apiserver_response_sizes_count

The count of response body sizes of API requests.

apiserver_response_sizes_sum

The sum of response body sizes of API requests.

apiserver_selfrequest_total

The total number of APIServer self-requests.

apiserver_storage_data_key_generation_duration_seconds_bucket

The distribution of time consumed by the APIServer to generate data keys in seconds.

apiserver_storage_data_key_generation_duration_seconds_count

The count of time consumed by the APIServer to generate data keys in seconds.

apiserver_storage_data_key_generation_duration_seconds_sum

The sum of time consumed by the APIServer to generate data keys in seconds.

apiserver_storage_data_key_generation_failures_total

The total number of data key generation failures.

apiserver_storage_db_total_size_in_bytes

The total size of APIServer databases in bytes.

apiserver_storage_decode_errors_total

The total number of decoding errors in the APIServer.

apiserver_storage_envelope_transformation_cache_misses_total

The total number of envelope transformation cache misses in the APIServer.

apiserver_storage_events_received_total

The total number of events received by the APIServer.

apiserver_storage_list_evaluated_objects_total

The total number of evaluated objects in the APIServer storage list.

apiserver_storage_list_fetched_objects_total

The total number of objects obtained by the APIServer storage list.

apiserver_storage_list_returned_objects_total

The total number of objects returned by the APIServer storage list.

apiserver_storage_list_total

The total number of operations performed by the APIServer storage list.

apiserver_storage_objects

The number of objects stored in the APIServer.

apiserver_storage_size_bytes

The size of the storage database file physically allocated for the APIServer in bytes.

apiserver_terminated_watchers_total

The total number of watchers terminated by the APIServer.

apiserver_tls_handshake_errors_total

The total number of requests with Transport Layer Security (TLS) handshake errors in the APIServer.

apiserver_too_large_resourceversion_errors

The total number of errors returned by the APIServer because the requested resource version is too large.

apiserver_watch_cache_events_dispatched_total

The total number of events dispatched by the APIServer watch cache.

apiserver_watch_cache_events_received_total

The total number of events received by the APIServer watch cache.

apiserver_watch_cache_initializations_total

The total number of cache initializations observed by the APIServer.

apiserver_watch_cache_read_wait_seconds_bucket

The distribution of cache read waiting durations in seconds observed by the APIServer.

apiserver_watch_cache_read_wait_seconds_count

The count of cache read waiting durations in seconds observed by the APIServer.

apiserver_watch_cache_read_wait_seconds_sum

The sum of cache read waiting durations in seconds observed by the APIServer.

apiserver_watch_cache_watch_cache_initializations_total

The total number of cache initializations observed by the APIServer.

apiserver_watch_events_sizes_bucket

The distribution of sizes of events observed by the APIServer.

apiserver_watch_events_sizes_count

The count of sizes of events observed by the APIServer.

apiserver_watch_events_sizes_sum

The sum of sizes of events observed by the APIServer.

apiserver_watch_events_total

The total number of events observed by the APIServer.

apiserver_webhooks_x509_insecure_sha1_total

The number of requests using insecure SHA1 signatures.

apiserver_webhooks_x509_missing_san_total

The total number of APIServer webhook requests to servers whose serving certificates lack the subject alternative name (SAN) extension.

authenticated_user_requests

The total number of authenticated user requests.

authentication_attempts

The number of authentication attempts.

authentication_duration_seconds_bucket

The distribution of authentication durations in seconds.

authentication_duration_seconds_count

The count of authentication durations in seconds.

authentication_duration_seconds_sum

The sum of authentication durations in seconds.

authentication_token_cache_active_fetch_count

The count of active fetches for the authentication token cache.

authentication_token_cache_fetch_total

The total number of times the authentication token was retrieved from the cache.

authentication_token_cache_request_duration_seconds_bucket

The distribution of request durations in seconds for authentication token cache.

authentication_token_cache_request_duration_seconds_count

The count of request durations in seconds for authentication token cache.

authentication_token_cache_request_duration_seconds_sum

The sum of request durations in seconds for authentication token cache.

authentication_token_cache_request_total

The total number of requests for authentication token cache.

authorization_attempts_total

The total number of authorization attempts.

authorization_duration_seconds_bucket

The distribution of authorization durations in seconds.

authorization_duration_seconds_count

The count of authorization durations in seconds.

authorization_duration_seconds_sum

The sum of authorization durations in seconds.

cardinality_enforcement_unexpected_categorizations_total

The total number of unexpected categorizations during cardinality enforcement.

count

The count details.

cpu_utilization_core

The CPU utilization of the core.

disabled_metric_total

The total number of disabled metrics.

disabled_metrics_total

The total number of disabled metrics.

etcd_bookmark_counts

The number of ETCD bookmarks.

etcd_db_total_size_in_bytes

The total size of ETCD databases in bytes.

etcd_lease_object_counts_bucket

The distribution of objects attached to a single ETCD lease.

etcd_lease_object_counts_count

The count of objects attached to a single ETCD lease.

etcd_lease_object_counts_sum

The sum of objects attached to a single ETCD lease.

etcd_object_counts

The number of ETCD objects.

etcd_request_duration_seconds_bucket

The distribution of ETCD request latencies in seconds.

etcd_request_duration_seconds_count

The count of ETCD request latencies in seconds.

etcd_request_duration_seconds_sum

The sum of ETCD request latencies in seconds.

etcd_request_errors_total

The total number of failed ETCD requests.

etcd_requests_total

The total number of ETCD requests.

etcd_watcher_channel_length

The channel length of the ETCD watcher.

etcd_watcher_received_events

The number of events received by the ETCD watcher.

etcd_watcher_sended_events_latency_milliseconds_bucket

The distribution of event sending latencies of the ETCD watcher in milliseconds.

etcd_watcher_sended_events_latency_milliseconds_count

The count of event sending latencies of the ETCD watcher in milliseconds.

etcd_watcher_sended_events_latency_milliseconds_sum

The sum of event sending latencies of the ETCD watcher in milliseconds.

field_validation_request_duration_seconds_bucket

The distribution of field validation request latencies in seconds.

field_validation_request_duration_seconds_count

The count of field validation request latencies in seconds.

field_validation_request_duration_seconds_sum

The sum of field validation request latencies in seconds.

get_token_count

The number of obtained tokens.

get_token_fail_count

The number of token obtaining failures.

go_cgo_go_to_c_calls_calls_total

The total number of C function calls made by cgo.

go_cpu_classes_gc_mark_assist_cpu_seconds_total

The total CPU seconds spent on garbage collection (GC) mark assistance by Go.

go_cpu_classes_gc_mark_dedicated_cpu_seconds_total

The total CPU seconds spent on dedicated GC marking by Go.

go_cpu_classes_gc_mark_idle_cpu_seconds_total

The total CPU seconds spent on idle GC marking by Go.

go_cpu_classes_gc_pause_cpu_seconds_total

The total CPU seconds spent on GC pauses by Go.

go_cpu_classes_gc_total_cpu_seconds_total

The total CPU seconds spent on GC by Go.

go_cpu_classes_idle_cpu_seconds_total

The total CPU idle time in Go.

go_cpu_classes_scavenge_assist_cpu_seconds_total

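The total CPU seconds that Go spent eagerly returning unused memory to the operating system in response to memory pressure.

go_cpu_classes_scavenge_background_cpu_seconds_total

The total CPU seconds that Go spent on background tasks that return unused memory to the operating system.

go_cpu_classes_scavenge_total_cpu_seconds_total

The total CPU seconds that Go spent returning unused memory to the operating system.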

go_cpu_classes_total_cpu_seconds_total

The total CPU seconds available to user Go code and the Go runtime.

go_cpu_classes_user_cpu_seconds_total

The total CPU seconds spent on running user Go code.

go_gc_cycles_automatic_gc_cycles_total

The total number of automatic GC cycles.

go_gc_cycles_forced_gc_cycles_total

The total number of forced GC cycles.

go_gc_cycles_total_gc_cycles_total

The total number of GC cycles.

go_gc_duration_seconds

The GC pause time in seconds.

go_gc_duration_seconds_count

The count of GC pause time in seconds.

go_gc_duration_seconds_sum

The sum of GC pause time in seconds.

go_gc_gogc_percent

The Go GC target percentage, as configured by the GOGC setting.

go_gc_gomemlimit_bytes

The GC memory limit in bytes.

go_gc_heap_allocs_by_size_bytes_bucket

The distribution of allocated heap memory sizes in bytes.

go_gc_heap_allocs_by_size_bytes_count

The count of allocated heap memory sizes in bytes.

go_gc_heap_allocs_by_size_bytes_sum

The sum of allocated heap memory sizes in bytes.

go_gc_heap_allocs_by_size_bytes_total_bucket

The distribution of all allocated heap memory sizes in bytes.

go_gc_heap_allocs_by_size_bytes_total_count

The count of all allocated heap memory sizes in bytes.

go_gc_heap_allocs_by_size_bytes_total_sum

The sum of all allocated heap memory sizes in bytes.

go_gc_heap_allocs_bytes_total

The total number of bytes allocated on the heap.

go_gc_heap_allocs_objects_total

The total number of objects allocated on the heap.

go_gc_heap_frees_by_size_bytes_bucket

The distribution of released heap memory sizes in bytes.

go_gc_heap_frees_by_size_bytes_count

The count of released heap memory sizes in bytes.

go_gc_heap_frees_by_size_bytes_sum

The sum of released heap memory sizes in bytes.

go_gc_heap_frees_by_size_bytes_total_bucket

The distribution of all released heap memory sizes in bytes.

go_gc_heap_frees_by_size_bytes_total_count

The count of all released heap memory sizes in bytes.

go_gc_heap_frees_by_size_bytes_total_sum

The sum of all released heap memory sizes in bytes.

go_gc_heap_frees_bytes_total

The total number of bytes released from the heap.

go_gc_heap_frees_objects_total

The total number of objects released from the heap.

go_gc_heap_goal_bytes

The target heap size for the end of the current GC cycle in bytes.

go_gc_heap_live_bytes

The heap memory occupied by live objects in bytes.

go_gc_heap_objects_objects

The number of objects that occupy the heap memory.

go_gc_heap_tiny_allocs_objects_total

The total number of tiny object allocations.

go_gc_limiter_last_enabled_gc_cycle

The last GC cycle in which the GC CPU limiter was enabled.

go_gc_pauses_seconds_bucket

The distribution of GC pause durations.

go_gc_pauses_seconds_count

The count of GC pause durations.

go_gc_pauses_seconds_sum

The sum of GC pause durations.

go_gc_pauses_seconds_total_bucket

The distribution of all GC pause durations.

go_gc_pauses_seconds_total_count

The count of all GC pause durations.

go_gc_pauses_seconds_total_sum

The sum of all GC pause durations.

go_gc_scan_globals_bytes

The number of bytes scanned in global variables.

go_gc_scan_heap_bytes

The number of bytes scanned in the heap.

go_gc_scan_stack_bytes

The number of bytes scanned in the stack.

go_gc_scan_total_bytes

The total number of scanned bytes.

go_gc_stack_starting_size_bytes

The initial stack size in bytes.

go_godebug_non_default_behavior_execerrdot_events_total

The count of non-default behavior debug events related to the execerrdot debug setting.

go_godebug_non_default_behavior_gocachehash_events_total

The count of non-default behavior debug events related to the gocachehash debug setting.

go_godebug_non_default_behavior_gocachetest_events_total

The count of non-default behavior debug events related to the gocachetest debug setting.

go_godebug_non_default_behavior_gocacheverify_events_total

The count of non-default behavior debug events related to the gocacheverify debug setting.

go_godebug_non_default_behavior_gotypesalias_events_total

The count of non-default behavior debug events related to the gotypesalias debug setting.

go_godebug_non_default_behavior_http2client_events_total

The count of non-default behavior debug events related to the http2client debug setting.

go_godebug_non_default_behavior_http2server_events_total

The count of non-default behavior debug events related to the http2server debug setting.

go_godebug_non_default_behavior_httplaxcontentlength_events_total

The count of non-default behavior debug events related to the httplaxcontentlength debug setting.

go_godebug_non_default_behavior_httpmuxgo121_events_total

The count of non-default behavior debug events related to the httpmuxgo121 debug setting.

go_godebug_non_default_behavior_installgoroot_events_total

The count of non-default behavior debug events related to the installgoroot debug setting.

go_godebug_non_default_behavior_jstmpllitinterp_events_total

The count of non-default behavior debug events related to the jstmpllitinterp debug setting.

go_godebug_non_default_behavior_multipartmaxheaders_events_total

The count of non-default behavior debug events related to the multipartmaxheaders debug setting.

go_godebug_non_default_behavior_multipartmaxparts_events_total

The count of non-default behavior debug events related to the multipartmaxparts debug setting.

go_godebug_non_default_behavior_multipathtcp_events_total

The count of non-default behavior debug events related to the multipathtcp debug setting.

go_godebug_non_default_behavior_panicnil_events_total

The count of non-default behavior debug events related to the panicnil debug setting.

go_godebug_non_default_behavior_randautoseed_events_total

The count of non-default behavior debug events related to the randautoseed debug setting.

go_godebug_non_default_behavior_tarinsecurepath_events_total

The count of non-default behavior debug events related to the tarinsecurepath debug setting.

go_godebug_non_default_behavior_tls10server_events_total

The count of non-default behavior debug events related to the tls10server debug setting.

go_godebug_non_default_behavior_tlsmaxrsasize_events_total

The count of non-default behavior debug events related to the tlsmaxrsasize debug setting.

go_godebug_non_default_behavior_tlsrsakex_events_total

The count of non-default behavior debug events related to the tlsrsakex debug setting.

go_godebug_non_default_behavior_tlsunsafeekm_events_total

The count of non-default behavior debug events related to the tlsunsafeekm debug setting.

go_godebug_non_default_behavior_x509sha1_events_total

The count of non-default behavior debug events related to the x509sha1 debug setting.

go_godebug_non_default_behavior_x509usefallbackroots_events_total

The count of non-default behavior debug events related to the x509usefallbackroots debug setting.

go_godebug_non_default_behavior_x509usepolicies_events_total

The count of non-default behavior debug events related to the x509usepolicies debug setting.

go_godebug_non_default_behavior_zipinsecurepath_events_total

The count of non-default behavior debug events related to the zipinsecurepath debug setting.

go_goroutines

The number of goroutines.

go_info

The Go environment information, such as the Go version.

go_memory_classes_heap_free_bytes

The amount of idle heap memory in bytes.

go_memory_classes_heap_objects_bytes

The amount of heap memory occupied by objects in bytes.

go_memory_classes_heap_released_bytes

The amount of heap memory released in bytes.

go_memory_classes_heap_stacks_bytes

The amount of memory reserved for the stack in bytes.

go_memory_classes_heap_unused_bytes

The amount of heap memory not used in bytes.

go_memory_classes_metadata_mcache_free_bytes

The amount of idle memory in mcache in bytes.

go_memory_classes_metadata_mcache_inuse_bytes

The amount of memory in use in mcache in bytes.

go_memory_classes_metadata_mspan_free_bytes

The amount of idle memory in mspan in bytes.

go_memory_classes_metadata_mspan_inuse_bytes

The amount of memory in use in mspan in bytes.

go_memory_classes_metadata_other_bytes

The amount of memory occupied by other metadata in bytes.

go_memory_classes_os_stacks_bytes

The amount of memory reserved for the operating system stack in bytes.

go_memory_classes_other_bytes

The amount of memory used for other purposes in bytes.

go_memory_classes_profiling_buckets_bytes

The bytes used by profiling buckets.

go_memory_classes_total_bytes

The total memory in bytes.

go_memstats_alloc_bytes

The amount of memory allocated in bytes.

go_memstats_alloc_bytes_total

The cumulative amount of memory allocated in bytes.

go_memstats_buck_hash_sys_bytes

The amount of memory used by the profiling bucket hash table in bytes.

go_memstats_frees_total

The total number of heap objects freed.

go_memstats_gc_cpu_fraction

The fraction of the available CPU time used by GC since the program started. The value ranges from 0 to 1.

go_memstats_gc_sys_bytes

The amount of memory used for GC system metadata in bytes.

go_memstats_heap_alloc_bytes

The amount of heap memory allocated in bytes.

go_memstats_heap_idle_bytes

The amount of idle heap memory in bytes.

go_memstats_heap_inuse_bytes

The amount of heap memory in use in bytes.

go_memstats_heap_objects

The number of objects allocated on the heap.

go_memstats_heap_released_bytes

The amount of heap memory released in bytes.

go_memstats_heap_sys_bytes

The amount of memory allocated to the heap by the operating system in bytes.

go_memstats_last_gc_time_seconds

The time of the last GC, expressed as the number of seconds since the UNIX epoch.

go_memstats_lookups_total

The total number of pointer lookups.

go_memstats_mallocs_total

The total number of allocations.

go_memstats_mcache_inuse_bytes

The amount of memory in use in mcache in bytes.

go_memstats_mcache_sys_bytes

The amount of memory allocated to mcache by the operating system in bytes.

go_memstats_mspan_inuse_bytes

The amount of memory in use in mspan in bytes.

go_memstats_mspan_sys_bytes

The amount of memory allocated to mspan by the operating system in bytes.

go_memstats_next_gc_bytes

The heap size in bytes at which the next GC is triggered.

go_memstats_other_sys_bytes

The amount of memory allocated for other purposes by the operating system in bytes.

go_memstats_stack_inuse_bytes

The amount of stack memory in use in bytes.

go_memstats_stack_sys_bytes

The amount of memory allocated to the stack by the operating system in bytes.

go_memstats_sys_bytes

The total memory allocated by the operating system in bytes.

go_sched_gomaxprocs_threads

The number of threads determined by GOMAXPROCS.

go_sched_goroutines_goroutines

The number of goroutines.

go_sched_latencies_seconds_bucket

The distribution of scheduling latencies in seconds.

go_sched_latencies_seconds_count

The count of scheduling latencies in seconds.

go_sched_latencies_seconds_sum

The sum of scheduling latencies in seconds.

go_sched_pauses_stopping_gc_seconds_bucket

The distribution of GC-related stop-the-world stopping latencies in seconds. The stopping latency is the time from the decision to stop the world until all processors are stopped.

go_sched_pauses_stopping_gc_seconds_count

The count of GC-related stop-the-world stopping latencies.

go_sched_pauses_stopping_gc_seconds_sum

The sum of GC-related stop-the-world stopping latencies in seconds.

go_sched_pauses_stopping_other_seconds_bucket

The distribution of non-GC-related stop-the-world stopping latencies in seconds.

go_sched_pauses_stopping_other_seconds_count

The count of non-GC-related stop-the-world stopping latencies.

go_sched_pauses_stopping_other_seconds_sum

The sum of non-GC-related stop-the-world stopping latencies in seconds.

go_sched_pauses_total_gc_seconds_bucket

The distribution of GC-related stop-the-world pause durations in seconds. The pause duration is the time from the decision to stop the world until the world is started again.

go_sched_pauses_total_gc_seconds_count

The count of GC-related stop-the-world pause durations.

go_sched_pauses_total_gc_seconds_sum

The sum of GC-related stop-the-world pause durations in seconds.

go_sched_pauses_total_other_seconds_bucket

The distribution of non-GC-related stop-the-world pause durations in seconds.

go_sched_pauses_total_other_seconds_count

The count of non-GC-related stop-the-world pause durations.

go_sched_pauses_total_other_seconds_sum

The sum of non-GC-related stop-the-world pause durations in seconds.

go_sync_mutex_wait_total_seconds_total

The total waiting duration for Mutex locks in seconds.

go_threads

The number of Go threads.

grpc_client_handled_total

The total number of requests handled by the gRPC client.

grpc_client_msg_received_total

The total number of messages received by the gRPC client.

grpc_client_msg_sent_total

The total number of messages sent by the gRPC client.

grpc_client_started_total

The total number of RPCs started by the gRPC client.

hidden_metric_total

The total number of hidden metrics.

hidden_metrics_total

The total number of hidden metrics.

http_request_duration_microseconds

The HTTP request latency in microseconds.

http_request_size_bytes

The HTTP request size in bytes.

http_requests_total

The total number of HTTP requests.

http_response_size_bytes

The HTTP response body size in bytes.

job

The job name.

job_instance_mode

The job instance mode.

kube_apiserver_clusterip_allocator_allocated_ips

Kubernetes APIServer: The number of allocated cluster IP addresses.

kube_apiserver_clusterip_allocator_allocation_errors_total

Kubernetes APIServer: The total number of errors that occurred in cluster IP address allocations.

kube_apiserver_clusterip_allocator_allocation_total

Kubernetes APIServer: The total number of cluster IP address allocations.

kube_apiserver_clusterip_allocator_available_ips

Kubernetes APIServer: The number of available cluster IP addresses.

kube_apiserver_nodeport_allocator_allocated_ports

Kubernetes APIServer: The number of allocated node ports.

kube_apiserver_nodeport_allocator_allocation_errors_total

Kubernetes APIServer: The total number of errors that occurred in node port allocations.

kube_apiserver_nodeport_allocator_allocation_total

Kubernetes APIServer: The total number of node port allocations.

kube_apiserver_nodeport_allocator_available_ports

Kubernetes APIServer: The number of available node ports.

kube_apiserver_pod_logs_backend_tls_failure_total

Kubernetes APIServer: The total number of pod/log requests that failed due to TLS verification errors.

kube_apiserver_pod_logs_insecure_backend_total

Kubernetes APIServer: The total number of insecure pod/log requests.

kube_apiserver_pod_logs_pods_logs_backend_tls_failure_total

Kubernetes APIServer: The total number of pod/log requests that failed due to TLS verification errors.

kube_apiserver_pod_logs_pods_logs_insecure_backend_total

Kubernetes APIServer: The total number of insecure pod/log requests.

kubelet_container_log_filesystem_used_bytes

Kubelet: The space of the file system used by container logs in bytes.

kubelet_node_name

Kubelet: The node name.

kubelet_pleg_relist_duration_seconds_bucket

Kubelet: The distribution of pod lifecycle event generator (PLEG) relisting durations in seconds.

kubelet_pod_worker_duration_seconds_bucket

Kubelet: The distribution of durations in seconds that pod workers take to sync a single pod.

kubelet_volume_stats_available_bytes

Kubelet: The number of available bytes in the volume.

kubelet_volume_stats_capacity_bytes

Kubelet: The volume capacity in bytes.

kubelet_volume_stats_inodes

Kubelet: The maximum number of inodes in the volume.

kubelet_volume_stats_inodes_free

Kubelet: The number of free inodes in the volume.

kubelet_volume_stats_inodes_used

Kubelet: The number of used inodes in the volume.

kubelet_volume_stats_used_bytes

Kubelet: The number of used bytes in the volume.

kubernetes_build_info

The Kubernetes build information.

kubernetes_feature_enabled

Indicates whether each Kubernetes feature is enabled.

last_list_all_response_size_in_bytes

The total size of all response bodies in the recent list in bytes.

memory_utilization_byte

The used memory in bytes.

node_authorizer_graph_actions_duration_seconds_bucket

Node authorizer: The distribution of graph operation durations in seconds.

node_authorizer_graph_actions_duration_seconds_count

Node authorizer: The count of graph operation durations in seconds.

node_authorizer_graph_actions_duration_seconds_sum

Node authorizer: The sum of graph operation durations in seconds.

pod_security_evaluations_total

The total number of pod security evaluations.

pod_security_exemptions_total

The total number of pod security exemptions.

process_cpu_seconds_total

The total user and system CPU time consumed by the process in seconds.

process_max_fds

The maximum number of file descriptors for the process.

process_open_fds

The number of file descriptors opened by the process.

process_resident_memory_bytes

The resident memory size of the process in bytes.

process_start_time_seconds

The start time of the process, expressed as the number of seconds since the UNIX epoch.

process_virtual_memory_bytes

The number of virtual memory bytes for the process.

process_virtual_memory_max_bytes

The maximum number of virtual memory bytes for the process.

registered_metric_total

The total number of registered metrics.

registered_metrics_total

The total number of registered metrics.

rest_client_exec_plugin_certificate_rotation_age_bucket

REST client plug-in: The distribution of certificate rotation ages in seconds.

rest_client_exec_plugin_certificate_rotation_age_count

REST client plug-in: The count of certificate rotation ages in seconds.

rest_client_exec_plugin_certificate_rotation_age_sum

REST client plug-in: The sum of certificate rotation ages in seconds.

rest_client_exec_plugin_ttl_seconds

REST client plug-in: The time to live (TTL) of the certificate in seconds.

rest_client_request_duration_seconds_bucket

The distribution of REST client request durations in seconds.

rest_client_request_duration_seconds_count

The count of REST client request durations in seconds.

rest_client_request_duration_seconds_sum

The sum of REST client request durations in seconds.

rest_client_request_latency_seconds_bucket

The distribution of REST client request latencies in seconds.

rest_client_request_size_bytes_bucket

The distribution of REST client request-body sizes in bytes.

rest_client_request_size_bytes_count

The count of REST client request-body sizes in bytes.

rest_client_request_size_bytes_sum

The sum of REST client request-body sizes in bytes.

rest_client_requests_total

The number of REST client requests.

rest_client_response_size_bytes_bucket

The distribution of REST client response-body sizes in bytes.

rest_client_response_size_bytes_count

The count of REST client response-body sizes in bytes.

rest_client_response_size_bytes_sum

The sum of REST client response-body sizes in bytes.

rest_client_transport_cache_entries

The number of transport entries of the REST client.

rest_client_transport_create_calls_total

The total number of transport creation calls of the REST client.

scheduler_pending_pods

Scheduler: The number of pods to be scheduled.

scheduler_pod_scheduling_attempts_bucket

Scheduler: The distribution of pod scheduling attempts.

scheduler_scheduler_cache_size

The scheduler cache size.

scrape_duration_seconds

The scrape duration in seconds.

scrape_samples_post_metric_relabeling

The number of scraped samples after metric relabeling.

scrape_samples_scraped

The number of scraped samples.

scrape_series_added

The number of new series added during the scrape.

serviceaccount_invalid_legacy_auto_token_uses_total

The total number of uses of invalid legacy automatic service account tokens.

serviceaccount_legacy_auto_token_uses_total

The total number of uses of legacy automatic service account tokens.

serviceaccount_legacy_manual_token_uses_total

The total number of uses of legacy manual service account tokens.

serviceaccount_legacy_tokens_total

The total number of legacy service account tokens.

serviceaccount_stale_tokens_total

The total number of stale service account tokens.

serviceaccount_valid_tokens_total

The total number of valid service account tokens.

ssh_tunnel_open_count

The number of opened Secure Shell (SSH) tunnels.

ssh_tunnel_open_fail_count

The number of SSH tunnels that failed to be opened.

up

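Indicates whether the target was scraped successfully. A value of 1 indicates that the scrape succeeded, and a value of 0 indicates that the scrape failed.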

watch_cache_capacity

The capacity of the watch cache.

watch_cache_capacity_decrease_total

The total number of watch cache capacity decrease events.

watch_cache_capacity_increase_total

The total number of watch cache capacity increase events.

workqueue_adds_total

The total number of additions to the work queue.

workqueue_depth

The work queue depth.

workqueue_longest_running_processor_seconds

The duration in seconds for which the longest-running processor in the work queue has been running.

workqueue_queue_duration_seconds_bucket

The distribution of queueing durations in the work queue in seconds.

workqueue_queue_duration_seconds_count

The count of queueing durations in the work queue in seconds.

workqueue_queue_duration_seconds_sum

The sum of queueing durations in the work queue in seconds.

workqueue_retries_total

The total number of retries in the work queue.

workqueue_unfinished_work_seconds

The duration of unfinished work in the work queue in seconds.

workqueue_work_duration_seconds_bucket

The distribution of work durations in the work queue in seconds.

workqueue_work_duration_seconds_count

The count of work durations in the work queue in seconds.

workqueue_work_duration_seconds_sum

The sum of work durations in the work queue in seconds.
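Many metrics in the preceding table come in groups of three series that share a name and end in _bucket, _count, and _sum. Together, the three series describe one Prometheus histogram. The following Python sketch illustrates how the series relate: the average is the sum divided by the count, and a quantile can be estimated from the cumulative buckets by linear interpolation. The function names and sample values are hypothetical and chosen only for illustration. The interpolation is a simplified approximation of what the PromQL histogram_quantile function does, not the exact implementation.

```python
import math


def mean_from_sum_count(total_sum, total_count):
    """Average observation: <metric>_sum divided by <metric>_count."""
    if total_count == 0:
        return math.nan
    return total_sum / total_count


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs, as exposed by
    <metric>_bucket through the le label. The last pair must use math.inf.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # Rank falls in the +Inf bucket or in an empty bucket:
                # fall back to the highest bound seen so far.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical sample of one histogram (values are made up).
sample_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
sample_sum = 25.0
sample_count = 100

average_seconds = mean_from_sum_count(sample_sum, sample_count)
p95_seconds = estimate_quantile(0.95, sample_buckets)
```

Because buckets are cumulative, the count of the +Inf bucket equals the value of the _count series. The accuracy of the estimated quantile depends on how finely the bucket boundaries are spaced.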

Node Exporter (job name: node-exporter)

Metric

Description

ALERTS

The pending and firing alerts.

ALERTS_FOR_STATE

The time at which each alert became active, expressed as a UNIX timestamp.

aliyun_prometheus_agent_append_duration_seconds

The duration of the Prometheus agent append operations in seconds.

aliyun_prometheus_agent_job_discovery_status

The discovery status of the Prometheus agent collection jobs.

aliyun_prometheus_agent_scrapes_by_target_total

The total number of scrapes by the Prometheus agent per target.

aliyun_prometheus_agent_target_info

The target information of the Prometheus agent.

count

The Go-specific count details.

go_gc_duration_seconds

The Go GC pause duration in seconds.

go_gc_duration_seconds_count

The count of Go GC pauses.

go_gc_duration_seconds_sum

The total Go GC pause duration in seconds.

go_goroutines

The number of goroutines.

go_info

The Go-specific information.

go_memstats_alloc_bytes

The amount of memory allocated in bytes.

go_memstats_alloc_bytes_total

The cumulative amount of memory allocated in bytes.

go_memstats_buck_hash_sys_bytes

The amount of memory used by the profiling bucket hash table in bytes.

go_memstats_frees_total

The total number of heap objects freed.

go_memstats_gc_cpu_fraction

The fraction of the available CPU time used by GC since the program started. The value ranges from 0 to 1.

go_memstats_gc_sys_bytes

The amount of memory used for GC system metadata in bytes.

go_memstats_heap_alloc_bytes

The amount of heap memory allocated in bytes.

go_memstats_heap_idle_bytes

The amount of idle heap memory in bytes.

go_memstats_heap_inuse_bytes

The amount of heap memory in use in bytes.

go_memstats_heap_objects

The number of objects allocated on the heap.

go_memstats_heap_released_bytes

The amount of heap memory released in bytes.

go_memstats_heap_sys_bytes

The amount of memory allocated to the heap by the operating system in bytes.

go_memstats_last_gc_time_seconds

The time of the last GC, expressed as the number of seconds since the UNIX epoch.

go_memstats_lookups_total

The total number of pointer lookups.

go_memstats_mallocs_total

The total number of allocations.

go_memstats_mcache_inuse_bytes

The amount of memory in use in mcache in bytes.

go_memstats_mcache_sys_bytes

The amount of memory allocated to mcache by the operating system in bytes.

go_memstats_mspan_inuse_bytes

The amount of memory in use in mspan in bytes.

go_memstats_mspan_sys_bytes

The amount of memory allocated to mspan by the operating system in bytes.

go_memstats_next_gc_bytes

The heap size in bytes at which the next GC is triggered.

go_memstats_other_sys_bytes

The amount of memory allocated for other purposes by the operating system in bytes.

go_memstats_stack_inuse_bytes

The amount of stack memory in use in bytes.

go_memstats_stack_sys_bytes

The amount of memory allocated to the stack by the operating system in bytes.

go_memstats_sys_bytes

The total memory allocated by the operating system in bytes.

go_threads

The number of threads.

instance

The instance.

instance_device

The instance device.

job

The job name.

k8s_node_cpu_utilization

The CPU utilization of Kubernetes nodes.

k8s_node_disk_utilization

The disk usage of Kubernetes nodes.

k8s_node_memory_utilization

The memory usage of Kubernetes nodes.

node_arp_entries

The number of Address Resolution Protocol (ARP) entries on the node.

node_boot_time_seconds

The boot time of the node, expressed as the number of seconds since the UNIX epoch.

node_context_switches_total

The total number of context switches on the node.

node_cooling_device_cur_state

The current state of the cooling device of the node.

node_cooling_device_max_state

The maximum state of the cooling device of the node.

node_cpu_core_throttles_total

The total number of CPU core throttling events on the node.

node_cpu_frequency_max_hertz

The maximum CPU frequency of the node in Hertz.

node_cpu_frequency_min_hertz

The minimum CPU frequency of the node in Hertz.

node_cpu_guest_seconds_total

The total time, in seconds, that the node CPUs spent running guests (virtual machines).

node_cpu_package_throttles_total

The total number of CPU package throttling events on the node.

node_cpu_scaling_frequency_hertz

The dynamic CPU frequency of the node in Hz.

node_cpu_scaling_frequency_max_hertz

The maximum dynamic CPU frequency of the node in Hz.

node_cpu_scaling_frequency_min_hertz

The minimum dynamic CPU frequency of the node in Hz.

node_cpu_scaling_governor

The dynamic CPU governor of the node.

node_cpu_seconds_total

The total CPU time spent in each mode on the node in seconds.

node_disk_device_mapper_info

The DeviceMapper information of the node.

node_disk_discard_time_seconds_total

The total disk discard time of the node in seconds.

node_disk_discarded_sectors_total

The total number of disk sectors discarded on the node.

node_disk_discards_completed_total

The total number of completed disk discards of the node.

node_disk_discards_merged_total

The total number of merged disk discards of the node.

node_disk_filesystem_info

The file system information of the node.

node_disk_flush_requests_time_seconds_total

The total flush request duration of the node in seconds.

node_disk_flush_requests_total

The total number of flush requests of the node.

node_disk_info

The node disk information.

node_disk_io_now

The number of disk I/O operations currently in progress on the node.

node_disk_io_time_seconds_total

The total disk I/O duration of the node in seconds.

node_disk_io_time_weighted_seconds_total

The total weighted disk I/O time of the node in seconds.

node_disk_read_bytes_total

The total number of bytes read from the disk of the node.

node_disk_read_time_seconds_total

The total disk read time of the node in seconds.

node_disk_reads_completed_total

The total number of completed disk reads of the node.

node_disk_reads_merged_total

The total number of merged disk reads of the node.

node_disk_write_time_seconds_total

The total disk write time of the node in seconds.

node_disk_writes_completed_total

The total number of completed disk writes of the node.

node_disk_writes_merged_total

The total number of merged disk writes of the node.

node_disk_written_bytes_total

The total number of bytes written to the disk of the node.

node_dmi_info

The Desktop Management Interface (DMI) information of the node.

node_edac_correctable_errors_total

The total number of correctable memory errors of the node.

node_edac_csrow_correctable_errors_total

The total number of correctable memory errors in chip-select rows of the node.

node_edac_csrow_uncorrectable_errors_total

The total number of uncorrectable memory errors in chip-select rows of the node.

node_edac_uncorrectable_errors_total

The total number of uncorrectable memory errors of the node.

node_entropy_available_bits

The number of bits of available entropy of the node.

node_entropy_pool_size_bits

The number of bits of the entropy pool of the node.

node_exporter_build_info

The build information of the node exporter.

node_filefd_allocated

The number of allocated file descriptors of the node.

node_filefd_maximum

The maximum number of file descriptors of the node.

node_filesystem_avail_bytes

The amount of space available to non-root users in the file system of the node in bytes.

node_filesystem_device_error

Indicates whether an error occurred while statistics were collected for the file system device of the node.

node_filesystem_files

The total number of file nodes (inodes) in the file system of the node.

node_filesystem_files_free

The number of free file nodes (inodes) in the file system of the node.

node_filesystem_free_bytes

The amount of free space in the file system of the node in bytes.

node_filesystem_readonly

The read-only state of the file system of the node.

node_filesystem_size_bytes

The total size of the file system of the node in bytes.

node_forks_total

The total number of process forks of the node.

node_infiniband_excessive_buffer_overrun_errors_total

The total number of InfiniBand excessive buffer overrun errors on the node.

node_infiniband_info

The InfiniBand information of the node.

node_infiniband_link_downed_total

The total number of InfiniBand link down events on the node.

node_infiniband_link_error_recovery_total

The total number of InfiniBand link error recoveries on the node.

node_infiniband_local_link_integrity_errors_total

The total number of InfiniBand local link integrity errors of the node.

node_infiniband_multicast_packets_received_total

The total number of InfiniBand multicast packets received on the node.

node_infiniband_multicast_packets_transmitted_total

The total number of InfiniBand multicast packets sent from the node.

node_infiniband_physical_state_id

The physical state ID of the InfiniBand port on the node.

node_infiniband_port_constraint_errors_received_total

The total number of InfiniBand port constraint errors received on the node.

node_infiniband_port_constraint_errors_transmitted_total

The total number of InfiniBand port constraint errors sent from the node.

node_infiniband_port_data_received_bytes_total

The total bytes of data received by the InfiniBand port of the node.

node_infiniband_port_data_transmitted_bytes_total

The total data bytes sent on the node InfiniBand port.

node_infiniband_port_discards_transmitted_total

The total number of outbound packets discarded by the InfiniBand port of the node.

node_infiniband_port_errors_received_total

The total number of errors received on the InfiniBand port of the node.

node_infiniband_port_packets_received_total

The total number of packets received by the InfiniBand port of the node.

node_infiniband_port_packets_transmitted_total

The total number of packets sent by the InfiniBand port of the node.

node_infiniband_port_receive_remote_physical_errors_total

The total remote physical errors received on the node InfiniBand port.

node_infiniband_port_receive_switch_relay_errors_total

The total switch relay errors received on the node InfiniBand port.

node_infiniband_port_transmit_wait_total

The total number of transmit waits on the InfiniBand port of the node.

node_infiniband_rate_bytes_per_second

The InfiniBand port rate in bytes per second on the node.

node_infiniband_state_id

The state ID of the InfiniBand port of the node.

node_infiniband_symbol_error_total

The total number of InfiniBand symbol errors of the node.

node_infiniband_unicast_packets_received_total

The total number of unicast packets received on the InfiniBand port of the node.

node_infiniband_unicast_packets_transmitted_total

The total number of unicast packets sent by the InfiniBand port of the node.

node_infiniband_vl15_dropped_total

The total VL15 discards on the node InfiniBand port.

node_intr_total

The total number of interrupts serviced on the node.

node_load1

The 1-minute load average of the node.

node_load15

The 15-minute load average of the node.

node_load5

The 5-minute load average of the node.

node_memory_Active_anon_bytes

The size of anonymous active memory on the node in bytes.

node_memory_Active_bytes

The size of active memory on the node in bytes.

node_memory_Active_file_bytes

The size of active file memory on the node (in bytes).

node_memory_AnonHugePages_bytes

The size of anonymous huge pages on the node (in bytes).

node_memory_AnonPages_bytes

The size of anonymous pages on the node (in bytes).

node_memory_Bounce_bytes

The size of bounce pages on the node (in bytes).

node_memory_Buffers_bytes

The size of buffers memory on the node (in bytes).

node_memory_Cached_bytes

The size of cached memory on the node (in bytes).

node_memory_CmaFree_bytes

The size of Contiguous Memory Allocator (CMA) free memory on the node (in bytes).

node_memory_CmaTotal_bytes

The total size of CMA memory on the node (in bytes).

node_memory_CommitLimit_bytes

The commit limit of memory on the node (in bytes).

node_memory_Committed_AS_bytes

The committed address space of memory on the node (in bytes).

node_memory_DirectMap1G_bytes

The size of 1 GB direct map memory on the node (in bytes).

node_memory_DirectMap2M_bytes

The size of 2 MB direct map memory on the node (in bytes).

node_memory_DirectMap4k_bytes

The size of 4 KB direct map memory on the node (in bytes).

node_memory_Dirty_bytes

The size of dirty memory on the node (in bytes).

node_memory_DupText_bytes

The size of duplicate text memory on the node (in bytes).

node_memory_FileHugePages_bytes

The size of file huge pages memory on the node (in bytes).

node_memory_FilePmdMapped_bytes

The size of page cache memory mapped into user space with huge pages on the node (in bytes).

node_memory_HardwareCorrupted_bytes

The size of hardware corrupted memory on the node (in bytes).

node_memory_HugePages_Free

The number of free huge pages on the node.

node_memory_HugePages_Rsvd

The number of reserved huge pages on the node.

node_memory_HugePages_Surp

The number of surplus huge pages on the node.

node_memory_HugePages_Total

The total number of huge pages on the node.

node_memory_Hugepagesize_bytes

The size of huge pages on the node (in bytes).

node_memory_Hugetlb_bytes

The size of Hugetlb memory on the node (in bytes).

node_memory_Inactive_anon_bytes

The size of inactive anonymous memory on the node (in bytes).

node_memory_Inactive_bytes

The size of inactive memory on the node (in bytes).

node_memory_Inactive_file_bytes

The size of inactive file memory on the node (in bytes).

node_memory_KernelStack_bytes

The size of KernelStack memory on the node (in bytes).

node_memory_KReclaimable_bytes

The size of KReclaimable memory on the node (in bytes).

node_memory_Mapped_bytes

The size of mapped memory on the node (in bytes).

node_memory_MemAvailable_bytes

The size of available memory on the node (in bytes).

node_memory_MemFree_bytes

The size of free memory on the node (in bytes).

node_memory_MemTotal_bytes

The total size of memory on the node (in bytes).

node_memory_MemZeroed_bytes

The size of zeroed memory on the node (in bytes).

node_memory_Mlocked_bytes

The size of locked memory on the node (in bytes).

node_memory_NFS_Unstable_bytes

The size of unstable NFS memory on the node (in bytes).

node_memory_PageTables_bytes

The size of page table memory on the node (in bytes).

node_memory_Percpu_bytes

The size of per-CPU memory on the node (in bytes).

node_memory_Shmem_bytes

The size of shared memory on the node (in bytes).

node_memory_ShmemHugePages_bytes

The size of shared huge pages memory on the node (in bytes).

node_memory_ShmemPmdMapped_bytes

The size of shared memory page middle directory (PMD) mapping on the node (in bytes).

node_memory_Slab_bytes

The size of Slab memory on the node (in bytes).

node_memory_SReclaimable_bytes

The size of SReclaimable memory on the node (in bytes).

node_memory_SUnreclaim_bytes

The size of SUnreclaim memory on the node (in bytes).

node_memory_SwapCached_bytes

The size of cached swap space on the node (in bytes).

node_memory_SwapFree_bytes

The size of free swap space on the node (in bytes).

node_memory_SwapTotal_bytes

The total size of swap space on the node (in bytes).

node_memory_Unevictable_bytes

The size of unevictable memory on the node (in bytes).

node_memory_VmallocChunk_bytes

The size of vmallocChunk memory on the node (in bytes).

node_memory_VmallocTotal_bytes

The total size of vmalloc memory on the node (in bytes).

node_memory_VmallocUsed_bytes

The size of used vmalloc memory on the node (in bytes).

node_memory_Writeback_bytes

The size of writeback memory on the node (in bytes).

node_memory_WritebackTmp_bytes

The size of temporary writeback memory on the node (in bytes).

node_netstat_Icmp_InErrors

The number of Internet Control Message Protocol (ICMP) receive errors on the node.

node_netstat_Icmp_InMsgs

The number of received ICMP messages.

node_netstat_Icmp_OutMsgs

The number of sent ICMP messages.

node_netstat_Icmp6_InErrors

The number of ICMPv6 receive errors.

node_netstat_Icmp6_InMsgs

The number of ICMPv6 messages received.

node_netstat_Icmp6_OutMsgs

The number of ICMPv6 messages sent.

node_netstat_Ip_Forwarding

The status of IP forwarding.

node_netstat_Ip6_InOctets

The number of bytes received over IPv6.

node_netstat_Ip6_OutOctets

The number of bytes sent over IPv6.

node_netstat_IpExt_InOctets

The number of bytes received for IP extended statistics.

node_netstat_IpExt_OutOctets

The number of bytes sent for IP extended statistics.

node_netstat_Tcp_ActiveOpens

The number of active TCP connections opened.

node_netstat_Tcp_CurrEstab

The current number of established TCP connections.

node_netstat_Tcp_InErrs

The number of TCP receive errors.

node_netstat_Tcp_InSegs

The number of TCP segments received.

node_netstat_Tcp_OutRsts

The number of TCP resets sent.

node_netstat_Tcp_OutSegs

The number of TCP segments sent.

node_netstat_Tcp_PassiveOpens

The number of passive TCP connections opened.

node_netstat_Tcp_RetransSegs

The number of TCP segments retransmitted.

node_netstat_TcpExt_ListenDrops

The number of TCP connections dropped from the listen queue.

node_netstat_TcpExt_ListenOverflows

The number of times the listen queue overflowed.

node_netstat_TcpExt_SyncookiesFailed

The number of times SYN cookie validation failed.

node_netstat_TcpExt_SyncookiesRecv

The number of SYN cookies received.

node_netstat_TcpExt_SyncookiesSent

The number of SYN cookies sent.

node_netstat_TcpExt_TCPOFOQueue

The number of packets queued in the TCP out-of-order (OFO) queue.

node_netstat_TcpExt_TCPSynRetrans

The number of TCP SYN retransmissions.

node_netstat_TcpExt_TCPTimeouts

The number of TCP timeouts.

node_netstat_Udp_InDatagrams

The number of UDP datagrams received.

node_netstat_Udp_InErrors

The number of UDP receive errors.

node_netstat_Udp_NoPorts

The number of UDP packets with unreachable destination ports.

node_netstat_Udp_OutDatagrams

The number of UDP datagrams sent.

node_netstat_Udp_RcvbufErrors

The number of UDP receive buffer errors.

node_netstat_Udp_SndbufErrors

The number of UDP send buffer errors.

node_netstat_Udp6_InDatagrams

The number of IPv6 UDP datagrams received.

node_netstat_Udp6_InErrors

The number of IPv6 UDP receive errors.

node_netstat_Udp6_NoPorts

The number of IPv6 UDP packets with unreachable destination ports.

node_netstat_Udp6_OutDatagrams

The number of IPv6 UDP datagrams sent.

node_netstat_Udp6_RcvbufErrors

The number of IPv6 UDP receive buffer errors.

node_netstat_Udp6_SndbufErrors

The number of IPv6 UDP send buffer errors.

node_netstat_UdpLite_InErrors

The number of UDP Lite receive errors.

node_netstat_UdpLite6_InErrors

The number of IPv6 UDP Lite receive errors.

node_network_address_assign_type

The assignment type of the network address.

node_network_carrier

The carrier state of the network interface.

node_network_carrier_changes_total

The total number of carrier state changes of the network interface.

node_network_carrier_down_changes_total

The total number of times the network carrier changed to the down state.

node_network_carrier_up_changes_total

The total number of times the network carrier changed to the up state.

node_network_device_id

The device ID of the network interface.

node_network_dormant

The status of network dormancy.

node_network_flags

The network flags.

node_network_iface_id

The network interface ID.

node_network_iface_link

The link state of the network interface.

node_network_iface_link_mode

The link mode of the network interface.

node_network_info

The information about the network interface.

node_network_mtu_bytes

The maximum transmission unit size in bytes on the network.

node_network_name_assign_type

The assignment type of the network name.

node_network_net_dev_group

The network device group to which the network device belongs.

node_network_protocol_type

The network protocol type.

node_network_receive_bytes_total

The total number of bytes received.

node_network_receive_compressed_total

The total number of compressed packets received.

node_network_receive_drop_total

The total number of packets dropped while receiving.

node_network_receive_errs_total

The total number of receive errors.

node_network_receive_fifo_total

The total number of first-in, first-out (FIFO) buffer errors while receiving.

node_network_receive_frame_total

The total number of frame alignment errors while receiving.

node_network_receive_multicast_total

The total number of multicast packets received.

node_network_receive_nohandler_total

The total number of receptions without a handler.

node_network_receive_packets_total

The total number of packets received.

node_network_speed_bytes

The network speed in bytes per second.

node_network_transmit_bytes_total

The total number of bytes sent.

node_network_transmit_carrier_total

The total number of carrier errors while sending.

node_network_transmit_colls_total

The total number of transmission collisions.

node_network_transmit_compressed_total

The total number of compressed packets sent.

node_network_transmit_drop_total

The total number of packets sent but dropped.

node_network_transmit_errs_total

The total number of send errors.

node_network_transmit_fifo_total

The total number of FIFO buffer errors while sending.

node_network_transmit_packets_total

The total number of packets sent.

node_network_transmit_queue_length

The length of the send queue.

node_network_up

Indicates whether the network interface is enabled.

node_nf_conntrack_entries

The number of entries in the connection tracking table.

node_nf_conntrack_entries_limit

The limit of entries in the connection tracking table.

node_nf_conntrack_stat_drop

The drop count for connection tracking.

node_nf_conntrack_stat_early_drop

The early drop count for connection tracking.

node_nf_conntrack_stat_found

The successful lookup count for connection tracking.

node_nf_conntrack_stat_ignore

The ignore count for connection tracking.

node_nf_conntrack_stat_insert

The insert count for connection tracking.

node_nf_conntrack_stat_insert_failed

The insert failure count for connection tracking.

node_nf_conntrack_stat_invalid

The invalid count for connection tracking.

node_nf_conntrack_stat_search_restart

The search restart count for connection tracking.

node_nfs_connections_total

The total number of NFS connections.

node_nfs_packets_total

The total number of NFS packets.

node_nfs_requests_total

The total number of NFS requests.

node_nfs_rpc_authentication_refreshes_total

The total number of NFS Remote Procedure Call (RPC) authentication refreshes.

node_nfs_rpc_retransmissions_total

The total number of NFS RPC retransmissions.

node_nfs_rpcs_total

The total number of NFS RPCs.

node_nfsd_connections_total

The total number of connections to the NFS server.

node_nfsd_disk_bytes_read_total

The total number of bytes read from the disk by the NFS server.

node_nfsd_disk_bytes_written_total

The total number of bytes written to the disk by the NFS server.

node_nfsd_file_handles_stale_total

The total number of stale file handles on the NFS server.

node_nfsd_packets_total

The total number of packets processed by the NFS server.

node_nfsd_read_ahead_cache_not_found_total

The total number of times the read-ahead cache of the NFS server was not found.

node_nfsd_read_ahead_cache_size_blocks

The size of the read-ahead cache of the NFS server in blocks.

node_nfsd_reply_cache_hits_total

The total number of hits in the NFS server reply cache.

node_nfsd_reply_cache_misses_total

The total number of misses in the NFS server reply cache.

node_nfsd_reply_cache_nocache_total

The total number of no-cache situations in the NFS server reply cache.

node_nfsd_requests_total

The total number of requests to the NFS server.

node_nfsd_rpc_errors_total

The total number of RPC errors on the NFS server.

node_nfsd_server_rpcs_total

The total number of RPCs processed by the NFS server.

node_nfsd_server_threads

The number of threads on the NFS server.

node_nvme_info

The information about Non-Volatile Memory Express (NVMe).

node_os_info

The information about the operating system.

node_os_version

The version of the operating system.

node_pressure_cpu_waiting_seconds_total

The total time, in seconds, that processes have waited for CPU time.

node_pressure_io_stalled_seconds_total

The total time, in seconds, during which no process could make progress due to I/O congestion.

node_pressure_io_waiting_seconds_total

The total time, in seconds, that processes have waited due to I/O congestion.

node_pressure_memory_stalled_seconds_total

The total time, in seconds, during which no process could make progress due to memory congestion.

node_pressure_memory_waiting_seconds_total

The total time, in seconds, that processes have waited for memory.

node_processes_max_processes

The maximum number of processes.

node_processes_max_threads

The maximum number of threads.

node_processes_pids

The number of process IDs.

node_processes_state

The number of processes in each state.

node_processes_threads

The number of threads.

node_procs_blocked

The number of blocked processes.

node_procs_running

The number of running processes.

node_schedstat_running_seconds_total

The total time, in seconds, that the CPU spent running processes.

node_schedstat_timeslices_total

The total number of time slices in scheduling statistics.

node_schedstat_waiting_seconds_total

The total time, in seconds, that processes spent waiting for the CPU.

node_scrape_collector_duration_seconds

The duration of the scrape collector in seconds.

node_scrape_collector_success

Indicates whether the collector succeeded.

node_selinux_enabled

Indicates whether Security-Enhanced Linux (SELinux) is enabled.

node_sockstat_FRAG_inuse

The number of FRAG sockets in use.

node_sockstat_FRAG_memory

The amount of memory occupied by FRAG sockets.

node_sockstat_FRAG6_inuse

The number of FRAG6 sockets in use.

node_sockstat_FRAG6_memory

The amount of memory occupied by FRAG6 sockets.

node_sockstat_RAW_inuse

The number of RAW sockets in use.

node_sockstat_RAW6_inuse

The number of RAW6 sockets in use.

node_sockstat_sockets_used

The total number of sockets in use.

node_sockstat_TCP_alloc

The number of TCP sockets allocated.

node_sockstat_TCP_inuse

The number of TCP sockets in use.

node_sockstat_TCP_mem

The amount of memory used by TCP sockets in pages.

node_sockstat_TCP_mem_bytes

The number of bytes of memory used by TCP sockets.

node_sockstat_TCP_orphan

The number of orphaned TCP sockets.

node_sockstat_TCP_tw

The number of TCP sockets in the TIME_WAIT state.

node_sockstat_TCP6_inuse

The number of TCP6 sockets in use.

node_sockstat_UDP_inuse

The number of UDP sockets in use.

node_sockstat_UDP_mem

The amount of memory used by UDP sockets in pages.

node_sockstat_UDP_mem_bytes

The number of bytes of memory used by UDP sockets.

node_sockstat_UDP6_inuse

The number of IPv6 UDP sockets in use.

node_sockstat_UDPLITE_inuse

The number of UDP-Lite sockets in use.

node_sockstat_UDPLITE6_inuse

The number of UDP-Lite6 sockets in use.

node_softnet_backlog_len

The length of the soft interrupt queue.

node_softnet_cpu_collision_total

The total number of CPU collisions in soft interrupts.

node_softnet_dropped_total

The total number of packets dropped during soft interrupt processing.

node_softnet_flow_limit_count_total

The total number of flow limit counts in soft interrupts.

node_softnet_processed_total

The total number of packets processed during soft interrupt processing.

node_softnet_received_rps_total

The total number of times a CPU was woken up by Receive Packet Steering (RPS) to process packets.

node_softnet_times_squeezed_total

The total number of times soft interrupts were squeezed.

node_textfile_scrape_error

Indicates whether an error occurred while a text file was opened or read.

node_thermal_zone_temp

The temperature of the thermal zone.

node_time_clocksource_available_info

The available clock source information.

node_time_clocksource_current_info

The information about the current clock source.

node_time_seconds

The system time in seconds since the Unix epoch.

node_time_zone_offset_seconds

The time zone offset in seconds.

node_timex_estimated_error_seconds

The estimated time error in seconds.

node_timex_frequency_adjustment_ratio

The frequency adjustment ratio of the system clock.

node_timex_loop_time_constant

The time adjustment loop constant.

node_timex_maxerror_seconds

The maximum error in seconds.

node_timex_offset_seconds

The time offset in seconds.

node_timex_pps_calibration_total

The total number of pulse per second (PPS) calibrations.

node_timex_pps_error_total

The total number of PPS errors.

node_timex_pps_frequency_hertz

The PPS frequency in Hz.

node_timex_pps_jitter_seconds

The PPS jitter in seconds.

node_timex_pps_jitter_total

The total number of times the PPS jitter limit was exceeded.

node_timex_pps_shift_seconds

The PPS interval duration in seconds.

node_timex_pps_stability_exceeded_total

The number of times PPS stability exceeded limits.

node_timex_pps_stability_hertz

The PPS stability frequency in hertz.

node_timex_status

The status of clock time adjustments.

node_timex_sync_status

The synchronization status of the clock.

node_timex_tai_offset_seconds

The International Atomic Time (TAI) offset in seconds.

node_timex_tick_seconds

The tick interval of the clock in seconds.

node_udp_queues

The statistics of UDP queues.

node_uname_info

The system information (uname).

node_vmstat_oom_kill

The number of out-of-memory (OOM) kills in VM statistics.

node_vmstat_pgfault

The number of page faults in VM statistics.

node_vmstat_pgmajfault

The number of major page faults in VM statistics.

node_vmstat_pgpgin

The number of page ins in VM statistics.

node_vmstat_pgpgout

The number of page outs in VM statistics.

node_vmstat_pswpin

The number of swap page ins in VM statistics.

node_vmstat_pswpout

The number of swap page outs in VM statistics.

node_xfs_allocation_btree_compares_total

The total number of B-tree comparisons for XFS allocation.

node_xfs_allocation_btree_lookups_total

The total number of B-tree lookups for XFS allocation.

node_xfs_allocation_btree_records_deleted_total

The total number of B-tree records deleted for XFS allocation.

node_xfs_allocation_btree_records_inserted_total

The total number of B-tree records inserted for XFS allocation.

node_xfs_block_map_btree_compares_total

The total number of B-tree comparisons for XFS block mapping.

node_xfs_block_map_btree_lookups_total

The total number of B-tree lookups for XFS block mapping.

node_xfs_block_map_btree_records_deleted_total

The total number of B-tree records deleted for XFS block mapping.

node_xfs_block_map_btree_records_inserted_total

The total number of B-tree records inserted for XFS block mapping.

node_xfs_block_mapping_extent_list_compares_total

The total number of extent list comparisons for XFS block mapping.

node_xfs_block_mapping_extent_list_deletions_total

The total number of extent list deletions for XFS block mapping.

node_xfs_block_mapping_extent_list_insertions_total

The total number of extent list insertions for XFS block mapping.

node_xfs_block_mapping_extent_list_lookups_total

The total number of extent list lookups for XFS block mapping.

node_xfs_block_mapping_reads_total

The total number of reads for XFS block mapping.

node_xfs_block_mapping_unmaps_total

The total number of unmappings for XFS block mapping.

node_xfs_block_mapping_writes_total

The total number of writes for XFS block mapping.

node_xfs_directory_operation_create_total

The total number of directory creation operations in XFS.

node_xfs_directory_operation_getdents_total

The total number of directory entry retrieval operations in XFS.

node_xfs_directory_operation_lookup_total

The total number of directory lookup operations in XFS.

node_xfs_directory_operation_remove_total

The total number of directory removal operations in XFS.

node_xfs_extent_allocation_blocks_allocated_total

The total number of blocks allocated in XFS.

node_xfs_extent_allocation_blocks_freed_total

The total number of blocks freed in XFS.

node_xfs_extent_allocation_extents_allocated_total

The total number of extents allocated in XFS.

node_xfs_extent_allocation_extents_freed_total

The total number of extents freed in XFS.

node_xfs_inode_operation_attempts_total

The total number of attempts at inode operations in XFS.

node_xfs_inode_operation_attribute_changes_total

The total number of attribute change operations on inodes in XFS.

node_xfs_inode_operation_duplicates_total

The total number of duplicate operations on inodes in XFS.

node_xfs_inode_operation_found_total

The total number of hits in inode operations in XFS.

node_xfs_inode_operation_missed_total

The total number of misses in inode operations in XFS.

node_xfs_inode_operation_reclaims_total

The total number of reclaim operations on inodes in XFS.

node_xfs_inode_operation_recycled_total

The total number of reuse operations on inodes in XFS.

node_xfs_read_calls_total

The total number of read calls in XFS.

node_xfs_vnode_active_total

The total number of active vnodes in XFS.

node_xfs_vnode_allocate_total

The total number of vnode allocations in XFS.

node_xfs_vnode_get_total

The total number of vnode retrievals in XFS.

node_xfs_vnode_hold_total

The total number of vnodes held in XFS.

node_xfs_vnode_reclaim_total

The total number of vnodes reclaimed in XFS.

node_xfs_vnode_release_total

The total number of vnodes released in XFS.

node_xfs_vnode_remove_total

The total number of vnodes removed in XFS.

node_xfs_write_calls_total

The total number of write calls in XFS.

process_cpu_seconds_total

The total user and system CPU time consumed by the process in seconds.

process_max_fds

The maximum number of file descriptors for the process.

process_open_fds

The number of file descriptors opened by the process.

process_resident_memory_bytes

The resident memory size of the process in bytes.

process_start_time_seconds

The start time of the process in seconds since the Unix epoch.

process_virtual_memory_bytes

The number of virtual memory bytes for the process.

process_virtual_memory_max_bytes

The maximum number of virtual memory bytes for the process.

promhttp_metric_handler_errors_total

The total number of errors from the Prometheus HTTP metric handler.

promhttp_metric_handler_requests_in_flight

The current number of requests being handled by the Prometheus HTTP metric handler.

promhttp_metric_handler_requests_total

The total number of requests handled by the Prometheus HTTP metric handler.

scrape_duration_seconds

The scrape duration in seconds.

scrape_samples_post_metric_relabeling

The number of scraped samples after metric relabeling.

scrape_samples_scraped

The number of scraped samples.

scrape_series_added

The number of new series added during the scrape.

up

Indicates whether the target was scraped successfully. Valid values: 1 (success) and 0 (failure).
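
Many of the metrics in the preceding table whose names end in _total are cumulative counters, so a raw value is rarely useful on its own. The following Python sketch shows one way to turn a series of counter samples into a per-second rate while tolerating counter resets, such as a node reboot. The function names and the sample layout are hypothetical and are not part of Managed Service for Prometheus, and the calculation is a simplified illustration rather than the exact algorithm of the PromQL rate() function, which also extrapolates to the window boundaries.

```python
from typing import List, Tuple

# Hypothetical sample layout: (Unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def counter_increase(samples: List[Sample]) -> float:
    """Sum the increases between consecutive samples of a counter.

    A value that is lower than the previous one is treated as a counter
    reset, so the whole current value counts as the increase since the reset.
    """
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        total += curr - prev if curr >= prev else curr
    return total


def per_second_rate(samples: List[Sample]) -> float:
    """Return the average per-second increase over the sampled window."""
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return counter_increase(samples) / elapsed


if __name__ == "__main__":
    # Made-up samples standing in for node_network_receive_bytes_total.
    received = [(0.0, 1000.0), (60.0, 7000.0)]
    print(per_second_rate(received))  # 100.0 bytes per second
```

The same helper applies to any counter in the table, for example node_disk_read_bytes_total or node_context_switches_total. Gauges such as node_load1 or node_memory_MemAvailable_bytes are read directly and do not need this conversion.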

kube-state-metrics (job name: _kube-state-metrics)

Metric

Description

kube_configmap_info

The information about the ConfigMap.

kube_cronjob_annotations

The annotations of the Kubernetes CronJob.

kube_cronjob_created

The creation time of the Kubernetes CronJob.

kube_cronjob_info

The information about the Kubernetes CronJob.

kube_cronjob_labels

The labels of the Kubernetes CronJob.

kube_cronjob_metadata_resource_version

The metadata resource version of the Kubernetes CronJob.

kube_cronjob_next_schedule_time

The next schedule time of the Kubernetes CronJob.

kube_cronjob_spec_failed_job_history_limit

The failed job history limit of the Kubernetes CronJob.

kube_cronjob_spec_starting_deadline_seconds

The starting deadline seconds of the Kubernetes CronJob.

kube_cronjob_spec_successful_job_history_limit

The successful job history limit of the Kubernetes CronJob.

kube_cronjob_spec_suspend

The suspend status of the Kubernetes CronJob.

kube_cronjob_status_active

The number of active jobs of the Kubernetes CronJob.

kube_cronjob_status_last_schedule_time

The last schedule time of the Kubernetes CronJob.

kube_cronjob_status_last_successful_time

The last successful execution time of the Kubernetes CronJob.

kube_daemonset_created

The creation time of the Kubernetes DaemonSet.

kube_daemonset_status_current_number_scheduled

The current number of scheduled nodes for the Kubernetes DaemonSet.

kube_daemonset_status_desired_number_scheduled

The desired number of scheduled nodes for the Kubernetes DaemonSet.

kube_daemonset_status_number_available

The number of available nodes in the Kubernetes DaemonSet.

kube_daemonset_status_number_misscheduled

The number of nodes that run a Pod of the Kubernetes DaemonSet but are not supposed to.

kube_daemonset_status_number_ready

The number of ready nodes in the Kubernetes DaemonSet.

kube_daemonset_status_number_unavailable

The number of unavailable nodes in the Kubernetes DaemonSet.

kube_daemonset_status_updated_number_scheduled

The number of updated scheduled nodes in the Kubernetes DaemonSet.

kube_daemonset_updated_number_scheduled

The number of updated scheduled nodes in the Kubernetes DaemonSet.

kube_deployment_created

The creation time of the Kubernetes Deployment.

kube_deployment_labels

The labels of the Kubernetes Deployment.

kube_deployment_metadata_generation

The metadata generation of the Kubernetes Deployment.

kube_deployment_spec_replicas

The number of replicas specified in the Kubernetes Deployment.

kube_deployment_spec_strategy_rollingupdate_max_unavailable

The maximum number of unavailable Pods during a rolling update of the Kubernetes Deployment.

kube_deployment_status_observed_generation

The observed generation of the Kubernetes Deployment.

kube_deployment_status_replicas

The total number of replicas in the Kubernetes Deployment.

kube_deployment_status_replicas_available

The number of available replicas in the Kubernetes Deployment.

kube_deployment_status_replicas_ready

The number of ready replicas in the Kubernetes Deployment.

kube_deployment_status_replicas_unavailable

The number of unavailable replicas in the Kubernetes Deployment.

kube_deployment_status_replicas_updated

The number of updated replicas in the Kubernetes Deployment.

kube_horizontalpodautoscaler_info

The information about the Kubernetes HorizontalPodAutoscaler.

kube_horizontalpodautoscaler_labels

The labels of the Kubernetes HorizontalPodAutoscaler.

kube_horizontalpodautoscaler_metadata_generation

The metadata generation of the Kubernetes HorizontalPodAutoscaler.

kube_horizontalpodautoscaler_spec_max_replicas

The maximum number of replicas specified in the Kubernetes HorizontalPodAutoscaler.

kube_horizontalpodautoscaler_spec_min_replicas

The minimum number of replicas specified in the Kubernetes HorizontalPodAutoscaler.

kube_horizontalpodautoscaler_spec_target_metric

The target metrics of the Kubernetes HorizontalPodAutoscaler.

kube_horizontalpodautoscaler_status_condition

The status conditions of the Kubernetes HorizontalPodAutoscaler.

kube_horizontalpodautoscaler_status_current_replicas

The current number of replicas in the Kubernetes HorizontalPodAutoscaler.

kube_horizontalpodautoscaler_status_desired_replicas

The desired number of replicas in the Kubernetes HorizontalPodAutoscaler.

kube_hpa_labels

The labels of the Kubernetes HorizontalPodAutoscaler.

kube_hpa_metadata_generation

The metadata generation of the Kubernetes HorizontalPodAutoscaler.

kube_hpa_spec_max_replicas

The maximum number of replicas specified in the Kubernetes HorizontalPodAutoscaler.

kube_hpa_spec_min_replicas

The minimum number of replicas specified in the Kubernetes HorizontalPodAutoscaler.

kube_hpa_spec_target_metric

The target metrics of the Kubernetes HorizontalPodAutoscaler.

kube_hpa_status_condition

The status conditions of the Kubernetes HorizontalPodAutoscaler.

kube_hpa_status_current_replicas

The current number of replicas in the Kubernetes HorizontalPodAutoscaler.

kube_hpa_status_desired_replicas

The desired number of replicas in the Kubernetes HorizontalPodAutoscaler.

kube_ingress_info

The information about the Ingress.

kube_job_created

The creation time of the Job.

kube_job_failed

Indicates whether the Job has failed its execution.

kube_job_info

The information about the Job.

kube_job_spec_completions

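The desired number of successfully finished Pods for the Job.

kube_job_status_active

The number of actively running Pods of the Job.

kube_job_status_failed

The number of Pods of the Job that reached the Failed phase.

kube_job_status_succeeded

The number of Pods of the Job that reached the Succeeded phase.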

kube_namespace_created

The creation time of the namespace.

kube_namespace_labels

The labels of the namespace.

kube_namespace_status_phase

The phase of the namespace status.

kube_node_info

The information about the node.

kube_node_labels

The labels of the node.

kube_node_spec_taint

The taint configurations of the node.

kube_node_spec_unschedulable

The unschedulable flag of the node.

kube_node_status_allocatable

The allocatable resources of the node.

kube_node_status_allocatable_cpu_cores

The allocatable CPU cores of the node.

kube_node_status_allocatable_memory_bytes

The allocatable memory bytes of the node.

kube_node_status_allocatable_pods

The allocatable number of Pods on the node.

kube_node_status_capacity

The capacity of the node.

kube_node_status_capacity_cpu_cores

The capacity CPU cores of the node.

kube_node_status_capacity_memory_bytes

The capacity memory bytes of the node.

kube_node_status_capacity_pods

The capacity number of Pods on the node.

kube_node_status_condition

The status conditions of the node.

kube_persistentvolume_status_phase

The phase of the PersistentVolume (PV) status.

kube_persistentvolumeclaim_info

The information about the PersistentVolumeClaim (PVC).

kube_persistentvolumeclaim_resource_requests_storage_bytes

The storage resource request of the PVC.

kube_persistentvolumeclaim_status_phase

The phase of the PVC status.

kube_pod_completion_time

The completion time of the Pod.

kube_pod_container_info

The information about the Pod container.

kube_pod_container_resource_limits

The resource limit of the Pod container.

kube_pod_container_resource_limits_cpu_cores

The CPU core limit of the Pod container.

kube_pod_container_resource_limits_memory_bytes

The memory byte limit of the Pod container.

kube_pod_container_resource_requests

The resource requests of the Pod container.

kube_pod_container_resource_requests_cpu_cores

The CPU core requests of the Pod container.

kube_pod_container_resource_requests_memory_bytes

The memory byte requests of the Pod container.

kube_pod_container_status_last_terminated_reason

The last termination reason of the Pod container.

kube_pod_container_status_ready

The ready status of the Pod container.

kube_pod_container_status_restarts_total

The total number of restarts for the Pod container.

kube_pod_container_status_running

The running status of the Pod container.

kube_pod_container_status_terminated

The terminated status of the Pod container.

kube_pod_container_status_terminated_reason

The termination reason of the Pod container.

kube_pod_container_status_waiting

The waiting status of the Pod container.

kube_pod_container_status_waiting_reason

The waiting reason of the Pod container.

kube_pod_created

The creation time of the Pod.

kube_pod_deletion_timestamp

The deletion timestamp of the Pod.

kube_pod_info

The information about the Pod.

kube_pod_labels

The labels of the Pod.

kube_pod_owner

The owner of the Pod.

kube_pod_start_time

The start time of the Pod.

kube_pod_status_container_ready_time

The container ready time of the Pod status.

kube_pod_status_initialized_time

The initialization completion time of the Pod status.

kube_pod_status_phase

The phase of the Pod status.

kube_pod_status_ready

The ready status of the Pod.

kube_pod_status_ready_time

The ready time of the Pod.

kube_pod_status_reason

The reason for the Pod status.

kube_pod_status_scheduled_time

The scheduling time of the Pod.

kube_pod_status_unschedulable

The unschedulable flag of the Pod.

kube_replicaset_owner

The owner of the ReplicaSet.

kube_replicaset_status_ready_replicas

The number of ready replicas in the ReplicaSet.

kube_resource_relationship

The relationships between resources.

kube_resourcequota

The resource quota.

kube_resourcequota_created

The creation time of the resource quota.

kube_secret_info

The information about the secret.

kube_service_info

The information about the service.

kube_service_spec_type

The type specification of the service.

kube_service_status_load_balancer_ingress

The load balancer ingress information of the service status.

kube_statefulset_created

The creation time of the StatefulSet.

kube_statefulset_metadata_generation

The metadata generation of the StatefulSet.

kube_statefulset_replicas

The desired number of replicas of the StatefulSet.

kube_statefulset_status_replicas

The current number of replicas of the StatefulSet.

kube_statefulset_status_replicas_available

The number of available replicas of the StatefulSet.

kube_statefulset_status_replicas_ready

The number of ready replicas of the StatefulSet.

kube_statefulset_status_replicas_updated

The number of updated replicas of the StatefulSet.

process_cpu_seconds_total

The total number of CPU seconds used by the process.

process_resident_memory_bytes

The resident memory size of the process in bytes.

rest_client_requests_total

The number of REST client requests.

up

The connectivity of metric collection.

workqueue_adds_total

The total number of additions to the work queue.

workqueue_depth

The work queue depth.

workqueue_queue_duration_seconds_bucket

The distribution of queue duration in seconds for the work queue.
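
Because metrics outside the tables in this topic are treated as custom metrics, it can help to compare the metric names that a job exposes against the basic metric list. The following is a minimal sketch in Python. The allowlist contains only a few names copied from the kube-state-metrics table above, not the full table, and the function name and sample input are hypothetical.

```python
# Excerpt only: a handful of names from the kube-state-metrics table above.
# A real check would need the complete list from this topic.
BASIC_KUBE_STATE_METRICS = frozenset({
    "kube_pod_info",
    "kube_pod_status_phase",
    "kube_deployment_spec_replicas",
    "kube_node_status_condition",
    "workqueue_depth",
})


def split_basic_and_custom(metric_names, basic=BASIC_KUBE_STATE_METRICS):
    """Partition metric names into (basic, custom) lists, preserving order."""
    basic_found, custom_found = [], []
    for name in metric_names:
        if name in basic:
            basic_found.append(name)
        else:
            custom_found.append(name)
    return basic_found, custom_found


if __name__ == "__main__":
    # kube_pod_annotations lands in the custom list here only because it is
    # absent from the excerpt used as the allowlist.
    scraped = ["kube_pod_info", "kube_pod_annotations", "workqueue_depth"]
    print(split_basic_and_custom(scraped))
```

Names that end up in the second list are the ones to review against the billing rules before you keep collecting them.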

kube-events (job name: _arms/kube-event)

Metric

Description

aliyun_prometheus_agent_append_duration_seconds

The duration of the Prometheus agent append operations in seconds.

aliyun_prometheus_agent_job_discovery_status

The discovery status of the Prometheus agent collection jobs.

aliyun_prometheus_agent_scrape_custom_error

The number of custom collection errors of the Prometheus agent.

aliyun_prometheus_agent_scrapes_by_target_total

The total number of scrapes by the Prometheus agent per target.

aliyun_prometheus_agent_target_info

The target information of the Prometheus agent.

eventer_events_error_total

The total number of event processing errors.

eventer_events_normal_total

The total number of normal events.

eventer_events_warning_total

The total number of warning events.

eventer_exporter_duration_milliseconds_count

The count of samples for exporter duration in milliseconds.

eventer_exporter_duration_milliseconds_sum

The sum of exporter duration in milliseconds.

eventer_manager_last_time_seconds

The last operation time of the event manager in seconds.

eventer_scraper_duration_milliseconds_count

The count of samples for scraper duration in milliseconds.

eventer_scraper_duration_milliseconds_sum

The sum of scraper duration in milliseconds.

eventer_scraper_events_total_number

The total number of events scraped.

eventer_scraper_last_time_seconds

The last execution time of the scraper in seconds.

go_gc_duration_seconds

The Go GC pause duration in seconds.

go_gc_duration_seconds_count

The number of Go GC pauses.

go_gc_duration_seconds_sum

The total Go GC pause duration in seconds.

go_goroutines

The number of goroutines.

go_info

The Go-specific information.

go_memstats_alloc_bytes

The amount of memory allocated in bytes.

go_memstats_alloc_bytes_total

The cumulative amount of memory allocated in bytes.

go_memstats_buck_hash_sys_bytes

The amount of memory used by the profiling bucket hash table in bytes.

go_memstats_frees_total

The total number of heap object frees.

go_memstats_gc_cpu_fraction

The fraction of the available CPU time used by GC since the program started. The value ranges from 0 to 1.

go_memstats_gc_sys_bytes

The amount of memory used by GC in the operating system in bytes.

go_memstats_heap_alloc_bytes

The amount of heap memory allocated in bytes.

go_memstats_heap_idle_bytes

The amount of idle heap memory in bytes.

go_memstats_heap_inuse_bytes

The amount of heap memory in use in bytes.

go_memstats_heap_objects

The number of objects allocated on the heap.

go_memstats_heap_released_bytes

The amount of heap memory released in bytes.

go_memstats_heap_sys_bytes

The amount of memory allocated to the heap by the operating system in bytes.

go_memstats_last_gc_time_seconds

The time of the last GC, in seconds since the Unix epoch.

go_memstats_lookups_total

The total number of lookups.

go_memstats_mallocs_total

The total number of allocations.

go_memstats_mcache_inuse_bytes

The amount of memory in use in mcache in bytes.

go_memstats_mcache_sys_bytes

The amount of memory allocated to mcache by the operating system in bytes.

go_memstats_mspan_inuse_bytes

The amount of memory in use in mspan in bytes.

go_memstats_mspan_sys_bytes

The amount of memory allocated to mspan by the operating system in bytes.

go_memstats_next_gc_bytes

The heap size at which the next GC is triggered, in bytes.

go_memstats_other_sys_bytes

The amount of memory allocated for other purposes by the operating system in bytes.

go_memstats_stack_inuse_bytes

The amount of stack memory in use in bytes.

go_memstats_stack_sys_bytes

The amount of memory allocated to the stack by the operating system in bytes.

go_memstats_sys_bytes

The total memory allocated by the operating system in bytes.

go_threads

The number of threads.

process_cpu_seconds_total

The total process CPU seconds.

process_max_fds

The maximum number of file descriptors for the process.

process_open_fds

The number of file descriptors opened by the process.

process_resident_memory_bytes

The resident memory size of the process in bytes.

process_start_time_seconds

The start time of the process, in seconds since the Unix epoch.

process_virtual_memory_bytes

The number of virtual memory bytes for the process.

process_virtual_memory_max_bytes

The maximum number of virtual memory bytes for the process.

promhttp_metric_handler_requests_in_flight

The current number of requests being handled by the Prometheus HTTP metric handler.

promhttp_metric_handler_requests_total

The total number of requests handled by the Prometheus HTTP metric handler.

scrape_duration_seconds

The scrape duration in seconds.

scrape_samples_post_metric_relabeling

The number of scraped samples after metric relabeling.

scrape_samples_scraped

The number of scraped samples.

scrape_series_added

The number of new series added during the scrape.

up

The connectivity of metric collection.
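
The _sum and _count pairs in the table above, such as eventer_exporter_duration_milliseconds_sum and eventer_exporter_duration_milliseconds_count, are cumulative values. An average duration over a time window is therefore the increase of the sum divided by the increase of the count. The following is a minimal sketch in Python, assuming you have two samples of each series taken at the start and end of the window. The function name and the sample values are hypothetical.

```python
def mean_duration_ms(sum_start, count_start, sum_end, count_end):
    """Average duration per observation between two samples of a _sum/_count pair.

    Returns None when no observation happened in the window or when the
    values went backwards, which usually means the process restarted.
    """
    delta_count = count_end - count_start
    delta_sum = sum_end - sum_start
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count


if __name__ == "__main__":
    # Made-up samples: the sum grew by 300 ms over 10 new observations.
    print(mean_duration_ms(1200.0, 40, 1500.0, 50))
```

Dividing the raw sum by the raw count instead would give the average since the process started, which hides recent changes.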

CoreDNS (job name: arms-ack-coredns)

Metric

Description

aliyun_prometheus_agent_append_duration_seconds

The duration of the Prometheus agent append operations in seconds.

aliyun_prometheus_agent_job_discovery_status

The discovery status of the Prometheus agent collection jobs.

aliyun_prometheus_agent_scrape_custom_error

The number of custom collection errors of the Prometheus agent.

aliyun_prometheus_agent_scrapes_by_target_total

The total number of scrapes by the Prometheus agent per target.

aliyun_prometheus_agent_target_info

The target information of the Prometheus agent.

coredns_autopath_success_count_total

The total number of successful automatic path resolutions in CoreDNS.

coredns_autopath_success_total

The total number of successful automatic path resolutions in CoreDNS.

coredns_build_info

The build information of CoreDNS.

coredns_cache_drops_total

The total number of cache drops in CoreDNS.

coredns_cache_entries

The number of cache entries in CoreDNS.

coredns_cache_evictions_total

The total number of cache evictions in CoreDNS.

coredns_cache_hits_total

The total number of cache hits in CoreDNS.

coredns_cache_misses_total

The total number of cache misses in CoreDNS.

coredns_cache_requests_total

The total number of cache requests in CoreDNS.

coredns_cache_size

The size of the cache in CoreDNS.

coredns_dns_do_requests_total

The total number of DNS requests with the DNSSEC OK (DO) bit set in CoreDNS.

coredns_dns_request_count_total

The total count of DNS requests in CoreDNS.

coredns_dns_request_duration_seconds_bucket

The histogram buckets of DNS request durations in seconds in CoreDNS.

coredns_dns_request_duration_seconds_count

The count of DNS request durations in seconds in CoreDNS.

coredns_dns_request_duration_seconds_sum

The sum of DNS request durations in seconds in CoreDNS.

coredns_dns_request_size_bytes_bucket

The histogram buckets of DNS request sizes in bytes in CoreDNS.

coredns_dns_request_size_bytes_count

The count of DNS request sizes in bytes in CoreDNS.

coredns_dns_request_size_bytes_sum

The sum of DNS request sizes in bytes in CoreDNS.

coredns_dns_request_type_count_total

The total count of DNS request types in CoreDNS.

coredns_dns_requests_total

The total number of DNS requests in CoreDNS.

coredns_dns_response_rcode_count_total

The total count of DNS response codes in CoreDNS.

coredns_dns_response_size_bytes_bucket

The histogram buckets of DNS response sizes in bytes in CoreDNS.

coredns_dns_response_size_bytes_count

The count of DNS response sizes in bytes in CoreDNS.

coredns_dns_response_size_bytes_sum

The sum of DNS response sizes in bytes in CoreDNS.

coredns_dns_responses_total

The total number of DNS responses in CoreDNS.

coredns_forward_conn_cache_hits_total

The total number of cache hits for forwarded connections in CoreDNS.

coredns_forward_conn_cache_misses_total

The total number of cache misses for forwarded connections in CoreDNS.

coredns_forward_healthcheck_broken_total

The total number of times that all upstreams of forwarded connections were unhealthy in CoreDNS.

coredns_forward_healthcheck_failure_count_total

The total count of health check failures for forwarded connections in CoreDNS.

coredns_forward_healthcheck_failures_total

The total number of health check failures for forwarded connections in CoreDNS.

coredns_forward_max_concurrent_rejects_total

The total number of forwarded queries rejected because the maximum number of concurrent queries was reached in CoreDNS.

coredns_forward_request_count_total

The total count of forwarded requests in CoreDNS.

coredns_forward_request_duration_seconds_bucket

The histogram buckets of forwarded request durations in seconds in CoreDNS.

coredns_forward_request_duration_seconds_count

The count of forwarded request durations in seconds in CoreDNS.

coredns_forward_request_duration_seconds_sum

The sum of forwarded request durations in seconds in CoreDNS.

coredns_forward_requests_total

The total number of forwarded requests in CoreDNS.

coredns_forward_response_rcode_count_total

The total count of forwarded response codes in CoreDNS.

coredns_forward_responses_total

The total number of forwarded responses in CoreDNS.

coredns_forward_sockets_open

The number of open sockets for forwarded connections in CoreDNS.

coredns_health_request_duration_seconds_bucket

The histogram buckets of health check request durations in seconds in CoreDNS.

coredns_health_request_duration_seconds_count

The count of health check request durations in seconds in CoreDNS.

coredns_health_request_duration_seconds_sum

The sum of health check request durations in seconds in CoreDNS.

coredns_health_request_failures_total

The total number of health check request failures in CoreDNS.

coredns_hosts_entries

The number of host entries in CoreDNS.

coredns_hosts_reload_timestamp_seconds

The timestamp of the last host reload in CoreDNS in seconds.

coredns_kubernetes_dns_programming_duration_seconds_bucket

The histogram buckets of Kubernetes DNS programming durations in seconds in CoreDNS.

coredns_kubernetes_dns_programming_duration_seconds_count

The count of Kubernetes DNS programming durations in seconds in CoreDNS.

coredns_kubernetes_dns_programming_duration_seconds_sum

The sum of Kubernetes DNS programming durations in seconds in CoreDNS.

coredns_local_localhost_requests_total

The total number of localhost requests in CoreDNS.

coredns_panic_count_total

The total number of panics in CoreDNS.

coredns_panics_total

The total count of panics in CoreDNS.

coredns_plugin_enabled

The enabling status of CoreDNS plugins.

coredns_reload_failed_total

The total number of reload failures in CoreDNS.

coredns_reload_version_info

The version information of CoreDNS reloads.

coredns_template_matches_total

The total number of template matches in CoreDNS.

go_gc_duration_seconds

The Go GC pause duration in seconds.

go_gc_duration_seconds_count

The number of Go GC pauses.

go_gc_duration_seconds_sum

The total Go GC pause duration in seconds.

go_goroutines

The number of goroutines.

go_info

The Go-specific information.

go_memstats_alloc_bytes

The amount of memory allocated in bytes.

go_memstats_alloc_bytes_total

The cumulative amount of memory allocated in bytes.

go_memstats_buck_hash_sys_bytes

The amount of memory used by the profiling bucket hash table in bytes.

go_memstats_frees_total

The total number of heap object frees.

go_memstats_gc_cpu_fraction

The fraction of the available CPU time used by GC since the program started. The value ranges from 0 to 1.

go_memstats_gc_sys_bytes

The amount of memory used by GC in the operating system in bytes.

go_memstats_heap_alloc_bytes

The amount of heap memory allocated in bytes.

go_memstats_heap_idle_bytes

The amount of idle heap memory in bytes.

go_memstats_heap_inuse_bytes

The amount of heap memory in use in bytes.

go_memstats_heap_objects

The number of objects allocated on the heap.

go_memstats_heap_released_bytes

The amount of heap memory released in bytes.

go_memstats_heap_sys_bytes

The amount of memory allocated to the heap by the operating system in bytes.

go_memstats_last_gc_time_seconds

The time of the last GC, in seconds since the Unix epoch.

go_memstats_lookups_total

The total number of lookups.

go_memstats_mallocs_total

The total number of allocations.

go_memstats_mcache_inuse_bytes

The amount of memory in use in mcache in bytes.

go_memstats_mcache_sys_bytes

The amount of memory allocated to mcache by the operating system in bytes.

go_memstats_mspan_inuse_bytes

The amount of memory in use in mspan in bytes.

go_memstats_mspan_sys_bytes

The amount of memory allocated to mspan by the operating system in bytes.

go_memstats_next_gc_bytes

The heap size at which the next GC is triggered, in bytes.

go_memstats_other_sys_bytes

The amount of memory allocated for other purposes by the operating system in bytes.

go_memstats_stack_inuse_bytes

The amount of stack memory in use in bytes.

go_memstats_stack_sys_bytes

The amount of memory allocated to the stack by the operating system in bytes.

go_memstats_sys_bytes

The total memory allocated by the operating system in bytes.

go_threads

The number of threads.

process_cpu_seconds_total

The total process CPU seconds.

process_max_fds

The maximum number of file descriptors for the process.

process_open_fds

The number of file descriptors opened by the process.

process_resident_memory_bytes

The resident memory size of the process in bytes.

process_start_time_seconds

The start time of the process, in seconds since the Unix epoch.

process_virtual_memory_bytes

The number of virtual memory bytes for the process.

process_virtual_memory_max_bytes

The maximum number of virtual memory bytes for the process.

scrape_duration_seconds

The scrape duration in seconds.

scrape_samples_post_metric_relabeling

The number of scraped samples after metric relabeling.

scrape_samples_scraped

The number of scraped samples.

scrape_series_added

The number of new series added during the scrape.

up

The connectivity of metric collection.
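
The _bucket series in the table above, such as coredns_dns_request_duration_seconds_bucket, hold cumulative counts per upper bound (the le label) rather than ready-made percentiles. A percentile has to be estimated from those counts. The following is a minimal sketch in Python that uses linear interpolation inside a bucket, in the spirit of the PromQL histogram_quantile function. The function name and the bucket values are hypothetical, and the sketch assumes that the lowest bucket starts at 0.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) tuples sorted by
    upper_bound, where the last tuple is the +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        return math.nan
    total = buckets[-1][1]
    if total <= 0:
        return math.nan

    rank = q * total                    # position of the wanted observation
    prev_bound, prev_count = 0.0, 0.0   # assumed lower edge of the first bucket
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper) or count == prev_count:
                # No finite upper edge (or an empty bucket): fall back to the
                # highest finite bound seen so far.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


if __name__ == "__main__":
    # Made-up cumulative counts for four buckets (le = 1 ms, 10 ms, 100 ms, +Inf).
    sample = [(0.001, 10), (0.01, 60), (0.1, 95), (math.inf, 100)]
    print(estimate_quantile(0.5, sample))
    print(estimate_quantile(0.99, sample))
```

The estimate is only as precise as the bucket boundaries allow, so a quantile that falls into the +Inf bucket is reported as the largest finite bound.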

CSI clusters (job name: k8s-csi-cluster-pv)

Metric

Description

alibaba_cloud_storage_operator_build_info

The build information about the Alibaba Cloud storage operator.

aliyun_prometheus_agent_append_duration_seconds

The duration of the Prometheus agent append operations in seconds.

aliyun_prometheus_agent_job_discovery_status

The discovery status of the Prometheus agent collection jobs.

aliyun_prometheus_agent_scrape_custom_error

The number of custom collection errors of the Prometheus agent.

aliyun_prometheus_agent_scrapes_by_target_total

The total number of scrapes by the Prometheus agent per target.

aliyun_prometheus_agent_target_info

The target information of the Prometheus agent.

cluster_pv_detail_num_total

The total number of PV detail entries in the cluster.

cluster_pv_status_num_total

The total number of PV states in the cluster.

cluster_pvc_detail_num_total

The total number of PVC detail entries in the cluster.

cluster_pvc_status_num_total

The total number of PVC states in the cluster.

cluster_scrape_collector_duration_seconds

The duration of the cluster scrape collector in seconds.

cluster_scrape_collector_success

The number of successful scrapes by the cluster collector.

scrape_duration_seconds

The scrape duration in seconds.

scrape_samples_post_metric_relabeling

The number of scraped samples after metric relabeling.

scrape_samples_scraped

The number of scraped samples.

scrape_series_added

The number of new series added during the scrape.

up

The connectivity of metric collection.

CSI nodes (job name: k8s-csi-node-pv)

Metric

Description

alibaba_cloud_csi_driver_build_info

The build information about the Container Storage Interface (CSI) driver.

aliyun_prometheus_agent_append_duration_seconds

The duration of the Prometheus agent append operations in seconds.

aliyun_prometheus_agent_job_discovery_status

The discovery status of the Prometheus agent collection jobs.

aliyun_prometheus_agent_scrape_custom_error

The number of custom collection errors of the Prometheus agent.

aliyun_prometheus_agent_scrapes_by_target_total

The total number of scrapes by the Prometheus agent per target.

aliyun_prometheus_agent_target_info

The target information of the Prometheus agent.

cluster_scrape_collector_duration_seconds

The duration of the cluster scrape collector in seconds.

cluster_scrape_collector_success

The number of successful scrapes by the cluster collector.

container_fs_available_bytes

The available bytes of the container file system.

container_fs_inodes_free

The number of available inodes in the container file system.

container_fs_inodes_total

The total number of inodes in the container file system.

container_fs_inodes_used

The number of used inodes in the container file system.

container_fs_limit_bytes

The limit of bytes in the container file system.

container_fs_usage_bytes

The used bytes in the container file system.

ephemeral_storage_pod_available_bytes

The available ephemeral storage of the Pod in bytes.

ephemeral_storage_pod_inodes_free

The number of available inodes in the ephemeral storage of the Pod.

ephemeral_storage_pod_inodes_total

The total number of inodes in the ephemeral storage of the Pod.

ephemeral_storage_pod_inodes_used

The number of used inodes in the ephemeral storage of the Pod.

ephemeral_storage_pod_limit_bytes

The ephemeral storage limit of the Pod in bytes.

ephemeral_storage_pod_usage_bytes

The used ephemeral storage of the Pod in bytes.

node_volume_backend_posix_access_total_counter

The total counter for Portable Operating System Interface (POSIX) access to the node volume backend.

node_volume_backend_posix_getattr_total_counter

The total counter for POSIX getattr calls to the node volume backend.

node_volume_backend_posix_getmode_total_counter

The total counter for POSIX getmode operations to the node volume backend.

node_volume_backend_posix_link_total_counter

The total counter for POSIX link operations to the node volume backend.

node_volume_backend_posix_lookup_total_counter

The total counter for POSIX lookup operations to the node volume backend.

node_volume_backend_posix_mknod_total_counter

The total counter for POSIX mknod operations to the node volume backend.

node_volume_backend_posix_readdir_total_counter

The total counter for POSIX readdir operations to the node volume backend.

node_volume_backend_posix_readlink_total_counter

The total counter for POSIX readlink operations to the node volume backend.

node_volume_backend_posix_remove_total_counter

The total counter for POSIX remove operations to the node volume backend.

node_volume_backend_posix_rename_total_counter

The total counter for POSIX rename operations to the node volume backend.

node_volume_backend_posix_setattr_total_counter

The total counter for POSIX setattr operations to the node volume backend.

node_volume_backend_posix_statfs_total_counter

The total counter for POSIX statfs operations to the node volume backend.

node_volume_backend_read_bytes_total_counter

The total counter for bytes read from the node volume backend.

node_volume_backend_read_completed_total_counter

The total number of completed read requests to the node volume backend.

node_volume_backend_read_time_milliseconds_total_counter

The total milliseconds spent on reads to the node volume backend.

node_volume_backend_write_bytes_total_counter

The total number of bytes written to the node volume backend.

node_volume_backend_write_completed_total_counter

The total number of completed write requests to the node volume backend.

node_volume_backend_write_time_milliseconds_total_counter

The total milliseconds spent on writes to the node volume backend.

node_volume_capacity_bytes_available

The available capacity of the node volume in bytes.

node_volume_capacity_bytes_available_counter

The available capacity of the node volume in bytes.

node_volume_capacity_bytes_total

The total capacity of the node volume in bytes.

node_volume_capacity_bytes_total_counter

The total capacity of the node volume in bytes (counter).

node_volume_capacity_bytes_used

The used capacity of the node volume in bytes.

node_volume_capacity_bytes_used_counter

The used capacity of the node volume in bytes (counter).

node_volume_hot_spot_head_file_top

The top hot spot files in the node volume.

node_volume_hot_spot_read_file_top

The top files read in the node volume hot spots.

node_volume_hot_spot_write_file_top

The top files written in the node volume hot spots.

node_volume_inode_bytes_available_counter

The counter for available inode bytes in the node volume.

node_volume_inode_bytes_total_counter

The counter for total inode bytes in the node volume.

node_volume_inode_bytes_used_counter

The counter for used inode bytes in the node volume.

node_volume_inodes_available

The number of available inodes in the node volume.

node_volume_inodes_total

The total number of inodes in the node volume.

node_volume_inodes_used

The number of used inodes in the node volume.

node_volume_io_now

The current I/O count in the node volume.

node_volume_io_time_seconds_total

The total seconds spent on I/O in the node volume.

node_volume_oss_delete_object_total_counter

The total counter for Object Storage Service (OSS) object deletions in the node volume.

node_volume_oss_get_object_total_counter

The total counter for OSS object GET requests in the node volume.

node_volume_oss_head_object_total_counter

The total counter for OSS object HEAD requests in the node volume.

node_volume_oss_post_object_total_counter

The total counter for OSS object POSTs in the node volume.

node_volume_oss_put_object_total_counter

The total counter for OSS object PUTs in the node volume.

node_volume_posix_access_total_counter

The total counter for POSIX accesses in the node volume.

node_volume_posix_chmod_total_counter

The total counter for POSIX chmod operations in the node volume.

node_volume_posix_chown_total_counter

The total counter for POSIX chown operations in the node volume.

node_volume_posix_create_total_counter

The total counter for POSIX creations in the node volume.

node_volume_posix_flush_total_counter

The total counter for POSIX flushes in the node volume.

node_volume_posix_fsync_total_counter

The total counter for POSIX fsyncs in the node volume.

node_volume_posix_mkdir_total_counter

The total counter for POSIX mkdir operations in the node volume.

node_volume_posix_open_total_counter

The total counter for POSIX opens in the node volume.

node_volume_posix_opendir_total_counter

The total counter for POSIX opendir operations in the node volume.

node_volume_posix_read_total_counter

The total counter for POSIX reads in the node volume.

node_volume_posix_readdir_total_counter

The total counter for POSIX readdir operations in the node volume.

node_volume_posix_release_total_counter

The total counter for POSIX releases in the node volume.

node_volume_posix_rename_total_counter

The total counter for POSIX renames in the node volume.

node_volume_posix_rmdir_total_counter

The total counter for POSIX rmdir operations in the node volume.

node_volume_posix_truncate_total_counter

The total counter for POSIX truncate operations in the node volume.

node_volume_posix_write_total_counter

The total counter for POSIX writes in the node volume.

node_volume_read_bytes_total

The total number of bytes read from the node volume.

node_volume_read_bytes_total_counter

The total number of bytes read from the node volume (counter).

node_volume_read_completed_total

The total number of completed read requests to the node volume.

node_volume_read_completed_total_counter

The total number of completed read requests to the node volume (counter).

node_volume_read_merged_total

The total number of merged read operations in the node volume.

node_volume_read_queue_time_milliseconds_total

The total milliseconds spent on read queue in the node volume.

node_volume_read_rtt_time_milliseconds_total

The total milliseconds spent on read round-trip time in the node volume.

node_volume_read_sent_bytes_total

The total number of bytes sent during reads in the node volume.

node_volume_read_time_milliseconds_total

The total milliseconds spent on reads in the node volume.

node_volume_read_time_milliseconds_total_counter

The total milliseconds spent on reads in the node volume (counter).

node_volume_read_timeouts_total

The total number of read timeouts in the node volume.

node_volume_read_transmissions_total

The total number of read transmissions in the node volume.

node_volume_vg_free_bytes

The free bytes in the volume group (VG) of the node volume.

node_volume_vg_size_bytes

The total bytes in the VG of the node volume.

node_volume_write_bytes_total

The total number of bytes written to the node volume.

node_volume_write_bytes_total_counter

The total number of bytes written to the node volume (counter).

node_volume_write_completed_total

The total number of completed write requests to the node volume.

node_volume_write_completed_total_counter

The total number of completed write requests to the node volume (counter).

node_volume_write_merged_total

The total number of merged write operations in the node volume.

node_volume_write_queue_time_milliseconds_total

The total milliseconds spent on write queue in the node volume.

node_volume_write_recv_bytes_total

The total number of bytes received during writes in the node volume.

node_volume_write_rtt_time_milliseconds_total

The total milliseconds spent on write round-trip time in the node volume.

node_volume_write_time_milliseconds_total

The total milliseconds spent on writes in the node volume.

node_volume_write_time_milliseconds_total_counter

The total milliseconds spent on writes in the node volume (counter).

node_volume_write_timeouts_total

The total number of write timeouts in the node volume.

node_volume_write_transmissions_total

The total number of write transmissions in the node volume.

scrape_duration_seconds

The scrape duration in seconds.

scrape_samples_post_metric_relabeling

The number of scraped samples after metric relabeling.

scrape_samples_scraped

The number of scraped samples.

scrape_series_added

The number of new series added during the scrape.

up

The connectivity of metric collection.
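By Prometheus naming convention, metrics whose names end in _total (for example, node_volume_read_bytes_total) are cumulative counters, so dashboards usually show their per-second rate rather than the raw value. The following is a minimal sketch of that calculation, assuming a counter that restarts from zero when it resets. The sample values and the function names counter_increase and per_second_rate are hypothetical. Unlike the PromQL rate() function, the sketch does not extrapolate to the window boundaries.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: List[Sample]) -> float:
    """Return the total increase of a counter across consecutive samples."""
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # A drop is treated as a counter reset (assumption: the counter
            # restarted from zero), so the whole current value is new.
            increase += curr
    return increase


def per_second_rate(samples: List[Sample]) -> float:
    """Return the average per-second increase over the sampled window."""
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return counter_increase(samples) / elapsed


# Hypothetical samples of node_volume_read_bytes_total, scraped every 15 s.
# The drop from 4000 to 500 models a counter reset.
samples = [(0.0, 1000.0), (15.0, 4000.0), (30.0, 500.0), (45.0, 2000.0)]
rate = per_second_rate(samples)
print(f"read throughput: {rate:.2f} bytes/s")
```

In a real dashboard, the equivalent query would typically use rate() over a range vector instead of client-side arithmetic.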

GPU-Exporter (job name: gpu-exporter)

Metric

Description

DCGM_CUSTOM_ALLOCATE_MODE

The mode in which the node runs. A value of 0 indicates that no GPU Pods are running on the node. A value of 1 indicates that the GPU Pods on the current node run in an exclusive GPU mode. A value of 2 indicates that the GPU Pods on the current node run in a shared GPU mode.

DCGM_CUSTOM_CONTAINER_CP_ALLOCATED

The ratio of the GPU computing power allocated to the container to the total computing power of the GPU. The value ranges from 0 to 1. In exclusive GPU mode or in shared GPU mode in which the container requests only GPU memory, the value of this metric is 0, which indicates that the allocation of GPU computing power is unlimited. For example, if a GPU provides a total of 100 compute units (CUs) of GPU computing power and allocates 30 CUs to a container, the ratio of the GPU computing power allocated to the container is calculated by using the following formula: 30/100 = 0.3.

DCGM_CUSTOM_CONTAINER_MEM_ALLOCATED

The amount of GPU memory allocated to the container.

DCGM_CUSTOM_DEV_FB_ALLOCATED

The ratio of the allocated GPU memory to the total memory of the GPU. The value ranges from 0 to 1.

DCGM_CUSTOM_DEV_FB_TOTAL

The total memory of the GPU.

DCGM_CUSTOM_ILLEGAL_PROCESS_DECODE_UTIL

The illegal process decode utilization.

DCGM_CUSTOM_ILLEGAL_PROCESS_ENCODE_UTIL

The illegal process encode utilization.

DCGM_CUSTOM_ILLEGAL_PROCESS_MEM_COPY_UTIL

The memory copy utilization of illegal processes.

DCGM_CUSTOM_ILLEGAL_PROCESS_MEM_USED

The memory used by illegal processes.

DCGM_CUSTOM_ILLEGAL_PROCESS_SM_UTIL

The SM utilization of illegal processes.

DCGM_CUSTOM_PROCESS_DECODE_UTIL

The decoder utilization of GPU threads.

DCGM_CUSTOM_PROCESS_ENCODE_UTIL

The encoder utilization of GPU threads.

DCGM_CUSTOM_PROCESS_MEM_COPY_UTIL

The memory copy utilization of GPU threads.

DCGM_CUSTOM_PROCESS_MEM_USED

The amount of GPU memory used by GPU threads.

DCGM_CUSTOM_PROCESS_SM_UTIL

The SM utilization of GPU threads.

DCGM_FI_DEV_APP_MEM_CLOCK

The memory application clock speed.

DCGM_FI_DEV_APP_SM_CLOCK

The SM application clock speed.

DCGM_FI_DEV_BAR1_FREE

The amount of free Base Address Register 1 (BAR1) memory.

DCGM_FI_DEV_BAR1_TOTAL

The total size of device BAR1.

DCGM_FI_DEV_BAR1_USED

The amount of used BAR1 memory.

DCGM_FI_DEV_BOARD_LIMIT_VIOLATION

The time of the violation due to board limitations.

DCGM_FI_DEV_CLOCK_THROTTLE_REASONS

The reasons for clock throttling.

DCGM_FI_DEV_COUNT

The number of devices.

DCGM_FI_DEV_DEC_UTIL

The decoder utilization.

DCGM_FI_DEV_ENC_UTIL

The encoder utilization.

DCGM_FI_DEV_FB_FREE

The amount of free frame buffer memory.

DCGM_FI_DEV_FB_USED

The amount of used frame buffer memory. The value of this metric is the same as the value of Memory-Usage returned by the nvidia-smi command.

DCGM_FI_DEV_GPU_TEMP

The GPU temperature.

DCGM_FI_DEV_GPU_UTIL

The GPU utilization within a cycle of 1 second or 1/6 second. The cycle varies based on the GPU model. A cycle is a period of time during which one or more kernel functions remain active. This metric only indicates that one or more kernel functions are occupying GPU resources. The metric does not display detailed GPU usage information.

DCGM_FI_DEV_LOW_UTIL_VIOLATION

The time of the violation due to low utilization.

DCGM_FI_DEV_MEM_CLOCK

The memory clock speed.

DCGM_FI_DEV_MEM_COPY_UTIL

The memory bandwidth utilization. For example, the maximum memory bandwidth of NVIDIA V100 is 900 GB/s. If the memory bandwidth used is 450 GB/s, the memory bandwidth utilization is 50%.

DCGM_FI_DEV_MEMORY_TEMP

The memory temperature.

DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL

The total NVLink bandwidth.

DCGM_FI_DEV_PCIE_REPLAY_COUNTER

The PCIe replay counter.

DCGM_FI_DEV_POWER_USAGE

The power usage.

DCGM_FI_DEV_POWER_VIOLATION

The time of the violation due to power limitations.

DCGM_FI_DEV_PSTATE

The performance state (P-state) of the device.

DCGM_FI_DEV_RELIABILITY_VIOLATION

The time of the violation due to board reliability.

DCGM_FI_DEV_RETIRED_DBE

The number of pages retired due to double bit errors.

DCGM_FI_DEV_RETIRED_PENDING

The number of pages to be retired. These pages are marked as unavailable due to errors in the GPU memory.

DCGM_FI_DEV_RETIRED_SBE

The number of pages retired due to single bit errors.

DCGM_FI_DEV_SM_CLOCK

The SM clock speed.

DCGM_FI_DEV_SYNC_BOOST_VIOLATION

The time of the violation due to sync boost limitations.

DCGM_FI_DEV_THERMAL_VIOLATION

The time of the violation due to thermal limitations.

DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION

The total energy consumed since the driver was last loaded.

DCGM_FI_DEV_VIDEO_CLOCK

The video clock speed.

DCGM_FI_DEV_XID_ERRORS

The last XID error that occurred within a period of time.

DCGM_FI_PROF_DRAM_ACTIVE

The fraction of cycles during which data was sent to or received from device memory.

The value is an average value within a time interval rather than an instantaneous value.

A larger value of this metric indicates higher device memory utilization.

A value of 1 (100%) means that a DRAM command was executed every cycle throughout the entire time interval. In practice, the peak value is about 0.8 (80%).

If the value of this metric is 0.2 (20%), 20% of the cycles within the time interval are spent reading from or writing to device memory.

DCGM_FI_PROF_GR_ENGINE_ACTIVE

The percentage of time that the Graphics or Compute engines were active within a time interval. The value indicates the average across all Graphics and Compute engines. A Graphics or Compute engine is considered active when a Graphics or Compute context is bound to a thread and the Graphics or Compute context is in a busy state.

DCGM_FI_PROF_NVLINK_RX_BYTES

The TX rate of NVLink and the RX rate of NVLink. The bytes transmitted or received exclude the header.

The value is an average value within a time interval rather than an instantaneous value.

For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum NVLink Gen2 bandwidth is 25 GB/s per direction per link.

DCGM_FI_PROF_NVLINK_TX_BYTES

The total number of bytes sent through NVLink.

DCGM_FI_PROF_PCIE_RX_BYTES

The TX rate of PCIe and the RX rate of PCIe. The bytes transmitted or received include both the header and payload.

The value is an average value within a time interval rather than an instantaneous value.

For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum PCIe Gen3 bandwidth is 985 MB/s per lane.

DCGM_FI_PROF_PCIE_TX_BYTES

The TX rate of PCIe and the RX rate of PCIe. The bytes transmitted or received include both the header and payload.

The value is an average value within a time interval rather than an instantaneous value.

For example, if 1 GB of data is transmitted within 1 second, the TX rate is 1 GB/s regardless of whether the transmission occurs at a consistent rate or in bursts. Theoretically, the maximum PCIe Gen3 bandwidth is 985 MB/s per lane.

DCGM_FI_PROF_PIPE_FP16_ACTIVE

The fraction of cycles during which the FP16 (half-precision) pipeline was active.

The value is an average value within a time interval rather than an instantaneous value.

A higher value indicates higher utilization of the FP16 cores.

A value of 1 (100%) means that an FP16 instruction was executed every two cycles throughout the entire time interval (for example, on Volta-type cards).

If the value of this metric is 0.2 (20%), one of the following conditions may exist:

The FP16 core utilization of 20% of the SMs within the time interval is 100%.

The FP16 core utilization of all SMs within the time interval is 20%.

The FP16 core utilization of all SMs within 20% of the time interval is 100%.

Other conditions.

DCGM_FI_PROF_PIPE_FP32_ACTIVE

The fraction of cycles during which the FMA (Fused Multiply-Add) pipeline was active. The FMA operations include both FP32 (single-precision) and integer operations.

The value is an average value within a time interval rather than an instantaneous value.

A higher value indicates higher utilization of the FP32 cores.

A value of 1 (100%) means that an FP32 instruction was executed every two cycles throughout the entire time interval (for example, on Volta-type cards).

If the value of this metric is 0.2 (20%), one of the following conditions may exist:

The FP32 core utilization of 20% of the SMs within the time interval is 100%.

The FP32 core utilization of all SMs within the time interval is 20%.

The FP32 core utilization of all SMs within 20% of the time interval is 100%.

Other conditions.

DCGM_FI_PROF_PIPE_FP64_ACTIVE

The fraction of cycles during which the FP64 (double-precision) pipeline was active.

The value is an average value within a time interval rather than an instantaneous value.

A higher value indicates higher utilization of the FP64 cores.

A value of 1 (100%) means that an FP64 instruction was executed every four cycles throughout the entire time interval (for example, on Volta-type cards).

If the value of this metric is 0.2 (20%), one of the following conditions may exist:

The FP64 core utilization of 20% of the SMs within the time interval is 100%.

The FP64 core utilization of all SMs within the time interval is 20%.

The FP64 core utilization of all SMs within 20% of the time interval is 100%.

Other conditions.

DCGM_FI_PROF_PIPE_TENSOR_ACTIVE

The fraction of cycles during which the Tensor (HMMA/IMMA) pipeline was active.

The value is an average value within a time interval rather than an instantaneous value.

A larger value of this metric indicates higher tensor core utilization.

If the value is 1 (100%), a Tensor instruction is issued every cycle within the entire interval. One instruction completes in two cycles.

If the value of this metric is 0.2 (20%), one of the following conditions may exist:

The tensor core utilization of 20% of the SMs within the time interval is 100%.

The tensor core utilization of all SMs within the time interval is 20%.

The tensor core utilization of all SMs within 20% of the time interval is 100%.

Other conditions.

DCGM_FI_PROF_SM_ACTIVE

The fraction of cycles during which at least one warp is active on an SM, averaged over all SMs. The value does not depend on the number of warps in each thread block.

A warp is considered active once it is scheduled and resources are allocated to it. An active warp is not necessarily computing. For example, it may be waiting for a memory request.

A value below 0.5 indicates low GPU utilization. For high GPU utilization, keep the value above 0.8.

For example, assume that a GPU has N SMs:

If a kernel function uses N thread blocks and runs on all SMs for the entire time interval, the value of this metric is 1 (100%).

If a kernel function uses N/5 thread blocks and runs for the entire time interval, the value of this metric is 0.2 (20%).

If a kernel function uses N thread blocks and runs for only 20% of the time interval, the value of this metric is 0.2 (20%).

DCGM_FI_PROF_SM_OCCUPANCY

The ratio of warps resident on an SM to the maximum number of warps that can reside on that SM, averaged over all SMs within a time interval. A higher occupancy does not necessarily indicate higher GPU utilization. Only in workloads where GPU memory bandwidth is the limiting factor (see DCGM_FI_PROF_DRAM_ACTIVE) does a higher occupancy indicate more effective GPU utilization.

nvidia_gpu_allocated_num_devices

The number of allocated GPU devices. Warning: This metric will be deprecated in the future.

nvidia_gpu_memory_allocated_bytes

The full memory of GPU devices. Warning: This metric will be deprecated in the future and replaced by DCGM_CUSTOM_DEV_FB_ALLOCATED.

nvidia_gpu_sharing_memory

The memory allocated for GPU sharing. Warning: This metric will be deprecated in the future and replaced by DCGM_CUSTOM_DEV_FB_ALLOCATED.

up

The connectivity of metric collection.
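The DCGM_FI_PROF_PIPE_* and DCGM_FI_PROF_SM_ACTIVE descriptions above note that a single value such as 0.2 can result from very different workloads, because the value is averaged over both SMs and time. The following sketch illustrates this with a simplified model in which each SM is either active (1) or idle (0) in each cycle. The grid model, the constants, and the function name active_fraction are assumptions made for illustration and do not reflect how DCGM samples the hardware.

```python
from typing import List


def active_fraction(activity: List[List[int]]) -> float:
    """Return the mean of a per-SM, per-cycle grid of 0/1 activity flags."""
    total_slots = sum(len(row) for row in activity)
    if total_slots == 0:
        raise ValueError("the activity grid is empty")
    return sum(sum(row) for row in activity) / total_slots


NUM_SMS = 10       # hypothetical number of SMs
NUM_CYCLES = 100   # hypothetical number of cycles in the time interval

# Case A: 20% of the SMs are active for the entire interval.
case_a = [[1] * NUM_CYCLES if sm < 2 else [0] * NUM_CYCLES
          for sm in range(NUM_SMS)]

# Case B: all SMs are active, but only during the first 20% of the interval.
case_b = [[1 if cycle < 20 else 0 for cycle in range(NUM_CYCLES)]
          for _ in range(NUM_SMS)]

# Case C: all SMs are active in one out of every five cycles.
case_c = [[1 if cycle % 5 == 0 else 0 for cycle in range(NUM_CYCLES)]
          for _ in range(NUM_SMS)]

for name, grid in (("A", case_a), ("B", case_b), ("C", case_c)):
    print(f"case {name}: {active_fraction(grid):.2f}")
```

All three cases yield the same averaged value, which is why these metrics are best read together with other signals, such as DCGM_FI_PROF_SM_OCCUPANCY, rather than in isolation.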

Cost-Exporter (job name: alibaba-cloud-cost-exporter)

Metric

Description

deducted_by_cash_coupons

The bill discount amount for the current instance.

deducted_by_prepaid_card

The prepaid card discount amount for the current instance.

invoice_discount

The discount amount for the current instance.

list_price

The unit price for the current instance.

node_current_price

The actual price of the current node.

node_payAsYouGo_price

The pay-as-you-go price of the current node.

node_payByPeriod_price

The subscription price of the current node.

node_spot_price

The spot price of the current node.

outstanding_amount

The outstanding amount for the current instance.

payent_amount

The cash payment amount for the current instance.

pretax_amount

The payable amount for the current instance.

pretax_gross_amount

The original amount for the current instance.

usage

The resource usage for the current instance.

up

The connectivity of metric collection.

Ingress (job name: arms-ack-ingress)

Metric

Description

aliyun_prometheus_agent_append_duration_seconds

The duration of the Prometheus agent append operations in seconds.

aliyun_prometheus_agent_job_discovery_status

The discovery status of the Prometheus agent collection jobs.

aliyun_prometheus_agent_scrape_custom_error

The number of custom collection errors of the Prometheus agent.

aliyun_prometheus_agent_scrapes_by_target_total

The total number of scrapes by the Prometheus agent per target.

aliyun_prometheus_agent_target_info

The target information of the Prometheus agent.

go_cgo_go_to_c_calls_calls_total

The total number of C function calls made by cgo.

go_gc_cycles_automatic_gc_cycles_total

The total number of automatic GC cycles.

go_gc_cycles_forced_gc_cycles_total

The total number of forced GC cycles.

go_gc_cycles_total_gc_cycles_total

The total number of GC cycles.

go_gc_duration_seconds

The Go GC pause duration in seconds.

go_gc_duration_seconds_count

The count of Go GC pause duration observations.

go_gc_duration_seconds_sum

The total Go GC pause duration in seconds.

go_gc_heap_allocs_by_size_bytes_total_bucket

The distribution of Go GC heap allocations classified by size in bytes.

go_gc_heap_allocs_by_size_bytes_total_count

The count of Go GC heap allocations classified by size in bytes.

go_gc_heap_allocs_by_size_bytes_total_sum

The sum of Go GC heap allocations classified by size in bytes.

go_gc_heap_allocs_bytes_total

The total bytes allocated in the Go GC heap.

go_gc_heap_allocs_objects_total

The total objects allocated in the Go GC heap.

go_gc_heap_frees_by_size_bytes_total_bucket

The distribution of Go GC heap releases classified by size in bytes.

go_gc_heap_frees_by_size_bytes_total_count

The count of Go GC heap releases classified by size in bytes.

go_gc_heap_frees_by_size_bytes_total_sum

The sum of Go GC heap releases classified by size in bytes.

go_gc_heap_frees_bytes_total

The total bytes released in the Go GC heap.

go_gc_heap_frees_objects_total

The total objects released in the Go GC heap.

go_gc_heap_goal_bytes

The target size of the Go GC heap in bytes.

go_gc_heap_objects_objects

The number of objects in the Go GC heap.

go_gc_heap_tiny_allocs_objects_total

The total number of small object allocations in the Go GC.

go_gc_limiter_last_enabled_gc_cycle

The last enabled GC cycle.

go_gc_pauses_seconds_total_bucket

The distribution of Go GC pause time in seconds.

go_gc_pauses_seconds_total_count

The count of Go GC pause time in seconds.

go_gc_pauses_seconds_total_sum

The sum of Go GC pause time in seconds.

go_gc_stack_starting_size_bytes

The starting size of the Go GC stack in bytes.

go_goroutines

The number of goroutines.

go_info

The Go-specific information.

go_memory_classes_heap_free_bytes

The amount of idle heap memory in bytes.

go_memory_classes_heap_objects_bytes

The amount of heap memory occupied by objects in bytes.

go_memory_classes_heap_released_bytes

The amount of heap memory released in bytes.

go_memory_classes_heap_stacks_bytes

The amount of memory reserved for the stack in bytes.

go_memory_classes_heap_unused_bytes

The amount of heap memory not used in bytes.

go_memory_classes_metadata_mcache_free_bytes

The amount of idle memory in mcache in bytes.

go_memory_classes_metadata_mcache_inuse_bytes

The amount of memory in use in mcache in bytes.

go_memory_classes_metadata_mspan_free_bytes

The amount of idle memory in mspan in bytes.

go_memory_classes_metadata_mspan_inuse_bytes

The amount of memory in use in mspan in bytes.

go_memory_classes_metadata_other_bytes

The amount of memory occupied by other metadata in bytes.

go_memory_classes_os_stacks_bytes

The amount of memory reserved for the operating system stack in bytes.

go_memory_classes_other_bytes

The amount of memory used for other purposes in bytes.

go_memory_classes_profiling_buckets_bytes

The bytes used by profiling buckets.

go_memory_classes_total_bytes

The total memory in bytes.

go_memstats_alloc_bytes

The amount of memory allocated in bytes.

go_memstats_alloc_bytes_total

The cumulative amount of memory allocated in bytes.

go_memstats_buck_hash_sys_bytes

The amount of memory used by the profiling bucket hash table in bytes.

go_memstats_frees_total

The total number of releases.

go_memstats_gc_cpu_fraction

The fraction of the available CPU time used by GC since the program started. The value ranges from 0 to 1.

go_memstats_gc_sys_bytes

The amount of memory used by GC in the operating system in bytes.

go_memstats_heap_alloc_bytes

The amount of heap memory allocated in bytes.

go_memstats_heap_idle_bytes

The amount of idle heap memory in bytes.

go_memstats_heap_inuse_bytes

The amount of heap memory in use in bytes.

go_memstats_heap_objects

The number of objects allocated on the heap.

go_memstats_heap_released_bytes

The amount of heap memory released in bytes.

go_memstats_heap_sys_bytes

The amount of memory allocated to the heap by the operating system in bytes.

go_memstats_last_gc_time_seconds

The time of the last GC, in seconds since the Unix epoch.

go_memstats_lookups_total

The total number of lookups.

go_memstats_mallocs_total

The total number of allocations.

go_memstats_mcache_inuse_bytes

The amount of memory in use in mcache in bytes.

go_memstats_mcache_sys_bytes

The amount of memory allocated to mcache by the operating system in bytes.

go_memstats_mspan_inuse_bytes

The amount of memory in use in mspan in bytes.

go_memstats_mspan_sys_bytes

The amount of memory allocated to mspan by the operating system in bytes.

go_memstats_next_gc_bytes

The heap size in bytes at which the next GC is triggered.

go_memstats_other_sys_bytes

The amount of memory allocated for other purposes by the operating system in bytes.

go_memstats_stack_inuse_bytes

The amount of stack memory in use in bytes.

go_memstats_stack_sys_bytes

The amount of memory allocated to the stack by the operating system in bytes.

go_memstats_sys_bytes

The total memory allocated by the operating system in bytes.

go_sched_gomaxprocs_threads

The maximum parallelism of the Go scheduler in threads.

go_sched_goroutines_goroutines

The current number of goroutines in the Go scheduler.

go_sched_latencies_seconds_bucket

The distribution of Go scheduling latencies in seconds.

go_sched_latencies_seconds_count

The count of Go scheduling latencies in seconds.

go_sched_latencies_seconds_sum

The sum of Go scheduling latencies in seconds.

go_threads

The number of Go threads.

nginx_ingress_controller_admission_config_size

The size of the NGINX Ingress controller Admission Config.

nginx_ingress_controller_admission_render_duration

The rendering duration of the NGINX Ingress controller Admission Config.

nginx_ingress_controller_admission_render_ingresses

The number of Ingresses rendered by the NGINX Ingress controller.

nginx_ingress_controller_admission_roundtrip_duration

The round-trip processing duration of the NGINX Ingress controller.

nginx_ingress_controller_admission_tested_duration

The testing duration of the NGINX Ingress controller.

nginx_ingress_controller_admission_tested_ingresses

The number of Ingresses tested by the NGINX Ingress controller.

nginx_ingress_controller_build_info

The build information of the NGINX Ingress controller.

nginx_ingress_controller_bytes_sent_bucket

The distribution of total bytes sent by the NGINX Ingress controller.

nginx_ingress_controller_bytes_sent_count

The count of total bytes sent by the NGINX Ingress controller.

nginx_ingress_controller_bytes_sent_sum

The sum of total bytes sent by the NGINX Ingress controller.

nginx_ingress_controller_check_errors

The number of check errors in the NGINX Ingress controller.

nginx_ingress_controller_check_success

The number of successful checks in the NGINX Ingress controller.

nginx_ingress_controller_config_hash

The configuration hash of the NGINX Ingress controller.

nginx_ingress_controller_config_last_reload_successful

The success status of the last configuration reload in the NGINX Ingress controller.

nginx_ingress_controller_config_last_reload_successful_timestamp_seconds

The timestamp of the last successful configuration reload in the NGINX Ingress controller in seconds.

nginx_ingress_controller_connect_duration_seconds_bucket

The distribution of connection durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_connect_duration_seconds_count

The count of connection durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_connect_duration_seconds_sum

The sum of connection durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_errors

The number of errors in the NGINX Ingress controller.

nginx_ingress_controller_header_duration_seconds_bucket

The distribution of header processing durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_header_duration_seconds_count

The count of header processing durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_header_duration_seconds_sum

The sum of header processing durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_ingress_upstream_latency_seconds

The upstream latency in the NGINX Ingress controller in seconds.

nginx_ingress_controller_ingress_upstream_latency_seconds_count

The count of upstream latencies in the NGINX Ingress controller.

nginx_ingress_controller_ingress_upstream_latency_seconds_sum

The sum of upstream latencies in the NGINX Ingress controller.

nginx_ingress_controller_leader_election_status

The leader election status of the NGINX Ingress controller.

nginx_ingress_controller_nginx_process_connections

The number of connections in the nginx process of the NGINX Ingress controller.

nginx_ingress_controller_nginx_process_connections_total

The total number of connections in the nginx process of the NGINX Ingress controller.

nginx_ingress_controller_nginx_process_cpu_seconds_total

The total CPU time in seconds consumed by the nginx process in the NGINX Ingress controller.

nginx_ingress_controller_nginx_process_num_procs

The number of nginx processes in the NGINX Ingress controller.

nginx_ingress_controller_nginx_process_oldest_start_time_seconds

The oldest start time in seconds of the nginx process in the NGINX Ingress controller.

nginx_ingress_controller_nginx_process_read_bytes_total

The total number of bytes read by the nginx process in the NGINX Ingress controller.

nginx_ingress_controller_nginx_process_requests_total

The total number of requests processed by the nginx process in the NGINX Ingress controller.

nginx_ingress_controller_nginx_process_resident_memory_bytes

The resident memory size in bytes of the nginx process in the NGINX Ingress controller.

nginx_ingress_controller_nginx_process_virtual_memory_bytes

The amount of virtual memory that is used by an NGINX process in bytes.

nginx_ingress_controller_nginx_process_write_bytes_total

The total number of bytes written by the nginx process in the NGINX Ingress controller.

nginx_ingress_controller_orphan_ingress

The number of orphaned Ingresses in the NGINX Ingress controller.

nginx_ingress_controller_request_duration_seconds_bucket

The distribution of request durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_request_duration_seconds_count

The count of request durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_request_duration_seconds_sum

The sum of request durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_request_size_bucket

The distribution of request sizes in the NGINX Ingress controller.

nginx_ingress_controller_request_size_count

The count of request sizes in the NGINX Ingress controller.

nginx_ingress_controller_request_size_sum

The sum of request sizes in the NGINX Ingress controller.

nginx_ingress_controller_requests

The total number of requests in the NGINX Ingress controller.

nginx_ingress_controller_response_duration_seconds_bucket

The distribution of response durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_response_duration_seconds_count

The count of response durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_response_duration_seconds_sum

The sum of response durations in the NGINX Ingress controller in seconds.

nginx_ingress_controller_response_size_bucket

The distribution of response sizes in the NGINX Ingress controller.

nginx_ingress_controller_response_size_count

The count of response sizes in the NGINX Ingress controller.

nginx_ingress_controller_response_size_sum

The sum of response sizes in the NGINX Ingress controller.

nginx_ingress_controller_ssl_certificate_info

The SSL certificate information in the NGINX Ingress controller.

nginx_ingress_controller_ssl_expire_time_seconds

The expiration time of the SSL certificate in the NGINX Ingress controller in seconds.

nginx_ingress_controller_success

The number of successes in the NGINX Ingress controller.

scrape_duration_seconds

The scrape duration in seconds.

scrape_samples_post_metric_relabeling

The number of scraped samples after metric relabeling.

scrape_samples_scraped

The number of scraped samples.

scrape_series_added

The number of new series added during the scrape.

up

The connectivity of metric collection.
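Many metrics in the preceding table come in _bucket, _count, and _sum triples, such as nginx_ingress_controller_request_duration_seconds_bucket, _count, and _sum. The following sketch shows how such a triple is commonly interpreted: the average is the sum divided by the count, and a quantile is estimated by linear interpolation inside the cumulative bucket that contains the target rank. This mirrors the general approach of the PromQL histogram_quantile() function for classic histograms, but it is a simplified illustration rather than the exact implementation. The bucket boundaries, counts, and function names are hypothetical.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def average(sum_value: float, count_value: float) -> float:
    """Return the mean observation from the _sum and _count series."""
    if count_value == 0:
        return math.nan
    return sum_value / count_value


def estimate_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative buckets that include +Inf."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if len(ordered) < 2 or not math.isinf(ordered[-1][0]):
        raise ValueError("need at least two buckets, the last one being +Inf")
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # assumes observations are non-negative
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # No finite upper bound or no width to interpolate over:
                # fall back to the previous bucket boundary.
                return prev_bound
            share = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * share
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical request duration buckets in seconds.
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
p95 = estimate_quantile(0.95, buckets)
mean = average(30.0, 100.0)
print(f"p95 = {p95:.2f} s, mean = {mean:.2f} s")
```

Because the estimate interpolates within a bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest.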

Koordinator (job name: kube-system, koordlet-metrics-podmonitor, or koord-manager-metrics-service)

Metric

Description

aliyun_prometheus_agent_append_duration_seconds

The duration of the Prometheus agent append operations in seconds.

aliyun_prometheus_agent_scrapes_by_target_total

The total number of scrapes by the Prometheus agent per target.

aliyun_prometheus_agent_target_info

The target information of the Prometheus agent.

koord_manager_recommender_recommendation_workload_target

The resource specifications that the resource profiling feature recommends for the workload.

koordlet_container_resource_limits

The limit metric for container resources.

koordlet_container_resource_requests

The request metric for container resources.

koordlet_node_priority_resource_reclaimable

The priority metric for node resources.

koordlet_node_resource_allocatable

The allocatable resource metric for the node.

scrape_duration_seconds

The scrape duration in seconds.

scrape_samples_post_metric_relabeling

The number of scraped samples after metric relabeling.

scrape_samples_scraped

The number of scraped samples.

scrape_series_added

The number of new series added during the scrape.

slo_manager_recommender_recommendation_workload_target

The resource specifications that are recommended based on the workload by the resource profiling feature. This metric is discontinued.

up

The connectivity of metric collection.

ETCD (job name: etcd)

Metric

Description

aliyun_prometheus_agent_append_duration_seconds

The duration of the Prometheus agent append operations in seconds.

aliyun_prometheus_agent_job_discovery_status

The discovery status of the Prometheus agent collection jobs.

aliyun_prometheus_agent_scrape_custom_error

The number of custom collection errors of the Prometheus agent.

aliyun_prometheus_agent_scrapes_by_target_total

The total number of scrapes by the Prometheus agent per target.

aliyun_prometheus_agent_target_info

The target information of the Prometheus agent.

cpu_utilization_core

The CPU core utilization.

etcd_cluster_version

The version of the cluster.

etcd_debugging_auth_revision

The authentication revision number for ETCD debugging.

etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket

The distribution of ETCD debugging disk backend commit rebalance duration in seconds.

etcd_debugging_disk_backend_commit_rebalance_duration_seconds_count

The count of ETCD debugging disk backend commit rebalance duration in seconds.

etcd_debugging_disk_backend_commit_rebalance_duration_seconds_sum

The sum of ETCD debugging disk backend commit rebalance duration in seconds.

etcd_debugging_disk_backend_commit_spill_duration_seconds_bucket

The distribution of ETCD debugging disk backend commit spill duration.

etcd_debugging_disk_backend_commit_spill_duration_seconds_count

The count of ETCD debugging disk backend commit spill duration.

etcd_debugging_disk_backend_commit_spill_duration_seconds_sum

The sum of ETCD debugging disk backend commit spill duration.

etcd_debugging_disk_backend_commit_write_duration_seconds_bucket

The distribution of ETCD debugging disk backend commit write duration in seconds.

etcd_debugging_disk_backend_commit_write_duration_seconds_count

The count of ETCD debugging disk backend commit write duration in seconds.

etcd_debugging_disk_backend_commit_write_duration_seconds_sum

The sum of ETCD debugging disk backend commit write duration in seconds.

etcd_debugging_lease_granted_total

The total number of lease grants in ETCD debugging.

etcd_debugging_lease_renewed_total

The total number of lease renewals in ETCD debugging.

etcd_debugging_lease_revoked_total

The total number of lease revocations in ETCD debugging.

etcd_debugging_lease_ttl_total_bucket

The distribution of lease TTLs in ETCD debugging.

etcd_debugging_lease_ttl_total_count

The count of lease TTLs in ETCD debugging.

etcd_debugging_lease_ttl_total_sum

The sum of lease TTLs in ETCD debugging.

etcd_debugging_mvcc_compact_revision

The compaction revision number for ETCD debugging MVCC.

etcd_debugging_mvcc_current_revision

The current revision number for ETCD debugging MVCC.

etcd_debugging_mvcc_db_compaction_keys_total

The total number of keys compacted in the ETCD debugging MVCC database.

etcd_debugging_mvcc_db_compaction_last

The last compaction time for the ETCD debugging MVCC database.

etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket

The distribution of MVCC database compaction pause durations in milliseconds for ETCD debugging.

etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_count

The count of MVCC database compaction pause durations in milliseconds for ETCD debugging.

etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_sum

The sum of MVCC database compaction pause durations in milliseconds for ETCD debugging.

etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket

The distribution of MVCC database compaction total durations in milliseconds for ETCD debugging.

etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_count

The count of MVCC database compaction total durations in milliseconds for ETCD debugging.

etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_sum

The sum of MVCC database compaction total durations in milliseconds for ETCD debugging.

etcd_debugging_mvcc_db_total_size_in_bytes

The total size of the MVCC database in bytes for ETCD debugging.

etcd_debugging_mvcc_delete_total

The total number of delete operations in ETCD debugging MVCC.

etcd_debugging_mvcc_events_total

The total number of events in ETCD debugging.

etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket

The distribution of MVCC index compaction pause durations in milliseconds for ETCD debugging.

etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_count

The count of MVCC index compaction pause durations in milliseconds for ETCD debugging.

etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_sum

The sum of MVCC index compaction pause durations in milliseconds for ETCD debugging.

etcd_debugging_mvcc_keys_total

The total number of keys in ETCD debugging MVCC.

etcd_debugging_mvcc_pending_events_total

The total number of pending events in ETCD debugging MVCC.

etcd_debugging_mvcc_put_total

The total number of put operations in ETCD debugging MVCC.

etcd_debugging_mvcc_range_total

The total number of range queries in ETCD MVCC.

etcd_debugging_mvcc_slow_watcher_total

The total number of slow watchers in ETCD debugging.

etcd_debugging_mvcc_total_put_size_in_bytes

The total size of MVCC puts in bytes for ETCD debugging.

etcd_debugging_mvcc_txn_total

The total number of MVCC transactions in ETCD debugging.

etcd_debugging_mvcc_watch_stream_total

The total number of watch streams in ETCD debugging.

etcd_debugging_mvcc_watcher_total

The total number of watchers in ETCD debugging.

etcd_debugging_server_lease_expired_total

The total number of expired leases in ETCD debugging.

etcd_debugging_snap_save_marshalling_duration_seconds_bucket

The distribution of snapshot save marshalling durations in seconds for ETCD debugging.

etcd_debugging_snap_save_marshalling_duration_seconds_count

The count of snapshot save marshalling durations in seconds for ETCD debugging.

etcd_debugging_snap_save_marshalling_duration_seconds_sum

The sum of snapshot save marshalling durations in seconds for ETCD debugging.

etcd_debugging_snap_save_total_duration_seconds_bucket

The distribution of snapshot save durations in seconds for ETCD debugging.

etcd_debugging_snap_save_total_duration_seconds_count

The count of snapshot save durations in seconds for ETCD debugging.

etcd_debugging_snap_save_total_duration_seconds_sum

The sum of snapshot save durations in seconds for ETCD debugging.

etcd_debugging_store_expires_total

The total number of expired items in ETCD debugging storage.

etcd_debugging_store_reads_total

The total number of reads in ETCD debugging storage.

etcd_debugging_store_watch_requests_total

The total number of watch requests in ETCD debugging storage.

etcd_debugging_store_watchers

The total number of watchers in ETCD debugging storage.

etcd_debugging_store_writes_total

The total number of writes in ETCD debugging storage.

etcd_disk_backend_commit_duration_seconds_bucket

The distribution of disk backend commit durations in seconds for ETCD.

etcd_disk_backend_commit_duration_seconds_count

The count of disk backend commit durations in seconds for ETCD.

etcd_disk_backend_commit_duration_seconds_sum

The sum of disk backend commit durations in seconds for ETCD.

etcd_disk_backend_defrag_duration_seconds_bucket

The distribution of disk backend defragmentation durations in seconds for ETCD.

etcd_disk_backend_defrag_duration_seconds_count

The count of disk backend defragmentation durations in seconds for ETCD.

etcd_disk_backend_defrag_duration_seconds_sum

The sum of disk backend defragmentation durations in seconds for ETCD.

etcd_disk_backend_snapshot_duration_seconds_bucket

The distribution of disk backend snapshot durations in seconds for ETCD.

etcd_disk_backend_snapshot_duration_seconds_count

The count of disk backend snapshot durations in seconds for ETCD.

etcd_disk_backend_snapshot_duration_seconds_sum

The sum of disk backend snapshot durations in seconds for ETCD.

etcd_disk_defrag_inflight

The number of ongoing disk defragmentations in ETCD.

etcd_disk_wal_fsync_duration_seconds_bucket

The distribution of WAL sync durations in seconds for ETCD disk.

etcd_disk_wal_fsync_duration_seconds_count

The count of WAL sync durations in seconds for ETCD disk.

etcd_disk_wal_fsync_duration_seconds_sum

The sum of WAL sync durations in seconds for ETCD disk.

etcd_disk_wal_write_bytes_total

The total number of bytes written to the WAL in ETCD disk.

etcd_grpc_proxy_cache_hits_total

The total number of cache hits in the ETCD gRPC proxy.

etcd_grpc_proxy_cache_keys_total

The total number of cache keys in the ETCD gRPC proxy.

etcd_grpc_proxy_cache_misses_total

The total number of cache misses in the ETCD gRPC proxy.

etcd_grpc_proxy_events_coalescing_total

The total number of event coalescings in the ETCD gRPC proxy.

etcd_grpc_proxy_watchers_coalescing_total

The total number of watcher coalescings in the ETCD gRPC proxy.

etcd_mvcc_db_open_read_transactions

The number of open read transactions in the ETCD MVCC database.

etcd_mvcc_db_total_size_in_bytes

The total size of the MVCC database in bytes for ETCD.

etcd_mvcc_db_total_size_in_use_in_bytes

The total size in use of the MVCC database in bytes for ETCD.

etcd_mvcc_delete_total

The total number of deletes in ETCD MVCC.

etcd_mvcc_hash_duration_seconds_bucket

The distribution of MVCC hash durations in seconds for ETCD.

etcd_mvcc_hash_duration_seconds_count

The count of MVCC hash durations in seconds for ETCD.

etcd_mvcc_hash_duration_seconds_sum

The sum of MVCC hash durations in seconds for ETCD.

etcd_mvcc_hash_rev_duration_seconds_bucket

The distribution of MVCC hash revision durations in seconds for ETCD.

etcd_mvcc_hash_rev_duration_seconds_count

The count of MVCC hash revision durations in seconds for ETCD.

etcd_mvcc_hash_rev_duration_seconds_sum

The sum of MVCC hash revision durations in seconds for ETCD.

etcd_mvcc_put_total

The total number of put operations in ETCD MVCC.

etcd_mvcc_range_total

The total number of range queries in ETCD MVCC.

etcd_mvcc_txn_total

The total number of MVCC transactions in ETCD.

etcd_network_active_peers

The number of active peers in the ETCD network.

etcd_network_client_grpc_received_bytes_total

The total number of bytes received by ETCD from gRPC clients.

etcd_network_client_grpc_sent_bytes_total

The total number of bytes sent by ETCD to gRPC clients.

etcd_network_disconnected_peers_total

The total number of disconnected peers in the ETCD network.

etcd_network_peer_received_bytes_total

The total number of bytes received by the ETCD network peer.

etcd_network_peer_received_failures_total

The total number of receive failures in the ETCD network peer.

etcd_network_peer_round_trip_time_seconds_bucket

The distribution of round trip times for the ETCD network peer in seconds.

etcd_network_peer_round_trip_time_seconds_count

The count of round trip times for the ETCD network peer in seconds.

etcd_network_peer_round_trip_time_seconds_sum

The sum of round trip times for the ETCD network peer in seconds.

etcd_network_peer_sent_bytes_total

The total number of bytes sent by the ETCD network peer.

etcd_network_peer_sent_failures_total

The total number of send failures by the ETCD network peer.

etcd_network_server_stream_failures_total

The total number of stream failures in the ETCD network server.

etcd_network_snapshot_receive_inflights_total

The number of concurrent snapshot receive requests in the ETCD network.

etcd_network_snapshot_receive_success

The number of successful snapshot receives in the ETCD network.

etcd_network_snapshot_receive_total_duration_seconds_bucket

The distribution of snapshot receive durations in seconds for the ETCD network.

etcd_network_snapshot_receive_total_duration_seconds_count

The count of snapshot receive durations in seconds for the ETCD network.

etcd_network_snapshot_receive_total_duration_seconds_sum

The sum of snapshot receive durations in seconds for the ETCD network.

etcd_network_snapshot_send_inflights_total

The number of concurrent snapshot send requests in the ETCD network.

etcd_network_snapshot_send_success

The number of successful snapshot sends in the ETCD network.

etcd_network_snapshot_send_total_duration_seconds_bucket

The distribution of snapshot send durations in seconds for the ETCD network.

etcd_network_snapshot_send_total_duration_seconds_count

The count of snapshot send durations in seconds for the ETCD network.

etcd_network_snapshot_send_total_duration_seconds_sum

The sum of snapshot send durations in seconds for the ETCD network.

etcd_server_apply_duration_seconds_bucket

The distribution of apply durations in seconds for the ETCD server.

etcd_server_apply_duration_seconds_count

The count of apply durations in seconds for the ETCD server.

etcd_server_apply_duration_seconds_sum

The sum of apply durations in seconds for the ETCD server.

etcd_server_client_requests_total

The total number of client requests to the ETCD server.

etcd_server_go_version

The Go version of the ETCD server.

etcd_server_has_leader

Indicates whether a leader exists in the ETCD server.

etcd_server_health_failures

The number of health check failures in the ETCD server.

etcd_server_health_success

The number of successful health checks in the ETCD server.

etcd_server_heartbeat_send_failures_total

The total number of heartbeat send failures in the ETCD server.

etcd_server_id

The ID of the ETCD server.

etcd_server_is_leader

Indicates whether the ETCD server is a leader.

etcd_server_is_learner

Indicates whether the ETCD server is a learner.

etcd_server_leader_changes_seen_total

The total number of leader changes witnessed by the ETCD server.

etcd_server_learner_promote_successes

The number of successful learner promotions in the ETCD server.

etcd_server_proposals_applied_total

The total number of applied proposals in the ETCD server.

etcd_server_proposals_committed_total

The total number of committed proposals in the ETCD server.

etcd_server_proposals_failed_total

The total number of failed proposals in the ETCD server.

etcd_server_proposals_pending

The current number of pending proposals in the ETCD server.

etcd_server_quota_backend_bytes

The backend storage quota in bytes for the ETCD server.

etcd_server_read_indexes_failed_total

The total number of read index failures in the ETCD server.

etcd_server_slow_apply_total

The total number of slow apply operations in the ETCD server.

etcd_server_slow_read_indexes_total

The total number of slow read indexes in the ETCD server.

etcd_server_snapshot_apply_in_progress_total

The total number of snapshots being applied in the ETCD server.

etcd_server_version

The version of the ETCD server.

etcd_snap_db_fsync_duration_seconds_bucket

The distribution of ETCD snapshot database fsync durations in seconds.

etcd_snap_db_fsync_duration_seconds_count

The count of ETCD snapshot database fsync durations in seconds.

etcd_snap_db_fsync_duration_seconds_sum

The sum of ETCD snapshot database fsync durations in seconds.

etcd_snap_db_save_total_duration_seconds_bucket

The distribution of ETCD snapshot database save durations in seconds.

etcd_snap_db_save_total_duration_seconds_count

The count of ETCD snapshot database save durations in seconds.

etcd_snap_db_save_total_duration_seconds_sum

The sum of ETCD snapshot database save durations in seconds.

etcd_snap_fsync_duration_seconds_bucket

The distribution of ETCD snapshot fsync durations in seconds.

etcd_snap_fsync_duration_seconds_count

The count of ETCD snapshot fsync durations in seconds.

etcd_snap_fsync_duration_seconds_sum

The sum of ETCD snapshot fsync durations in seconds.

go_gc_duration_seconds

The Go GC pause duration in seconds.

go_gc_duration_seconds_count

The count of Go GC pauses.

go_gc_duration_seconds_sum

The total Go GC pause duration in seconds.

go_goroutines

The number of goroutines.

go_info

The Go-specific information.

go_memstats_alloc_bytes

The amount of memory allocated in bytes.

go_memstats_alloc_bytes_total

The cumulative amount of memory allocated in bytes.

go_memstats_buck_hash_sys_bytes

The amount of memory used by the profiling bucket hash table in bytes.

go_memstats_frees_total

The total number of heap object frees.

go_memstats_gc_cpu_fraction

The fraction of CPU time spent in GC. The value ranges from 0 to 1.

go_memstats_gc_sys_bytes

The amount of memory used by GC in the operating system in bytes.

go_memstats_heap_alloc_bytes

The amount of heap memory allocated in bytes.

go_memstats_heap_idle_bytes

The amount of idle heap memory in bytes.

go_memstats_heap_inuse_bytes

The amount of heap memory in use in bytes.

go_memstats_heap_objects

The number of objects allocated on the heap.

go_memstats_heap_released_bytes

The amount of heap memory released in bytes.

go_memstats_heap_sys_bytes

The amount of memory allocated to the heap by the operating system in bytes.

go_memstats_last_gc_time_seconds

The time of the last GC, in seconds since the Unix epoch.

go_memstats_lookups_total

The total number of lookups.

go_memstats_mallocs_total

The total number of allocations.

go_memstats_mcache_inuse_bytes

The amount of memory in use in mcache in bytes.

go_memstats_mcache_sys_bytes

The amount of memory allocated to mcache by the operating system in bytes.

go_memstats_mspan_inuse_bytes

The amount of memory in use in mspan in bytes.

go_memstats_mspan_sys_bytes

The amount of memory allocated to mspan by the operating system in bytes.

go_memstats_next_gc_bytes

The heap size in bytes at which the next GC is triggered.

go_memstats_other_sys_bytes

The amount of memory allocated for other purposes by the operating system in bytes.

go_memstats_stack_inuse_bytes

The amount of stack memory in use in bytes.

go_memstats_stack_sys_bytes

The amount of memory allocated to the stack by the operating system in bytes.

go_memstats_sys_bytes

The total memory allocated by the operating system in bytes.

go_threads

The number of threads.

grpc_server_handled_total

The total number of requests handled by the gRPC server.

grpc_server_msg_received_total

The total number of RPC stream messages received by the gRPC server.

grpc_server_msg_sent_total

The total number of RPC stream messages sent by the gRPC server.

grpc_server_started_total

The total number of RPCs started on the gRPC server.

memory_utilization_byte

The memory usage in bytes.

os_fd_limit

The file descriptor limit of the operating system.

os_fd_used

The number of file descriptors used by the operating system.

process_cpu_seconds_total

The total number of CPU seconds used by the process.

process_max_fds

The maximum number of file descriptors for the process.

process_open_fds

The number of file descriptors opened by the process.

process_resident_memory_bytes

The resident memory size of the process in bytes.

process_start_time_seconds

The start time of the process, in seconds since the Unix epoch.

process_virtual_memory_bytes

The number of virtual memory bytes for the process.

process_virtual_memory_max_bytes

The maximum number of virtual memory bytes for the process.

promhttp_metric_handler_requests_in_flight

The current number of requests being handled by the Prometheus HTTP metric handler.

promhttp_metric_handler_requests_total

The total number of requests handled by the Prometheus HTTP metric handler.

scrape_duration_seconds

The scrape duration in seconds.

scrape_samples_post_metric_relabeling

The number of scraped samples after metric relabeling.

scrape_samples_scraped

The number of scraped samples.

scrape_series_added

The number of new series added during the scrape.

up

The connectivity of metric collection: 1 if the target was scraped successfully, 0 if the scrape failed.
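
Many metrics in this table come as a `_bucket`, `_count`, and `_sum` trio, for example `etcd_disk_wal_fsync_duration_seconds`. The following Python sketch shows one way such a trio could be interpreted: the mean is `_sum / _count`, and a quantile can be estimated by linear interpolation over the cumulative buckets, similar in spirit to the PromQL `histogram_quantile()` function. The function names and the sample numbers are hypothetical. Because the series are cumulative, the inputs would normally be increases over a time window rather than raw values.

```python
import math


def mean_observation(sum_value: float, count_value: float) -> float:
    """Mean observation computed from the _sum and _count series."""
    if count_value == 0:
        return math.nan
    return sum_value / count_value


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (le, count) pairs.

    `buckets` holds the upper bound (`le` label) and cumulative count of
    each _bucket series; the last bucket must have le = +Inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with le = +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0  # assumes observations are non-negative
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Rank falls in the overflow bucket: report the highest finite bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            if count == prev_count:
                return prev_le
            # Linear interpolation inside the bucket that contains the rank.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return math.nan


# Hypothetical increases over a window, for illustration only.
wal_fsync_buckets = [(0.01, 50), (0.1, 90), (1.0, 100), (math.inf, 100)]
print(mean_observation(12.5, 100))
print(estimate_quantile(0.95, wal_fsync_buckets))
```

The estimate is only as precise as the bucket boundaries allow, so a result that lands in a wide bucket should be read as a rough range.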

Scheduler (job name: ack-scheduler)

Metric

Description

aggregator_discovery_aggregation_count_total

The count of discovery aggregations performed by the aggregator.

aliyun_prometheus_agent_append_duration_seconds

The duration of the Prometheus agent append operations in seconds.

aliyun_prometheus_agent_job_discovery_status

The discovery status of the Prometheus agent collection jobs.

aliyun_prometheus_agent_scrape_custom_error

The number of custom collection errors of the Prometheus agent.

aliyun_prometheus_agent_scrapes_by_target_total

The total number of scrapes by the Prometheus agent per target.

aliyun_prometheus_agent_target_info

The target information of the Prometheus agent.

apiserver_audit_event_total

The total number of APIServer audit events.

apiserver_audit_requests_rejected_total

The total number of APIServer audit request rejections.

apiserver_client_certificate_expiration_seconds_bucket

The distribution of remaining seconds until APIServer client certificate expiration.

apiserver_client_certificate_expiration_seconds_count

The count of remaining seconds until APIServer client certificate expiration.

apiserver_client_certificate_expiration_seconds_sum

The sum of remaining seconds until APIServer client certificate expiration.

apiserver_delegated_authn_request_duration_seconds_bucket

The distribution of delegated authentication request durations in seconds for the APIServer.

apiserver_delegated_authn_request_duration_seconds_count

The count of delegated authentication request durations in seconds for the APIServer.

apiserver_delegated_authn_request_duration_seconds_sum

The sum of delegated authentication request durations in seconds for the APIServer.

apiserver_delegated_authn_request_total

The total number of delegated authentication requests for the APIServer.

apiserver_delegated_authz_request_duration_seconds_bucket

The distribution of delegated authorization request durations in seconds for the APIServer.

apiserver_delegated_authz_request_duration_seconds_count

The count of delegated authorization request durations in seconds for the APIServer.

apiserver_delegated_authz_request_duration_seconds_sum

The sum of delegated authorization request durations in seconds for the APIServer.

apiserver_delegated_authz_request_total

The total number of delegated authorization requests for the APIServer.

apiserver_encryption_config_controller_automatic_reload_failures_total

The total number of automatic reload failures for the APIServer encryption configuration controller.

apiserver_encryption_config_controller_automatic_reload_success_total

The total number of successful automatic reloads for the APIServer encryption configuration controller.

apiserver_envelope_encryption_dek_cache_fill_percent

The fill percentage of the envelope encryption data encryption key (DEK) cache for the APIServer.

apiserver_storage_data_key_generation_duration_seconds_bucket

The distribution of data key generation durations for the APIServer storage.

apiserver_storage_data_key_generation_duration_seconds_count

The count of data key generation durations for the APIServer storage.

apiserver_storage_data_key_generation_duration_seconds_sum

The sum of data key generation durations for the APIServer storage.

apiserver_storage_data_key_generation_failures_total

The total number of data key generation failures for the APIServer storage.

apiserver_storage_envelope_transformation_cache_misses_total

The total number of envelope transformation cache misses for the APIServer storage.

apiserver_webhooks_x509_insecure_sha1_total

The total count of insecure SHA1 usage in X509 certificates for APIServer Webhooks.

apiserver_webhooks_x509_missing_san_total

The total count of missing SANs in X509 certificates for APIServer Webhooks.

authenticated_user_requests

The number of authenticated user requests.

authentication_attempts

The number of authentication attempts.

authentication_duration_seconds_bucket

The distribution of authentication durations in seconds.

authentication_duration_seconds_count

The count of authentication durations in seconds.

authentication_duration_seconds_sum

The sum of authentication durations in seconds.

authentication_token_cache_active_fetch_count

The count of active fetches for the authentication token cache.

authentication_token_cache_fetch_total

The total number of fetches for the authentication token cache.

authentication_token_cache_request_duration_seconds_bucket

The distribution of request durations in seconds for the authentication token cache.

authentication_token_cache_request_duration_seconds_count

The count of request durations in seconds for the authentication token cache.

authentication_token_cache_request_duration_seconds_sum

The sum of request durations in seconds for the authentication token cache.

authentication_token_cache_request_total

The total number of requests for the authentication token cache.

authorization_attempts_total

The total number of authorization attempts.

authorization_duration_seconds_bucket

The distribution of authorization durations in seconds.

authorization_duration_seconds_count

The count of authorization durations in seconds.

authorization_duration_seconds_sum

The sum of authorization durations in seconds.

cardinality_enforcement_unexpected_categorizations_total

The total number of unexpected categorizations during cardinality enforcement.

cpu_utilization_core

The CPU core utilization.

disabled_metric_total

The total number of disabled metrics.

disabled_metrics_total

The total number of disabled metrics.

go_cgo_go_to_c_calls_calls_total

The total number of Go to C calls via cgo.

go_cpu_classes_gc_mark_assist_cpu_seconds_total

The total number of CPU seconds for GC mark assist.

go_cpu_classes_gc_mark_dedicated_cpu_seconds_total

The total number of dedicated CPU seconds for GC marking in Go.

go_cpu_classes_gc_mark_idle_cpu_seconds_total

The idle CPU seconds for GC marking in Go.

go_cpu_classes_gc_pause_cpu_seconds_total

The total number of CPU seconds for GC pauses in Go.

go_cpu_classes_gc_total_cpu_seconds_total

The total number of CPU seconds for all GC activities in Go.

go_cpu_classes_idle_cpu_seconds_total

The total number of idle CPU seconds in Go.

go_cpu_classes_scavenge_assist_cpu_seconds_total

The total number of CPU seconds for GC scavenging assist.

go_cpu_classes_scavenge_background_cpu_seconds_total

The total number of CPU seconds for background GC scavenging.

go_cpu_classes_scavenge_total_cpu_seconds_total

The total CPU seconds for scavenge in Go CPU classes.

go_cpu_classes_total_cpu_seconds_total

The total CPU seconds summed across all Go CPU classes.

go_cpu_classes_user_cpu_seconds_total

The total user CPU seconds summed across Go CPU classes.

go_gc_cycles_automatic_gc_cycles_total

The total number of automatic GC cycles in Go.

go_gc_cycles_forced_gc_cycles_total

The total number of forced GC cycles in Go.

go_gc_cycles_total_gc_cycles_total

The total number of GC cycles in Go.

go_gc_duration_seconds

The Go GC pause duration in seconds.

go_gc_duration_seconds_count

The count of Go GC durations in seconds.

go_gc_duration_seconds_sum

The sum of Go GC pause durations in seconds.

go_gc_gogc_percent

The Go GC target percentage.

go_gc_gomemlimit_bytes

The heap memory limit in bytes for Go GC.

go_gc_heap_allocs_by_size_bytes_bucket

The distribution of heap allocations by size in bytes for Go GC.

go_gc_heap_allocs_by_size_bytes_count

The count of heap allocations by size in bytes for Go GC.

go_gc_heap_allocs_by_size_bytes_sum

The sum of heap allocations by size in bytes for Go GC.

go_gc_heap_allocs_by_size_bytes_total_bucket

The distribution of heap allocations by size in bytes for Go GC.

go_gc_heap_allocs_by_size_bytes_total_count

The count of heap allocations by size in bytes for Go GC.

go_gc_heap_allocs_by_size_bytes_total_sum

The sum of heap allocations by size in bytes for Go GC.

go_gc_heap_allocs_bytes_total

The total bytes allocated in the Go GC heap.

go_gc_heap_allocs_objects_total

The total number of objects allocated on the heap for Go GC.

go_gc_heap_frees_by_size_bytes_bucket

The distribution of heap frees by size in bytes for Go GC.

go_gc_heap_frees_by_size_bytes_count

The count of heap frees by size in bytes for Go GC.

go_gc_heap_frees_by_size_bytes_sum

The sum of heap frees by size in bytes for Go GC.

go_gc_heap_frees_by_size_bytes_total_bucket

The distribution of total heap frees by size in bytes for Go GC.

go_gc_heap_frees_by_size_bytes_total_count

The count of total heap frees by size in bytes for Go GC.

go_gc_heap_frees_by_size_bytes_total_sum

The sum of total heap frees by size in bytes for Go GC.

go_gc_heap_frees_bytes_total

The total bytes freed from the heap by Go GC.

go_gc_heap_frees_objects_total

The total number of objects freed from the heap for Go GC.

go_gc_heap_goal_bytes

The target heap size in bytes for Go GC.

go_gc_heap_live_bytes

The live heap size in bytes for Go GC.

go_gc_heap_objects_objects

The number of objects in the heap for Go GC.

go_gc_heap_tiny_allocs_objects_total

The total number of tiny object allocations in the heap for Go GC.

go_gc_limiter_last_enabled_gc_cycle

The GC cycle in which the Go GC CPU limiter was last enabled.

go_gc_pauses_seconds_bucket

The distribution of GC pause durations in seconds.

go_gc_pauses_seconds_count

The count of GC pause durations in seconds.

go_gc_pauses_seconds_sum

The sum of GC pause durations in seconds.

go_gc_pauses_seconds_total_bucket

The distribution of total GC pause durations in seconds.

go_gc_pauses_seconds_total_count

The count of total GC pause durations in seconds.

go_gc_pauses_seconds_total_sum

The sum of total GC pause durations in seconds.

go_gc_scan_globals_bytes

The number of global bytes scanned during Go GC.

go_gc_scan_heap_bytes

The number of heap bytes scanned during Go GC.

go_gc_scan_stack_bytes

The number of stack bytes scanned during Go GC.

go_gc_scan_total_bytes

The total number of bytes scanned during Go GC.

go_gc_stack_starting_size_bytes

The starting stack size of new goroutines in bytes.

go_godebug_non_default_behavior_execerrdot_events_total

The total number of execerrdot events for non-default Go debug behavior.

go_godebug_non_default_behavior_gocachehash_events_total

The total number of Go cache hash events for non-default Go behavior.

go_godebug_non_default_behavior_gocachetest_events_total

The total number of gocachetest events for non-default Go debug behavior.

go_godebug_non_default_behavior_gocacheverify_events_total

The total number of gocacheverify events for non-default Go behavior.

go_godebug_non_default_behavior_gotypesalias_events_total

The total number of gotypesalias events for non-default Go debug behavior.

go_godebug_non_default_behavior_http2client_events_total

The total number of http2client events for non-default Go debug behavior.

go_godebug_non_default_behavior_http2server_events_total

The total number of http2server events for non-default Go behavior.

go_godebug_non_default_behavior_httplaxcontentlength_events_total

The total number of HTTP lax content length events for non-default Go behavior.

go_godebug_non_default_behavior_httpmuxgo121_events_total

The total number of httpmuxgo121 events for non-default Go behavior.

go_godebug_non_default_behavior_installgoroot_events_total

The total number of installgoroot events for non-default Go debug behavior.

go_godebug_non_default_behavior_jstmpllitinterp_events_total

The total number of jstmpllitinterp events for non-default Go debug behavior.

go_godebug_non_default_behavior_multipartmaxheaders_events_total

The total number of multipart max headers events for non-default Go behavior.

go_godebug_non_default_behavior_multipartmaxparts_events_total

The total number of multipartmaxparts events for non-default Go debug behavior.

go_godebug_non_default_behavior_multipathtcp_events_total

The total number of multipathtcp events for non-default Go debug behavior.

go_godebug_non_default_behavior_panicnil_events_total

The total number of panicnil events for non-default Go debug behavior.

go_godebug_non_default_behavior_randautoseed_events_total

The total number of random auto-seed events for non-default Go behavior.

go_godebug_non_default_behavior_tarinsecurepath_events_total

The total number of tarinsecurepath events for non-default Go debug behavior.

go_godebug_non_default_behavior_tls10server_events_total

The total number of TLS1.0 events for non-default Go debug behavior.

go_godebug_non_default_behavior_tlsmaxrsasize_events_total

The total number of tlsmaxrsasize events for non-default Go debug behavior.

go_godebug_non_default_behavior_tlsrsakex_events_total

The total number of TLS RSA key exchange events for non-default Go debug behavior.

go_godebug_non_default_behavior_tlsunsafeekm_events_total

The total number of TLS insecure EKM events for non-default Go debug behavior.

go_godebug_non_default_behavior_x509sha1_events_total

The total number of x509sha1 events for non-default Go debug behavior.

go_godebug_non_default_behavior_x509usefallbackroots_events_total

The total number of X509 use fallback roots events for non-default Go behavior.

go_godebug_non_default_behavior_x509usepolicies_events_total

The total number of x509usepolicies events for non-default Go debug behavior.

go_godebug_non_default_behavior_zipinsecurepath_events_total

The total number of zipinsecurepath events for non-default Go debug behavior.

go_goroutines

The number of goroutines that currently exist.

go_info

Information about the Go environment, such as the Go version.

go_memory_classes_heap_free_bytes

The free bytes in the heap.

go_memory_classes_heap_objects_bytes

The bytes used by heap objects.

go_memory_classes_heap_released_bytes

The released bytes in the heap for memory classes.

go_memory_classes_heap_stacks_bytes

The heap memory reserved for stack space in bytes.

go_memory_classes_heap_unused_bytes

The unused bytes in the heap.

go_memory_classes_metadata_mcache_free_bytes

The free bytes in metadata mcache.

go_memory_classes_metadata_mcache_inuse_bytes

The in-use bytes in metadata mcache.

go_memory_classes_metadata_mspan_free_bytes

The free bytes in metadata mspan.

go_memory_classes_metadata_mspan_inuse_bytes

The in-use bytes in metadata mspan.

go_memory_classes_metadata_other_bytes

The other bytes in metadata.

go_memory_classes_os_stacks_bytes

The bytes used by OS stacks in memory classes.

go_memory_classes_other_bytes

The memory used by other runtime components, such as execution trace buffers and debugging structures, in bytes.

go_memory_classes_profiling_buckets_bytes

The bytes used by profiling buckets.

go_memory_classes_total_bytes

The total memory mapped by the Go runtime into the current process in bytes.

go_memstats_alloc_bytes

The number of bytes that are allocated and still in use.

go_memstats_alloc_bytes_total

The total number of bytes allocated, including bytes that have since been freed.

go_memstats_buck_hash_sys_bytes

The number of bytes used by the profiling bucket hash table.

go_memstats_frees_total

The total number of heap objects freed.

go_memstats_gc_cpu_fraction

The fraction of CPU time spent in GC.

go_memstats_gc_sys_bytes

The number of bytes used for garbage collection (GC) system metadata.

go_memstats_heap_alloc_bytes

The allocated bytes on the heap.

go_memstats_heap_idle_bytes

The idle bytes on the heap.

go_memstats_heap_inuse_bytes

The in-use bytes on the heap.

go_memstats_heap_objects

The number of objects on the heap.

go_memstats_heap_released_bytes

The released bytes on the heap.

go_memstats_heap_sys_bytes

The number of heap bytes obtained from the operating system.

go_memstats_last_gc_time_seconds

The time of the last GC, expressed as the number of seconds since the Unix epoch.

go_memstats_lookups_total

The total number of pointer lookups.

go_memstats_mallocs_total

The total number of allocations.

go_memstats_mcache_inuse_bytes

The amount of memory in use in mcache in bytes.

go_memstats_mcache_sys_bytes

The amount of memory allocated to mcache by the operating system in bytes.

go_memstats_mspan_inuse_bytes

The amount of memory in use in mspan in bytes.

go_memstats_mspan_sys_bytes

The amount of memory allocated to mspan by the operating system in bytes.

go_memstats_next_gc_bytes

The heap size at which the next GC is triggered, in bytes.

go_memstats_other_sys_bytes

The number of bytes used for other system allocations.

go_memstats_stack_inuse_bytes

The amount of stack memory in use in bytes.

go_memstats_stack_sys_bytes

The amount of stack memory allocated by the operating system in bytes.

go_memstats_sys_bytes

The total number of bytes obtained from the operating system.

go_sched_gomaxprocs_threads

The current GOMAXPROCS setting, which is the number of operating system threads that can execute user-level Go code simultaneously.

go_sched_goroutines_goroutines

The number of goroutines.

go_sched_latencies_seconds_bucket

The distribution of the time that goroutines spend in a runnable state before they run, in seconds.

go_sched_latencies_seconds_count

The number of recorded Go scheduling latency observations.

go_sched_latencies_seconds_sum

The sum of the recorded Go scheduling latencies in seconds.

go_sched_pauses_stopping_gc_seconds_bucket

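The distribution of GC-related stop-the-world stopping latencies in seconds. The stopping latency is the time from the decision to stop the world until all processors are stopped.

go_sched_pauses_stopping_gc_seconds_count

The number of recorded GC-related stop-the-world stopping latency observations.

go_sched_pauses_stopping_gc_seconds_sum

The sum of the recorded GC-related stop-the-world stopping latencies in seconds.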

go_sched_pauses_stopping_other_seconds_bucket

The distribution of stop-the-world stopping latencies that are not related to GC, in seconds.

go_sched_pauses_stopping_other_seconds_count

The number of recorded stop-the-world stopping latency observations that are not related to GC.

go_sched_pauses_stopping_other_seconds_sum

The sum of the recorded stop-the-world stopping latencies that are not related to GC, in seconds.

go_sched_pauses_total_gc_seconds_bucket

The distribution of GC-related stop-the-world pause latencies in seconds.

go_sched_pauses_total_gc_seconds_count

The number of recorded GC-related stop-the-world pause observations.

go_sched_pauses_total_gc_seconds_sum

The sum of the recorded GC-related stop-the-world pause latencies in seconds.

go_sched_pauses_total_other_seconds_bucket

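The distribution of stop-the-world pause latencies that are not related to GC, in seconds.

go_sched_pauses_total_other_seconds_count

The number of recorded stop-the-world pause observations that are not related to GC.

go_sched_pauses_total_other_seconds_sum

The sum of the recorded stop-the-world pause latencies that are not related to GC, in seconds.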

go_sync_mutex_wait_total_seconds_total

The approximate cumulative time that goroutines have spent blocked on a sync.Mutex or sync.RWMutex, in seconds.

go_threads

The number of operating system threads created by the Go runtime.

hidden_metric_total

The total number of hidden metrics.

hidden_metrics_total

The total number of hidden metrics.

kubernetes_build_info

The Kubernetes build information.

kubernetes_feature_enabled

The Kubernetes enabled features.

leader_election_master_status

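Indicates whether the reporting instance is the leader of the relevant lease. A value of 1 indicates the leader, and a value of 0 indicates a backup.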

memory_utilization_byte

The used memory in bytes.

process_cpu_seconds_total

The total user and system CPU time consumed by the process in seconds.

process_max_fds

The maximum number of file descriptors for the process.

process_open_fds

The number of file descriptors opened by the process.

process_resident_memory_bytes

The resident memory size of the process in bytes.

process_start_time_seconds

The start time of the process, expressed as the number of seconds since the Unix epoch.

process_virtual_memory_bytes

The virtual memory size of the process in bytes.

process_virtual_memory_max_bytes

The maximum amount of virtual memory available to the process in bytes.

registered_metric_total

The total number of registered metrics.

registered_metrics_total

The total number of registered metrics.

rest_client_exec_plugin_certificate_rotation_age_bucket

The distribution of certificate rotation age for REST client exec plugin.

rest_client_exec_plugin_certificate_rotation_age_count

The count of certificate rotation age for REST client exec plugin.

rest_client_exec_plugin_certificate_rotation_age_sum

The sum of certificate rotation age for REST client exec plugin.

rest_client_rate_limiter_duration_seconds_bucket

The distribution of rate limiter durations for REST client.

rest_client_rate_limiter_duration_seconds_count

The count of rate limiter durations for REST client.

rest_client_rate_limiter_duration_seconds_sum

The sum of rate limiter durations for REST client.

rest_client_request_duration_seconds_bucket

The distribution of request durations in seconds for REST client.

rest_client_request_duration_seconds_count

The count of request durations in seconds for REST client.

rest_client_request_duration_seconds_sum

The sum of request durations in seconds for REST client.

rest_client_request_retries_total

The total number of request retries for REST client.

rest_client_request_size_bytes_bucket

The distribution of request sizes in bytes for REST client.

rest_client_request_size_bytes_count

The count of request sizes in bytes for REST client.

rest_client_request_size_bytes_sum

The sum of request sizes in bytes for REST client.

rest_client_requests_total

The total number of requests for REST client.

rest_client_response_size_bytes_bucket

The distribution of response sizes in bytes for REST client.

rest_client_response_size_bytes_count

The count of response sizes in bytes for REST client.

rest_client_response_size_bytes_sum

The sum of response sizes in bytes for REST client.

rest_client_transport_cache_entries

The number of transport cache entries for REST client.

rest_client_transport_create_calls_total

The total number of transport create calls for REST client.

scheduler_binding_duration_seconds_bucket

The distribution of binding durations in seconds for the scheduler.

scheduler_binding_duration_seconds_count

The count of binding durations in seconds for the scheduler.

scheduler_binding_duration_seconds_sum

The sum of binding durations in seconds for the scheduler.

scheduler_e2e_scheduling_duration_seconds_bucket

The distribution of end-to-end scheduling durations for the scheduler.

scheduler_e2e_scheduling_duration_seconds_count

The count of end-to-end scheduling durations for the scheduler.

scheduler_e2e_scheduling_duration_seconds_sum

The sum of end-to-end scheduling durations for the scheduler.

scheduler_framework_extension_point_duration_seconds_bucket

The distribution of extension point durations for the scheduler framework.

scheduler_framework_extension_point_duration_seconds_count

The count of extension point durations for the scheduler framework.

scheduler_framework_extension_point_duration_seconds_sum

The sum of extension point durations for the scheduler framework.

scheduler_goroutines

The number of goroutines for the scheduler.

scheduler_pending_pods

The number of pending pods for the scheduler.

scheduler_plugin_evaluation_total

The total number of plugin evaluations for the scheduler.

scheduler_plugin_execution_duration_seconds_bucket

The distribution of execution durations in seconds for the scheduler plugins.

scheduler_plugin_execution_duration_seconds_count

The count of execution durations in seconds for the scheduler plugins.

scheduler_plugin_execution_duration_seconds_sum

The sum of execution durations in seconds for the scheduler plugins.

scheduler_pod_preemption_victims_bucket

The distribution of preemption victims for the scheduler.

scheduler_pod_preemption_victims_count

The count of preemption victims for the scheduler.

scheduler_pod_preemption_victims_sum

The sum of preemption victims for the scheduler.

scheduler_pod_scheduling_attempts_bucket

The distribution of pod scheduling attempts for the scheduler.

scheduler_pod_scheduling_attempts_count

The count of pod scheduling attempts for the scheduler.

scheduler_pod_scheduling_attempts_sum

The sum of pod scheduling attempts for the scheduler.

scheduler_pod_scheduling_duration_seconds_bucket

The distribution of pod scheduling durations in seconds for the scheduler.

scheduler_pod_scheduling_duration_seconds_count

The count of pod scheduling durations in seconds for the scheduler.

scheduler_pod_scheduling_duration_seconds_sum

The sum of pod scheduling durations in seconds for the scheduler.

scheduler_pod_scheduling_sli_duration_seconds_bucket

The distribution of SLI durations for pod scheduling.

scheduler_pod_scheduling_sli_duration_seconds_count

The count of SLI durations for pod scheduling.

scheduler_pod_scheduling_sli_duration_seconds_sum

The sum of SLI durations for pod scheduling.

scheduler_preemption_attempts_total

The total number of preemption attempts for the scheduler.

scheduler_preemption_victims_bucket

The distribution of preemption victims for the scheduler.

scheduler_preemption_victims_count

The count of preemption victims for the scheduler.

scheduler_preemption_victims_sum

The sum of preemption victims for the scheduler.

scheduler_queue_incoming_pods_total

The total number of incoming pods for the scheduler.

scheduler_schedule_attempts_total

The total number of scheduling attempts for the scheduler.

scheduler_scheduler_cache_size

The scheduler cache size.

scheduler_scheduler_goroutines

The number of goroutines for the scheduler.

scheduler_scheduling_algorithm_duration_seconds_bucket

The distribution of scheduling algorithm durations in seconds.

scheduler_scheduling_algorithm_duration_seconds_count

The count of scheduling algorithm durations in seconds.

scheduler_scheduling_algorithm_duration_seconds_sum

The sum of scheduling algorithm durations in seconds.

scheduler_scheduling_algorithm_predicate_evaluation_seconds_bucket

The distribution of predicate evaluation seconds for the scheduling algorithm.

scheduler_scheduling_algorithm_predicate_evaluation_seconds_count

The count of predicate evaluation seconds for the scheduling algorithm.

scheduler_scheduling_algorithm_predicate_evaluation_seconds_sum

The sum of predicate evaluation seconds for the scheduling algorithm.

scheduler_scheduling_algorithm_preemption_evaluation_seconds_bucket

The distribution of preemption evaluation seconds for the scheduling algorithm.

scheduler_scheduling_algorithm_preemption_evaluation_seconds_count

The count of preemption evaluation seconds for the scheduling algorithm.

scheduler_scheduling_algorithm_preemption_evaluation_seconds_sum

The sum of preemption evaluation seconds for the scheduling algorithm.

scheduler_scheduling_algorithm_priority_evaluation_seconds_bucket

The distribution of priority evaluation durations in seconds for the scheduling algorithm.

scheduler_scheduling_algorithm_priority_evaluation_seconds_count

The count of priority evaluation durations in seconds for the scheduling algorithm.

scheduler_scheduling_algorithm_priority_evaluation_seconds_sum

The sum of priority evaluation durations in seconds for the scheduling algorithm.

scheduler_scheduling_attempt_duration_seconds_bucket

The distribution of scheduling attempt durations.

scheduler_scheduling_attempt_duration_seconds_count

The count of scheduling attempt durations.

scheduler_scheduling_attempt_duration_seconds_sum

The sum of scheduling attempt durations.

scheduler_scheduling_duration_seconds

The distribution of scheduling durations in seconds.

scheduler_scheduling_duration_seconds_count

The count of scheduling durations in seconds.

scheduler_scheduling_duration_seconds_sum

The sum of scheduling durations in seconds.

scheduler_total_preemption_attempts

The total number of preemption attempts by the scheduler.

scheduler_unschedulable_pods

The number of pods that the scheduler cannot schedule.

scheduler_volume_scheduling_duration_seconds_bucket

The distribution of volume scheduling durations in seconds.

scheduler_volume_scheduling_duration_seconds_count

The count of volume scheduling durations in seconds.

scheduler_volume_scheduling_duration_seconds_sum

The sum of volume scheduling durations in seconds.

scheduler_volume_scheduling_stage_error_total

The number of errors that are returned during volume scheduling.

scrape_duration_seconds

The scrape duration in seconds.

scrape_samples_post_metric_relabeling

The number of scraped samples after metric relabeling.

scrape_samples_scraped

The number of scraped samples.

scrape_series_added

The number of new series added during the scrape.

up

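Indicates whether the scrape target is reachable. A value of 1 indicates that the last scrape succeeded, and a value of 0 indicates that it failed.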

workqueue_adds_total

The total number of additions to the work queue.

workqueue_depth

The work queue depth.

workqueue_longest_running_processor_seconds

The longest running processor duration in seconds for the work queue.

workqueue_queue_duration_seconds_bucket

The distribution of queue durations in seconds for the work queue.

workqueue_queue_duration_seconds_count

The count of queue durations in seconds for the work queue.

workqueue_queue_duration_seconds_sum

The sum of queue durations in seconds for the work queue.

workqueue_retries_total

The total number of retries in the work queue.

workqueue_unfinished_work_seconds

The unfinished work duration in seconds for the work queue.

workqueue_work_duration_seconds_bucket

The distribution of work durations for the work queue.

workqueue_work_duration_seconds_count

The count of work durations for the work queue.

workqueue_work_duration_seconds_sum

The sum of work durations for the work queue.
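Many metrics in the preceding table come in groups of three series with the suffixes _bucket, _count, and _sum, which together describe one histogram. The following Python sketch illustrates how such a group can be read: the average is the sum divided by the count, and a quantile can be estimated from the cumulative bucket counts by linear interpolation, which is roughly the idea behind the histogram_quantile function in PromQL. The function names and the sample values are hypothetical and are used only for illustration. They are not part of Managed Service for Prometheus.

```python
import math


def mean_observation(total_sum, total_count):
    """Return the average observation: the _sum value divided by the _count value."""
    return total_sum / total_count if total_count else math.nan


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The buckets are assumed to include the +Inf bucket. The lowest bucket is
    assumed to start at 0. Values inside a bucket are assumed to be spread
    evenly, so the result is an estimate, not an exact value.
    """
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in ordered:
        if count >= rank:
            # If the rank falls in the +Inf bucket, report the highest finite bound.
            if math.isinf(bound) or count == prev_count:
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical sample values for one histogram, for example
# workqueue_work_duration_seconds. These numbers are made up.
sample_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
sample_sum, sample_count = 30.0, 100

print(mean_observation(sample_sum, sample_count))
print(estimate_quantile(0.95, sample_buckets))
```

In practice, counters such as the _bucket series are usually converted to a rate over a time window before a quantile is estimated, so that the result reflects recent behavior instead of the entire lifetime of the process.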

References