Application Real-Time Monitoring Service: Metrics

Last Updated: Aug 11, 2023

When you use Alibaba Cloud Managed Service for Prometheus, you are charged based on the number of data entries reported for billable metrics. Metrics are classified into two types: basic metrics and custom metrics. Basic metrics are free of charge. Custom metrics have been billed since January 6, 2020.
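
As a rough illustration of the distinction between basic and custom metrics, the following Python sketch counts how many reported data entries fall on custom metrics. The BASIC_METRICS set is a small hand-picked subset of the names listed on this page, the my_app_* names are invented, and count_billable_entries is a hypothetical helper. The sketch also ignores the job name, which is an assumption made for brevity. The actual metering is performed by the service and may differ.

```python
from collections import Counter

# Assumption: a tiny illustrative subset of the basic metric names on this page.
BASIC_METRICS = {"up", "scrape_duration_seconds", "go_goroutines", "kube_pod_info"}


def count_billable_entries(reported_names):
    """Count reported data entries whose metric name is not a basic metric."""
    counts = Counter(reported_names)
    return sum(n for name, n in counts.items() if name not in BASIC_METRICS)


# Hypothetical batch of reported entries: three basic, three custom.
reported = [
    "up",
    "up",
    "go_goroutines",
    "my_app_orders_total",
    "my_app_orders_total",
    "my_app_latency_seconds",
]
billable = count_billable_entries(reported)
print(billable)
```

Only the entries of the two invented my_app_* metrics are counted, because the other names belong to the basic set.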

Kubernetes clusters

The following tables describe the basic metrics of Kubernetes clusters that are supported by Managed Service for Prometheus.

Job names and basic metrics related to Prometheus instance status

Job name: _arms-prom/kubelet/1
Metric type: Basic metric
Metric name | Description
promhttp_metric_handler_requests_in_flight | -
go_memstats_mallocs_total | A counter of the cumulative number of heap objects allocated. Use the rate() function to calculate the allocation rate of heap objects.
go_memstats_lookups_total | A counter of the number of pointer lookups performed by the runtime. Use the rate() function to calculate the pointer lookup rate.
go_memstats_last_gc_time_seconds | The timestamp at which the last garbage collection (GC) finished.
go_memstats_heap_sys_bytes | The number of bytes of heap memory obtained from the operating system, including virtual address space that is reserved but not yet used.
go_memstats_heap_released_bytes | The number of bytes of physical memory in free spans that have been returned to the operating system.
go_memstats_heap_objects | The number of objects currently allocated on the heap. The value changes as GC frees objects and new objects are allocated.
go_memstats_heap_inuse_bytes | The number of bytes in spans that are in use.
go_memstats_heap_idle_bytes | The number of bytes in idle spans.
go_memstats_heap_alloc_bytes | The number of bytes of allocated heap objects, including all reachable objects and unreachable objects that GC has not yet freed.
go_memstats_gc_sys_bytes | The amount of memory occupied by GC metadata.
go_memstats_gc_cpu_fraction | The fraction of the program's available CPU time consumed by GC since the program started, expressed as a value from 0 to 1.
go_memstats_frees_total | A counter of the cumulative number of heap objects freed. Use the rate() function to calculate the rate at which heap objects are freed. The expression go_memstats_mallocs_total - go_memstats_frees_total gives the number of live heap objects.
go_memstats_buck_hash_sys_bytes | The amount of memory occupied by the hash tables used for profiling.
go_memstats_alloc_bytes_total | The cumulative number of bytes allocated for heap objects. The value increases as objects are allocated and does not decrease when objects are freed. Because it behaves like a Prometheus counter, you can use the rate() function to query the memory allocation rate.
go_memstats_alloc_bytes | The number of bytes of allocated heap objects, including all reachable objects and unreachable objects that GC has not yet freed. The value is the same as that of the go_memstats_heap_alloc_bytes metric.
scrape_duration_seconds | -
go_info | Information about the Go version. The value is obtained by calling the runtime.Version() function.
go_goroutines | The number of goroutines. The value is obtained by calling the runtime.NumGoroutine() function, which derives it from the sched scheduler structure and the global allglen variable. Because the fields of the sched structure can change concurrently, the function checks whether the computed value is less than 1 and, if so, returns 1.
scrape_samples_post_metric_relabeling | -
go_gc_duration_seconds_sum | -
go_gc_duration_seconds_count | -
blackbox_exporter_config_last_reload_successful | -
blackbox_exporter_config_last_reload_success_timestamp_seconds | -
scrape_samples_scraped | -
blackbox_exporter_build_info | -
arms_prometheus_target_scrapes_sample_out_of_order_total | -
arms_prometheus_target_scrapes_sample_out_of_bounds_total | -
arms_prometheus_target_scrapes_sample_duplicate_timestamp_total | -
scrape_series_added | -
arms_prometheus_target_scrapes_exceeded_sample_limit_total | -
arms_prometheus_target_scrapes_cache_flush_forced_total | -
arms_prometheus_target_scrape_pools_total | -
statsd_metric_mapper_cache_gets_total | -
statsd_metric_mapper_cache_hits_total | -
statsd_metric_mapper_cache_length | -
arms_prometheus_target_scrape_pools_failed_total | -
up | -
arms_prometheus_target_scrape_pool_reloads_total | -
arms_prometheus_target_scrape_pool_reloads_failed_total | -
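
Several descriptions in the preceding table suggest applying rate() to counters such as go_memstats_mallocs_total, and subtracting go_memstats_frees_total from go_memstats_mallocs_total to obtain the number of live heap objects. The following Python sketch shows the arithmetic on two samples of a counter. It is a simplified illustration, not the PromQL implementation: the real rate() function also extrapolates to the boundaries of the query window, which is omitted here. The function names and sample values are made up for the example.

```python
def counter_rate(v_start, v_end, seconds):
    """Average per-second increase of a counter between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if v_end < v_start:
        # A counter that goes down has been reset (for example, by a process
        # restart). Assume it restarted from zero.
        return v_end / seconds
    return (v_end - v_start) / seconds


def live_heap_objects(mallocs_total, frees_total):
    """Heap objects allocated so far minus heap objects freed so far."""
    return mallocs_total - frees_total


# Hypothetical samples of go_memstats_mallocs_total taken 60 seconds apart.
alloc_rate = counter_rate(1000, 1600, 60)
live = live_heap_objects(1600, 1450)
print(alloc_rate, live)
```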

Job names and basic metrics related to API server data collection

Job name: apiserver
Metric type: Basic metric
Metric names:
apiserver_request_duration_seconds_bucket (obsolete by default)
apiserver_admission_controller_admission_duration_seconds_bucket
apiserver_request_total
rest_client_requests_total
apiserver_admission_webhook_admission_duration_seconds_bucket
apiserver_current_inflight_requests
up
apiserver_admission_webhook_admission_duration_seconds_count
scrape_samples_post_metric_relabeling
scrape_samples_scraped
scrape_series_added
scrape_duration_seconds

Job names and basic metrics related to Ingress data collection

Job name: arms-ack-ingress
Metric type: Basic metric
Metric name | Description
nginx_ingress_controller_request_duration_seconds_bucket | -
nginx_ingress_controller_response_duration_seconds_bucket (obsolete by default) | -
nginx_ingress_controller_response_size_bucket (obsolete by default) | -
nginx_ingress_controller_request_size_bucket | -
nginx_ingress_controller_bytes_sent_bucket | -
go_gc_duration_seconds | A summary of GC pause durations. The value is obtained by calling the debug.ReadGCStats() function with the PauseQuantiles field of the GCStats structure set to a slice of length 5. The function then returns the minimum, the 25th, 50th, and 75th percentiles, and the maximum of the GC pause time. The Prometheus Go client creates a summary metric from these quantiles and from the NumGC and PauseTotal values.
nginx_ingress_controller_nginx_process_connections | -
nginx_ingress_controller_request_duration_seconds_sum | -
nginx_ingress_controller_request_duration_seconds_count (obsolete by default) | -
nginx_ingress_controller_bytes_sent_sum | -
nginx_ingress_controller_request_size_sum | -
nginx_ingress_controller_response_duration_seconds_count | -
nginx_ingress_controller_response_duration_seconds_sum (obsolete by default) | -
nginx_ingress_controller_response_size_count (obsolete by default) | -
nginx_ingress_controller_bytes_sent_count | -
nginx_ingress_controller_response_size_sum | -
nginx_ingress_controller_request_size_count | -
promhttp_metric_handler_requests_total | -
nginx_ingress_controller_nginx_process_connections_total | -
go_memstats_mcache_sys_bytes | The amount of memory obtained from the operating system for mcache structures.
go_memstats_lookups_total | A counter of the number of pointer lookups performed by the runtime. Use the rate() function to calculate the pointer lookup rate.
go_threads | The number of OS threads created. The value is obtained by calling the runtime.ThreadCreateProfile() function, which is based on the global allm variable.
go_memstats_sys_bytes | The total number of bytes of memory that Go has obtained from the operating system.
go_memstats_last_gc_time_seconds | The timestamp at which the last garbage collection (GC) finished.
go_memstats_heap_sys_bytes | The number of bytes of heap memory obtained from the operating system, including virtual address space that is reserved but not yet used.
go_memstats_heap_objects | The number of objects currently allocated on the heap. The value changes as GC frees objects and new objects are allocated.
go_memstats_heap_inuse_bytes | The number of bytes in spans that are in use.
go_memstats_heap_idle_bytes | The number of bytes in idle spans.
go_memstats_heap_alloc_bytes | The number of bytes of allocated heap objects, including all reachable objects and unreachable objects that GC has not yet freed.
go_memstats_gc_sys_bytes | The amount of memory occupied by GC metadata.
promhttp_metric_handler_requests_in_flight | -
go_memstats_stack_sys_bytes | The number of bytes of stack memory obtained from the operating system. The value equals the value of the go_memstats_stack_inuse_bytes metric plus the memory obtained for OS thread stacks.
go_memstats_stack_inuse_bytes | The number of bytes in stack spans that are in use, that is, spans on which at least one stack is allocated.
go_memstats_gc_cpu_fraction | The fraction of the program's available CPU time consumed by GC since the program started, expressed as a value from 0 to 1.
go_memstats_frees_total | A counter of the cumulative number of heap objects freed. Use the rate() function to calculate the rate at which heap objects are freed. The expression go_memstats_mallocs_total - go_memstats_frees_total gives the number of live heap objects.
go_memstats_buck_hash_sys_bytes | The amount of memory occupied by the hash tables used for profiling.
go_memstats_alloc_bytes_total | The cumulative number of bytes allocated for heap objects. The value increases as objects are allocated and does not decrease when objects are freed. Because it behaves like a Prometheus counter, you can use the rate() function to query the memory allocation rate.
go_memstats_alloc_bytes | The number of bytes of allocated heap objects, including all reachable objects and unreachable objects that GC has not yet freed. The value is the same as that of the go_memstats_heap_alloc_bytes metric.
nginx_ingress_controller_nginx_process_num_procs | -
go_info | Information about the Go version. The value is obtained by calling the runtime.Version() function.
go_memstats_mallocs_total | A counter of the cumulative number of heap objects allocated. Use the rate() function to calculate the allocation rate of heap objects.
go_memstats_other_sys_bytes | The amount of memory used for other runtime allocations.
go_goroutines | The number of goroutines. The value is obtained by calling the runtime.NumGoroutine() function, which derives it from the sched scheduler structure and the global allglen variable. Because the fields of the sched structure can change concurrently, the function checks whether the computed value is less than 1 and, if so, returns 1.
scrape_samples_post_metric_relabeling | -
scrape_samples_scraped | -
process_virtual_memory_max_bytes | -
process_virtual_memory_bytes | The virtual set size (VSS). The value covers all allocated memory, including memory that is allocated but not used, shared memory, and memory that is swapped out.
scrape_duration_seconds | -
go_memstats_heap_released_bytes | The number of bytes of physical memory in free spans that have been returned to the operating system.
go_gc_duration_seconds_sum | -
go_memstats_next_gc_bytes | The target heap size of the next GC cycle. GC aims to keep the value of the go_memstats_heap_alloc_bytes metric at or below this value.
go_gc_duration_seconds_count | -
nginx_ingress_controller_config_hash | -
nginx_ingress_controller_config_last_reload_successful | -
nginx_ingress_controller_config_last_reload_successful_timestamp_seconds | -
nginx_ingress_controller_ingress_upstream_latency_seconds_count | -
nginx_ingress_controller_ingress_upstream_latency_seconds_sum | -
process_start_time_seconds | The start time of the process. The value is derived from the start_time parameter, which records when the process started and is measured in jiffies. The data comes from the /proc/{PID}/stat file. Dividing the value of start_time by USER_HZ converts it to seconds.
nginx_ingress_controller_nginx_process_cpu_seconds_total | -
scrape_series_added | -
nginx_ingress_controller_nginx_process_oldest_start_time_seconds | -
nginx_ingress_controller_nginx_process_read_bytes_total | -
nginx_ingress_controller_nginx_process_requests_total | -
nginx_ingress_controller_nginx_process_resident_memory_bytes | -
nginx_ingress_controller_nginx_process_virtual_memory_bytes | -
nginx_ingress_controller_nginx_process_write_bytes_total | -
nginx_ingress_controller_requests | -
go_memstats_mcache_inuse_bytes | The amount of memory used by mcache structures.
nginx_ingress_controller_success | -
process_resident_memory_bytes | The resident set size (RSS). The value indicates the physical memory actually used by the process, including shared memory. Memory that is allocated but not used and memory that is swapped out are not included.
process_open_fds | The number of open file descriptors. The value is obtained by counting the entries in the /proc/{PID}/fd directory and covers the regular files, sockets, and pseudo-terminals opened by the Go process.
process_max_fds | The maximum number of open file descriptors. The value is read from the Max open files row of the /proc/{PID}/limits file and is the soft limit. The soft limit is the value that the kernel enforces. The hard limit is the ceiling up to which the soft limit can be raised.
process_cpu_seconds_total | The total CPU time, in seconds, that the process has consumed. The value is derived from the utime parameter (the number of ticks that the Go process has run in user mode) and the stime parameter (the number of ticks that the Go process has run in kernel mode, for example during system calls). Both parameters are measured in jiffies, where a jiffy is the interval between two system timer interrupts. The value of the metric is the sum of utime and stime divided by USER_HZ, the tick rate in ticks per second (Hz).
go_memstats_mspan_sys_bytes | The amount of memory obtained from the operating system for mspan structures.
up | -
go_memstats_mspan_inuse_bytes | The amount of memory used by mspan structures.
nginx_ingress_controller_ssl_expire_time_seconds | -
nginx_ingress_controller_leader_election_status | -
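
The description of process_cpu_seconds_total in the preceding table amounts to a unit conversion from ticks to seconds. The following Python sketch shows that arithmetic. The default of 100 for user_hz is an assumption that matches many Linux systems (on a real host it can be queried with os.sysconf("SC_CLK_TCK")), and the tick counts are invented for the example.

```python
def ticks_to_seconds(ticks, user_hz=100):
    """Convert a tick count (jiffies) to seconds, given the tick rate in Hz."""
    if user_hz <= 0:
        raise ValueError("user_hz must be positive")
    return ticks / user_hz


def process_cpu_seconds(utime, stime, user_hz=100):
    """CPU time in seconds: user-mode ticks plus kernel-mode ticks over USER_HZ."""
    return ticks_to_seconds(utime + stime, user_hz)


# Hypothetical values: 1250 user-mode ticks and 750 kernel-mode ticks.
cpu_seconds = process_cpu_seconds(utime=1250, stime=750)
print(cpu_seconds)
```

The same conversion applies to the start_time parameter behind process_start_time_seconds, which is also measured in jiffies.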

Job names and basic metrics related to CoreDNS data collection

Job name: arms-ack-coredns
Metric type: Basic metric
Metric name | Description
coredns_forward_request_duration_seconds_bucket | -
coredns_dns_request_size_bytes_bucket | -
coredns_dns_response_size_bytes_bucket | -
coredns_kubernetes_dns_programming_duration_seconds_bucket | -
coredns_dns_request_duration_seconds_bucket | -
coredns_plugin_enabled | -
coredns_health_request_duration_seconds_bucket | -
go_gc_duration_seconds | A summary of GC pause durations. The value is obtained by calling the debug.ReadGCStats() function with the PauseQuantiles field of the GCStats structure set to a slice of length 5. The function then returns the minimum, the 25th, 50th, and 75th percentiles, and the maximum of the GC pause time. The Prometheus Go client creates a summary metric from these quantiles and from the NumGC and PauseTotal values.
coredns_forward_responses_total | -
coredns_forward_request_duration_seconds_sum | -
coredns_forward_request_duration_seconds_count | -
coredns_dns_requests_total | -
coredns_forward_conn_cache_misses_total | -
coredns_dns_responses_total | -
coredns_cache_entries | -
coredns_cache_hits_total | -
coredns_forward_conn_cache_hits_total | -
coredns_forward_requests_total | -
coredns_dns_request_size_bytes_sum | -
coredns_dns_response_size_bytes_count | -
coredns_dns_response_size_bytes_sum | -
coredns_dns_request_size_bytes_count | -
scrape_duration_seconds | -
scrape_samples_scraped | -
scrape_series_added | -
up | -
scrape_samples_post_metric_relabeling | -
go_memstats_lookups_total | A counter of the number of pointer lookups performed by the runtime. Use the rate() function to calculate the pointer lookup rate.
go_memstats_last_gc_time_seconds | The timestamp at which the last garbage collection (GC) finished.
go_memstats_heap_sys_bytes | The number of bytes of heap memory obtained from the operating system, including virtual address space that is reserved but not yet used.
coredns_build_info | -
go_memstats_heap_released_bytes | The number of bytes of physical memory in free spans that have been returned to the operating system.
go_memstats_heap_objects | The number of objects currently allocated on the heap. The value changes as GC frees objects and new objects are allocated.
go_memstats_heap_inuse_bytes | The number of bytes in spans that are in use.
go_memstats_heap_idle_bytes | The number of bytes in idle spans.
go_memstats_heap_alloc_bytes | The number of bytes of allocated heap objects, including all reachable objects and unreachable objects that GC has not yet freed.
go_memstats_gc_sys_bytes | The amount of memory occupied by GC metadata.
go_memstats_sys_bytes | The total number of bytes of memory that Go has obtained from the operating system.
go_memstats_stack_sys_bytes | The number of bytes of stack memory obtained from the operating system. The value equals the value of the go_memstats_stack_inuse_bytes metric plus the memory obtained for OS thread stacks.
go_memstats_mallocs_total | A counter of the cumulative number of heap objects allocated. Use the rate() function to calculate the allocation rate of heap objects.
go_memstats_gc_cpu_fraction | The fraction of the program's available CPU time consumed by GC since the program started, expressed as a value from 0 to 1.
go_memstats_stack_inuse_bytes | The number of bytes in stack spans that are in use, that is, spans on which at least one stack is allocated.
go_memstats_frees_total | A counter of the cumulative number of heap objects freed. Use the rate() function to calculate the rate at which heap objects are freed. The expression go_memstats_mallocs_total - go_memstats_frees_total gives the number of live heap objects.
go_memstats_buck_hash_sys_bytes | The amount of memory occupied by the hash tables used for profiling.
go_memstats_alloc_bytes_total | The cumulative number of bytes allocated for heap objects. The value increases as objects are allocated and does not decrease when objects are freed. Because it behaves like a Prometheus counter, you can use the rate() function to query the memory allocation rate.
go_memstats_alloc_bytes | The number of bytes of allocated heap objects, including all reachable objects and unreachable objects that GC has not yet freed. The value is the same as that of the go_memstats_heap_alloc_bytes metric.
coredns_cache_misses_total | -
go_memstats_other_sys_bytes | The amount of memory used for other runtime allocations.
go_memstats_mcache_inuse_bytes | The amount of memory used by mcache structures.
go_goroutines | The number of goroutines. The value is obtained by calling the runtime.NumGoroutine() function, which derives it from the sched scheduler structure and the global allglen variable. Because the fields of the sched structure can change concurrently, the function checks whether the computed value is less than 1 and, if so, returns 1.
process_virtual_memory_max_bytes | -
process_virtual_memory_bytes | The virtual set size (VSS). The value covers all allocated memory, including memory that is allocated but not used, shared memory, and memory that is swapped out.
go_gc_duration_seconds_sum | -
go_gc_duration_seconds_count | -
go_memstats_next_gc_bytes | The target heap size of the next GC cycle. GC aims to keep the value of the go_memstats_heap_alloc_bytes metric at or below this value.
coredns_dns_request_duration_seconds_count | -
coredns_reload_failed_total | -
coredns_panics_total | -
coredns_local_localhost_requests_total | -
coredns_kubernetes_dns_programming_duration_seconds_sum | -
coredns_kubernetes_dns_programming_duration_seconds_count | -
coredns_dns_request_duration_seconds_sum | -
coredns_hosts_reload_timestamp_seconds | -
coredns_health_request_failures_total | -
process_start_time_seconds | The start time of the process. The value is derived from the start_time parameter, which records when the process started and is measured in jiffies. The data comes from the /proc/{PID}/stat file. Dividing the value of start_time by USER_HZ converts it to seconds.
process_resident_memory_bytes | The resident set size (RSS). The value indicates the physical memory actually used by the process, including shared memory. Memory that is allocated but not used and memory that is swapped out are not included.
process_open_fds | The number of open file descriptors. The value is obtained by counting the entries in the /proc/{PID}/fd directory and covers the regular files, sockets, and pseudo-terminals opened by the Go process.
process_max_fds | The maximum number of open file descriptors. The value is read from the Max open files row of the /proc/{PID}/limits file and is the soft limit. The soft limit is the value that the kernel enforces. The hard limit is the ceiling up to which the soft limit can be raised.
process_cpu_seconds_total | The total CPU time, in seconds, that the process has consumed. The value is derived from the utime parameter (the number of ticks that the Go process has run in user mode) and the stime parameter (the number of ticks that the Go process has run in kernel mode, for example during system calls). Both parameters are measured in jiffies, where a jiffy is the interval between two system timer interrupts. The value of the metric is the sum of utime and stime divided by USER_HZ, the tick rate in ticks per second (Hz).
coredns_health_request_duration_seconds_sum | -
coredns_health_request_duration_seconds_count | -
go_memstats_mspan_sys_bytes | The amount of memory obtained from the operating system for mspan structures.
coredns_forward_max_concurrent_rejects_total | -
coredns_forward_healthcheck_broken_total | -
go_memstats_mcache_sys_bytes | The amount of memory obtained from the operating system for mcache structures.
go_memstats_mspan_inuse_bytes | The amount of memory used by mspan structures.
go_threads | The number of OS threads created. The value is obtained by calling the runtime.ThreadCreateProfile() function, which is based on the global allm variable.
go_info | Information about the Go version. The value is obtained by calling the runtime.Version() function.
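
The go_gc_duration_seconds summary described in the preceding table reports five points of the GC pause distribution (minimum, 25%, 50%, 75%, and maximum) together with a count and a sum. The following Python sketch shows one way to derive such values from a list of pause durations. The index-based selection rule and the sample pauses are assumptions made for illustration. The Go runtime and the Prometheus Go client may compute the values differently.

```python
def pause_quantiles(pauses, n=5):
    """Return n evenly spaced quantiles of pauses, from minimum to maximum."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if not pauses:
        return [0.0] * n
    ordered = sorted(pauses)
    last = n - 1
    # Pick the element at each i/last fraction of the sorted list ...
    result = [ordered[len(ordered) * i // last] for i in range(last)]
    # ... and finish with the maximum.
    result.append(ordered[-1])
    return result


# Hypothetical GC pause durations in seconds.
pauses = [0.004, 0.001, 0.003, 0.002, 0.005, 0.008, 0.006, 0.007]
quantiles = pause_quantiles(pauses)
pause_count = len(pauses)   # plays the role of NumGC
pause_total = sum(pauses)   # plays the role of PauseTotal
print(quantiles, pause_count, pause_total)
```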

Job names and basic metrics related to Kube-State-Metrics data collection

Job name: _kube-state-metrics
Metric type: Basic metric
Metric names:
kube_pod_container_status_waiting_reason
kube_pod_status_phase
kube_pod_container_status_last_terminated_reason
kube_pod_container_status_terminated_reason
kube_pod_status_ready
kube_node_status_condition
kube_pod_container_status_running
kube_pod_container_status_restarts_total
kube_pod_container_info
kube_pod_container_status_waiting
kube_pod_container_status_terminated
kube_pod_labels
kube_pod_owner
kube_pod_info
kube_pod_container_resource_limits
kube_persistentvolume_status_phase
kube_pod_container_resource_requests_memory_bytes
kube_pod_container_resource_requests_cpu_cores
kube_pod_container_resource_limits_memory_bytes
kube_node_status_capacity
kube_service_info
kube_pod_container_resource_limits_cpu_cores
kube_deployment_status_replicas_updated
kube_deployment_status_replicas_unavailable
kube_deployment_spec_replicas
kube_deployment_created
kube_deployment_metadata_generation
kube_deployment_status_replicas
kube_deployment_labels
kube_deployment_status_observed_generation
kube_deployment_status_replicas_available
kube_deployment_spec_strategy_rollingupdate_max_unavailable
kube_daemonset_status_desired_number_scheduled
kube_daemonset_updated_number_scheduled
kube_daemonset_status_number_ready
kube_daemonset_status_number_misscheduled
kube_daemonset_status_number_available
kube_daemonset_status_current_number_scheduled
kube_daemonset_created
kube_node_status_allocatable_cpu_cores
kube_node_status_capacity_memory_bytes
kube_node_spec_unschedulable
kube_node_status_allocatable_memory_bytes
kube_node_labels
kube_node_info
kube_namespace_labels
kube_node_status_capacity_cpu_cores
kube_node_status_capacity_pods
kube_node_status_allocatable_pods
kube_node_spec_taint
kube_statefulset_status_replicas
kube_statefulset_replicas
kube_statefulset_created
up
scrape_samples_scraped
scrape_duration_seconds
scrape_samples_post_metric_relabeling
scrape_series_added

Job names and basic metrics related to Kubelet data collection

Job name: _arms/kubelet/metric
Metric type: Basic metric
Metric name | Description
rest_client_request_duration_seconds_bucket | -
apiserver_client_certificate_expiration_seconds_bucket | -
kubelet_pod_worker_duration_seconds_bucket | -
kubelet_pleg_relist_duration_seconds_bucket | -
workqueue_queue_duration_seconds_bucket | -
rest_client_requests_total | -
go_gc_duration_seconds | A summary of GC pause durations. The value is obtained by calling the debug.ReadGCStats() function with the PauseQuantiles field of the GCStats structure set to a slice of length 5. The function then returns the minimum, the 25th, 50th, and 75th percentiles, and the maximum of the GC pause time. The Prometheus Go client creates a summary metric from these quantiles and from the NumGC and PauseTotal values.
process_cpu_seconds_total | The total CPU time, in seconds, that the process has consumed. The value is derived from the utime parameter (the number of ticks that the Go process has run in user mode) and the stime parameter (the number of ticks that the Go process has run in kernel mode, for example during system calls). Both parameters are measured in jiffies, where a jiffy is the interval between two system timer interrupts. The value of the metric is the sum of utime and stime divided by USER_HZ, the tick rate in ticks per second (Hz).
process_resident_memory_bytes | The resident set size (RSS). The value indicates the physical memory actually used by the process, including shared memory. Memory that is allocated but not used and memory that is swapped out are not included.
kubernetes_build_info | -
kubelet_node_name | -
kubelet_certificate_manager_client_ttl_seconds | -
kubelet_certificate_manager_client_expiration_renew_errors | -
scrape_duration_seconds | -
go_goroutines | The number of goroutines. The value is obtained by calling the runtime.NumGoroutine() function, which derives it from the sched scheduler structure and the global allglen variable. Because the fields of the sched structure can change concurrently, the function checks whether the computed value is less than 1 and, if so, returns 1.
scrape_samples_post_metric_relabeling | -
scrape_samples_scraped | -
scrape_series_added | -
up | -
apiserver_client_certificate_expiration_seconds_count | -
workqueue_adds_total | -
workqueue_depth | -

Job names and basic metrics related to cAdvisor data collection

Job name: _arms/kubelet/cadvisor
Metric type: Basic metric
Metric names:
container_memory_failures_total (obsolete by default)
container_memory_rss
container_spec_memory_limit_bytes
container_memory_failcnt
container_memory_cache
container_memory_swap
container_memory_usage_bytes
container_memory_max_usage_bytes
container_cpu_load_average_10s
container_fs_reads_total (obsolete by default)
container_fs_writes_total (obsolete by default)
container_network_transmit_errors_total
container_network_receive_bytes_total
container_network_transmit_packets_total
container_network_receive_errors_total
container_memory_working_set_bytes
container_cpu_usage_seconds_total
container_fs_reads_bytes_total
container_fs_writes_bytes_total
container_spec_cpu_quota
container_cpu_cfs_periods_total
container_cpu_cfs_throttled_periods_total
container_cpu_cfs_throttled_seconds_total
container_fs_inodes_free
container_fs_io_time_seconds_total
container_fs_io_time_weighted_seconds_total
container_fs_limit_bytes
container_tasks_state (obsolete by default)
container_fs_read_seconds_total (obsolete by default)
container_fs_write_seconds_total (obsolete by default)
container_fs_usage_bytes
container_fs_inodes_total
container_fs_io_current
scrape_duration_seconds
scrape_samples_scraped
machine_cpu_cores
machine_memory_bytes
scrape_samples_post_metric_relabeling
scrape_series_added
up

Job name: _arms-prom/kube-apiserver/cadvisor
Metric type: Basic metric
Metric names:
scrape_duration_seconds
up
scrape_samples_scraped
scrape_samples_post_metric_relabeling
scrape_series_added

Job names and basic metrics related to ACK Scheduler data collection

Job name: ack-scheduler
Metric type: Basic metric
Metric names:
rest_client_request_duration_seconds_bucket
scheduler_pod_scheduling_attempts_bucket
rest_client_requests_total
scheduler_pending_pods
scheduler_scheduler_cache_size
up

Job names and basic metrics related to etcd data collection

Job name: etcd
Metric type: Basic metric
Metric names:
etcd_disk_backend_commit_duration_seconds_bucket
up
etcd_server_has_leader
etcd_debugging_mvcc_keys_total
etcd_debugging_mvcc_db_total_size_in_bytes
etcd_server_leader_changes_seen_total

Job names and basic metrics related to node data collection

Job name: node-exporter
Metric type: Basic metric
Metric name | Description
node_filesystem_size_bytes | -
node_filesystem_readonly | -
node_filesystem_free_bytes | -
node_filesystem_avail_bytes | -
node_cpu_seconds_total | -
node_network_receive_bytes_total | -
node_network_receive_errs_total | -
node_network_transmit_bytes_total | -
node_network_receive_packets_total | -
node_network_transmit_drop_total | -
node_network_transmit_errs_total | -
node_network_up | -
node_network_transmit_packets_total | -
node_network_receive_drop_total | -
go_gc_duration_seconds | A summary of GC pause durations. The value is obtained by calling the debug.ReadGCStats() function with the PauseQuantiles field of the GCStats structure set to a slice of length 5. The function then returns the minimum, the 25th, 50th, and 75th percentiles, and the maximum of the GC pause time. The Prometheus Go client creates a summary metric from these quantiles and from the NumGC and PauseTotal values.
node_load5 | -
node_filefd_allocated | -
node_exporter_build_info | -
node_disk_written_bytes_total | -
node_disk_writes_completed_total | -
node_disk_write_time_seconds_total | -
node_nf_conntrack_entries | -
node_nf_conntrack_entries_limit | -
node_processes_max_processes | -
node_processes_pids | -
node_sockstat_TCP_alloc | -
node_sockstat_TCP_inuse | -
node_sockstat_TCP_tw | -
node_timex_offset_seconds | -
node_timex_sync_status | -
node_uname_info | -
node_vmstat_pgfault | -
node_vmstat_pgmajfault | -
node_vmstat_pgpgin | -
node_vmstat_pgpgout | -
node_disk_reads_completed_total | -
node_disk_read_time_seconds_total | -
process_cpu_seconds_total | The total CPU time, in seconds, that the process has consumed. The value is derived from the utime parameter (the number of ticks that the Go process has run in user mode) and the stime parameter (the number of ticks that the Go process has run in kernel mode, for example during system calls). Both parameters are measured in jiffies, where a jiffy is the interval between two system timer interrupts. The value of the metric is the sum of utime and stime divided by USER_HZ, the tick rate in ticks per second (Hz).
node_disk_read_bytes_total | -
node_disk_io_time_weighted_seconds_total | -
node_disk_io_time_seconds_total | -
node_disk_io_now | -
node_context_switches_total | -
node_boot_time_seconds | -
process_resident_memory_bytes | The resident set size (RSS). The value indicates the physical memory actually used by the process, including shared memory. Memory that is allocated but not used and memory that is swapped out are not included.
node_intr_total | -
node_load1 | -
go_goroutines | The number of goroutines. The value is obtained by calling the runtime.NumGoroutine() function, which derives it from the sched scheduler structure and the global allglen variable. Because the fields of the sched structure can change concurrently, the function checks whether the computed value is less than 1 and, if so, returns 1.
scrape_duration_seconds | -
node_load15 | -
scrape_samples_post_metric_relabeling | -
node_netstat_Tcp_PassiveOpens | -
scrape_samples_scraped | -
node_netstat_Tcp_CurrEstab | -
scrape_series_added | -
node_netstat_Tcp_ActiveOpens | -
node_memory_MemTotal_bytes | -
node_memory_MemFree_bytes | -
node_memory_MemAvailable_bytes | -
node_memory_Cached_bytes | -
up | -
node_memory_Buffers_bytes | -

Job names and basic metrics related to GPU data collection

Job nameMetric typeMetric nameDescription
gpu-exporterBasic metricgo_gc_duration_secondsThe value is obtained by calling the debug.ReadGCStats() function. When the function is called, the PauseQuantile field of the GCStats structure is set to 5. The function will return the minimum percentile, 25%, 50%, 75%, and the maximum percentile of the GC pause time. Then, the Prometheus Go client creates a summary metric based on the returned percentile of the GC pause time, NumGC, and PauseTotal variables.
promhttp_metric_handler_requests_total-
scrape_series_added-
up-
scrape_duration_seconds-
scrape_samples_scraped-
scrape_samples_post_metric_relabeling-
go_memstats_mcache_inuse_bytesThe amount of memory used by the mcache structure.
process_virtual_memory_max_bytes-
process_virtual_memory_bytesThe virtual set size (VSS). The value indicates all allocated memory, including the memory that is allocated but not used, and the memory that is shared and swapped out.
process_start_time_secondsThe value is obtained based on the start_time parameter, which specifies the time when a process starts. Unit: jiffy. The data comes from the /proc/{PID}/stat file. You can divide the value of the start_time parameter by USER_HZ to convert the value to seconds.
go_memstats_next_gc_bytesThe target heap size of the next GC cycle. The garbage collector aims to keep the value of the go_memstats_heap_alloc_bytes metric no greater than this value.
go_memstats_heap_objectsThe number of objects allocated on the heap. These objects change with GC and the allocation of new objects.
process_resident_memory_bytesThe resident set size (RSS). The value indicates the actual memory used by processes, including the shared memory. Memory that is allocated but not used and memory that is swapped out are not included.
process_open_fdsThe value is obtained by calculating the total number of files in the /proc/PID/fd directory. It shows the total number of regular files, sockets, and pseudo-terminals opened by Go processes.
process_max_fdsThe value is obtained by reading the value of the Max Open Files row in the /proc/{PID}/limits file. The value is a soft limit. The soft limit is the value that the kernel uses to limit the resources. The hard limit is the maximum value of the soft limit.
go_memstats_other_sys_bytesThe size of the memory used for other runtime allocations.
go_gc_duration_seconds_count-
go_memstats_heap_alloc_bytesThe number of memory bytes allocated for heap objects, including not only all reachable heap objects, but also the unreachable objects that are not removed during GC.
process_cpu_seconds_totalThe value is calculated from the utime parameter (the number of ticks that the Go process has run in user mode) and the stime parameter (the number of ticks that the Go process has run in kernel mode, for example, during system calls). Unit of the parameters: jiffy, which is the interval between two system timer interrupts. The value of the process_cpu_seconds_total metric is the sum of utime and stime divided by USER_HZ. The total number of ticks divided by the tick rate (ticks per second, in Hz) is the total CPU time, in seconds, that the process has consumed.
nvidia_gpu_temperature_celsius (obsolete by default)-
go_memstats_stack_inuse_bytesThe size of the used memory on a stack memory span on which at least one stack object is allocated.
nvidia_gpu_power_usage_milliwatts (obsolete by default)-
nvidia_gpu_num_devices (obsolete by default)-
nvidia_gpu_memory_used_bytes (obsolete by default)-
nvidia_gpu_memory_total_bytes (obsolete by default)-
go_memstats_stack_sys_bytesThe number of stack memory bytes obtained from the operating system. The value is obtained based on the value of the go_memstats_stack_inuse_bytes metric plus the size of the OS thread stack.
nvidia_gpu_memory_allocated_bytes (obsolete by default)-
nvidia_gpu_duty_cycle (obsolete by default)-
nvidia_gpu_allocated_num_devices (obsolete by default)-
promhttp_metric_handler_requests_in_flight-
go_memstats_sys_bytesThe number of memory bytes that Go has obtained from the system.
go_memstats_gc_sys_bytesThe amount of memory occupied by GC metadata.
go_memstats_gc_cpu_fractionThe fraction of available CPU time consumed by GC since the program was started.
go_memstats_heap_released_bytesThe number of bytes in free spans that have been returned to the operating system.
go_memstats_frees_totalA counter value that shows the number of removed heap objects. You can call the rate() function to calculate the removal rate of heap objects. You can use the go_memstats_mallocs_total - go_memstats_frees_total formula to calculate the number of surviving heap objects.
go_threadsThe value is obtained by calling the runtime.ThreadCreateProfile() function based on the global allm variable.
go_memstats_mspan_sys_bytesThe amount of memory allocated from the operating system for the mspan structure.
go_memstats_buck_hash_sys_bytesThe amount of memory occupied by the hash tables used for profiling.
go_memstats_alloc_bytes_totalThe value of the metric increases as objects are allocated in the heap, but does not decrease when objects are removed. Similar to Prometheus counters, the rate() function can be called to query the memory consumption rate.
go_memstats_heap_sys_bytesThe number of memory bytes allocated for the heap from the operating system, including the virtual address space that is reserved but not used.
go_memstats_mspan_inuse_bytesThe amount of memory used by the mspan structure.
go_memstats_alloc_bytesThe number of memory bytes allocated for heap objects. The value is the same as the value of the go_memstats_heap_alloc_bytes metric. The heap objects include not only all reachable heap objects, but also the unreachable objects that are not removed during GC.
go_infoThe information about the Go version. The value is obtained by calling the runtime.Version() function.
go_memstats_last_gc_time_secondsThe timestamp when the last GC was completed.
go_memstats_heap_inuse_bytesThe number of bytes occupied by the spans in use.
go_memstats_mcache_sys_bytesThe amount of memory allocated from the operating system for the mcache structure.
go_memstats_lookups_totalA counter value that shows the number of dereferenced pointers. You can call the rate() function to calculate the dereferencing rate of pointers.
go_memstats_mallocs_totalA counter value that shows the number of allocated heap objects. You can call the rate() function to calculate the allocation rate of heap objects.
go_gc_duration_seconds_sum-
go_goroutinesThe value is obtained by calling the runtime.NumGoroutine() function based on the sched scheduler structure and the global allglen variable. All fields in the sched structure may concurrently change. Therefore, the system checks whether the value is less than 1. If the value is less than 1, 1 is returned.
go_memstats_heap_idle_bytesThe number of memory bytes occupied by idle spans.
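
Several descriptions in this table suggest calling the rate() function on counter metrics and subtracting go_memstats_frees_total from go_memstats_mallocs_total. The following sketch shows that arithmetic on made-up samples. The helper simple_rate is hypothetical and only approximates the PromQL rate() function: it treats a decrease as a counter reset, but it does not extrapolate to the window boundaries as Prometheus does, and it assumes strictly increasing timestamps.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Approximate per-second increase of a counter over the samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example, the process
        # restarted), so the whole new value counts as increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def live_heap_objects(mallocs_total: float, frees_total: float) -> float:
    """go_memstats_mallocs_total - go_memstats_frees_total."""
    return mallocs_total - frees_total


# Made-up go_memstats_mallocs_total samples, scraped every 15 seconds.
mallocs = [(0.0, 1000.0), (15.0, 1300.0), (30.0, 1600.0)]

if __name__ == "__main__":
    print(simple_rate(mallocs))               # allocations per second
    print(live_heap_objects(1600.0, 1450.0))  # surviving heap objects
```

In a real deployment the same quantities would normally be computed by Prometheus itself in a query rather than in client code.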

Job names and basic metrics related to PV data collection

Job nameMetric typeMetric name
k8s-csi-cluster-pvBasic metriccluster_pvc_detail_num_total
cluster_pv_detail_num_total
cluster_pv_status_num_total
cluster_scrape_collector_success
cluster_scrape_collector_duration_seconds
alibaba_cloud_storage_operator_build_info
cluster_pvc_status_num_total
scrape_duration_seconds
scrape_samples_post_metric_relabeling
scrape_samples_scraped
scrape_series_added
up
k8s-csi-node-pvBasic metriccluster_scrape_collector_duration_seconds
cluster_scrape_collector_success
alibaba_cloud_csi_driver_build_info
up
scrape_series_added
scrape_samples_post_metric_relabeling
scrape_samples_scraped
scrape_duration_seconds

Hybrid Cloud Monitoring

The following table describes the metrics of Hybrid Cloud Monitoring that are supported by Managed Service for Prometheus.

CategoryMetric typeMetric nameDescription
ECSCustom metriccpu_util_lizationThe CPU utilization of an Elastic Compute Service (ECS) instance.
internet_in_rateThe average rate of inbound traffic from the Internet to an ECS instance.
internet_out_rateThe average rate of outbound traffic from an ECS instance to the Internet.
disk_read_bpsThe bit rate of reads to all disks of an ECS instance.
disk_write_bpsThe bit rate of writes to all disks of an ECS instance.
vpc_public_ip_internet_in_RateThe average rate of inbound traffic from the Internet to the IP address of an ECS instance.
vpc_public_ip_internet_out_RateThe average rate of outbound traffic from the IP address of an ECS instance to the Internet.
cpu_total(Agent) cpu.total
memory_totalspace(Agent) memory.total.space
memory_usedutilization(Agent) memory.used.utilization
diskusage_utilization(Agent) disk.usage.utilization_device
RDSCustom metriccpu_usage_averageThe CPU utilization.
disk_usageThe disk usage.
iops_usageThe IOPS usage.
connection_usageThe connection utilization.
data_delayThe latency of read-only instances.
memory_usageThe memory usage.
mysql_network_in_newThe inbound bandwidth of an ApsaraDB RDS for MySQL instance.
mysql_network_out_newThe outbound bandwidth of an ApsaraDB RDS for MySQL instance.
mysql_active_sessionsThe number of active sessions of an ApsaraDB RDS for MySQL instance.
sqlserver_network_in_newThe inbound bandwidth of an ApsaraDB RDS for SQL Server instance.
sqlserver_network_out_newThe outbound bandwidth of an ApsaraDB RDS for SQL Server instance.
NATCustom metricsnat_connectionThe number of SNAT connections.
snat_connection_drop_limitThe cumulative number of SNAT connections dropped due to the limit on the number of concurrent connections.
snat_connection_drop_rate_limitThe cumulative number of SNAT connections dropped due to the limit on the number of new connections.
net_rx_rateThe inbound bandwidth.
net_tx_rateThe outbound bandwidth.
net_rx_pkgsThe rate of inbound packets.
net_tx_pkgsThe rate of outbound packets.
RocketMQCustom metricconsumer_lag_gidThe number of accumulated messages.
receive_message_count_gidThe number of messages received per minute by a consumer group.
send_message_count_gidThe number of messages sent per minute by a producer group.
consumer_lag_topicThe number of accumulated messages of a topic or group.
receive_message_count_topicThe number of messages of a topic received per minute by a consumer group.
send_message_count_topicThe number of messages of a topic sent per minute by a producer group.
receive_message_count The number of messages received per minute.
send_message_count The number of messages sent per minute.
SLBCustom metrichealthy_server_countThe number of healthy backend ECS instances.
unhealthy_server_countThe number of unhealthy backend ECS instances.
packet_txThe number of outbound packets per second.
packet_rxThe number of inbound packets per second.
traffic_rx_newThe inbound bandwidth.
traffic_tx_newThe outbound bandwidth.
active_connectionThe number of active connections over TCP.
inactive_connectionThe number of inactive connections on a port.
new_connectionThe number of new connections over TCP.
max_connectionThe number of concurrent connections on a port.
instance_active_connectionThe number of active connections established to an instance.
instance_new_connectionThe number of new connections established to an instance per second.
instance_max_connectionThe maximum number of concurrent connections established to an instance per second.
instance_drop_connectionThe number of connections that are dropped per second on an instance.
instance_traffic_rxThe inbound traffic per second of an instance. Unit: bit.
instance_traffic_txThe outbound traffic per second of an instance. Unit: bit.
E-MapReduce (EMR)Custom metricactive_applicationsThe number of active jobs.
active_usersThe number of active users.
aggregate_containers_allocatedThe total number of allocated containers.
aggregate_containers_releasedThe total number of released containers.
allocated_containersThe number of allocated containers.
apps_completedThe number of completed jobs.
apps_failedThe number of failed jobs.
apps_killedThe number of terminated jobs.
apps_pendingThe number of pending jobs.
apps_runningThe number of running jobs.
apps_submittedThe number of submitted jobs.
available_mbThe size of the memory available to the current queue.
available_vcoresThe number of vCores available to the current queue.
pending_containersThe number of pending containers.
reserved_containersThe number of reserved containers.
EIPCustom metricnet_rx_rateThe inbound bandwidth.
net_tx_rateThe outbound bandwidth.
net_rx_pkgs_rateThe rate of inbound packets.
net_tx_pkgs_rateThe rate of outbound packets.
out_ratelimit_drop_speedThe rate at which packets are dropped due to throttling.
OSSCustom metricavailabilityThe availability.
request_valid_rateThe ratio of valid requests.
success_rateThe ratio of successful requests.
network_error_rateThe ratio of failed requests due to network issues.
total_request_countThe total number of requests.
valid_countThe number of valid requests.
internet_sendThe outbound traffic over the Internet.
internet_recvThe inbound traffic over the Internet.
intranet_sendThe outbound traffic over the internal network.
intranet_recvThe inbound traffic over the internal network.
success_countThe total number of successful requests.
network_error_countThe total number of failed requests due to network issues.
client_timeout_countThe total number of failed requests due to client timeouts.
ElasticsearchCustom metricnode_cpu_utilizationThe CPU utilization of a node.
node_heap_memory_utilizationThe heap memory utilization of a node.
node_stats_exception_log_countThe number of exceptions.
node_stats_full_gc_collection_countThe number of full heap garbage collections (full GCs).
node_disk_utilizationThe disk usage of a node.
node_load_1mThe average load of a node over the last 1 minute.
cluster_query_qpsThe queries per second (QPS) of a cluster.
cluster_index_qpsThe index (write) QPS of a cluster.
LogstashCustom metriccpu_percentThe CPU utilization of a node.
node_heap_memoryThe memory usage of a node.
node_disk_usageThe disk usage of a node.
DRDSCustom metriccpu_utilizationThe CPU utilization.
connection_countThe number of connections.
logic_qpsThe logical QPS.
logic_rtThe logical response time (RT).
memory_utilizationThe memory usage.
network_input_trafficThe inbound bandwidth.
network_output_trafficThe outbound bandwidth.
physics_qpsThe physical QPS.
physics_rtThe physical RT.
thread_countThe number of active threads.
com_insert_selectThe number of INSERT ... SELECT statements that are executed per second on a private ApsaraDB RDS for MySQL instance.
com_replaceThe number of REPLACE statements that are executed per second on a private ApsaraDB RDS for MySQL instance.
com_replace_selectThe number of REPLACE ... SELECT statements that are executed per second on a private ApsaraDB RDS for MySQL instance.
com_selectThe number of SELECT statements that are executed per second on a private ApsaraDB RDS for MySQL instance.
com_updateThe number of UPDATE statements that are executed per second on a private ApsaraDB RDS for MySQL instance.
conn_usageThe connection usage of a private ApsaraDB RDS for MySQL instance.
cpu_usageThe CPU utilization of a private ApsaraDB RDS for MySQL instance.
disk_usageThe disk usage of a private ApsaraDB RDS for MySQL instance.
ibuf_dirty_ratioThe dirty page ratio of the buffer pool of a private ApsaraDB RDS for MySQL instance.
ibuf_pool_readsThe number of physical reads per second on a private ApsaraDB RDS for MySQL instance.
ibuf_read_hitThe read hit ratio of the buffer pool of a private ApsaraDB RDS for MySQL instance.
ibuf_request_rThe number of logical reads per second on a private ApsaraDB RDS for MySQL instance.
ibuf_request_wThe number of logical writes per second on a private ApsaraDB RDS for MySQL instance.
ibuf_use_ratioThe utilization of the buffer pool of a private ApsaraDB RDS for MySQL instance.
inno_data_readThe amount of data read per second on a private ApsaraDB RDS for MySQL instance that uses InnoDB.
inno_data_writtenThe amount of data written per second to a private ApsaraDB RDS for MySQL instance that uses InnoDB.
inno_row_deleteThe number of rows deleted per second from a private ApsaraDB RDS for MySQL instance that uses InnoDB.
inno_row_insertThe number of rows inserted per second to a private ApsaraDB RDS for MySQL instance that uses InnoDB.
inno_row_readedThe number of rows read per second on a private ApsaraDB RDS for MySQL instance that uses InnoDB.
inno_row_updateThe number of rows updated per second on a private ApsaraDB RDS for MySQL instance that uses InnoDB.
innodb_log_write_requestsThe number of write requests per second to the logs of a private ApsaraDB RDS for MySQL instance that uses InnoDB.
innodb_log_writesThe number of logical writes per second to the logs of a private ApsaraDB RDS for MySQL instance that uses InnoDB.
innodb_os_log_fsyncsThe number of times fsync is called per second to write data to the logs of a private ApsaraDB RDS for MySQL instance that uses InnoDB.
input_traffic_psThe inbound bandwidth of a private ApsaraDB RDS for MySQL instance.
iops_usageThe IOPS usage of a private ApsaraDB RDS for MySQL instance.
mem_usageThe memory usage of a private ApsaraDB RDS for MySQL instance.
output_traffic_psThe outbound bandwidth of a private ApsaraDB RDS for MySQL instance.
qpsThe QPS of a private ApsaraDB RDS for MySQL instance.
slave_lagThe latency of a private read-only ApsaraDB RDS for MySQL instance.
slow_queriesThe slow queries per second of a private ApsaraDB RDS for MySQL instance.
tb_tmp_diskThe number of temporary tables created per second on a private ApsaraDB RDS for MySQL instance.
KafkaCustom metricinstance_disk_capacityThe disk usage of an instance.
instance_message_inputThe number of messages produced on an instance.
instance_message_outputThe number of messages consumed on an instance.
topic_message_inputThe number of messages produced in a topic.
topic_message_outputThe number of messages consumed in a topic.
MongoDBCustom metriccpu_utilizationThe CPU utilization.
memory_utilizationThe memory usage.
disk_utilizationThe disk usage.
iops_utilizationThe IOPS usage.
qpsThe QPS.
connect_amountThe number of used connections.
instance_disk_amountThe disk space occupied by an instance.
data_disk_amountThe disk space occupied by data.
log_disk_amountThe disk space occupied by logs.
intranet_inThe inbound traffic over the internal network.
intranet_outThe outbound traffic over the internal network.
number_requestsThe number of requests.
op_insertThe number of insert operations.
op_queryThe number of query operations.
op_updateThe number of update operations.
op_deleteThe number of delete operations.
op_getmoreThe number of getMore operations.
op_commandThe number of operations performed by running commands.
PolarDBCustom metricactive_connectionsThe number of active connections.
blks_read_deltaThe number of reads to a data block.
cluster_active_sessionsThe number of active connections.
cluster_connection_utilizationThe connection utilization.
cluster_cpu_utilizationThe CPU utilization.
cluster_data_ioThe I/O throughput per second of a storage engine.
cluster_data_iopsThe IOPS of a storage engine.
cluster_mem_hit_ratioThe cache hit ratio.
cluster_memory_utilizationThe memory usage.
cluster_qpsThe QPS.
cluster_slow_queries_psThe number of slow queries per second.
cluster_tpsThe number of transactions per second.
conn_usageThe connection usage.
cpu_totalThe CPU utilization.
db_ageThe maximum database age.
instance_connection_utilizationThe connection usage of an instance.
instance_cpu_utilizationThe CPU utilization of an instance.
instance_input_bandwidthThe inbound bandwidth of an instance.
instance_memory_utilizationThe memory usage of an instance.
instance_output_bandwidthThe outbound bandwidth of an instance.
mem_usageThe memory usage.
pls_data_sizeThe disk data size of a PolarDB for PostgreSQL cluster.
pls_iopsThe IOPS of a PolarDB for PostgreSQL cluster.
pls_iops_readThe read IOPS of a PolarDB for PostgreSQL cluster.
pls_iops_writeThe write IOPS of a PolarDB for PostgreSQL cluster.
pls_pg_wal_dir_sizeThe size of write-ahead logging (WAL) files of a PolarDB for PostgreSQL cluster.
pls_throughputThe I/O throughput of a PolarDB for PostgreSQL cluster.
pls_throughput_readThe read I/O throughput of a PolarDB for PostgreSQL cluster.
pls_throughput_writeThe write I/O throughput of a PolarDB for PostgreSQL cluster.
swell_timeThe point in time at which data bloat occurs in a PolarDB for PostgreSQL cluster.
tpsThe TPS of a PolarDB for PostgreSQL cluster.
cluster_iopsThe IOPS.
RedisCustom metricintranet_in_ratioThe bandwidth utilization of writes.
intranet_out_ratioThe bandwidth utilization of reads.
failed_countThe number of failed operations.
cpu_usageThe CPU utilization.
used_memoryThe memory usage.
used_connectionThe number of used connections.
used_qpsThe number of used QPS.

Cloud service monitoring

The following table describes the metrics of cloud service monitoring that are supported by Managed Service for Prometheus.

ApsaraMQ for RocketMQ

CategoryMetric typeMetric nameDescription
ProducerCustom metricrocketmq_producer_requestsThe number of API calls that are made to send messages.
rocketmq_producer_messagesThe number of sent messages.
rocketmq_producer_message_size_bytesThe total size of sent messages.
rocketmq_producer_send_success_rateThe success rate of message sending.
rocketmq_producer_failure_api_callsThe number of failed API calls that are made to send messages.
rocketmq_producer_send_rt_milliseconds_avgThe average time required to send messages.
rocketmq_producer_send_rt_milliseconds_minThe minimum time required to send messages.
rocketmq_producer_send_rt_milliseconds_maxThe maximum time required to send messages.
rocketmq_producer_send_rt_milliseconds_p95The 95th percentile of the time required to send messages.
rocketmq_producer_send_rt_milliseconds_p99The 99th percentile of the time required to send messages.
ConsumerCustom metricrocketmq_consumer_requestsThe number of API calls that are made to consume messages.
rocketmq_consumer_send_back_requestsThe number of API calls that are made to return messages after consumers fail to consume messages.
rocketmq_consumer_send_back_messagesThe messages returned from consumers after consumers fail to consume messages.
rocketmq_consumer_messagesThe number of consumed messages.
rocketmq_consumer_message_size_bytesThe total size of messages consumed within 1 minute.
rocketmq_consumer_ready_and_inflight_messagesThe number of lagging messages, including ready messages and inflight messages.
rocketmq_consumer_ready_messagesThe number of ready messages.
rocketmq_consumer_inflight_messagesThe number of inflight messages.
rocketmq_consumer_queue_time_millisecondsThe queuing duration of messages.
rocketmq_consumer_message_await_time_milliseconds_avgThe average time required for consumer clients to allocate resources to process messages.
rocketmq_consumer_message_await_time_milliseconds_minThe minimum time required for consumer clients to allocate resources to process messages.
rocketmq_consumer_message_await_time_milliseconds_maxThe maximum time required for consumer clients to allocate resources to process messages.
rocketmq_consumer_message_await_time_milliseconds_p95The 95th percentile of the time required for consumer clients to allocate resources to process messages.
rocketmq_consumer_message_await_time_milliseconds_p99The 99th percentile of the time required for consumer clients to allocate resources to process messages.
rocketmq_consumer_message_process_time_milliseconds_avgThe average time required for consumers to process messages.
rocketmq_consumer_message_process_time_milliseconds_minThe minimum time required for consumers to process messages.
rocketmq_consumer_message_process_time_milliseconds_maxThe maximum time required for consumers to process messages.
rocketmq_consumer_message_process_time_milliseconds_p95The 95th percentile of the time required for consumers to process messages.
rocketmq_consumer_message_process_time_milliseconds_p99The 99th percentile of the time required for consumers to process messages.
rocketmq_consumer_consume_success_rateThe success rate of message consumption.
rocketmq_consumer_failure_api_callsThe number of failed API calls that are made to consume messages.
rocketmq_consumer_to_dlq_messagesThe number of dead-letter messages.

ApsaraMQ for RabbitMQ

CategoryMetric typeMetric nameDescription
OverviewCustom metricrabbitmq_instance_api_totalThe number of instance-level API calls that are initiated within seconds.
rabbitmq_connections_opened_totalThe total number of opened connections.
rabbitmq_connections_closed_totalThe total number of closed connections.
rabbitmq_channels_opened_totalThe total number of opened channels.
rabbitmq_channels_closed_totalThe total number of closed channels.
rabbitmq_queues_declared_totalThe total number of declared queues.
rabbitmq_queues_deleted_totalThe total number of deleted queues.
rabbitmq_exchange_declared_total-
rabbitmq_exchange_deleted_total-
rabbitmq_exchange_bind_total-
rabbitmq_exchange_unbind_total-
rabbitmq_queue_bind_total-
rabbitmq_queue_unbind_total-
rabbitmq_connectionsThe number of connections that are currently open.
rabbitmq_channelsThe number of channels that are currently open.
ConnectionsCustom metricrabbitmq_connection_channelsThe number of channels on connections.
ExchangeCustom metricrabbitmq_exchange_messages_published_in_totalThe number of inbound messages.
rabbitmq_exchange_messages_published_out_totalThe number of outbound messages.
QueuesCustom metricrabbitmq_queue_messages_published_totalThe total number of messages published to queues.
rabbitmq_queue_messages_readyThe number of messages that are ready to be delivered to consumers.
rabbitmq_queue_messages_unackedThe number of messages that are being scheduled.
rabbitmq_queue_deliver_totalThe total number of messages that have been delivered to consumers but not yet consumed.
rabbitmq_queue_get_total-
rabbitmq_queue_ack_total-
rabbitmq_queue_uack_total-
rabbitmq_queue_recover_total-
rabbitmq_queue_reject_total-
rabbitmq_queue_consumersThe number of consumers in queues.

MongoDB

Metric typeMetric nameDescription
Custom metricavg_rtThe average response time of an instance.
bytes_inThe inbound traffic of an instance.
bytes_outThe outbound traffic of an instance.
bytes_read_into_cacheThe amount of data read into the WiredTiger cache.
bytes_written_from_cacheThe amount of data written from the WiredTiger cache.
commandThe QPS of protocol command operations.
conn_usageThe connection utilization of an instance. The value is generated by dividing the number of current connections by the maximum number of connections.
connections_activeThe number of active connections of an instance.
cpu_usageThe CPU utilization of an instance.
current_connThe total number of current connections of an instance.
data_iopsThe IOPS usage of the data disk.
data_sizeThe used data disk space of an instance.
deleteThe QPS of delete operations.
disk_usageThe disk usage of an instance. The value is generated by dividing the used space by the maximum space.
document_deleted_psThe number of documents deleted from an instance.
document_inserted_psThe number of documents inserted into an instance.
document_returned_psThe number of documents returned by an instance.
document_updated_psThe number of documents updated by an instance.
getmoreThe QPS of getMore operations.
gl_ac_readersThe number of global read locks currently used by an instance.
gl_ac_writersThe number of global write locks currently used by an instance.
gl_cq_readersThe length of the queue waiting for the global read locks.
gl_cq_totalThe length of the queue waiting for the global locks.
gl_cq_writersThe length of the queue waiting for global write locks.
ins_sizeThe used disk space of an instance.
insertThe QPS of insert operations.
iocheck_costThe I/O latency. The value indicates the I/O performance.
iops_usageThe IOPS usage.
job_cursors_closedThe number of cursors that are closed with closed sessions.
log_iopsThe IOPS usage of the log disk.
log_sizeThe used log disk space of an instance.
maximum_bytes_configuredThe maximum configured size of the WiredTiger cache.
mem_usageThe memory usage.
moveChunk_donor_started_psThe number of times that the current node is used as the moveChunk source shard.
moveChunk_recip_stared_psThe number of times that the current node is used as the moveChunk destination shard.
noTimeout_openThe number of opened cursors without a timeout period.
operation_exactIDCount_psThe number of requests that need to be broadcast to obtain information about the matched IDs.
operation_scanAndOrder_psThe number of requests for which indexes cannot be used for sorting.
operation_writeConflicts_psThe number of write conflicts.
pinned_openThe number of pinned open cursors.
queryThe QPS of query operations.
queryExecutor_scannedObject_psThe number of queried documents.
queryExecutor_scanned_psThe number of queried indexes.
read_concurrent_trans_availableThe number of concurrent read requests available in a WiredTiger request queue.
read_concurrent_trans_outThe number of concurrent read requests sent from a WiredTiger request queue.
repl_lagThe data synchronization latency of the primary and secondary nodes of an instance.
timed_outThe number of cursors that are closed due to timeout.
total_openThe number of cursors that are currently open.
ttl_deletedDocuments_psThe number of documents that are deleted due to time-to-live (TTL) indexes.
ttl_passes_psThe number of delete operations that the background TTL threads perform.
updateThe QPS of update operations.
write_concurrent_trans_availableThe number of concurrent write requests available in a WiredTiger request queue.
write_concurrent_trans_outThe number of concurrent write requests sent from a WiredTiger request queue.
wt_cache_dirty_usageThe dirty cache usage of the WiredTiger storage engine of an instance.
wt_cache_usageThe cache usage of the WiredTiger storage engine of an instance.
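
The conn_usage and disk_usage descriptions in this table define each value as a quotient. A minimal sketch of that arithmetic follows. The helper name usage_percent and the sample numbers are hypothetical, and expressing the result as a percentage is an assumption, because the table does not state the unit.

```python
def usage_percent(used: float, maximum: float) -> float:
    """Return used / maximum expressed as a percentage."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    # Multiply first so that whole-number inputs stay exact.
    return used * 100.0 / maximum


if __name__ == "__main__":
    # Made-up values: 150 current connections out of a 500-connection limit.
    print(usage_percent(150, 500))
    # Made-up values: 40 GB used on a 100 GB disk.
    print(usage_percent(40, 100))
```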