When you use Alibaba Cloud Managed Service for Prometheus, you are charged based on the number of data samples reported for billable metrics. Metrics are classified into basic metrics and custom metrics. Basic metrics are free of charge. Custom metrics have been billed since January 6, 2020.
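The split between basic and custom metrics can be illustrated with a minimal Python sketch. It assumes that a metric counts as basic only when its job name and metric name both appear in the tables on this page; that matching rule, the `count_samples_by_type` helper, the sample counts, and the `my-app` job are hypothetical and are not part of the service's API. No pricing is modeled, only the sample tally.

```python
from collections import Counter

# Hypothetical excerpt of the basic-metric tables: (job name, metric name) pairs.
BASIC_METRICS = {
    ("node-exporter", "node_load1"),
    ("node-exporter", "up"),
    ("apiserver", "apiserver_request_total"),
}


def count_samples_by_type(samples):
    """Tally reported samples per metric type.

    `samples` is an iterable of (job, metric, sample_count) tuples.
    Anything not listed in BASIC_METRICS is treated as a custom metric
    (an assumption made for this sketch).
    """
    totals = Counter()
    for job, metric, count in samples:
        kind = "basic" if (job, metric) in BASIC_METRICS else "custom"
        totals[kind] += count
    return totals


# Illustrative sample counts, not real measurements.
reported = [
    ("node-exporter", "node_load1", 5760),
    ("apiserver", "apiserver_request_total", 2880),
    ("my-app", "my_app_requests_total", 1440),  # hypothetical custom metric
]
totals = count_samples_by_type(reported)
print(totals["custom"])  # samples that would count toward billing
```

Only the `custom` tally is relevant for billing in this sketch. The `basic` tally is kept so that the free portion of the reported data remains visible.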
Kubernetes clusters
The following tables describe the basic metrics of Kubernetes clusters that are supported by Managed Service for Prometheus.
Job names and basic metrics related to Prometheus instance status
Job name | Metric type | Metric name | Description |
_arms-prom/kubelet/1 | Basic metric | promhttp_metric_handler_requests_in_flight | - |
go_memstats_mallocs_total | A counter value that shows the number of allocated heap objects. You can call the | ||
go_memstats_lookups_total | A counter value that shows the number of dereferenced pointers. You can call the | ||
go_memstats_last_gc_time_seconds | The timestamp when the last garbage collection (GC) was complete. | ||
go_memstats_heap_sys_bytes | The number of memory bytes allocated for the heap from the operating system, including the virtual address space that is reserved but not used. | ||
go_memstats_heap_released_bytes | The number of bytes in free spans that have been returned to the operating system. | ||
go_memstats_heap_objects | The number of objects allocated on the heap. The number varies based on GC and the allocation of new objects. | ||
go_memstats_heap_inuse_bytes | The number of bytes occupied by the spans in use. | ||
go_memstats_heap_idle_bytes | The number of memory bytes occupied by free spans. | ||
go_memstats_heap_alloc_bytes | The number of memory bytes allocated for heap objects. The heap objects include all reachable heap objects and the unreachable objects that are not removed during GC. | ||
go_memstats_gc_sys_bytes | The amount of memory occupied by GC metadata. | ||
go_memstats_gc_cpu_fraction | The percentage of CPU time consumed by GC since the program was started. | ||
go_memstats_frees_total | A counter value that shows the number of removed heap objects. You can call the | ||
go_memstats_buck_hash_sys_bytes | The amount of memory occupied by the hash tables used for profiling. | ||
go_memstats_alloc_bytes_total | The value of the metric increases as objects are allocated in the heap, but does not decrease when objects are removed. Similar to Prometheus counters, the | ||
go_memstats_alloc_bytes | The number of memory bytes allocated for heap objects. The heap objects include all reachable heap objects and the unreachable objects that are not removed during GC. | ||
scrape_duration_seconds | - | ||
go_info | The information about the Go version. The value is obtained by calling the | ||
go_goroutines | The value is obtained by calling the | ||
scrape_samples_post_metric_relabeling | - | ||
go_gc_duration_seconds_sum | - | ||
go_gc_duration_seconds_count | - | ||
blackbox_exporter_config_last_reload_successful | - | ||
blackbox_exporter_config_last_reload_success_timestamp_seconds | - | ||
scrape_samples_scraped | - | ||
blackbox_exporter_build_info | - | ||
arms_prometheus_target_scrapes_sample_out_of_order_total | - | ||
arms_prometheus_target_scrapes_sample_out_of_bounds_total | - | ||
arms_prometheus_target_scrapes_sample_duplicate_timestamp_total | - | ||
scrape_series_added | - | ||
arms_prometheus_target_scrapes_exceeded_sample_limit_total | - | ||
arms_prometheus_target_scrapes_cache_flush_forced_total | - | ||
arms_prometheus_target_scrape_pools_total | - | ||
statsd_metric_mapper_cache_gets_total | - | ||
statsd_metric_mapper_cache_hits_total | - | ||
statsd_metric_mapper_cache_length | - | ||
arms_prometheus_target_scrape_pools_failed_total | - | ||
up | - | ||
arms_prometheus_target_scrape_pool_reloads_total | - | ||
arms_prometheus_target_scrape_pool_reloads_failed_total | - |
Job names and basic metrics related to API server data collection
Job name | Metric type | Metric name |
apiserver | Basic metric | apiserver_request_duration_seconds_bucket (deprecated by default) |
apiserver_admission_controller_admission_duration_seconds_bucket | ||
apiserver_request_total | ||
rest_client_requests_total | ||
apiserver_admission_webhook_admission_duration_seconds_bucket | ||
apiserver_current_inflight_requests | ||
up | ||
apiserver_admission_webhook_admission_duration_seconds_count | ||
scrape_samples_post_metric_relabeling | ||
scrape_samples_scraped | ||
scrape_series_added | ||
scrape_duration_seconds |
Job names and basic metrics related to Ingress data collection
Job name | Metric type | Metric name | Description |
arms-ack-ingress | Basic metric | nginx_ingress_controller_request_duration_seconds_bucket | - |
nginx_ingress_controller_response_duration_seconds_bucket (deprecated by default) | - | ||
nginx_ingress_controller_response_size_bucket (deprecated by default) | - | ||
nginx_ingress_controller_request_size_bucket | - | ||
nginx_ingress_controller_bytes_sent_bucket | - | ||
go_gc_duration_seconds | The value is obtained by calling the | ||
nginx_ingress_controller_nginx_process_connections | - | ||
nginx_ingress_controller_request_duration_seconds_sum | - | ||
nginx_ingress_controller_request_duration_seconds_count (deprecated by default) | - | ||
nginx_ingress_controller_bytes_sent_sum | - | ||
nginx_ingress_controller_request_size_sum | - | ||
nginx_ingress_controller_response_duration_seconds_count | - | ||
nginx_ingress_controller_response_duration_seconds_sum (deprecated by default) | - | ||
nginx_ingress_controller_response_size_count (deprecated by default) | - | ||
nginx_ingress_controller_bytes_sent_count | - | ||
nginx_ingress_controller_response_size_sum | - | ||
nginx_ingress_controller_request_size_count | - | ||
promhttp_metric_handler_requests_total | - | ||
nginx_ingress_controller_nginx_process_connections_total | - | ||
go_memstats_mcache_sys_bytes | The amount of memory allocated from the operating system for the mcache structure. | ||
go_memstats_lookups_total | A counter value that shows the number of dereferenced pointers. You can call the | ||
go_threads | The value is obtained by calling the | ||
go_memstats_sys_bytes | The number of memory bytes that Go has obtained from the system. | ||
go_memstats_last_gc_time_seconds | The timestamp when the last GC was complete. | ||
go_memstats_heap_sys_bytes | The number of memory bytes allocated for the heap from the operating system, including the virtual address space that is reserved but not used. | ||
go_memstats_heap_objects | The number of objects allocated on the heap. The number varies based on GC and the allocation of new objects. | ||
go_memstats_heap_inuse_bytes | The number of bytes occupied by the spans in use. | ||
go_memstats_heap_idle_bytes | The number of memory bytes occupied by free spans. | ||
go_memstats_heap_alloc_bytes | The number of memory bytes allocated for heap objects. The heap objects include all reachable heap objects and the unreachable objects that are not removed during GC. | ||
go_memstats_gc_sys_bytes | The amount of memory occupied by GC metadata. | ||
promhttp_metric_handler_requests_in_flight | - | ||
go_memstats_stack_sys_bytes | The number of stack memory bytes obtained from the operating system. The value is obtained based on the value of the | ||
go_memstats_stack_inuse_bytes | The amount of used memory on a stack memory span on which at least one stack object is allocated. | ||
go_memstats_gc_cpu_fraction | The percentage of CPU time consumed by GC since the program was started. | ||
go_memstats_frees_total | A counter value that shows the number of removed heap objects. You can call the | ||
go_memstats_buck_hash_sys_bytes | The amount of memory occupied by the hash tables used for profiling. | ||
go_memstats_alloc_bytes_total | The value of the metric increases as objects are allocated in the heap, but does not decrease when objects are removed. Similar to Prometheus counters, the | ||
go_memstats_alloc_bytes | The number of memory bytes allocated for heap objects. The heap objects include all reachable heap objects and the unreachable objects that are not removed during GC. | ||
nginx_ingress_controller_nginx_process_num_procs | - | ||
go_info | The information about the Go version. The value is obtained by calling the | ||
go_memstats_mallocs_total | A counter value that shows the number of allocated heap objects. You can call the | ||
go_memstats_other_sys_bytes | The amount of memory used for other runtime allocations. | ||
go_goroutines | The value is obtained by calling the | ||
scrape_samples_post_metric_relabeling | - | ||
scrape_samples_scraped | - | ||
process_virtual_memory_max_bytes | - | ||
process_virtual_memory_bytes | The virtual set size (VSS). The value indicates all allocated memory, including the memory that is allocated but not used, and the memory that is shared and swapped out. | ||
scrape_duration_seconds | - | ||
go_memstats_heap_released_bytes | The number of bytes in free spans that have been returned to the operating system. | ||
go_gc_duration_seconds_sum | - | ||
go_memstats_next_gc_bytes | The amount of heap memory during the next GC cycle. GC is used to ensure that the value is greater than the value of the | ||
go_gc_duration_seconds_count | - | ||
nginx_ingress_controller_config_hash | - | ||
nginx_ingress_controller_config_last_reload_successful | - | ||
nginx_ingress_controller_config_last_reload_successful_timestamp_seconds | - | ||
nginx_ingress_controller_ingress_upstream_latency_seconds_count | - | ||
nginx_ingress_controller_ingress_upstream_latency_seconds_sum | - | ||
process_start_time_seconds | The value is obtained based on the | ||
nginx_ingress_controller_nginx_process_cpu_seconds_total | - | ||
scrape_series_added | - | ||
nginx_ingress_controller_nginx_process_oldest_start_time_seconds | - | ||
nginx_ingress_controller_nginx_process_read_bytes_total | - | ||
nginx_ingress_controller_nginx_process_requests_total | - | ||
nginx_ingress_controller_nginx_process_resident_memory_bytes | - | ||
nginx_ingress_controller_nginx_process_virtual_memory_bytes | - | ||
nginx_ingress_controller_nginx_process_write_bytes_total | - | ||
nginx_ingress_controller_requests | - | ||
go_memstats_mcache_inuse_bytes | The amount of memory used by the mcache structure. | ||
nginx_ingress_controller_success | - | ||
process_resident_memory_bytes | The resident set size (RSS). The value indicates the actual memory used by processes, including the shared memory. The memory that is allocated but not used, or the memory that is swapped out is not included. | ||
process_open_fds | The value is obtained by calculating the total number of files in the | ||
process_max_fds | The value is obtained by reading the value of the | ||
process_cpu_seconds_total | The value is obtained based on the | ||
go_memstats_mspan_sys_bytes | The amount of memory allocated from the operating system for the mspan structure. | ||
up | - | ||
go_memstats_mspan_inuse_bytes | The amount of memory used by the mspan structure. | ||
nginx_ingress_controller_ssl_expire_time_seconds | - | ||
nginx_ingress_controller_leader_election_status | - |
Job names and basic metrics related to CoreDNS data collection
Job name | Metric type | Metric name | Description |
arms-ack-coredns | Basic metric | coredns_forward_request_duration_seconds_bucket | - |
coredns_dns_request_size_bytes_bucket | - | ||
coredns_dns_response_size_bytes_bucket | - | ||
coredns_kubernetes_dns_programming_duration_seconds_bucket | - | ||
coredns_dns_request_duration_seconds_bucket | - | ||
coredns_plugin_enabled | - | ||
coredns_health_request_duration_seconds_bucket | - | ||
go_gc_duration_seconds | The value is obtained by calling the | ||
coredns_forward_responses_total | - | ||
coredns_forward_request_duration_seconds_sum | - | ||
coredns_forward_request_duration_seconds_count | - | ||
coredns_dns_requests_total | - | ||
coredns_forward_conn_cache_misses_total | - | ||
coredns_dns_responses_total | - | ||
coredns_cache_entries | - | ||
coredns_cache_hits_total | - | ||
coredns_forward_conn_cache_hits_total | - | ||
coredns_forward_requests_total | - | ||
coredns_dns_request_size_bytes_sum | - | ||
coredns_dns_response_size_bytes_count | - | ||
coredns_dns_response_size_bytes_sum | - | ||
coredns_dns_request_size_bytes_count | - | ||
scrape_duration_seconds | - | ||
scrape_samples_scraped | - | ||
scrape_series_added | - | ||
up | - | ||
scrape_samples_post_metric_relabeling | - | ||
go_memstats_lookups_total | A counter value that shows the number of dereferenced pointers. You can call the | ||
go_memstats_last_gc_time_seconds | The timestamp when the last GC was complete. | ||
go_memstats_heap_sys_bytes | The number of memory bytes allocated for the heap from the operating system, including the virtual address space that is reserved but not used. | ||
coredns_build_info | - | ||
go_memstats_heap_released_bytes | The number of bytes in free spans that have been returned to the operating system. | ||
go_memstats_heap_objects | The number of objects allocated on the heap. The number varies based on GC and the allocation of new objects. | ||
go_memstats_heap_inuse_bytes | The number of bytes occupied by the spans in use. | ||
go_memstats_heap_idle_bytes | The number of memory bytes occupied by free spans. | ||
go_memstats_heap_alloc_bytes | The number of memory bytes allocated for heap objects. The heap objects include all reachable heap objects and the unreachable objects that are not removed during GC. | ||
go_memstats_gc_sys_bytes | The amount of memory occupied by GC metadata. | ||
go_memstats_sys_bytes | The number of memory bytes that Go has obtained from the system. | ||
go_memstats_stack_sys_bytes | The number of stack memory bytes obtained from the operating system. The value is obtained based on the value of the | ||
go_memstats_mallocs_total | A counter value that shows the number of allocated heap objects. You can call the | ||
go_memstats_gc_cpu_fraction | The percentage of CPU time consumed by GC since the program was started. | ||
go_memstats_stack_inuse_bytes | The amount of used memory on a stack memory span on which at least one stack object is allocated. | ||
go_memstats_frees_total | A counter value that shows the number of removed heap objects. You can call the | ||
go_memstats_buck_hash_sys_bytes | The amount of memory occupied by the hash tables used for profiling. | ||
go_memstats_alloc_bytes_total | The value of the metric increases as objects are allocated in the heap, but does not decrease when objects are removed. Similar to Prometheus counters, the | ||
go_memstats_alloc_bytes | The number of memory bytes allocated for heap objects. The value is the same as the value of the | ||
coredns_cache_misses_total | - | ||
go_memstats_other_sys_bytes | The amount of memory used for other runtime allocations. | ||
go_memstats_mcache_inuse_bytes | The amount of memory used by the mcache structure. | ||
go_goroutines | The value is obtained by calling the | ||
process_virtual_memory_max_bytes | - | ||
process_virtual_memory_bytes | The VSS. The value indicates all allocated memory, including the memory that is allocated but not used, and the memory that is shared and swapped out. | ||
go_gc_duration_seconds_sum | - | ||
go_gc_duration_seconds_count | - | ||
go_memstats_next_gc_bytes | The amount of heap memory during the next GC cycle. GC is used to ensure that the value is greater than the value of the | ||
coredns_dns_request_duration_seconds_count | - | ||
coredns_reload_failed_total | - | ||
coredns_panics_total | - | ||
coredns_local_localhost_requests_total | - | ||
coredns_kubernetes_dns_programming_duration_seconds_sum | - | ||
coredns_kubernetes_dns_programming_duration_seconds_count | - | ||
coredns_dns_request_duration_seconds_sum | - | ||
coredns_hosts_reload_timestamp_seconds | - | ||
coredns_health_request_failures_total | - | ||
process_start_time_seconds | The value is obtained based on the | ||
process_resident_memory_bytes | The RSS. The value indicates the actual memory used by processes, including the shared memory. The memory that is allocated but not used, or the memory that is swapped out is not included. | ||
process_open_fds | The value is obtained by calculating the total number of files in the | ||
process_max_fds | The value is obtained by reading the value of the | ||
process_cpu_seconds_total | The value is obtained based on the | ||
coredns_health_request_duration_seconds_sum | - | ||
coredns_health_request_duration_seconds_count | - | ||
go_memstats_mspan_sys_bytes | The amount of memory allocated from the operating system for the mspan structure. | ||
coredns_forward_max_concurrent_rejects_total | - | ||
coredns_forward_healthcheck_broken_total | - | ||
go_memstats_mcache_sys_bytes | The amount of memory allocated from the operating system for the mcache structure. | ||
go_memstats_mspan_inuse_bytes | The amount of memory used by the mspan structure. | ||
go_threads | The value is obtained by calling the | ||
go_info | The information about the Go version. The value is obtained by calling the |
Job names and basic metrics related to Kube-State-Metrics data collection
Job name | Metric type | Metric name |
_kube-state-metrics | Basic metric | kube_pod_container_status_waiting_reason |
kube_pod_status_phase | ||
kube_pod_container_status_last_terminated_reason | ||
kube_pod_container_status_terminated_reason | ||
kube_pod_status_ready | ||
kube_node_status_condition | ||
kube_pod_container_status_running | ||
kube_pod_container_status_restarts_total | ||
kube_pod_container_info | ||
kube_pod_container_status_waiting | ||
kube_pod_container_status_terminated | ||
kube_pod_labels | ||
kube_pod_owner | ||
kube_pod_info | ||
kube_pod_container_resource_limits | ||
kube_persistentvolume_status_phase | ||
kube_pod_container_resource_requests_memory_bytes | ||
kube_pod_container_resource_requests_cpu_cores | ||
kube_pod_container_resource_limits_memory_bytes | ||
kube_node_status_capacity | ||
kube_service_info | ||
kube_pod_container_resource_limits_cpu_cores | ||
kube_deployment_status_replicas_updated | ||
kube_deployment_status_replicas_unavailable | ||
kube_deployment_spec_replicas | ||
kube_deployment_created | ||
kube_deployment_metadata_generation | ||
kube_deployment_status_replicas | ||
kube_deployment_labels | ||
kube_deployment_status_observed_generation | ||
kube_deployment_status_replicas_available | ||
kube_deployment_spec_strategy_rollingupdate_max_unavailable | ||
kube_daemonset_status_desired_number_scheduled | ||
kube_daemonset_updated_number_scheduled | ||
kube_daemonset_status_number_ready | ||
kube_daemonset_status_number_misscheduled | ||
kube_daemonset_status_number_available | ||
kube_daemonset_status_current_number_scheduled | ||
kube_daemonset_created | ||
kube_node_status_allocatable_cpu_cores | ||
kube_node_status_capacity_memory_bytes | ||
kube_node_spec_unschedulable | ||
kube_node_status_allocatable_memory_bytes | ||
kube_node_labels | ||
kube_node_info | ||
kube_namespace_labels | ||
kube_node_status_capacity_cpu_cores | ||
kube_node_status_capacity_pods | ||
kube_node_status_allocatable_pods | ||
kube_node_spec_taint | ||
kube_statefulset_status_replicas | ||
kube_statefulset_replicas | ||
kube_statefulset_created | ||
up | ||
scrape_samples_scraped | ||
scrape_duration_seconds | ||
scrape_samples_post_metric_relabeling | ||
scrape_series_added |
Job names and basic metrics related to Kubelet data collection
Job name | Metric type | Metric name | Description |
_arms/kubelet/metric | Basic metric | rest_client_request_duration_seconds_bucket | - |
apiserver_client_certificate_expiration_seconds_bucket | - | ||
kubelet_pod_worker_duration_seconds_bucket | - | ||
kubelet_pleg_relist_duration_seconds_bucket | - | ||
workqueue_queue_duration_seconds_bucket | - | ||
rest_client_requests_total | - | ||
go_gc_duration_seconds | The value is obtained by calling the | ||
process_cpu_seconds_total | The value is obtained based on the | ||
process_resident_memory_bytes | The RSS. The value indicates the actual memory used by processes, including the shared memory. The memory that is allocated but not used, or the memory that is swapped out is not included. | ||
kubernetes_build_info | - | ||
kubelet_node_name | - | ||
kubelet_certificate_manager_client_ttl_seconds | - | ||
kubelet_certificate_manager_client_expiration_renew_errors | - | ||
scrape_duration_seconds | - | ||
go_goroutines | The value is obtained by calling the | ||
scrape_samples_post_metric_relabeling | - | ||
scrape_samples_scraped | - | ||
scrape_series_added | - | ||
up | - | ||
apiserver_client_certificate_expiration_seconds_count | - | ||
workqueue_adds_total | - | ||
workqueue_depth | - |
Job names and basic metrics related to cAdvisor data collection
Job name | Metric type | Metric name |
_arms/kubelet/cadvisor | Basic metric | container_memory_failures_total (deprecated by default) |
container_memory_rss | ||
container_spec_memory_limit_bytes | ||
container_memory_failcnt | ||
container_memory_cache | ||
container_memory_swap | ||
container_memory_usage_bytes | ||
container_memory_max_usage_bytes | ||
container_cpu_load_average_10s | ||
container_fs_reads_total (deprecated by default) | ||
container_fs_writes_total (deprecated by default) | ||
container_network_transmit_errors_total | ||
container_network_receive_bytes_total | ||
container_network_transmit_packets_total | ||
container_network_receive_errors_total | ||
container_memory_working_set_bytes | ||
container_cpu_usage_seconds_total | ||
container_fs_reads_bytes_total | ||
container_fs_writes_bytes_total | ||
container_spec_cpu_quota | ||
container_cpu_cfs_periods_total | ||
container_cpu_cfs_throttled_periods_total | ||
container_cpu_cfs_throttled_seconds_total | ||
container_fs_inodes_free | ||
container_fs_io_time_seconds_total | ||
container_fs_io_time_weighted_seconds_total | ||
container_fs_limit_bytes | ||
container_tasks_state (deprecated by default) | ||
container_fs_read_seconds_total (deprecated by default) | ||
container_fs_write_seconds_total (deprecated by default) | ||
container_fs_usage_bytes | ||
container_fs_inodes_total | ||
container_fs_io_current | ||
scrape_duration_seconds | ||
scrape_samples_scraped | ||
machine_cpu_cores | ||
machine_memory_bytes | ||
scrape_samples_post_metric_relabeling | ||
scrape_series_added | ||
up | ||
_arms-prom/kube-apiserver/cadvisor | Basic metric | scrape_duration_seconds |
up | ||
scrape_samples_scraped | ||
scrape_samples_post_metric_relabeling | ||
scrape_series_added |
Job names and basic metrics related to ACK Scheduler data collection
Job name | Metric type | Metric name |
ack-scheduler | Basic metric | rest_client_request_duration_seconds_bucket |
scheduler_pod_scheduling_attempts_bucket | ||
rest_client_requests_total | ||
scheduler_pending_pods | ||
scheduler_scheduler_cache_size | ||
up |
Job names and basic metrics related to etcd data collection
Job name | Metric type | Metric name |
etcd | Basic metric | etcd_disk_backend_commit_duration_seconds_bucket |
up | ||
etcd_server_has_leader | ||
etcd_debugging_mvcc_keys_total | ||
etcd_debugging_mvcc_db_total_size_in_bytes | ||
etcd_server_leader_changes_seen_total |
Job names and basic metrics related to node data collection
Job name | Metric type | Metric name | Description |
node-exporter | Basic metric | node_filesystem_size_bytes | - |
node_filesystem_readonly | - | ||
node_filesystem_free_bytes | - | ||
node_filesystem_avail_bytes | - | ||
node_cpu_seconds_total | - | ||
node_network_receive_bytes_total | - | ||
node_network_receive_errs_total | - | ||
node_network_transmit_bytes_total | - | ||
node_network_receive_packets_total | - | ||
node_network_transmit_drop_total | - | ||
node_network_transmit_errs_total | - | ||
node_network_up | - | ||
node_network_transmit_packets_total | - | ||
node_network_receive_drop_total | - | ||
go_gc_duration_seconds | The value is obtained by calling the | ||
node_load5 | - | ||
node_filefd_allocated | - | ||
node_exporter_build_info | - | ||
node_disk_written_bytes_total | - | ||
node_disk_writes_completed_total | - | ||
node_disk_write_time_seconds_total | - | ||
node_nf_conntrack_entries | - | ||
node_nf_conntrack_entries_limit | - | ||
node_processes_max_processes | - | ||
node_processes_pids | - | ||
node_sockstat_TCP_alloc | - | ||
node_sockstat_TCP_inuse | - | ||
node_sockstat_TCP_tw | - | ||
node_timex_offset_seconds | - | ||
node_timex_sync_status | - | ||
node_uname_info | - | ||
node_vmstat_pgfault | - | ||
node_vmstat_pgmajfault | - | ||
node_vmstat_pgpgin | - | ||
node_vmstat_pgpgout | - | ||
node_disk_reads_completed_total | - | ||
node_disk_read_time_seconds_total | - | ||
process_cpu_seconds_total | The value is obtained based on the | ||
node_disk_read_bytes_total | - | ||
node_disk_io_time_weighted_seconds_total | - | ||
node_disk_io_time_seconds_total | - | ||
node_disk_io_now | - | ||
node_context_switches_total | - | ||
node_boot_time_seconds | - | ||
process_resident_memory_bytes | The RSS. The value indicates the actual memory used by processes, including the shared memory. The memory that is allocated but not used, or the memory that is swapped out is not included. | ||
node_intr_total | - | ||
node_load1 | - | ||
go_goroutines | The value is obtained by calling the | ||
scrape_duration_seconds | - | ||
node_load15 | - | ||
scrape_samples_post_metric_relabeling | - | ||
node_netstat_Tcp_PassiveOpens | - | ||
scrape_samples_scraped | - | ||
node_netstat_Tcp_CurrEstab | - | ||
scrape_series_added | - | ||
node_netstat_Tcp_ActiveOpens | - | ||
node_memory_MemTotal_bytes | - | ||
node_memory_MemFree_bytes | - | ||
node_memory_MemAvailable_bytes | - | ||
node_memory_Cached_bytes | - | ||
up | - | ||
node_memory_Buffers_bytes | - |
Job names and basic metrics related to GPU data collection
Job name | Metric type | Metric name | Description |
gpu-exporter | Basic metric | go_gc_duration_seconds | The value is obtained by calling the |
promhttp_metric_handler_requests_total | - | ||
scrape_series_added | - | ||
up | - | ||
scrape_duration_seconds | - | ||
scrape_samples_scraped | - | ||
scrape_samples_post_metric_relabeling | - | ||
go_memstats_mcache_inuse_bytes | The amount of memory used by the mcache structure. | ||
process_virtual_memory_max_bytes | - | ||
process_virtual_memory_bytes | The VSS. The value indicates all allocated memory, including the memory that is allocated but not used, and the memory that is shared and swapped out. | ||
process_start_time_seconds | The value is obtained based on the | ||
go_memstats_next_gc_bytes | The amount of heap memory during the next GC cycle. GC is used to ensure that the value is greater than the value of the | ||
go_memstats_heap_objects | The number of objects allocated on the heap. The number varies based on GC and the allocation of new objects. | ||
process_resident_memory_bytes | The RSS. The value indicates the actual memory used by processes, including the shared memory. The memory that is allocated but not used, or the memory that is swapped out is not included. | ||
process_open_fds | The value is obtained by calculating the total number of files in the | ||
process_max_fds | The value is obtained by reading the value of the | ||
go_memstats_other_sys_bytes | The amount of memory used for other runtime allocations. | ||
go_gc_duration_seconds_count | - | ||
go_memstats_heap_alloc_bytes | The number of memory bytes allocated for heap objects. The heap objects include all reachable heap objects and the unreachable objects that are not removed during GC. | ||
process_cpu_seconds_total | The value is obtained based on the | ||
nvidia_gpu_temperature_celsius (deprecated by default) | - | ||
go_memstats_stack_inuse_bytes | The amount of used memory on a stack memory span on which at least one stack object is allocated. | ||
nvidia_gpu_power_usage_milliwatts (deprecated by default) | - | ||
nvidia_gpu_num_devices (deprecated by default) | - | ||
nvidia_gpu_memory_used_bytes (deprecated by default) | - | ||
nvidia_gpu_memory_total_bytes (deprecated by default) | - | ||
go_memstats_stack_sys_bytes | The number of stack memory bytes obtained from the operating system. The value is obtained based on the value of the | ||
nvidia_gpu_memory_allocated_bytes (deprecated by default) | - | ||
nvidia_gpu_duty_cycle (deprecated by default) | - | ||
nvidia_gpu_allocated_num_devices (deprecated by default) | - | ||
promhttp_metric_handler_requests_in_flight | - | ||
go_memstats_sys_bytes | The number of memory bytes that Go has obtained from the system. | ||
go_memstats_gc_sys_bytes | The amount of memory occupied by GC metadata. | ||
go_memstats_gc_cpu_fraction | The percentage of CPU time consumed by GC since the program was started. | ||
go_memstats_heap_released_bytes | The number of bytes in free spans that have been returned to the operating system. | ||
go_memstats_frees_total | A counter value that shows the number of removed heap objects. You can call the | ||
go_threads | The value is obtained by calling the | ||
go_memstats_mspan_sys_bytes | The amount of memory allocated from the operating system for the mspan structure. | ||
go_memstats_buck_hash_sys_bytes | The amount of memory occupied by the hash tables used for profiling. | ||
go_memstats_alloc_bytes_total | The value of the metric increases as objects are allocated in the heap, but does not decrease when objects are removed. Similar to Prometheus counters, the | ||
go_memstats_heap_sys_bytes | The number of memory bytes allocated for the heap from the operating system, including the virtual address space that is reserved but not used. | ||
go_memstats_mspan_inuse_bytes | The amount of memory used by the mspan structure. | ||
go_memstats_alloc_bytes | The number of memory bytes allocated for heap objects. The value is the same as the value of the | ||
go_info | The information about the Go version. The value is obtained by calling the | ||
go_memstats_last_gc_time_seconds | The timestamp when the last GC was complete. | ||
go_memstats_heap_inuse_bytes | The number of bytes occupied by the spans in use. | ||
go_memstats_mcache_sys_bytes | The amount of memory allocated from the operating system for the mcache structure. | ||
go_memstats_lookups_total | A counter value that shows the number of dereferenced pointers. You can call the | ||
go_memstats_mallocs_total | A counter value that shows the number of allocated heap objects. You can call the | ||
go_gc_duration_seconds_sum | - | ||
go_goroutines | The value is obtained by calling the | ||
go_memstats_heap_idle_bytes | The number of memory bytes occupied by free spans. |
Job names and basic metrics related to PV data collection
Job name | Metric type | Metric name |
k8s-csi-cluster-pv | Basic metric | cluster_pvc_detail_num_total |
cluster_pv_detail_num_total | ||
cluster_pv_status_num_total | ||
cluster_scrape_collector_success | ||
cluster_scrape_collector_duration_seconds | ||
alibaba_cloud_storage_operator_build_info | ||
cluster_pvc_status_num_total | ||
scrape_duration_seconds | ||
scrape_samples_post_metric_relabeling | ||
scrape_samples_scraped | ||
scrape_series_added | ||
up | ||
k8s-csi-node-pv | Basic metric | cluster_scrape_collector_duration_seconds |
cluster_scrape_collector_success | ||
alibaba_cloud_csi_driver_build_info | ||
up | ||
scrape_series_added | ||
scrape_samples_post_metric_relabeling | ||
scrape_samples_scraped | ||
scrape_duration_seconds |
Hybrid Cloud Monitoring
The following table describes the metrics of Hybrid Cloud Monitoring that are supported by Managed Service for Prometheus.
Category | Metric type | Metric name | Description |
ECS | Custom metric | cpu_util_lization | The CPU utilization of an Elastic Compute Service (ECS) instance. |
internet_in_rate | The average rate of inbound traffic from the Internet to an ECS instance. | ||
internet_out_rate | The average rate of outbound traffic from an ECS instance to the Internet. | ||
disk_read_bps | The bit rate of reads to all disks of an ECS instance. | ||
disk_write_bps | The bit rate of writes to all disks of an ECS instance. | ||
vpc_public_ip_internet_in_Rate | The average rate of inbound traffic from the Internet to the IP address of an ECS instance. | ||
vpc_public_ip_internet_out_Rate | The average rate of outbound traffic from the IP address of an ECS instance to the Internet. | ||
cpu_total | (Agent) cpu.total | ||
memory_totalspace | (Agent) memory.total.space | ||
memory_usedutilization | (Agent) memory.used.utilization | ||
diskusage_utilization | (Agent) disk.usage.utilization_device | ||
RDS | Custom metric | cpu_usage_average | The CPU utilization. |
disk_usage | The disk usage. | ||
iops_usage | The IOPS usage. | ||
connection_usage | The connection usage. | ||
data_delay | The latency of read-only instances. | ||
memory_usage | The memory usage. | ||
mysql_network_in_new | The inbound bandwidth of an ApsaraDB RDS for MySQL instance. | ||
mysql_network_out_new | The outbound bandwidth of an ApsaraDB RDS for MySQL instance. | ||
mysql_active_sessions | The number of active sessions of an ApsaraDB RDS for MySQL instance. | ||
sqlserver_network_in_new | The inbound bandwidth of an ApsaraDB RDS for SQL Server instance. | ||
sqlserver_network_out_new | The outbound bandwidth of an ApsaraDB RDS for SQL Server instance. | ||
NAT | Custom metric | snat_connection | The number of SNAT connections. |
snat_connection_drop_limit | The cumulative number of SNAT connections dropped due to the limit on the number of concurrent connections. | ||
snat_connection_drop_rate_limit | The cumulative number of SNAT connections dropped due to the limit on the number of new connections. | ||
net_rx_rate | The inbound bandwidth. | ||
net_tx_rate | The outbound bandwidth. | ||
net_rx_pkgs | The rate of inbound packets. | ||
net_tx_pkgs | The rate of outbound packets. | ||
RocketMQ | Custom metric | consumer_lag_gid | The number of accumulated messages. |
receive_message_count_gid | The number of messages received per minute by a consumer group. | ||
send_message_count_gid | The number of messages sent per minute by a producer group. | ||
consumer_lag_topic | The number of accumulated messages of a topic or group. | ||
receive_message_count_topic | The number of messages of a topic received per minute by a consumer group. | ||
send_message_count_topic | The number of messages of a topic sent per minute by a producer group. | ||
receive_message_count | The number of messages received per minute. | ||
send_message_count | The number of messages sent per minute. | ||
SLB | Custom metric | healthy_server_count | The number of healthy backend ECS instances. |
unhealthy_server_count | The number of unhealthy backend ECS instances. | ||
packet_tx | The number of outbound packets per second. | ||
packet_rx | The number of inbound packets per second. | ||
traffic_rx_new | The inbound bandwidth. | ||
traffic_tx_new | The outbound bandwidth. | ||
active_connection | The number of active connections over TCP. | ||
inactive_connection | The number of inactive connections on a port. | ||
new_connection | The number of new connections over TCP. | ||
max_connection | The number of concurrent connections on a port. | ||
instance_active_connection | The number of active connections established to an instance. | ||
instance_new_connection | The number of new connections established to an instance per second. | ||
instance_max_connection | The maximum number of concurrent connections established to an instance per second. | ||
instance_drop_connection | The number of connections that are dropped per second on an instance. | ||
instance_traffic_rx | The inbound traffic per second of an instance. Unit: bit. | ||
instance_traffic_tx | The outbound traffic per second of an instance. Unit: bit. | ||
E-MapReduce (EMR) | Custom metric | active_applications | The number of active jobs. |
active_users | The number of active users. | ||
aggregate_containers_allocated | The total number of allocated containers. | ||
aggregate_containers_released | The total number of released containers. | ||
allocated_containers | The number of allocated containers. | ||
apps_completed | The number of completed jobs. | ||
apps_failed | The number of failed jobs. | ||
apps_killed | The number of terminated jobs. | ||
apps_pending | The number of pending jobs. | ||
apps_running | The number of jobs that are running. | ||
apps_submitted | The number of submitted jobs. | ||
available_mb | The amount of memory available to the current queue. | ||
available_vcores | The number of vCores available to the current queue. | ||
pending_containers | The number of pending containers. | ||
reserved_containers | The number of reserved containers. | ||
EIP | Custom metric | net_rx_rate | The inbound bandwidth. |
net_tx_rate | The outbound bandwidth. | ||
net_rx_pkgs_rate | The rate of inbound packets. | ||
net_tx_pkgs_rate | The rate of outbound packets. | ||
out_ratelimit_drop_speed | The rate at which packets are dropped due to throttling. | ||
OSS | Custom metric | availability | The availability. |
request_valid_rate | The ratio of valid requests. | ||
success_rate | The ratio of successful requests. | ||
network_error_rate | The ratio of failed requests due to network issues. | ||
total_request_count | The total number of requests. | ||
valid_count | The number of valid requests. | ||
internet_send | The outbound traffic over the Internet. | ||
internet_recv | The inbound traffic over the Internet. | ||
intranet_send | The outbound traffic over the internal network. | ||
intranet_recv | The inbound traffic over the internal network. | ||
success_count | The total number of successful requests. | ||
network_error_count | The total number of failed requests due to network issues. | ||
client_timeout_count | The total number of failed requests due to client timeouts. | ||
Elasticsearch | Custom metric | node_cpu_utilization | The CPU utilization of a node. |
node_heap_memory_utilization | The heap memory usage of a node. | ||
node_stats_exception_log_count | The number of exceptions. | ||
node_stats_full_gc_collection_count | The number of full heap garbage collections (full GCs). | ||
node_disk_utilization | The disk usage of a node. | ||
node_load_1m | The average load of a node over the last 1 minute. | ||
cluster_query_qps | The queries per second (QPS) of a cluster. | ||
cluster_index_qps | The index QPS of a cluster. | ||
Logstash | Custom metric | cpu_percent | The CPU utilization of a node. |
node_heap_memory | The memory usage of a node. | ||
node_disk_usage | The disk usage of a node. | ||
DRDS | Custom metric | cpu_utilization | The CPU utilization. |
connection_count | The number of connections. | ||
logic_qps | The logical QPS. | ||
logic_rt | The logical response time (RT). | ||
memory_utilization | The memory usage. | ||
network_input_traffic | The inbound bandwidth. | ||
network_output_traffic | The outbound bandwidth. | ||
physics_qps | The physical QPS. | ||
physics_rt | The physical RT. | ||
thread_count | The number of active threads. | ||
com_insert_select | The number of INSERT ... SELECT statements that are executed per second on a private ApsaraDB RDS for MySQL instance. | ||
com_replace | The number of REPLACE statements that are executed per second on a private ApsaraDB RDS for MySQL instance. | ||
com_replace_select | The number of REPLACE ... SELECT statements that are executed per second on a private ApsaraDB RDS for MySQL instance. | ||
com_select | The number of SELECT statements that are executed per second on a private ApsaraDB RDS for MySQL instance. | ||
com_update | The number of UPDATE statements that are executed per second on a private ApsaraDB RDS for MySQL instance. | ||
conn_usage | The connection usage of a private ApsaraDB RDS for MySQL instance. | ||
cpu_usage | The CPU utilization of a private ApsaraDB RDS for MySQL instance. | ||
disk_usage | The disk usage of a private ApsaraDB RDS for MySQL instance. | ||
ibuf_dirty_ratio | The dirty page ratio of the buffer pool of a private ApsaraDB RDS for MySQL instance. | ||
ibuf_pool_reads | The number of physical reads per second on a private ApsaraDB RDS for MySQL instance. | ||
ibuf_read_hit | The read hit ratio of the buffer pool of a private ApsaraDB RDS for MySQL instance. | ||
ibuf_request_r | The number of logical reads per second on a private ApsaraDB RDS for MySQL instance. | ||
ibuf_request_w | The number of logical writes per second on a private ApsaraDB RDS for MySQL instance. | ||
ibuf_use_ratio | The utilization of the buffer pool of a private ApsaraDB RDS for MySQL instance. | ||
inno_data_read | The amount of data read per second on a private ApsaraDB RDS for MySQL instance that uses InnoDB. | ||
inno_data_written | The amount of data written per second to a private ApsaraDB RDS for MySQL instance that uses InnoDB. | ||
inno_row_delete | The number of rows deleted per second from a private ApsaraDB RDS for MySQL instance that uses InnoDB. | ||
inno_row_insert | The number of rows inserted per second to a private ApsaraDB RDS for MySQL instance that uses InnoDB. | ||
inno_row_readed | The number of rows read per second on a private ApsaraDB RDS for MySQL instance that uses InnoDB. | ||
inno_row_update | The number of rows updated per second on a private ApsaraDB RDS for MySQL instance that uses InnoDB. | ||
innodb_log_write_requests | The number of write requests per second to the logs of a private ApsaraDB RDS for MySQL instance that uses InnoDB. | ||
innodb_log_writes | The number of logical writes per second to the logs of a private ApsaraDB RDS for MySQL instance that uses InnoDB. | ||
innodb_os_log_fsyncs | The number of times fsync is called per second to write data to the logs of a private ApsaraDB RDS for MySQL instance that uses InnoDB. | ||
input_traffic_ps | The inbound bandwidth of a private ApsaraDB RDS for MySQL instance. | ||
iops_usage | The IOPS usage of a private ApsaraDB RDS for MySQL instance. | ||
mem_usage | The memory usage of a private ApsaraDB RDS for MySQL instance. | ||
output_traffic_ps | The outbound bandwidth of a private ApsaraDB RDS for MySQL instance. | ||
qps | The QPS of a private ApsaraDB RDS for MySQL instance. | ||
slave_lag | The latency of a private read-only ApsaraDB RDS for MySQL instance. | ||
slow_queries | The slow queries per second of a private ApsaraDB RDS for MySQL instance. | ||
tb_tmp_disk | The number of temporary tables created per second on a private ApsaraDB RDS for MySQL instance. | ||
Kafka | Custom metric | instance_disk_capacity | The disk usage of an instance. |
instance_message_input | The number of messages produced on an instance. | ||
instance_message_output | The number of messages consumed on an instance. | ||
topic_message_input | The number of messages produced in a topic. | ||
topic_message_output | The number of messages consumed in a topic. | ||
MongoDB | Custom metric | cpu_utilization | The CPU utilization. |
memory_utilization | The memory usage. | ||
disk_utilization | The disk usage. | ||
iops_utilization | The IOPS usage. | ||
qps | The QPS. | ||
connect_amount | The number of used connections. | ||
instance_disk_amount | The disk space occupied by an instance. | ||
data_disk_amount | The disk space occupied by data. | ||
log_disk_amount | The disk space occupied by logs. | ||
intranet_in | The inbound traffic over the internal network. | ||
intranet_out | The outbound traffic over the internal network. | ||
number_requests | The number of requests. | ||
op_insert | The number of insert operations. | ||
op_query | The number of query operations. | ||
op_update | The number of update operations. | ||
op_delete | The number of delete operations. | ||
op_getmore | The number of getMore operations. | ||
op_command | The number of operations performed by running commands. | ||
PolarDB | Custom metric | active_connections | The number of active connections. |
blks_read_delta | The number of reads to a data block. | ||
cluster_active_sessions | The number of active connections. | ||
cluster_connection_utilization | The connection usage. | ||
cluster_cpu_utilization | The CPU utilization. | ||
cluster_data_io | The I/O throughput per second of a storage engine. | ||
cluster_data_iops | The IOPS of a storage engine. | ||
cluster_mem_hit_ratio | The cache hit ratio. | ||
cluster_memory_utilization | The memory usage. | ||
cluster_qps | The QPS. | ||
cluster_slow_queries_ps | The number of slow queries per second. | ||
cluster_tps | The number of transactions per second. | ||
conn_usage | The connection usage. | ||
cpu_total | The CPU utilization. | ||
db_age | The maximum database age. | ||
instance_connection_utilization | The connection usage of an instance. | ||
instance_cpu_utilization | The CPU utilization of an instance. | ||
instance_input_bandwidth | The inbound bandwidth of an instance. | ||
instance_memory_utilization | The memory usage of an instance. | ||
instance_output_bandwidth | The outbound bandwidth of an instance. | ||
mem_usage | The memory usage. | ||
pls_data_size | The disk data size of a PolarDB for PostgreSQL cluster. | ||
pls_iops | The IOPS of a PolarDB for PostgreSQL cluster. | ||
pls_iops_read | The read IOPS of a PolarDB for PostgreSQL cluster. | ||
pls_iops_write | The write IOPS of a PolarDB for PostgreSQL cluster. | ||
pls_pg_wal_dir_size | The size of write-ahead logging (WAL) logs of a PolarDB for PostgreSQL cluster. | ||
pls_throughput | The I/O throughput of a PolarDB for PostgreSQL cluster. | ||
pls_throughput_read | The read I/O throughput of a PolarDB for PostgreSQL cluster. | ||
pls_throughput_write | The write I/O throughput of a PolarDB for PostgreSQL cluster. | ||
swell_time | The point in time at which data bloat occurs in a PolarDB for PostgreSQL cluster. | ||
tps | The number of transactions per second (TPS) of a PolarDB for PostgreSQL cluster. | ||
cluster_iops | The IOPS. | ||
Redis | Custom metric | intranet_in_ratio | The bandwidth utilization of writes. |
intranet_out_ratio | The bandwidth utilization of reads. | ||
failed_count | The number of failed operations. | ||
cpu_usage | The CPU utilization. | ||
used_memory | The memory usage. | ||
used_connection | The number of used connections. | ||
used_qps | The QPS in use. |
Cloud service monitoring
The following table describes the metrics of cloud service monitoring that are supported by Managed Service for Prometheus.
ApsaraMQ for RocketMQ
Category | Metric type | Metric name | Description |
Producer | Custom metric | rocketmq_producer_requests | The number of API calls that are made to send messages. |
rocketmq_producer_messages | The number of sent messages. | ||
rocketmq_producer_message_size_bytes | The total size of sent messages. | ||
rocketmq_producer_send_success_rate | The success rate of message sending. | ||
rocketmq_producer_failure_api_calls | The number of failed API calls that are made to send messages. | ||
rocketmq_producer_send_rt_milliseconds_avg | The average time required to send messages. | ||
rocketmq_producer_send_rt_milliseconds_min | The minimum time required to send messages. | ||
rocketmq_producer_send_rt_milliseconds_max | The maximum time required to send messages. | ||
rocketmq_producer_send_rt_milliseconds_p95 | The 95th percentile of the time required to send messages. | ||
rocketmq_producer_send_rt_milliseconds_p99 | The 99th percentile of the time required to send messages. | ||
Consumer | Custom metric | rocketmq_consumer_requests | The number of API calls that are made to consume messages. |
rocketmq_consumer_send_back_requests | The number of API calls that are made to return messages after consumers fail to consume messages. | ||
rocketmq_consumer_send_back_messages | The number of messages returned from consumers after consumers fail to consume messages. | ||
rocketmq_consumer_messages | The number of consumed messages. | ||
rocketmq_consumer_message_size_bytes | The total size of messages consumed within 1 minute. | ||
rocketmq_consumer_ready_and_inflight_messages | The number of lagging messages, including ready messages and inflight messages. | ||
rocketmq_consumer_ready_messages | The number of ready messages. | ||
rocketmq_consumer_inflight_messages | The number of inflight messages. | ||
rocketmq_consumer_queue_time_milliseconds | The queuing duration of messages. | ||
rocketmq_consumer_message_await_time_milliseconds_avg | The average time required for consumer clients to allocate resources to process messages. | ||
rocketmq_consumer_message_await_time_milliseconds_min | The minimum time required for consumer clients to allocate resources to process messages. | ||
rocketmq_consumer_message_await_time_milliseconds_max | The maximum time required for consumer clients to allocate resources to process messages. | ||
rocketmq_consumer_message_await_time_milliseconds_p95 | The 95th percentile of the time required for consumer clients to allocate resources to process messages. | ||
rocketmq_consumer_message_await_time_milliseconds_p99 | The 99th percentile of the time required for consumer clients to allocate resources to process messages. | ||
rocketmq_consumer_message_process_time_milliseconds_avg | The average time required for consumers to process messages. | ||
rocketmq_consumer_message_process_time_milliseconds_min | The minimum time required for consumers to process messages. | ||
rocketmq_consumer_message_process_time_milliseconds_max | The maximum time required for consumers to process messages. | ||
rocketmq_consumer_message_process_time_milliseconds_p95 | The 95th percentile of the time required for consumers to process messages. | ||
rocketmq_consumer_message_process_time_milliseconds_p99 | The 99th percentile of the time required for consumers to process messages. | ||
rocketmq_consumer_consume_success_rate | The success rate of message consumption. | ||
rocketmq_consumer_failure_api_calls | The number of failed API calls that are made to consume messages. | ||
rocketmq_consumer_to_dlq_messages | The number of dead-letter messages. | ||
ApsaraMQ for RabbitMQ
Category | Metric type | Metric name | Description |
Overview | Custom metric | rabbitmq_instance_api_total | The number of instance-level API calls that are initiated within seconds. |
rabbitmq_connections_opened_total | The total number of opened connections. | ||
rabbitmq_connections_closed_total | The total number of closed connections. | ||
rabbitmq_channels_opened_total | The total number of opened channels. | ||
rabbitmq_channels_closed_total | The total number of closed channels. | ||
rabbitmq_queues_declared_total | The total number of declared queues. | ||
rabbitmq_queues_deleted_total | The total number of deleted queues. | ||
rabbitmq_exchange_declared_total | - | ||
rabbitmq_exchange_deleted_total | - | ||
rabbitmq_exchange_bind_total | - | ||
rabbitmq_exchange_unbind_total | - | ||
rabbitmq_queue_bind_total | - | ||
rabbitmq_queue_unbind_total | - | ||
rabbitmq_connections | The number of connections that are currently open. | ||
rabbitmq_channels | The number of channels that are currently open. | ||
Connections | Custom metric | rabbitmq_connection_channels | The number of channels on connections. |
Exchange | Custom metric | rabbitmq_exchange_messages_published_in_total | The number of inbound messages. |
rabbitmq_exchange_messages_published_out_total | The number of outbound messages. | ||
Queues | Custom metric | rabbitmq_queue_messages_published_total | The total number of messages published to queues. |
rabbitmq_queue_messages_ready | The number of messages that are ready to be delivered to consumers. | ||
rabbitmq_queue_messages_unacked | The number of messages that are being scheduled. | ||
rabbitmq_queue_deliver_total | The total number of messages that have been delivered to consumers but not yet consumed. | ||
rabbitmq_queue_get_total | - | ||
rabbitmq_queue_ack_total | - | ||
rabbitmq_queue_uack_total | - | ||
rabbitmq_queue_recover_total | - | ||
rabbitmq_queue_reject_total | - | ||
rabbitmq_queue_consumers | The number of consumers in queues. |
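The description of rocketmq_consumer_ready_and_inflight_messages listed earlier says that the lag figure includes both ready messages and inflight messages. The following Python sketch shows that relationship, assuming the combined metric is the plain sum of rocketmq_consumer_ready_messages and rocketmq_consumer_inflight_messages. The description suggests this but does not state it explicitly. The function name `lagging_messages` and the sample values are hypothetical.

```python
def lagging_messages(ready: int, inflight: int) -> int:
    """Combine ready and inflight counts into one lag figure.

    Assumption: the combined metric is a plain sum of the two parts.
    """
    if ready < 0 or inflight < 0:
        raise ValueError("message counts must be non-negative")
    return ready + inflight


# Hypothetical sample values, not real measurements.
print(lagging_messages(ready=120, inflight=30))  # prints 150
```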
MongoDB
Metric type | Metric name | Description |
Custom metric | avg_rt | The average response time of an instance. |
bytes_in | The inbound traffic of an instance. | |
bytes_out | The outbound traffic of an instance. | |
bytes_read_into_cache | The amount of data read into the WiredTiger cache. | |
bytes_written_from_cache | The amount of data written from the WiredTiger cache. | |
command | The QPS of protocol command operations. | |
conn_usage | The connection usage of an instance. The value is generated by dividing the number of current connections by the maximum number of connections. | |
connections_active | The number of active connections of an instance. | |
cpu_usage | The CPU utilization of an instance. | |
current_conn | The total number of current connections of an instance. | |
data_iops | The IOPS usage of the data disk. | |
data_size | The used data disk space of an instance. | |
delete | The QPS of delete operations. | |
disk_usage | The disk usage of an instance. The value is generated by dividing the used space by the maximum space. | |
document_deleted_ps | The number of documents deleted from an instance. | |
document_inserted_ps | The number of documents inserted into an instance. | |
document_returned_ps | The number of documents returned by an instance. | |
document_updated_ps | The number of documents updated by an instance. | |
getmore | The QPS of getMore operations. | |
gl_ac_readers | The number of global read locks currently used by an instance. | |
gl_ac_writers | The number of global write locks currently used by an instance. | |
gl_cq_readers | The length of the queue waiting for the global read locks. | |
gl_cq_total | The length of the queue waiting for the global locks. | |
gl_cq_writers | The length of the queue waiting for global write locks. | |
ins_size | The used disk space of an instance. | |
insert | The QPS of insert operations. | |
iocheck_cost | The I/O latency. The value indicates the I/O performance. | |
iops_usage | The IOPS usage. | |
job_cursors_closed | The number of cursors that are closed with closed sessions. | |
log_iops | The IOPS usage of the log disk. | |
log_size | The used log disk space of an instance. | |
maximum_bytes_configured | The maximum size of the WiredTiger disk. | |
mem_usage | The memory usage. | |
moveChunk_donor_started_ps | The number of times that the current node is used as the moveChunk source shard. | |
moveChunk_recip_stared_ps | The number of times that the current node is used as the moveChunk destination shard. | |
noTimeout_open | The number of opened cursors without a timeout period. | |
operation_exactIDCount_ps | The number of requests that need to be broadcasted to obtain information about the matched IDs. | |
operation_scanAndOrder_ps | The number of requests for which indexes cannot be used for sorting. | |
operation_writeConflicts_ps | The number of write conflicts. | |
pinned_open | The number of opened cursors with a timeout period. | |
query | The QPS of query operations. | |
queryExecutor_scannedObject_ps | The number of queried documents. | |
queryExecutor_scanned_ps | The number of queried indexes. | |
read_concurrent_trans_available | The number of concurrent read requests available in a WiredTiger request queue. | |
read_concurrent_trans_out | The number of concurrent read requests sent from a WiredTiger request queue. | |
repl_lag | The data synchronization latency of the primary and secondary nodes of an instance. | |
timed_out | The number of cursors that are closed due to timeout. | |
total_open | The number of cursors that are currently open. | |
ttl_deletedDocuments_ps | The number of documents that are deleted due to time-to-live (TTL) indexes. | |
ttl_passes_ps | The number of delete operations that the background TTL threads perform. | |
update | The QPS of update operations. | |
write_concurrent_trans_available | The number of concurrent write requests available in a WiredTiger request queue. | |
write_concurrent_trans_out | The number of concurrent write requests sent from a WiredTiger request queue. | |
wt_cache_dirty_usage | The dirty cache usage of the WiredTiger storage engine of an instance. | |
wt_cache_usage | The cache usage of the WiredTiger storage engine of an instance. |
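The conn_usage and disk_usage rows above both define the value as a quotient: current connections divided by the maximum number of connections, and used space divided by the maximum space. The following Python sketch shows that arithmetic. The function name `usage_ratio` and the sample numbers are hypothetical. The table does not state whether the reported value is a fraction or a percentage, so the sketch returns a fraction.

```python
def usage_ratio(used: float, maximum: float) -> float:
    """Return used / maximum as a fraction between 0 and 1 for valid inputs.

    Assumption: the metric is reported as a plain quotient; multiply by 100
    if a percentage is needed.
    """
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return used / maximum


# Hypothetical example: 250 current connections out of a maximum of 1000.
print(usage_ratio(250, 1000))  # prints 0.25
```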
Flink
Flink metrics
Metric name | Definition | Description | Unit | Metric type |
| The number of times that a job is restarted when a job failover occurs. | This metric indicates the number of times that a job is restarted when a job failover occurs. The number of times that the job is restarted when a JobManager failover occurs is not included. | Count | Custom metric |
| The processing latency. | If the value of this metric is large, data latency may occur in the job when the system pulls or processes data. | Milliseconds | Custom metric |
| The transmission latency. | If the value of this metric is large, data latency may occur in the job when the system pulls data. | Milliseconds | Custom metric |
| The total number of input data records of all operators. | If the value of this metric does not increase for an extended period of time for an operator, data may be missing from the source. Therefore, data fails to be transmitted. In this case, you must check the data of the source. | Count | Custom metric |
| The total number of output data records. | If the value of this metric does not increase for an extended period of time for an operator, an error may occur in the code logic of the job and data is missing. Therefore, data fails to be transmitted. In this case, you must check the code logic of the job. | Count | Custom metric |
| The total number of input bytes. | This metric measures the size of the input data records of the source. This helps observe the job throughput. | Byte | Custom metric |
| The total number of output bytes. | This metric measures the size of the output data records of the source. This helps observe the job throughput. | Byte | Custom metric |
| The total number of input data records of all operators. | If the value of this metric does not increase for an extended period of time for an operator, data may be missing from the source. Therefore, data fails to be transmitted. In this case, you must check the data of the source. | Count | Custom metric |
| The number of input data records per second for all data streams. | This metric measures the overall processing speed of data streams. For example, the value of this metric helps determine whether the overall processing speed of data streams meets the expected requirements and how the job performance changes under different input data loads. | Count/s | Custom metric |
| The total number of output data records. | If the value of this metric does not increase for an extended period of time for an operator, an error may occur in the code logic of the job and data is missing. Therefore, data fails to be transmitted. In this case, you must check the code logic of the job. | Count | Custom metric |
| The number of output data records per second for all data streams. | This metric measures the overall output speed of data streams. The speed indicates the number of output data records per second for all data streams. For example, the value of this metric helps determine whether the overall output speed of data streams meets the expected requirements and how the job performance changes under different output data loads. | Count/s | Custom metric |
| The total number of data records that flow into the source operator. | This metric measures the number of data records that flow into the source. | Count | Custom metric |
| The total number of output data records in a sink. | This metric measures the number of data records that are exported by the source. | Count | Custom metric |
| The number of input data records per second for all data streams. | This metric measures the overall processing speed of data streams. For example, the value of this metric helps determine whether the overall processing speed of data streams meets the expected requirements and how the job performance changes under different input data loads. | Count/s | Custom metric |
| The number of output data records per second for all data streams. | This metric measures the overall output speed of data streams. The speed indicates the number of output data records per second for all data streams. For example, the value of this metric helps determine whether the overall output speed of data streams meets the expected requirements and how the job performance changes under different output data loads. | Count/s | Custom metric |
| The number of input data records per second in a source. | This metric measures the speed at which data records are generated in a source. The speed indicates the number of input data records per second in the source. For example, the number of data records that can be generated varies based on the type of each source in a data stream. The value of this metric helps determine the speed at which data records are generated in a source and adjust data streams to improve performance. This metric is also used for monitoring and alerting. If the value of this metric is 0, data may be missing from the source. In this case, you must check whether data output is blocked because the data of the source is not consumed. | Count/s | Custom metric |
| The number of output data records per second in a sink. | This metric measures the speed at which data records are exported from a sink. The speed indicates the number of output data records per second in the sink. For example, the number of data records that can be exported varies based on the type of each sink in a data stream. The value of the numRecordsOutOfSinkPerSecond metric helps determine the speed at which data records are exported from a sink and adjust data streams to improve performance. This metric is also used for monitoring and alerting. If the value of this metric is 0, the code logic of the job may be invalid and all data is filtered out. In this case, you must check the code logic of the job. | Count/s | Custom metric |
| The number of locally consumed data buffers per second. | If the value of this metric is large, inter-task communication is frequent on the local node. | Count/s | Custom metric |
| The number of buffers received from the remote TaskManager per second. | This metric indicates the frequency of inter-TaskManager communication. | Count/s | Custom metric |
| The number of buffers sent to other tasks per second. | This metric helps understand the output pressure of tasks and the usage of network bandwidth. | Count/s | Custom metric |
| The total number of input bytes per second. | This metric measures the rate at which data flows into the source. This helps observe the job throughput. | Byte/s | Custom metric |
| The total number of output bytes per second. | This metric measures the rate at which data is exported by the source. This helps observe the job throughput. | Byte/s | Custom metric |
| The number of data records that are not read by the source. | This metric measures the number of data records that are not pulled by the source from the external system. | Count | Custom metric |
| The duration for which data is not processed in the source. | This metric specifies whether the source is idle. If the value of this metric is large, your data is generated at a low speed in the external system. | Milliseconds | Custom metric |
| The total number of input bytes per second. | None. | Byte/s | Custom metric |
| The total number of output bytes per second. | None. | Byte/s | Custom metric |
| The time consumed to send the latest record. | None. | Milliseconds | Custom metric |
| The total number of checkpoints. | None. | Count | Custom metric |
| The number of failed checkpoints. | None. | Count | Custom metric |
| The number of completed checkpoints. | None. | Count | Custom metric |
| The number of checkpoints that are in progress. | None. | Count | Custom metric |
| The time consumed by the last checkpoint. | If the checkpoint takes an extended period of time or times out, possible causes include an excessively large amount of state data, a temporary network error, unaligned barriers, or data backpressure. | Milliseconds | Custom metric |
| The size of the last checkpoint. | This metric measures the size of the last checkpoint that is uploaded. This metric helps analyze the checkpoint performance when a bottleneck occurs. | Byte | Custom metric |
| The maximum latency of a Clear operation on state data. | This metric measures the performance of a Clear operation on state data. | Nanoseconds | Custom metric |
| The maximum latency of a Get operation on ValueState data. | This metric measures the performance of accessing ValueState data by an operator. | Nanoseconds | Custom metric |
| The maximum latency of an Update operation on ValueState data. | This metric measures the performance of an Update operation on ValueState data. | Nanoseconds | Custom metric |
| The maximum latency of a Get operation on AggregatingState data. | This metric measures the performance of accessing AggregatingState data by an operator. | Nanoseconds | Custom metric |
| The maximum latency of an Add operation on AggregatingState data. | This metric measures the performance of an Add operation on AggregatingState data. | Nanoseconds | Custom metric |
| The maximum latency of a Merge Namespace operation on AggregatingState data. | This metric measures the performance of a Merge Namespace operation on AggregatingState data. | Nanoseconds | Custom metric |
| The maximum latency of a Get operation on ReducingState data. | This metric measures the performance of accessing ReducingState data by an operator. | Nanoseconds | Custom metric |
| The maximum latency of an Add operation on ReducingState data. | This metric measures the performance of an Add operation on ReducingState data. | Nanoseconds | Custom metric |
| The maximum latency of a Merge Namespace operation on ReducingState data. | This metric measures the performance of a Merge Namespace operation on ReducingState data. | Nanoseconds | Custom metric |
| The maximum latency of a Get operation on MapState data. | This metric measures the performance of accessing MapState data by an operator. | Nanoseconds | Custom metric |
| The maximum latency of a Put operation on MapState data. | This metric measures the performance of a Put operation on MapState data. | Nanoseconds | Custom metric |
| The maximum latency of a PutAll operation on MapState data. | This metric measures the performance of a PutAll operation on MapState data. | Nanoseconds | Custom metric |
| The maximum latency of a Remove operation on MapState data. | This metric measures the performance of a Remove operation on MapState data. | Nanoseconds | Custom metric |
| The maximum latency of a Contains operation on MapState data. | This metric measures the performance of a Contains operation on MapState data. | Nanoseconds | Custom metric |
| The maximum latency of an Init operation on MapState entries. | This metric measures the performance of an Init operation on MapState entries. | Nanoseconds | Custom metric |
| The maximum latency of an Init operation on MapState keys. | This metric measures the performance of an Init operation on MapState keys. | Nanoseconds | Custom metric |
| The maximum latency of an Init operation on MapState values. | This metric measures the performance of an Init operation on MapState values. | Nanoseconds | Custom metric |
| The maximum latency of an Init operation on MapState Iterator. | This metric measures the performance of an Init operation on MapState Iterator. | Nanoseconds | Custom metric |
| The maximum latency of an Empty operation on MapState data. | This metric measures the performance of an Empty operation on MapState data. | Nanoseconds | Custom metric |
| The maximum latency of a HasNext operation on MapState Iterator. | This metric measures the performance of a HasNext operation on MapState Iterator. | Nanoseconds | Custom metric |
| The maximum latency of a Next operation on MapState Iterator. | This metric measures the performance of a Next operation on MapState Iterator. | Nanoseconds | Custom metric |
| The maximum latency of a Remove operation on MapState Iterator. | This metric measures the performance of a Remove operation on MapState Iterator. | Nanoseconds | Custom metric |
| The maximum latency of a Get operation on ListState data. | This metric measures the performance of accessing ListState data by an operator. | Nanoseconds | Custom metric |
| The maximum latency of an Add operation on ListState data. | This metric measures the performance of an Add operation on ListState data. | Nanoseconds | Custom metric |
| The maximum latency of an AddAll operation on ListState data. | This metric measures the performance of an AddAll operation on ListState data. | Nanoseconds | Custom metric |
| The maximum latency of an Update operation on ListState data. | This metric measures the performance of an Update operation on ListState data. | Nanoseconds | Custom metric |
| The maximum latency of a Merge Namespace operation on ListState data. | This metric measures the performance of a Merge Namespace operation on ListState data. | Nanoseconds | Custom metric |
| The maximum latency of accessing the first entry of SortedMapState data. | This metric measures the performance of accessing SortedMapState data by an operator. | Nanoseconds | Custom metric |
| The maximum latency of accessing the last entry of SortedMapState data. | This metric measures the performance of accessing SortedMapState data by an operator. | Nanoseconds | Custom metric |
| The size of the state data. | This metric helps you perform the following operations: | Byte | Custom metric |
| The size of the state data file. | This metric helps you perform the following operations: | Byte | Custom metric |
| The time when each task receives the latest watermark. | This metric measures the latency of data receiving by the TaskManager. | N/A | Custom metric |
| The latency of watermarks. | This metric measures the latency of subtasks. | Milliseconds | Custom metric |
| The CPU load of the JobManager. | If the value of this metric is greater than 100% for an extended period of time, the CPU is busy and the CPU load is high. This may affect the system performance. As a result, issues such as system stuttering and slow response occur. | N/A | Basic metric |
| The amount of heap memory of the JobManager. | None. | Byte | Basic metric |
| The amount of heap memory committed by the JobManager. | None. | Byte | Basic metric |
| The maximum amount of heap memory of the JobManager. | None. | Byte | Basic metric |
| The amount of non-heap memory of the JobManager. | None. | Byte | Basic metric |
| The amount of non-heap memory committed by the JobManager. | None. | Byte | Basic metric |
| The maximum amount of non-heap memory of the JobManager. | None. | Byte | Basic metric |
| The number of threads of the JobManager. | A large number of threads of the JobManager occupies excessive memory space. This reduces the job stability. | Count | Basic metric |
| The number of GCs performed within the JobManager. | Frequent GCs can lead to excessive memory consumption and negatively affect job performance. This metric helps diagnose job issues and identify the causes of job failures. | Count | Basic metric |
| The number of young-generation GCs performed by the G1 garbage collector of the JobManager. | None. | Count | Custom metric |
| The number of old-generation GCs performed by the G1 garbage collector of the JobManager. | None. | Count | Custom metric |
| The time consumed by the G1 garbage collector of the JobManager to perform a young-generation GC. | None. | Milliseconds | Custom metric |
| The time consumed by the G1 garbage collector of the JobManager to perform an old-generation GC. | None. | Milliseconds | Custom metric |
| The number of GCs performed by the Concurrent Mark Sweep (CMS) garbage collector of the JobManager. | None. | Count | Basic metric |
| The duration for which each GC of the JobManager lasts. | If GC of the JobManager lasts for an extended period of time, excessive memory space is occupied. This affects the job performance. This metric helps diagnose job issues and identify the causes of job failures. | Milliseconds | Basic metric |
| The time consumed by the CMS garbage collector of the JobManager to perform a GC. | None. | Milliseconds | Basic metric |
| The total number of classes that are loaded after the Java Virtual Machine (JVM) in which the JobManager resides is created. | If the total number of classes that are loaded is excessively large after the JVM in which the JobManager resides is created, excessive memory space is occupied. This affects the job performance. | N/A | Basic metric |
| The total number of classes that are unloaded after the JVM in which the JobManager resides is created. | If the total number of classes that are unloaded is excessively large after the JVM in which the JobManager resides is created, excessive memory space is occupied. This affects the job performance. | N/A | Basic metric |
| The CPU load of the TaskManager. | This metric indicates the total number of processes in which the CPU is calculating data and processes in which data waits to be calculated by the CPU. In most cases, this metric indicates how busy the CPU is. The value of this metric is related to the number of CPU cores that are used. The CPU load in Flink is calculated by using the following formula: CPU load = CPU utilization/Number of CPU cores. If the value of the | N/A | Basic metric |
| The CPU utilization of the JobManager. | This metric indicates the utilization of CPU time slices that are occupied by Flink. If the value of this metric is greater than 100% for an extended period of time, the CPU is busy. If the CPU load is high but the CPU utilization is low, a large number of processes that are in the uninterruptible sleep state may be running due to frequent read and write operations. | N/A | Basic metric |
| The CPU utilization of the TaskManager. | This metric indicates the utilization of CPU time slices that are occupied by Flink. If the value of this metric is greater than 100% for an extended period of time, the CPU is busy. If the CPU load is high but the CPU utilization is low, a large number of processes that are in the uninterruptible sleep state may be running due to frequent read and write operations. | N/A | Basic metric |
| The amount of heap memory of the TaskManager. | None. | Byte | Basic metric |
| The amount of heap memory committed by the TaskManager. | None. | Byte | Basic metric |
| The maximum amount of heap memory of the TaskManager. | None. | Byte | Basic metric |
| The amount of non-heap memory of the TaskManager. | None. | Byte | Basic metric |
| The amount of non-heap memory committed by the TaskManager. | None. | Byte | Basic metric |
| The maximum amount of non-heap memory of the TaskManager. | None. | Byte | Basic metric |
| The amount of memory consumed by the entire process on Linux. | This metric tracks changes in memory consumption of the process. | Byte | Basic metric |
| The number of threads of the TaskManager. | A large number of threads of the TaskManager occupies excessive memory space. This reduces the job stability. | Count | Basic metric |
| The number of GCs performed within the TaskManager. | Frequent GCs can lead to excessive memory consumption and negatively affect job performance. This metric helps diagnose job issues and identify the causes of job failures. | Count | Basic metric |
| The number of young-generation GCs performed by the G1 garbage collector of the TaskManager. | None. | Count | Custom metric |
| The number of old-generation GCs performed by the G1 garbage collector of the TaskManager. | None. | Count | Custom metric |
| The time consumed by the G1 garbage collector of the TaskManager to perform a young-generation GC. | None. | Milliseconds | Custom metric |
| The time consumed by the G1 garbage collector of the TaskManager to perform an old-generation GC. | None. | Milliseconds | Custom metric |
| The number of GCs performed by the CMS garbage collector of the TaskManager. | None. | Count | Basic metric |
| The duration for which each GC of the TaskManager lasts. | If GC of the TaskManager lasts for an extended period of time, excessive memory space is occupied. This affects the job performance. This metric helps diagnose job issues and identify the causes of job failures. | Milliseconds | Basic metric |
| The time consumed by the CMS garbage collector of the TaskManager to perform a GC. | None. | Milliseconds | Basic metric |
| The total number of classes that are loaded after the JVM in which the TaskManager resides is created. | If the total number of classes that are loaded is excessively large after the JVM in which the TaskManager resides is created, excessive memory space is occupied. This affects the job performance. | N/A | Basic metric |
| The total number of classes that are unloaded after the JVM in which the TaskManager resides is created. | If the total number of classes that are unloaded is excessively large after the JVM in which the TaskManager resides is created, excessive memory space is occupied. This affects the job performance. | N/A | Basic metric |
| The period during which the job runs. | None. | Milliseconds | Custom metric |
| The number of jobs that are running. | None. | None. | Custom metric |
| The number of available task slots. | None. | None. | Custom metric |
| The total number of task slots. | None. | None. | Custom metric |
| The number of registered TaskManagers. | None. | None. | Custom metric |
| The number of bytes read from the remote source per second. | None. | Byte/s | Custom metric |
| The number of packets dropped due to window latency. | None. | Count | Custom metric |
| The window latency rate. | None. | None. | Custom metric |
| Specifies whether the job is in the snapshot phase. | This metric indicates the job processing phase. | None. | Custom metric |
| Specifies whether the job is in the incremental phase. | This metric indicates the job processing phase. | None. | Custom metric |
| The number of unprocessed tables. | This metric measures the number of unprocessed tables. | Count | Custom metric |
| The number of tables that are waiting to be processed in the snapshot phase. | This metric measures the number of unprocessed tables. | Count | Custom metric |
| The number of processed tables in the snapshot phase. | This metric measures the number of processed tables. | Count | Custom metric |
| The number of processed shards in the snapshot phase. | This metric measures the number of processed shards. | Count | Custom metric |
| The number of shards that are waiting to be processed in the snapshot phase. | This metric measures the number of unprocessed shards. | Count | Custom metric |
| The number of shards that are waiting to be processed in the snapshot phase. | This metric measures the number of unprocessed shards. | Count | Custom metric |
| The timestamp of the latest data record that is read. | This metric measures the time of the latest binary log data. | Milliseconds | Custom metric |
| The number of processed data records in the snapshot phase. | This metric measures the number of processed data records in the snapshot phase. | Count | Custom metric |
| The number of data records that are read from each table. | This metric measures the total number of processed data records in each table. | Count | Custom metric |
| The number of processed data records in each table in the snapshot phase. | This metric measures the number of processed data records in each table in the snapshot phase. | Count | Custom metric |
| The number of executed INSERT DML statements for each table in the incremental phase. | This metric measures the number of executed INSERT statements for each table. | Count | Custom metric |
| The number of executed UPDATE DML statements for each table in the incremental phase. | This metric measures the number of executed UPDATE statements for each table. | Count | Custom metric |
| The number of executed DELETE DML statements for each table in the incremental phase. | This metric measures the number of executed DELETE statements for each table. | Count | Custom metric |
| The number of executed DDL statements for each table in the incremental phase. | This metric measures the number of executed DDL statements for each table. | Count | Custom metric |
| The number of executed INSERT DML statements in the incremental phase. | This metric measures the number of executed INSERT statements. | Count | Custom metric |
| The number of executed UPDATE DML statements in the incremental phase. | This metric measures the number of executed UPDATE statements. | Count | Custom metric |
| The number of executed DELETE DML statements in the incremental phase. | This metric measures the number of executed DELETE statements. | Count | Custom metric |
| The number of executed DDL statements in the incremental phase. | This metric measures the number of executed DDL statements. | Count | Custom metric |
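The CPU load formula stated in the TaskManager CPU load row of the preceding table (CPU load = CPU utilization/Number of CPU cores) can be sketched as follows. This is a minimal illustration, not the service's implementation. The function names `cpu_load` and `sustained_high`, the sample values, and the convention that 1.0 represents 100% are assumptions made for this example.

```python
from typing import Sequence


def cpu_load(cpu_utilization: float, cpu_cores: int) -> float:
    """Apply the table's formula: CPU load = CPU utilization / number of CPU cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return cpu_utilization / cpu_cores


def sustained_high(samples: Sequence[float], threshold: float = 1.0) -> bool:
    """Return True if the window is non-empty and every sample exceeds the threshold.

    The threshold of 1.0 stands for 100% (an assumption of this sketch).
    """
    return len(samples) > 0 and all(s > threshold for s in samples)


# Hypothetical utilization samples (1.0 == 100% of one core) on a 4-core node.
loads = [cpu_load(u, 4) for u in (4.4, 4.8, 5.2)]
print(sustained_high(loads))
```

Checking a whole window of samples rather than a single value mirrors the "for an extended period of time" wording in the table. The window length is left to the caller.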
Common metric labels
Label | Description |
| The name of the namespace. |
| The name of the deployment. |
| The deployment ID. |
| The job ID. |
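Common labels such as these are typically used as label matchers in a PromQL query to narrow a metric to one namespace, deployment, or job. The following sketch builds such a selector string. The label keys (`namespace`, `jobId`) and the metric name `example_metric` are hypothetical placeholders, because the actual label names are not shown in the preceding table. Substitute the names that your instance reports.

```python
from typing import Mapping


def build_selector(metric: str, labels: Mapping[str, str]) -> str:
    """Build a PromQL instant-vector selector such as metric{key="value"}."""

    def escape(value: str) -> str:
        # Escape backslashes, double quotes, and newlines for a PromQL string literal.
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    # Sort keys so that the output is deterministic.
    matchers = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return f"{metric}{{{matchers}}}" if matchers else metric


# Hypothetical metric and label names, used for illustration only.
selector = build_selector("example_metric", {"namespace": "default", "jobId": "abc"})
print(selector)
```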
Others
For more information about the metrics of Application Real-Time Monitoring Service (ARMS) Application Monitoring, see Application Monitoring metrics.