This topic describes the basic self-monitoring metrics supported by Managed Service for Prometheus. These basic metrics are provided at no cost.
Metric | Description |
aliyun_prometheus_agent_all_series_num | The total number of time series scraped in each cycle. |
aliyun_prometheus_agent_all_targets_num | The total number of targets scraped in each cycle. |
aliyun_prometheus_agent_blackbox_probe_total | The total number of times the Blackbox Probe action was executed. |
aliyun_prometheus_agent_cluster_node_num | The total number of nodes within the container cluster. |
aliyun_prometheus_agent_cpu_limit | The maximum CPU allocated to the agent. |
aliyun_prometheus_agent_cpu_usage_rate | The CPU utilization percentage of the agent. |
aliyun_prometheus_agent_dns_not_available_total | The number of times DNS was unavailable when sending data. |
aliyun_prometheus_agent_drop_metrics_exist | Indicates whether discarded (dropped) metrics are configured. |
aliyun_prometheus_agent_heartbeat | The heartbeat of the agent. The value increments by 1 every 15 seconds. |
aliyun_prometheus_agent_job_discovery_status | Indicates whether the collection configuration was successfully loaded. |
aliyun_prometheus_agent_job_scrape_status | Indicates whether the default collection configuration was successfully scraped. |
aliyun_prometheus_agent_local_storage_conflicts_total | Indicates whether internal queues in the agent were blocked. |
aliyun_prometheus_agent_master_send_targets_time | The time that the master node of the agent spent distributing scraping jobs. |
aliyun_prometheus_agent_master_send_targets_total | The number of times that the master node of the agent distributed scraping jobs. |
aliyun_prometheus_agent_memory_usage_rate | The memory usage percentage of the agent. |
aliyun_prometheus_agent_memorybackpressure_total | The number of times that backpressure occurred due to failed data transmission in the agent. |
aliyun_prometheus_agent_memorylimit_alloc_mb | The size of memory allocated to the agent. |
aliyun_prometheus_agent_memorylimit_limit_mb | The maximum memory allocated to the agent. |
aliyun_prometheus_agent_regis_fail_total | The total number of times the agent failed to initialize resources. |
aliyun_prometheus_agent_relabel_error_num | The total number of errors in the relabel configurations of the agent. |
aliyun_prometheus_agent_remote_write_duration_ms | The duration of remote write operations performed by the agent in milliseconds. |
aliyun_prometheus_agent_remote_write_failed_batch_total | The total number of failed batches during remote write operations. |
aliyun_prometheus_agent_remote_write_failed_down_grade_total | The total number of downgrades caused by failed remote write operations. |
aliyun_prometheus_agent_remote_write_succeed_batch_total | The total number of successful batches during remote write operations. |
aliyun_prometheus_agent_remote_write_succeed_bytes_total | The total number of bytes successfully written during remote write operations. |
aliyun_prometheus_agent_replica_current_num | The number of agent replicas currently expected to be running. |
aliyun_prometheus_agent_restart_by_oom_num | The total number of times the agent restarted due to an out-of-memory (OOM) error. |
aliyun_prometheus_agent_scale_out_fail_count | The number of times that a scale-out of the agent failed. |
aliyun_prometheus_agent_scale_out_failed | Indicates whether a scale-out operation failed. |
aliyun_prometheus_agent_scrape_base_delay_15 | Indicates whether the delay in base scraping jobs exceeded 15 seconds. |
aliyun_prometheus_agent_scrape_base_delay_60 | Indicates whether the delay in base scraping jobs exceeded 60 seconds. |
aliyun_prometheus_agent_scrape_base_error | Indicates whether errors occurred in base scraping jobs. |
aliyun_prometheus_agent_scrape_custom_delay_15 | Indicates whether the delay in custom scraping jobs exceeded 15 seconds. |
aliyun_prometheus_agent_scrape_custom_delay_60 | Indicates whether the delay in custom scraping jobs exceeded 60 seconds. |
aliyun_prometheus_agent_scrape_custom_error | Indicates whether errors occurred in custom scraping jobs. |
aliyun_prometheus_agent_scrape_error_targets_num | The total number of scraping targets that failed in the scraping job. |
aliyun_prometheus_agent_scrape_latency | The latency distribution of scraping jobs. |
aliyun_prometheus_agent_scrape_samples_total | The total number of time series collected by scraping jobs. |
aliyun_prometheus_agent_send_batch_compressed_bytes_exceeds_limit_total | The number of times that compressed batch sizes exceeded the allowed limit. |
aliyun_prometheus_agent_send_batch_duration_seconds | The duration of data batch sending operations in seconds. |
aliyun_prometheus_agent_send_config_fail_total | The number of times that the master node of the agent failed to synchronize the collection configuration. |
aliyun_prometheus_agent_send_data_queue_capicaty | The capacity of data sending queues of the agent. |
aliyun_prometheus_agent_send_data_queue_length | The actual usage of data sending queues of the agent. |
aliyun_prometheus_agent_send_discovery_config_fail_total | The number of times that the master node of the agent failed to synchronize scraping jobs. |
aliyun_prometheus_agent_sync_worker_series_duration_ms | The time taken by the master node of the agent to synchronize worker series in milliseconds. |
aliyun_prometheus_agent_target_info | The detailed scraping target information of the scraping job. |
aliyun_prometheus_agent_unzip_response_body_exceed_512M | The number of times that the uncompressed response body from a target exceeded 512 MB in a single scrape. |
aliyun_prometheus_agent_worker_series_num | The total number of time series collected by worker nodes. |
aliyun_prometheus_agent_worker_targets_num | The total number of scraping jobs handled by worker nodes. |
aliyun_prometheus_agent_write_arms_duration_num | The duration of data writing operations to the agent. |
aliyun_prometheus_agent_write_fail500_total | The total number of retry attempts for write failures. |
aliyun_prometheus_agent_write_fail_batch_total | The total number of failed write batches. |
aliyun_prometheus_agent_write_fail_bytes_total | The total number of bytes that failed to be written. |
aliyun_prometheus_agent_write_fail_down_grade_total | The total number of write downgrades caused by failures. |
aliyun_prometheus_agent_write_fail_total | The total number of write failures (after all retries have failed). |
aliyun_prometheus_agent_write_succeed_batch_total | The total number of successful write batches. |
aliyun_prometheus_agent_write_succeed_bytes_total | The total number of bytes successfully written. |
aliyun_prometheus_agent_write_total | The total number of write operations. |
aliyun_prometheus_agent_writert_total | The duration of write operations. |
aliyun_prometheus_agent_hpa_max_limit | The maximum number of replicas allowed during horizontal scaling of the agent. |
scrape_bytes_scraped | The number of bytes scraped in a single scraping job. |
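
The metrics in the table are most useful when combined. The following Python sketch shows a few derived health signals that can be computed from sampled values. It is illustrative only: the function names are hypothetical, the inputs are assumed to be values you have already queried from your Prometheus instance, and it assumes that `aliyun_prometheus_agent_write_fail_batch_total` and `aliyun_prometheus_agent_write_succeed_batch_total` count disjoint batches, which the table does not state explicitly.

```python
# Hypothetical helpers for interpreting sampled self-monitoring metric values.
# Nothing here is part of Managed Service for Prometheus itself.

# Taken from the description of aliyun_prometheus_agent_heartbeat.
HEARTBEAT_INTERVAL_SECONDS = 15


def queue_utilization(queue_length: float, queue_capacity: float) -> float:
    """Fraction of the send queue in use.

    queue_length   -> aliyun_prometheus_agent_send_data_queue_length
    queue_capacity -> aliyun_prometheus_agent_send_data_queue_capicaty
    """
    if queue_capacity <= 0:
        return 0.0  # no capacity reported; avoid division by zero
    return queue_length / queue_capacity


def batch_failure_ratio(failed_batches: float, succeeded_batches: float) -> float:
    """Share of write batches that failed over some window.

    Inputs are the increases of the *_write_fail_batch_total and
    *_write_succeed_batch_total counters over the same window
    (assumed to be disjoint counts).
    """
    total = failed_batches + succeeded_batches
    if total <= 0:
        return 0.0  # no batches were written in the window
    return failed_batches / total


def heartbeat_stalled(previous: float, current: float, elapsed_seconds: float) -> bool:
    """True if the heartbeat did not advance although at least one
    15-second interval has passed between the two samples."""
    if elapsed_seconds < HEARTBEAT_INTERVAL_SECONDS:
        return False  # too soon to expect an increment
    if current < previous:
        return False  # counter went backwards: treat as an agent restart
    return current == previous


if __name__ == "__main__":
    print(queue_utilization(50, 200))
    print(batch_failure_ratio(1, 3))
    print(heartbeat_stalled(100, 100, 60))
```

A high queue utilization together with a rising batch failure ratio may point to a problem on the write path, while a stalled heartbeat suggests that the agent itself is not running. The thresholds at which to alert are left to the reader.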