Managed Service for Prometheus: Basic self-monitoring metrics

Last Updated: May 07, 2025

This topic describes the basic self-monitoring metrics supported by Managed Service for Prometheus. These basic metrics are provided at no cost.

Metric

Description

aliyun_prometheus_agent_all_series_num

The total number of time series scraped in each cycle.

aliyun_prometheus_agent_all_targets_num

The total number of targets scraped in each cycle.

aliyun_prometheus_agent_blackbox_probe_total

The total number of times the Blackbox Probe action was executed.

aliyun_prometheus_agent_cluster_node_num

The total number of nodes within the container cluster.

aliyun_prometheus_agent_cpu_limit

The maximum CPU allocated to the agent.

aliyun_prometheus_agent_cpu_usage_rate

The CPU utilization percentage of the agent.

aliyun_prometheus_agent_dns_not_available_total

The number of times DNS was unavailable when sending data.

aliyun_prometheus_agent_drop_metrics_exist

Indicates whether discarded (dropped) metrics exist.

aliyun_prometheus_agent_heartbeat

The heartbeat of the agent. The value increments by 1 every 15 seconds.

aliyun_prometheus_agent_job_discovery_status

Indicates whether the collection configuration was successfully loaded.

aliyun_prometheus_agent_job_scrape_status

Indicates whether the default collection configuration was successfully scraped.

aliyun_prometheus_agent_local_storage_conflicts_total

Indicates whether internal queues in the agent were blocked.

aliyun_prometheus_agent_master_send_targets_time

The time that the master node of the agent spent distributing scraping jobs.

aliyun_prometheus_agent_master_send_targets_total

The number of times that the master node of the agent distributed scraping jobs.

aliyun_prometheus_agent_memory_usage_rate

The memory usage percentage of the agent.

aliyun_prometheus_agent_memorybackpressure_total

The number of times that backpressure occurred due to failed data transmission in the agent.

aliyun_prometheus_agent_memorylimit_alloc_mb

The amount of memory allocated to the agent, in MB.

aliyun_prometheus_agent_memorylimit_limit_mb

The maximum amount of memory that can be allocated to the agent, in MB.

aliyun_prometheus_agent_regis_fail_total

The total number of times the agent failed to initialize resources.

aliyun_prometheus_agent_relabel_error_num

The total number of errors in the relabel configurations of the agent.

aliyun_prometheus_agent_remote_write_duration_ms

The duration of remote write operations performed by the agent, in milliseconds.

aliyun_prometheus_agent_remote_write_failed_batch_total

The total number of failed batches during remote write operations.

aliyun_prometheus_agent_remote_write_failed_down_grade_total

The total number of downgrades caused by failed remote write operations.

aliyun_prometheus_agent_remote_write_succeed_batch_total

The total number of successful batches during remote write operations.

aliyun_prometheus_agent_remote_write_succeed_bytes_total

The total number of bytes successfully written during remote write operations.

aliyun_prometheus_agent_replica_current_num

The expected number of replicas currently running.

aliyun_prometheus_agent_restart_by_oom_num

The total number of times the agent restarted because of an out-of-memory (OOM) error.

aliyun_prometheus_agent_scale_out_fail_count

The number of times that a scale-out operation of the agent failed.

aliyun_prometheus_agent_scale_out_failed

Indicates whether a scale-out operation failed.

aliyun_prometheus_agent_scrape_base_delay_15

Indicates whether the delay in base scraping jobs exceeded 15 seconds.

aliyun_prometheus_agent_scrape_base_delay_60

Indicates whether the delay in base scraping jobs exceeded 60 seconds.

aliyun_prometheus_agent_scrape_base_error

Indicates whether errors occurred in base scraping jobs.

aliyun_prometheus_agent_scrape_custom_delay_15

Indicates whether the delay in custom scraping jobs exceeded 15 seconds.

aliyun_prometheus_agent_scrape_custom_delay_60

Indicates whether the delay in custom scraping jobs exceeded 60 seconds.

aliyun_prometheus_agent_scrape_custom_error

Indicates whether errors occurred in custom scraping jobs.

aliyun_prometheus_agent_scrape_error_targets_num

The total number of scraping targets that failed in the scraping job.

aliyun_prometheus_agent_scrape_latency

The latency distribution of scraping jobs.

aliyun_prometheus_agent_scrape_samples_total

The total number of time series collected by scraping jobs.

aliyun_prometheus_agent_send_batch_compressed_bytes_exceeds_limit_total

The number of times that compressed batch sizes exceeded the allowed limit.

aliyun_prometheus_agent_send_batch_duration_seconds

The duration of batch send operations, in seconds.

aliyun_prometheus_agent_send_config_fail_total

The number of times that the master node of the agent failed to synchronize the collection configuration.

aliyun_prometheus_agent_send_data_queue_capicaty

The capacity of data sending queues of the agent.

aliyun_prometheus_agent_send_data_queue_length

The actual usage of data sending queues of the agent.

aliyun_prometheus_agent_send_discovery_config_fail_total

The number of times that the master node of the agent failed to synchronize scraping jobs.

aliyun_prometheus_agent_sync_worker_series_duration_ms

The time that the master node of the agent takes to synchronize worker series, in milliseconds.

aliyun_prometheus_agent_target_info

The detailed scraping target information of the scraping job.

aliyun_prometheus_agent_unzip_response_body_exceed_512M

The number of times that the uncompressed response body from a target exceeded 512 MB in a single scrape.

aliyun_prometheus_agent_worker_series_num

The total number of time series collected by worker nodes.

aliyun_prometheus_agent_worker_targets_num

The total number of scraping targets handled by worker nodes.

aliyun_prometheus_agent_write_arms_duration_num

The duration of data writing operations to the agent.

aliyun_prometheus_agent_write_fail500_total

The total number of retry attempts for write failures.

aliyun_prometheus_agent_write_fail_batch_total

The total number of failed write batches.

aliyun_prometheus_agent_write_fail_bytes_total

The total number of bytes that failed to be written.

aliyun_prometheus_agent_write_fail_down_grade_total

The total number of write downgrades caused by failures.

aliyun_prometheus_agent_write_fail_total

The total number of write failures (after all retries have failed).

aliyun_prometheus_agent_write_succeed_batch_total

The total number of successful write batches.

aliyun_prometheus_agent_write_succeed_bytes_total

The total number of bytes successfully written.

aliyun_prometheus_agent_write_total

The total number of write operations.

aliyun_prometheus_agent_writert_total

The duration of write operations.

aliyun_prometheus_agent_hpa_max_limit

The maximum number of replicas allowed during horizontal scaling of the agent.

scrape_bytes_scraped

The number of bytes scraped in a single scraping job.
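
The metrics above can be combined into simple health checks. The following Python sketch is illustrative only. It assumes the samples have already been fetched from your Prometheus instance as plain numbers, and all function names are hypothetical, not part of any Managed Service for Prometheus API. It shows three checks: missed heartbeats (based on the documented increment of 1 every 15 seconds for aliyun_prometheus_agent_heartbeat), send queue utilization (aliyun_prometheus_agent_send_data_queue_length divided by aliyun_prometheus_agent_send_data_queue_capicaty), and the failed batch ratio (from the remote_write_failed_batch_total and remote_write_succeed_batch_total counters). Counter decreases are treated as resets, which is an assumption about how these counters behave after an agent restart.

```python
from typing import Sequence, Tuple

# From the table above: the heartbeat increments by 1 every 15 seconds.
HEARTBEAT_INTERVAL_S = 15


def counter_increase(values: Sequence[float]) -> float:
    """Total increase of a counter over a series of samples.

    Assumption: a decrease means the counter was reset to zero
    (for example, after an agent restart), so the new value is
    counted as the increase for that step.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def missed_heartbeats(samples: Sequence[Tuple[float, float]]) -> int:
    """Estimate how many heartbeat increments are missing.

    `samples` holds (unix_timestamp_seconds, heartbeat_value) pairs
    sorted by time.
    """
    if len(samples) < 2:
        return 0
    elapsed = samples[-1][0] - samples[0][0]
    expected = int(elapsed // HEARTBEAT_INTERVAL_S)
    observed = int(counter_increase([value for _, value in samples]))
    return max(expected - observed, 0)


def queue_utilization(length: float, capacity: float) -> float:
    """Fraction of the send queue in use (queue length / queue capacity)."""
    if capacity <= 0:
        return 0.0
    return length / capacity


def failed_batch_ratio(succeeded: Sequence[float], failed: Sequence[float]) -> float:
    """Share of remote write batches that failed within the sampled window."""
    ok = counter_increase(succeeded)
    bad = counter_increase(failed)
    total = ok + bad
    return bad / total if total else 0.0
```

For example, missed_heartbeats([(0, 10), (15, 11), (30, 12), (60, 12)]) reports two missed increments, because four were expected over 60 seconds and only two were observed. The thresholds at which any of these values should trigger an alert are not defined in this topic and depend on your workload.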