Container Compute Service: Enable Prometheus monitoring for Agent Sandbox

Last Updated: Jun 05, 2026

ACS Agent Sandbox exposes Prometheus metrics for instance lifecycle, resource status, and runtimes via its two core components: Sandbox Controller and Sandbox Manager. You can collect these metrics with either Managed Service for Prometheus or a self-hosted Prometheus instance and visualize them on a Grafana dashboard.

Prerequisites

  • On your cluster's Add-ons page, ensure the following components meet the minimum version requirements:

    • ack-agent-sandbox-controller: v0.5.14 or later.

    • ack-sandbox-manager: v0.6.1 or later.

Enable Prometheus monitoring for Agent Sandbox

Alibaba Cloud Prometheus

  1. Go to the ARMS Prometheus Integration Management page. In the upper-left corner, select the region of your cluster. On the Integrated Environments tab, locate your cluster and click its instance name to open the instance details page.

  2. On the instance details page, click Add Integration next to Addon Type. In the panel that appears, find and click Agent Sandbox Monitoring, keep the default integration name, and then click OK.

Self-managed Prometheus

Configure scrape rules

Agent Sandbox monitoring involves scraping metrics from two components:

  • Sandbox Controller: Exposes metrics through the /metrics endpoint on the Kubernetes API server.

  • Sandbox Manager: Deployed in the sandbox-system namespace and exposes metrics on HTTP port 8080 at the /metrics path.

Open Source Prometheus

In prometheus.yml, add the following scrape jobs to the scrape_configs section.

Sandbox Controller

scrape_configs:
- job_name: agent-sandbox-controller
  scrape_interval: 30s
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: https
  honor_labels: true
  honor_timestamps: true
  params:
    hosting: ["true"]
    job: ["agent-sandbox-controller"]
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names: [default]
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: false
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: kubernetes
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_component]
    separator: ;
    regex: apiserver
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_provider]
    separator: ;
    regex: kubernetes
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: https
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_component]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: https
    action: replace

Sandbox Manager

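# If prometheus.yml already contains a scrape_configs key, append only the job entry below to that list instead of repeating the key.
scrape_configs: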
- job_name: sandbox-manager
  scrape_interval: 30s
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: http
  honor_labels: true
  honor_timestamps: true
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - sandbox-system
  relabel_configs:
  - source_labels:
    - __meta_kubernetes_endpoint_port_name
    separator: ;
    regex: manager
    replacement: $1
    action: keep

Prometheus Operator

The community edition of Prometheus Operator uses the ServiceMonitor custom resource to define scrape rules.

Sandbox Controller

  1. Save the following content as sandbox-controller-servicemonitor.yaml.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        release: ack-prometheus-operator # Change this as needed based on the labelSelector configuration of your Prometheus Operator.
      name: sandbox-controller
      namespace: monitoring
    spec:
      endpoints:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          bearerTokenSecret:
            key: ''
          honorLabels: true
          honorTimestamps: true
          interval: 30s
          params:
            hosting:
              - 'true'
            job:
              - agent-sandbox-controller
          path: /metrics
          port: https
          relabelings:
            - action: keep
              regex: https
              sourceLabels:
                - __meta_kubernetes_endpoint_port_name
            - action: replace
              sourceLabels:
                - __meta_kubernetes_namespace
              targetLabel: namespace
            - action: replace
              regex: Node;(.*)
              replacement: '${1}'
              separator: ;
              sourceLabels:
                - __meta_kubernetes_endpoint_address_target_kind
                - __meta_kubernetes_endpoint_address_target_name
              targetLabel: node
            - action: replace
              regex: Pod;(.*)
              replacement: '${1}'
              separator: ;
              sourceLabels:
                - __meta_kubernetes_endpoint_address_target_kind
                - __meta_kubernetes_endpoint_address_target_name
              targetLabel: pod
            - action: replace
              sourceLabels:
                - __meta_kubernetes_service_name
              targetLabel: service
            - action: replace
              regex: ^$
              sourceLabels:
                - __meta_kubernetes_service_label_component
              targetLabel: __tmp_job_fallback
            - action: replace
              regex: (.+);
              replacement: '${1}'
              separator: ;
              sourceLabels:
                - __meta_kubernetes_service_name
                - __meta_kubernetes_service_label_component
              targetLabel: job
            - action: replace
              replacement: https
              targetLabel: endpoint
          scheme: https
          scrapeTimeout: 30s
          tlsConfig:
            ca: {}
            caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            cert: {}
            serverName: kubernetes
      jobLabel: component
      namespaceSelector:
        matchNames:
          - default
      selector:
        matchLabels:
          component: apiserver
          provider: kubernetes

  2. Create the ServiceMonitor resource.

    kubectl apply -f sandbox-controller-servicemonitor.yaml

Sandbox Manager

Save the following ServiceMonitor content as a YAML file and run kubectl apply -f to create the resource.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: ack-prometheus-operator # This must match the serviceMonitorSelector of your Prometheus Operator for discovery and scraping to work. Change this value as needed based on your actual labelSelector.
    app.kubernetes.io/instance: ack-sandbox-manager
    app.kubernetes.io/name: ack-sandbox-manager
    component: sandbox-manager
  name: sandbox-manager
  namespace: sandbox-system
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    port: manager
  namespaceSelector:
    matchNames:
    - sandbox-system
  selector:
    matchLabels:
      app.kubernetes.io/instance: ack-sandbox-manager
      app.kubernetes.io/name: ack-sandbox-manager
      component: sandbox-manager

View monitoring dashboards

Alibaba Cloud Prometheus

Log on to the Alibaba Cloud console. In the left navigation bar, select Operations > Prometheus Monitoring. On the Others tab, you can view the following Sandbox monitoring dashboards.

  • Sandbox Instance: View the status, lifecycle, and resource usage of specific Sandbox instances.

  • Sandbox Controller: View cloud-side lifecycle management for Sandbox resources, including resource statistics and management performance.

  • Sandbox Manager: View the execution status of Sandbox resource declarations, such as the execution performance of the E2B protocol.

Self-managed Prometheus

If you use self-managed Prometheus, you can import the following Grafana dashboard JSON templates and configure the data source.

| Name | Version | Description | Download |
| --- | --- | --- | --- |
| Sandbox Instance | v1.0.0 | Monitors the metadata, current status, and resource usage of Sandbox instances. | Sandbox Instance-v1.0.0.json |
| Sandbox Controller | v1.0.0 | Monitors cloud-side lifecycle management for Sandbox resources, including resource statistics and management performance. | Sandbox Controller-v1.0.0.json |
| Sandbox Manager | v1.0.0 | Monitors the execution status of Sandbox resource declarations, such as the execution performance of the E2B protocol. | Sandbox Manager-v1.0.0.json |

Billing

Enabling Alibaba Cloud Prometheus Monitoring may incur additional charges. For details, see the billing overview.

Metrics

This section lists the Prometheus metrics exposed by the Sandbox Controller and Sandbox Manager components. You can use these metrics to configure alerting rules or create custom dashboards.

Sandbox controller metrics

The Sandbox Controller manages the lifecycle of Sandbox instances and SandboxSet resources. The following metrics are exposed from the controller's /metrics endpoint.

Sandbox instance metrics

Use these metrics to monitor the basic information, lifecycle status, and readiness of each Sandbox instance in the cluster.

Note

Status-based metrics, such as sandbox_status_unpaused, sandbox_status_unpaused_time, and sandbox_status_inplace_updating, generate a time series only when an instance enters the corresponding state. If no instances in the cluster are in that state (for example, if a pause operation has never been performed), querying the corresponding metric returns no data. This is expected and does not indicate a scrape failure.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| sandbox_created | Gauge | The Unix timestamp of the Sandbox instance's creation. | name, namespace |
| sandbox_status_phase | Gauge | The current phase of the Sandbox instance. The value is 1 for the current phase. Possible values for the phase label include: Pending, Running, Paused, Resuming, Failed, Succeeded, and Terminating. | name, namespace, phase |
| sandbox_status_ready | Gauge | Indicates if the Sandbox instance is ready. 1 for true; 0 for false. | name, namespace |
| sandbox_status_ready_time | Gauge | The Unix timestamp of when the instance last entered the Ready state. | name, namespace |
| sandbox_status_inplace_updating | Gauge | Indicates if the InplaceUpdate condition of the Sandbox instance is False. 1 if False; 0 if True. | name, namespace |
| sandbox_status_unpaused | Gauge | Indicates if the SandboxPaused condition of the Sandbox instance is False. 1 if False; 0 if True. | name, namespace |
| sandbox_status_unpaused_time | Gauge | The Unix timestamp when the SandboxPaused condition transitioned to False. | name, namespace |
| sandbox_status_inplace_updating_time | Gauge | The Unix timestamp when the InplaceUpdate condition transitioned to False. | name, namespace |
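
As an illustration of how the instance metrics can drive alerting, the following Prometheus rule file is a minimal sketch rather than a recommended policy. The group name, alert names, `for` durations, and severity values are assumptions made for this example; only the metric names and the Failed phase value come from the metric descriptions above.

```yaml
# Hypothetical rule file; load it through rule_files in prometheus.yml.
groups:
  - name: sandbox-instance          # assumed group name
    rules:
      # Fires when an instance has reported ready=0 for 10 minutes (assumed window).
      - alert: SandboxNotReady      # assumed alert name
        expr: sandbox_status_ready == 0
        for: 10m
        labels:
          severity: warning         # assumed severity value
        annotations:
          summary: "Sandbox {{ $labels.namespace }}/{{ $labels.name }} is not ready"
      # Fires when the current phase of an instance is Failed.
      - alert: SandboxFailed        # assumed alert name
        expr: sandbox_status_phase{phase="Failed"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Sandbox {{ $labels.namespace }}/{{ $labels.name }} is in the Failed phase"
```

An expression that matches no series does not fire, so these alerts stay inactive when no instance is in the corresponding state. With Prometheus Operator, the same groups would typically be placed under spec.groups of a PrometheusRule resource.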

Sandbox instance resource metrics

Each Sandbox instance corresponds to a pod. Its resource metrics are the same as the cluster's cAdvisor pod resource metrics. For detailed metric descriptions, see Container cluster basic metrics.

SandboxSet metrics

Use these metrics to monitor the replica status of SandboxSet resources and to identify issues such as insufficient replicas or scaling anomalies.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| sandboxset_replicas | Gauge | The current number of SandboxSet replicas. | name, namespace |
| sandboxset_available_replicas | Gauge | The current number of available SandboxSet replicas. | name, namespace |
| sandboxset_desired_replicas | Gauge | The desired number of SandboxSet replicas. | name, namespace |
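
To make the "insufficient replicas" case concrete, the sketch below compares available and desired replicas for each SandboxSet. The group name, alert name, 15-minute window, and severity are assumptions for illustration.

```yaml
groups:
  - name: sandboxset                # assumed group name
    rules:
      # Both series carry the same name and namespace labels, so the comparison
      # matches one-to-one per SandboxSet.
      - alert: SandboxSetReplicasUnavailable   # assumed alert name
        expr: sandboxset_available_replicas < sandboxset_desired_replicas
        for: 15m                    # assumed tolerance for normal scale-up time
        labels:
          severity: warning
        annotations:
          summary: "SandboxSet {{ $labels.namespace }}/{{ $labels.name }} has fewer available replicas than desired"
```

The `for` duration should be longer than the time a SandboxSet normally needs to scale up in your cluster; otherwise routine scaling triggers the alert.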

Controller runtime metrics

Use these metrics to monitor the reconcile performance and error conditions of the controller-runtime framework. This helps you determine if the controller is healthy.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| controller_runtime_reconcile_total | Counter | Total number of reconcile operations for each controller. | controller, result |
| controller_runtime_reconcile_errors_total | Counter | Total number of reconcile errors for each controller. | controller |
| controller_runtime_terminal_reconcile_errors_total | Counter | Total number of terminal reconcile errors for each controller. | controller |
| controller_runtime_active_workers | Gauge | Current number of active workers for each controller. | controller |
| controller_runtime_webhook_requests_total | Counter | Total number of admission requests, broken down by HTTP status code. | webhook, code |
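
One way to judge controller health from these counters is the share of reconciles that end in an error. The rule file below is a sketch under assumptions: the recording rule name, alert name, 10% threshold, and durations are hypothetical. Aggregating by job keeps the Controller and Manager series separate without assuming a specific job label value.

```yaml
groups:
  - name: sandbox-controller-reconcile     # assumed group name
    rules:
      # Fraction of reconciles that returned an error over the last 5 minutes.
      - record: job_controller:reconcile_errors:ratio_rate5m   # assumed rule name
        expr: |
          sum by (job, controller) (rate(controller_runtime_reconcile_errors_total[5m]))
            /
          sum by (job, controller) (rate(controller_runtime_reconcile_total[5m]))
      # Rules in a group run in order, so the alert can read the recorded series.
      - alert: SandboxReconcileErrorsHigh   # assumed alert name
        expr: job_controller:reconcile_errors:ratio_rate5m > 0.1   # assumed threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Controller {{ $labels.controller }} has a high reconcile error ratio"
```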

Workqueue metrics

Use these metrics to monitor the backlog and processing status of the controller's workqueue, and to identify issues such as queue buildup or stuck threads.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| workqueue_depth | Gauge | Current depth of the workqueue. | controller, name |
| workqueue_unfinished_work_seconds | Gauge | The number of seconds of work that is in progress but not yet complete. A high value may indicate a stuck thread. | controller, name |
| workqueue_longest_running_processor_seconds | Gauge | The number of seconds the longest-running processor in the workqueue has been active. | controller, name |
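
Queue buildup and stuck workers can be expressed as two threshold alerts. The values below (100 items, 300 seconds) and all names are assumptions for illustration; suitable thresholds depend on the number of Sandbox resources in the cluster.

```yaml
groups:
  - name: sandbox-controller-workqueue     # assumed group name
    rules:
      # A queue that stays deep means items arrive faster than they are processed.
      - alert: SandboxWorkqueueBacklog      # assumed alert name
        expr: workqueue_depth > 100         # assumed threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Workqueue {{ $labels.name }} has a sustained backlog"
      # A single item that has been processing for minutes may indicate a stuck worker.
      - alert: SandboxWorkqueueProcessorStuck   # assumed alert name
        expr: workqueue_longest_running_processor_seconds > 300   # assumed threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Workqueue {{ $labels.name }} has a long-running processor"
```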

API server request metrics

Use these metrics to monitor the controller's requests to the Kubernetes API Server. This helps you troubleshoot throttling or connection issues.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| rest_client_requests_total | Counter | Total number of HTTP requests, categorized by status code and method. | code, method |

Process and runtime metrics

Use these metrics to monitor the resource usage of the controller process, including memory, garbage collection (GC), and goroutines, and to identify potential resource leaks.

| Metric name | Type | Description |
| --- | --- | --- |
| up | Gauge | Scrape health recorded by Prometheus for the target: 1 if the last scrape succeeded, 0 if it failed. |
| go_goroutines | Gauge | The current number of goroutines. |
| go_gc_duration_seconds | Summary | A summary of the stop-the-world garbage collection pause durations. |
| process_resident_memory_bytes | Gauge | Resident memory size of the process, in bytes. |
| process_open_fds | Gauge | The number of open file descriptors for the process. |
| go_memstats_alloc_bytes | Gauge | Number of heap bytes allocated and still in use. |
| go_memstats_sys_bytes | Gauge | Total bytes of memory obtained from the operating system. |
| go_memstats_heap_inuse_bytes | Gauge | Number of heap bytes that are in use. |
| go_memstats_heap_objects | Gauge | The current number of allocated objects. |
| go_memstats_heap_alloc_bytes | Gauge | Identical to go_memstats_alloc_bytes. |
| go_memstats_heap_idle_bytes | Gauge | Number of heap bytes waiting to be used. |
| go_memstats_heap_released_bytes | Gauge | Number of heap bytes released to the operating system. |
| go_memstats_heap_sys_bytes | Gauge | Number of heap bytes obtained from the system. |
| go_memstats_alloc_bytes_total | Counter | Total number of bytes allocated to date (including freed bytes). |
| go_memstats_next_gc_bytes | Gauge | The heap size threshold in bytes that triggers the next garbage collection. |
| go_memstats_last_gc_time_seconds | Gauge | The Unix timestamp of the last garbage collection. |
| go_memstats_gc_sys_bytes | Gauge | Bytes of memory used for garbage collection system metadata. |
| go_memstats_buck_hash_sys_bytes | Gauge | Bytes of memory used by the profiling bucket hash table. |
| go_memstats_mspan_sys_bytes | Gauge | Bytes of memory used for mspan structures. |
| go_memstats_mcache_sys_bytes | Gauge | Bytes of memory used for mcache structures. |
| go_memstats_other_sys_bytes | Gauge | Bytes of memory used for other system allocations. |
| go_memstats_stack_sys_bytes | Gauge | Bytes of memory obtained from the system for stack space. |

Sandbox manager metrics

Sandbox Manager handles sandbox resource claims, lifecycle operations (such as clone, delete, pause, resume, and snapshot), and the routing proxy. The Manager's /metrics endpoint exposes the following metrics.

Sandbox claim metrics

Use these metrics to monitor sandbox resource claim operations, including success rate, duration, and retry count.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| sandbox_claim_total | Counter | Total number of claim operations. | - |
| sandbox_claim_creation_responses | Counter | The number of sandbox creation requests, broken down by result. | result |
| sandbox_claim_duration_seconds | Histogram | Duration of claim operations, in seconds. | - |
| sandbox_claim_retries | Histogram | Number of retries per claim operation. | - |
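
Success rate and duration are usually derived from these series with rate() and histogram_quantile(). The recording rules below are a sketch: the rule names are hypothetical, and the duration rule assumes the histogram is exposed in the classic format with _bucket series and an le label. The values of the result label are not listed in this topic, so the first rule groups by result instead of filtering on a specific value.

```yaml
groups:
  - name: sandbox-manager-claims           # assumed group name
    rules:
      # Creation responses per second, split by the result label.
      - record: result:sandbox_claim_creation_responses:rate5m   # assumed rule name
        expr: sum by (result) (rate(sandbox_claim_creation_responses[5m]))
      # 95th percentile claim duration over the last 5 minutes.
      - record: sandbox_claim_duration_seconds:p95_5m             # assumed rule name
        expr: histogram_quantile(0.95, sum by (le) (rate(sandbox_claim_duration_seconds_bucket[5m])))
```

Recording the quantile keeps dashboards fast because Grafana reads one precomputed series instead of aggregating every bucket on each refresh.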

Lifecycle operation metrics

Use these metrics to monitor the duration and outcome of Sandbox lifecycle operations. This helps you evaluate performance and troubleshoot failures.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| sandbox_clone_duration_seconds | Histogram | Duration of the Sandbox clone operation, in seconds. | - |
| sandbox_delete_duration_seconds | Histogram | Duration of the Sandbox delete operation, in seconds. | - |
| sandbox_delete_responses | Counter | Total Sandbox delete requests by result. | result |
| sandbox_pause_duration_seconds | Histogram | Duration of the Sandbox pause operation, in seconds. | - |
| sandbox_resume_duration_seconds | Histogram | Duration of the Sandbox resume operation, in seconds. | - |
| sandbox_snapshot_duration_seconds | Histogram | Duration of the Sandbox snapshot creation, in seconds. | - |
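
Pause and resume latency are often the operations users notice most, so tail latency is a reasonable starting point. The sketch below records the 99th percentile for both operations and the delete outcome rate. Rule names are hypothetical, and the quantile rules assume classic histograms that expose _bucket series.

```yaml
groups:
  - name: sandbox-manager-lifecycle        # assumed group name
    rules:
      # 99th percentile pause duration over the last 5 minutes.
      - record: sandbox_pause_duration_seconds:p99_5m    # assumed rule name
        expr: histogram_quantile(0.99, sum by (le) (rate(sandbox_pause_duration_seconds_bucket[5m])))
      # 99th percentile resume duration over the last 5 minutes.
      - record: sandbox_resume_duration_seconds:p99_5m   # assumed rule name
        expr: histogram_quantile(0.99, sum by (le) (rate(sandbox_resume_duration_seconds_bucket[5m])))
      # Delete responses per second, split by the result label.
      - record: result:sandbox_delete_responses:rate5m   # assumed rule name
        expr: sum by (result) (rate(sandbox_delete_responses[5m]))
```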

Route and network metrics

Use these metrics to monitor the proxy routing table and peer nodes in Sandbox Manager, and the performance of route synchronization.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| sandbox_routes | Gauge | Current number of routes in the proxy routing table. | - |
| sandbox_peers | Gauge | Current number of connected peer nodes. | - |
| sandbox_route_sync_duration_seconds | Histogram | Duration of route synchronization, in seconds. | - |
| sandbox_route_sync_total | Counter | Total number of route synchronizations. | - |
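
Route synchronization performance can be summarized as a sync rate and a mean duration. The sketch below uses the _sum and _count series that a classic Prometheus histogram exposes alongside its buckets; the rule names are hypothetical.

```yaml
groups:
  - name: sandbox-manager-routes           # assumed group name
    rules:
      # Route synchronizations per second over the last 5 minutes.
      - record: sandbox_route_sync:rate5m                  # assumed rule name
        expr: sum(rate(sandbox_route_sync_total[5m]))
      # Mean synchronization duration: observed seconds divided by observation count.
      - record: sandbox_route_sync_duration_seconds:mean5m # assumed rule name
        expr: |
          sum(rate(sandbox_route_sync_duration_seconds_sum[5m]))
            /
          sum(rate(sandbox_route_sync_duration_seconds_count[5m]))
```

The mean returns no value in windows where no synchronization was observed, because the denominator is zero.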

controller-runtime metrics (Manager)

Use these metrics to monitor reconciliation performance and errors in the controller-runtime framework.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| controller_runtime_reconcile_total | Counter | Total number of reconciliations for each controller. | controller, result |
| controller_runtime_reconcile_errors_total | Counter | Total number of reconciliation errors for each controller. | controller |
| controller_runtime_terminal_reconcile_errors_total | Counter | Total number of terminal reconciliation errors for each controller. | controller |
| controller_runtime_active_workers | Gauge | Current number of active workers for each controller. | controller |
| controller_runtime_reconcile_time_seconds | Histogram | Reconciliation duration in seconds. | controller |
| controller_runtime_max_concurrent_reconciles | Gauge | Maximum number of concurrent reconciles for each controller. | controller |
| controller_runtime_reconcile_panics_total | Counter | Total number of reconciliation panics for each controller. | controller |
| controller_runtime_webhook_panics_total | Counter | Total number of Webhook panics. | - |
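
Panics and worker saturation are two conditions these metrics make easy to detect. The rule file below is a sketch; the alert names, windows, and severities are assumptions for illustration.

```yaml
groups:
  - name: sandbox-manager-reconcile        # assumed group name
    rules:
      # Any increase in the panic counter within 15 minutes fires the alert.
      - alert: SandboxManagerReconcilePanic     # assumed alert name
        expr: increase(controller_runtime_reconcile_panics_total[15m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Controller {{ $labels.controller }} panicked during reconciliation"
      # All workers busy for 15 minutes suggests the controller cannot keep up.
      - alert: SandboxManagerWorkersSaturated   # assumed alert name
        expr: controller_runtime_active_workers / controller_runtime_max_concurrent_reconciles >= 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Controller {{ $labels.controller }} is using all of its reconcile workers"
```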

Work queue metrics (Manager)

Use these metrics to monitor the backlog and processing status of the controller's work queue.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| workqueue_depth | Gauge | Current depth of the work queue. | controller, name |
| workqueue_unfinished_work_seconds | Gauge | Total runtime, in seconds, of work currently in progress. | controller, name |
| workqueue_longest_running_processor_seconds | Gauge | Runtime in seconds of the longest-running processor for the work queue. | controller, name |
| workqueue_adds_total | Counter | Total number of items added to the work queue. | controller, name |
| workqueue_retries_total | Counter | Total number of retries handled by the work queue. | controller, name |
| workqueue_queue_duration_seconds | Histogram | The duration an item spends in the work queue before being processed. | controller, name |
| workqueue_work_duration_seconds | Histogram | The time taken to process an item from the work queue. | controller, name |
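
Queue wait time and retry rate indicate whether the Manager keeps up with incoming work. The recording rules below are a sketch with hypothetical rule names; the quantile rule assumes the histogram is exposed in the classic format with _bucket series.

```yaml
groups:
  - name: sandbox-manager-workqueue        # assumed group name
    rules:
      # 99th percentile time an item waits in each queue before processing starts.
      - record: name:workqueue_queue_duration_seconds:p99_5m   # assumed rule name
        expr: histogram_quantile(0.99, sum by (name, le) (rate(workqueue_queue_duration_seconds_bucket[5m])))
      # Retries per second for each queue; a rising value points to repeated failures.
      - record: name:workqueue_retries:rate5m                  # assumed rule name
        expr: sum by (name) (rate(workqueue_retries_total[5m]))
```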

API Server request metrics (Manager)

Use these metrics to monitor requests from the Manager to the Kubernetes API Server.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| rest_client_requests_total | Counter | Number of HTTP requests, broken down by status code, method, and host. | code, method, host |

Process and runtime metrics (Manager)

These metrics monitor the resource usage of the Manager process.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| process_cpu_seconds_total | Counter | Total CPU time used by the process, in seconds. | - |
| process_resident_memory_bytes | Gauge | Resident memory size of the process, in bytes. | - |
| process_open_fds | Gauge | Number of open file descriptors for the process. | - |
| process_max_fds | Gauge | Maximum number of file descriptors for the process. | - |
| process_virtual_memory_bytes | Gauge | Virtual memory size of the process, in bytes. | - |
| process_virtual_memory_max_bytes | Gauge | Maximum virtual memory size for the process, in bytes. | - |
| process_start_time_seconds | Gauge | Start time of the process, in seconds since the Unix epoch. | - |
| process_network_receive_bytes_total | Counter | Total bytes received by the process over the network. | - |
| process_network_transmit_bytes_total | Counter | Total bytes transmitted by the process over the network. | - |
| go_goroutines | Gauge | Current number of goroutines. | - |
| go_threads | Gauge | Number of OS threads created. | - |
| go_info | Gauge | Information about the Go environment. | version |
| go_gc_duration_seconds | Summary | Distribution of stop-the-world GC pause durations. | quantile |
| go_memstats_alloc_bytes | Gauge | Bytes of allocated heap objects. | - |
| go_memstats_alloc_bytes_total | Counter | Total bytes allocated for heap objects (includes freed bytes). | - |
| go_memstats_sys_bytes | Gauge | Total bytes of memory obtained from the operating system. | - |
| go_memstats_heap_alloc_bytes | Gauge | Bytes of allocated heap objects. | - |
| go_memstats_heap_idle_bytes | Gauge | Bytes of memory in idle spans. | - |
| go_memstats_heap_inuse_bytes | Gauge | Bytes of memory in in-use spans. | - |
| go_memstats_heap_objects | Gauge | Number of allocated heap objects. | - |
| go_memstats_heap_released_bytes | Gauge | Bytes of memory released to the operating system. | - |
| go_memstats_heap_sys_bytes | Gauge | Bytes of memory obtained from the operating system for the heap. | - |
| go_memstats_stack_sys_bytes | Gauge | Bytes of memory obtained from the operating system for stack space. | - |
| go_memstats_stack_inuse_bytes | Gauge | Bytes of memory in use by the stack allocator. | - |
| go_memstats_mspan_sys_bytes | Gauge | Bytes of memory used for mspan structures. | - |
| go_memstats_mspan_inuse_bytes | Gauge | Bytes of memory in use by mspan structures. | - |
| go_memstats_mcache_sys_bytes | Gauge | Bytes of memory used for mcache structures. | - |
| go_memstats_mcache_inuse_bytes | Gauge | Bytes of memory in use by mcache structures. | - |
| go_memstats_buck_hash_sys_bytes | Gauge | Bytes of memory used by the profiling bucket hash table. | - |
| go_memstats_gc_sys_bytes | Gauge | Bytes of memory used for garbage collection system metadata. | - |
| go_memstats_other_sys_bytes | Gauge | Bytes of memory used for other system allocations. | - |
| go_memstats_next_gc_bytes | Gauge | Target heap size in bytes for the next garbage collection cycle. | - |
| go_memstats_last_gc_time_seconds | Gauge | Timestamp of the last completed garbage collection cycle, in seconds since the Unix epoch. | - |
| go_memstats_frees_total | Counter | Cumulative count of heap objects freed. | - |
| go_memstats_mallocs_total | Counter | Cumulative count of heap objects allocated. | - |
| go_gc_cycles_automatic_gc_cycles_total | Counter | Total number of automatic garbage collection cycles initiated by the Go runtime. | - |
| go_gc_cycles_forced_gc_cycles_total | Counter | Total number of garbage collection cycles forced by the application. | - |
| go_gc_cycles_total_gc_cycles_total | Counter | Total number of completed garbage collection cycles. | - |
| go_gc_gogc_percent | Gauge | User-configured heap growth target percentage. | - |
| go_gc_gomemlimit_bytes | Gauge | User-configured Go runtime memory limit, in bytes. | - |
| go_gc_heap_goal_bytes | Gauge | Target heap size at the end of a garbage collection cycle, in bytes. | - |
| go_gc_heap_live_bytes | Gauge | Heap memory occupied by objects that were marked live by the previous garbage collection cycle. | - |
| go_gc_heap_objects_objects | Gauge | Number of objects on the heap (live or unswept). | - |
| go_gc_heap_tiny_allocs_objects_total | Counter | Total number of tiny allocations combined into blocks. | - |
| go_gc_heap_allocs_bytes_total | Counter | Cumulative memory allocated to the heap by the application. | - |
| go_gc_heap_allocs_objects_total | Counter | Cumulative count of heap allocations triggered by the application. | - |
| go_gc_heap_frees_bytes_total | Counter | Cumulative heap memory freed by the garbage collector. | - |
| go_gc_heap_frees_objects_total | Counter | Cumulative count of heap allocations freed by the garbage collector. | - |
| go_gc_heap_allocs_by_size_bytes | Histogram | Distribution of heap allocations by approximate size. | le |
| go_gc_heap_frees_by_size_bytes | Histogram | Distribution of freed heap allocations by approximate size. | le |
| go_gc_scan_globals_bytes | Gauge | Total scannable global variable space, in bytes. | - |
| go_gc_scan_heap_bytes | Gauge | Total scannable heap space, in bytes. | - |
| go_gc_scan_stack_bytes | Gauge | Total stack bytes scanned in the previous garbage collection cycle. | - |
| go_gc_scan_total_bytes | Gauge | Total scannable space, in bytes. | - |
| go_gc_stack_starting_size_bytes | Gauge | Stack size of new goroutines, in bytes. | - |
| go_gc_limiter_last_enabled_gc_cycle | Gauge | The GC cycle when the GC CPU limiter was last enabled. | - |
| go_gc_pauses_seconds | Histogram | Distribution of GC pause durations (deprecated). | le |
| go_sched_gomaxprocs_threads | Gauge | The current runtime.GOMAXPROCS setting. | - |
| go_sched_goroutines_goroutines | Gauge | Number of active goroutines. | - |
| go_sched_latencies_seconds | Histogram | Distribution of time goroutines spend waiting in the scheduler's run queue. | le |
| go_sched_pauses_stopping_gc_seconds | Histogram | Distribution of GC-related stop-the-world stop latencies. | le |
| go_sched_pauses_stopping_other_seconds | Histogram | Distribution of non-GC-related stop-the-world stop latencies. | le |
| go_sched_pauses_total_gc_seconds | Histogram | Distribution of GC-related stop-the-world pause latencies. | le |
| go_sched_pauses_total_other_seconds | Histogram | Distribution of non-GC-related stop-the-world pause latencies. | le |
| go_sync_mutex_wait_total_seconds_total | Counter | Cumulative time goroutines have spent blocked on sync.Mutex or sync.RWMutex, in seconds. | - |
| go_cgo_go_to_c_calls_calls_total | Counter | Total number of CGO calls from Go to C in the current process. | - |
| go_cpu_classes_gc_mark_assist_cpu_seconds_total | Counter | Estimated total CPU time goroutines spent assisting the GC mark phase, in seconds. | - |
| go_cpu_classes_gc_mark_dedicated_cpu_seconds_total | Counter | Estimated total CPU time spent on dedicated processors for the GC mark phase, in seconds. | - |
| go_cpu_classes_gc_mark_idle_cpu_seconds_total | Counter | Estimated total CPU time spent on idle CPU resources for the GC mark phase, in seconds. | - |
| go_cpu_classes_gc_pause_cpu_seconds_total | Counter | Estimated total CPU time the application spent paused for GC, in seconds. | - |
| go_cpu_classes_gc_total_cpu_seconds_total | Counter | Estimated total CPU time spent running GC work, in seconds. | - |
| go_cpu_classes_idle_cpu_seconds_total | Counter | Estimated total available CPU time not spent running any Go or Go runtime code, in seconds. | - |
| go_cpu_classes_scavenge_assist_cpu_seconds_total | Counter | Estimated total CPU time spent returning unused memory in response to memory pressure, in seconds. | - |
| go_cpu_classes_scavenge_background_cpu_seconds_total | Counter | Estimated total CPU time spent returning unused memory in the background, in seconds. | - |
| go_cpu_classes_scavenge_total_cpu_seconds_total | Counter | Estimated total CPU time spent returning unused memory, in seconds. | - |
| go_cpu_classes_total_cpu_seconds_total | Counter | Estimated total available CPU time for user Go code or the Go runtime, in seconds. | - |
| go_cpu_classes_user_cpu_seconds_total | Counter | Estimated total CPU time spent running user Go code, in seconds. | - |
| go_memory_classes_heap_free_bytes | Gauge | Memory that is completely free and eligible to be returned to the operating system, but has not yet been returned, in bytes. | - |
| go_memory_classes_heap_objects_bytes | Gauge | Memory occupied by live objects and dead objects that have not yet been marked as free, in bytes. | - |
| go_memory_classes_heap_released_bytes | Gauge | Memory that is completely free and has been returned to the operating system, in bytes. | - |
| go_memory_classes_heap_stacks_bytes | Gauge | Memory allocated from the heap and reserved for stack space, in bytes. | - |
| go_memory_classes_heap_unused_bytes | Gauge | Memory reserved for heap objects but not currently in use, in bytes. | - |
| go_memory_classes_metadata_mcache_free_bytes | Gauge | Memory reserved for runtime mcache structures but not in use, in bytes. | - |
| go_memory_classes_metadata_mcache_inuse_bytes | Gauge | Memory occupied by runtime mcache structures that are currently in use, in bytes. | - |
| go_memory_classes_metadata_mspan_free_bytes | Gauge | Memory reserved for runtime mspan structures but not in use, in bytes. | - |
| go_memory_classes_metadata_mspan_inuse_bytes | Gauge | Memory occupied by runtime mspan structures that are currently in use, in bytes. | - |
| go_memory_classes_metadata_other_bytes | Gauge | Memory reserved for or used by runtime metadata, in bytes. | - |
| go_memory_classes_os_stacks_bytes | Gauge | Stack memory allocated by the underlying operating system, in bytes. | - |
| go_memory_classes_other_bytes | Gauge | Memory used for trace buffers, debug structures, and similar data, in bytes. | - |
| go_memory_classes_profiling_buckets_bytes | Gauge | Memory used for profiling stack trace hash maps, in bytes. | - |
| go_memory_classes_total_bytes | Gauge | Total memory mapped by the Go runtime into the current process (read-write), in bytes. | - |
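
Of the process metrics, file descriptor usage and restart frequency are common leak and crash-loop indicators. The sketch below assumes the job label is sandbox-manager, which matches the job_name in the open source scrape configuration in this topic; with a ServiceMonitor the job label may differ. Alert names and thresholds are also assumptions.

```yaml
groups:
  - name: sandbox-manager-process          # assumed group name
    rules:
      # More than 80% of the file descriptor limit in use (assumed threshold).
      - alert: SandboxManagerFdUsageHigh   # assumed alert name
        expr: process_open_fds{job="sandbox-manager"} / process_max_fds{job="sandbox-manager"} > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Sandbox Manager {{ $labels.instance }} is close to its file descriptor limit"
      # The start time changed more than twice in 30 minutes, which suggests repeated restarts.
      - alert: SandboxManagerRestarting    # assumed alert name
        expr: changes(process_start_time_seconds{job="sandbox-manager"}[30m]) > 2
        labels:
          severity: warning
        annotations:
          summary: "Sandbox Manager {{ $labels.instance }} restarted several times in 30 minutes"
```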

Certificate monitoring metrics

Use these metrics to monitor certificate read operations.

| Metric name | Type | Description |
| --- | --- | --- |
| certwatcher_read_certificate_total | Counter | Counts the total number of certificate reads. |
| certwatcher_read_certificate_errors_total | Counter | Counts the total number of certificate read errors. |
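
A certificate read error usually deserves attention as soon as it appears, so a simple alert on any increase of the error counter is often enough. The group name, alert name, window, and severity below are assumptions for illustration.

```yaml
groups:
  - name: sandbox-manager-certificates     # assumed group name
    rules:
      # Fires when at least one certificate read error occurred in the last 15 minutes.
      - alert: SandboxManagerCertificateReadError   # assumed alert name
        expr: increase(certwatcher_read_certificate_errors_total[15m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Certificate read errors detected on {{ $labels.instance }}"
```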