Container Service for Kubernetes: Collect data plane metrics for Gateway with Inference Extension

Last Updated: Aug 21, 2025

The Gateway with Inference Extension component exports data plane metrics to Prometheus. This topic describes how to use Managed Service for Prometheus to monitor the health of the component's data plane.

Prerequisites

Metric collection methods

The inference extension of the Gateway with Inference Extension component provides comprehensive metrics for generative AI inference services, such as time to first token (TTFT) and token throughput. The metric format complies with the OpenTelemetry generative AI semantic conventions.
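For illustration, after these metrics are collected by using one of the methods below, you can query them with PromQL. The following sketch assumes a TTFT histogram whose name follows the OpenTelemetry generative AI semantic conventions; the exact metric names exposed in your cluster may differ, so verify them in the Prometheus console before you rely on the query.

# Hypothetical metric name derived from the OpenTelemetry GenAI semantic conventions;
# replace it with the TTFT histogram name that is actually exposed in your cluster.
# P90 time to first token (TTFT) over the last 5 minutes.
histogram_quantile(0.90, sum by (le) (rate(gen_ai_server_time_to_first_token_seconds_bucket[5m])))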

Manually configure a collection rule

Note

Default service discovery is not required when you manually configure a collection rule.

  1. Log on to the Prometheus console. In the navigation pane on the left, click Integration Center.

  2. In the search box, enter `gateway`. Under Artificial Intelligence, click Gateway with Inference Extension.

  3. In the dialog box that appears, select the target cluster from the Select Container Service Cluster drop-down list and click OK.

    Keep the default configurations in the dialog box.

To get started quickly, you can use the mock application described in Quick start.
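After the collection rule is created, you can confirm in the Prometheus console that data plane metrics are being reported. For example, the following basic Envoy server metrics, which are also part of the sample configuration in the next section, should return one series per gateway data plane pod.

# Returns 1 for each live Envoy data plane instance that is being scraped.
envoy_server_live

# Uptime of each Envoy data plane instance, in seconds.
envoy_server_uptime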

Custom collection

The collection rule that you configure in the preceding section collects all data plane metrics of the component by default. To monitor only specific metrics, you can add a custom collection job for the Gateway with Inference Extension component. The following code shows a sample custom configuration that collects common metrics.

scrape_configs:
  - job_name: 'ack-gateway'
    kubernetes_sd_configs:
      # Discover pods in the namespace that hosts the gateway data plane.
      - role: pod
        namespaces:
          names:
            - envoy-gateway-system
    relabel_configs:
      # Keep only pods that are managed by envoy-gateway.
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_managed_by]
        regex: envoy-gateway
        action: keep
    scrape_interval: 15s
    metrics_path: /stats/prometheus
    scheme: http
    metric_relabel_configs:
      # Keep only the common data plane metrics listed below and drop all other series.
      - source_labels: [__name__]
        regex: (envoy_server_live|envoy_server_uptime|envoy_server_memory_allocated|envoy_server_memory_heap_size|envoy_cluster_membership_healthy|envoy_cluster_membership_total|envoy_cluster_upstream_cx_active|envoy_cluster_upstream_rq_total|envoy_cluster_upstream_cx_rx_bytes_total|envoy_cluster_upstream_cx_tx_bytes_total|envoy_http_downstream_cx_rx_bytes_total|envoy_http_downstream_cx_tx_bytes_total|envoy_cluster_upstream_rq_time_bucket|envoy_cluster_upstream_rq_xx|envoy_http_downstream_rq_total|envoy_http_downstream_cx_total|envoy_http_downstream_rq_time_bucket|envoy_listener_downstream_cx_active|envoy_tcp_downstream_cx_total|envoy_tcp_downstream_cx_rx_bytes_total|envoy_tcp_downstream_cx_tx_bytes_total|envoy_cluster_upstream_cx_total)
        action: keep
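In this configuration, the kubernetes_sd_configs stage discovers pods in the envoy-gateway-system namespace, the relabel_configs stage keeps only pods whose app.kubernetes.io/managed-by label is envoy-gateway, and the metric_relabel_configs stage keeps only the listed metric names and drops all other series, which reduces the amount of data that is collected and stored.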

Metric dashboards

Gateway with Inference Extension also provides Grafana dashboards. To view these dashboards for your cluster, navigate to Operations Management > Prometheus Monitoring > Others.

  • ACK Gateway GenAI: This dashboard displays various metrics for generative AI inference services in the current cluster.

  • Envoy Global: This dashboard provides overall monitoring for the gateway. It includes metrics such as gateway resource usage, an overview of upstream and downstream connections, and endpoint health.

  • Envoy Clusters: This dashboard provides detailed metrics at the Envoy Cluster level. In Envoy, a Cluster represents a set of endpoints. In Gateway with Inference Extension, a Cluster typically corresponds to a routing target, such as the first target service of the first rule in an HTTPRoute. The sample queries after this list show the kind of cluster-level queries that such panels are built from.

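These dashboards are built from the Envoy data plane metrics shown in the sample configuration above. The following PromQL sketches are illustrative examples of cluster-level panels, not the exact queries used by the built-in dashboards; they rely on the envoy_cluster_name label that Envoy attaches to its cluster statistics.

# Healthy endpoints as a fraction of total membership, per Envoy Cluster.
envoy_cluster_membership_healthy / envoy_cluster_membership_total

# Upstream request rate per Envoy Cluster over the last 5 minutes.
sum by (envoy_cluster_name) (rate(envoy_cluster_upstream_rq_total[5m]))

# P99 upstream request latency per Envoy Cluster, calculated from histogram buckets.
histogram_quantile(0.99, sum by (le, envoy_cluster_name) (rate(envoy_cluster_upstream_rq_time_bucket[5m])))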