The Gateway with Inference Extension component exports data plane metrics to Prometheus. This topic describes how to use Managed Service for Prometheus to monitor the health of the component's data plane.
Prerequisites
Gateway with Inference Extension 1.4.0 is installed and Enable Gateway API Inference Extension is selected during installation. For more information, see Step 2: Install the Gateway with Inference Extension component.
You have activated Managed Service for Prometheus.
Metric collection methods
The inference extension of Gateway with Inference Extension provides comprehensive metrics for generative AI inference services, such as time to first token (TTFT) and token throughput. The metric format complies with the OpenTelemetry generative AI semantic conventions.
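After the metrics are collected by using one of the methods in the following sections, you can pre-aggregate them with Prometheus recording rules. The following sketch is for illustration only: the metric names gen_ai_server_time_to_first_token_seconds_bucket and gen_ai_server_output_tokens_total are assumptions derived from the OpenTelemetry generative AI conventions, not names confirmed by this component. Check the names that your gateway actually exposes before you use such rules.

groups:
- name: genai-inference-overview
  rules:
  # P90 time to first token over the last 5 minutes. Assumes a histogram named
  # gen_ai_server_time_to_first_token_seconds_bucket (hypothetical name).
  - record: genai:time_to_first_token_seconds:p90_5m
    expr: |
      histogram_quantile(0.90,
        sum by (le) (rate(gen_ai_server_time_to_first_token_seconds_bucket[5m]))
      )
  # Output token throughput over the last 5 minutes. Assumes a counter named
  # gen_ai_server_output_tokens_total (hypothetical name).
  - record: genai:output_tokens_per_second:rate_5m
    expr: sum(rate(gen_ai_server_output_tokens_total[5m]))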
Manually configure a collection rule
Default service discovery is not required when you manually configure a collection rule.
Log on to the Prometheus console. In the navigation pane on the left, click Integration Center.
In the search box, enter `gateway`. Under Artificial Intelligence, click Gateway with Inference Extension.
In the dialog box that appears, select the target cluster from the Select Container Service Cluster drop-down list, keep the default settings for the other options, and then click OK.
To get started quickly, you can use the mock application described in Quick start.
Custom collection
By default, the manually configured collection rule collects all data plane metrics of the component. You can also add a custom collection job to collect only the metrics that you need for the Gateway with Inference Extension component. The following code shows a sample custom configuration that covers common metrics.
scrape_configs:
- job_name: 'ack-gateway'
  # Discover gateway pods in the envoy-gateway-system namespace.
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - envoy-gateway-system
  # Keep only pods that are managed by envoy-gateway.
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_managed_by]
    regex: envoy-gateway
    action: keep
  scrape_interval: 15s
  metrics_path: /stats/prometheus
  scheme: http
  # Keep only the commonly used data plane metrics. Prometheus anchors this
  # regex, so it must stay on one line: literal line breaks inside the regex
  # would prevent the metric names from matching.
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: '(envoy_server_live|envoy_server_uptime|envoy_server_memory_allocated|envoy_server_memory_heap_size|envoy_cluster_membership_healthy|envoy_cluster_membership_total|envoy_cluster_upstream_cx_active|envoy_cluster_upstream_rq_total|envoy_cluster_upstream_cx_rx_bytes_total|envoy_cluster_upstream_cx_tx_bytes_total|envoy_http_downstream_cx_rx_bytes_total|envoy_http_downstream_cx_tx_bytes_total|envoy_cluster_upstream_rq_time_bucket|envoy_cluster_upstream_rq_xx|envoy_http_downstream_rq_total|envoy_http_downstream_cx_total|envoy_http_downstream_rq_time_bucket|envoy_listener_downstream_cx_active|envoy_tcp_downstream_cx_total|envoy_tcp_downstream_cx_rx_bytes_total|envoy_tcp_downstream_cx_tx_bytes_total|envoy_cluster_upstream_cx_total)'
    action: keep
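If you prefer to collect whole metric families instead of enumerating individual metric names, the keep rule can use prefix patterns that cover the same families as the list above. This variant is only a sketch: it keeps noticeably more time series, which increases storage and query costs.

  metric_relabel_configs:
  - source_labels: [__name__]
    # Keep all Envoy server, cluster, HTTP, listener, and TCP metrics.
    regex: '(envoy_server_.*|envoy_cluster_.*|envoy_http_.*|envoy_listener_.*|envoy_tcp_.*)'
    action: keep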
Metric dashboards
Gateway with Inference Extension also provides Grafana dashboards. To view these dashboards for your cluster, navigate to .
ACK Gateway GenAI: This dashboard displays various metrics for generative AI inference services in the current cluster.

Envoy Global: This dashboard provides overall monitoring for the gateway. It includes metrics such as gateway resource usage, an overview of upstream and downstream connections, and endpoint health.

Envoy Clusters: This dashboard provides more detailed metrics at the Envoy Cluster level. In Envoy, a Cluster represents a set of endpoints. In Gateway with Inference Extension, a Cluster typically corresponds to a routing target, such as the first target service of the first rule in an HTTPRoute.
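
The endpoint health panels in the Envoy Global and Envoy Clusters dashboards are backed by the envoy_cluster_membership_healthy and envoy_cluster_membership_total metrics collected by the configuration above. If you also want to be notified when endpoints become unhealthy, an alerting rule along the following lines is one possible sketch. The envoy_cluster_name label depends on your Envoy stats tag configuration, and the five-minute duration and severity are placeholders.

groups:
- name: ack-gateway-endpoint-health
  rules:
  - alert: EnvoyClusterUnhealthyEndpoints
    # Fires when an Envoy cluster reports fewer healthy endpoints than total
    # endpoints for five minutes.
    expr: |
      min by (envoy_cluster_name) (envoy_cluster_membership_healthy)
        <
      min by (envoy_cluster_name) (envoy_cluster_membership_total)
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Envoy cluster {{ $labels.envoy_cluster_name }} has unhealthy endpoints"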
