
Container Service for Kubernetes: Use Gateway with Inference Extension to observe generative AI requests

Updated: Oct 23, 2025

The Gateway with Inference Extension component can export metrics and logs for generative AI requests following the OpenTelemetry Gen AI Semantic Conventions. This topic describes how to export these metrics and logs through the Gateway with Inference Extension component.

Background information

The OpenTelemetry Gen AI Semantic Conventions are a set of standardized semantic conventions for monitoring and tracing generative AI workloads, such as large language models (LLMs), text generation, and image generation. Their purpose is to unify the metrics, logs, and trace data of generative AI requests so that the data can be analyzed and troubleshot consistently across systems. The core goals of the specification are:

  • Standardized data collection:

    Define common attributes for generative AI requests, such as the model name, input and output token counts, and configuration parameters.

  • End-to-end tracing:

    Correlate generative AI requests with trace data from other systems, such as databases and API gateways.

  • Unified analysis and monitoring:

    Use standardized labels so that tools such as Prometheus and Grafana can aggregate and visualize the data.

Prerequisites

Configure observability data output

Deploy the generative AI observability plugin

The Gateway with Inference Extension component works together with the gen-ai-telemetry observability plugin to export observability data. The gen-ai-telemetry plugin is delivered as an image and has no fixed release cadence. See the gen-ai-telemetry plugin release notes for the latest image version.

kubectl apply -f - <<EOF
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: ack-gateway-llm-telemetry
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: mock-route
  wasm:
  - name: llm-telemetry
    rootID: ack-gateway-extension
    code:
      type: Image
      image:
        url: registry-cn-hangzhou.ack.aliyuncs.com/acs/gen-ai-telemetry-wasmplugin:g76f5a66-aliyun
EOF

The gen-ai-telemetry plugin image can be pulled over an internal network. If your cluster cannot pull the image over the public network, change the image address to the VPC-internal endpoint for your cluster's region. For example, if the cluster is in the China (Beijing) region, you can use registry-cn-beijing-vpc.ack.aliyuncs.com/acs/gen-ai-telemetry-wasmplugin:{image_tag} to pull the image quickly.
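As a minimal sketch, the region-specific endpoint can be derived programmatically. Note that the general `registry-{region}-vpc` host pattern is inferred from the single Beijing example above, so verify the endpoint for your own region:

```python
def vpc_image_url(region_id: str, image_tag: str) -> str:
    """Build the VPC-internal registry address for the plugin image.

    Assumes the host follows the registry-{region}-vpc.ack.aliyuncs.com
    pattern shown in the Beijing example; confirm it for your region.
    """
    return (f"registry-{region_id}-vpc.ack.aliyuncs.com"
            f"/acs/gen-ai-telemetry-wasmplugin:{image_tag}")

print(vpc_image_url("cn-beijing", "g76f5a66-aliyun"))
# registry-cn-beijing-vpc.ack.aliyuncs.com/acs/gen-ai-telemetry-wasmplugin:g76f5a66-aliyun
```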

Configure metrics tag rules for the gateway

When you deploy the mock-vllm application, an EnvoyProxy resource named custom-proxy-config is created along with it. To export gateway metrics, add metrics tag rules to this resource.

  1. Edit the EnvoyProxy resource.

    kubectl edit envoyproxy custom-proxy-config
  2. Merge the spec.bootstrap content from the following YAML into custom-proxy-config.

    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyProxy
    metadata:
      name: custom-proxy-config
      namespace: default
    spec:
      bootstrap:
        type: JSONPatch
        jsonPatches:
        - op: add
          path: /stats_config
          value:
            stats_tags:
              - tag_name: gen_ai.operation.name
                regex: "(\\|gen_ai.operation.name=([^|]*))"
              - tag_name: gen_ai.system
                regex: "(\\|gen_ai.system=([^|]*))"
              - tag_name: gen_ai.token.type
                regex: "(\\|gen_ai.token.type=([^|]*))"
              - tag_name: gen_ai.request.model
                regex: "(\\|gen_ai.request.model=([^|]*))"
              - tag_name: gen_ai.response.model
                regex: "(\\|gen_ai.response.model=([^|]*))"
              - tag_name: gen_ai.error.type
                regex: "(\\|gen_ai.error.type=([^|]*))"
              - tag_name: server.port
                regex: "(\\|server.port=([^|]*))"
              - tag_name: server.address
                regex: "(\\|server.address=([^|]*))"

    After you save and exit, the configuration takes effect immediately. The gateway can now export generative AI metrics.
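Each stats_tags rule above uses two capture groups: Envoy strips the text matched by the first group from the stat name and records the second group as the tag value. A quick sketch of how one rule behaves (the stat name here is illustrative, not the exact name the plugin emits):

```python
import re

# Illustrative stat name with embedded "|key=value" tag segments,
# the form the stats_tags rules above are written against.
stat = ("wasmcustom.gen_ai_client_operation_duration"
        "|gen_ai.operation.name=chat|gen_ai.system=example.com")

# Same regex as the gen_ai.operation.name rule in the EnvoyProxy config.
pattern = re.compile(r"(\|gen_ai.operation.name=([^|]*))")

match = pattern.search(stat)
print(match.group(2))                   # tag value: chat
print(pattern.sub("", stat, count=1))   # stat name with the matched segment removed
```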

Configure log output

Exporting gateway logs also requires modifying the EnvoyProxy resource. Add the fields that match your needs.

  1. Edit the EnvoyProxy resource.

    kubectl edit envoyproxy custom-proxy-config
  2. Merge the spec.telemetry content from the following YAML into custom-proxy-config.

    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyProxy
    metadata:
      name: custom-proxy-config
      namespace: default
    spec:
      telemetry:
        accessLog:
          disable: false
          settings:
          - sinks:
            - type: File
              file:
                path: /dev/stdout
            format:
              type: JSON
              json:
                # Default access log fields
                start_time: "%START_TIME%"
                method: "%REQ(:METHOD)%"
                x-envoy-origin-path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                protocol: "%PROTOCOL%"
                response_code: "%RESPONSE_CODE%"
                response_flags: "%RESPONSE_FLAGS%"
                response_code_details: "%RESPONSE_CODE_DETAILS%"
                connection_termination_details: "%CONNECTION_TERMINATION_DETAILS%"
                upstream_transport_failure_reason: "%UPSTREAM_TRANSPORT_FAILURE_REASON%"
                bytes_received: "%BYTES_RECEIVED%"
                bytes_sent: "%BYTES_SENT%"
                duration: "%DURATION%"
                x-envoy-upstream-service-time: "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"
                x-forwarded-for: "%REQ(X-FORWARDED-FOR)%"
                user-agent: "%REQ(USER-AGENT)%"
                x-request-id: "%REQ(X-REQUEST-ID)%"
                :authority: "%REQ(:AUTHORITY)%"
                upstream_host: "%UPSTREAM_HOST%"
                upstream_cluster: "%UPSTREAM_CLUSTER%"
                upstream_local_address: "%UPSTREAM_LOCAL_ADDRESS%"
                downstream_local_address: "%DOWNSTREAM_LOCAL_ADDRESS%"
                downstream_remote_address: "%DOWNSTREAM_REMOTE_ADDRESS%"
                requested_server_name: "%REQUESTED_SERVER_NAME%"
                route_name: "%ROUTE_NAME%"
                # Additional generative AI request fields
                gen_ai.operation.name: "%FILTER_STATE(wasm.gen_ai.operation.name:PLAIN)%"
                gen_ai.system: "%FILTER_STATE(wasm.gen_ai.system:PLAIN)%"
                gen_ai.request.model: "%FILTER_STATE(wasm.gen_ai.request.model:PLAIN)%"
                gen_ai.response.model: "%FILTER_STATE(wasm.gen_ai.response.model:PLAIN)%"
                gen_ai.error.type: "%FILTER_STATE(wasm.gen_ai.error.type:PLAIN)%"
                gen_ai.prompt.tokens: "%FILTER_STATE(wasm.gen_ai.prompt.tokens:PLAIN)%"
                gen_ai.completion.tokens: "%FILTER_STATE(wasm.gen_ai.completion.tokens:PLAIN)%"
                gen_ai.server.time_per_output_token: "%FILTER_STATE(wasm.gen_ai.server.time_per_output_token:PLAIN)%"
                gen_ai.server.time_to_first_token: "%FILTER_STATE(wasm.gen_ai.server.time_to_first_token:PLAIN)%"
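Note that the FILTER_STATE fields added by the plugin are emitted as JSON strings rather than numbers, so downstream processing must cast them before doing arithmetic. A minimal sketch of post-processing one log line (the field values mirror the expected output shown in this topic):

```python
import json

# A trimmed sample entry in the JSON access log format configured above.
line = ('{"gen_ai.operation.name": "chat", '
        '"gen_ai.prompt.tokens": "18", '
        '"gen_ai.completion.tokens": "76", '
        '"duration": 2}')

entry = json.loads(line)

# FILTER_STATE values arrive as strings; cast before summing.
prompt = int(entry["gen_ai.prompt.tokens"])
completion = int(entry["gen_ai.completion.tokens"])
print(prompt + completion)  # total tokens for the request: 94
```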

Send test requests

Run the steps in Initiate a test several times to generate observability data on the gateway.

View observability data

  1. Get the name of the gateway workload.

    export GATEWAY_DEPLOYMENT=$(kubectl -n envoy-gateway-system get deployment -l gateway.envoyproxy.io/owning-gateway-name=mock-gateway -o jsonpath='{.items[0].metadata.name}')
    echo $GATEWAY_DEPLOYMENT
  2. Forward the gateway's admin port to your local machine.

    kubectl -n envoy-gateway-system port-forward deployments/$GATEWAY_DEPLOYMENT 19000:19000
  3. Open a new terminal window and fetch the gateway metrics.

    curl -s localhost:19000/stats/prometheus | grep gen_ai

    Expected output:

    # TYPE gen_ai_client_operation_duration histogram
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="0.5"} 0
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="1"} 0
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="5"} 9
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="10"} 9
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="25"} 14
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="50"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="100"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="250"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="500"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="1000"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="2500"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="5000"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="10000"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="30000"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="60000"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="300000"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="600000"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="1800000"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="3600000"} 16
    gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="+Inf"} 16
    gen_ai_client_operation_duration_sum{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000"} 140.9499999999999886313162278384
    gen_ai_client_operation_duration_count{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000"} 16
  4. View the access logs.

    kubectl -n envoy-gateway-system logs deployments/$GATEWAY_DEPLOYMENT | tail -1

    Expected output:

    Defaulted container "envoy" out of: envoy, shutdown-manager 
    {                                                                                                                                                                          
     ":authority": "example.com",                                                                                                                                             
     "bytes_received": 184,                                                                                                                                                   
     "bytes_sent": 355,                                                                                                                                                       
     "connection_termination_details": null,                                                                                                                                  
     "downstream_local_address": "10.3.0.38:10080",                                                                                                                           
     "downstream_remote_address": "10.3.15.252:45492",                                                                                                                        
     "duration": 2,                                                                                                                                                           
     "gen_ai.completion.tokens": "76",                                                                                                                                        
     "gen_ai.error.type": "",                                                                                                                                                 
     "gen_ai.operation.name": "chat",                                                                                                                                         
     "gen_ai.prompt.tokens": "18",                                                                                                                                            
     "gen_ai.request.model": "mock",                                                                                                                                          
     "gen_ai.response.model": "mock",                                                                                                                                         
     "gen_ai.server.time_per_output_token": "0",                                                                                                                              
     "gen_ai.server.time_to_first_token": "2",                                                                                                                                
     "gen_ai.system": "example.com",                                                                                                                                          
     "method": "POST",                                                                                                                                                        
     "protocol": "HTTP/1.1",                                                                                                                                                  
     "requested_server_name": null,                                                                                                                                           
     "response_code": 200,                                                                                                                                                    
     "response_code_details": "via_upstream",                                                                                                                                 
     "response_flags": "-",                                                                                                                                                   
     "route_name": "httproute/default/mock-route/rule/0/match/0/*",                                                                                                           
     "start_time": "2025-05-28T06:13:31.190Z",                                                                                                                                
     "upstream_cluster": "httproute/default/mock-route/rule/0/backend/0",                                                                                                     
     "upstream_host": "10.3.0.9:8000",                                                                                                                                        
     "upstream_local_address": "10.3.0.38:33370",                                                                                                                             
     "upstream_transport_failure_reason": null,                                                                                                                               
     "user-agent": "curl/8.8.0",                                                                                                                                              
     "x-envoy-origin-path": "/v1/chat/completions",                                                                                                                           
     "x-envoy-upstream-service-time": null,                                                                                                                                   
     "x-forwarded-for": "10.3.15.252",                                                                                                                                        
     "x-request-id": "0e67d734-aca7-4c80-bda3-79641cd63e2c"                                                                                                                   
    } 

    For the meanings of the corresponding metrics and log fields, see OpenTelemetry Gen AI Semantic Conventions.
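The histogram buckets returned in step 3 can be post-processed directly. As a sketch, the following estimates a latency quantile from cumulative bucket counts (values taken from the expected output above; the linear interpolation mirrors the approach of Prometheus's histogram_quantile function):

```python
# Cumulative counts per upper bound `le` (milliseconds), taken from the
# gen_ai_client_operation_duration buckets in the expected output above.
buckets = [(0.5, 0), (1, 0), (5, 9), (10, 9), (25, 14), (50, 16)]
total = 16

def estimate_quantile(q: float, buckets, total: int) -> float:
    """Linearly interpolate within the first bucket whose cumulative
    count reaches rank q * total."""
    target = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= target:
            return prev_le + (target - prev_count) / (count - prev_count) * (le - prev_le)
        prev_le, prev_count = le, count
    return prev_le

print(round(estimate_quantile(0.5, buckets, total), 2))  # p50 ≈ 4.56 ms
```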

FAQ

How do I fix gateway requests failing with 413 Request Entity Too Large?

Cause: After the observability plugin is enabled, the gateway must buffer the entire request body in order to parse it. If the request body exceeds the default buffer limit, the request fails with HTTP status code 413 Request Entity Too Large.

Solution: Create a ClientTrafficPolicy resource to increase the gateway's buffer limit.

  1. Create a file named client-buffer-limit.yaml with the following content. Replace ${GATEWAY_NAME} with the actual gateway name (the metadata.name of the Gateway resource).

    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: ClientTrafficPolicy
    metadata:
      name: client-buffer-limit
      # If the gateway is not in the default namespace, add a namespace field
      # namespace: 
    spec:
      targetRefs:
        - group: gateway.networking.k8s.io
          kind: Gateway
          name: ${GATEWAY_NAME}
      connection:
        bufferLimit: 20Mi     # Adjust the size as needed
  2. Run the following command to apply the configuration:

    kubectl apply -f client-buffer-limit.yaml

gen-ai-telemetry plugin release notes

| Image tag | Release date | Description |
| --- | --- | --- |
| g31af794-aliyun | October 2025 | Optimization: fixed occasional 400 errors on inference requests with large request bodies. |
| g76f5a66-aliyun | August 2025 | Optimization: fixed inaccurate token counts for streaming requests. |
| g2ad0869-aliyun | May 2025 | New feature: added enhanced metrics and logs for generative AI requests. |