The Gateway with Inference Extension component can emit metrics and logs for generative AI requests in accordance with the OpenTelemetry Gen AI Semantic Conventions. This topic describes how to use the component to emit these metrics and logs.
Background information
The OpenTelemetry Gen AI Semantic Conventions are a set of standardized semantic conventions for monitoring and tracing generative AI workloads, such as large language models (LLMs), text generation, and image generation. Their goal is to unify the metrics, logs, and traces of generative AI requests so that the data can be analyzed and troubleshot consistently across systems. The core goals of the specification are:

Standardized data collection:
Define common attributes for generative AI requests, such as model name, input/output token counts, and configuration parameters.
End-to-end tracing:
Correlate generative AI requests with trace data from other systems, such as databases and API gateways.
Unified analysis and monitoring:
Use standardized labels so that tools such as Prometheus and Grafana can aggregate and visualize the data.
Prerequisites
Gateway with Inference Extension version 1.4.0 is installed, with the Enable Gateway API Inference Extension option selected. For instructions, see Install the component.
The mock-vllm application is deployed.
Configure the output of observability data
Deploy the generative AI observability plugin
The Gateway with Inference Extension component relies on the gen-ai-telemetry observability plugin to emit observability data. The gen-ai-telemetry plugin is distributed as a container image and has no fixed release cadence. See the gen-ai-telemetry plugin release notes to obtain the latest image version.
kubectl apply -f - <<EOF
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: ack-gateway-llm-telemetry
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: mock-route
  wasm:
  - name: llm-telemetry
    rootID: ack-gateway-extension
    code:
      type: Image
      image:
        url: registry-cn-hangzhou.ack.aliyuncs.com/acs/gen-ai-telemetry-wasmplugin:g76f5a66-aliyun
EOF

The gen-ai-telemetry plugin image can be pulled over the internal network. If your cluster cannot pull images over the public network, change the image address to the VPC internal endpoint of the corresponding region. For example, if the cluster is in the China (Beijing) region, you can use registry-cn-beijing-vpc.ack.aliyuncs.com/acs/gen-ai-telemetry-wasmplugin:{image_tag} to pull the image quickly.
Configure Metrics tag rules for the gateway
When the mock-vllm application is deployed, an EnvoyProxy resource named custom-proxy-config is created along with it. To emit the gateway's Metrics data, add Metrics tag rules to this resource.
Edit the EnvoyProxy resource.
kubectl edit envoyproxy custom-proxy-config

Update custom-proxy-config with the spec.bootstrap content from the following YAML.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: default
spec:
  bootstrap:
    type: JSONPatch
    jsonPatches:
    - op: add
      path: /stats_config
      value:
        stats_tags:
        - tag_name: gen_ai.operation.name
          regex: "(\\|gen_ai.operation.name=([^|]*))"
        - tag_name: gen_ai.system
          regex: "(\\|gen_ai.system=([^|]*))"
        - tag_name: gen_ai.token.type
          regex: "(\\|gen_ai.token.type=([^|]*))"
        - tag_name: gen_ai.request.model
          regex: "(\\|gen_ai.request.model=([^|]*))"
        - tag_name: gen_ai.response.model
          regex: "(\\|gen_ai.response.model=([^|]*))"
        - tag_name: gen_ai.error.type
          regex: "(\\|gen_ai.error.type=([^|]*))"
        - tag_name: server.port
          regex: "(\\|server.port=([^|]*))"
        - tag_name: server.address
          regex: "(\\|server.address=([^|]*))"

The configuration takes effect immediately after you save and exit. The gateway can now emit generative AI Metrics data.
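The stats_tags rules above pull `|key=value` segments out of raw Envoy stat names and turn them into Prometheus labels. As a rough illustration of how one such rule behaves (the sample stat name below is hypothetical; the actual names emitted by the plugin may differ), the capture group `([^|]*)` grabs everything after the key up to the next `|`:

```shell
# Hypothetical raw stat name with embedded "|key=value" tag segments.
stat='wasmcustom.gen_ai_client_operation_duration|gen_ai.operation.name=chat|gen_ai.request.model=mock'

# Emulate the tag_name "gen_ai.operation.name" rule: capture the value
# between "|gen_ai.operation.name=" and the next "|".
echo "$stat" | sed -E 's/.*\|gen_ai\.operation\.name=([^|]*).*/\1/'
# → chat
```

Envoy applies each stats_tags rule the same way, removing the matched segment from the stat name and attaching the captured value as a label.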
Configure log output
Emitting gateway logs also requires modifying the EnvoyProxy resource. Add the configuration below as needed.
Edit the EnvoyProxy resource.
kubectl edit envoyproxy custom-proxy-config

Update custom-proxy-config with the spec.telemetry content from the following YAML.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: default
spec:
  telemetry:
    accessLog:
      disable: false
      settings:
      - sinks:
        - type: File
          file:
            path: /dev/stdout
            format:
              type: JSON
              json:
                # Default access log fields
                start_time: "%START_TIME%"
                method: "%REQ(:METHOD)%"
                x-envoy-origin-path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                protocol: "%PROTOCOL%"
                response_code: "%RESPONSE_CODE%"
                response_flags: "%RESPONSE_FLAGS%"
                response_code_details: "%RESPONSE_CODE_DETAILS%"
                connection_termination_details: "%CONNECTION_TERMINATION_DETAILS%"
                upstream_transport_failure_reason: "%UPSTREAM_TRANSPORT_FAILURE_REASON%"
                bytes_received: "%BYTES_RECEIVED%"
                bytes_sent: "%BYTES_SENT%"
                duration: "%DURATION%"
                x-envoy-upstream-service-time: "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"
                x-forwarded-for: "%REQ(X-FORWARDED-FOR)%"
                user-agent: "%REQ(USER-AGENT)%"
                x-request-id: "%REQ(X-REQUEST-ID)%"
                :authority: "%REQ(:AUTHORITY)%"
                upstream_host: "%UPSTREAM_HOST%"
                upstream_cluster: "%UPSTREAM_CLUSTER%"
                upstream_local_address: "%UPSTREAM_LOCAL_ADDRESS%"
                downstream_local_address: "%DOWNSTREAM_LOCAL_ADDRESS%"
                downstream_remote_address: "%DOWNSTREAM_REMOTE_ADDRESS%"
                requested_server_name: "%REQUESTED_SERVER_NAME%"
                route_name: "%ROUTE_NAME%"
                # Added generative AI request fields
                gen_ai.operation.name: "%FILTER_STATE(wasm.gen_ai.operation.name:PLAIN)%"
                gen_ai.system: "%FILTER_STATE(wasm.gen_ai.system:PLAIN)%"
                gen_ai.request.model: "%FILTER_STATE(wasm.gen_ai.request.model:PLAIN)%"
                gen_ai.response.model: "%FILTER_STATE(wasm.gen_ai.response.model:PLAIN)%"
                gen_ai.error.type: "%FILTER_STATE(wasm.gen_ai.error.type:PLAIN)%"
                gen_ai.prompt.tokens: "%FILTER_STATE(wasm.gen_ai.prompt.tokens:PLAIN)%"
                gen_ai.completion.tokens: "%FILTER_STATE(wasm.gen_ai.completion.tokens:PLAIN)%"
                gen_ai.server.time_per_output_token: "%FILTER_STATE(wasm.gen_ai.server.time_per_output_token:PLAIN)%"
                gen_ai.server.time_to_first_token: "%FILTER_STATE(wasm.gen_ai.server.time_to_first_token:PLAIN)%"
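Because each access-log line is a flat JSON object, the gen_ai fields can be post-processed with ordinary text tools. A minimal sketch (the sample log line below is a hypothetical fragment in the configured format, not real gateway output):

```shell
# Hypothetical access-log fragment in the JSON format configured above.
log='{"gen_ai.operation.name": "chat", "gen_ai.prompt.tokens": "18", "gen_ai.completion.tokens": "76"}'

# Extract the token-count fields, e.g. to feed a usage report.
echo "$log" | grep -o '"gen_ai\.prompt\.tokens": "[^"]*"'
echo "$log" | grep -o '"gen_ai\.completion\.tokens": "[^"]*"'
# → "gen_ai.prompt.tokens": "18"
# → "gen_ai.completion.tokens": "76"
```

In practice a JSON-aware tool such as jq, if available, is more robust than grep for this kind of extraction.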
Send test requests
Run the steps in Initiate a test several times to generate observability data on the gateway.
View observability data
Get the name of the gateway workload.

export GATEWAY_DEPLOYMENT=$(kubectl -n envoy-gateway-system get deployment -l gateway.envoyproxy.io/owning-gateway-name=mock-gateway -o jsonpath='{.items[0].metadata.name}')
echo $GATEWAY_DEPLOYMENT

Forward the gateway's admin port to your local machine.

kubectl -n envoy-gateway-system port-forward deployments/$GATEWAY_DEPLOYMENT 19000:19000

Open a new terminal window and fetch the gateway's Metrics data.

curl -s localhost:19000/stats/prometheus | grep gen_ai

Expected output:
# TYPE gen_ai_client_operation_duration histogram
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="0.5"} 0
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="1"} 0
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="5"} 9
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="10"} 9
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="25"} 14
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="50"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="100"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="250"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="500"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="1000"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="2500"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="5000"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="10000"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="30000"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="60000"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="300000"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="600000"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="1800000"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="3600000"} 16
gen_ai_client_operation_duration_bucket{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000",le="+Inf"} 16
gen_ai_client_operation_duration_sum{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000"} 140.9499999999999886313162278384
gen_ai_client_operation_duration_count{gen_ai_operation_name="chat",gen_ai_system="example.com",gen_ai_request_model="mock",gen_ai_response_model="mock",gen_ai_error_type="",server_port="8000",server_address="10.3.0.9:8000"} 16

View the access log.
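The `_sum` and `_count` series of a Prometheus histogram can be combined to derive a mean. For the sample output above (sum ≈ 140.95, count 16), a quick sketch of the arithmetic:

```shell
# Mean gen-ai operation duration from the histogram's _sum and _count
# series (values rounded from the sample output above).
sum=140.95
count=16
awk -v s="$sum" -v c="$count" 'BEGIN { printf "avg=%.2f\n", s / c }'
# → avg=8.81
```

In live monitoring you would typically compute the same ratio in PromQL over a time window, e.g. with `rate()` on the `_sum` and `_count` series, rather than from a single scrape.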
kubectl -n envoy-gateway-system logs deployments/$GATEWAY_DEPLOYMENT | tail -1

Expected output:

Defaulted container "envoy" out of: envoy, shutdown-manager
{
  ":authority": "example.com",
  "bytes_received": 184,
  "bytes_sent": 355,
  "connection_termination_details": null,
  "downstream_local_address": "10.3.0.38:10080",
  "downstream_remote_address": "10.3.15.252:45492",
  "duration": 2,
  "gen_ai.completion.tokens": "76",
  "gen_ai.error.type": "",
  "gen_ai.operation.name": "chat",
  "gen_ai.prompt.tokens": "18",
  "gen_ai.request.model": "mock",
  "gen_ai.response.model": "mock",
  "gen_ai.server.time_per_output_token": "0",
  "gen_ai.server.time_to_first_token": "2",
  "gen_ai.system": "example.com",
  "method": "POST",
  "protocol": "HTTP/1.1",
  "requested_server_name": null,
  "response_code": 200,
  "response_code_details": "via_upstream",
  "response_flags": "-",
  "route_name": "httproute/default/mock-route/rule/0/match/0/*",
  "start_time": "2025-05-28T06:13:31.190Z",
  "upstream_cluster": "httproute/default/mock-route/rule/0/backend/0",
  "upstream_host": "10.3.0.9:8000",
  "upstream_local_address": "10.3.0.38:33370",
  "upstream_transport_failure_reason": null,
  "user-agent": "curl/8.8.0",
  "x-envoy-origin-path": "/v1/chat/completions",
  "x-envoy-upstream-service-time": null,
  "x-forwarded-for": "10.3.15.252",
  "x-request-id": "0e67d734-aca7-4c80-bda3-79641cd63e2c"
}

For the meanings of the corresponding metrics and log fields, see the OpenTelemetry Gen AI Semantic Conventions.
FAQ
How do I resolve HTTP 413 Request Entity Too Large errors from the gateway?
Cause: After the observability plugin is enabled, the gateway must buffer the entire request body to parse its content. If the request body exceeds the default buffer limit, the request fails with the HTTP error code 413 Request Entity Too Large.
Solution: Create a ClientTrafficPolicy resource to increase the gateway's buffer limit.
Create a file named client-buffer-limit.yaml with the following content. Replace ${GATEWAY_NAME} with the actual gateway name (the metadata.name of the Gateway resource). Then run the following command to apply the configuration:
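The file content is not included in this excerpt. A minimal sketch based on the Envoy Gateway ClientTrafficPolicy API might look like the following; the resource name and the 4Mi bufferLimit are example values, and ${GATEWAY_NAME} is the placeholder described above:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: client-buffer-limit
  namespace: default
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: ${GATEWAY_NAME}   # replace with your Gateway's metadata.name
  connection:
    bufferLimit: 4Mi        # example value; size it to your largest expected request body
```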
kubectl apply -f client-buffer-limit.yaml
gen-ai-telemetry plugin release notes
| Image tag | Release date | Description |
| --- | --- | --- |
| g31af794-aliyun | October 2025 | Optimizations: |
| g76f5a66-aliyun | August 2025 | Optimizations: |
| g2ad0869-aliyun | May 2025 | New features: |