Traffic observability: Use ASM to efficiently manage LLM traffic - Alibaba Cloud Service Mesh

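In addition to the LLM request routing capabilities covered in the previous topic, ASM enhances observability in several ways to meet the more advanced observability requirements of LLM scenarios. This topic describes how to use ASM access logs and monitoring metrics to observe LLM requests.

Important

To show richer results, the examples in this topic assume that you have completed all steps in Traffic routing: Use ASM to efficiently manage LLM traffic. If you completed only Step 1 and Step 2 of that topic, send test requests with the test command from its Step 2. The commands used to view the observability data are the same as those in this topic.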

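Step 1: Use access logs to observe LLM requests

Configure access logs

ASM enhances access logs for LLM requests. To see the LLM-specific information in access logs, you only need to add the corresponding fields to the custom access log format. For more information, see Customize data plane access logs.

Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.
On the Mesh Management page, click the name of the target instance. In the left-side navigation pane, choose Observability Management Center > Observability Settings.
In the global log settings, add three fields.
The field names and values are as follows: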
```
request_model				FILTER_STATE(wasm.asm.llmproxy.request_model:PLAIN)
request_prompt_tokens			FILTER_STATE(wasm.asm.llmproxy.request_prompt_tokens:PLAIN)
request_completion_tokens		FILTER_STATE(wasm.asm.llmproxy.request_completion_tokens:PLAIN)
```
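The three fields have the following meanings:
- request_model: the model actually used by the current LLM request, such as qwen-turbo or qwen1.5-72b-chat.
- request_prompt_tokens: the number of input tokens in the current request.
- request_completion_tokens: the number of output tokens in the current request.
Most LLM service providers currently bill by token consumption. With these fields, you can see exactly how many tokens each request consumes and which model each request uses.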
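
Because billing is token-based, the two token fields are enough for a rough per-request cost estimate. The following Python sketch only illustrates the arithmetic. The function name `estimate_cost` and the per-1,000-token prices are hypothetical placeholders, not actual provider pricing; substitute the rates from your provider's price list.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_price_per_1k, completion_price_per_1k):
    """Estimate the cost of one request from its token counts.

    Prices are expressed per 1,000 tokens and are supplied by the caller.
    """
    prompt_cost = prompt_tokens / 1000 * prompt_price_per_1k
    completion_cost = completion_tokens / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Hypothetical prices, used only to illustrate the calculation.
cost = estimate_cost(prompt_tokens=11, completion_tokens=90,
                     prompt_price_per_1k=0.5, completion_price_per_1k=1.5)
print(f"estimated cost: {cost:.6f}")
```

Input and output tokens are priced separately here because providers commonly charge different rates for each; if your provider uses a single rate, pass the same price for both.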

Verification

Using the kubeconfig file of the ACK cluster, run the following two commands.

kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {"role": "user", "content": "請介紹你自己"}
    ]
}'

kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
--header 'Content-Type: application/json' \
--header 'user-type: subscriber' \
--data '{
    "messages": [
        {"role": "user", "content": "請介紹你自己"}
    ]
}'

Run the following command to view the access logs.

kubectl logs deployments/sleep -c istio-proxy | tail -2

Expected output:

{"bytes_received":"85","bytes_sent":"617","downstream_local_address":"47.93.xxx.xx:80","downstream_remote_address":"192.168.34.235:39066","duration":"7640","istio_policy_status":"-","method":"POST","path":"/compatible-mode/v1/chat/completions","protocol":"HTTP/1.1","request_id":"d0e17f66-f300-411a-8c32-xxxxxxxxxxxxx","requested_server_name":"-","response_code":"200","response_flags":"-","route_name":"-","start_time":"2024-07-12T03:20:03.993Z","trace_id":"-","upstream_cluster":"outbound|80||dashscope.aliyuncs.com","upstream_host":"47.93.xxx.xx:443","upstream_local_address":"192.168.34.235:38476","upstream_service_time":"7639","upstream_response_time":"7639","upstream_transport_failure_reason":"-","user_agent":"curl/8.8.0","x_forwarded_for":"-","authority_for":"dashscope.aliyuncs.com","request_model":"qwen1.5-72b-chat","request_prompt_tokens":"3","request_completion_tokens":"55"}
{"bytes_received":"85","bytes_sent":"809","downstream_local_address":"47.93.xxx.xx:80","downstream_remote_address":"192.168.34.235:41090","duration":"2759","istio_policy_status":"-","method":"POST","path":"/compatible-mode/v1/chat/completions","protocol":"HTTP/1.1","request_id":"d89faada-6af3-4ac3-b4fd-xxxxxxxxxxxxx","requested_server_name":"-","response_code":"200","response_flags":"-","route_name":"vip-route","start_time":"2024-07-12T03:20:30.854Z","trace_id":"-","upstream_cluster":"outbound|80||dashscope.aliyuncs.com","upstream_host":"47.93.xxx.xx:443","upstream_local_address":"192.168.34.235:38476","upstream_service_time":"2759","upstream_response_time":"2759","upstream_transport_failure_reason":"-","user_agent":"curl/8.8.0","x_forwarded_for":"-","authority_for":"dashscope.aliyuncs.com","request_model":"qwen-turbo","request_prompt_tokens":"11","request_completion_tokens":"90"}

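After formatting and trimming to the relevant fields, the two log entries look as follows.

{
    "duration": "7640",
    "response_code": "200",
    "authority_for": "dashscope.aliyuncs.com",   -- the LLM provider actually accessed
    "request_model": "qwen1.5-72b-chat",         -- the model used by the current request
    "request_prompt_tokens": "3",                -- number of input tokens in the current request
    "request_completion_tokens": "55"            -- number of output tokens in the current request
}

{
    "duration": "2759",
    "response_code": "200",
    "authority_for": "dashscope.aliyuncs.com",   -- the LLM provider actually accessed
    "request_model": "qwen-turbo",               -- the model used by the current request
    "request_prompt_tokens": "11",               -- number of input tokens in the current request
    "request_completion_tokens": "90"            -- number of output tokens in the current request
}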
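
If you want a quick per-model summary without a log backend, the JSON access log lines can be aggregated with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each log line is a single JSON object and that an unset field is logged as "-".

```python
import json
from collections import defaultdict


def _to_int(value):
    """Convert a log field to int; missing or "-" values count as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def summarize_token_usage(log_lines):
    """Aggregate request and token counts per model from JSON access log lines."""
    usage = defaultdict(
        lambda: {"requests": 0, "prompt_tokens": 0, "completion_tokens": 0}
    )
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as proxy startup messages
        if not isinstance(entry, dict):
            continue
        model = entry.get("request_model") or "-"
        if model == "-":
            continue  # assumed to mean "not an LLM request"
        stats = usage[model]
        stats["requests"] += 1
        stats["prompt_tokens"] += _to_int(entry.get("request_prompt_tokens"))
        stats["completion_tokens"] += _to_int(entry.get("request_completion_tokens"))
    return dict(usage)


# Two entries reduced to the LLM-related fields of the expected output above.
sample_lines = [
    '{"response_code":"200","request_model":"qwen1.5-72b-chat",'
    '"request_prompt_tokens":"3","request_completion_tokens":"55"}',
    '{"response_code":"200","request_model":"qwen-turbo",'
    '"request_prompt_tokens":"11","request_completion_tokens":"90"}',
]
for model, stats in summarize_token_usage(sample_lines).items():
    print(model, stats)
```

To run it against live data, you could pipe the output of the kubectl logs command into a script that passes `sys.stdin` to `summarize_token_usage`.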

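ASM is integrated with Alibaba Cloud Log Service, so you can observe LLM calls at the request level through access logs. You can also collect and store the logs, then build custom alert rules and clearer log dashboards on top of them. For more information, see Enable data plane log collection.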

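Step 2: Add metrics that show the tokens consumed by a workload

Access logs record fine-grained, per-request information, whereas monitoring metrics give a more aggregate view. The ASM mesh proxy can output workload-level token consumption as monitoring metrics, which lets you observe the token consumption of a workload in real time.

ASM adds two metrics:

- asm_llm_proxy_prompt_tokens: the number of input tokens.
- asm_llm_proxy_completion_tokens: the number of output tokens.

Both metrics have the following dimensions by default:

- llmproxy_source_workload: the name of the workload that sends the request.
- llmproxy_source_workload_namespace: the namespace of the request source.
- llmproxy_destination_service: the destination provider.
- llmproxy_model: the model used by the current request.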

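Modify the workload configuration to output the new metrics

This step uses the sleep Deployment in the default namespace as an example.

Create a file named asm-llm-proxy-bootstrap-config.yaml with the following content.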

apiVersion: v1
kind: ConfigMap
metadata:
  name: asm-llm-proxy-bootstrap-config
data:
  custom_bootstrap.json: |
    "stats_config": {
      "stats_tags":[
        {
          "tag_name": "llmproxy_source_workload",
          "regex": "(\\|llmproxy_source_workload=([^|]*))"
        },
        {
          "tag_name": "llmproxy_source_workload_namespace",
          "regex": "(\\|llmproxy_source_workload_namespace=([^|]*))"
        },
        {
          "tag_name": "llmproxy_destination_service",
          "regex": "(\\|llmproxy_destination_service=([^|]*))"
        },
        {
          "tag_name": "llmproxy_model",
          "regex": "(\\|llmproxy_model=([^|]*))"
        }
      ]
    }
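
Each stats_tags entry tells the proxy how to turn part of a raw stat name into a Prometheus label: in Envoy's tag extraction, the first capture group is the portion removed from the stat name and the second capture group is the tag value. The Python sketch below mimics that behavior with the four regular expressions from the ConfigMap. The raw stat name used here is an assumption inferred from the shape of the regular expressions, not a format confirmed by this topic, and `extract_tags` is a hypothetical helper.

```python
import re

# The same patterns as in custom_bootstrap.json, with the JSON escaping
# ("\\|") reduced to a plain regex escape ("\|").
TAG_REGEXES = {
    "llmproxy_source_workload": r"(\|llmproxy_source_workload=([^|]*))",
    "llmproxy_source_workload_namespace": r"(\|llmproxy_source_workload_namespace=([^|]*))",
    "llmproxy_destination_service": r"(\|llmproxy_destination_service=([^|]*))",
    "llmproxy_model": r"(\|llmproxy_model=([^|]*))",
}


def extract_tags(raw_name):
    """Return (cleaned stat name, tags) for a raw stat name."""
    name = raw_name
    tags = {}
    for tag_name, pattern in TAG_REGEXES.items():
        match = re.search(pattern, name)
        if match is None:
            continue
        tags[tag_name] = match.group(2)  # second group: the tag value
        # First group: the substring that is stripped from the stat name.
        name = name[:match.start(1)] + name[match.end(1):]
    return name, tags


# Assumed raw stat name, for illustration only.
raw = (
    "asm_llm_proxy_prompt_tokens"
    "|llmproxy_source_workload=sleep"
    "|llmproxy_source_workload_namespace=default"
    "|llmproxy_destination_service=dashscope.aliyuncs.com"
    "|llmproxy_model=qwen-turbo"
)
name, tags = extract_tags(raw)
print(name)
print(tags)
```

Note that the pattern for llmproxy_source_workload does not accidentally match the llmproxy_source_workload_namespace segment, because the pattern requires an equals sign immediately after the tag name.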

Using the kubeconfig file of the ACK cluster, run the following command to create a ConfigMap named asm-llm-proxy-bootstrap-config.
```
kubectl apply -f asm-llm-proxy-bootstrap-config.yaml
```

Run the following command to patch the sleep Deployment and add an annotation to its Pods.

kubectl patch deployment sleep -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/bootstrapOverride":"asm-llm-proxy-bootstrap-config"}}}}}'

Verification

Run the following two commands to send test requests.

kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {"role": "user", "content": "請介紹你自己"}
    ]
}'

kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
--header 'Content-Type: application/json' \
--header 'user-type: subscriber' \
--data '{
    "messages": [
        {"role": "user", "content": "請介紹你自己"}
    ]
}'

Run the following command to view the Prometheus metrics output by the sidecar of the sleep application.

kubectl exec deployments/sleep -it -c istio-proxy -- curl localhost:15090/stats/prometheus | grep llmproxy

Expected output:

asm_llm_proxy_completion_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen1.5-72b-chat"} 72
asm_llm_proxy_completion_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen-turbo"} 85
asm_llm_proxy_prompt_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen1.5-72b-chat"} 3
asm_llm_proxy_prompt_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen-turbo"} 11

The output shows that the sidecar now exposes the two metrics, each carrying the four default dimensions.
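
For a quick check without a Prometheus server, the text output of the stats endpoint can be totaled per model in Python. This is a minimal sketch, not a full parser for the Prometheus exposition format: it assumes every sample line has a label set in braces followed by a numeric value, and the function names are hypothetical.

```python
import re
from collections import defaultdict

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one sample line into (metric name, labels, value), or None."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def tokens_by_model(text):
    """Sum prompt and completion tokens per llmproxy_model label."""
    totals = defaultdict(lambda: {"prompt": 0.0, "completion": 0.0})
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # HELP and TYPE comment lines
        parsed = parse_sample(line)
        if parsed is None:
            continue
        name, labels, value = parsed
        model = labels.get("llmproxy_model", "unknown")
        if name == "asm_llm_proxy_prompt_tokens":
            totals[model]["prompt"] += value
        elif name == "asm_llm_proxy_completion_tokens":
            totals[model]["completion"] += value
    return dict(totals)


# Two lines taken from the expected output above.
sample = '''
asm_llm_proxy_completion_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen-turbo"} 85
asm_llm_proxy_prompt_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen-turbo"} 11
'''
print(tokens_by_model(sample))
```

Summing by the llmproxy_model label alone collapses the other three dimensions; group by a tuple of labels instead if you need a per-workload or per-provider breakdown.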

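ASM is integrated with ARMS. You can configure collection rules to collect these metrics into Managed Service for Prometheus for more detailed analysis and display. For more information, see Collect metrics to Managed Service for Prometheus.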

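Step 3: Add LLM-related dimensions to native service mesh metrics

The service mesh provides many metrics by default that show detailed HTTP and TCP data, and these metrics come with a rich set of dimensions. ASM also provides built-in Prometheus dashboards based on these metrics and dimensions.

However, these metrics do not currently carry information about LLM requests. To address this, ASM lets you add LLM request information to existing metrics through custom metric dimensions.

Configure a custom dimension: model

This section uses the REQUEST_COUNT metric as an example and adds a model dimension to it.

Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.
On the Mesh Management page, click the name of the target instance. In the left-side navigation pane, choose Observability Management Center > Observability Settings.
Click Edit dimension for REQUEST_COUNT, select the Custom Dimension tab, and add a custom dimension named model with the value filter_state["wasm.asm.llmproxy.request_model"].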

Verification

Run the following two commands to send test requests.

kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {"role": "user", "content": "請介紹你自己"}
    ]
}'

kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
--header 'Content-Type: application/json' \
--header 'user-type: subscriber' \
--data '{
    "messages": [
        {"role": "user", "content": "請介紹你自己"}
    ]
}'

Run the following command to view the Prometheus metrics output by the sidecar of the sleep application.

kubectl exec deployments/sleep -it -c istio-proxy -- curl localhost:15090/stats/prometheus | grep istio_requests_total

Expected output:

istio_requests_total{reporter="source",source_workload="sleep",source_canonical_service="sleep",source_canonical_revision="latest",source_workload_namespace="default",source_principal="unknown",source_app="sleep",source_version="",source_cluster="cce8d2c1d1e8d4abc8d5c180d160669cc",destination_workload="unknown",destination_workload_namespace="unknown",destination_principal="unknown",destination_app="unknown",destination_version="unknown",destination_service="dashscope.aliyuncs.com",destination_canonical_service="unknown",destination_canonical_revision="latest",destination_service_name="dashscope.aliyuncs.com",destination_service_namespace="unknown",destination_cluster="unknown",request_protocol="http",response_code="200",grpc_response_status="",response_flags="-",connection_security_policy="unknown",model="qwen1.5-72b-chat"} 1
istio_requests_total{reporter="source",source_workload="sleep",source_canonical_service="sleep",source_canonical_revision="latest",source_workload_namespace="default",source_principal="unknown",source_app="sleep",source_version="",source_cluster="cce8d2c1d1e8d4abc8d5c180d160669cc",destination_workload="unknown",destination_workload_namespace="unknown",destination_principal="unknown",destination_app="unknown",destination_version="unknown",destination_service="dashscope.aliyuncs.com",destination_canonical_service="unknown",destination_canonical_revision="latest",destination_service_name="dashscope.aliyuncs.com",destination_service_namespace="unknown",destination_cluster="unknown",request_protocol="http",response_code="200",grpc_response_status="",response_flags="-",connection_security_policy="unknown",model="qwen-turbo"} 1

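The output shows that the model of each request has been added to istio_requests_total as a dimension.

With these metrics, you can configure analysis rules in ARMS for finer-grained analysis. For example:

- The success rate of requests to a given model.
- The average response latency of a given model or provider.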
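
As an illustration of the first item, the success rate per model can be derived from istio_requests_total samples by dividing the 2xx request count by the total request count for each value of the model dimension. The sketch below works on already-parsed samples; the function name, the input shape (a list of label/count pairs), and the 503 sample row are hypothetical and only serve the example.

```python
from collections import defaultdict


def success_rate_by_model(samples):
    """Compute the share of 2xx responses per model.

    samples: iterable of (labels, count) pairs, where labels is a dict that
    contains at least the "model" and "response_code" dimensions.
    """
    total = defaultdict(float)
    succeeded = defaultdict(float)
    for labels, count in samples:
        model = labels.get("model", "unknown")
        total[model] += count
        if labels.get("response_code", "").startswith("2"):
            succeeded[model] += count
    return {m: succeeded[m] / total[m] for m in total if total[m] > 0}


# Hypothetical counter values, for illustration only.
samples = [
    ({"model": "qwen-turbo", "response_code": "200"}, 3),
    ({"model": "qwen-turbo", "response_code": "503"}, 1),
    ({"model": "qwen1.5-72b-chat", "response_code": "200"}, 2),
]
for model, rate in success_rate_by_model(samples).items():
    print(f"{model}: {rate:.0%}")
```

Because istio_requests_total is a cumulative counter, a production rule would normally compare rates of increase over a time window rather than raw totals; the division shown here is the same in both cases.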

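Summary

Building on Traffic routing: Use ASM to efficiently manage LLM traffic, this topic describes how to use ASM to observe LLM traffic at both the fine-grained and the aggregate level. On top of the native observability capabilities of the service mesh, a few small configuration changes in the cluster are enough to enable multi-dimensional observability. ASM will continue to enhance observability for LLM traffic and provide more detailed and flexible solutions.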