When multiple workloads call LLM providers, tracking token consumption and model usage per request becomes difficult without infrastructure-level telemetry. Service Mesh (ASM) captures LLM-specific metadata -- model name, input tokens, and output tokens -- directly in the sidecar proxy, so you can monitor costs, debug requests, and analyze model performance without modifying application code.
ASM provides three levels of LLM observability, each building on the previous:
| Capability | What it tracks | Use case |
|---|---|---|
| Access logs | Per-request model name, input tokens, output tokens | Debug individual requests, audit per-request costs |
| Token consumption metrics | Aggregated token counts per workload and model | Monitor token usage in real time, set alerting thresholds |
| Custom metric dimensions | LLM model as a dimension on native Istio metrics (istio_requests_total) | Analyze success rates and latency by model |
Prerequisites
Before you begin, make sure that you have:
A Service Mesh (ASM) instance
A Container Service for Kubernetes (ACK) cluster added to the mesh
Completion of at least Step 1 and Step 2 in Use ASM to route LLM traffic
The examples below build on all steps from the traffic routing guide. If you completed only Step 1 and Step 2, use the test commands from Step 2 to generate traffic for the verification steps.
Add LLM fields to access logs
Most LLM providers charge by token usage. By adding LLM-specific fields to sidecar access logs, you get per-request visibility into which model handled each request and how many tokens it consumed -- enabling direct cost tracking from infrastructure logs.
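As an illustration of per-request cost tracking, the following Python sketch prices a single access-log entry from these fields. The price table is hypothetical and for demonstration only; substitute your provider's actual per-token rates.

```python
# Sketch: estimate the cost of one request from the LLM fields in a
# sidecar access log. The per-1,000-token prices below are hypothetical
# placeholders, not real DashScope pricing.
import json

# Hypothetical price table: model -> (input price, output price) per 1,000 tokens
PRICES = {
    "qwen-turbo": (0.0003, 0.0006),
    "qwen1.5-72b-chat": (0.005, 0.01),
}

def request_cost(log_line: str) -> float:
    """Compute the estimated cost of one request from an access-log JSON line."""
    entry = json.loads(log_line)
    in_price, out_price = PRICES[entry["request_model"]]
    prompt = int(entry["request_prompt_tokens"])
    completion = int(entry["request_completion_tokens"])
    return prompt / 1000 * in_price + completion / 1000 * out_price

line = ('{"request_model": "qwen-turbo", '
        '"request_prompt_tokens": "11", "request_completion_tokens": "90"}')
print(f"{request_cost(line):.6f}")
```

Because the token counts are recorded per request, the same calculation extends naturally to per-workload or per-namespace cost reports once logs are collected centrally.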
For background on access log customization, see Custom data plane access logs.
Configure log fields
Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.
On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Observability Management Center > Observability Settings.
In the Log Settings section, add the following three fields:
| Field name | FILTER_STATE expression | Description |
|---|---|---|
| request_model | FILTER_STATE(wasm.asm.llmproxy.request_model:PLAIN) | Model used for the request (for example, qwen-turbo or qwen1.5-72b-chat) |
| request_prompt_tokens | FILTER_STATE(wasm.asm.llmproxy.request_prompt_tokens:PLAIN) | Number of input tokens |
| request_completion_tokens | FILTER_STATE(wasm.asm.llmproxy.request_completion_tokens:PLAIN) | Number of output tokens |
Verify access logs
Send two test requests using the kubeconfig file of the ACK cluster. Run each command separately:
```shell
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {"role": "user", "content": "Please introduce yourself."}
    ]
  }'
```

```shell
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
  --header 'Content-Type: application/json' \
  --header 'user-type: subscriber' \
  --data '{
    "messages": [
      {"role": "user", "content": "Please introduce yourself."}
    ]
  }'
```

View the access logs:

```shell
kubectl logs deployments/sleep -c istio-proxy | tail -2
```

Expected output:

```json
{"bytes_received":"85","bytes_sent":"617","downstream_local_address":"47.93.xxx.xx:80","downstream_remote_address":"192.168.34.235:39066","duration":"7640","istio_policy_status":"-","method":"POST","path":"/compatible-mode/v1/chat/completions","protocol":"HTTP/1.1","request_id":"d0e17f66-f300-411a-8c32-xxxxxxxxxxxxx","requested_server_name":"-","response_code":"200","response_flags":"-","route_name":"-","start_time":"2024-07-12T03:20:03.993Z","trace_id":"-","upstream_cluster":"outbound|80||dashscope.aliyuncs.com","upstream_host":"47.93.xxx.xx:443","upstream_local_address":"192.168.34.235:38476","upstream_service_time":"7639","upstream_response_time":"7639","upstream_transport_failure_reason":"-","user_agent":"curl/8.8.0","x_forwarded_for":"-","authority_for":"dashscope.aliyuncs.com","request_model":"qwen1.5-72b-chat","request_prompt_tokens":"3","request_completion_tokens":"55"}
{"bytes_received":"85","bytes_sent":"809","downstream_local_address":"47.93.xxx.xx:80","downstream_remote_address":"192.168.34.235:41090","duration":"2759","istio_policy_status":"-","method":"POST","path":"/compatible-mode/v1/chat/completions","protocol":"HTTP/1.1","request_id":"d89faada-6af3-4ac3-b4fd-xxxxxxxxxxxxx","requested_server_name":"-","response_code":"200","response_flags":"-","route_name":"vip-route","start_time":"2024-07-12T03:20:30.854Z","trace_id":"-","upstream_cluster":"outbound|80||dashscope.aliyuncs.com","upstream_host":"47.93.xxx.xx:443","upstream_local_address":"192.168.34.235:38476","upstream_service_time":"2759","upstream_response_time":"2759","upstream_transport_failure_reason":"-","user_agent":"curl/8.8.0","x_forwarded_for":"-","authority_for":"dashscope.aliyuncs.com","request_model":"qwen-turbo","request_prompt_tokens":"11","request_completion_tokens":"90"}
```

Each log entry shows the LLM provider (authority_for), the model that handled the request, and the number of tokens consumed. The following formatted excerpt highlights the LLM-specific fields:

```json
{
  "duration": "7640",
  "response_code": "200",
  "authority_for": "dashscope.aliyuncs.com",
  "request_model": "qwen1.5-72b-chat",
  "request_prompt_tokens": "3",
  "request_completion_tokens": "55"
}
{
  "duration": "2759",
  "response_code": "200",
  "authority_for": "dashscope.aliyuncs.com",
  "request_model": "qwen-turbo",
  "request_prompt_tokens": "11",
  "request_completion_tokens": "90"
}
```
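For instance, a short Python sketch (field names as configured above) can aggregate these log lines into per-model token totals before the logs ever reach a log service:

```python
# Sketch: aggregate token usage per model from sidecar access logs,
# e.g. the JSON lines printed by `kubectl logs deployments/sleep -c istio-proxy`.
# Field names match the access-log configuration above.
import json
from collections import defaultdict

def tokens_by_model(log_lines):
    """Sum prompt and completion tokens per model across JSON log lines."""
    totals = defaultdict(lambda: {"prompt": 0, "completion": 0})
    for line in log_lines:
        entry = json.loads(line)
        model = entry.get("request_model", "-")
        if model == "-":
            continue  # not an LLM request, or the field was not populated
        totals[model]["prompt"] += int(entry["request_prompt_tokens"])
        totals[model]["completion"] += int(entry["request_completion_tokens"])
    return dict(totals)

logs = [
    '{"request_model": "qwen1.5-72b-chat", "request_prompt_tokens": "3", '
    '"request_completion_tokens": "55"}',
    '{"request_model": "qwen-turbo", "request_prompt_tokens": "11", '
    '"request_completion_tokens": "90"}',
]
print(tokens_by_model(logs))
```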
Forward logs to Simple Log Service (SLS)
ASM integrates with Simple Log Service (SLS) for centralized log collection. After you enable log collection, you can:
Search and filter logs by model name, token count, or response code
Create alerting rules -- for example, alert when a single request exceeds a token threshold
Build dashboards for LLM usage analytics
For setup instructions, see Enable data plane log collection.
Export token consumption as Prometheus metrics
Access logs capture per-request detail. For aggregated, real-time monitoring, configure the sidecar proxy to export token consumption as Prometheus metrics.
ASM exposes two LLM-specific metrics:
| Metric | Description |
|---|---|
| asm_llm_proxy_prompt_tokens | Number of input tokens |
| asm_llm_proxy_completion_tokens | Number of output tokens |
These metrics include four default dimensions:
| Dimension | Description |
|---|---|
| llmproxy_source_workload | Workload that initiated the request |
| llmproxy_source_workload_namespace | Namespace of the source workload |
| llmproxy_destination_service | Destination LLM service |
| llmproxy_model | Model used for the request |
Configure the sidecar to emit metrics
This example uses the sleep Deployment in the default namespace.
Create a file named asm-llm-proxy-bootstrap-config.yaml with the following content:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: asm-llm-proxy-bootstrap-config
data:
  custom_bootstrap.json: |
    "stats_config": {
      "stats_tags": [
        {
          "tag_name": "llmproxy_source_workload",
          "regex": "(\\|llmproxy_source_workload=([^|]*))"
        },
        {
          "tag_name": "llmproxy_source_workload_namespace",
          "regex": "(\\|llmproxy_source_workload_namespace=([^|]*))"
        },
        {
          "tag_name": "llmproxy_destination_service",
          "regex": "(\\|llmproxy_destination_service=([^|]*))"
        },
        {
          "tag_name": "llmproxy_model",
          "regex": "(\\|llmproxy_model=([^|]*))"
        }
      ]
    }
```

Apply the ConfigMap:
```shell
kubectl apply -f asm-llm-proxy-bootstrap-config.yaml
```

Add the bootstrap override annotation to the Deployment. This tells the sidecar to load the custom stats configuration:

```shell
kubectl patch deployment sleep -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/bootstrapOverride":"asm-llm-proxy-bootstrap-config"}}}}}'
```
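To see what the stats_tags regexes do, here is a Python sketch that mimics Envoy's tag extraction: the first capture group of each regex is stripped from the raw stat name, and the second capture group becomes the tag value. The sample stat name is an assumed illustration of the pipe-delimited format the LLM proxy emits, not output copied from a real sidecar.

```python
# Sketch of Envoy stats_tags behavior: each regex's first capture group is
# removed from the stat name and its second capture group becomes the tag
# value. The regexes below are the same ones used in the ConfigMap above.
import re

TAG_REGEXES = {
    "llmproxy_source_workload": r"(\|llmproxy_source_workload=([^|]*))",
    "llmproxy_source_workload_namespace": r"(\|llmproxy_source_workload_namespace=([^|]*))",
    "llmproxy_destination_service": r"(\|llmproxy_destination_service=([^|]*))",
    "llmproxy_model": r"(\|llmproxy_model=([^|]*))",
}

def extract_tags(stat_name):
    """Apply each tag regex, collecting tag values and stripping the matches."""
    tags = {}
    for tag, pattern in TAG_REGEXES.items():
        m = re.search(pattern, stat_name)
        if m:
            tags[tag] = m.group(2)
            stat_name = stat_name.replace(m.group(1), "", 1)
    return stat_name, tags

# Hypothetical raw stat name before tag extraction:
raw = ("asm_llm_proxy_prompt_tokens"
       "|llmproxy_source_workload=sleep"
       "|llmproxy_source_workload_namespace=default"
       "|llmproxy_destination_service=dashscope.aliyuncs.com"
       "|llmproxy_model=qwen-turbo")
print(extract_tags(raw))
```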
Verify token metrics
Send test requests using the commands from the previous section.
Query the sidecar's Prometheus endpoint:

```shell
kubectl exec deployments/sleep -it -c istio-proxy -- curl localhost:15090/stats/prometheus | grep llmproxy
```

Expected output:

```
asm_llm_proxy_completion_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen1.5-72b-chat"} 72
asm_llm_proxy_completion_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen-turbo"} 85
asm_llm_proxy_prompt_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen1.5-72b-chat"} 3
asm_llm_proxy_prompt_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen-turbo"} 11
```

Each metric line shows the token count broken down by source workload, destination service, and model.
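Once a Prometheus-compatible backend scrapes these metrics, per-model aggregation becomes a one-line query. The following is a sketch, assuming the metric and label names above; the 5-minute window is an arbitrary choice:

```promql
# Completion-token consumption rate per model over the last 5 minutes
sum by (llmproxy_model) (rate(asm_llm_proxy_completion_tokens[5m]))

# Total prompt tokens per source workload
sum by (llmproxy_source_workload) (asm_llm_proxy_prompt_tokens)
```

Queries like these are the natural basis for dashboard panels and token-budget alerting rules.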
Forward metrics to Managed Service for Prometheus
ASM integrates with Application Real-Time Monitoring Service (ARMS) for Prometheus-based metric collection. After you configure collection rules, you can build Grafana dashboards and set up alerting rules based on these LLM metrics.
For setup instructions, see Collect metrics to Managed Service for Prometheus.
Add LLM dimensions to native Istio metrics
ASM natively provides Istio standard metrics such as istio_requests_total, which track HTTP and TCP traffic with dimensions like source workload, destination service, and response code, and provides a Prometheus dashboard built on these metrics and dimensions. By default, these metrics do not include LLM-specific information.
To enable per-model analysis on native metrics, add a custom model dimension that extracts the model name from LLM requests.
Configure the model dimension
This example adds the model dimension to the REQUEST_COUNT metric.
Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.
On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Observability Management Center > Observability Settings.
Select REQUEST_COUNT and click Edit Dimension. On the Custom Dimension tab, enter the following values:

Dimension Name: model

Value: filter_state["wasm.asm.llmproxy.request_model"]
Verify the custom dimension
Send test requests using the commands from the access log section.
Query the sidecar's Prometheus endpoint:

```shell
kubectl exec deployments/sleep -it -c istio-proxy -- curl localhost:15090/stats/prometheus | grep istio_requests_total
```

Expected output:

```
istio_requests_total{reporter="source",source_workload="sleep",source_canonical_service="sleep",source_canonical_revision="latest",source_workload_namespace="default",source_principal="unknown",source_app="sleep",source_version="",source_cluster="cce8d2c1d1e8d4abc8d5c180d160669cc",destination_workload="unknown",destination_workload_namespace="unknown",destination_principal="unknown",destination_app="unknown",destination_version="unknown",destination_service="dashscope.aliyuncs.com",destination_canonical_service="unknown",destination_canonical_revision="latest",destination_service_name="dashscope.aliyuncs.com",destination_service_namespace="unknown",destination_cluster="unknown",request_protocol="http",response_code="200",grpc_response_status="",response_flags="-",connection_security_policy="unknown",model="qwen1.5-72b-chat"} 1
istio_requests_total{reporter="source",source_workload="sleep",source_canonical_service="sleep",source_canonical_revision="latest",source_workload_namespace="default",source_principal="unknown",source_app="sleep",source_version="",source_cluster="cce8d2c1d1e8d4abc8d5c180d160669cc",destination_workload="unknown",destination_workload_namespace="unknown",destination_principal="unknown",destination_app="unknown",destination_version="unknown",destination_service="dashscope.aliyuncs.com",destination_canonical_service="unknown",destination_canonical_revision="latest",destination_service_name="dashscope.aliyuncs.com",destination_service_namespace="unknown",destination_cluster="unknown",request_protocol="http",response_code="200",grpc_response_status="",response_flags="-",connection_security_policy="unknown",model="qwen-turbo"} 1
```

The model dimension now appears in istio_requests_total, enabling per-model queries on native Istio metrics.
Example analysis queries
With the model dimension on istio_requests_total, set up analysis rules in Application Real-Time Monitoring Service (ARMS). For example:
Success rate by model: Compare response_code="200" counts against total counts, grouped by model.

Latency by model or provider: Add the same model dimension to latency metrics to track average response times per model.
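As a sketch, the success-rate analysis could be expressed as a PromQL query; this assumes the custom model dimension configured above, and the 5-minute window is arbitrary:

```promql
# Per-model success rate over the last 5 minutes
sum by (model) (rate(istio_requests_total{response_code="200"}[5m]))
/
sum by (model) (rate(istio_requests_total[5m]))
```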
What's next
Enable data plane log collection -- Set up centralized log collection to Simple Log Service (SLS) for alerting and dashboards
Collect metrics to Managed Service for Prometheus -- Configure ARMS to scrape and store the LLM metrics for long-term analysis