Cloud-native API Gateway provides an AI observability plug-in to generate metrics and logs and perform tracing analysis. This plug-in works with the ai-proxy plug-in or specific custom settings.
Runtime properties
Plug-in execution stage: default stage. Plug-in execution priority: 200.
Configuration description
By default, the plug-in treats the requests that it processes as AI requests and generates metrics and logs for them without additional configuration.
Metrics: Supported metrics include the numbers of input and output tokens, the response time (RT) of the first token for streaming requests, and the total RT of requests. Metrics can be generated at the gateway, route, service, and model levels.
Logs: Fields, such as input_token, output_token, model, llm_service_duration, and llm_first_token_duration, are included in logs.
You can extend the observable items by configuring the attributes field. The following table describes the field:
Field | Data type | Required | Default value | Description
attributes | []Attribute | No | - | The information that you want to record in logs or spans.
The following table describes the fields in the attributes field:
Field | Data type | Required | Default value | Description
key | string | Yes | - | The name of the attribute.
value_source | string | Yes | - | The source of the attribute value. Valid values are described below.
value | string | Yes | - | The value of the attribute, which can be a header key or an extraction path.
rule | string | No | - | The rule for extracting attributes from streaming responses. Valid values are described below.
apply_to_log | bool | No | false | Specifies whether to record the extracted information in logs.
apply_to_span | bool | No | false | Specifies whether to record the extracted information in spans for tracing analysis.
Valid values of value_source:
- fixed_value: A fixed value is used.
- request_header: The attribute value is obtained from an HTTP request header. In this case, value is set to the header key of the request.
- request_body: The attribute value is obtained from the request body. In this case, value is specified as a GJSON path.
- response_header: The attribute value is obtained from an HTTP response header. In this case, value is set to the header key of the response.
- response_body: The attribute value is obtained from the response body. In this case, value is specified as a GJSON path.
- response_streaming_body: The attribute value is obtained from the streaming response body. In this case, value is specified as a GJSON path.
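To illustrate how a GJSON-style path such as usage.models.0.input_tokens maps onto a JSON body, the following minimal Python sketch resolves dot-separated keys and numeric array indexes. This is illustrative only, not the plug-in's implementation, and it does not cover GJSON modifiers such as @reverse.

```python
import json

def resolve_path(body: str, path: str):
    """Resolve a dot-separated path against a JSON body.

    Supports object keys and numeric array indexes only.
    """
    node = json.loads(body)
    for part in path.split("."):
        if isinstance(node, list):
            node = node[int(part)]  # numeric segment indexes into an array
        else:
            node = node[part]       # other segments index into an object
    return node

# Sample response body shaped like the Bailian example later in this topic.
body = '{"usage": {"models": [{"model_id": "qwen-max", "input_tokens": 343}]}}'
print(resolve_path(body, "usage.models.0.model_id"))      # qwen-max
print(resolve_path(body, "usage.models.0.input_tokens"))  # 343
```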
If you set the value_source parameter to response_streaming_body, you must configure the rule parameter to define how to obtain a value from the streaming response body. Valid values:
- first: The value of the first valid chunk among multiple chunks is used.
- replace: The value of the last valid chunk among multiple chunks is used.
- append: The values of all valid chunks are concatenated. The concatenated value can be used, for example, to obtain the full content of an answer.
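The three rules can be illustrated with a small Python sketch that combines values extracted from successive streaming chunks. This is an illustration of the rule semantics described above, not the plug-in's actual code; empty or missing chunk values are treated as invalid.

```python
def merge_chunks(values, rule):
    """Combine per-chunk extracted values according to a rule."""
    valid = [v for v in values if v not in (None, "")]
    if not valid:
        return None
    if rule == "first":
        return valid[0]                         # first valid chunk wins
    if rule == "replace":
        return valid[-1]                        # last valid chunk wins
    if rule == "append":
        return "".join(str(v) for v in valid)   # concatenate, e.g. answer text
    raise ValueError(f"unknown rule: {rule}")

chunks = ["Hel", "", "lo", None, "!"]
print(merge_chunks(chunks, "append"))   # Hello!
print(merge_chunks(chunks, "first"))    # Hel
print(merge_chunks(chunks, "replace"))  # !
```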
Configuration examples
To record AI statistics in gateway access logs, add the following field to log_format:
'{"ai_log":"%FILTER_STATE(wasm.ai_log:PLAIN)%"}'

Empty configuration
Monitoring
route_upstream_model_metric_input_token{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 10
route_upstream_model_metric_llm_duration_count{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 1
route_upstream_model_metric_llm_first_token_duration{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 309
route_upstream_model_metric_llm_service_duration{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 1955
route_upstream_model_metric_output_token{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 69

Log
{
"ai_log":"{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\",\"llm_first_token_duration\":\"309\",\"llm_service_duration\":\"1955\"}"
}

Tracing analysis
If the configuration is empty, no additional attributes are added to the span.
Extract token usage information when a non-OpenAI protocol is used
In this example, Alibaba Cloud Bailian is used and the protocol for the ai-proxy plug-in is set to original. The following sample configuration extracts the model, input_token, and output_token attributes:
attributes:
- key: model
value_source: response_body
value: usage.models.0.model_id
apply_to_log: true
apply_to_span: false
- key: input_token
value_source: response_body
value: usage.models.0.input_tokens
apply_to_log: true
apply_to_span: false
- key: output_token
value_source: response_body
value: usage.models.0.output_tokens
apply_to_log: true
apply_to_span: false

Monitoring
route_upstream_model_metric_input_token{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 343
route_upstream_model_metric_output_token{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 153
route_upstream_model_metric_llm_service_duration{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 3725
route_upstream_model_metric_llm_duration_count{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 1

Log
The following sample code shows a log that is generated by the preceding configuration:
{
"ai_log": "{\"model\":\"qwen-max\",\"input_token\":\"343\",\"output_token\":\"153\",\"llm_service_duration\":\"19110\"}"
}

Tracing analysis
The span for tracing analysis includes the model, input_token, and output_token attributes.
Record consumers for authentication
Example:
attributes:
- key: consumer # Record consumers for authentication.
value_source: request_header
value: x-mse-consumer
apply_to_log: true

Record questions and answers
attributes:
- key: question # Record questions.
value_source: request_body
value: messages.@reverse.0.content
apply_to_log: true
- key: answer # Extract answers provided by the large language model (LLM) service from streaming responses.
value_source: response_streaming_body
value: choices.0.delta.content
rule: append
apply_to_log: true
- key: answer # Extract answers provided by the LLM service from non-streaming responses.
value_source: response_body
value: choices.0.message.content
apply_to_log: true

Advanced configurations
You can use Simple Log Service to extract and process AI-related fields. The following sample code shows the original log:
ai_log:{"question":"What is 2 to the power of 3 in Python?","answer":"You can use the power operator of Python (**) to calculate the power of a number. You can use the following code to perform the calculation: \n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n. The result is 8, which is returned after you run the code.","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}

You can run the following data processing script to extract the question and answer:
e_regex("ai_log", grok("%{EXTRACTJSON}"))
e_set("question", json_select(v("json"), "question", default="-"))
e_set("answer", json_select(v("json"), "answer", default="-"))

The question and answer fields are added to the log. Example:
ai_log:{"question":"What is 2 to the power of 3 in Python?","answer":"You can use the power operator of Python (**) to calculate the power of a number. You can use the following code to perform the calculation: \n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n. The result is 8, which is returned after you run the code.","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}
question: What is 2 to the power of 3 in Python?
answer: You can use the power operator of Python (**) to calculate the power of a number. You can use the following code to perform the calculation:
result = 2 ** 3
print(result)
The result is 8, which is returned after you run the code.
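For readers who process such access logs outside Simple Log Service, the same question and answer extraction can be sketched in plain Python. This is an illustrative sketch, not part of the plug-in or Simple Log Service; the sample log line below is shortened, and field names follow the example above.

```python
import json

def extract_qa(log_line: str) -> dict:
    """Extract question/answer fields from an 'ai_log:<json>' log line."""
    # The log field name is followed by a JSON object.
    _, _, payload = log_line.partition("ai_log:")
    record = json.loads(payload)
    return {
        "question": record.get("question", "-"),
        "answer": record.get("answer", "-"),
    }

line = 'ai_log:{"question":"What is 2 to the power of 3 in Python?","answer":"8","model":"qwen-max"}'
fields = extract_qa(line)
print(fields["question"])  # What is 2 to the power of 3 in Python?
print(fields["answer"])    # 8
```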