
API Gateway:AI observability

Last Updated:May 22, 2025

Cloud-native API Gateway provides an AI observability plug-in to generate metrics and logs and perform tracing analysis. This plug-in works with the ai-proxy plug-in or specific custom settings.

Running attributes

Plug-in execution stage: default stage. Plug-in execution priority: 200.

Configuration description

By default, requests that are processed by the plug-in are API requests. You can use this plug-in to generate metrics and logs without additional configurations.

  • Metrics: Supported metrics include input tokens, output tokens, the response time (RT) of the first token for streaming requests, and the total RT of requests. Metrics can be generated at the gateway, route, service, and model levels.

  • Logs: Fields, such as input_token, output_token, model, llm_service_duration, and llm_first_token_duration, are included in logs.

You can extend the observable items by configuring the attributes field. The following table describes the field:

| Field | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| attributes | []Attribute | No | - | The information that you want to record in logs or spans. |

The following table describes the fields in the attributes field:

| Field | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| key | string | Yes | - | The name of the attribute. |
| value_source | string | Yes | - | The source of the attribute value. Valid values: fixed_value, request_header, request_body, response_header, response_body, and response_streaming_body. |
| value | string | Yes | - | The value of the attribute. Depending on value_source, this is a fixed value, a header key, or an extraction path. |
| rule | string | No | - | The rule for extracting attributes from streaming responses. Valid values: first, replace, and append. |
| apply_to_log | bool | No | false | Specifies whether to record the extracted information in logs. |
| apply_to_span | bool | No | false | Specifies whether to record the extracted information in spans for tracing analysis. |

Valid values of value_source:

  • fixed_value: A fixed value is used.

  • request_header: The attribute value is obtained from an HTTP request header. In this case, value is the header key of the request.

  • request_body: The attribute value is extracted from the request body. In this case, value is a path in GJSON path syntax.

  • response_header: The attribute value is obtained from an HTTP response header. In this case, value is the header key of the response.

  • response_body: The attribute value is extracted from the response body. In this case, value is a path in GJSON path syntax.

  • response_streaming_body: The attribute value is extracted from the streaming response body. In this case, value is a path in GJSON path syntax.
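GJSON is a Go library with its own dot-separated path syntax. As a rough illustration only (not the plug-in's actual implementation), the following Python sketch shows how a path such as usage.models.0.input_tokens walks a JSON document:

```python
import json

def gjson_like_get(doc, path):
    # Minimal sketch of GJSON-style dot-path lookup: numeric segments
    # index into arrays, other segments into objects. Real GJSON also
    # supports modifiers such as @reverse, which this sketch omits.
    cur = json.loads(doc)
    for seg in path.split("."):
        if isinstance(cur, list):
            cur = cur[int(seg)]
        elif isinstance(cur, dict):
            cur = cur.get(seg)
        if cur is None:
            return None
    return cur

body = '{"usage":{"models":[{"model_id":"qwen-max","input_tokens":343}]}}'
print(gjson_like_get(body, "usage.models.0.input_tokens"))  # 343
print(gjson_like_get(body, "usage.models.0.model_id"))      # qwen-max
```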

If you set the value_source parameter to response_streaming_body, you must configure the rule parameter to define how to obtain a value from the streaming response body. Valid values:

  • first: The value of the first valid chunk among multiple chunks is obtained.

  • replace: The value of the last valid chunk among multiple chunks is obtained.

  • append: The values of all valid chunks are concatenated. This rule is typically used to reassemble the complete answer from streaming deltas.
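The three rules can be illustrated with a small sketch (an illustration only, not the plug-in's code), assuming each streaming chunk yields the value extracted by the configured path, with None marking chunks where the path matched nothing:

```python
def combine_chunk_values(values, rule):
    # values: attribute values extracted from successive streaming chunks;
    # None marks chunks where the extraction path matched nothing.
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    if rule == "first":
        return valid[0]        # value of the first valid chunk
    if rule == "replace":
        return valid[-1]       # value of the last valid chunk
    if rule == "append":
        return "".join(valid)  # concatenation, e.g. the full answer
    raise ValueError(f"unknown rule: {rule}")

deltas = ["The answer", " is", None, " 8."]
print(combine_chunk_values(deltas, "first"))    # The answer
print(combine_chunk_values(deltas, "replace"))  #  8.
print(combine_chunk_values(deltas, "append"))   # The answer is 8.
```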

Configuration examples

To record AI statistics in gateway access logs, add the following field to log_format:

'{"ai_log":"%FILTER_STATE(wasm.ai_log:PLAIN)%"}'

Empty configuration

Monitoring

route_upstream_model_metric_input_token{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 10
route_upstream_model_metric_llm_duration_count{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 1
route_upstream_model_metric_llm_first_token_duration{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 309
route_upstream_model_metric_llm_service_duration{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 1955
route_upstream_model_metric_output_token{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 69

Log

{
  "ai_log":"{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\",\"llm_first_token_duration\":\"309\",\"llm_service_duration\":\"1955\"}"
}

Tracing analysis

If the configuration is empty, no additional attributes are added to the span.

Extract token usage information when a non-OpenAI protocol is used

In this example, Alibaba Cloud Bailian is used and the protocol of the ai-proxy plug-in is set to original. The following configuration extracts the model, input_token, and output_token attributes from the response body:

attributes:
  - key: model
    value_source: response_body
    value: usage.models.0.model_id
    apply_to_log: true
    apply_to_span: false
  - key: input_token
    value_source: response_body
    value: usage.models.0.input_tokens
    apply_to_log: true
    apply_to_span: false
  - key: output_token
    value_source: response_body
    value: usage.models.0.output_tokens
    apply_to_log: true
    apply_to_span: false

Monitoring

route_upstream_model_metric_input_token{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 343
route_upstream_model_metric_output_token{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 153
route_upstream_model_metric_llm_service_duration{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 3725
route_upstream_model_metric_llm_duration_count{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 1

Log

The following sample shows a log entry that is generated by the preceding configuration:

{
  "ai_log": "{\"model\":\"qwen-max\",\"input_token\":\"343\",\"output_token\":\"153\",\"llm_service_duration\":\"19110\"}"  
}

Tracing analysis

The span for tracing analysis includes the model, input_token, and output_token attributes.

Record consumers for authentication

Example:

attributes:
  - key: consumer # Record consumers for authentication.
    value_source: request_header
    value: x-mse-consumer
    apply_to_log: true

Record questions and answers

attributes:
  - key: question # Record questions.
    value_source: request_body
    value: messages.@reverse.0.content
    apply_to_log: true
  - key: answer # Extract answers provided by the large language model (LLM) service from streaming responses.
    value_source: response_streaming_body
    value: choices.0.delta.content
    rule: append
    apply_to_log: true
  - key: answer   # Extract answers provided by the LLM service from non-streaming responses.
    value_source: response_body
    value: choices.0.message.content
    apply_to_log: true
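The path messages.@reverse.0.content uses GJSON's @reverse modifier: the messages array is reversed, so index 0 selects the most recent message, which is the user's latest question. A hypothetical OpenAI-style request body illustrates the idea:

```python
import json

# Hypothetical OpenAI-style request body (illustration only).
request_body = json.dumps({
    "model": "qwen-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 to the power of 3 in Python?"},
    ],
})

# messages.@reverse.0.content: reverse the array, then take element 0.
messages = json.loads(request_body)["messages"]
question = list(reversed(messages))[0]["content"]
print(question)  # What is 2 to the power of 3 in Python?
```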

Advanced configurations

You can use Simple Log Service to extract and process AI-related fields. The following sample shows the original log entry:

ai_log:{"question":"What is 2 to the power of 3 in Python?","answer":"You can use the power operator of Python (**) to calculate the power of a number. You can use the following code to perform the calculation: \n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n. The result is 8, which is returned after you run the code.","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}

You can run the following data processing script to extract the question and answer:

e_regex("ai_log", grok("%{EXTRACTJSON}"))
e_set("question", json_select(v("json"), "question", default="-"))
e_set("answer", json_select(v("json"), "answer", default="-"))
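The script is written in the Simple Log Service data-processing DSL: e_regex with the EXTRACTJSON grok pattern captures the JSON payload of ai_log into a json field, and json_select then pulls individual keys out of it. In plain Python terms, the transformation is roughly the following (a sketch with a shortened log line, not the DSL implementation):

```python
import json
import re

# Raw log field as produced by the gateway (shortened for the example).
raw = 'ai_log:{"question":"What is 2 to the power of 3 in Python?","answer":"result = 2 ** 3","model":"qwen-max"}'

# Roughly what grok("%{EXTRACTJSON}") does: capture the JSON payload.
payload = re.search(r"\{.*\}", raw).group(0)
fields = json.loads(payload)

# Roughly what json_select(..., default="-") does: pick keys with a fallback.
question = fields.get("question", "-")
answer = fields.get("answer", "-")
print(question)  # What is 2 to the power of 3 in Python?
print(answer)    # result = 2 ** 3
```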

The question and answer fields are added to the log. Example:

ai_log:{"question":"What is 2 to the power of 3 in Python?","answer":"You can use the power operator of Python (**) to calculate the power of a number. You can use the following code to perform the calculation: \n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n. The result is 8, which is returned after you run the code.","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}

question: What is 2 to the power of 3 in Python?

answer: You can use the power operator of Python (**) to calculate the power of a number. You can use the following code to perform the calculation:

result = 2 ** 3
print(result)

The result is 8, which is returned after you run the code.