AI observability

AI Gateway provides basic AI observability features. You can view AI request and response information in the statistics and log modules of the AI Gateway console for monitoring, recording, and analysis.

Note

When an exception occurs during processing, the Throttling and Caching policies expose exception logs, which let you view the complete policy logs.

Procedure

Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.
In the navigation pane on the left, choose Model API, then click the target API name to go to the API Details page.
Click the Statistics tab to view the apig-ai-api-dashboard.
Important
AI Gateway uses Simple Log Service (SLS) to collect, analyze, and display logs. If you have not enabled gateway log delivery, click Enable Log Shipping to complete the configuration.
The key metrics for AI observability include the following:
- QPS: The number of queries per second (QPS) for AI requests and responses. This includes the QPS for AI requests, streaming responses, and non-streaming responses.
- Request Success Rate: The success rate of AI requests. You can view statistics per second, per 15 seconds, or per minute.
- Tokens Consumed/s: The number of tokens consumed per second, including input tokens, output tokens, and total tokens.
- Average Request RT (ms): The average response time (RT) in milliseconds (ms) for AI requests over a specific period, such as per second, per 15 seconds, or per minute. This metric includes non-streaming RT, streaming RT (the total time for a streaming response), and first-packet streaming RT (the time to receive the first packet of a streaming response).
- Cache Hits And Misses/s: The number of cache hits and misses per second.
- Throttled Requests/s: The number of throttled requests and normal requests per second.
- Model Token Usage Statistics: The token consumption by different models over a specific period.
- Consumer Token Usage Statistics: The token consumption by different consumers over a specific period.
- Threat Type Statistics: The threats detected by Content Moderation, categorized by dimensions such as threat type and consumer.
- Risky Consumer Statistics: The consumers flagged as risky by consumer authentication.
- Throttled Consumer Statistics: The consumers whose requests were throttled.
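To make the relationship between these metrics concrete, the following Python sketch derives several of them from a list of request records. It is illustrative only: the `AIRequestLog` type, its field names, and the rule that a 2xx status counts as success are assumptions made for this example, not the gateway's actual log schema or dashboard logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AIRequestLog:
    """Hypothetical record for one AI request (not the real SLS schema)."""
    status: int                              # HTTP status code of the response
    streaming: bool                          # True for a streaming response
    input_tokens: int
    output_tokens: int
    duration_ms: float                       # total time until the response completed
    first_packet_ms: Optional[float] = None  # streaming only: time to first packet


def _mean(values: List[float]) -> Optional[float]:
    """Return the arithmetic mean, or None when there is nothing to average."""
    return sum(values) / len(values) if values else None


def summarize(logs: List[AIRequestLog], window_seconds: float) -> dict:
    """Aggregate dashboard-style metrics over one time window."""
    if not logs or window_seconds <= 0:
        raise ValueError("need at least one log and a positive window")

    # Assumption: any 2xx status counts as a successful request.
    succeeded = sum(1 for r in logs if 200 <= r.status < 300)
    input_tokens = sum(r.input_tokens for r in logs)
    output_tokens = sum(r.output_tokens for r in logs)
    streaming = [r for r in logs if r.streaming]
    non_streaming = [r for r in logs if not r.streaming]

    return {
        "qps": len(logs) / window_seconds,
        "success_rate": succeeded / len(logs),
        "input_tokens_per_s": input_tokens / window_seconds,
        "output_tokens_per_s": output_tokens / window_seconds,
        "total_tokens_per_s": (input_tokens + output_tokens) / window_seconds,
        "avg_non_streaming_rt_ms": _mean([r.duration_ms for r in non_streaming]),
        "avg_streaming_rt_ms": _mean([r.duration_ms for r in streaming]),
        "avg_first_packet_rt_ms": _mean(
            [r.first_packet_ms for r in streaming if r.first_packet_ms is not None]
        ),
    }
```

Calling `summarize(logs, 60)` on one minute of records would produce per-minute values, which corresponds to choosing the per-minute granularity described above.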
Click the Logs tab. You can use SQL to perform statistical analysis on the query results. For more information, see Quick start for query and analysis.
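As a rough illustration of such an analysis, the Python sketch below builds a query string in the SLS `search | analytic` form that sums token usage per group. The field names passed in (for example `model` and `total_tokens`) are hypothetical; check the actual field names in your Logstore, and note that fields generally need an index with analysis enabled before they can be used in SQL.

```python
def build_token_usage_query(group_field: str, token_field: str, limit: int = 10) -> str:
    """Build an SLS-style query that totals a token field per group.

    The part before "|" is the search statement ("*" matches all logs);
    the part after it is the SQL analytic statement.
    Field names are supplied by the caller and are assumptions here.
    """
    # Accept only simple identifiers so the generated SQL stays well-formed.
    for name in (group_field, token_field):
        if not name.isidentifier():
            raise ValueError(f"invalid field name: {name!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")

    return (
        f"* | SELECT {group_field}, sum({token_field}) AS tokens "
        f"GROUP BY {group_field} ORDER BY tokens DESC LIMIT {limit}"
    )


if __name__ == "__main__":
    # Hypothetical field names; paste the printed string into the Logs tab query box.
    print(build_token_usage_query("model", "total_tokens"))
```

Replacing the grouping field with a consumer field would give a per-consumer breakdown similar to the Consumer Token Usage Statistics chart.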