Cloud Monitor 2.0 supports observability for AI applications. You can view data for your model applications in AI Application Observability.
Try in Playground
Alibaba Cloud Playground provides a demo environment where you can experience the main features of Cloud Monitor 2.0.
Visit the Playground Demo. You are directed to the o11y-demo-cn-hangzhou workspace by default. In the navigation bar, select AI Application Observability. Alternatively, in the Application Center, select AI Application Observability.
In the AI Application Observability navigation bar, select Model Applications to view the list of model applications.
Click the name of a target application in the list to view its details and topology.
Model applications
The Model Applications page in AI Application Observability displays a list of your model applications.
Query conditions
You can set query conditions to filter the information. By default, the page uses two conditions: `domain = apm` and `type = apm.service`. To filter for model applications, add the query condition `feature_genai = app`.
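For example, the three conditions together narrow the list to model applications. The combined expression below is illustrative only; the `AND` joining syntax is an assumption, and the console may combine conditions automatically:

```
domain = apm AND type = apm.service AND feature_genai = app
```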
Page layout
The page displays the following columns:
Application name: The name of the application. Click the name to go to the Application details page.
Source: The source of the application, such as `apm` for application monitoring or `xtrace` for Tracing Analysis.
Language: The programming language of the application, such as Python.
Region: The region where the application is located.
Requests: The number of application requests and a trend line.
Faults: The number of application request faults and a trend line.
Average latency: The average latency of application requests and a trend line.
Application details
Instance overview
On the Instance overview page, you can filter data by time. The page displays the following information:
Requests: The total number of requests, a day-over-day comparison, a trend chart of the request count, and a service ranking by request count.
Faults: The total number of faults, a day-over-day comparison, a trend chart of the fault count, the fault rate, and a service ranking by fault count.
Latency: The average latency, a day-over-day comparison, a latency trend chart, and a service ranking by average latency.
Instance count: The total number of instances and a day-over-day comparison.
CPU usage: A trend chart of peak CPU usage and an instance ranking by peak CPU usage.
Associated instances
On the Associated instances page, you can filter data by time. The page displays the following information:
Application: View the APIs that the application provides and the instances that support the application. You can click an API or instance to view its observable data.
Kubernetes: A list of associated clusters. You can click a target cluster to view its observable data.
Infrastructure: A list of associated infrastructure. You can click a target item to view its observable data.
Upstream/Downstream: A list of associated upstream and downstream services. You can click a service to view its observable data.
Associated topology
The following figure shows an example of the upstream and downstream network topology for the application.
Application overview
On the Application overview page, you can filter data by time. The page displays the following information:
Statistics: Model calls, token usage, trace count, span count, session count, user count, and user requests.
Charts: Operation type distribution, average request response trend for large models, request count trend, model call ranking, and session count trend.
Performance analysis
On the Performance analysis page, you can filter data by time. The page displays the following information:
Requests: The number of model calls, a day-over-day comparison, a time-series trend chart for model calls, and a ranking by the number of model calls.
Faults: The number of model call faults, a day-over-day comparison, a trend chart for model call faults, and a ranking by the number of model call faults.
Latency: The model call latency, a day-over-day comparison, the average model call latency trend, a ranking by average model latency, and the time to first packet for model calls.
Token analysis
On the Token analysis page, you can filter data by time. The page displays the following information:
Token usage: The total token usage, a day-over-day comparison, a trend chart for input/output consumption, and a ranking by token usage per model.
Average token usage per session: The average token usage per session, a day-over-day comparison, a trend chart for the average token usage per session, and a ranking by token usage per session.
Average token usage per request: The average token usage per request, a trend chart for the average token usage per request, and a ranking by token usage per user.
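The averages above are straightforward aggregations over model-call records. As a minimal sketch of how they could be derived, consider the following Python example; the field names (`session_id`, `input_tokens`, `output_tokens`) are illustrative assumptions, not Cloud Monitor's actual data schema:

```python
# Illustrative sketch: deriving Token analysis averages from raw
# model-call records. Field names are assumptions, not the real schema.
from collections import defaultdict

def token_averages(records):
    total = 0
    per_session = defaultdict(int)  # tokens accumulated per session
    for r in records:
        tokens = r["input_tokens"] + r["output_tokens"]
        total += tokens
        per_session[r["session_id"]] += tokens
    n_requests = len(records)
    n_sessions = len(per_session)
    return {
        "total_tokens": total,
        "avg_per_request": total / n_requests if n_requests else 0,
        "avg_per_session": total / n_sessions if n_sessions else 0,
    }

records = [
    {"session_id": "s1", "input_tokens": 120, "output_tokens": 80},
    {"session_id": "s1", "input_tokens": 60, "output_tokens": 40},
    {"session_id": "s2", "input_tokens": 200, "output_tokens": 100},
]
print(token_averages(records))
# {'total_tokens': 600, 'avg_per_request': 200.0, 'avg_per_session': 300.0}
```

Note that the two averages use different denominators: per-request divides by the number of model calls, while per-session divides by the number of distinct sessions.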
Operation analysis
Operation analysis includes four types of data: embedding analysis, retrieval augmentation, tool calling, and method invocation.
Embedding analysis:
Embedding requests: The number of embedding requests, a day-over-day comparison, a time-series trend chart for embedding requests, and a ranking by the number of embedding requests.
Embedding latency: The average latency, a day-over-day comparison, a latency trend chart, and a ranking by latency.
Embedding faults: The total number of embedding faults, a day-over-day comparison, a time-series trend chart for embedding faults, and a ranking by the number of embedding faults.
Retrieval augmentation:
Calls: The number of Retrieval/Rerank calls, a day-over-day comparison, and a trend chart for the number of Retrieval/Rerank calls.
Faults: The number of Retrieval/Rerank faults, a day-over-day comparison, and a trend chart for the number of Retrieval/Rerank faults.
Latency: The average latency of Retrieval/Rerank calls, a day-over-day comparison, and a trend chart for the average latency of Retrieval/Rerank calls.
Tool calling:
Calls: The number of tool calls, a day-over-day comparison, a trend chart for the number of calls, and a ranking by the number of calls.
Faults: The number of tool call faults, a day-over-day comparison, a trend chart for the number of call faults, and a ranking by the number of call faults.
Latency: The average latency, a day-over-day comparison, a latency trend chart, and a ranking by average latency.
Method invocation:
Calls: The number of calls, a day-over-day comparison, a trend chart for the number of calls, and a ranking by the number of calls.
Faults: The number of call faults, a day-over-day comparison, a trend chart for the number of call faults, and a ranking by the number of call faults.
Latency: The average latency, a day-over-day comparison, a latency trend chart, and a ranking by average latency.
Trace analysis
On the Trace analysis page, you can filter data by time. You can run query analysis statements or use quick filters to find specific data. The page displays the Span list, Trace list, scatter chart, trace aggregation, trace topology, and faulty/slow trace analysis. You can view details and raw logs for each trace. In the aggregation bar, you can select aggregation dimensions to filter the data that you need.