Cloud Monitor 2.0 supports connecting AI applications and lets you view model application data in AI Application Observability.
Try in Playground
Alibaba Cloud Playground provides a demo environment where you can experience the main features of Cloud Monitor 2.0.
- Go to the Playground Demo, which opens the default workspace.
- On the navigation bar, select AI Application Observability or select AI Application Observability under All Features.
- In the AI Application Observability navigation bar, select Model application to view a list of your model applications.
- Click an application's name in the list to view its details and topology.
Model application
AI Application Observability > Model Application shows the list of model applications.
Query conditions
You can set query conditions to filter information. The page uses two default conditions: domain = apm and type = apm.service. To filter for model applications, we recommend adding the query condition feature_genai = app.
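To illustrate how these conditions narrow the list, here is a minimal Python sketch that treats each condition as an equality test on a record field and keeps only the records that satisfy all of them. The record layout, the sample values such as `infra.host`, and the `matches` and `filter_entities` helpers are hypothetical; only the field names `domain`, `type`, and `feature_genai` and their values come from the conditions above. The sketch also assumes that multiple conditions are combined with AND, which the page does not state explicitly.

```python
# Hypothetical entity records; field names mirror the query conditions above.
entities = [
    {"name": "chat-service", "domain": "apm", "type": "apm.service", "feature_genai": "app"},
    {"name": "order-service", "domain": "apm", "type": "apm.service"},
    {"name": "node-1", "domain": "infra", "type": "infra.host"},  # made-up values
]

DEFAULT_CONDITIONS = {"domain": "apm", "type": "apm.service"}
MODEL_APP_CONDITION = {"feature_genai": "app"}


def matches(entity, conditions):
    """Return True if the entity satisfies every equality condition (assumed AND)."""
    return all(entity.get(field) == value for field, value in conditions.items())


def filter_entities(records, conditions):
    """Keep only the records that match all conditions."""
    return [record for record in records if matches(record, conditions)]


default_view = filter_entities(entities, DEFAULT_CONDITIONS)
model_apps = filter_entities(entities, {**DEFAULT_CONDITIONS, **MODEL_APP_CONDITION})

print([e["name"] for e in default_view])  # ['chat-service', 'order-service']
print([e["name"] for e in model_apps])    # ['chat-service']
```

In this sketch, the two default conditions still return every APM service, and adding `feature_genai = app` leaves only the model application.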
Page layout
The page includes the following main sections:
- Application name: The name of the application. Click the name to go to the Application details page.
- Source: The source of the application data. For example, 'apm' indicates application monitoring, and 'xtrace' indicates distributed tracing.
- Language: The programming language of the application, such as Python.
- Region: The region where the application is deployed.
- Request count: The total number of requests for the application, shown with a trendline.
- Error count: The number of failed requests for the application, shown with a trendline.
- Average latency: The average latency of requests for the application, shown with a trendline.
Application details
Instance overview
On the Instance Overview page, you can filter data by time. The page includes:
- Request count: Total requests, day-over-day comparison, request count trend chart, and a leaderboard of services ranked by request count.
- Error count: Total errors, day-over-day comparison, error count trend chart, error rate, and a leaderboard of services ranked by error count.
- Latency: Average latency, day-over-day comparison, latency trend chart, and a leaderboard of services ranked by average latency.
- Instance count: Total number of instances and day-over-day comparison.
- CPU usage: A trend chart for peak CPU usage and a leaderboard of instances ranked by peak CPU usage.
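The page does not spell out how the day-over-day comparison or the error rate is calculated. The following Python sketch shows one common interpretation, under the assumption that day-over-day means the percentage change against the same time window one day earlier and that error rate means errors divided by requests. The function names are hypothetical and the console may compute these values differently.

```python
def day_over_day_change(current, previous):
    """Percentage change versus the same window one day earlier.

    Returns None when there is no baseline to compare against.
    """
    if previous == 0:
        return None
    return (current - previous) * 100.0 / previous


def error_rate(error_count, request_count):
    """Share of requests that failed, as a percentage."""
    if request_count == 0:
        return 0.0
    return error_count * 100.0 / request_count


# Example: 1,200 requests today versus 1,000 in the same window yesterday,
# with 30 failed requests today.
print(day_over_day_change(1200, 1000))  # 20.0
print(error_rate(30, 1200))             # 2.5
```

Returning `None` when the previous window has no data avoids a division by zero and lets a caller display "no baseline" instead of a misleading number.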
Associated instances
On the Associated Instances page, you can filter data by time. The page includes:
- Application: View the interfaces provided by the application and the instances that support it. Click an interface or instance to see its observability data.
- Kubernetes: A list of associated clusters. Click a cluster to view its observability data.
- Infrastructure: The associated infrastructure. Click an infrastructure component to view its observability data.
- Upstream/downstream: A list of associated upstream and downstream services. Click a service to view its observability data.
Associated topology
This page displays a network topology of the upstream and downstream services for the application. An example is shown below:
Application overview
On the Application Overview page, you can filter data by time. The page includes:
- Count data: Statistics for model call count, token usage, trace count, span count, session count, user count, and user request count.
- Chart data: An operation type distribution chart, an average LLM request response time trend chart, a request count trend chart, a model call leaderboard, and a session count trend chart.
Performance analysis
On the Performance Analysis page, you can filter data by time. The page includes:
- Request count: The number of model calls, day-over-day comparison, a time-series trend chart for model calls, and a leaderboard of models ranked by call count.
- Error count: The number of model call errors, day-over-day comparison, a trend chart for model call errors, and a leaderboard of models ranked by errors.
- Latency: The model call latency, day-over-day comparison, a trend chart for average model call latency, a leaderboard of models ranked by average latency, and the time to first byte for model calls.
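As a rough illustration of how the per-model leaderboards above could be derived, the following Python sketch groups model call records by model and ranks the models by a chosen metric. The record fields (`model`, `latency_ms`, `error`), the model names, and the helper functions are hypothetical and do not reflect the actual data schema of Cloud Monitor 2.0.

```python
from collections import defaultdict

# Hypothetical model call records.
calls = [
    {"model": "model-a", "latency_ms": 800, "error": False},
    {"model": "model-a", "latency_ms": 1200, "error": True},
    {"model": "model-b", "latency_ms": 400, "error": False},
]


def model_stats(records):
    """Aggregate call count, error count, and average latency per model."""
    totals = defaultdict(lambda: {"calls": 0, "errors": 0, "latency_sum_ms": 0})
    for record in records:
        entry = totals[record["model"]]
        entry["calls"] += 1
        entry["errors"] += int(record["error"])
        entry["latency_sum_ms"] += record["latency_ms"]
    return {
        model: {
            "calls": entry["calls"],
            "errors": entry["errors"],
            "avg_latency_ms": entry["latency_sum_ms"] / entry["calls"],
        }
        for model, entry in totals.items()
    }


def leaderboard(stats, metric):
    """Return model names ordered from highest to lowest value of the metric."""
    return sorted(stats, key=lambda model: stats[model][metric], reverse=True)


stats = model_stats(calls)
print(leaderboard(stats, "calls"))           # ['model-a', 'model-b']
print(leaderboard(stats, "avg_latency_ms"))  # ['model-a', 'model-b']
```

The same aggregation feeds all three leaderboards; only the ranking metric changes.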
Token analysis
On the Token Analysis page, you can filter data by time. The page includes:
- Token usage: Total tokens used, day-over-day comparison, a trend chart for input/output token consumption, and a leaderboard of models ranked by token usage.
- Average token usage per session: The average number of tokens used per session, day-over-day comparison, a trend chart for average token usage per session, and a leaderboard of sessions ranked by token usage.
- Average token usage per request: The average number of tokens used per request, a trend chart for average token usage per request, and a leaderboard of users ranked by token usage.
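The relationship between total token usage, the per-session average, and the per-request average can be sketched as follows. This Python example assumes that the total is the sum of input and output tokens, and that each request record carries a session identifier. The field names and the `token_summary` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical request records with input and output token counts.
requests = [
    {"session_id": "s1", "input_tokens": 120, "output_tokens": 80},
    {"session_id": "s1", "input_tokens": 60, "output_tokens": 40},
    {"session_id": "s2", "input_tokens": 250, "output_tokens": 150},
]


def token_summary(records):
    """Compute total tokens, averages per session and per request, and a session ranking."""
    per_session = defaultdict(int)
    for record in records:
        per_session[record["session_id"]] += record["input_tokens"] + record["output_tokens"]
    total = sum(per_session.values())
    return {
        "total_tokens": total,
        "avg_per_session": total / len(per_session) if per_session else 0.0,
        "avg_per_request": total / len(records) if records else 0.0,
        # Sessions ordered from highest to lowest token usage.
        "session_leaderboard": sorted(
            per_session.items(), key=lambda item: item[1], reverse=True
        ),
    }


summary = token_summary(requests)
print(summary["total_tokens"])         # 700
print(summary["avg_per_session"])      # 350.0
print(summary["session_leaderboard"])  # [('s2', 400), ('s1', 300)]
```

Because a session can contain several requests, the per-session average is never lower than the per-request average for the same data.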
Operation analysis
Operation analysis includes four categories: embedding analysis, retrieval augmentation, tool call, and method call.
Embedding analysis:
- Embedding request count: The number of embedding requests, day-over-day comparison, a time-series trend chart for embedding requests, and a leaderboard of embedding requests.
- Embedding latency: Average latency, day-over-day comparison, a time-series trend chart for latency, and a latency leaderboard.
- Embedding error count: The total number of embedding errors, day-over-day comparison, a time-series trend chart for embedding errors, and a leaderboard of embedding errors.
Retrieval augmentation:
- Call count: The number of Retrieval/Rerank calls, a day-over-day comparison, and a trend chart for the call count.
- Error count: The number of Retrieval/Rerank errors, a day-over-day comparison, and a trend chart for the error count.
- Latency: The average latency of Retrieval/Rerank calls, a day-over-day comparison, and a trend chart for the average latency.
Tool call:
- Call count: The number of tool calls, day-over-day comparison, a trend chart for the call count, and a call leaderboard.
- Error count: The number of tool call errors, day-over-day comparison, a trend chart for error count, and an error count leaderboard.
- Latency: Average latency, day-over-day comparison, a trend chart for latency, and an average latency leaderboard.
Method call:
- Call count: The number of calls, day-over-day comparison, a trend chart for call volume, and a call leaderboard.
- Error count: The number of method call errors, day-over-day comparison, a trend chart for method call errors, and an error count leaderboard.
- Latency: Average latency, day-over-day comparison, a trend chart for latency, and an average latency leaderboard.
Trace analysis
On the Trace Analysis page, you can filter data by time, query data by using query and analysis statements, or use quick filters to narrow the results. The page includes a span list, a trace list, a scatter plot, end-to-end aggregation, end-to-end topology, and analysis of slow or erroneous traces. You can view details and raw logs, and you can select aggregation dimensions in the aggregation bar to filter the data.
The top of the page displays a Call count bar chart, a Tokens bar chart, and an Average latency line chart. The quick filter panel on the left allows you to filter by Status, Latency, Application Name, span type (such as TASK, CHAIN, TOOL, and LLM), and Interface Name. The span list table includes columns such as Trace ID, Input/Output, Interface Name, span type, Latency, and Total tokens. The Slow trace analysis panel on the right displays the top spanName values and their contribution rankings.
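The quick filters and the slow trace ranking can be approximated offline with a short Python sketch. The span records, the status values `OK` and `ERROR`, the field names, and both helper functions are hypothetical. The sketch also assumes that a span name's contribution is its share of the total latency of all slow spans, which the page does not define, so treat the ranking logic as one possible interpretation.

```python
from collections import Counter

# Hypothetical span records; span types follow the examples named above.
spans = [
    {"trace_id": "t1", "span_name": "chat", "span_type": "LLM", "status": "OK", "latency_ms": 3000},
    {"trace_id": "t1", "span_name": "search", "span_type": "TOOL", "status": "OK", "latency_ms": 500},
    {"trace_id": "t2", "span_name": "chat", "span_type": "LLM", "status": "ERROR", "latency_ms": 1000},
    {"trace_id": "t3", "span_name": "plan", "span_type": "CHAIN", "status": "OK", "latency_ms": 1000},
]


def quick_filter(records, span_type=None, status=None, min_latency_ms=None):
    """Keep spans that match every filter that was provided."""
    result = []
    for span in records:
        if span_type is not None and span["span_type"] != span_type:
            continue
        if status is not None and span["status"] != status:
            continue
        if min_latency_ms is not None and span["latency_ms"] < min_latency_ms:
            continue
        result.append(span)
    return result


def slow_span_contributions(records, threshold_ms):
    """Rank span names by their share of total latency among slow spans (assumed definition)."""
    totals = Counter()
    for span in records:
        if span["latency_ms"] >= threshold_ms:
            totals[span["span_name"]] += span["latency_ms"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [(name, total / grand_total) for name, total in totals.most_common()]


failed_llm = quick_filter(spans, span_type="LLM", status="ERROR")
print([s["trace_id"] for s in failed_llm])  # ['t2']
print(slow_span_contributions(spans, threshold_ms=1000))
```

With a 1,000 ms threshold, the `chat` spans account for 4,000 of the 5,000 ms spent in slow spans, so `chat` ranks first and `plan` second.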