
Cloud Monitor: Model applications

Last Updated: Sep 29, 2025

Cloud Monitor 2.0 supports AI applications. You can view data for your model applications in AI Application Observability.

Try in Playground

Alibaba Cloud Playground provides a demo environment where you can experience the main features of Cloud Monitor 2.0.

  1. Visit the Playground Demo. You are directed to the o11y-demo-cn-hangzhou workspace by default.

  2. In the navigation bar, select AI Application Observability. Alternatively, in the Application Center, select AI Application Observability.

  3. In the AI Application Observability navigation bar, select Model Applications to view the list of model applications.

  4. Click the name of a target application in the list to view its details and topology.

Model applications

The Model Applications page in AI Application Observability displays a list of your model applications.

Query conditions

You can set query conditions to filter the information. By default, the page uses two conditions: `domain = apm` and `type = apm.service`. To filter for model applications, add the query condition `feature_genai = app`.
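Put together, the filter for model applications consists of the following three conditions (the exact syntax for combining conditions in the query bar may vary; this is an illustrative sketch):

```
domain = apm
type = apm.service
feature_genai = app
```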

Page layout

The page displays the following columns:

  1. Application name: The name of the application. Click the name to go to the Application details page.

  2. Source: The source of the application, such as `apm` for application monitoring or `xtrace` for Tracing Analysis.

  3. Language: The programming language of the application, such as Python.

  4. Region: The region where the application is located.

  5. Requests: The number of application requests and a trend line.

  6. Faults: The number of application request faults and a trend line.

  7. Average latency: The average latency of application requests and a trend line.

Application details

Instance overview

On the Instance overview page, you can filter data by time. The page displays the following information:

  • Requests: The total number of requests, a day-over-day comparison, a trend chart of the request count, and a service ranking by request count.

  • Faults: The total number of faults, a day-over-day comparison, a trend chart of the fault count, the fault rate, and a service ranking by fault count.

  • Latency: The average latency, a day-over-day comparison, a latency trend chart, and a service ranking by average latency.

  • Instance count: The total number of instances and a day-over-day comparison.

  • CPU usage: A trend chart of peak CPU usage and an instance ranking by peak CPU usage.
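The day-over-day comparisons and the fault rate shown above are simple derived metrics. As a rough illustration only (not the console's actual implementation), they can be computed from raw counts like this:

```python
def day_over_day(today: float, yesterday: float) -> float:
    """Percentage change versus the same window one day earlier."""
    if yesterday == 0:
        return float("inf") if today > 0 else 0.0
    return (today - yesterday) / yesterday * 100.0

def fault_rate(faults: int, requests: int) -> float:
    """Faults as a percentage of total requests."""
    return faults / requests * 100.0 if requests else 0.0

# Example: 1200 requests today vs. 1000 yesterday; 30 of today's requests failed.
print(day_over_day(1200, 1000))  # 20.0 (% increase)
print(fault_rate(30, 1200))      # 2.5 (%)
```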

Associated instances

On the Associated instances page, you can filter data by time. The page displays the following information:

  • Application: View the APIs that the application provides and the instances that support the application. You can click an API or instance to view its observable data.

  • Kubernetes: A list of associated clusters. You can click a target cluster to view its observable data.

  • Infrastructure: The associated infrastructure. You can click the target infrastructure to view its observable data.

  • Upstream/Downstream: A list of associated upstream and downstream services. You can click a service to view its observable data.

Associated topology

The Associated topology tab shows the upstream and downstream network topology of the application.

Application overview

On the Application overview page, you can filter data by time. The page displays the following information:

  • Statistics: Model calls, token usage, trace count, span count, session count, user count, and user requests.

  • Charts: Operation type distribution, average request response trend for large models, request count trend, model call ranking, and session count trend.

Performance analysis

On the Performance analysis page, you can filter data by time. The page displays the following information:

  • Requests: The number of model calls, a day-over-day comparison, a time-series trend chart for model calls, and a ranking by the number of model calls.

  • Faults: The number of model call faults, a day-over-day comparison, a trend chart for model call faults, and a ranking by the number of model call faults.

  • Latency: The average model call latency, a day-over-day comparison, a trend chart for the average model call latency, a ranking by average model call latency, and the time to first packet for model calls.
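For streaming model calls, the time to first packet is measured separately from total latency: it is the delay until the first response chunk arrives. A minimal sketch of the distinction, using a hypothetical stand-in for a streaming client:

```python
import time

def fake_stream():
    """Stand-in for a streaming model response (hypothetical; a real
    client would yield chunks received from the network)."""
    time.sleep(0.05)   # model work before the first token
    yield "Hello"
    time.sleep(0.01)
    yield ", world"

def measure_stream(chunks):
    """Return (time_to_first_packet, total_latency) in seconds."""
    start = time.monotonic()
    ttfp = None
    for _ in chunks:
        if ttfp is None:
            ttfp = time.monotonic() - start
    total = time.monotonic() - start
    return ttfp, total

ttfp, total = measure_stream(fake_stream())
# The first packet arrives before the stream finishes, so ttfp <= total.
```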

Token analysis

On the Token analysis page, you can filter data by time. The page displays the following information:

  • Token usage: The total token usage, a day-over-day comparison, a trend chart for input/output consumption, and a ranking by token usage per model.

  • Average token usage per session: The average token usage per session, a day-over-day comparison, a trend chart for the average token usage per session, and a ranking by token usage per session.

  • Average token usage per request: The average token usage per request, a trend chart for the average token usage per request, and a ranking by token usage per user.
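The three token metrics above are aggregates over the same underlying per-request data. As a hedged illustration (the field names and records below are hypothetical; real data comes from model-call spans):

```python
from collections import defaultdict

# Hypothetical per-request token records.
requests = [
    {"session": "s1", "input_tokens": 120, "output_tokens": 80},
    {"session": "s1", "input_tokens": 60,  "output_tokens": 40},
    {"session": "s2", "input_tokens": 200, "output_tokens": 100},
]

total = sum(r["input_tokens"] + r["output_tokens"] for r in requests)

per_session = defaultdict(int)
for r in requests:
    per_session[r["session"]] += r["input_tokens"] + r["output_tokens"]

avg_per_session = total / len(per_session)   # average token usage per session
avg_per_request = total / len(requests)      # average token usage per request

print(total)            # 600
print(avg_per_session)  # 300.0
print(avg_per_request)  # 200.0
```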

Operation analysis

Operation analysis includes four types of data: embedding analysis, retrieval augmentation, tool calling, and method invocation.

Embedding analysis:

  • Embedding requests: The number of embedding requests, a day-over-day comparison, a time-series trend chart for embedding requests, and a ranking by the number of embedding requests.

  • Embedding latency: The average latency, a day-over-day comparison, a latency trend chart, and a ranking by latency.

  • Embedding faults: The total number of embedding faults, a day-over-day comparison, a time-series trend chart for embedding faults, and a ranking by the number of embedding faults.

Retrieval augmentation:

  • Calls: The number of Retrieval/Rerank calls, a day-over-day comparison, and a trend chart for the number of Retrieval/Rerank calls.

  • Faults: The number of Retrieval/Rerank faults, a day-over-day comparison, and a trend chart for the number of Retrieval/Rerank faults.

  • Latency: The average latency of Retrieval/Rerank calls, a day-over-day comparison, and a trend chart for the average latency of Retrieval/Rerank calls.

Tool calling:

  • Calls: The number of tool calls, a day-over-day comparison, a trend chart for the number of calls, and a ranking by the number of calls.

  • Faults: The number of tool call faults, a day-over-day comparison, a trend chart for the number of call faults, and a ranking by the number of call faults.

  • Latency: The average latency, a day-over-day comparison, a latency trend chart, and a ranking by average latency.

Method invocation:

  • Calls: The number of calls, a day-over-day comparison, a trend chart for the number of calls, and a ranking by the number of calls.

  • Faults: The number of call faults, a day-over-day comparison, a trend chart for the number of call faults, and a ranking by the number of call faults.

  • Latency: The average latency, a day-over-day comparison, a latency trend chart, and a ranking by average latency.
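Each of the four operation types reports the same three aggregates: calls, faults, and average latency, grouped by the operation type recorded on each span. A minimal sketch of that grouping (span fields here are hypothetical; real spans carry attributes set by your instrumentation):

```python
from collections import defaultdict

# Hypothetical spans tagged with an operation type.
spans = [
    {"op": "embedding", "error": False, "latency_ms": 12},
    {"op": "retrieval", "error": False, "latency_ms": 35},
    {"op": "tool",      "error": True,  "latency_ms": 90},
    {"op": "tool",      "error": False, "latency_ms": 70},
    {"op": "method",    "error": False, "latency_ms": 5},
]

stats = defaultdict(lambda: {"calls": 0, "faults": 0, "latency_sum": 0})
for s in spans:
    agg = stats[s["op"]]
    agg["calls"] += 1
    agg["faults"] += int(s["error"])
    agg["latency_sum"] += s["latency_ms"]

for op, agg in sorted(stats.items()):
    avg = agg["latency_sum"] / agg["calls"]
    print(op, agg["calls"], agg["faults"], avg)
```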

Trace analysis

On the Trace analysis page, you can filter data by time. You can run query analysis statements or use quick filters to find specific data. The page displays the Span list, Trace list, scatter chart, trace aggregation, trace topology, and faulty/slow trace analysis. You can view details and raw logs for each trace. In the aggregation bar, you can select aggregation dimensions to filter the data that you need.