
Application Real-Time Monitoring Service:Overview

Last Updated:Mar 13, 2025

After you install the ARMS agent for Python for a Large Language Model (LLM) application, Application Real-Time Monitoring Service (ARMS) starts monitoring the application. On the Overview tab of the application details page, you can view metrics such as the number of LLM calls, token usage, trace count, and session count.

Prerequisites

An ARMS agent has been installed for the LLM application. For more information, see Monitor LLM applications in ARMS.

Go to the Overview tab

  1. Log on to the ARMS console. In the left-side navigation pane, choose LLM Application Monitoring > Application List.

  2. On the page that appears, select a region in the top navigation bar and click the application that you want to manage.

  3. In the top navigation bar, click the Overview tab.

Dashboard


The dashboard displays the following panels:

  • Number of model calls: The number of times the application invoked LLMs within a specified time period.

  • Token usage: The number of tokens consumed by the application within a specified time period.

  • Trace Count: The number of traces generated by the application within a specified time period.

  • Span Count: The number of spans generated by the application within a specified time period.

  • Number of sessions: The number of sessions generated by the application within a specified time period.

  • Number of users: The number of users of the application within a specified time period.
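To make the token usage metric concrete: each LLM call typically returns a usage record, and the panel sums these records over the selected time range. The following is a minimal sketch with hypothetical per-call records; the `usage` field shape follows the common OpenAI-compatible response format, and the model name is illustrative only.

```python
# Hypothetical per-call records, as an LLM application might collect them.
# The "usage" shape mirrors what many OpenAI-compatible APIs return.
calls = [
    {"model": "qwen-max", "usage": {"prompt_tokens": 120, "completion_tokens": 80}},
    {"model": "qwen-max", "usage": {"prompt_tokens": 40, "completion_tokens": 25}},
]

# Token usage over the time range = sum of prompt and completion tokens.
total_tokens = sum(
    c["usage"]["prompt_tokens"] + c["usage"]["completion_tokens"] for c in calls
)
print(total_tokens)  # 265
```

In the actual dashboard, the ARMS agent collects these values automatically from instrumented LLM calls; no manual aggregation is required.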

Operation type distribution: the distribution of operations by type within a specified time period. The operation types are as follows:

  • CHAIN: a workflow that connects an LLM with multiple other components to accomplish complex tasks. A chain may include retrieval, embedding, and LLM invocation steps, and can even nest other chains.

  • EMBEDDING: embedding processing, such as converting text into vector representations. Embeddings enable similarity-based queries and help optimize questions.

  • RETRIEVER: generally refers to accessing a vector store or database to retrieve data, often used to supplement context and improve the accuracy and efficiency of LLM responses.

  • RERANKER: ranks multiple input documents by their relevance to the query and may return the top K documents as input to the LLM.

  • LLM: an invocation of an LLM, for example, a request sent through an SDK or OpenAPI Explorer to an LLM for inference or text generation.

  • TOOL: an invocation of an external tool, such as calling a calculator or requesting the latest weather conditions from a weather API.

  • AGENT: an intelligent agent scenario that involves a complex chain in which the next step is decided based on the inference results of LLMs. This may involve multiple invocations of LLMs and tools to progressively arrive at a final answer.

  • TASK: a custom internal operation, for example, invoking a local function to apply custom logic.

  • Avg LLM call per request: The average number of LLM invocations per request, calculated per minute.

  • Request Number Trend: The trend graph of the number of requests per minute for the application.

  • Model Call Ranking: The top 5 most frequently invoked LLMs of the application.

  • Number of Request User Ranking: The top 5 users who initiated the most requests in the application.

  • Session Number Trend: The trend graph of the session count per minute for the application.
