
Application Real-Time Monitoring Service:Overview

Last Updated:Mar 13, 2025

After you install the ARMS agent for Python for a Large Language Model (LLM) application, Application Real-Time Monitoring Service (ARMS) starts monitoring the application. On the Overview tab of the application details page, you can view metrics such as the number of LLM calls, token usage, trace count, and session count.

Prerequisites

An ARMS agent has been installed for the LLM application. For more information, see Monitor LLM applications in ARMS.

Go to the Overview tab

  1. Log on to the ARMS console. In the left-side navigation pane, choose LLM Application Monitoring > Application List.

  2. On the page that appears, select a region in the top navigation bar and click the application that you want to manage.

  3. In the top navigation bar, click the Overview tab.

Dashboard


The dashboard displays the following panels:

  • Number of model calls: The number of times the application invoked LLMs within a specified time period.

  • Token usage: The number of tokens consumed by the application within a specified time period.

  • Trace Count: The number of traces generated by the application within a specified time period.

  • Span Count: The number of spans generated by the application within a specified time period.

  • Number of sessions: The number of sessions generated by the application within a specified time period.

  • Number of users: The number of users of the application within a specified time period.
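To make the token usage metric concrete: each LLM call typically returns a usage record, and the panel sums these records over the selected time range. The following is a minimal sketch with hypothetical per-call records; the `usage` field shape follows the common OpenAI-compatible response format, and the model name is illustrative only.

```python
# Hypothetical per-call records, as an LLM application might collect them.
# The "usage" shape mirrors what many OpenAI-compatible APIs return.
calls = [
    {"model": "qwen-max", "usage": {"prompt_tokens": 120, "completion_tokens": 80}},
    {"model": "qwen-max", "usage": {"prompt_tokens": 40, "completion_tokens": 25}},
]

# Token usage over the time range = sum of prompt and completion tokens.
total_tokens = sum(
    c["usage"]["prompt_tokens"] + c["usage"]["completion_tokens"] for c in calls
)
print(total_tokens)  # 265
```

In the actual dashboard, the ARMS agent collects these values automatically from instrumented LLM calls; no manual aggregation is required.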

Operation type distribution: the distribution of operations by type within a specified time period. The operation types are as follows:

  • CHAIN: a workflow that connects an LLM with multiple other components to accomplish complex tasks. A chain may include retrieval, embedding, and LLM invocation steps, and can even nest other chains.

  • EMBEDDING: embedding processing, such as converting text into vector representations. Embeddings enable similarity-based queries and help optimize questions.

  • RETRIEVER: generally refers to accessing a vector store or database to retrieve data, often used to supplement context and improve the accuracy and efficiency of LLM responses.

  • RERANKER: ranks multiple input documents by their relevance to the query and may return the top K documents as input to the LLM.

  • LLM: an invocation of an LLM, for example, a request sent through an SDK or OpenAPI Explorer to an LLM for inference or text generation.

  • TOOL: an invocation of an external tool, such as calling a calculator or requesting the latest weather conditions from a weather API.

  • AGENT: an intelligent agent scenario that involves a complex chain in which the next step is decided based on the inference results of LLMs. This may involve multiple invocations of LLMs and tools to progressively arrive at a final answer.

  • TASK: a custom internal operation, for example, invoking a local function to apply custom logic.

  • Avg LLM call per request: The average number of LLM invocations per request, calculated per minute.

  • Request Number Trend: The trend graph of the number of requests per minute for the application.

  • Model Call Ranking: The top 5 most frequently invoked LLMs of the application.

  • Number of Request User Ranking: The top 5 users who initiated the most requests in the application.

  • Session Number Trend: The trend graph of the session count per minute for the application.
