After you install the ARMS agent for Python for a large language model (LLM) application, Application Real-Time Monitoring Service (ARMS) starts monitoring the application. On the LLM operation tab of the application details page, you can view metrics such as the number of invocations, duration, and error count for Embedding, Retrieval & Reranking, Tool Invocation, and Task Invocation operations.
Prerequisites
An ARMS agent has been installed for the LLM application. For more information, see Monitor LLM applications in ARMS.
Go to the LLM operation tab
Log on to the ARMS console. In the left-side navigation pane, choose .
On the page that appears, select a region in the top navigation bar and click the application that you want to manage.
In the top navigation bar, select a tab from the LLM operation dropdown list.
Embedding Analysis
In LLM applications, Embedding is a technique that converts text, images, or other types of data into low-dimensional vectors. These vectors capture the semantic information of the data and are used for tasks such as similarity calculation, retrieval, and classification.
Embedding Analysis lets you monitor the performance, stability, and effectiveness of the Embedding functionality. The data supports the optimization and maintenance of LLM applications.
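To make the similarity use case concrete, the following is a minimal sketch of comparing embedding vectors with cosine similarity. The three-dimensional vectors and the names `EMBEDDINGS`, `cosine_similarity`, and `most_similar` are hypothetical and chosen for illustration only. Real embedding models return vectors with far more dimensions, and this code is not part of the ARMS agent.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumption: made-up values, not model output).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.2, 0.9],
}

def most_similar(query, embeddings=EMBEDDINGS):
    """Return the other key whose vector is closest to the query's vector."""
    q = embeddings[query]
    candidates = [k for k in embeddings if k != query]
    return max(candidates, key=lambda k: cosine_similarity(q, embeddings[k]))
```

With these toy values, `most_similar("cat")` selects `"kitten"` because its vector points in nearly the same direction as the vector for `"cat"`.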

| Panel | Description |
| --- | --- |
| Number of Embedding | The total number of Embedding invocations within a specified time period. |
| Average Embedding Time | The average time consumed by all Embedding invocations within a specified time period. |
| Number of Embedding errors | The number of failed Embedding invocations within a specified time period. |
| Number of Embedding/1m | The total number of Embedding invocations per minute. |
| Embedding Time/1m | The average time consumed by all Embedding invocations per minute. |
| Embedding error/1m | The number of failed Embedding invocations per minute. |
| Number of Embedding (Top5) | Displays the top 5 Embedding functions or models with the highest invocation counts, sorted in descending order by invocation count. |
| Embedding Time-consuming Ranking (Top5) | Displays the top 5 Embedding functions or models with the longest average time consumption, sorted in descending order by average time consumed. |
| Embedding Error Ranking (Top5) | Displays the top 5 Embedding functions or models with the most invocation errors, sorted in descending order by error count. |
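The panel definitions above boil down to a few aggregations over invocation records. The following is a rough sketch of that arithmetic, assuming a hypothetical record layout of `(timestamp_s, model_name, duration_ms, failed)`. The record layout and function names are illustrative assumptions and do not describe how ARMS computes or stores its metrics.

```python
from collections import Counter, defaultdict

# Hypothetical invocation records: (timestamp_s, model_name, duration_ms, failed)
RECORDS = [
    (0, "model-a", 120.0, False),
    (10, "model-a", 80.0, False),
    (65, "model-b", 300.0, True),
    (70, "model-b", 100.0, False),
]

def summarize(records):
    """Whole-period panels: invocation count, average duration, error count."""
    total = len(records)
    avg_ms = sum(r[2] for r in records) / total if total else 0.0
    errors = sum(1 for r in records if r[3])
    return {"count": total, "avg_ms": avg_ms, "errors": errors}

def per_minute(records):
    """The "/1m" panels: the same three values per 60-second bucket."""
    buckets = defaultdict(list)
    for rec in records:
        bucket_start = rec[0] - rec[0] % 60
        buckets[bucket_start].append(rec)
    return {start: summarize(recs) for start, recs in sorted(buckets.items())}

def top5_by_count(records):
    """The count ranking: up to five names with the most invocations."""
    return Counter(r[1] for r in records).most_common(5)
```

For the sample records, `summarize(RECORDS)` reports 4 invocations, an average of 150.0 ms, and 1 error, while `per_minute(RECORDS)` splits the same records into two buckets that start at 0 and 60 seconds.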
Search Enhancement
In LLM applications, Retrieval-Augmented Generation (RAG) is a technique that retrieves relevant content and reranks it before generation, which improves the relevance and accuracy of the content generated by LLMs.
By monitoring Retrieval and Rerank metrics, you can evaluate the performance, stability, and effectiveness of the retrieval-augmented functionality. The data supports the optimization of LLM applications.
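As an illustration of the two stages that these panels measure, the following sketch runs a cheap retrieval step followed by a finer reranking step. Both scoring functions are simple word-overlap stand-ins chosen for this example. A real application would typically use a vector store for retrieval and a reranking model for the second stage, and all names here (`retrieve`, `rerank`, `DOCS`) are hypothetical.

```python
def retrieve(query, documents, k=3):
    """Stage 1 (Retrieval): keep up to k documents that share words with the query."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(d.lower().split())), d) for d in documents]
    scored = [s for s in scored if s[0] > 0]  # drop documents with no overlap
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:k]]

def rerank(query, candidates):
    """Stage 2 (Rerank): reorder candidates by the share of their words that match the query."""
    q_words = set(query.lower().split())

    def score(doc):
        words = doc.lower().split()
        return sum(w in q_words for w in words) / len(words)

    return sorted(candidates, key=score, reverse=True)

# Hypothetical document collection.
DOCS = [
    "reset your password from the account settings page",
    "password reset",
    "billing and invoices overview",
]
```

Calling `rerank(q, retrieve(q, DOCS))` with `q = "password reset"` first discards the billing document, then moves the short exact match ahead of the longer document. Each stage is a separate invocation, which is why the tab reports Retrieval and Rerank metrics separately.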

Retrieval

| Panel | Description |
| --- | --- |
| Number of calls | The total number of Retrieval invocations within a specified time period. |
| Average call time | The average time consumed by all Retrieval invocations within a specified time period. |
| Number of errors | The number of failed Retrieval invocations within a specified time period. |
| Number of calls/1m | The total number of Retrieval invocations per minute. |
| Call time/1m | The average time consumed by all Retrieval invocations per minute. |
| Number of errors/1m | The number of failed Retrieval invocations per minute. |

Rerank

| Panel | Description |
| --- | --- |
| Number of calls | The total number of Rerank invocations within a specified time period. |
| Average call time | The average time consumed by all Rerank invocations within a specified time period. |
| Number of errors | The number of failed Rerank invocations within a specified time period. |
| Number of calls/1m | The total number of Rerank invocations per minute. |
| Call time/1m | The average time consumed by all Rerank invocations per minute. |
| Number of errors/1m | The number of failed Rerank invocations per minute. |
Tool Call
In LLM applications, Tool Invocation refers to the process in which an LLM invokes external tools or APIs to accomplish specific functions while performing tasks. These tools can include calculators, database query interfaces, search engines, and translation services. They extend the capabilities of the LLM so that it can handle more complex or specialized tasks.
By monitoring tool invocation data, you can assess the interaction between the LLM application and external tools. The data supports optimization and maintenance.
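The following sketch shows the general shape of a tool invocation and the data that such monitoring relies on: the tool name, the duration, and whether the call failed. The registry `TOOLS`, the log `CALL_LOG`, and the function `invoke_tool` are hypothetical names for illustration. The ARMS agent collects this data through its own instrumentation, which is not shown here.

```python
import time

# Hypothetical tools; a real application would wrap APIs, databases, or search engines.
TOOLS = {
    "add": lambda a, b: a + b,
    "divide": lambda a, b: a / b,
}

# One entry per invocation: (tool_name, duration_s, failed)
CALL_LOG = []

def invoke_tool(name, **kwargs):
    """Run a registered tool and record its name, duration, and failure status."""
    start = time.perf_counter()
    failed = False
    try:
        return TOOLS[name](**kwargs)
    except Exception:
        failed = True
        raise  # the caller still sees the original error
    finally:
        CALL_LOG.append((name, time.perf_counter() - start, failed))
```

A successful call such as `invoke_tool("add", a=2, b=3)` and a failing call such as `invoke_tool("divide", a=1, b=0)` each append one entry to `CALL_LOG`, so call counts, average call time, and error counts can all be derived from the same log.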

| Panel | Description |
| --- | --- |
| Number of calls | The total number of tool invocations within a specified time period. |
| Average call time | The average time consumed by all tool invocations within a specified time period. |
| Number of errors | The number of failed tool invocations within a specified time period. |
| Number of calls/10m | The total number of tool invocations in each 10-minute interval. |
| Call time/10m | The average time consumed by all tool invocations in each 10-minute interval. |
| Call Error/10m | The number of failed tool invocations in each 10-minute interval. |
| Call ranking (Top5) | Displays the top 5 tools with the highest invocation counts, sorted in descending order by invocation count. |
| Call Time Row (Top5) | Displays the top 5 tools with the longest average time consumption, sorted in descending order by average time consumed. |
| Error ranking (Top5) | Displays the top 5 tools with the most invocation errors, sorted in descending order by error count. |
Method Calls
In LLM applications, tasks are internal custom methods, such as local methods that the application executes or other important operations.
By monitoring task invocation data, you can evaluate the invocation status of internal methods within the LLM application. The data supports optimization and maintenance, and helps you identify bottlenecks and areas for improvement.
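To illustrate what a task is in this sense, the following sketch wraps a local method in a decorator that counts calls, errors, and total time. The decorator `task`, the dictionary `TASK_STATS`, and the sample function `build_prompt` are hypothetical and exist only for this example. They are not the ARMS agent API, and the way custom methods are registered with the agent is not covered here.

```python
import functools
import time
from collections import defaultdict

# Per-task counters keyed by function name (illustrative only).
TASK_STATS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_s": 0.0})

def task(func):
    """Record call count, error count, and cumulative duration for func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stats = TASK_STATS[func.__name__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise
        finally:
            stats["calls"] += 1
            stats["total_s"] += time.perf_counter() - start
    return wrapper

@task
def build_prompt(question):
    """A hypothetical internal method of an LLM application."""
    if not question:
        raise ValueError("question must not be empty")
    return f"Answer concisely: {question}"
```

Dividing `total_s` by `calls` gives the average call time for a task, and comparing `errors` across tasks gives the kind of ordering that an error ranking displays.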

| Panel | Description |
| --- | --- |
| Number of calls | The total number of task invocations within a specified time period. |
| Average call time | The average time consumed by all task invocations within a specified time period. |
| Number of model call errors | The number of failed task invocations within a specified time period. |
| Number of calls/10m | The total number of task invocations in each 10-minute interval. |
| Call time/10m | The average time consumed by all task invocations in each 10-minute interval. |
| Model Call Error/10m | The number of failed task invocations in each 10-minute interval. |
| Call ranking | Displays the top 5 tasks with the highest invocation counts, sorted in descending order by invocation count. |
| Call Time Row (Top5) | Displays the top 5 tasks with the longest average time consumption, sorted in descending order by average time consumed. |
| Error ranking (Top5) | Displays the top 5 tasks with the most invocation errors, sorted in descending order by error count. |