After you install the ARMS agent for Python for a Large Language Model (LLM) application, Application Real-Time Monitoring Service (ARMS) starts monitoring the application. You can then view token usage on the Token analysis tab of the application details page.
In LLM applications, a token is the fundamental unit of text processing, representing the smallest semantic unit of LLM input and output. A token can be a word, a subword, or a character, depending on the tokenizer used by the LLM.
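To make the word/subword/character distinction concrete, the sketch below implements a toy greedy longest-match tokenizer. It is illustrative only: the vocabulary and the `toy_tokenize` function are invented for this example, and real LLMs use trained subword schemes such as BPE rather than this simplified matching.

```python
def toy_tokenize(text, vocab):
    """Greedy longest-match tokenization against a small vocabulary,
    falling back to single characters for unknown spans. Illustrative
    only; not the tokenizer of any particular LLM."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until a match.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            tokens.append(text[i])  # unknown span: fall back to one character
            i += 1
    return tokens

vocab = {"token", "ization", "monitor", "ing", " "}
print(toy_tokenize("tokenization", vocab))  # ['token', 'ization']
print(toy_tokenize("monitoring", vocab))    # ['monitor', 'ing']
```

Note how one word can map to one token, several subword tokens, or individual characters, which is why the same text can consume different token counts under different models.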
Prerequisites
An ARMS agent has been installed for the LLM application. For more information, see Monitor LLM applications in ARMS.
Go to the Token analysis tab
Log on to the ARMS console. In the left-side navigation pane, choose .
On the page that appears, select a region in the top navigation bar and click the application that you want to manage.
On the application details page, click the Token analysis tab.

The Token analysis tab displays the following panels:

- Token usage: The total number of tokens consumed by all LLM invocations within the specified time period.
- Avg tokens per LLM call: The average number of tokens consumed per LLM invocation.
- Avg tokens per request: The average number of tokens consumed per user request.
- Tokens Consumption/1m: The total number of tokens consumed by all LLM invocations per minute.
- Avg tokens per LLM call/1m: The average number of tokens consumed per LLM invocation, aggregated per minute.
- Avg tokens per request/1m: The average number of tokens consumed per user request, aggregated per minute.
- Token Usage Model Ranking (Top5): The top 5 LLMs by token consumption, in descending order.
- Token Use Session Ranking (Top5): The top 5 sessions by token consumption, in descending order.
- Token User Ranking (Top5): The top 5 users by token consumption, in descending order.
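The relationship between the three headline metrics can be reproduced from raw invocation counts. The sketch below uses hypothetical records (the tuple layout and request IDs are invented for illustration, not the ARMS data model): one user request may trigger several LLM calls, so the per-call and per-request averages differ.

```python
# Hypothetical invocation records: (request_id, tokens_consumed).
# A single user request can fan out into multiple LLM calls.
llm_calls = [
    ("req-1", 120), ("req-1", 80),
    ("req-2", 200),
    ("req-3", 150), ("req-3", 50),
]

total_tokens = sum(tokens for _, tokens in llm_calls)  # Token usage
avg_per_call = total_tokens / len(llm_calls)           # Avg tokens per LLM call
unique_requests = {rid for rid, _ in llm_calls}
avg_per_request = total_tokens / len(unique_requests)  # Avg tokens per request

print(total_tokens, avg_per_call, avg_per_request)  # 600 120.0 200.0
```

The per-minute panels apply the same formulas to one-minute buckets of the data, so a spike in either average points at either larger prompts/completions (per call) or more LLM calls per request.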