LLM metrics

This topic describes the common large language model (LLM) metrics that you can use to customize Grafana dashboards.

Common labels

Dimension description	Dimension key	Example
Service name	service	llm-rag-demo
Service PID	pid	ggxw4lnjuz@0cb8619bb54****
Server IP address	serverIp	127.0.0.1
Interface	rpc	query
Application source	source	xtrace: indicates that the application is monitored in Managed Service for OpenTelemetry. apm: indicates that the application is monitored in Application Real-Time Monitoring Service (ARMS).
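
The common labels can be used to narrow a dashboard query to a single application. The following is a minimal sketch, assuming the metrics are queried through a Prometheus-compatible data source in Grafana. The helper name label_selector is hypothetical, and the label values are taken from the examples in the preceding table.

```python
def label_selector(labels: dict[str, str]) -> str:
    """Render a PromQL label matcher such as {service="llm-rag-demo"}."""
    parts = []
    for key, value in sorted(labels.items()):
        # Escape backslashes and double quotes so the value stays a valid PromQL string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{escaped}"')
    return "{" + ",".join(parts) + "}"


# Example values from the common labels table.
selector = label_selector({"service": "llm-rag-demo", "source": "apm"})
print(selector)
```

The resulting matcher can be appended to any metric name in this topic to restrict the query to one service and source.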

Request metrics

Note

By design, the request metrics cover the protocols and invocation types supported by instrumentation, such as provided and dependent services. For more information, see Application monitoring metrics.

Metric description	Metric name	Measurement	Collection interval (Unit: seconds)	Unit	Dimension
Total number of requests	arms_$callType_requests_count	Gauge	15	None	Different dimensions are applicable to different service access types. For more information, see Application monitoring metrics.
Number of error requests	arms_$callType_requests_error_count	Gauge	15	None	Same as the preceding row.
Total request duration	arms_$callType_requests_seconds	Gauge	15	Seconds	Same as the preceding row.
Number of slow requests	arms_$callType_requests_slow_count	Gauge	15	None	Same as the preceding row.
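
The $callType placeholder in the metric names is replaced by the invocation type of the request. The following sketch shows one way to expand the four name patterns and to build an error ratio query from them. The call type http is an assumed value used only for illustration (see Application monitoring metrics for the supported types), and the function names are hypothetical.

```python
from string import Template

# Metric name patterns from the request metrics table. The braces in
# ${callType} are needed because Template would otherwise read
# "callType_requests_count" as a single placeholder name.
REQUEST_METRICS = {
    "count": Template("arms_${callType}_requests_count"),
    "errors": Template("arms_${callType}_requests_error_count"),
    "seconds": Template("arms_${callType}_requests_seconds"),
    "slow": Template("arms_${callType}_requests_slow_count"),
}


def request_metric_names(call_type: str) -> dict[str, str]:
    """Substitute a call type into every request metric name pattern."""
    return {key: tpl.substitute(callType=call_type) for key, tpl in REQUEST_METRICS.items()}


def error_ratio_query(call_type: str, service: str) -> str:
    """Build a PromQL expression: error requests divided by total requests."""
    names = request_metric_names(call_type)
    selector = f'{{service="{service}"}}'
    return f"sum({names['errors']}{selector}) / sum({names['count']}{selector})"


# "http" is an assumed call type, not one confirmed by this topic.
names = request_metric_names("http")
query = error_ratio_query("http", "llm-rag-demo")
print(query)
```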

In addition to the common labels, the following labels may also be used: modelName, spanKind, usageType.

Dimension description	Dimension key	Example	Remarks
Model name	modelName	gpt-4, text-davinci-003	None
Operation type	spanKind	LLM, CHAIN, or EMBEDDING. For more information, see Trace fields for LLM applications.	None
Usage type	usageType	input, output	Available only for token-related metrics

Operation types

Metric description	Metric name	Measurement	Collection interval (Unit: minutes)	Unit	Dimension
Number of LLM invocation requests	genai_calls_count	Gauge	1	None	modelName, spanKind
Response duration of LLM invocations	genai_calls_duration_seconds	Gauge	1	Seconds	modelName, spanKind
Number of LLM invocation errors	genai_calls_error_count	Gauge	1	None	modelName, spanKind
Number of slow LLM invocations	genai_calls_slow_count	Gauge	1	None	modelName, spanKind
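
A dashboard panel often needs the average duration per invocation rather than the raw values. The following sketch builds a PromQL expression that divides the duration metric by the call count for each model. It assumes that genai_calls_duration_seconds holds the duration accumulated over the collection interval, which this topic does not state explicitly. The function name is hypothetical.

```python
def avg_llm_duration_query(service: str, span_kind: str = "LLM") -> str:
    """Build a PromQL expression for the average invocation duration per model."""
    selector = f'{{service="{service}",spanKind="{span_kind}"}}'
    # Assumption: the duration gauge is a per-interval total, so dividing it
    # by the call count yields the mean duration of one invocation.
    total_seconds = f"sum by (modelName) (genai_calls_duration_seconds{selector})"
    total_calls = f"sum by (modelName) (genai_calls_count{selector})"
    return f"{total_seconds} / {total_calls}"


duration_query = avg_llm_duration_query("llm-rag-demo")
print(duration_query)
```

The same pattern with genai_calls_error_count or genai_calls_slow_count as the numerator gives the error ratio or the slow invocation ratio.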

LLM performance

Metric description	Metric name	Measurement	Collection interval (Unit: minutes)	Unit	Dimension
Time to first token (TTFT) for an LLM	genai_llm_first_token_seconds	Gauge	1	Seconds	modelName, spanKind

LLM usage

Metric description	Metric name	Measurement	Collection interval (Unit: minutes)	Unit	Dimension
Number of tokens used	genai_llm_usage_tokens	Gauge	1	None	modelName, spanKind, usageType (input or output)
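
Because the usageType label separates input tokens from output tokens, token usage is usually aggregated per model and usage type. The following sketch performs that aggregation on samples that have already been fetched. The sample values are made up for illustration, and the sample layout (a dictionary with modelName, usageType, and value keys) is an assumption rather than a format defined by ARMS.

```python
from collections import defaultdict

# Illustrative samples of genai_llm_usage_tokens. The values are invented.
samples = [
    {"modelName": "gpt-4", "usageType": "input", "value": 1200.0},
    {"modelName": "gpt-4", "usageType": "output", "value": 300.0},
    {"modelName": "gpt-4", "usageType": "input", "value": 800.0},
    {"modelName": "text-davinci-003", "usageType": "output", "value": 150.0},
]


def tokens_by_model_and_usage(samples: list[dict]) -> dict[tuple[str, str], float]:
    """Sum token counts for each (modelName, usageType) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for sample in samples:
        key = (sample["modelName"], sample["usageType"])
        totals[key] += sample["value"]
    return dict(totals)


totals = tokens_by_model_and_usage(samples)
for (model, usage), count in sorted(totals.items()):
    print(f"{model} {usage}: {count:.0f}")
```

In a Grafana panel, the equivalent grouping would be a sum by (modelName, usageType) over genai_llm_usage_tokens.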