Application Real-Time Monitoring Service: LLM metrics

Last Updated: Mar 28, 2025

This topic describes the common large language model (LLM) metrics that you can use to build custom Grafana dashboards.

Common labels

| Dimension description | Dimension key | Example |
| --- | --- | --- |
| Service name | service | llm-rag-demo |
| Service PID | pid | ggxw4lnjuz@0cb8619bb54**** |
| Server IP address | serverIp | 127.0.0.1 |
| Interface | rpc | query |
| Application source | source | xtrace (the application is monitored in Managed Service for OpenTelemetry) or apm (the application is monitored in Application Real-Time Monitoring Service (ARMS)) |
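As a rough illustration of how these labels might be used in a Grafana panel, the sketch below renders a set of common labels as a label matcher. It assumes the metrics are queried with PromQL, which this topic does not state explicitly; the helper name `label_selector` is hypothetical.

```python
def label_selector(labels: dict[str, str]) -> str:
    """Render a PromQL-style label matcher, e.g. {service="llm-rag-demo"}.

    Assumption: the data source behind the dashboard accepts PromQL matchers.
    """
    parts = []
    for key, value in sorted(labels.items()):
        # Escape backslashes and double quotes so the matcher stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{escaped}"')
    return "{" + ",".join(parts) + "}"


# Label keys and example values are taken from the table above.
print(label_selector({"service": "llm-rag-demo", "source": "apm"}))
# prints {service="llm-rag-demo",source="apm"}
```

Sorting the keys keeps the generated selector stable, which makes dashboard definitions easier to diff.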

Request metrics

Note

The request metrics cover every protocol and call type that the instrumentation supports, for both provided services and dependent services. For more information, see Application monitoring metrics.

| Metric description | Metric name | Measurement | Collection interval (seconds) | Unit |
| --- | --- | --- | --- | --- |
| Total number of requests | arms_$callType_requests_count | Gauge | 15 | None |
| Number of error requests | arms_$callType_requests_error_count | Gauge | 15 | None |
| Total request duration | arms_$callType_requests_seconds | Gauge | 15 | Seconds |
| Number of slow requests | arms_$callType_requests_slow_count | Gauge | 15 | None |

Dimension (all metrics in this table): The applicable dimensions vary by service access type. For more information, see Application monitoring metrics.
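The `$callType` segment in these metric names is a placeholder for the call type. A minimal sketch of the expansion follows, together with the average request duration derived from the total duration and the request count. The call type `http` and the sample values are assumptions for illustration only; the supported call types are listed in Application monitoring metrics.

```python
def request_metric_names(call_type: str) -> dict[str, str]:
    """Expand the $callType placeholder into the four request metric names."""
    prefix = f"arms_{call_type}_requests"
    return {
        "count": f"{prefix}_count",
        "error": f"{prefix}_error_count",
        "duration": f"{prefix}_seconds",
        "slow": f"{prefix}_slow_count",
    }


def average_duration_seconds(total_seconds: float, request_count: float) -> float:
    """Average duration = total request duration / total number of requests."""
    return total_seconds / request_count if request_count else 0.0


names = request_metric_names("http")  # "http" is a hypothetical call type
print(names["duration"])                    # prints arms_http_requests_seconds
print(average_duration_seconds(12.5, 50))   # prints 0.25 (made-up sample values)
```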

LLM metrics

In addition to the common labels, LLM metrics can carry the following labels: modelName, spanKind, and usageType.

| Dimension description | Dimension key | Example | Remarks |
| --- | --- | --- | --- |
| Model name | modelName | gpt-4, text-davinci-003 | None |
| Operation type | spanKind | LLM, CHAIN, or EMBEDDING. For more information, see Trace fields for LLM applications. | None |
| Usage type | usageType | input, output | Available only for token-related metrics |

Operation types

| Metric description | Metric name | Measurement | Collection interval (minutes) | Unit | Dimension |
| --- | --- | --- | --- | --- | --- |
| Number of LLM invocation requests | genai_calls_count | Gauge | 1 | None | modelName, spanKind |
| Response duration of LLM invocations | genai_calls_duration_seconds | Gauge | 1 | Seconds | modelName, spanKind |
| Number of LLM invocation errors | genai_calls_error_count | Gauge | 1 | None | modelName, spanKind |
| Number of slow LLM invocations | genai_calls_slow_count | Gauge | 1 | None | modelName, spanKind |
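One common use of the call metrics is an error ratio per model and operation type. The sketch below builds such a query string, assuming the dashboard data source accepts PromQL; the function name `llm_error_ratio_query` is hypothetical, and whether the raw gauge values should first be aggregated over time depends on your setup.

```python
def llm_error_ratio_query(service: str,
                          group_by: tuple[str, ...] = ("modelName", "spanKind")) -> str:
    """Build a PromQL-style expression: LLM invocation errors / LLM invocations.

    Assumption: both metrics share the same label set, so the two sides
    of the division match one-to-one after grouping.
    """
    selector = f'{{service="{service}"}}'
    by = ", ".join(group_by)
    errors = f"sum by ({by}) (genai_calls_error_count{selector})"
    calls = f"sum by ({by}) (genai_calls_count{selector})"
    return f"{errors} / {calls}"


# The service name is the example value from the common labels table.
print(llm_error_ratio_query("llm-rag-demo"))
```

The same pattern, with `genai_calls_slow_count` as the numerator, would give a slow-invocation ratio.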

LLM performance

| Metric description | Metric name | Measurement | Collection interval (minutes) | Unit | Dimension |
| --- | --- | --- | --- | --- | --- |
| Time to first token (TTFT) for an LLM | genai_llm_first_token_seconds | Gauge | 1 | Seconds | modelName, spanKind |

LLM usage

| Metric description | Metric name | Measurement | Collection interval (minutes) | Unit | Dimension |
| --- | --- | --- | --- | --- | --- |
| Count of used tokens | genai_llm_usage_tokens | Gauge | 1 | None | modelName, spanKind, usageType (input or output) |
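Because `genai_llm_usage_tokens` is split by `usageType`, input and output tokens can be totaled separately per model. The following sketch does this over a list of labeled samples; the sample values are invented for illustration, and the `(labels, value)` tuple shape is an assumption about how you export the data, not a format defined by ARMS.

```python
from collections import defaultdict


def token_totals(samples):
    """Sum genai_llm_usage_tokens samples per model, split by usageType."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for labels, value in samples:
        totals[labels["modelName"]][labels["usageType"]] += value
    return dict(totals)


# Hypothetical samples: (labels, token count).
samples = [
    ({"modelName": "gpt-4", "spanKind": "LLM", "usageType": "input"}, 1200),
    ({"modelName": "gpt-4", "spanKind": "LLM", "usageType": "output"}, 300),
    ({"modelName": "gpt-4", "spanKind": "LLM", "usageType": "input"}, 800),
]
print(token_totals(samples))
# prints {'gpt-4': {'input': 2000, 'output': 300}}
```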