Application Real-Time Monitoring Service: Create alerts for a business trace

Last Updated: Mar 11, 2026

Application Real-Time Monitoring Service (ARMS) stores business trace metrics in Managed Service for Prometheus, which lets you write PromQL-based alert rules for request count, error rate, response time, exception count, and database latency. When a metric crosses your threshold, ARMS triggers an alert and sends a notification through the channel you configure.

Business trace metrics are stored in the Managed Service for Prometheus instance metricstore-apm-metrics-custom_<regionId>_default-cms-<userId>-<region>.

How it works

ARMS collects metrics from each business trace and writes them to a Managed Service for Prometheus instance. You create an alert rule with a PromQL expression that targets a metric and threshold. When the condition is met, ARMS generates an alert event and delivers a notification.

Example scenario: An e-commerce platform defines a "VIP order" business trace. The operations team creates an alert that fires when VIP order requests exceed 10 per minute, enabling immediate response to abnormal traffic.

Figure: Business trace alert flow for a VIP order scenario
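To make the evaluation concrete, the following minimal Python sketch mimics the VIP order condition offline: it counts the request timestamps that fall inside a 1-minute window and compares the total with the threshold of 10. The function name vip_order_alert_fires and the sample timestamps are illustrative assumptions. ARMS evaluates the real rule on the server side with PromQL, not with this code.

```python
from datetime import datetime, timedelta


def vip_order_alert_fires(request_times, now, threshold=10, window=timedelta(minutes=1)):
    """Return True if more than `threshold` requests fall in the window ending at `now`.

    Illustrative only: mirrors the "more than 10 requests per minute" condition
    from the example scenario. It is not how ARMS evaluates rules internally.
    """
    window_start = now - window
    # Count requests in the half-open interval (window_start, now].
    recent = sum(1 for t in request_times if window_start < t <= now)
    return recent > threshold


now = datetime(2026, 3, 11, 12, 0, 0)
# 11 requests spread over the last 50 seconds: exceeds the threshold of 10.
busy = [now - timedelta(seconds=5 * i) for i in range(11)]
# 10 requests in the same window: equal to the threshold, so no alert.
quiet = busy[:10]

print(vip_order_alert_fires(busy, now))   # True
print(vip_order_alert_fires(quiet, now))  # False
```

Note that the comparison is strict: exactly 10 requests in the window does not fire, which matches the >10 condition used in the example rule.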

Create an alert rule

Prerequisites

Before you begin, make sure that you have:

  • An ARMS application with at least one business trace configured

  • Access to the ARMS console

Procedure

  1. Log on to the ARMS console.

  2. Use a custom PromQL statement to create an alert rule.

  3. In the PromQL expression field, enter a query that targets your business trace metric. The following example fires an alert when VIP order requests exceed 10 per minute:

    sum by ()(sum_over_time_lorc(arms_biz_app_requests_count_raw{_biz_code="vip_order",serverIp=~".*",callKind=~"http|rpc|custom_entry|server|consumer|schedule",pid="awy7aw18hz@dd0231c44bd35bf",rpc=~"/api/v1/biz/order",source="apm",}[1m]))>10

  4. Configure the alert threshold and notification channel.

    Figure: Alert rule configuration

View alert events

After a rule fires, view the alert event details:

  1. In the left-side navigation pane of the ARMS console, choose Alert Management > Alert Event History.

  2. Locate the alert event to review the metric value that triggered it.

PromQL query structure

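All business trace alert queries are built from this pattern. Ratio metrics such as error rate divide one instance of the pattern by another, and an alert condition appends a comparison such as >10: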

sum by (<grouping-labels>)(sum_over_time_lorc(<metric>{<label-filters>}[1m]))

| Component | Purpose | Example |
| --- | --- | --- |
| sum by (<grouping-labels>) | Groups results by one or more labels. Use empty parentheses () for a single aggregated value. | sum by (serverIp) groups by instance IP. |
| sum_over_time_lorc(<metric>{...}[1m]) | Aggregates the raw metric over a 1-minute window. | sum_over_time_lorc(arms_biz_app_requests_count_raw{...}[1m]) |
| <label-filters> | Filters the metric by label values such as _biz_code, callType, or rpc. | _biz_code="vip_order",callType="http" |

How to replace template variables

When using the PromQL templates in the next section, replace the placeholder variables with actual values:

| Variable | Replace with | Example |
| --- | --- | --- |
| <grouping-labels> | One or more label names to group by, or leave empty for a single value | serverIp or empty () |
| <label-filters> | Comma-separated label matchers | _biz_code="vip_order",callType="http" |

See Available labels for all supported label names and values.
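If you generate rules from scripts, the substitution can be automated. The following is a minimal Python sketch, not an ARMS tool: the helper name render_query is hypothetical, it renders exact-match (=) matchers only, and it does not escape quotes or backslashes in label values. Regex matchers (=~), as used in the VIP order example, are outside its scope.

```python
def render_query(metric, label_filters=None, grouping_labels=()):
    """Fill the business trace query pattern with concrete values.

    label_filters: mapping of label name -> exact-match value (rendered with '=').
    grouping_labels: label names for 'sum by (...)'; empty means one aggregated value.
    """
    matchers = ",".join(
        f'{name}="{value}"' for name, value in (label_filters or {}).items()
    )
    group = ",".join(grouping_labels)
    # Doubled braces emit the literal { and } around the label matchers.
    return f"sum by ({group})(sum_over_time_lorc({metric}{{{matchers}}}[1m]))"


expr = render_query(
    "arms_biz_app_requests_count_raw",
    {"_biz_code": "vip_order", "callType": "http"},
    ["serverIp"],
)
print(expr)
```

With these inputs the sketch prints the request count template with <grouping-labels> set to serverIp and <label-filters> set to _biz_code="vip_order",callType="http". To turn the result into an alert condition, append a comparison such as >10.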

PromQL templates

Request count

Tracks the number of calls to an application entry point. Use this to detect traffic spikes (overload) or drops (service degradation).

sum by (<grouping-labels>)(sum_over_time_lorc(arms_biz_app_requests_count_raw{<label-filters>}[1m]))

Error rate

Calculates the ratio of failed calls to total calls at an application entry point.

sum by (<grouping-labels>)(sum_over_time_lorc(arms_biz_app_requests_error_count_raw{<label-filters>}[1m])) / sum by (<grouping-labels>)(sum_over_time_lorc(arms_biz_app_requests_count_raw{<label-filters>}[1m]))
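Because the template divides failed calls by total calls, the result is a fraction rather than a percentage, so a 5% threshold is written as 0.05. The short Python sketch below works through the arithmetic with hypothetical counts. The error_rate helper is illustrative, and returning None for an empty window is a choice made for this sketch; how the ARMS backend treats a zero denominator is not stated in this topic.

```python
def error_rate(error_count, total_count):
    """Return failed calls / total calls as a fraction, or None when there were no calls."""
    if total_count == 0:
        # No traffic in the window: the ratio is undefined, so report no value.
        return None
    return error_count / total_count


# Hypothetical 1-minute window: 3 failed calls out of 120.
rate = error_rate(3, 120)
print(rate)         # 0.025
print(rate > 0.05)  # False: 2.5% is below a 5% threshold
```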

Response time

Measures the average response time per call. Use this to identify slow requests.

sum by (<grouping-labels>)(sum_over_time_lorc(arms_biz_app_requests_seconds_raw{<label-filters>}[1m])) / sum by (<grouping-labels>)(sum_over_time_lorc(arms_biz_app_requests_count_raw{<label-filters>}[1m]))

Exception count

Counts runtime exceptions such as null pointer exceptions, array index out-of-bounds exceptions, and I/O exceptions.

sum by (<grouping-labels>)(sum_over_time_lorc(arms_biz_exception_requests_count_raw{<label-filters>}[1m]))

Database response time

Measures the average time between a database query and its response. High latency here can cause user-facing delays or timeouts.

sum by (<grouping-labels>)(sum_over_time_lorc(arms_biz_db_requests_seconds_raw{<label-filters>}[1m])) / sum by (<grouping-labels>)(sum_over_time_lorc(arms_biz_db_requests_count_raw{<label-filters>}[1m]))

Example: HTTP error rate by instance

Alert on the error rate of all HTTP operations for a specific business trace, grouped by instance IP:

sum by (serverIp)(sum_over_time_lorc(arms_biz_app_requests_error_count_raw{_biz_code="xxxxx",callType="http"}[1m])) / sum by (serverIp)(sum_over_time_lorc(arms_biz_app_requests_count_raw{_biz_code="xxxxx",callType="http"}[1m]))

For a single aggregated error rate without per-instance breakdown, remove the grouping label:

sum by ()(sum_over_time_lorc(arms_biz_app_requests_error_count_raw{_biz_code="xxxxx",callType="http"}[1m])) / sum by ()(sum_over_time_lorc(arms_biz_app_requests_count_raw{_biz_code="xxxxx",callType="http"}[1m]))
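The choice between the two forms changes which problems an alert can see. The following Python sketch, using hypothetical per-series totals for a single 1-minute window, reproduces the "sum first, then divide" order of the queries above for both groupings. The data and the error_rate_by helper are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical totals for one 1-minute window: (serverIp, rpc, errors, requests).
samples = [
    ("10.0.0.1", "/api/v1/biz/order", 3, 100),
    ("10.0.0.1", "/api/v1/biz/pay", 0, 50),
    ("10.0.0.2", "/api/v1/biz/order", 9, 50),
]


def error_rate_by(samples, group_by_ip):
    """Sum errors and requests per group first, then divide, as the queries do."""
    totals = defaultdict(lambda: [0, 0])
    for server_ip, _rpc, errors, requests in samples:
        # An empty key stands in for 'sum by ()', a single aggregated series.
        key = server_ip if group_by_ip else ""
        totals[key][0] += errors
        totals[key][1] += requests
    return {key: errs / reqs for key, (errs, reqs) in totals.items()}


by_instance = error_rate_by(samples, group_by_ip=True)
overall = error_rate_by(samples, group_by_ip=False)
print(by_instance)  # {'10.0.0.1': 0.02, '10.0.0.2': 0.18}
print(overall)      # {'': 0.06}
```

With a threshold of 0.1, the per-instance form would flag 10.0.0.2 (0.18), while the aggregated form stays below the threshold (0.06) because the healthy instance dilutes the ratio.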

Available labels

Common labels

These labels are available for filtering (<label-filters>) and grouping (<grouping-labels>).

| Label | Description |
| --- | --- |
| _biz_code | Business trace identifier. |
| pid | Application process ID (PID). |
| service | Application name. |
| serverIp | Instance IP address. |
| rpc | Operation name. |
| callType | Protocol-level operation type. See callType values below. |
| callKind | High-level operation category. See callKind values below. |

callType values

| Category | Values |
| --- | --- |
| Provided services | HTTP, Dubbo, HSF, DSF, user_method, MQ, Kafka, gRPC, Thrift, Sofa |
| Dependent services | HTTP_Client, Dubbo_Client, HSF_Client, DSF_Client, Notify_Client, gRPC_Client, Thrift_Client, Sofa_Client, MQ_Client, Kafka_Client |
| Databases | MySQL, Oracle, Mariadb, PostgresQL, PPAS, SQLServer, MongoDB, DmDB |

callKind values

| Category | Values |
| --- | --- |
| Provided services | HTTP, RPC, Custom_Entry, Server, Consumer |
| Dependent services | HTTP_Client, RPC_Client, Client, Producer |
| Databases | SQL, DB, NoSQL, Cache |

Metric-specific labels

Some metrics support additional labels:

| Metric | Label | Description |
| --- | --- | --- |
| arms_biz_exception_requests_count_raw | excepName | Exception class name. |
| arms_biz_db_requests_seconds_raw | destId | Database name. |
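A mistyped label name usually produces an empty result rather than an error, so it can help to check label names before saving a rule. The following Python sketch encodes only the labels listed in the tables of this topic. The unknown_labels helper is hypothetical, and the lists may be incomplete: the VIP order example also filters on a source label that the tables do not list, so treat a reported name as a prompt to double-check rather than as a definite mistake.

```python
# Labels from the "Common labels" table.
COMMON_LABELS = {"_biz_code", "pid", "service", "serverIp", "rpc", "callType", "callKind"}

# Extra labels from the "Metric-specific labels" table.
METRIC_LABELS = {
    "arms_biz_exception_requests_count_raw": {"excepName"},
    "arms_biz_db_requests_seconds_raw": {"destId"},
}


def unknown_labels(metric, labels):
    """Return the labels that this topic does not list for the given metric, sorted."""
    allowed = COMMON_LABELS | METRIC_LABELS.get(metric, set())
    return sorted(set(labels) - allowed)


# excepName is listed for the exception metric, so nothing is reported.
print(unknown_labels("arms_biz_exception_requests_count_raw", ["serverIp", "excepName"]))  # []
# Label names are compared case-sensitively: calltype is not callType.
print(unknown_labels("arms_biz_app_requests_count_raw", ["excepName", "calltype"]))  # ['calltype', 'excepName']
```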