
Application Real-Time Monitoring Service: Create alerts for a business trace

Last Updated: May 19, 2025

By default, the business trace data sources of Application Real-Time Monitoring Service (ARMS) are integrated with Managed Service for Prometheus, so you can use the Prometheus Query Language (PromQL) to define custom alert rules. This topic lists common alert metrics and sample PromQL statements for O&M and emergency response.

Note

The data sources are stored in the Managed Service for Prometheus instance metricstore-apm-metrics-custom_<regionId>_default-cms-<userId>-<region>.

Example

Take e-commerce ordering as an example: configure a business trace for VIP orders, and then configure an alert on the number of order requests from VIP customers. If the number of order requests within one minute exceeds the threshold (10 in the following rule), an alert is triggered.


Configure an alert rule

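Use a custom PromQL statement to create an alert rule. The following statement sums the VIP order requests to the /api/v1/biz/order operation over the last minute and triggers an alert when the total exceeds 10: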

sum by ()(sum_over_time_lorc(arms_biz_app_requests_count_raw{_biz_code="vip_order",serverIp=~".*",callKind=~"http|rpc|custom_entry|server|consumer|schedule",pid="awy7aw18hz@dd0231c44bd35bf",rpc=~"/api/v1/biz/order",source="apm",}[1m]))>10
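The following Python sketch models how this rule is evaluated, using made-up sample values. It assumes that sum_over_time_lorc adds up the raw samples in the [1m] window in the same way as the standard PromQL sum_over_time function; this topic does not document that function, so treat the model as an approximation. In PromQL, a comparison such as > 10 acts as a filter: the query returns a value only when the total exceeds 10, and a non-empty result is what triggers the alert. The function name vip_order_alert is hypothetical.

```python
from typing import Dict, List

# Hypothetical raw samples of arms_biz_app_requests_count_raw collected in the
# last minute, keyed by series (here: instance IP). The values are made up.
Samples = Dict[str, List[float]]


def vip_order_alert(samples: Samples, threshold: float = 10) -> bool:
    """Model `sum by ()(sum_over_time_lorc(...[1m])) > 10`.

    Assumption: sum_over_time_lorc sums every sample in the 1-minute window,
    like PromQL's sum_over_time.
    """
    # Inner step: sum each series over the 1-minute window.
    per_series = {series: sum(values) for series, values in samples.items()}
    # Outer step: `sum by ()` has no grouping labels, so all series collapse
    # into a single total.
    total = sum(per_series.values())
    # The alert condition holds only when the total is strictly greater
    # than the threshold.
    return total > threshold


if __name__ == "__main__":
    window = {"10.0.0.1": [3, 4], "10.0.0.2": [2, 2]}  # 11 requests in total
    print(vip_order_alert(window))
```

Because the comparison is strict, exactly 10 requests in one minute does not trigger the alert.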


View alert notifications

In the left-side navigation pane of the ARMS console, choose Alert Management > Alert Event History and view the alert events.

Common PromQL templates

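In the following templates, replace $dims with the labels to group by (leave it empty to aggregate all series into one result) and $filters with comma-separated label matchers: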

Metric

Description

Number of calls

The number of calls to your application's entry point. This metric reflects the health of traffic volume: an abnormal spike can indicate overload, and a sudden drop can indicate service degradation. Protocol-specific labels, such as HTTP and Dubbo, let you analyze calls by request type.

sum by ($dims) (sum_over_time_lorc(arms_biz_app_requests_count_raw{$filters}[1m]))

Call error rate (%)

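The number of errors divided by the total number of calls to your application's entry point. The expression returns a ratio between 0 and 1; multiply it by 100 to express it as a percentage.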

sum by ($dims)(sum_over_time_lorc(arms_biz_app_requests_error_count_raw{$filters}[1m])) / sum by ($dims)(sum_over_time_lorc(arms_biz_app_requests_count_raw{$filters}[1m]))

Call response time

The average response time of calls to your application's entry point, computed as the total call duration divided by the number of calls. This metric helps identify slow requests.

sum by ($dims) (sum_over_time_lorc(arms_biz_app_requests_seconds_raw{$filters}[1m])) / sum by ($dims) (sum_over_time_lorc(arms_biz_app_requests_count_raw{$filters}[1m]))

Number of exceptions

The number of exceptions thrown at runtime, such as null pointer exceptions, array index out-of-bounds exceptions, and I/O exceptions. This metric indicates whether errors are being thrown in the call stack.

sum by ($dims) (sum_over_time_lorc(arms_biz_exception_requests_count_raw{$filters}[1m]))

Database request response time

The average time between an application sending a database query and receiving the response, computed as the total query duration divided by the number of queries. High latency degrades application performance and can cause user-perceived delays or timeouts.

sum by ($dims) (sum_over_time_lorc(arms_biz_db_requests_seconds_raw{$filters}[1m])) / sum by ($dims) (sum_over_time_lorc(arms_biz_db_requests_count_raw{$filters}[1m]))
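If you generate alert rules from scripts, the placeholders can be filled with simple string substitution. The following Python sketch uses the standard library string.Template, whose $name syntax happens to match the $dims and $filters placeholders in the templates above. The TEMPLATES dictionary and the render function are hypothetical helpers, not part of ARMS; the two template strings are copied from the table.

```python
from string import Template

# Two templates copied verbatim from the table above.
TEMPLATES = {
    "calls": (
        "sum by ($dims) (sum_over_time_lorc("
        "arms_biz_app_requests_count_raw{$filters}[1m]))"
    ),
    "error_rate": (
        "sum by ($dims)(sum_over_time_lorc("
        "arms_biz_app_requests_error_count_raw{$filters}[1m]))"
        " / sum by ($dims)(sum_over_time_lorc("
        "arms_biz_app_requests_count_raw{$filters}[1m]))"
    ),
}


def render(name: str, dims: str = "", filters: str = "") -> str:
    """Fill in $dims (grouping labels) and $filters (label matchers).

    An empty dims string produces `sum by ()`, which aggregates all series
    into a single result. substitute() raises KeyError if the template
    contains a placeholder other than $dims or $filters.
    """
    return Template(TEMPLATES[name]).substitute(dims=dims, filters=filters)


if __name__ == "__main__":
    print(render("error_rate", dims="serverIp",
                 filters='_biz_code="xxxxx",callType="http"'))
```

With an empty dims value and the filters _biz_code="xxxxx",callType="http", render("error_rate", ...) reproduces the error rate statement in the next paragraph; with dims="serverIp", it reproduces the per-instance variant.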

To configure alerts for the error rate of all HTTP operations of a specific business, fill in the call error rate template as follows (replace xxxxx with your business code):

sum by ()(sum_over_time_lorc(arms_biz_app_requests_error_count_raw{_biz_code="xxxxx",callType="http"}[1m])) / sum by ()(sum_over_time_lorc(arms_biz_app_requests_count_raw{_biz_code="xxxxx",callType="http"}[1m]))

You can also modify the statement to group alerts by instance IP address:

sum by (serverIp)(sum_over_time_lorc(arms_biz_app_requests_error_count_raw{_biz_code="xxxxx",callType="http"}[1m])) / sum by (serverIp)(sum_over_time_lorc(arms_biz_app_requests_count_raw{_biz_code="xxxxx",callType="http"}[1m]))
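The grouped statement divides the summed errors of each instance by the summed calls of the same instance. The following Python sketch illustrates that arithmetic on made-up per-series values; it assumes each row already holds the one-minute totals produced by sum_over_time_lorc. The same "ratio of sums" applies to the response time templates, where the summed seconds are divided by the summed counts. The function error_rate_by and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-series values after the sum_over_time_lorc(...[1m]) step:
# (serverIp, error count, request count). One instance can have several
# series, for example one per operation (rpc).
Row = Tuple[str, float, float]


def error_rate_by(rows: Iterable[Row]) -> Dict[str, float]:
    """Model `sum by (serverIp)(errors) / sum by (serverIp)(requests)`."""
    errors: Dict[str, float] = defaultdict(float)
    requests: Dict[str, float] = defaultdict(float)
    for server_ip, err, req in rows:
        errors[server_ip] += err
        requests[server_ip] += req
    # Divide the per-instance sums rather than averaging per-series ratios.
    # Instances with zero requests are skipped here; PromQL would return NaN
    # for 0/0 instead.
    return {ip: errors[ip] / requests[ip] for ip in requests if requests[ip] > 0}


if __name__ == "__main__":
    rows = [
        ("10.0.0.1", 1, 90),  # busy operation with few errors
        ("10.0.0.1", 3, 10),  # rarely called operation with many errors
        ("10.0.0.2", 0, 20),
    ]
    print(error_rate_by(rows))
```

For 10.0.0.1, the result is 4 / 100 = 0.04. Averaging the two per-series ratios instead would give about 0.16 and overweight the rarely called operation, which is why the template sums before dividing.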

Available labels - common

Dimension

Description

_biz_code

The business label that identifies a specific business trace, such as vip_order in the preceding example.

pid

The application process ID (PID).

service

The application name.

serverIp

The instance IP address.

rpc

The operation name.

callType

The call type of the API operation:

  • Provided services: HTTP, Dubbo, HSF, DSF, user_method, MQ, Kafka, gRPC, Thrift, and Sofa.

  • Dependent services: HTTP_Client, Dubbo_Client, HSF_Client, DSF_Client, Notify_Client, gRPC_Client, Thrift_Client, Sofa_Client, MQ_Client, and Kafka_Client.

  • Databases: MySQL, Oracle, MariaDB, PostgreSQL, PPAS, SQLServer, MongoDB, and DmDB.

callKind

The main category of the API operation:

  • Provided services: HTTP, RPC, Custom_Entry, Server, and Consumer.

  • Dependent services: HTTP_Client, RPC_Client, Client, and Producer.

  • Databases: SQL, DB, NoSQL, and Cache.
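The labels above are combined into the $filters placeholder as comma-separated PromQL label matchers, such as _biz_code="vip_order",callKind=~"http|rpc". The following Python sketch builds such a string and escapes backslashes and double quotes so that each value remains a valid double-quoted PromQL string. The build_filters helper is hypothetical. Note that PromQL regular-expression matchers (=~ and !~) are fully anchored, and that label values are matched case-sensitively, so check the exact values stored in your instance (the sample statement in this topic uses lowercase callKind values such as http and rpc).

```python
from typing import Iterable, Tuple

# (label, operator, value); the operators are the four PromQL label matchers.
Matcher = Tuple[str, str, str]
_OPERATORS = {"=", "!=", "=~", "!~"}


def build_filters(matchers: Iterable[Matcher]) -> str:
    """Join label matchers into a string suitable for {$filters}."""
    parts = []
    for label, op, value in matchers:
        if op not in _OPERATORS:
            raise ValueError(f"unsupported matcher operator: {op}")
        # Escape backslashes first, then double quotes, so that the value
        # parses back to the original text inside a double-quoted string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{label}{op}"{escaped}"')
    return ",".join(parts)


if __name__ == "__main__":
    print(build_filters([
        ("_biz_code", "=", "vip_order"),
        ("callKind", "=~", "http|rpc"),
        ("serverIp", "=~", r"10\.0\..*"),  # regex: instances in 10.0.x.x
    ]))
```

Escaping the regex backslashes matters: the PromQL parser unescapes the string once, so a value written as "10\\.0\\..*" reaches the regex engine as 10\.0\..*, which matches a literal dot.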

Exclusive labels: arms_biz_exception_requests_count_raw

Dimension

Description

excepName

The exception name.

Exclusive labels: arms_biz_db_requests_seconds_raw

Dimension

Description

destId

The database name.