This topic describes common metrics in ARMS application monitoring. You can use these metrics to customize Grafana dashboards.
Applications connected through Observable Link OpenTelemetry version support only business metrics. Other metrics, such as JVM metrics and system metrics, are not supported.
Business metrics
Common dimensions
Dimension name | Dimension key |
Service name | service |
Service PID | pid |
Machine IP | serverIp |
Interface | rpc |
Metrics list
All access types include the following metrics. When executing queries, you only need to replace $callType with the specific access type. For detailed access types, see Service access types and available dimensions.
For example, to query the number of requests for HTTP services, you only need to modify arms_$callType_requests_count to arms_http_requests_count.
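The substitution described above can be sketched as a small helper. This is only an illustration: the function name resolve_metric and its validation rule are assumptions made for this example, not part of ARMS.

```python
import re

PLACEHOLDER = "$callType"

def resolve_metric(template: str, call_type: str) -> str:
    """Replace the $callType placeholder in a metric template.

    resolve_metric is a hypothetical helper for illustration only.
    """
    if PLACEHOLDER not in template:
        raise ValueError(f"template has no {PLACEHOLDER} placeholder: {template}")
    # Assumption: access types are lowercase identifiers such as http or xxl_job.
    if not re.fullmatch(r"[a-z][a-z0-9_]*", call_type):
        raise ValueError(f"unexpected access type: {call_type!r}")
    return template.replace(PLACEHOLDER, call_type)

print(resolve_metric("arms_$callType_requests_count", "http"))
```

The same helper works for any access type listed in Service access types and available dimensions, for example dubbo or mysql.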
Metric name | Metric | Metric type | Collection interval | Unit | Dimension |
Request count | arms_$callType_requests_count | Gauge | 15s | None | Different service access types have different dimensions. For detailed information, see Service access types and available dimensions. |
Error request count | arms_$callType_requests_error_count | Gauge | 15s | None | |
Request duration | arms_$callType_requests_seconds | Gauge | 15s | Seconds | |
Slow request count | arms_$callType_requests_slow_count | Gauge | 15s | None | |
Request latency percentiles | arms_$callType_requests_latency_seconds | Summary | 15s | Seconds | Only exists when the service access type is HTTP and percentile statistics are enabled. For operations to enable percentile statistics, see Advanced settings. Quantile percentile dimensions:
|
Except for the percentile metric, all of the metrics above are of the Gauge type: the value of each data point is the total accumulated within one collection interval, not a running total. This differs from metrics generated by open-source frameworks. For example, to calculate the average QPS over 1 minute, the PromQL for ARMS metrics is sum_over_time(arms_$callType_requests_count[1m])/60, whereas open-source frameworks typically use rate(http_server_requests_count[1m]).
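To make the arithmetic concrete, the following sketch builds the query string and reproduces the calculation on sample data. The function names and the sample values are assumptions made for this illustration; the window is written in seconds ([60s] is equivalent to [1m] in PromQL).

```python
from typing import Sequence

def average_qps_query(call_type: str, window_seconds: int = 60) -> str:
    """Build the PromQL for average QPS over a window (hypothetical helper)."""
    metric = f"arms_{call_type}_requests_count"
    return f"sum_over_time({metric}[{window_seconds}s])/{window_seconds}"

def average_qps(points: Sequence[float], window_seconds: int) -> float:
    """Mimic the query locally: each point is the request total of one
    15-second collection interval, so the points are summed and then
    divided by the window length in seconds."""
    return sum(points) / window_seconds

# Made-up data: four 15-second points covering one minute.
points = [30, 45, 15, 30]
print(average_qps_query("http"))
print(average_qps(points, 60))  # 120 requests / 60 s = 2.0
```

Applying rate() to these points would be misleading, because rate() assumes a monotonically increasing counter, while each ARMS point already holds a per-interval total.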
Aggregated business metrics
Business metrics are created separately for each call type, so PromQL becomes very long when a single application contains multiple call types (for example, when application A contains both HTTP and Dubbo call types).
Business metrics also record the complete set of observation dimensions. Some statistical scenarios need only a few of these dimensions, so querying business metrics directly can perform poorly.
To solve these two problems, ARMS provides aggregated business metrics.
Metrics description
Aggregated business metrics are divided into the following types:
General type: Records the request count, error count, slow request count, and average request duration for all access types.
Database type: Records the request count, error count, slow request count, and average request duration for database access types.
SQL type: Records the same data as the database type for database access types, with an additional SQL dimension.
Exception type: Records the request count and average request duration when exceptions occur, for all access types.
Status code type: Records the request count for each status code in HTTP scenarios.
Percentile type: Records request duration percentiles for all access types.
Except for the percentile type, each category contains two kinds of metrics: the regular aggregated metric, named xxx_raw, and the dimension-reduced metric, named xxx_ign_x_y, where x and y are the dimensions that were aggregated away (the metric does not include the x and y dimensions).
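The naming convention can be decoded mechanically. The sketch below is a hypothetical helper, not an ARMS API; it assumes that dimension names contain no underscores and that the lowercase destid in a metric name corresponds to the destId dimension.

```python
from typing import List

# Assumption: metric names spell destId in lowercase.
DIMENSION_SPELLING = {"destid": "destId"}

def ignored_dimensions(metric: str) -> List[str]:
    """Return the dimensions dropped by an aggregated metric.

    xxx_raw keeps every dimension; xxx_ign_x_y drops x and y.
    """
    if metric.endswith("_raw"):
        return []
    _, marker, suffix = metric.partition("_ign_")
    if not marker or not suffix:
        raise ValueError(f"not an xxx_raw or xxx_ign_x_y metric: {metric}")
    return [DIMENSION_SPELLING.get(name, name) for name in suffix.split("_")]

print(ignored_dimensions("arms_app_requests_count_ign_destid_endpoint_rpc"))
# ['destId', 'endpoint', 'rpc']
```

Percentile metrics such as arms_uni_requests_latency_seconds follow neither pattern, so this helper rejects them with a ValueError.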
Metric types and collection intervals
Unless otherwise specified, all aggregated business metrics are Gauge type with a collection interval of 15 seconds.
Common dimensions
The following dimensions exist in each aggregated metric and are explained uniformly below:
Dimension | Description |
pid | Application PID |
service | Application name |
serverIp | Instance IP |
source | Metric source:
|
Metrics list
Metric category | Metric name | Metric | Unit | Other dimensions |
General type | Request count | arms_app_requests_count_raw | Count | |
General type | Request count | arms_app_requests_count_ign_destid_endpoint_rpc | Count | Does not include the destId, endpoint, and rpc dimensions. |
General type | Request count | arms_app_requests_count_ign_destid_endpoint_ppid_prpc | Count | Does not include the destId, endpoint, ppid, and prpc dimensions. |
General type | Request count | arms_app_requests_count_ign_destid_endpoint_ppid_prpc_rpc | Count | Does not include the destId, endpoint, ppid, prpc, and rpc dimensions. |
General type | Request count | arms_app_requests_count_ign_parent_ppid_prpc_rpc | Count | Does not include the parent, ppid, prpc, and rpc dimensions. |
General type | Request count | arms_app_requests_count_ign_endpoint_parent_ppid_prpc_rpc | Count | Does not include the endpoint, parent, ppid, prpc, and rpc dimensions. |
General type | Error request count | arms_app_requests_error_count_raw | Count | |
General type | Error request count | arms_app_requests_error_count_ign_destid_endpoint_rpc | Count | Does not include the destId, endpoint, and rpc dimensions. |
General type | Error request count | arms_app_requests_error_count_ign_destid_endpoint_ppid_prpc | Count | Does not include the destId, endpoint, ppid, and prpc dimensions. |
General type | Error request count | arms_app_requests_error_count_ign_destid_endpoint_ppid_prpc_rpc | Count | Does not include the destId, endpoint, ppid, prpc, and rpc dimensions. |
General type | Error request count | arms_app_requests_error_count_ign_parent_ppid_prpc_rpc | Count | Does not include the parent, ppid, prpc, and rpc dimensions. |
General type | Error request count | arms_app_requests_error_count_ign_endpoint_parent_ppid_prpc_rpc | Count | Does not include the endpoint, parent, ppid, prpc, and rpc dimensions. |
General type | Slow request count | arms_app_requests_slow_count_raw | Count | |
General type | Slow request count | arms_app_requests_slow_count_ign_destid_endpoint_rpc | Count | Does not include the destId, endpoint, and rpc dimensions. |
General type | Slow request count | arms_app_requests_slow_count_ign_destid_endpoint_ppid_prpc | Count | Does not include the destId, endpoint, ppid, and prpc dimensions. |
General type | Slow request count | arms_app_requests_slow_count_ign_destid_endpoint_ppid_prpc_rpc | Count | Does not include the destId, endpoint, ppid, prpc, and rpc dimensions. |
General type | Slow request count | arms_app_requests_slow_count_ign_parent_ppid_prpc_rpc | Count | Does not include the parent, ppid, prpc, and rpc dimensions. |
General type | Slow request count | arms_app_requests_slow_count_ign_endpoint_parent_ppid_prpc_rpc | Count | Does not include the endpoint, parent, ppid, prpc, and rpc dimensions. |
General type | Request duration | arms_app_requests_seconds_raw | Seconds | |
General type | Request duration | arms_app_requests_seconds_ign_destid_endpoint_rpc | Seconds | Does not include the destId, endpoint, and rpc dimensions. |
General type | Request duration | arms_app_requests_seconds_ign_destid_endpoint_ppid_prpc | Seconds | Does not include the destId, endpoint, ppid, and prpc dimensions. |
General type | Request duration | arms_app_requests_seconds_ign_destid_endpoint_ppid_prpc_rpc | Seconds | Does not include the destId, endpoint, ppid, prpc, and rpc dimensions. |
General type | Request duration | arms_app_requests_seconds_ign_parent_ppid_prpc_rpc | Seconds | Does not include the parent, ppid, prpc, and rpc dimensions. |
General type | Request duration | arms_app_requests_seconds_ign_endpoint_parent_ppid_prpc_rpc | Seconds | Does not include the endpoint, parent, ppid, prpc, and rpc dimensions. |
Database type | Database request count | arms_db_requests_count_raw | Count | |
Database type | Database request count | arms_db_requests_count_ign_rpc | Count | Does not include the rpc (interface) dimension. |
Database type | Database error request count | arms_db_requests_error_count_raw | Count | |
Database type | Database error request count | arms_db_requests_error_count_ign_rpc | Count | Does not include the rpc (interface) dimension. |
Database type | Database slow request count | arms_db_requests_slow_count_raw | Count | |
Database type | Database slow request count | arms_db_requests_slow_count_ign_rpc | Count | Does not include the rpc (interface) dimension. |
Database type | Database request duration | arms_db_requests_seconds_raw | Seconds | |
Database type | Database request duration | arms_db_requests_seconds_ign_rpc | Seconds | Does not include the rpc (interface) dimension. |
SQL type | SQL request count | arms_sql_requests_count_raw | Count | |
SQL type | SQL request count | arms_sql_requests_count_ign_rpc | Count | Does not include the rpc (interface) dimension. |
SQL type | SQL error request count | arms_sql_requests_error_count_raw | Count | |
SQL type | SQL error request count | arms_sql_requests_error_count_ign_rpc | Count | Does not include the rpc (interface) dimension. |
SQL type | SQL slow request count | arms_sql_requests_slow_count_raw | Count | |
SQL type | SQL slow request count | arms_sql_requests_slow_count_ign_rpc | Count | Does not include the rpc (interface) dimension. |
SQL type | SQL request duration | arms_sql_requests_seconds_raw | Seconds | |
SQL type | SQL request duration | arms_sql_requests_seconds_ign_rpc | Seconds | Does not include the rpc (interface) dimension. |
Exception type | Exception request count | arms_exception_requests_count_raw | Count | |
Exception type | Exception request count | arms_exception_requests_count_ign_rpc | Count | Does not include the rpc (interface) dimension. |
Exception type | Exception request duration | arms_exception_requests_seconds_raw | Seconds | |
Exception type | Exception request duration | arms_exception_requests_seconds_ign_rpc | Seconds | Does not include the rpc (interface) dimension. |
Status code type | Request count by status code | arms_requests_by_status_count_raw | Count | |
Status code type | Request count by status code | arms_requests_by_status_count_ign_rpc | Count | Does not include the rpc (interface) dimension. |
Percentile type | Request latency percentiles (supported only by probe version 4.x and above) | arms_uni_requests_latency_seconds | | |
Usage examples
Which metric should you choose when using PromQL to count the requests for every interface of an application?
The metric must provide per-interface request counts. According to the metrics list, the general type metrics meet this requirement.
The result only needs the interface dimension. Other dimensions, such as upstream interface, upstream application, and remote address, are irrelevant. The metric must therefore include the interface dimension (rpc), and among the metrics that do, the one that carries the fewest other dimensions is the best choice.
The optimal metric is therefore arms_app_requests_count_ign_destid_endpoint_ppid_prpc.
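The selection rule can be expressed as a short routine: keep the candidates that still carry every required dimension, then pick the one that drops the most other dimensions. This is a sketch under the assumption that the names after _ign_ are the dropped dimensions; pick_metric and dropped are hypothetical helpers, not ARMS functions.

```python
from typing import Iterable, Set

# General type request count metrics, copied from the metrics list.
GENERAL_REQUEST_COUNT_METRICS = [
    "arms_app_requests_count_raw",
    "arms_app_requests_count_ign_destid_endpoint_rpc",
    "arms_app_requests_count_ign_destid_endpoint_ppid_prpc",
    "arms_app_requests_count_ign_destid_endpoint_ppid_prpc_rpc",
    "arms_app_requests_count_ign_parent_ppid_prpc_rpc",
    "arms_app_requests_count_ign_endpoint_parent_ppid_prpc_rpc",
]

def dropped(metric: str) -> Set[str]:
    """Lowercase names of the dimensions a metric does not include."""
    _, marker, suffix = metric.partition("_ign_")
    return set(suffix.split("_")) if marker else set()

def pick_metric(candidates: Iterable[str], required: Iterable[str]) -> str:
    """Pick the candidate that keeps all required dimensions and drops
    as many other dimensions as possible."""
    needed = {name.lower() for name in required}
    usable = [m for m in candidates if not (dropped(m) & needed)]
    if not usable:
        raise ValueError("no candidate keeps all required dimensions")
    return max(usable, key=lambda m: len(dropped(m)))

print(pick_metric(GENERAL_REQUEST_COUNT_METRICS, {"rpc"}))
# arms_app_requests_count_ign_destid_endpoint_ppid_prpc
```

If a query also needs the upstream interface (prpc), only the xxx_raw metric keeps both dimensions, so the routine falls back to arms_app_requests_count_raw.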
JVM metrics
Common dimensions
Dimension name | Dimension key |
Service name | service |
Service PID | pid |
Machine IP | serverIp |
Metrics list
Metric name | Metric | Metric type | Collection interval | Unit | Dimension |
Cumulative GC occurrences | arms_jvm_gc_total | Counter | 15s | None | Gen: the area where the GC occurs. Cause: the GC trigger reason (probe version 4.4.0 and above), such as System.gc(), Heap Dump Initiated GC, or Allocation Failure. |
Cumulative GC duration | arms_jvm_gc_seconds_total | Counter | 15s | Seconds | |
GC count between two collection intervals | arms_jvm_gc_delta | Gauge | 15s | None | |
GC duration between two collection intervals | arms_jvm_gc_seconds_delta | Gauge | 15s | Seconds | |
JVM thread count | arms_jvm_threads_count | Gauge | 15s | None | State thread status:
|
JVM memory region initial size | arms_jvm_mem_init_bytes | Gauge | 15s | Bytes | Area region:
ID region subdivision:
|
JVM memory region maximum size | arms_jvm_mem_max_bytes | Gauge | 15s | Bytes | |
JVM memory region used size | arms_jvm_mem_used_bytes | Gauge | 15s | Bytes | |
JVM memory region committed size | arms_jvm_mem_committed_bytes | Gauge | 15s | Bytes | |
JVM memory region usage ratio | arms_jvm_mem_usage_ratio | Gauge | 15s | Ratio (0~1) | |
JVM loaded classes | arms_class_load_loaded | Counter | 15s | None | None |
JVM unloaded classes | arms_class_load_un_loaded | Counter | 15s | None | None |
JVM buffer pool size | arms_jvm_buffer_pool_total_bytes | Gauge | 15s | Bytes | ID area:
|
JVM buffer pool used size | arms_jvm_buffer_pool_used_bytes | Gauge | 15s | Bytes | |
JVM buffer pool count | arms_jvm_buffer_pool_count | Gauge | 15s | None | |
Open file descriptor count | arms_file_desc_open_count | Gauge | 15s | None | None |
File descriptor open ratio (open count/maximum allowed open count) | arms_file_desc_open_ratio | Gauge | 15s | Ratio (0~1) | None |
System metrics
Common dimensions
Dimension name | Dimension key |
Service name | service |
Service PID | pid |
Machine IP | serverIp |
Metrics list
Metric name | Metric | Metric type | Collection interval | Unit |
Idle CPU percentage | arms_system_cpu_idle | Gauge | 15s | Percentage |
IO wait CPU percentage | arms_system_cpu_io_wait | Gauge | 15s | Percentage |
System CPU percentage | arms_system_cpu_system | Gauge | 15s | Percentage |
User mode CPU percentage | arms_system_cpu_user | Gauge | 15s | Percentage |
System load (1 minute) | arms_system_load | Gauge | 15s | None |
Disk free size | arms_system_disk_free_bytes | Gauge | 15s | Bytes |
Disk total size | arms_system_disk_total_bytes | Gauge | 15s | Bytes |
Disk usage | arms_system_disk_used_ratio | Gauge | 15s | Ratio (0~1) |
Memory buffer size | arms_system_mem_buffers_bytes | Gauge | 15s | Bytes |
Memory cache size | arms_system_mem_cached_bytes | Gauge | 15s | Bytes |
Memory free size | arms_system_mem_free_bytes | Gauge | 15s | Bytes |
Memory swap free size | arms_system_mem_swap_free_bytes | Gauge | 15s | Bytes |
Memory swap size | arms_system_mem_swap_total_bytes | Gauge | 15s | Bytes |
Memory size | arms_system_mem_total_bytes | Gauge | 15s | Bytes |
Used memory size | arms_system_mem_used_bytes | Gauge | 15s | Bytes |
Network received traffic size | arms_system_net_in_bytes | Gauge | 15s | Bytes |
Network sent traffic size | arms_system_net_out_bytes | Gauge | 15s | Bytes |
Network input error count | arms_system_net_in_err | Gauge | 15s | None |
Network output error count | arms_system_net_out_err | Gauge | 15s | None |
Thread pool/connection pool metrics
Common dimensions
Dimension name | Dimension key |
Service name | service |
Service PID | pid |
Machine IP | serverIp |
Thread pool name (supported by probe versions below 4.1.x) | name |
Thread pool type (supported by probe versions below 4.1.x) | type |
Metrics list
Probe version 4.1.x and above
Thread pool metrics
Metric name | Metric | Metric type | Collection interval | Dimension |
Core thread count | arms_thread_pool_core_pool_size | Gauge | 15s | |
Maximum thread count | arms_thread_pool_max_pool_size | Gauge | 15s | |
Active thread count | arms_thread_pool_active_thread_count | Gauge | 15s | |
Current thread count | arms_thread_pool_current_thread_count | Gauge | 15s | |
Thread pool historical maximum thread count (peak value of thread count since the thread pool was created) | arms_thread_pool_max_thread_count | Gauge | 15s | |
Thread pool scheduled task count | arms_thread_pool_scheduled_task_count | Counter | 15s | |
Thread pool completed task count | arms_thread_pool_completed_task_count | Counter | 15s | |
Thread pool rejected task count | arms_thread_pool_rejected_task_count | Counter | 15s | |
Thread pool task queue size | arms_thread_pool_queue_size | Gauge | 15s | |
Connection pool metrics
Metric name | Metric | Metric type | Collection interval | Dimension |
Connection count | arms_connection_pool_connection_count | Gauge | 15s | |
Minimum idle connection count | arms_connection_pool_connection_min_idle_count | Gauge | 15s | |
Maximum idle connection count | arms_connection_pool_connection_max_idle_count | Gauge | 15s | |
Maximum connection count | arms_connection_pool_connection_max_count | Gauge | 15s | |
Blocked connection request count | arms_connection_pool_pending_request_count | Counter | 15s | |
Probe versions below 4.1.x
Metric name | Metric | Metric type | Collection interval | Dimension |
Thread pool core thread count | arms_threadpool_core_size | Gauge | 15s | None |
Thread pool maximum thread count | arms_threadpool_max_size | Gauge | 15s | None |
Thread pool active thread count | arms_threadpool_active_size | Gauge | 15s | None |
Thread pool queue size | arms_threadpool_queue_size | Gauge | 15s | None |
Thread pool current size | arms_threadpool_current_size | Gauge | 15s | None |
Thread pool task count by status | arms_threadpool_task_total | Gauge | 15s | Status task status:
|
Scheduled task metrics
The following metrics only exist for scheduled tasks.
Common dimensions
Dimension name | Dimension key |
Service name | service |
Service PID | pid |
Machine IP | serverIp |
Task ID | rpc |
Metrics list
Metric name | Metric | Metric type | Collection interval | Unit |
Scheduling delay | arms_$callType_delay_milliseconds | Gauge | 15s | Milliseconds |
Go Runtime metrics
Metric name | Metric | Metric type | Collection interval |
Application uptime (ms) | arms_golang_runtime_uptime | Int64Counter | 15s |
Current application Goroutine count | arms_golang_process_runtime_go_goroutines | Gauge | 15s |
Heap object memory (bytes) | arms_golang_process_runtime_go_mem_heap_alloc | Gauge | 15s |
Unallocated or reclaimed heap memory | arms_golang_process_runtime_go_mem_heap_idle | Gauge | 15s |
Used heap memory | arms_golang_process_runtime_go_mem_heap_inuse | Gauge | 15s |
Allocated live heap objects | arms_golang_process_runtime_go_mem_heap_objects | Gauge | 15s |
Memory released to the operating system from HeapIdle | arms_golang_process_runtime_go_mem_heap_released | Gauge | 15s |
Virtual memory size requested from the system | arms_golang_process_runtime_go_mem_heap_sys | Gauge | 15s |
Current live object count | arms_golang_process_runtime_go_mem_live_objects | Gauge | 15s |
GC count since program start | arms_golang_process_runtime_go_gc_count | Gauge | 15s |
Cumulative GC stop-the-world pause time, during which the application is unavailable | arms_golang_process_runtime_go_gc_pause_total_ns | Int64Counter | 15s |
GC time distribution | arms_golang_process_runtime_go_gc_pause_ns | Int64Histogram | 15s |
Service access types and available dimensions
Client types
Access types
http_client
dubbo_client
hsf_client
dsf_client
notify_client
grpc_client
thrift_client
sofa_client
mq_client
kafka_client
Dimensions
parent: Upstream service name
ppid: Upstream service PID
destId: Request peer extended information
endpoint: Request peer address
excepType: Exception ID
excepInfo: Exception ID encoding rule
excepName: Exception name
stackTraceId: Exception stack ID
DB types
Access types
mysql
oracle
mariadb
postgresql
ppas
sqlserver
mongodb
dmdb
Dimensions
parent: Upstream service name
ppid: Upstream service PID
destId: Database name
endpoint: Database address
excepType: Exception ID
excepInfo: Exception ID encoding rule
excepName: Exception name
stackTraceId: Exception stack ID
sqlId: SQL statement ID
Server types
Access types
http
dubbo
hsf
dsf
user_method
mq
kafka
grpc
thrift
sofa
Dimensions
prpc: Upstream interface
parent: Upstream service name
ppid: Upstream service PID
endpoint: Service address
excepType: Exception ID
excepInfo: Exception ID encoding rule
excepName: Exception name
stackTraceId: Exception stack ID
Scheduled task types
Access types
xxl_job
spring_scheduled
quartz
elasticjob
jdk_timer
schedulerx
Dimensions
prpc: Upstream interface
parent: Upstream service name
ppid: Upstream service PID
excepType: Exception ID
excepInfo: Exception ID encoding rule
excepName: Exception name
stackTraceId: Exception stack ID