The Application Monitoring feature of Application Real-Time Monitoring Service (ARMS) collects metric data every minute. Use these metrics to configure alert rules that detect anomalies in your application's JVM, services, dependencies, infrastructure, and more.
JVM
JVM metrics are for reference only. For authoritative definitions, see the official JVM documentation.
Metrics
Metric | Unit | Commonly used | Description |
Number of JVM full GCs (instantaneous value) | - | Yes | Full garbage collections in the last N minutes. Frequent full GCs indicate potential memory pressure or application errors. |
JVM full GC duration (instantaneous value) | ms | No | Time spent on full GCs in the last N minutes. Long full GC pauses cause application stuttering and degrade user experience. |
Number of JVM young GCs (instantaneous value) | - | Yes | Young generation garbage collections in the last N minutes. A high count suggests rapid object creation or possible memory leaks. |
JVM young GC duration (instantaneous value) | ms | No | Time spent on young GCs in the last N minutes. Longer durations indicate declining GC efficiency and may cause application pauses. |
Total JVM heap memory | MB | No | Total heap memory allocated to the JVM, including young and old generations. Undersized heaps lead to frequent GCs; oversized heaps waste system resources. |
Used JVM heap memory | MB | Yes | Heap memory currently in use. Monitor this metric to detect memory leaks or excessive memory consumption before they cause OutOfMemoryErrors. |
Committed JVM non-heap memory | MB | No | Non-heap memory committed by the JVM. Excessive non-heap memory may indicate too many loaded classes or static variables. |
Initial JVM non-heap memory | MB | No | Initial non-heap memory size. Dynamically calculated based on JVM version, operating system, and JVM parameters. |
Maximum JVM non-heap memory | MB | No | Maximum non-heap memory. Controlled by |
Used JVM non-heap memory | MB | Yes | Non-heap memory in use, including Metaspace (Java 8 and later) or PermGen (Java 7 and earlier). |
Used JVM metaspace | MB | No | Memory used for class metadata (class structures, methods, and fields). Typically stable during normal operation. |
Number of JVM blocked threads | - | No | Threads waiting to acquire a monitor lock. A high count may indicate lock contention and can degrade system performance. |
Total number of JVM threads | - | Yes | Threads across all states. Too many threads can exhaust memory and CPU resources, affecting application stability. |
Number of JVM deadlocked threads | - | No | Threads involved in deadlocks. When deadlocks occur, the affected threads cannot proceed, and the application may crash. |
Number of new JVM threads | - | No | Threads that have been created but not yet started (the NEW state). Excessive thread creation wastes system resources and adds scheduling overhead. |
Number of JVM runnable threads | - | No | Threads that are executing or ready to execute in the JVM (the RUNNABLE state). Excessive thread creation consumes significant memory resources and may cause the system to slow down or crash. |
Number of JVM terminated threads | - | No | Threads that have finished execution (the TERMINATED state). Control thread counts based on actual requirements to prevent resource waste or thread starvation. |
Number of JVM timed-out waiting threads | - | Yes | Threads waiting for another thread to perform an action, up to a specified waiting time (the TIMED_WAITING state). A high count may indicate resource bottlenecks. |
Number of JVM waiting threads | - | No | Threads in the waiting state. For high-concurrency applications, a rising count of waiting threads can signal performance degradation. |
Number of JVM GCs (cumulative value) | - | No | Total garbage collections since JVM startup. |
JVM mark-and-sweep garbage collection cycles (cumulative value) | - | No | Total mark-and-sweep GC cycles since JVM startup. |
JVM heap memory usage (%) | - | No | Ratio of used heap memory to total heap memory. Keep this below 70% to reduce the risk of out-of-memory errors. |
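As a rough illustration of the heap usage metric, the following sketch computes a usage percentage and compares it with the 70% guideline. It assumes the ratio is used heap over total heap. The function names and sample values are hypothetical and are not part of ARMS.

```python
def heap_usage_percent(used_mb: float, total_mb: float) -> float:
    """Return used heap as a percentage of total heap."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return used_mb / total_mb * 100.0


def exceeds_guideline(used_mb: float, total_mb: float,
                      limit_percent: float = 70.0) -> bool:
    """Return True when heap usage is above the guideline (70% by default)."""
    return heap_usage_percent(used_mb, total_mb) > limit_percent


# Hypothetical sample: 1536 MB used out of a 2048 MB heap.
print(heap_usage_percent(1536, 2048))  # 75.0
print(exceeds_guideline(1536, 2048))   # True
```

An alert rule on this metric plays the role of `exceeds_guideline`: it compares the reported percentage with a threshold that you choose.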
Dimensions and filters
These metrics are collected per node IP address. Filter options:
Filter type | Description | Example |
Traversal | Evaluate each node independently and create separate alerts per node. | - |
Equals (=) | Alert on specific nodes only. | |
No dimension | Aggregate data across all nodes into a single alert. | - |
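The difference between the three filter types can be sketched as a grouping step that runs before the threshold check. This is an illustrative model only, not the ARMS implementation: the sample data, the mode names, and the use of an average as the aggregation are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-node samples: (node IP address, metric value).
SAMPLES = [
    ("10.0.0.1", 4), ("10.0.0.1", 6),
    ("10.0.0.2", 1), ("10.0.0.2", 1),
]


def group_samples(samples, mode, selected=None):
    """Group samples the way each filter type scopes an alert.

    "traversal" -> one group per node
    "equals"    -> one group per selected node
    "none"      -> a single group across all nodes
    """
    groups = defaultdict(list)
    for node, value in samples:
        if mode == "none":
            groups["all"].append(value)
        elif mode == "traversal":
            groups[node].append(value)
        elif mode == "equals":
            if node in (selected or ()):
                groups[node].append(value)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return dict(groups)


def firing(samples, mode, threshold, selected=None):
    """Return the groups whose average value exceeds the threshold."""
    groups = group_samples(samples, mode, selected)
    return sorted(key for key, values in groups.items() if mean(values) > threshold)


print(firing(SAMPLES, "traversal", 3))                      # ['10.0.0.1']
print(firing(SAMPLES, "equals", 3, selected={"10.0.0.2"}))  # []
print(firing(SAMPLES, "none", 3))                           # []
```

In this sample, Traversal flags the one busy node, while No dimension averages it together with the quiet node and stays below the threshold.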
Scheduled tasks
ARMS Application Monitoring supports only XXL-JOB, SchedulerX, and JDK-Timer scheduled task types.
Metrics
Metric | Unit | Commonly used | Description |
Duration | ms | No | Average execution time of the scheduled task. |
Total number of executions | - | No | Total times the scheduled task ran. |
Number of execution errors | - | No | Times the scheduled task failed within the specified interval. |
Scheduling latency | ms | No | Delay between the scheduled start time and actual task execution. |
Dimensions and filters
These metrics are collected per scheduled task. Filter options:
Filter type | Description | Example |
Traversal | Evaluate each scheduled task independently and create separate alerts. | - |
Equals (=) | Alert on specific scheduled tasks only. | |
No dimension | Aggregate data across all scheduled tasks into a single alert. | - |
Exceptions
Metrics
Metric | Unit | Commonly used | Description |
Number of exceptions | - | Yes | Runtime exceptions such as NullPointerException, ArrayIndexOutOfBoundsException, and IOException. Detects error spikes in call stacks. |
Response time of abnormal interface calls | ms | Yes | Response time for interface calls that returned exceptions. Helps assess the performance impact of errors on specific interfaces. |
Dimensions and filters
These metrics support two dimensions: interface name and exception.
By interface name:
Filter type | Description | Example |
Traversal | Evaluate each interface independently. | - |
Equals (=) | Alert on specific interfaces. | |
Not Equals (!=) | Exclude specific interfaces. | |
Contains | Match interfaces containing a keyword. | |
Does Not Contain | Match interfaces not containing a keyword. | |
Regular expression | Match interfaces by regex. | |
No dimension | Aggregate across all interfaces. | - |
By exception:
Filter type | Description | Example |
Traversal | Evaluate each exception type independently. | - |
Equals (=) | Alert on specific exceptions. | |
Not Equals (!=) | Exclude specific exceptions. | |
Contains | Match exceptions containing a keyword. | |
Does Not Contain | Match exceptions not containing a keyword. | |
Regular expression | Match exceptions by regex. | |
No dimension | Aggregate across all exceptions. | - |
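The matching filter types above can be approximated with ordinary string and regex operations. The sketch below is a hypothetical model: the operator keys are invented for illustration, and whether ARMS treats a regular expression as a full match or a partial match is not stated here, so the sketch assumes a partial match (`re.search`).

```python
import re


def matches(value: str, operator: str, operand: str) -> bool:
    """Return True when a dimension value passes the given filter."""
    if operator == "=":
        return value == operand
    if operator == "!=":
        return value != operand
    if operator == "contains":
        return operand in value
    if operator == "not_contains":
        return operand not in value
    if operator == "regex":
        # Assumption: partial match anywhere in the value.
        return re.search(operand, value) is not None
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical exception names used only as sample data.
names = ["java.lang.NullPointerException", "java.io.IOException"]
print([n for n in names if matches(n, "contains", "IOException")])
print([n for n in names if matches(n, "regex", r"^java\.lang\.")])
```

The same function applies to interface names, because both dimensions are plain strings.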
Application dependency services
Metrics
Metric | Unit | Commonly used | Description |
Number of application dependency service calls | - | No | Calls to downstream interfaces the application depends on. Monitor for unexpected changes in call volume. |
Application dependency service call error rate (%) | - | No | Calculated as: abnormal downstream interface requests / total interface requests × 100%. An increasing error rate indicates dependency issues affecting your application. |
Response time of application dependency service calls | ms | Yes | Average response time of downstream interface calls. Rising latency from dependency services may degrade your application's performance. |
Number of slow calls of an application dependency service | - | No | Dependency service calls that exceeded the response time threshold. A high count suggests bottlenecks in downstream services. |
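The error rate and slow call metrics can be illustrated with a short calculation. The function names, the sample numbers, and the 500 ms threshold below are hypothetical. The actual slow call threshold depends on your application settings.

```python
def error_rate_percent(error_calls: int, total_calls: int) -> float:
    """Return failed calls as a percentage of all calls (0.0 when idle)."""
    if total_calls == 0:
        return 0.0
    return error_calls / total_calls * 100.0


def count_slow_calls(durations_ms, threshold_ms: float) -> int:
    """Count calls whose response time exceeded the threshold."""
    return sum(1 for duration in durations_ms if duration > threshold_ms)


# Hypothetical one-minute window: 200 downstream calls, 25 of them failed.
print(error_rate_percent(25, 200))                  # 12.5
print(count_slow_calls([120, 480, 510, 900], 500))  # 2
```

Returning 0.0 for an idle window is a design choice in this sketch. It avoids a division by zero and keeps the rule from firing when there is no traffic.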
Dimensions and filters
These metrics are collected per interface call type (such as HTTP, MySQL, and Redis). Filter options:
Filter type | Description | Example |
Traversal | Evaluate each call type independently. | - |
Equals (=) | Alert on specific call types. | |
No dimension | Aggregate across all call types. | - |
ECS instances
Metrics
Metric | Unit | Commonly used | Description |
Node CPU utilization (%) | - | No | CPU utilization of the node. High utilization can cause slow response times and service unavailability. |
Node CPU utilization in user mode (%) | - | No | CPU time spent on user-space processes such as web services and databases, as a percentage of total CPU time. |
Idle node disk space | MB | Yes | Unused disk space. A full disk can cause system crashes or unexpected behavior. |
Node disk utilization (%) | - | No | Ratio of used disk space to total disk space. Higher utilization means less available storage. |
Node system load | - | Yes | System load average. For a node with N CPU cores, the maximum recommended load is N. |
Idle node memory | MB | Yes | Unused memory. Insufficient memory may trigger out-of-memory (OOM) errors. |
Node memory usage (%) | - | No | Percentage of memory in use. If usage exceeds 80%, reduce memory pressure by adjusting configurations or optimizing workloads. |
Number of error packets received on the node | - | No | Error packets received during network communication, possibly caused by transmission or application issues. |
Number of error packets sent from the node | - | No | Error packets sent during network communication. Helps check for network anomalies. |
Number of JVM instances | - | Yes | JVM instances running in real time. Typically used to detect service downtime. |
Number of bytes sent from the node | - | No | Data volume sent over the network, including application data, system messages, and error messages. |
Number of packets sent from the node | - | No | Total packets sent over the network. |
Number of bytes received on the node | - | No | Total data volume received over the network. |
Number of packets received on the node | - | No | Total packets received over the network. |
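The two numeric guidelines in this table (a load no higher than the number of CPU cores, and memory usage no higher than 80%) can be expressed as a simple check. This is a minimal sketch with hypothetical function names and sample values. It does not read real node data.

```python
from typing import List


def load_per_core(load_average: float, cpu_cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return load_average / cpu_cores


def node_warnings(load_average: float, cpu_cores: int,
                  memory_used_percent: float) -> List[str]:
    """List which guidelines from the table the node currently exceeds."""
    warnings = []
    if load_per_core(load_average, cpu_cores) > 1.0:
        warnings.append("load")    # load is above N for an N-core node
    if memory_used_percent > 80.0:
        warnings.append("memory")  # memory usage is above 80%
    return warnings


# Hypothetical 4-core node with a load of 6.0 and 85% memory in use.
print(node_warnings(6.0, 4, 85.0))  # ['load', 'memory']
print(node_warnings(2.0, 4, 50.0))  # []
```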
Dimensions and filters
These metrics are collected per node IP address. Filter options:
Filter type | Description | Example |
Traversal | Evaluate each node independently. | - |
Equals (=) | Alert on specific nodes. | |
No dimension | Aggregate across all nodes. | - |
Containers
Container CPU and memory metrics require ARMS agent v4.1.0 or later.
Metrics
Metric | Unit | Commonly used | Description |
CPU utilization in user mode | - | No | CPU time spent executing code in user space, including application logic and non-kernel library functions. |
CPU utilization in kernel mode | - | No | CPU time spent on kernel operations such as system calls, interrupt handling, and kernel services. |
Total CPU utilization | - | Yes | Sum of user-mode and kernel-mode CPU utilization. |
Memory usage | Bytes | Yes | Memory actively used by the container at runtime, including non-swappable memory and active cached data. |
Number of sent network packets | - | No | Packets sent from the container over the network. |
Number of sent bytes | Bytes | Yes | Bytes sent from the container over the network. |
Number of sent error packets | - | No | Error packets sent during network communication. Helps detect container network issues. |
Number of sent discarded packets | - | No | Outbound packets dropped by the system or network stack since the container network interface started. |
Number of received packets | - | No | Packets received by the container over the network. |
Number of received bytes | Bytes | Yes | Total data received by the container over the network. |
Number of received error packets | - | No | Error packets received during network communication. Received error packets may prevent the container from processing network traffic correctly. |
Number of received discarded packets | - | No | Inbound packets dropped by the system or network stack since the container network interface started. |
Dimensions and filters
These metrics are collected per node IP address. Filter options:
Filter type | Description | Example |
Traversal | Evaluate each container independently. | - |
Equals (=) | Alert on specific containers. | |
No dimension | Aggregate across all containers. | - |
Application providing services
Metrics
Metric | Unit | Commonly used | Description |
Number of calls | - | Yes | Entry-point calls to the application, including HTTP and Dubbo calls. Useful for analyzing traffic volume and detecting anomalies. |
Number of slow calls | - | No | Entry-point calls (HTTP and Dubbo) that exceeded the response time threshold. |
Number of error calls | - | Yes | Entry-point calls that returned HTTP status code 400 or were intercepted by the top layer of Dubbo. |
Call error rate (%) | - | Yes | Calculated as: error entry-point calls / total entry-point calls × 100%. |
Call response time | ms | Yes | Average response time of entry-point calls (HTTP and Dubbo). Helps identify slow requests and exceptions. |
Dimensions and filters
These metrics support two dimensions: interface name and interface call type.
By interface name:
Filter type | Description | Example |
Traversal | Evaluate each interface independently. | - |
Equals (=) | Alert on specific interfaces. | |
Not Equals (!=) | Exclude specific interfaces. | |
Contains | Match interfaces containing a keyword. | |
Does Not Contain | Match interfaces not containing a keyword. | |
Regular expression | Match interfaces by regex. | |
No dimension | Aggregate across all interfaces. | - |
By interface call type:
Filter type | Description | Example |
Traversal | Evaluate each call type independently (HTTP, MySQL, Redis, etc.). | - |
Equals (=) | Alert on specific call types. | |
No dimension | Aggregate across all call types. | - |
Thread pools
Metrics
Metric | Commonly used | Description |
Number of core threads | Yes | Always-active threads in the pool. |
Maximum number of threads | Yes | Upper limit of concurrent threads in the pool. |
Number of active threads | Yes | Threads currently executing tasks. Evaluates thread pool performance and utilization. |
Queue size | Yes | Task queue capacity. A queue that is too small causes long wait times; a queue that is too large can exhaust system resources. |
Current number of threads | Yes | Threads that are running or waiting to run. |
Number of executed tasks | Yes | Tasks completed by the thread pool. Evaluates throughput. |
Thread pool usage (%) | Yes | Ratio of threads in use to the total thread pool size. |
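To show how the thread pool metrics relate to each other, the sketch below models a single pool snapshot. It assumes that thread pool usage is active threads divided by the maximum number of threads, which is one plausible reading of the description. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadPoolSnapshot:
    """Hypothetical point-in-time view of one thread pool."""
    core_threads: int
    max_threads: int
    active_threads: int
    current_threads: int

    def usage_percent(self) -> float:
        """Active threads as a percentage of the maximum pool size."""
        if self.max_threads <= 0:
            raise ValueError("max_threads must be positive")
        return self.active_threads / self.max_threads * 100.0

    def is_saturated(self) -> bool:
        """True when every allowed thread is busy executing a task."""
        return self.active_threads >= self.max_threads


pool = ThreadPoolSnapshot(core_threads=4, max_threads=8,
                          active_threads=6, current_threads=7)
print(pool.usage_percent())  # 75.0
print(pool.is_saturated())   # False
```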
Dimensions and filters
These metrics support three dimensions: node IP address, thread pool name, and thread pool type.
By node IP address:
Filter type | Description | Example |
Traversal | Evaluate each node independently. | - |
Equals (=) | Alert on specific nodes. | |
No dimension | Aggregate across all nodes. | - |
By thread pool name:
Filter type | Description | Example |
Traversal | Evaluate each thread pool independently. | - |
Equals (=) | Alert on specific thread pools. | |
No dimension | Aggregate across all thread pools. | - |
By thread pool type:
Filter type | Description | Example |
Traversal | Evaluate each thread pool type independently. | - |
Equals (=) | Alert on specific thread pool types. | |
No dimension | Aggregate across all thread pool types. | - |
HTTP status codes
Metrics
Metric | Commonly used | Description |
Number of HTTP requests with 4xx status codes | Yes | Requests that returned a 4xx status code, indicating client errors such as missing resources or parameters. Common codes: 400, 404. |
Number of HTTP requests with 5xx status codes | Yes | Requests that returned a 5xx status code, indicating server errors or system overload. Common codes: 500, 503. |
Dimensions and filters
These metrics are collected per interface name. Filter options:
Filter type | Description | Example |
Traversal | Evaluate each interface independently. | - |
Equals (=) | Alert on specific interfaces. | |
Not Equals (!=) | Exclude specific interfaces. | |
Contains | Match interfaces containing a keyword. | |
Does Not Contain | Match interfaces not containing a keyword. | |
Regular expression | Match interfaces by regex. | |
No dimension | Aggregate across all interfaces. | - |
Databases
Metrics
Metric | Unit | Commonly used | Description |
Number of database requests | - | Yes | Read or write requests sent to the database at runtime. High request volume affects application performance and response time. |
Number of database request errors | - | Yes | Failed database requests, including connection failures, query errors, and permission issues. A high error count indicates problems with application-database interaction. |
Database request response time | ms | Yes | Time between sending a database request and receiving a response. High response times cause application stuttering or slowdowns. |
Number of slow database requests | - | No | Database requests that exceeded the response time threshold. Frequent slow requests degrade application performance. |
Dimensions and filters
These metrics are collected per database name. Filter options:
Filter type | Description | Example |
Traversal | Evaluate each database independently. | - |
Equals (=) | Alert on specific databases. | |
No dimension | Aggregate across all databases. | - |