Application Real-Time Monitoring Service: Continuous profiling performance overhead: ARMS agent for Java v4.x

Last Updated: Mar 11, 2026

Application Real-Time Monitoring Service (ARMS) provides continuous profiling through the ARMS agent. This feature collects CPU, memory, and code-level diagnostics from Java applications at runtime, which adds some CPU, memory, and latency overhead to the monitored application. This report quantifies that overhead under realistic production workloads.

Key finding: With all continuous profiling features enabled, CPU and memory usage each rise by less than 5 percentage points, and response time increases by at most 0.2 ms.

Test setup

Application architecture

Test architecture

The test application is built with Spring Web MVC and handles two types of requests:

  • ${mall-gateway}/case/api/v1/mysql/execute -- accesses MySQL 1 to 4 times per request

  • ${mall-gateway}/case/api/v1/redis/execute -- accesses Redis 1 to 10 times per request

Each request type accounts for 50% of the total queries per second (QPS).

Environment

| Component | Configuration |
|---|---|
| Stress testing | Alibaba Cloud Performance Testing (PTS) |
| Cluster | Alibaba Cloud Container Service for Kubernetes (ACK), single cluster |
| Node instance type | ecs.u1-c1m2.8xlarge |
| Operating system | Alibaba Cloud Linux 2.1903 LTS 64-bit |
| Application pods | 2 cores, 4 GB memory, 2 replicas |
| ARMS agent | Java v4.2.1 |

The Java application, MySQL, and Redis are all deployed in the same ACK cluster.

Demo code: alibabacloud-microservice-demo/arms-demo

Procedure

Each test run follows this sequence:

  1. Install the ARMS agent and set the sampling rate (10% or 100%).

  2. Warm up the application at 50 QPS for 5 minutes.

  3. Run a stress test at the target QPS for 30 minutes with continuous profiling disabled. Record the baseline CPU usage, memory usage, and response time.

  4. Dynamically enable all continuous profiling features: CPU diagnostics, memory diagnostics, and code diagnostics. Continue the stress test for another 30 minutes and record the same metrics.

Each stress test runs for 1 hour total. Tests are repeated at 500, 1,000, and 2,000 QPS for both 10% and 100% sampling rates.
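The test matrix implied by this procedure can be enumerated with a short sketch. The phase durations and load levels come from the steps above; the variable names and the nominal request counts (QPS multiplied by phase duration) are illustrative assumptions, not part of the ARMS or PTS tooling.

```python
from itertools import product

# Durations taken from the procedure (minutes).
WARMUP_MIN = 5       # warm-up at 50 QPS
BASELINE_MIN = 30    # stress test with continuous profiling disabled
PROFILING_MIN = 30   # stress test with all profiling features enabled

QPS_LEVELS = [500, 1000, 2000]
SAMPLING_RATES = [0.10, 1.00]


def test_matrix():
    """Return one dict per test run: every QPS level at every sampling rate."""
    return [
        {
            "qps": qps,
            "sampling_rate": rate,
            "stress_minutes": BASELINE_MIN + PROFILING_MIN,
            # Nominal count only: assumes the target QPS is held exactly.
            "nominal_requests_per_phase": qps * BASELINE_MIN * 60,
        }
        for qps, rate in product(QPS_LEVELS, SAMPLING_RATES)
    ]


runs = test_matrix()
for run in runs:
    print(
        f"{run['qps']:>5} QPS, sampling {run['sampling_rate']:.0%}: "
        f"{WARMUP_MIN} min warm-up + {run['stress_minutes']} min stress test, "
        f"~{run['nominal_requests_per_phase']:,} requests per 30-minute phase"
    )
```

This yields six runs (three load levels times two sampling rates), each with a 60-minute stress test split evenly between the baseline and profiling phases.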

Baseline: performance without continuous profiling

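| QPS | Sampling rate | CPU usage | Memory usage | Response time |
|---|---|---|---|---|
| 500 | 10% | 8.112% | 13.52% | 55.5 ms |
| 500 | 100% | 8.416% | 13.62% | 56.5 ms |
| 1,000 | 10% | 15.247% | 14.14% | 62.9 ms |
| 1,000 | 100% | 15.614% | 14.42% | 65.3 ms |
| 2,000 | 10% | 30.550% | 14.64% | 70.6 ms |
| 2,000 | 100% | 30.945% | 14.67% | 71.1 ms |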

Performance with continuous profiling enabled

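| QPS | Sampling rate | CPU usage | Memory usage | Response time |
|---|---|---|---|---|
| 500 | 10% | 8.912% | 15.52% | 55.6 ms |
| 500 | 100% | 9.316% | 15.71% | 56.6 ms |
| 1,000 | 10% | 17.140% | 16.24% | 63.0 ms |
| 1,000 | 100% | 17.710% | 16.82% | 65.4 ms |
| 2,000 | 10% | 34.650% | 16.84% | 70.7 ms |
| 2,000 | 100% | 35.245% | 16.89% | 71.3 ms |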

Overhead from continuous profiling

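This table isolates the overhead from continuous profiling by comparing metrics before and after enabling the feature within the same test run. CPU and memory values are absolute differences in usage (percentage points), not relative increases. Response time differences are in milliseconds.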

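| QPS | Sampling rate | CPU usage | Memory usage | Response time |
|---|---|---|---|---|
| 500 | 10% | +0.80% | +2.00% | +0.1 ms |
| 500 | 100% | +0.90% | +2.09% | +0.1 ms |
| 1,000 | 10% | +1.893% | +2.10% | +0.1 ms |
| 1,000 | 100% | +2.096% | +2.40% | +0.1 ms |
| 2,000 | 10% | +4.10% | +2.20% | +0.1 ms |
| 2,000 | 100% | +4.30% | +2.22% | +0.2 ms |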
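The overhead figures are plain differences between the two measurement phases, which the following sketch reproduces from the baseline and profiling-enabled tables. The dictionary layout and the function name `overhead` are illustrative choices; only the numbers come from this report.

```python
# (QPS, sampling rate %) -> (CPU usage %, memory usage %, response time ms)
baseline = {
    (500, 10): (8.112, 13.52, 55.5),
    (500, 100): (8.416, 13.62, 56.5),
    (1000, 10): (15.247, 14.14, 62.9),
    (1000, 100): (15.614, 14.42, 65.3),
    (2000, 10): (30.550, 14.64, 70.6),
    (2000, 100): (30.945, 14.67, 71.1),
}
profiled = {
    (500, 10): (8.912, 15.52, 55.6),
    (500, 100): (9.316, 15.71, 56.6),
    (1000, 10): (17.140, 16.24, 63.0),
    (1000, 100): (17.710, 16.82, 65.4),
    (2000, 10): (34.650, 16.84, 70.7),
    (2000, 100): (35.245, 16.89, 71.3),
}


def overhead(before, after):
    """Absolute difference per metric, rounded to hide float noise."""
    return tuple(round(a - b, 3) for b, a in zip(before, after))


overheads = {key: overhead(baseline[key], profiled[key]) for key in baseline}

for (qps, rate), (cpu, mem, rt) in overheads.items():
    print(
        f"{qps:>5} QPS, {rate:>3}% sampling: "
        f"CPU +{cpu:.3f} pts, memory +{mem:.2f} pts, response time +{rt:.1f} ms"
    )
```

Because the subtraction is absolute, a CPU overhead of +4.30% at 2,000 QPS means usage rose from 30.945% to 35.245%, not that CPU consumption grew by 4.30% of its baseline value.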

Key observations:

  • CPU overhead scales with QPS: at 100% sampling it rises from +0.90% at 500 QPS to +4.30% at 2,000 QPS, and at 10% sampling from +0.80% to +4.10%. At all tested load levels, CPU overhead remains under 5%.

  • Memory overhead stays nearly constant, between +2.00% and +2.40%, across all load levels and sampling rates.

  • Response time increases by 0.1 to 0.2 ms, a negligible impact on end-user latency.
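These observations can be checked directly against the overhead table. The sketch below copies the table values, computes the extremes, and normalizes CPU overhead per 1,000 QPS to show how it tracks load. The normalization is an illustrative calculation added here, not a metric reported by ARMS.

```python
# (QPS, sampling rate %) -> (CPU points, memory points, response time ms),
# copied from the overhead table in this report.
overhead_table = {
    (500, 10): (0.80, 2.00, 0.1),
    (500, 100): (0.90, 2.09, 0.1),
    (1000, 10): (1.893, 2.10, 0.1),
    (1000, 100): (2.096, 2.40, 0.1),
    (2000, 10): (4.10, 2.20, 0.1),
    (2000, 100): (4.30, 2.22, 0.2),
}

cpu = [row[0] for row in overhead_table.values()]
mem = [row[1] for row in overhead_table.values()]
rt = [row[2] for row in overhead_table.values()]

max_cpu = max(cpu)
mem_range = (min(mem), max(mem))
max_rt = max(rt)

# CPU overhead normalized to 1,000 QPS, to compare load levels directly.
cpu_per_1k_qps = {
    key: round(row[0] / key[0] * 1000, 2) for key, row in overhead_table.items()
}

print(f"Max CPU overhead: +{max_cpu:.2f} points (under 5: {max_cpu < 5})")
print(f"Memory overhead range: +{mem_range[0]:.2f} to +{mem_range[1]:.2f} points")
print(f"Max response time increase: +{max_rt:.1f} ms")
for (qps, rate), value in cpu_per_1k_qps.items():
    print(f"{qps:>5} QPS, {rate:>3}% sampling: {value:.2f} CPU points per 1,000 QPS")
```

The normalized values fall in a narrow band, which is consistent with CPU overhead growing roughly in proportion to request volume over the tested range.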

Conclusion

  • With all continuous profiling features enabled (CPU diagnostics, memory diagnostics, and code diagnostics), CPU and memory usage each increase by less than 5 percentage points. Enabling a subset of features reduces the overhead further.

  • Continuous profiling has negligible impact on application latency. Response time increases by at most 0.2 ms under the tested conditions.