Application Real-Time Monitoring Service (ARMS) provides continuous profiling through the ARMS agent. This feature collects CPU, memory, and code-level diagnostics from Java applications at runtime. This report quantifies the performance overhead of continuous profiling under realistic production workloads.
Key finding: With all continuous profiling features enabled, CPU and memory overhead stays under 5%, and response time increases by at most 0.2 ms.
Test setup
Application architecture
The test application is built with Spring Web MVC and handles two types of requests:
${mall-gateway}/case/api/v1/mysql/execute: accesses MySQL 1 to 4 times per request
${mall-gateway}/case/api/v1/redis/execute: accesses Redis 1 to 10 times per request
Each request type accounts for 50% of the total queries per second (QPS).
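To make the workload concrete, the following sketch derives the range of backend operations per second implied by the 50/50 request mix. The access ranges come from the endpoint descriptions above. The function name `backend_rate_bounds` is hypothetical, and only lower and upper bounds are computed because the report does not state how the number of accesses per request is distributed.

```python
# Bounds on backend access rates implied by the 50/50 request mix.
# Access ranges are taken from the endpoint descriptions; everything
# else (names, output layout) is illustrative only.

ACCESSES_PER_REQUEST = {
    "mysql": (1, 4),   # /case/api/v1/mysql/execute
    "redis": (1, 10),  # /case/api/v1/redis/execute
}
ENDPOINT_SHARE = 0.5  # each request type is 50% of the total QPS


def backend_rate_bounds(total_qps):
    """Return {backend: (min_ops_per_s, max_ops_per_s)} for a total QPS."""
    per_endpoint_qps = total_qps * ENDPOINT_SHARE
    return {
        backend: (per_endpoint_qps * low, per_endpoint_qps * high)
        for backend, (low, high) in ACCESSES_PER_REQUEST.items()
    }


for qps in (500, 1000, 2000):
    print(qps, backend_rate_bounds(qps))
```

At 1,000 QPS, for example, each endpoint receives 500 requests per second, so MySQL sees between 500 and 2,000 accesses per second and Redis between 500 and 5,000.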
Environment
| Component | Configuration |
|---|---|
| Stress testing | Alibaba Cloud Performance Testing (PTS) |
| Cluster | Alibaba Cloud Container Service for Kubernetes (ACK), single cluster |
| Node instance type | ecs.u1-c1m2.8xlarge |
| Operating system | Alibaba Cloud Linux 2.1903 LTS 64-bit |
| Application pods | 2 cores, 4 GB memory, 2 replicas |
| ARMS agent | Java v4.2.1 |
The Java application, MySQL, and Redis are all deployed in the same ACK cluster.
Demo code: alibabacloud-microservice-demo/arms-demo
Procedure
Each test run follows this sequence:
Install the ARMS agent and set the sampling rate (10% or 100%).
Warm up the application at 50 QPS for 5 minutes.
Run a stress test at the target QPS for 30 minutes with continuous profiling disabled. Record the baseline CPU usage, memory usage, and response time.
Dynamically enable all continuous profiling features: CPU diagnostics, memory diagnostics, and code diagnostics. Continue the stress test for another 30 minutes and record the same metrics.
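Each stress test runs for 1 hour in total: 30 minutes for the baseline and 30 minutes with continuous profiling enabled, not counting the warm-up. Tests are repeated at 500, 1,000, and 2,000 QPS at both 10% and 100% sampling rates.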
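The procedure above amounts to a small test matrix. The sketch below enumerates it for illustration; the durations and load levels come from the procedure, while the data layout and the name `build_test_matrix` are assumptions and not part of the PTS or ARMS tooling.

```python
from itertools import product

QPS_LEVELS = (500, 1000, 2000)
SAMPLING_RATES_PCT = (10, 100)

WARMUP_MIN = 5      # warm-up at 50 QPS
BASELINE_MIN = 30   # stress test, continuous profiling disabled
PROFILING_MIN = 30  # stress test, all profiling features enabled


def build_test_matrix():
    """Return one entry per (target QPS, sampling rate) combination."""
    runs = []
    for qps, rate in product(QPS_LEVELS, SAMPLING_RATES_PCT):
        runs.append({
            "target_qps": qps,
            "sampling_rate_pct": rate,
            "warmup_min": WARMUP_MIN,
            # The stress test itself excludes the warm-up phase.
            "stress_min": BASELINE_MIN + PROFILING_MIN,
        })
    return runs


runs = build_test_matrix()
for run in runs:
    print(run)
print("total runs:", len(runs))
```

This yields six runs, each with 60 minutes of stress testing after the 5-minute warm-up.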
Baseline: performance without continuous profiling
| QPS | Sampling rate | CPU | Memory | Response time |
|---|---|---|---|---|
| 500 | 10% | 8.112% | 13.52% | 55.5 ms |
| 500 | 100% | 8.416% | 13.62% | 56.5 ms |
| 1,000 | 10% | 15.247% | 14.14% | 62.9 ms |
| 1,000 | 100% | 15.614% | 14.42% | 65.3 ms |
| 2,000 | 10% | 30.550% | 14.64% | 70.6 ms |
| 2,000 | 100% | 30.945% | 14.67% | 71.1 ms |
Performance with continuous profiling enabled
| QPS | Sampling rate | CPU | Memory | Response time |
|---|---|---|---|---|
| 500 | 10% | 8.912% | 15.52% | 55.6 ms |
| 500 | 100% | 9.316% | 15.71% | 56.6 ms |
| 1,000 | 10% | 17.140% | 16.24% | 63.0 ms |
| 1,000 | 100% | 17.710% | 16.82% | 65.4 ms |
| 2,000 | 10% | 34.650% | 16.84% | 70.7 ms |
| 2,000 | 100% | 35.245% | 16.89% | 71.3 ms |
Overhead from continuous profiling
This table isolates the overhead from continuous profiling. Each value is the metric measured with profiling enabled minus the baseline measured in the same test run. CPU and memory values are differences in percentage points of utilization.
| QPS | Sampling rate | CPU | Memory | Response time |
|---|---|---|---|---|
| 500 | 10% | +0.80% | +2.00% | +0.1 ms |
| 500 | 100% | +0.90% | +2.09% | +0.1 ms |
| 1,000 | 10% | +1.893% | +2.10% | +0.1 ms |
| 1,000 | 100% | +2.096% | +2.40% | +0.1 ms |
| 2,000 | 10% | +4.10% | +2.20% | +0.1 ms |
| 2,000 | 100% | +4.30% | +2.22% | +0.2 ms |
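The overhead figures can be reproduced by subtracting the baseline table from the profiling-enabled table row by row. The following is a minimal sketch of that arithmetic; the measurements are copied from the two tables, while the dictionary layout and the function name `compute_overhead` are illustrative assumptions.

```python
# Keys are (QPS, sampling rate %); values are (CPU %, memory %, response ms).
BASELINE = {
    (500, 10): (8.112, 13.52, 55.5),
    (500, 100): (8.416, 13.62, 56.5),
    (1000, 10): (15.247, 14.14, 62.9),
    (1000, 100): (15.614, 14.42, 65.3),
    (2000, 10): (30.550, 14.64, 70.6),
    (2000, 100): (30.945, 14.67, 71.1),
}
PROFILING_ENABLED = {
    (500, 10): (8.912, 15.52, 55.6),
    (500, 100): (9.316, 15.71, 56.6),
    (1000, 10): (17.140, 16.24, 63.0),
    (1000, 100): (17.710, 16.82, 65.4),
    (2000, 10): (34.650, 16.84, 70.7),
    (2000, 100): (35.245, 16.89, 71.3),
}


def compute_overhead(baseline, enabled):
    """Return enabled minus baseline per run, rounded to 3 decimals."""
    return {
        key: tuple(round(after - before, 3)
                   for before, after in zip(base, enabled[key]))
        for key, base in baseline.items()
    }


OVERHEAD = compute_overhead(BASELINE, PROFILING_ENABLED)
for (qps, rate), (cpu, mem, rt) in OVERHEAD.items():
    print(f"{qps} QPS @ {rate}%: CPU +{cpu}, memory +{mem}, RT +{rt} ms")
```

Rounding to three decimals only removes floating-point noise from the subtraction; it does not change any reported value.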
Key observations:
CPU overhead scales with QPS: at 100% sampling, it rises from +0.90% at 500 QPS to +4.30% at 2,000 QPS. At all tested load levels, CPU overhead remains under 5%.
Memory overhead stays consistent across all load levels and sampling rates, ranging from +2.00% to +2.40%.
Response time increases by 0.1 to 0.2 ms, a negligible impact on end-user latency.
Conclusion
With all continuous profiling features enabled (CPU diagnostics, memory diagnostics, and code diagnostics), CPU and memory overhead stays within 5%. Enabling a subset of features reduces the overhead further.
Continuous profiling has negligible impact on application latency. Response time increases by at most 0.2 ms under the tested conditions.