Continuous Profiling Overhead Analysis for Java v4.x - ARMS

Application Real-Time Monitoring Service (ARMS) provides continuous profiling through the ARMS agent. This feature collects CPU, memory, and code-level diagnostics from Java applications at runtime. This report quantifies the CPU, memory, and response-time overhead that the feature adds under realistic production workloads.

Key finding: With all continuous profiling features enabled, CPU and memory overhead stays under 5%, and response time increases by at most 0.2 ms.

Test setup

Application architecture

The test application is built with Spring Web MVC and handles two types of requests:

${mall-gateway}/case/api/v1/mysql/execute -- accesses MySQL 1 to 4 times per request
${mall-gateway}/case/api/v1/redis/execute -- accesses Redis 1 to 10 times per request

Each request type accounts for 50% of the total queries per second (QPS).

Environment

Component	Configuration
Stress testing	Alibaba Cloud Performance Testing (PTS)
Cluster	Alibaba Cloud Container Service for Kubernetes (ACK), single cluster
Node instance type	ecs.u1-c1m2.8xlarge
Operating system	Alibaba Cloud Linux 2.1903 LTS 64-bit
Application pods	2 cores, 4 GB memory, 2 replicas
ARMS agent	Java v4.2.1

The Java application, MySQL, and Redis are all deployed in the same ACK cluster.

Demo code: alibabacloud-microservice-demo/arms-demo

Procedure

Each test run follows this sequence:

Install the ARMS agent and set the sampling rate (10% or 100%).
Warm up the application at 50 QPS for 5 minutes.
Run a stress test at the target QPS for 30 minutes with continuous profiling disabled. Record the baseline CPU usage, memory usage, and response time.
Dynamically enable all continuous profiling features: CPU diagnostics, memory diagnostics, and code diagnostics. Continue the stress test for another 30 minutes and record the same metrics.

Each stress test runs for 1 hour total. Tests are repeated at 500, 1,000, and 2,000 QPS for both 10% and 100% sampling rates.
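
The test matrix described above can be enumerated with a short sketch. This is only an illustration of the procedure: the class name, method names, and output format are hypothetical and are not part of ARMS or PTS. The durations, QPS levels, and sampling rates are taken from the steps above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that lists the test runs described in the procedure. */
final class ProfilingTestPlan {
    static final int[] QPS_LEVELS = {500, 1000, 2000};
    static final int[] SAMPLING_RATES_PERCENT = {10, 100};
    static final int WARMUP_MINUTES = 5;     // warm-up at 50 QPS
    static final int BASELINE_MINUTES = 30;  // continuous profiling disabled
    static final int PROFILED_MINUTES = 30;  // all profiling features enabled

    /** Returns one description per run, grouped by sampling rate, then QPS. */
    static List<String> runs() {
        List<String> plan = new ArrayList<>();
        for (int rate : SAMPLING_RATES_PERCENT) {
            for (int qps : QPS_LEVELS) {
                plan.add(String.format(Locale.ROOT,
                        "sampling=%d%% qps=%d warmup=%dmin baseline=%dmin profiled=%dmin",
                        rate, qps, WARMUP_MINUTES, BASELINE_MINUTES, PROFILED_MINUTES));
            }
        }
        return plan;
    }

    /** Total stress-test minutes across all runs, excluding warm-up. */
    static int totalStressMinutes() {
        return QPS_LEVELS.length * SAMPLING_RATES_PERCENT.length
                * (BASELINE_MINUTES + PROFILED_MINUTES);
    }

    public static void main(String[] args) {
        runs().forEach(System.out::println);
        System.out.println("total stress minutes: " + totalStressMinutes());
    }
}
```

Three QPS levels and two sampling rates give six runs of one hour each.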

Baseline: performance without continuous profiling

QPS	Sampling rate	CPU usage	Memory usage	Response time
500	10%	8.112%	13.52%	55.5 ms
500	100%	8.416%	13.62%	56.5 ms
1,000	10%	15.247%	14.14%	62.9 ms
1,000	100%	15.614%	14.42%	65.3 ms
2,000	10%	30.550%	14.64%	70.6 ms
2,000	100%	30.945%	14.67%	71.1 ms

Performance with continuous profiling enabled

QPS	Sampling rate	CPU usage	Memory usage	Response time
500	10%	8.912%	15.52%	55.6 ms
500	100%	9.316%	15.71%	56.6 ms
1,000	10%	17.140%	16.24%	63.0 ms
1,000	100%	17.710%	16.82%	65.4 ms
2,000	10%	34.650%	16.84%	70.7 ms
2,000	100%	35.245%	16.89%	71.3 ms

Overhead from continuous profiling

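This table isolates the overhead from continuous profiling. Each value is the measurement with profiling enabled minus the baseline measurement from the same test run. CPU and memory values are differences in usage, expressed in percentage points.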

QPS	Sampling rate	CPU overhead	Memory overhead	Response time increase
500	10%	+0.80%	+2.00%	+0.1 ms
500	100%	+0.90%	+2.09%	+0.1 ms
1,000	10%	+1.893%	+2.10%	+0.1 ms
1,000	100%	+2.096%	+2.40%	+0.1 ms
2,000	10%	+4.10%	+2.20%	+0.1 ms
2,000	100%	+4.30%	+2.22%	+0.2 ms
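
The overhead values are simple differences between the two preceding tables. The following sketch reproduces that subtraction. The class name and array layout are illustrative assumptions, and the numbers are copied from the baseline and profiling-enabled tables.

```java
import java.util.Locale;

/** Illustrative recomputation of the overhead table (not part of ARMS). */
final class ProfilingOverhead {
    static final String[] LABELS = {
        "500 QPS, 10%", "500 QPS, 100%", "1,000 QPS, 10%",
        "1,000 QPS, 100%", "2,000 QPS, 10%", "2,000 QPS, 100%"};

    // Columns: CPU usage (%), memory usage (%), response time (ms).
    static final double[][] BASELINE = {
        {8.112, 13.52, 55.5}, {8.416, 13.62, 56.5},
        {15.247, 14.14, 62.9}, {15.614, 14.42, 65.3},
        {30.550, 14.64, 70.6}, {30.945, 14.67, 71.1}};
    static final double[][] PROFILED = {
        {8.912, 15.52, 55.6}, {9.316, 15.71, 56.6},
        {17.140, 16.24, 63.0}, {17.710, 16.82, 65.4},
        {34.650, 16.84, 70.7}, {35.245, 16.89, 71.3}};

    /** Returns {CPU delta, memory delta, response-time delta} for one row. */
    static double[] overhead(int row) {
        double[] delta = new double[3];
        for (int col = 0; col < 3; col++) {
            delta[col] = PROFILED[row][col] - BASELINE[row][col];
        }
        return delta;
    }

    /** Formats one row; rounding hides floating-point noise in the subtraction. */
    static String formatRow(int row) {
        double[] d = overhead(row);
        return String.format(Locale.ROOT, "%s: CPU %+.3f, memory %+.2f, response time %+.1f ms",
                LABELS[row], d[0], d[1], d[2]);
    }

    public static void main(String[] args) {
        for (int row = 0; row < LABELS.length; row++) {
            System.out.println(formatRow(row));
        }
    }
}
```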

Key observations:

CPU overhead scales with QPS: at 100% sampling, it rises from +0.90% at 500 QPS to +4.30% at 2,000 QPS. At all tested load levels, CPU overhead remains under 5%.
Memory overhead stays consistent at approximately +2% (between +2.00% and +2.40%) across all load levels and sampling rates.
Response time increases by 0.1 to 0.2 ms, a negligible impact on end-user latency.
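
One way to check the scaling observation is to normalize the CPU overhead by load. The sketch below divides the CPU overhead measured at 100% sampling by the QPS in thousands and checks every value against the 5% bound. The class and method names are hypothetical, and the per-1,000-QPS figure is a derived illustration rather than a metric reported by ARMS.

```java
/** Illustrative check of how CPU overhead grows with load (not part of ARMS). */
final class CpuOverheadScaling {
    static final int[] QPS = {500, 1000, 2000};
    // CPU overhead at 100% sampling, copied from the overhead table.
    static final double[] CPU_OVERHEAD = {0.90, 2.096, 4.30};
    static final double LIMIT = 5.0;

    /** CPU overhead normalized to 1,000 QPS for the given load level. */
    static double perThousandQps(int index) {
        return CPU_OVERHEAD[index] / (QPS[index] / 1000.0);
    }

    /** True if every measured CPU overhead is below the limit. */
    static boolean allUnderLimit() {
        for (double overhead : CPU_OVERHEAD) {
            if (overhead >= LIMIT) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int i = 0; i < QPS.length; i++) {
            System.out.printf("%d QPS: %.3f per 1,000 QPS%n", QPS[i], perThousandQps(i));
        }
        System.out.println("all under limit: " + allUnderLimit());
    }
}
```

The normalized values fall between roughly 1.8 and 2.15 per 1,000 QPS, which is consistent with CPU overhead growing approximately in proportion to load over the tested range.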

Conclusion

With all continuous profiling features enabled (CPU diagnostics, memory diagnostics, and code diagnostics), CPU and memory overhead stays within 5%. Enabling a subset of features reduces the overhead further.
Continuous profiling has negligible impact on application latency. Response time increases by at most 0.2 ms under the tested conditions.