The Application Real-Time Monitoring Service (ARMS) agent for Go instruments your application at compile time, providing observability without manual code changes. This report benchmarks the resulting performance overhead across three traffic levels (500, 1,000, and 2,000 QPS) and two sampling rates (10% and 100%). Review these results to assess the impact before you connect your application to Application Monitoring.
Key findings
CPU: Under 10% additional usage across all tested scenarios.
Memory: Under 1.3% increase at all traffic levels.
Response time: 0--2 ms increase, even at 2,000 queries per second (QPS).
Sampling rate: Switching from 10% to 100% sampling adds roughly 2% CPU overhead. Memory and response time differences are negligible.
Test setup
Architecture
The test application is a Go service built with the standard net/http package that handles two types of requests:
MySQL queries -- 50% of total QPS
Redis operations -- 50% of total QPS
Performance Testing (PTS) generates the load. The Go application, MySQL, and Redis all run in the same Container Service for Kubernetes (ACK) cluster.
Environment
| Component | Specification |
|---|---|
| Load generator | Performance Testing (PTS) |
| Kubernetes cluster | ACK with ecs.c6.2xlarge nodes |
| Node OS | Alibaba Cloud Linux 3.2104 LTS 64-bit |
| Pod resources | 1 core, 2 GB memory, 2 replicas |
| Agent version | ARMS agent for Go V1.0.0 |
Procedure
Each stress test runs for 1 hour, preceded by a 3-minute warm-up at 100 QPS.
Establish baseline. Run stress tests at 500, 1,000, and 2,000 QPS without the agent. Record CPU, memory, and response time as baseline metrics.
Test with 10% sampling. Install the ARMS agent for Go, set the sampling rate to 10% in the sampling policy, and repeat the same stress tests.
Test with 100% sampling. Set the sampling rate to 100% and repeat the stress tests.
All basic Application Monitoring features are enabled during testing: metrics, traces, and runtime monitoring. All plug-ins are also enabled. Runtime monitoring adds approximately 0.5% CPU utilization. To reduce overhead, disable runtime monitoring in application settings.
Results
Baseline (no agent)
| Traffic level | CPU | Memory | Response time |
|---|---|---|---|
| 500 QPS | 2.42% | 0.71% | 30 ms |
| 1,000 QPS | 4.21% | 0.91% | 30 ms |
| 2,000 QPS | 8.50% | 1.41% | 30 ms |
CPU: Percentage of total CPU consumed by the application pods.
Memory: Percentage of total memory consumed by the application pods. Because pod memory grows naturally until it reaches the `requests` value, this report uses the actual memory reading at the end of each 1-hour test.
Response time: Average across all requests, in milliseconds.
With the ARMS agent
| Traffic level | Sampling rate | CPU | Memory | Response time |
|---|---|---|---|---|
| 500 QPS | 10% | 5.15% | 1.25% | 30 ms |
| 1,000 QPS | 10% | 8.42% | 1.52% | 31 ms |
| 2,000 QPS | 10% | 16.20% | 2.50% | 31 ms |
| 500 QPS | 100% | 5.25% | 1.85% | 31 ms |
| 1,000 QPS | 100% | 10.48% | 2.02% | 32 ms |
| 2,000 QPS | 100% | 18.45% | 2.63% | 32 ms |
Overhead (difference from baseline)
| Traffic level | Sampling rate | CPU | Memory | Response time |
|---|---|---|---|---|
| 500 QPS | 10% | +2.73% | +0.54% | 0 ms |
| 1,000 QPS | 10% | +4.21% | +0.61% | +1 ms |
| 2,000 QPS | 10% | +7.70% | +1.09% | +1 ms |
| 500 QPS | 100% | +2.83% | +1.14% | +1 ms |
| 1,000 QPS | 100% | +6.27% | +1.11% | +2 ms |
| 2,000 QPS | 100% | +9.95% | +1.22% | +2 ms |
Conclusions
CPU overhead stays under 10%. At the highest tested load (2,000 QPS) with 100% sampling, CPU increases by 9.95%.
Memory overhead is minimal. The largest observed increase is 1.22%, at 2,000 QPS with 100% sampling.
Response time impact is negligible. At 2,000 QPS, latency increases by 1 ms (10% sampling) or 2 ms (100% sampling).
Sampling rate trade-off is small. Moving from 10% to 100% sampling adds roughly 2% more CPU overhead, with marginal effects on memory and response time. To lower overhead further, reduce the sampling rate in the sampling policy or disable runtime monitoring in application settings (saves approximately 0.5% CPU).