In high-throughput distributed systems, collecting every trace is expensive and often unnecessary. Most trace data is repetitive -- a problem that appears in one trace almost always appears in others. Sampling lets you keep the traces that matter while controlling costs and resource usage.
Application Real-Time Monitoring Service (ARMS) provides two head-based sampling policies -- fixed-rate sampling and adaptive sampling -- that cover most production scenarios. This guide helps you choose the right policy based on your priorities: reducing costs, protecting core business paths, handling major events, or managing fluctuating traffic.
How sampling works in distributed tracing
A sampling policy determines which traces to collect and which to discard. The decision can happen at three points in the trace lifecycle:
| Policy | When the decision is made | Trade-offs |
|---|---|---|
| Head-based sampling | At the root span of an ingress service (gateway, proxy, or core upstream service). If the root span is sampled, all downstream spans in the trace are also sampled. | Low overhead. Coherent traces. Cannot filter by outcome (errors, latency) because the decision is made before the trace completes. |
| Tail-based sampling | After all spans in a trace are collected. The server evaluates the completed trace and keeps it only if it matches specific criteria (errors, slow responses, anomalies). | Accurate filtering for failed and slow traces. Higher overhead and data costs because all trace data must be buffered before the decision. |
| Unitary sampling | Each service independently decides whether to sample its own spans. No coordination across services. | Lightweight per-service setup. Incomplete traces because different services may make different sampling decisions for the same request. |
ARMS uses head-based sampling to minimize the cost of observable data while keeping traces coherent. For billing details, see Billing.
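The key property of head-based sampling is that the keep-or-drop decision is made once, at the root span, and propagated to every downstream span. A minimal stdlib-only sketch (this is illustrative only, not the ARMS agent's actual algorithm; the function name and the modulo scheme are assumptions):

```python
import random

def head_sample(trace_id: int, rate: float) -> bool:
    # The decision is made once per trace, at the root span of the
    # ingress service. Because it is deterministic on the trace ID,
    # every service that sees the same trace ID reaches the same
    # decision without any coordination.
    return (trace_id % 10_000) < rate * 10_000

# Downstream services inherit the root decision via the trace context,
# so a trace is either kept in full or dropped in full.
trace_id = random.getrandbits(64)
root_decision = head_sample(trace_id, 0.10)
child_decision = head_sample(trace_id, 0.10)  # same input, same result
assert root_decision == child_decision
```

This is also why head-based sampling cannot filter on outcome: at the time the root span starts, the trace's eventual errors and latency are unknown.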
Sampling policies in ARMS
ARMS offers two head-based sampling policies:
Fixed-rate sampling -- Samples traces at a specified percentage at the ingress service. For example, a 10% rate records, on average, 1 out of every 10 traces.
Adaptive sampling -- Samples 10 traces per minute for each of the top 1,000 API operations by request volume, as tracked by a Least Frequently Used (LFU) algorithm. All other API operations share a combined quota of 10 traces per minute. This policy, developed by ARMS, prevents high-traffic operations from dominating trace storage.
Both policies support full collection for specific API operations when you need 100% sampling on critical paths.
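The practical difference between the two policies is how trace volume scales with traffic. A back-of-the-envelope sketch (the function names and the cap formula are illustrative assumptions, using the quotas stated above):

```python
def expected_traces_per_minute(requests_per_minute: int, rate: float) -> float:
    # Fixed-rate sampling: collected volume grows linearly with traffic.
    return requests_per_minute * rate

def adaptive_cap_per_minute(num_operations: int, quota: int = 10) -> int:
    # Adaptive sampling: each of up to 1,000 tracked operations gets a
    # fixed per-minute quota, and all remaining operations share one
    # more quota, so volume is bounded regardless of traffic.
    return min(num_operations, 1000) * quota + quota

# At 60,000 requests/minute, a 10% fixed rate keeps 6,000 traces/minute,
# while adaptive sampling over 50 operations caps out at 510.
```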
Choose a policy by scenario
The right sampling policy depends on what you are optimizing for. Use the following decision table, then see the detailed guidance for each scenario.
| Your priority | Recommended policy | Key benefit |
|---|---|---|
| Reduce trace costs | Fixed-rate sampling at a low rate (for example, 5%) | Cuts costs without significantly reducing exception visibility |
| Protect core business paths | Fixed-rate sampling + full collection for critical operations | Captures every trace on key paths while sampling the rest |
| Full visibility during major O&M events | Fixed-rate sampling at 100% | Complete audit trail for promotions, releases, or performance tests |
| Handle fluctuating traffic | Adaptive sampling | Automatically balances coverage across high-traffic and low-traffic operations |
You can copy trace sampling settings from one application to another in batches. For details, see Synchronize application settings to other applications.
Reduce trace costs with fixed-rate sampling
Lower the default sampling rate from 10% to a rate that fits your budget, such as 5%.
Halving the rate cuts trace costs roughly in half, but exception visibility stays high. Errors that occur in one trace typically recur in other traces of the same application. At scale, even a 5% rate captures most exceptions -- unless the error is a one-time spike.
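The intuition that recurring errors survive low sampling rates follows directly from the math. Assuming independent head-based decisions per trace (a simplifying assumption; the function name is illustrative):

```python
def capture_probability(error_occurrences: int, rate: float) -> float:
    # Probability that at least one trace of a recurring error is
    # sampled, assuming each trace is sampled independently.
    return 1 - (1 - rate) ** error_occurrences

# An error that recurs 100 times is almost certainly captured even
# at a 5% sampling rate:
print(round(capture_probability(100, 0.05), 4))  # → 0.9941
```

Conversely, an error that occurs only once has exactly a 5% chance of being captured, which is why one-time spikes are the exception to this guidance.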

To configure the sampling rate, see Fixed-rate sampling.
Protect core business paths with full collection
For applications with critical paths -- such as product querying and purchasing in an e-commerce system -- combine a higher fixed sampling rate with full collection on specific operations. Reserve full collection for the operations that matter most, not for less critical ones such as querying and editing user information.
Increase the fixed sampling rate above the default 10% for core applications.
Enable full collection for operations that require complete trace coverage, such as product querying and purchasing.
You can target operations by exact name, prefix, or suffix.
Full collection can cause a sharp increase in collected data. Enable it only for operations that genuinely require 100% trace coverage.
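Conceptually, full collection acts as an override on top of the fixed-rate decision. A hedged sketch of the logic (the function name, the prefix list, and the operation names are illustrative assumptions, not ARMS internals):

```python
def should_sample(operation: str, base_decision: bool,
                  full_collect_prefixes: tuple = ("/product/",)) -> bool:
    # Full collection overrides the fixed rate for matching operations.
    # Prefix matching is shown here; ARMS also supports exact-name and
    # suffix matching.
    if operation.startswith(full_collect_prefixes):
        return True          # always keep traces on critical paths
    return base_decision     # otherwise fall back to the fixed rate

# A critical-path operation is kept even when the fixed rate says drop:
# should_sample("/product/query", False) is True, while
# should_sample("/user/edit", False) is False.
```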

To configure full collection, see Fixed-rate sampling.
Get full visibility during major O&M events
During large-scale promotions, load tests, or new-release rollouts, set a 100% sampling rate for the affected applications. This gives you a complete audit trail for troubleshooting, post-event review, and accountability.
To apply the 100% rate only to specific applications, filter by tag. See Manage tags.
After the event ends, lower the sampling rate to avoid unnecessary costs and performance overhead.
Handle fluctuating traffic with adaptive sampling
If your applications serve many API operations with unpredictable traffic patterns, fixed-rate sampling has two drawbacks:
Over-sampling on high-traffic operations -- Popular operations generate a disproportionate share of sampled traces, most of which are redundant.
Under-sampling on low-traffic operations -- Operations with few requests (such as scheduled jobs) may not appear in the sampled data at all. If an exception occurs on an under-sampled operation, you have no trace to investigate.
Manually enabling full collection for individual operations is also impractical -- it creates a large maintenance workload and risks overlooking critical operations.
Adaptive sampling addresses these problems. It allocates a fixed trace quota per operation using the LFU algorithm, so collected data does not grow linearly with traffic. ARMS also enables a minimum sampling policy by default: every API operation is automatically sampled at least once per minute, so even low-traffic operations are represented.
You can still enable full collection for specific critical operations alongside adaptive sampling.
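The per-operation quota mechanics can be sketched as follows. This is a deliberately simplified, single-window model (the class name and internals are assumptions; real ARMS adaptive sampling maintains the LFU ranking and minimum-sampling guarantee across rolling one-minute windows):

```python
from collections import Counter

class AdaptiveSampler:
    """Simplified sketch of per-operation quota sampling for one
    one-minute window. Not the ARMS implementation."""

    def __init__(self, per_op_quota: int = 10, max_tracked_ops: int = 1000):
        self.per_op_quota = per_op_quota
        self.max_tracked_ops = max_tracked_ops
        self.request_counts = Counter()   # LFU-style frequency tracking
        self.sampled_counts = Counter()
        self.shared_used = 0              # quota shared by untracked ops

    def should_sample(self, operation: str) -> bool:
        self.request_counts[operation] += 1
        tracked = {op for op, _ in
                   self.request_counts.most_common(self.max_tracked_ops)}
        if operation in tracked:
            # Each high-volume operation gets its own fixed quota, so a
            # hot operation caps out instead of dominating storage.
            if self.sampled_counts[operation] < self.per_op_quota:
                self.sampled_counts[operation] += 1
                return True
            return False
        # All remaining operations share one combined quota.
        if self.shared_used < self.per_op_quota:
            self.shared_used += 1
            return True
        return False
```

In this model an operation receiving 10,000 requests in a window yields the same 10 traces as one receiving 50, which is the property that keeps collected data from growing linearly with traffic.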

To configure adaptive sampling, see Adaptive sampling.
What's next
After you configure a sampling policy, set up filter conditions and aggregation dimensions to analyze trace data in real time. See Trace analysis.