Application Real-Time Monitoring Service: Select a trace sampling mode for the ARMS agent V3.2.8 and later

Last Updated: Nov 08, 2024

This topic describes the trace sampling modes that are supported by Application Real-Time Monitoring Service (ARMS). You can select an appropriate mode based on your scenarios so that you can obtain the trace data that you want at a low cost.

Terms

  • span: a specific operation in a request, such as a remote call or an internal method call.

  • root span: the first span in a trace.

  • local root span: the first span of a trace in a single service.

  • span context: the context of a span. A span context is associated with a specific operation in a request.

  • head-based sampling: makes a sampling decision upfront at the root span and ensures that whole traces are sampled.

  • non-head based sampling: takes effect if head-based sampling is not triggered, and may be triggered at any local root span in a trace. In most cases, the integrity of the trace cannot be guaranteed.

Sampling policies and marks

ARMS provides two head-based sampling policies and three non-head based sampling policies to help you sample significant trace data.

Sampling marks

Sampling marks specify whether to sample trace data when trace contexts are passed across processes by using the EagleEye protocol. The key of the header is EagleEye-Sampled, and the valid values are:

  • s0: not sampled

  • s1: sampled
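
As a rough illustration of how a service could read and forward this mark, the following Python sketch parses the EagleEye-Sampled header. The function names and the dict-based header representation are hypothetical and are not part of the ARMS agent. Treating a missing or unrecognized value as "no upstream decision" is also an assumption.

```python
from typing import Mapping, Optional

HEADER_KEY = "EagleEye-Sampled"  # header key named in this topic


def read_sampling_mark(headers: Mapping[str, str]) -> Optional[bool]:
    """Return True for s1, False for s0, None if absent or unrecognized."""
    # Assumption: header names are matched case-insensitively, as in HTTP.
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    value = normalized.get(HEADER_KEY.lower())
    if value == "s1":
        return True
    if value == "s0":
        return False
    return None  # assumption: no usable upstream decision


def write_sampling_mark(headers: Mapping[str, str], sampled: bool) -> dict:
    """Return a copy of the headers that carries the mark downstream."""
    out = dict(headers)
    out[HEADER_KEY] = "s1" if sampled else "s0"
    return out
```

A caller would typically read the mark from the incoming request and write the resulting decision onto every outgoing request.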

Sampling marks can also record sampling reasons in the local root span where trace data is sampled. The marks are stored in spans in the form of attributes. The key is sample.reason and valid values are:

  • s2: minimum sampling for all interfaces

  • s3: custom sampling

  • s4: fixed-rate sampling

  • s5: reserved

  • s6: adaptive sampling

  • s7: reserved

  • s8: Basic Edition sampling

  • s9: sampling for failed requests

  • s10: sampling for slow requests

  • s11: sampling for abnormal calls
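
When you inspect exported spans, a small lookup table can make the sample.reason codes easier to read. The following sketch assumes that span attributes are available as a plain dict. The helper names are hypothetical; the codes and their meanings are taken from the list above.

```python
SAMPLE_REASONS = {
    "s2": "minimum sampling for all interfaces",
    "s3": "custom sampling",
    "s4": "fixed-rate sampling",
    "s5": "reserved",
    "s6": "adaptive sampling",
    "s7": "reserved",
    "s8": "Basic Edition sampling",
    "s9": "sampling for failed requests",
    "s10": "sampling for slow requests",
    "s11": "sampling for abnormal calls",
}

# Fixed-rate and adaptive sampling are the two head-based policies.
HEAD_BASED_REASONS = {"s4", "s6"}


def describe_sample_reason(attributes: dict) -> str:
    """Translate the sample.reason attribute of a span into text."""
    code = attributes.get("sample.reason")
    if code is None:
        return "no sample.reason attribute"
    return SAMPLE_REASONS.get(code, f"unknown ({code})")


def is_head_based(attributes: dict) -> bool:
    """True if the span was sampled by a head-based policy."""
    return attributes.get("sample.reason") in HEAD_BASED_REASONS
```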

Head-based sampling policies

ARMS supports two head-based sampling policies: fixed-rate sampling and adaptive sampling. Fixed-rate sampling is the most common head-based trace sampling policy. Adaptive sampling is a cost-effective head-based sampling policy developed by ARMS.

Fixed-rate sampling

Traces are sampled based on the specified sampling rate at the ingress service. Spans that are sampled carry an attribute whose key is sample.reason and value is s4.
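
To make the idea concrete, the following Python sketch shows one common way to implement a fixed-rate decision: hash the trace ID into a bucket and compare the bucket with the configured percentage. This topic does not describe how the ARMS agent makes the decision internally, so the hashing approach and the function name are assumptions.

```python
import hashlib


def fixed_rate_decision(trace_id: str, rate_percent: float = 10.0) -> bool:
    """Decide at the ingress service whether to sample a trace.

    Assumption: the decision is derived from a hash of the trace ID so that
    it is deterministic for a given trace. The default of 10 mirrors the
    default sampling rate described in this topic.
    """
    if not 0 <= rate_percent <= 100:
        raise ValueError("rate_percent must be between 0 and 100")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < rate_percent * 100


# A sampled span would then carry the attribute sample.reason = "s4".
```

Hashing instead of drawing a random number keeps the decision reproducible for the same trace ID, which can simplify debugging.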

To configure a fixed-rate sampling policy, perform the following steps:

  1. Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Application List.

  2. On the Application List page, select a region in the top navigation bar and click the name of the application that you want to manage.

    Note

    If the Java icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.

  3. In the top navigation bar, choose Configuration > Custom Configurations.

  4. In the Sampling Settings section, set the Sampling strategy parameter to Fixed sampling rate. In the Sample Rate Percentage field, enter a percentage. For example, if you enter 10, the sampling rate is 10%.

    Note

    The modifications take effect immediately. You do not need to restart the application. The default value is 10. If you increase the sampling rate, additional system resources are consumed. We recommend that you keep the default value.

  5. Click Save.

Adaptive sampling

Traffic can vary greatly across business scenarios. Read traffic on interfaces is often far larger than write traffic, whereas the trace data for write operations is usually more significant than the trace data for read operations. To prevent less significant trace data from crowding out significant trace data, ARMS provides adaptive sampling. The 1,000 interfaces with the most requests are identified based on the Least Frequently Used (LFU) algorithm, and their traces are sampled separately. 10 traces are sampled per minute for each of these interfaces, and 10 traces are sampled per minute for all other interfaces. Spans that are sampled carry an attribute whose key is sample.reason and value is s6.
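
The per-minute budgets can be sketched as follows. This is a simplified illustration and not the ARMS implementation: it ranks interfaces by a running request count instead of maintaining a real LFU cache with eviction, it recomputes the ranking on every call, and it assumes that all interfaces outside the top group share one budget. The class name and parameters are hypothetical.

```python
from collections import Counter


class AdaptiveSampler:
    """Per-interface, per-minute trace budgets (illustrative sketch)."""

    def __init__(self, top_n=1000, per_interface_quota=10, shared_quota=10):
        self.top_n = top_n
        self.per_interface_quota = per_interface_quota
        self.shared_quota = shared_quota
        self.request_counts = Counter()  # running frequency per interface
        self.sampled = Counter()         # traces sampled in the current minute
        self.current_minute = None

    def should_sample(self, interface: str, now_seconds: float) -> bool:
        minute = int(now_seconds // 60)
        if minute != self.current_minute:
            # A new minute starts, so every budget is reset.
            self.current_minute = minute
            self.sampled.clear()

        self.request_counts[interface] += 1
        # Simplification: a full ranking per call stands in for an LFU cache.
        top = {name for name, _ in self.request_counts.most_common(self.top_n)}

        if interface in top:
            key, quota = ("interface", interface), self.per_interface_quota
        else:
            # Assumption: all remaining interfaces share a single budget.
            key, quota = ("others",), self.shared_quota

        if self.sampled[key] < quota:
            self.sampled[key] += 1
            return True  # the span would carry sample.reason = "s6"
        return False
```

With these budgets, a very busy read interface cannot use up the traces that a rarely called write interface needs, because each tracked interface spends only its own quota.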

To configure an adaptive sampling policy, perform the following steps:

  1. Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Application List.

  2. On the Application List page, select a region in the top navigation bar and click the name of the application that you want to manage.

    Note

    If the Java icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.

  3. In the top navigation bar, choose Configuration > Custom Configurations.

  4. In the Sampling Settings section, set the Sampling strategy parameter to Adaptive Sampling.

    Note

    The modifications take effect immediately. You do not need to restart the application.

  5. Click Save.

Non-head based sampling policies

Head-based sampling alone may miss significant trace data that you care about, such as spans related to slow or failed requests, or spans that are infrequent or user-defined. Non-head based sampling covers these cases. It may be triggered at any local root span in a trace, so it cannot guarantee the integrity of the trace.

Minimum sampling for all interfaces

The traces of each interface are automatically sampled at least once in a minute. Spans that are sampled carry an attribute whose key is sample.reason and value is s2.
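
A minimal sketch of this guarantee, assuming that the first request an interface receives in each minute is the one that gets sampled. The class name is hypothetical, and the ARMS agent may select the trace differently.

```python
class MinimumSampler:
    """Sample each interface once per minute (illustrative sketch)."""

    def __init__(self):
        self._last_sampled_minute = {}  # interface -> minute index

    def should_sample(self, interface: str, now_seconds: float) -> bool:
        minute = int(now_seconds // 60)
        if self._last_sampled_minute.get(interface) == minute:
            return False  # this interface is already covered for this minute
        self._last_sampled_minute[interface] = minute
        return True  # the span would carry sample.reason = "s2"
```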

Sampling for failed or slow requests

Important

Sampling for failed or slow requests requires the Call chain compression switch to be turned on. To check the switch, go to the application details page, choose Configuration > Custom Configurations from the top navigation bar, and then find the switch in the Advanced Settings section. The switch is turned on by default.

If a request meets one of the following conditions, the relevant traces are automatically sampled.

  • For an HTTP interface, a status code other than 200 is returned. For other interfaces, exceptions are thrown by the methods used for instrumentation.

  • An exception occurs during the internal execution of the interface, and is not thrown to the ingress service of the framework.

  • The duration of the request is longer than the 99th percentile of the historical request duration of the same interface. To enable percentile statistics, go to the application details page, choose Configuration > Custom Configurations from the top navigation bar, and then turn on the Quantile Statistics switch in the Advanced Settings section.

    Note

    If quantile statistics are not enabled, traces whose request durations are greater than the specified thresholds are sampled.

Spans that are sampled carry an attribute whose key is sample.reason and value is s9 (failed requests), s11 (abnormal calls), or s10 (slow requests). The specific value depends on which condition is met.
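
The conditions above can be sketched as a small classifier. The function names, the parameters, and the order in which the conditions are checked are assumptions made for illustration. The mapping of the three conditions to s9, s11, and s10 is inferred from the list of sample.reason values earlier in this topic. The nearest-rank percentile is one of several common definitions and may differ from the one that ARMS uses.

```python
from typing import Optional, Sequence


def p99(durations_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of historical durations (non-empty)."""
    ordered = sorted(durations_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integers
    return ordered[rank - 1]


def non_head_reason(
    is_http: bool,
    status_code: Optional[int],
    entry_raised: bool,
    internal_exception: bool,
    duration_ms: float,
    slow_threshold_ms: float,
) -> Optional[str]:
    """Return a sample.reason value, or None if no condition is met."""
    # Condition 1: failed request (non-200 for HTTP, thrown exception otherwise).
    failed = (status_code != 200) if is_http else entry_raised
    if failed:
        return "s9"
    # Condition 2: exception inside the interface that does not reach the entry.
    if internal_exception:
        return "s11"
    # Condition 3: slower than the 99th percentile or the configured threshold.
    if duration_ms > slow_threshold_ms:
        return "s10"
    return None
```

For example, the threshold could be computed as `p99(history)` when quantile statistics are enabled, or set to a fixed value otherwise.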

Custom sampling

You can specify interface names, prefixes, or suffixes to identify the interfaces whose traces you want to sample in full. Spans that are sampled carry an attribute whose key is sample.reason and value is s3.
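
The matching logic can be pictured with the following sketch. The class and its fields are hypothetical and only illustrate exact-name, prefix, and suffix matching. The actual rule syntax is defined in the ARMS console.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class CustomSamplingRules:
    """Interfaces whose traces are always sampled (illustrative sketch)."""

    names: FrozenSet[str] = frozenset()
    prefixes: Tuple[str, ...] = ()
    suffixes: Tuple[str, ...] = ()

    def matches(self, interface: str) -> bool:
        # str.startswith and str.endswith accept a tuple of candidates and
        # return False for an empty tuple.
        return (
            interface in self.names
            or interface.startswith(self.prefixes)
            or interface.endswith(self.suffixes)
        )


rules = CustomSamplingRules(
    names=frozenset({"/pay"}),
    prefixes=("/api/order",),
    suffixes=("/checkout",),
)
```

A span whose interface matches any rule would be sampled and carry sample.reason = "s3".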

To configure a custom sampling policy, perform the following steps:

  1. Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Application List.

  2. On the Application List page, select a region in the top navigation bar and click the name of the application that you want to manage.

    Note

    If the Java icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.

  3. In the top navigation bar, choose Configuration > Custom Configurations.

  4. In the Sampling Settings section, specify the interface names, prefixes, or suffixes.

    Note

    The modifications take effect immediately. You do not need to restart the application.

  5. Click Save.

Flowchart

Take a trace that passes through services A, B, and C as an example. The preceding sampling policies determine whether the spans are sampled. The following flowchart describes how sampling decisions are made. Each decision depends on which service the request is at (A, B, or C) and on whether the current span is a root span or a local root span.

(Figure: flowchart of the sampling decisions made at services A, B, and C)

The flowchart uses the following colors:

  • Purple: indicates head-based sampling, which is triggered only at the root span of the trace. Only one sampling decision is made at A.

  • Blue: indicates custom sampling and minimum sampling, which can be triggered at any local root span in the trace if head-based sampling is not triggered. Assume that A decides not to sample. When the request is at B, B decides whether to implement custom sampling, minimum sampling, or neither. If sampling is implemented, the attributes attached to the spans are passed on to C. One sampling decision is made at each of A, B, and C.

  • Green: indicates sampling for failed or slow requests, which can be triggered at any local root span in the trace if head-based sampling, custom sampling, and minimum sampling are not triggered. Assume that A decides not to sample. When the request is at B, B decides whether the request is slow or has failed, and whether to implement sampling. If sampling is implemented, the attributes attached to the spans are not passed on to C. One sampling decision is made at each of A, B, and C.
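
The three colors can be summarized as one decision function that runs at each local root span. The following sketch is an interpretation of the description above and is not agent code. The names are hypothetical, and checking custom sampling before minimum sampling is an assumed order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    sampled: bool
    reason: Optional[str]  # sample.reason value recorded at this span
    propagate: bool        # whether downstream services inherit the decision


def decide(
    is_root: bool,
    upstream_sampled: bool,
    head_reason: Optional[str],      # "s4" or "s6" if a head-based policy fired
    custom_match: bool,
    minimum_due: bool,
    failed_or_slow: Optional[str],   # "s9", "s10", "s11", or None
) -> Decision:
    # Purple: head-based sampling is evaluated only at the root span.
    if is_root and head_reason is not None:
        return Decision(True, head_reason, True)
    # A decision inherited from upstream keeps the trace sampled.
    if upstream_sampled:
        return Decision(True, None, True)
    # Blue: custom and minimum sampling are passed on downstream.
    if custom_match:
        return Decision(True, "s3", True)
    if minimum_due:
        return Decision(True, "s2", True)
    # Green: failed or slow requests are sampled locally, not passed on.
    if failed_or_slow is not None:
        return Decision(True, failed_or_slow, False)
    return Decision(False, None, False)
```

In the A, B, C example, if A returns an unsampled decision and B matches a custom rule, B returns a propagating decision and C inherits it. If B instead samples because the request is slow, C makes its own decision.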

References

After traces are sampled, you can configure filter conditions and aggregation dimensions to analyze the trace data in real time. For more information, see Trace analysis.