
Microservices Engine: Configure system protection

Last Updated: Mar 10, 2026

When individual interfaces lack dedicated traffic protection rules, unexpected traffic surges can destabilize your microservice applications. System protection provides node-level safeguards that cover all interfaces on a node, acting as a baseline safety net to maintain application stability.

Microservices Governance offers five system protection types:

Protection type | Monitors | Applies to | Agent version
Adaptive overload protection | CPU utilization | All server interfaces | V3.1.4+
Total QPS throttling | Sum of QPS across all interfaces on a node | All server interfaces | 4.2.0+
Total concurrency throttling | Sum of concurrent requests across all interfaces on a node | All server interfaces | 4.2.0+
Abnormal call circuit breaking | Error percentage per client interface | All client interfaces | 4.2.0+
Slow call circuit breaking | Slow call percentage per client interface | All client interfaces | 4.2.0+

All system protection rules have lower priority than interface-level traffic protection rules. When throttling or circuit breaking triggers, the system returns HTTP status code 429. For a detailed comparison, see System protection vs. traffic protection.

Choose the right protection type

Use this decision matrix to identify which protection types fit your scenario:

Scenario | Recommended protection | Why
CPU-sensitive application with traffic-driven load | Adaptive overload protection | Dynamically adjusts throttling based on the gap between actual and target CPU utilization
Performance limited by memory, network, or other non-CPU factors | Total QPS throttling | Caps the request rate regardless of CPU load
High response times (typically over 1 second) causing request queuing | Total QPS throttling + Total concurrency throttling | QPS throttling alone cannot prevent queue buildup when requests take longer than 1 second to complete
Downstream services returning frequent errors | Abnormal call circuit breaking | Enables fast failure to prevent request queuing in the caller
Downstream services responding slowly but not timing out | Slow call circuit breaking | Detects slow responses independently of timeout settings

Best practice: Start with system protection as a baseline safety net, then add interface-level traffic protection rules to minimize unnecessary throttling on individual interfaces.

Configure system protection rules

Before you begin, make sure that you have:

To configure system protection:

  1. Log on to the MSE console and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Microservices Governance > Application Governance.

  3. On the Application list page, click the resource card of the target application. In the left-side navigation pane, click Traffic management.

  4. Click the System Protection tab and configure the protection types described in the following sections.

Adaptive overload protection

Note

Requires agent V3.1.4 or later.

How it works

Adaptive overload protection uses CPU utilization to gauge system load. When CPU usage approaches the configured threshold, the system adaptively adjusts the throttling percentage of server traffic -- rejecting a portion of incoming requests to keep CPU utilization within a small range around the threshold.
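To make the feedback loop concrete, the following is a minimal sketch, assuming a simple proportional controller. It is not the MSE agent's algorithm. The class name `AdaptiveThrottler`, the `gain` parameter, and the update rule are hypothetical. The sketch raises a rejection probability while CPU utilization is above the target and lowers it while CPU is below, so load settles near the threshold.

```python
import random
from typing import Optional


class AdaptiveThrottler:
    """Hypothetical sketch of CPU-driven adaptive throttling (not the MSE agent's code).

    The rejection probability rises when measured CPU utilization is above the
    target and falls when it is below. The proportional update rule and the
    `gain` value are assumptions for illustration.
    """

    def __init__(self, target_cpu: float, gain: float = 0.5):
        self.target_cpu = target_cpu  # e.g. 0.8 for an 80% threshold
        self.gain = gain              # how strongly the gap moves the probability
        self.reject_prob = 0.0        # fraction of ingress requests to reject

    def update(self, cpu: float) -> float:
        """Call once per sampling interval with CPU utilization in [0.0, 1.0]."""
        gap = cpu - self.target_cpu
        self.reject_prob += self.gain * gap
        # Clamp to a valid probability.
        self.reject_prob = min(max(self.reject_prob, 0.0), 1.0)
        return self.reject_prob

    def admit(self, rng: Optional[random.Random] = None) -> bool:
        """Return False for the share of requests that should be rejected (HTTP 429)."""
        roll = (rng or random).random()
        return roll >= self.reject_prob
```

The probability stops changing only when CPU equals the target. That equilibrium is what keeps utilization within a small band around the threshold instead of swinging between "no throttling" and "throttle everything".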

This protection takes effect on all server interfaces.

When to use it

Use adaptive overload protection for CPU-sensitive applications where unexpected traffic surges directly increase CPU load and degrade response times.

To determine a threshold, run stress tests or analyze historical data to identify the maximum CPU utilization during steady-state operation, then set a slightly higher value.
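The threshold guidance amounts to taking the peak steady-state value and adding a margin. The following hypothetical helper sketches that. The 10% default headroom is an assumption standing in for "slightly higher". The same approach works for total QPS and total concurrency thresholds.

```python
from typing import Optional, Sequence


def suggest_threshold(samples: Sequence[float], headroom: float = 0.1,
                      cap: Optional[float] = None) -> float:
    """Hypothetical helper: peak steady-state value plus a headroom margin.

    `samples` come from a stress test or historical monitoring data.
    The 10% default headroom is an assumption, not an MSE recommendation.
    `cap` bounds the result, e.g. 100 for a CPU percentage.
    """
    if not samples:
        raise ValueError("need at least one sample")
    threshold = max(samples) * (1 + headroom)
    return min(threshold, float(cap)) if cap is not None else threshold
```

For example, steady-state CPU samples of 55%, 62%, 68%, and 64% give a suggested threshold of about 74.8%.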

Console layout

The Adaptive Overload Protection section has two panels:

  • Left panel -- Lists adaptive overload protection events. Events are generated when throttling starts, while it is active, and when it ends. Click View in the Actions column to inspect the CPU utilization of a specific node IP and replay data from the reporting interval.

  • Right panel -- Shows the average CPU utilization trend for each application node over the previous 5 minutes.

Parameters

Parameter | Description
ON | Close: Disabled. Simulated Execution: Events are generated when protection triggers, but traffic is not throttled. Open: Protection triggers and throttles a percentage of ingress traffic.
vCPU Utilization | Target CPU utilization threshold. The system adaptively adjusts the throttling probability based on the gap between actual and target utilization.
Exception Settings | Interfaces excluded from this rule. See Exception settings.

Total QPS throttling

Note

Requires agent 4.2.0 or later.

How it works

Total QPS throttling measures the aggregate queries per second (QPS) across all server interfaces on a single node. If the total QPS exceeds the configured threshold, incoming requests are throttled.
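The key property is that the counter is shared across interfaces, so the threshold applies to their sum. The following is a minimal sketch, assuming a fixed 1-second window. `NodeQpsLimiter` is a hypothetical name, and production limiters usually use sliding windows.

```python
import time
from typing import Optional


class NodeQpsLimiter:
    """Hypothetical sketch of total QPS throttling on one node.

    All server interfaces share ONE counter, so the threshold applies to the sum
    of their QPS. A fixed 1-second window is assumed for simplicity; real limiters
    usually use sliding windows to avoid bursts at window boundaries.
    """

    def __init__(self, total_qps_threshold: int):
        self.threshold = total_qps_threshold
        self.window: Optional[int] = None   # which whole second is being counted
        self.count = 0

    def try_acquire(self, interface: str, now: Optional[float] = None) -> bool:
        # `interface` is deliberately unused: the limit is node-wide, not per interface.
        now = time.monotonic() if now is None else now
        second = int(now)
        if second != self.window:
            self.window, self.count = second, 0
        if self.count >= self.threshold:
            return False                    # reject, e.g. with HTTP 429
        self.count += 1
        return True
```

With a threshold of 3, requests to `/a`, `/b`, `/a`, and `/c` in the same second yield True, True, True, False. These are hypothetical interface names. The fourth request is rejected even though no single interface exceeded 3 QPS.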

This protection takes effect on all server interfaces.

When to use it

Use total QPS throttling when application performance is limited by factors other than CPU, such as memory or network. In these cases, CPU-based adaptive overload protection may not trigger even though the application is degraded.

To determine a threshold, run stress tests or analyze historical data to identify the total QPS during steady-state operation, then set a slightly higher value.

Console layout

The Total QPS Throttling section has two panels:

  • Left panel -- Lists throttling events. Events are reported for nodes and interfaces that were throttled within the previous 5 minutes, at a 5-minute reporting interval. Click View to inspect the total QPS for a specific node IP and replay data from the reporting interval.

  • Right panel -- Shows the average total QPS trend for each application node over the previous 5 minutes.

Parameters

Parameter | Description
ON | Close: Disabled. Enable: Throttles requests when the total QPS exceeds the threshold.
Total QPS Threshold | Maximum total QPS allowed on a single node.
Exception Settings | Interfaces excluded from this rule. See Exception settings.

Total concurrency throttling

Note

Requires agent 4.2.0 or later.

How it works

Total concurrency throttling measures the aggregate number of concurrent requests across all server interfaces on a single node. If the total concurrency exceeds the configured threshold, incoming requests are throttled.
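Concurrency throttling counts requests that are currently in flight rather than requests per second. The following is a minimal sketch, assuming one lock-protected counter shared by all server interfaces on the node. `NodeConcurrencyLimiter` and `handle` are hypothetical names.

```python
import threading
from typing import Callable


class NodeConcurrencyLimiter:
    """Hypothetical sketch: one shared in-flight counter for all server interfaces."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_enter(self) -> bool:
        with self._lock:
            if self.in_flight >= self.threshold:
                return False        # reject, e.g. with HTTP 429
            self.in_flight += 1
            return True

    def exit(self) -> None:
        with self._lock:
            self.in_flight -= 1


def handle(limiter: NodeConcurrencyLimiter, work: Callable[[], int]) -> int:
    """Run `work` only if a concurrency slot is free; always release the slot."""
    if not limiter.try_enter():
        return 429
    try:
        return work()
    finally:
        limiter.exit()
```

The `try`/`finally` releases the slot even if the handler raises. Without it, slots would leak and the node would eventually reject all traffic.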

This protection takes effect on all server interfaces.

When to use it

Use total concurrency throttling alongside total QPS throttling, especially when interface response times (RT) are high (typically over 1 second). In high-RT scenarios, when system resources such as thread pools, memory, and connection pools are exhausted, requests are queued and interface RT increases further. QPS throttling alone cannot prevent this. Even a small number of new requests per second can pile up, because requests that are already queued take longer than one second to complete. The result is request queuing and inflated RT for both existing and new requests.

Concurrency throttling addresses this by rejecting new requests while queued requests are still processing. After the queue drains, subsequent requests are admitted and processed with shorter wait times -- significantly improving both success rates and average RT.
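Little's law, a standard queueing result, explains why a QPS cap does not bound concurrency. In a stable system, the average number of in-flight requests equals the arrival rate multiplied by the average time each request spends in the system. The following small calculation applies it. The function name is illustrative.

```python
def expected_in_flight(qps: float, avg_rt_s: float) -> float:
    """Little's law: average concurrent requests = arrival rate x average time in system."""
    return qps * avg_rt_s


# Same 100 QPS cap, different response times.
for rt in (0.05, 1.0, 3.0):
    print(f"RT {rt} s -> ~{expected_in_flight(100, rt):.0f} requests in flight")
```

At a 3-second RT, the same 100 QPS cap implies about 300 concurrent requests, 60 times as many as at 50 ms. A concurrency threshold limits this number directly. Little's law describes a steady state. If arrivals outpace completions, the queue keeps growing, which is the queuing problem described above.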

To determine a threshold, run stress tests or analyze historical data to identify the total concurrency during steady-state operation, then set a slightly higher value.

Console layout

The Total Concurrency Throttling section has two panels:

  • Left panel -- Lists throttling events. Events are reported for nodes and interfaces that were throttled within the previous 5 minutes, at a 5-minute reporting interval. Click View to inspect the total concurrency for a specific node IP and replay data from the reporting interval.

  • Right panel -- Shows the average total concurrency trend for each application node over the previous 5 minutes.

Parameters

Parameter | Description
ON | Close: Disabled. Enable: Throttles requests when total concurrency exceeds the threshold.
Total Concurrency Threshold | Maximum number of concurrent requests allowed on a single node.
Exception Settings | Interfaces excluded from this rule. See Exception settings.

Abnormal call circuit breaking

Note

Requires agent 4.2.0 or later.

How it works

Abnormal call circuit breaking monitors the error percentage of each client interface. When the error percentage exceeds the configured threshold, the circuit breaker opens for that interface. During the circuit breaking period, all requests to the interface fail immediately. The system sends probe requests at regular intervals, and if a probe succeeds, the circuit breaker closes and normal traffic resumes.

This protection applies to all client interfaces, except those that already have interface-level circuit breaking rules configured.

When to use it

Abnormal call circuit breaking handles two scenarios:

  • Timeout scenarios -- Frequent timeouts on a client interface usually indicate that the service provider is experiencing issues. Without circuit breaking, requests queue up and eventually affect other interfaces in the calling application. Circuit breaking enables fast failure, preventing request queuing.

  • Non-timeout error scenarios -- When a client interface frequently returns errors other than timeouts, circuit breaking makes those calls fail immediately and report the error so that callers can handle it. This limits the impact of the downstream issue and improves the user experience while the issue persists.

Console layout

The Abnormal Call Circuit Breaking section has two panels:

  • Left panel -- Lists circuit breaking events reported within the previous 5 minutes, at a 5-minute reporting interval.

  • Right panel -- Shows the top 10 interfaces with the highest abnormal call percentage over the previous 5 minutes.

Parameters

Parameter | Description
ON | Close: Disabled. Enable: Triggers circuit breaking when the abnormal call percentage exceeds the threshold.
Circuit Breaking Percentage Threshold (%) | Error percentage that triggers circuit breaking on an interface.
Exception Settings | Interfaces excluded from this rule. See Exception settings.

Advanced settings

Parameter | Description
Statistics Window Duration (s) | Length of the statistics window. Valid range: 1 second to 120 minutes.
Circuit Breaking Duration (s) | Duration of the circuit breaking period. During this period, all requests to the affected interface fail immediately.
Minimum number of requests | Minimum number of requests required within the statistics window to trigger circuit breaking. If the request count is below this value, circuit breaking does not trigger even if the error percentage exceeds the threshold.
Fuse recovery strategy | Controls how the circuit breaker recovers after the circuit breaking period ends. Single detection recovery -- The circuit breaker tests the next request after the circuit breaking period. If the request succeeds (no slow call or abnormal call), the circuit breaker closes. Otherwise, circuit breaking triggers again. Progressive recovery -- Requires the Number of recovery phases and Minimum number of passes per step parameters. The circuit breaker gradually increases the percentage of allowed requests across multiple stages. See Progressive recovery.
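The following state machine ties these parameters together. It is a minimal sketch for one client interface with single detection recovery, not the agent's implementation. `ErrorRatioBreaker` is a hypothetical name. The sketch assumes a fixed rather than sliding statistics window, and it passes time in explicitly as `now` (seconds).

```python
CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF_OPEN"


class ErrorRatioBreaker:
    """Hypothetical sketch of abnormal call circuit breaking for ONE client interface.

    Mirrors the console fields (threshold %, statistics window, breaking duration,
    minimum requests) with single detection recovery. The fixed (non-sliding)
    statistics window is a simplifying assumption.
    """

    def __init__(self, threshold_pct: float, window_s: float,
                 breaking_s: float, min_requests: int):
        self.threshold_pct = threshold_pct
        self.window_s = window_s
        self.breaking_s = breaking_s
        self.min_requests = min_requests
        self.state = CLOSED
        self.open_until = 0.0
        self._reset(0.0)

    def allow(self, now: float) -> bool:
        """Decide whether a request may be sent at time `now`."""
        if self.state == OPEN:
            if now < self.open_until:
                return False           # fail fast during the breaking period
            self.state = HALF_OPEN     # breaking period over: next request is the probe
            return True
        if self.state == HALF_OPEN:
            return False               # wait for the single probe's outcome
        return True

    def record(self, success: bool, now: float) -> None:
        """Report the outcome of a request that allow() admitted."""
        if self.state == OPEN:
            return                     # late results during breaking are ignored
        if self.state == HALF_OPEN:
            if success:
                self.state = CLOSED    # probe succeeded: close and start fresh
                self._reset(now)
            else:
                self._trip(now)        # probe failed: break again
            return
        if now - self.window_start >= self.window_s:
            self._reset(now)           # start a new statistics window
        self.total += 1
        self.errors += 0 if success else 1
        if (self.total >= self.min_requests
                and 100.0 * self.errors / self.total > self.threshold_pct):
            self._trip(now)

    def _trip(self, now: float) -> None:
        self.state = OPEN
        self.open_until = now + self.breaking_s

    def _reset(self, now: float) -> None:
        self.window_start = now
        self.total = 0
        self.errors = 0
```

With a 50% threshold and a minimum of 4 requests, three calls (two failed) do not trip the breaker. A fourth failed call (75% errors) opens it for the configured duration.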

Progressive recovery

After the circuit breaking period ends, the circuit breaker steps through recovery stages, gradually increasing the percentage of requests allowed to pass.

How the percentage is calculated: each stage adds T% of requests, where T = 100 / N and N is the Number of recovery phases.

  • Stage 1 allows T% of requests

  • Stage 2 allows 2T% of requests

  • This continues until stage N allows 100% of requests

At each stage, a check triggers when the number of requests reaches the Minimum number of passes per step value. If the error percentage stays below the threshold, the circuit breaker advances to the next stage. If the error percentage exceeds the threshold, circuit breaking triggers again.

Example: With 3 recovery stages and a minimum of 5 passes per step:

Stage | Allowed requests | Check condition
1 | 33% | After 5 or more requests
2 | 67% | After 5 or more requests
3 | 100% | Full recovery

If fewer than 5 requests arrive during a stage, the system advances to the next stage without checking.
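The stage arithmetic and the per-stage decision can be sketched as follows. This is an illustration under stated assumptions. The function names are hypothetical. This page does not specify when a stage "ends" (for example, on a timer), so the caller decides that in this sketch.

```python
from typing import List


def stage_percentages(num_stages: int) -> List[float]:
    """Allowed-traffic percentage per stage: stage k allows k * T %, where T = 100 / N."""
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    # Multiply before dividing so the last stage is exactly 100.0.
    return [k * 100 / num_stages for k in range(1, num_stages + 1)]


def admit(allowed_pct: float, roll: float) -> bool:
    """Admit a request if a uniform roll in [0, 1) falls inside the stage's allowed share."""
    return roll * 100 < allowed_pct


def stage_decision(stage_requests: int, stage_errors: int,
                   min_passes: int, threshold_pct: float) -> str:
    """Return "advance" or "break" at the end of a stage (stage timing is caller-defined)."""
    if stage_requests < min_passes:
        return "advance"   # too few requests to evaluate: advance without checking
    error_pct = 100.0 * stage_errors / stage_requests
    return "break" if error_pct > threshold_pct else "advance"
```

For 3 stages, `stage_percentages(3)` rounds to 33, 67, and 100, which matches the example table. With a minimum of 5 passes, a stage that sees only 3 requests advances without a check.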

Slow call circuit breaking

Note

Requires agent 4.2.0 or later.

How it works

Slow call circuit breaking monitors the slow call percentage of each client interface. A call is classified as slow when its response time exceeds the configured Slow Call RT threshold. When the slow call percentage exceeds the Degradation Threshold, the circuit breaker opens for that interface. During the circuit breaking period, all requests fail immediately. Probe requests are sent at regular intervals, and if a probe succeeds, the circuit breaker closes.
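Classifying slow calls and comparing their share against the Degradation Threshold is plain arithmetic over one statistics window. The following hypothetical helpers sketch it. They also apply the Minimum number of requests setting described under Advanced settings. The names and the strict "greater than" comparisons are assumptions.

```python
from typing import Sequence


def slow_call_pct(rts_ms: Sequence[float], slow_rt_ms: float) -> float:
    """Percentage of calls whose RT exceeds the Slow Call RT threshold."""
    if not rts_ms:
        return 0.0
    slow = sum(1 for rt in rts_ms if rt > slow_rt_ms)
    return 100.0 * slow / len(rts_ms)


def should_break(rts_ms: Sequence[float], slow_rt_ms: float,
                 degradation_pct: float, min_requests: int) -> bool:
    """Open the breaker only if enough calls were seen and too many of them were slow."""
    return (len(rts_ms) >= min_requests
            and slow_call_pct(rts_ms, slow_rt_ms) > degradation_pct)
```

With RTs of 120, 800, 950, 90, and 1500 ms and a Slow Call RT of 500 ms, three of five calls are slow (60%). A 50% Degradation Threshold would therefore trip the breaker, provided the minimum request count is 5 or lower.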

This protection applies to all client interfaces, except those that already have interface-level circuit breaking rules configured.

When to use it

Use slow call circuit breaking in timeout-prone scenarios where abnormal call circuit breaking may also apply. Unlike abnormal call circuit breaking, slow call circuit breaking allows dynamic adjustment of the RT threshold that defines a "slow call," independent of timeout settings.

Console layout

The Slow Call Circuit Breaking section has two panels:

  • Left panel -- Lists circuit breaking events reported for slow calls within the previous 5 minutes, at a 5-minute reporting interval.

  • Right panel -- Shows the 10 interfaces with the highest average RT over the previous 5 minutes.

Parameters

Parameter | Description
ON | Close: Disabled. Enable: Classifies calls as slow when their RT exceeds the configured threshold.
Slow Call RT (ms) | RT threshold in milliseconds. Calls with RT above this value are classified as slow calls.
Degradation Threshold (%) | Percentage of slow calls that triggers circuit breaking.
Exception Settings | Interfaces excluded from this rule. See Exception settings.

Advanced settings

Parameter | Description
Statistics Window Duration (s) | Length of the statistics window. Valid range: 1 second to 120 minutes.
Circuit Breaking Duration (s) | Duration of the circuit breaking period. During this period, all requests to the affected interface fail immediately.
Minimum number of requests | Minimum number of requests required within the statistics window to trigger circuit breaking. If the request count is below this value, circuit breaking does not trigger even if the slow call percentage exceeds the threshold.
Fuse recovery strategy | Controls how the circuit breaker recovers after the circuit breaking period ends. Options: Single detection recovery and Progressive recovery, which work the same way as for abnormal call circuit breaking. For details, see Progressive recovery.

Exception settings

Note

Requires agent 4.2.0 or later.

Exception settings let you exclude specific interfaces from all system protection rules. Requests on excluded interfaces pass through without rule checking.
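Conceptually, an exception list is a membership check that runs before any rule. The following is a minimal sketch, assuming each rule is a function that returns True when a request may pass. `protect` and the interface names such as `/health` are hypothetical.

```python
from typing import Callable, Iterable, Set


def protect(interface: str, excluded: Set[str],
            rules: Iterable[Callable[[str], bool]]) -> bool:
    """Return True if the request may proceed. Excluded interfaces skip every rule."""
    if interface in excluded:
        return True
    # all() stops at the first rule that rejects the request.
    return all(rule(interface) for rule in rules)
```

A health check on `/health` listed as an exception passes even while a throttling rule rejects everything else.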

When to use them

Configure exception settings for:

  • Health check interfaces -- Prevent system protection rules from throttling health checks, which could affect the health status of nodes.

  • Key interfaces of the system -- Key interfaces that already have their own throttling limits should not also be subject to system-wide throttling.

How to configure

In the exception settings dialog:

  1. The Available Interfaces section on the left lists recently called interfaces. If the target interface is not listed, enter its name in the search box and click the search icon.

  2. Add the interface to the Selected Interfaces section on the right.

System protection vs. traffic protection

Both system protection and traffic protection keep applications stable, but they operate at different levels:

Dimension | System protection | Traffic protection
Scope | Node-level. Same rules apply to all interfaces of an application. | Interface-level. Different thresholds per interface.
Granularity | Coarse-grained -- protects the node as a whole. | Fine-grained -- protects individual interfaces based on importance and load characteristics.
Configuration effort | Low. A few thresholds cover the entire application. | Higher. Requires per-interface threshold tuning.
Traffic loss | Higher. Throttling applies broadly across all interfaces. | Lower. Only the interfaces that exceed their individual thresholds are throttled.
Best for | Baseline safety net against unexpected surges. | Production-grade protection with minimized traffic loss.

Note

Both system protection and traffic protection return HTTP status code 429 when throttling triggers. Custom status codes are not supported.
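Because throttled and circuit-broken requests surface as HTTP 429, callers can treat that code as a signal to back off. The following is a minimal client-side sketch, assuming exponential backoff with jitter, which is a common general technique and not an MSE feature. `call_with_backoff` and its parameters are hypothetical. Keep retries few, because retrying aggressively against a protected service adds load.

```python
import random
import time
from typing import Callable


def call_with_backoff(send: Callable[[], int], max_attempts: int = 3,
                      base_delay_s: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` (returns an HTTP status code); retry on 429 with backoff and jitter."""
    status = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        delay = base_delay_s * (2 ** (attempt - 1))   # 0.1 s, 0.2 s, ...
        sleep(delay * random.uniform(0.5, 1.0))       # jitter avoids synchronized retries
        status = send()
    return status
```

The `sleep` parameter is injectable so the retry logic can be exercised without real delays.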

See also