When individual interfaces lack dedicated traffic protection rules, unexpected traffic surges can destabilize your microservice applications. System protection provides node-level safeguards that cover all interfaces on a node, acting as a baseline safety net to maintain application stability.
Microservices Governance offers five system protection types:
| Protection type | Monitors | Applies to | Agent version |
|---|---|---|---|
| Adaptive overload protection | CPU utilization | All server interfaces | 3.1.4+ |
| Total QPS throttling | Sum of QPS across all interfaces on a node | All server interfaces | 4.2.0+ |
| Total concurrency throttling | Sum of concurrent requests across all interfaces on a node | All server interfaces | 4.2.0+ |
| Abnormal call circuit breaking | Error percentage per client interface | All client interfaces | 4.2.0+ |
| Slow call circuit breaking | Slow call percentage per client interface | All client interfaces | 4.2.0+ |
All system protection rules have lower priority than interface-level traffic protection rules. When throttling or circuit breaking triggers, the system returns HTTP status code 429. For a detailed comparison, see System protection vs. traffic protection.
Choose the right protection type
Use this decision matrix to identify which protection types fit your scenario:
| Scenario | Recommended protection | Why |
|---|---|---|
| CPU-sensitive application with traffic-driven load | Adaptive overload protection | Dynamically adjusts throttling based on CPU utilization gap |
| Performance limited by memory, network, or other factors (not CPU) | Total QPS throttling | Caps request rate regardless of CPU load |
| High response times (typically over 1 second) causing request queuing | Total QPS throttling + Total concurrency throttling | QPS throttling alone cannot prevent queue buildup when requests take longer than 1 second to complete |
| Downstream services returning frequent errors | Abnormal call circuit breaking | Enables fast failure to prevent request queuing in the caller |
| Downstream services responding slowly but not timing out | Slow call circuit breaking | Detects slow responses independently of timeout settings |
Best practice: Start with system protection as a baseline safety net, then add interface-level traffic protection rules to minimize unnecessary throttling on individual interfaces.
Configure system protection rules
Before you begin, make sure that you have:
Microservices Governance Enterprise Edition activated. For more information, see Activate Microservices Governance.
Microservices Governance enabled for your application. For more information, see Enable Microservices Governance for Java microservice applications in an ACK or ACS cluster and Enable Microservices Governance for microservice applications on ECS instances.
To configure system protection:
Log on to the MSE console and select a region in the top navigation bar.
In the left-side navigation pane, choose Microservices Governance > Application Governance.
On the Application list page, click the resource card of the target application. In the left-side navigation pane, click Traffic management.
Click the System Protection tab and configure the protection types described in the following sections.
Adaptive overload protection
Requires agent 3.1.4 or later.
How it works
Adaptive overload protection uses CPU utilization to gauge system load. When CPU usage approaches the configured threshold, the system adaptively adjusts the throttling percentage of server traffic -- rejecting a portion of incoming requests to keep CPU utilization within a small range around the threshold.
This protection takes effect on all server interfaces.
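The agent's exact adaptive algorithm is not published; the sketch below illustrates the general idea with a simple proportional controller that nudges a rejection probability toward the CPU target. The `GAIN` constant, the update rule, and the class name are all illustrative assumptions, not the agent's implementation.

```python
import random

class AdaptiveOverloadProtector:
    """Sketch: adjust the rejection probability toward a CPU target.

    The real agent's control loop is not documented; this proportional
    step (GAIN, clamping to [0, 1]) is an illustrative assumption.
    """
    GAIN = 0.5  # how aggressively to react to the CPU gap

    def __init__(self, target_cpu: float):
        self.target_cpu = target_cpu      # e.g. 0.7 for a 70% vCPU target
        self.reject_probability = 0.0

    def update(self, measured_cpu: float) -> None:
        # Positive gap => over target => throttle more; negative => relax.
        gap = measured_cpu - self.target_cpu
        self.reject_probability = min(1.0, max(0.0,
            self.reject_probability + self.GAIN * gap))

    def allow(self) -> bool:
        # Throttled requests would be answered with HTTP 429.
        return random.random() >= self.reject_probability

protector = AdaptiveOverloadProtector(target_cpu=0.7)
protector.update(measured_cpu=0.9)   # 20 points over target
print(protector.reject_probability)  # ~0.1 after one step (0.5 * 0.2)
```

As CPU samples oscillate around the target, the rejection probability settles at whatever level keeps utilization within a small band around the threshold, which is the behavior described above.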
When to use it
Use adaptive overload protection for CPU-sensitive applications where unexpected traffic surges directly increase CPU load and degrade response times.
To determine a threshold, run stress tests or analyze historical data to identify the maximum CPU utilization during steady-state operation, then set a slightly higher value.
Console layout
The Adaptive Overload Protection section has two panels:
Left panel -- Lists adaptive overload protection events. Events are generated when throttling starts, while it remains active, and when it ends. Click View in the Actions column to inspect the CPU utilization for a specific node IP and replay data from the reporting interval.
Right panel -- Shows the average CPU utilization trend for each application node over the previous 5 minutes.
Parameters
| Parameter | Description |
|---|---|
| ON | Close: Disabled. Simulated Execution: Events are generated when protection triggers, but traffic is not throttled. Open: Protection triggers and throttles a percentage of ingress traffic. |
| vCPU Utilization | Target CPU utilization threshold. The system adaptively adjusts throttling probability based on the gap between actual and target utilization. |
| Exception Settings | Interfaces excluded from this rule. See Exception settings. |
Total QPS throttling
Requires agent 4.2.0 or later.
How it works
Total QPS throttling measures the aggregate queries per second (QPS) across all server interfaces on a single node. If the total QPS exceeds the configured threshold, incoming requests are throttled.
This protection takes effect on all server interfaces.
When to use it
Use total QPS throttling when application performance depends on factors other than CPU utilization -- for example, memory, network bandwidth, or connection pools. In these cases, CPU-based adaptive protection may not trigger even though the application is degraded.
To determine a threshold, run stress tests or analyze historical data to identify the total QPS during steady-state operation, then set a slightly higher value.
Console layout
The Total QPS Throttling section has two panels:
Left panel -- Lists throttling events. Events are reported for nodes and interfaces that were throttled within the previous 5 minutes, at a 5-minute reporting interval. Click View to inspect the total QPS for a specific node IP and replay data from the reporting interval.
Right panel -- Shows the average total QPS trend for each application node over the previous 5 minutes.
Parameters
| Parameter | Description |
|---|---|
| ON | Close: Disabled. Enable: Throttles requests when the total QPS exceeds the threshold. |
| Total QPS Threshold | Maximum total QPS allowed on a single node. |
| Exception Settings | Interfaces excluded from this rule. See Exception settings. |
Total concurrency throttling
Requires agent 4.2.0 or later.
How it works
Total concurrency throttling measures the aggregate number of concurrent requests across all server interfaces on a single node. If the total concurrency exceeds the configured threshold, incoming requests are throttled.
This protection takes effect on all server interfaces.
When to use it
Use total concurrency throttling alongside total QPS throttling, especially when interface response times (RT) are high (typically over 1 second). In high-RT scenarios, requests occupy system resources such as thread pools, memory, and connection pools for a long time, so new requests queue and interface RT increases further. QPS throttling alone has a limitation: because each queued request takes longer than one second to complete, even a small number of new requests per second accumulates. This causes request queuing and inflates RT for both existing and new requests.
Concurrency throttling addresses this by rejecting new requests while queued requests are still processing. After the queue drains, subsequent requests are admitted and processed with shorter wait times -- significantly improving both success rates and average RT.
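The queue-buildup argument can be checked with Little's law (concurrency = arrival rate x RT). The numbers below are made up for illustration:

```python
# Illustrative arithmetic (Little's law: concurrency = arrival_rate * RT).
# All numbers are made up for the example.
qps_cap = 50          # requests admitted per second under QPS throttling
rt_seconds = 3.0      # each request takes 3 s to complete
in_flight = qps_cap * rt_seconds
print(in_flight)      # 150 requests occupy threads/connections at once

# A concurrency cap bounds in-flight work directly, regardless of RT:
concurrency_cap = 80
admitted_rate = concurrency_cap / rt_seconds
print(admitted_rate)  # ~26.7 requests/s is what 3 s RT can actually sustain
```

In other words, a QPS cap that looks modest still pins down 150 concurrent requests at 3-second RT, while a concurrency cap keeps in-flight work bounded no matter how long requests take.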
To determine a threshold, run stress tests or analyze historical data to identify the total concurrency during steady-state operation, then set a slightly higher value.
Console layout
The Total Concurrency Throttling section has two panels:
Left panel -- Lists throttling events. Events are reported for nodes and interfaces that were throttled within the previous 5 minutes, at a 5-minute reporting interval. Click View to inspect the total concurrency for a specific node IP and replay data from the reporting interval.
Right panel -- Shows the average total concurrency trend for each application node over the previous 5 minutes.
Parameters
| Parameter | Description |
|---|---|
| ON | Close: Disabled. Enable: Throttles requests when total concurrency exceeds the threshold. |
| Total Concurrency Threshold | Maximum number of concurrent requests allowed on a single node. |
| Exception Settings | Interfaces excluded from this rule. See Exception settings. |
Abnormal call circuit breaking
Requires agent 4.2.0 or later.
How it works
Abnormal call circuit breaking monitors the error percentage of each client interface. When the error percentage exceeds the configured threshold, the circuit breaker opens for that interface. During the circuit breaking period, all requests to the interface fail immediately. The system sends probe requests at regular intervals, and if a probe succeeds, the circuit breaker closes and normal traffic resumes.
This protection applies to all client interfaces, except those that already have interface-level circuit breaking rules configured.
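The open/probe/close cycle described above can be sketched as a small state machine. Window handling and probe pacing in the real agent are not documented, so the class name, the reset behavior, and the parameter names below are illustrative assumptions:

```python
class AbnormalCallBreaker:
    """Sketch of the open -> probe -> close cycle described above.

    The agent's window and probe mechanics are not documented; the
    fields and reset behavior here are illustrative assumptions.
    """
    def __init__(self, error_pct_threshold: float, min_requests: int,
                 break_duration_s: float):
        self.error_pct_threshold = error_pct_threshold
        self.min_requests = min_requests
        self.break_duration_s = break_duration_s
        self.errors = 0
        self.total = 0
        self.open_until = 0.0   # 0 means the breaker is closed

    def record(self, success: bool, now: float) -> None:
        self.total += 1
        self.errors += 0 if success else 1
        error_pct = 100.0 * self.errors / self.total
        # Trip only once the minimum request count is reached.
        if self.total >= self.min_requests and error_pct > self.error_pct_threshold:
            self.open_until = now + self.break_duration_s  # open the breaker
            self.errors = self.total = 0                   # reset the window

    def allow(self, now: float) -> bool:
        return now >= self.open_until  # during the break, fail fast

b = AbnormalCallBreaker(error_pct_threshold=50.0, min_requests=5,
                        break_duration_s=10.0)
for _ in range(5):
    b.record(success=False, now=0.0)  # 100% errors over 5 requests
print(b.allow(now=1.0))    # False: breaker is open, requests fail fast
print(b.allow(now=11.0))   # True: break period over, a probe is admitted
```

The minimum-request guard matches the Minimum number of requests parameter below: a single failed call on a quiet interface does not trip the breaker.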
When to use it
Abnormal call circuit breaking handles two scenarios:
Timeout scenarios -- Frequent timeouts on a client interface usually indicate that the service provider is experiencing issues. Without circuit breaking, requests queue up and eventually affect other interfaces in the calling application. Circuit breaking enables fast failure, preventing request queuing.
Non-timeout error scenarios -- Frequent non-timeout errors on a client interface. Circuit breaking surfaces these errors quickly so that they can be handled, minimizing their impact and preserving the user experience while the issues persist.
Console layout
The Abnormal Call Circuit Breaking section has two panels:
Left panel -- Lists circuit breaking events reported within the previous 5 minutes, at a 5-minute reporting interval.
Right panel -- Shows the top 10 interfaces with the highest abnormal call percentage over the previous 5 minutes.
Parameters
| Parameter | Description |
|---|---|
| ON | Close: Disabled. Enable: Triggers circuit breaking when the abnormal call percentage exceeds the threshold. |
| Circuit Breaking Percentage Threshold (%) | Error percentage that triggers circuit breaking on an interface. |
| Exception Settings | Interfaces excluded from this rule. See Exception settings. |
Advanced settings
| Parameter | Description |
|---|---|
| Statistics Window Duration (s) | Length of the statistics window. Valid range: 1 second to 120 minutes. |
| Circuit Breaking Duration (s) | Duration of the circuit breaking period. During this period, all requests to the affected interface fail immediately. |
| Minimum number of requests | Minimum number of requests required within the statistics window to trigger circuit breaking. If the request count is below this value, circuit breaking does not trigger even if the error percentage exceeds the threshold. |
| Fuse recovery strategy | Controls how the circuit breaker recovers after the circuit breaking period ends. Single detection recovery -- The circuit breaker tests the next request after the circuit breaking period. If the request succeeds (no slow call or abnormal call), the circuit breaker closes. Otherwise, circuit breaking triggers again. Progressive recovery -- Requires the Number of recovery phases and Minimum number of passes per step parameters. The circuit breaker gradually increases the percentage of allowed requests across multiple stages. See Progressive recovery. |
Progressive recovery
After the circuit breaking period ends, the circuit breaker steps through recovery stages, gradually increasing the percentage of requests allowed to pass.
How the percentage is calculated:
Request percentage per stage (T) = 100 / Number of recovery stages (N)
Stage 1 allows T% of requests
Stage 2 allows 2T% of requests
This continues until 100% of requests are allowed
At each stage, a check triggers when the number of requests reaches the Minimum number of passes per step value. If the error percentage stays below the threshold, the circuit breaker advances to the next stage. If the error percentage exceeds the threshold, circuit breaking triggers again.
Example: With 3 recovery stages and a minimum of 5 passes per step:
| Stage | Allowed requests | Check condition |
|---|---|---|
| 1 | 33% | After 5 or more requests |
| 2 | 67% | After 5 or more requests |
| 3 | 100% | Full recovery |
If fewer than 5 requests arrive during a stage, the system advances to the next stage without checking.
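The stage percentages follow directly from the formula T = 100 / N. A minimal calculation (the function name is illustrative):

```python
def recovery_stages(num_stages: int) -> list[int]:
    """Allowed-request percentage per stage: T = 100 / N, cumulative."""
    t = 100 / num_stages
    return [round(t * stage) for stage in range(1, num_stages + 1)]

print(recovery_stages(3))  # [33, 67, 100] -- matches the table above
```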
Slow call circuit breaking
Requires agent 4.2.0 or later.
How it works
Slow call circuit breaking monitors the slow call percentage of each client interface. A call is classified as slow when its response time exceeds the configured Slow Call RT threshold. When the slow call percentage exceeds the Degradation Threshold, the circuit breaker opens for that interface. During the circuit breaking period, all requests fail immediately. Probe requests are sent at regular intervals, and if a probe succeeds, the circuit breaker closes.
This protection applies to all client interfaces, except those that already have interface-level circuit breaking rules configured.
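The slow-call classification reduces to simple percentage arithmetic over observed RTs. A minimal sketch (the function name and sample values are illustrative):

```python
def slow_call_pct(rt_samples_ms: list[float], slow_rt_ms: float) -> float:
    """Percentage of calls whose RT exceeds the Slow Call RT threshold."""
    slow = sum(1 for rt in rt_samples_ms if rt > slow_rt_ms)
    return 100.0 * slow / len(rt_samples_ms)

samples = [120, 80, 950, 1400, 2100]        # milliseconds, made up
pct = slow_call_pct(samples, slow_rt_ms=1000)
print(pct)  # 40.0 -- two of five calls exceed 1000 ms
```

If the Degradation Threshold were set to 30%, this window would trip the breaker; at 50% it would not.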
When to use it
Use slow call circuit breaking in timeout-prone scenarios where abnormal call circuit breaking may also apply. Unlike abnormal call circuit breaking, slow call circuit breaking allows dynamic adjustment of the RT threshold that defines a "slow call," independent of timeout settings.
Console layout
The Slow Call Circuit Breaking section has two panels:
Left panel -- Lists circuit breaking events reported for slow calls within the previous 5 minutes, at a 5-minute reporting interval.
Right panel -- Shows the top 10 average RT values across application interfaces over the previous 5 minutes.
Parameters
| Parameter | Description |
|---|---|
| ON | Close: Disabled. Enable: Triggers circuit breaking when the slow call percentage exceeds the degradation threshold. |
| Slow Call RT (ms) | RT threshold in milliseconds. Calls with RT above this value are classified as slow calls. |
| Degradation Threshold (%) | Percentage of slow calls that triggers circuit breaking. |
| Exception Settings | Interfaces excluded from this rule. See Exception settings. |
Advanced settings
| Parameter | Description |
|---|---|
| Statistics Window Duration (s) | Length of the statistics window. Valid range: 1 second to 120 minutes. |
| Circuit Breaking Duration (s) | Duration of the circuit breaking period. During this period, all requests to the affected interface fail immediately. |
| Minimum number of requests | Minimum number of requests required within the statistics window to trigger circuit breaking. If the request count is below this value, circuit breaking does not trigger even if the slow call percentage exceeds the threshold. |
| Fuse recovery strategy | Controls how the circuit breaker recovers after the circuit breaking period ends. Single detection recovery and Progressive recovery. For details, see Progressive recovery. |
Exception settings
Requires agent 4.2.0 or later.
Exception settings let you exclude specific interfaces from all system protection rules. Requests on excluded interfaces pass through without rule checking.
When to use them
Configure exception settings for:
Health check interfaces -- Prevent system protection rules from throttling health checks, which could affect the health status of nodes.
Key interfaces -- Interfaces that already have separate throttling limits should not also be subject to the system-wide throttling mechanism.
How to configure
In the exception settings dialog:
The Available Interfaces section on the left lists recently called interfaces. If the target interface is not listed, enter its name in the search box and click the search icon.
Add the interface to the Selected Interfaces section on the right.
System protection vs. traffic protection
Both system protection and traffic protection keep applications stable, but they operate at different levels:
| Dimension | System protection | Traffic protection |
|---|---|---|
| Scope | Node-level. Same rules apply to all interfaces of an application. | Interface-level. Different thresholds per interface. |
| Granularity | Coarse-grained -- protects the node as a whole. | Fine-grained -- protects individual interfaces based on importance and load characteristics. |
| Configuration effort | Low. A few thresholds cover the entire application. | Higher. Requires per-interface threshold tuning. |
| Traffic loss | Higher. Throttling applies broadly across all interfaces. | Lower. Only the interfaces that exceed their individual thresholds are throttled. |
| Best for | Baseline safety net against unexpected surges. | Production-grade protection with minimized traffic loss. |
Both system protection and traffic protection return HTTP status code 429 when throttling triggers. Custom status codes are not supported.