When a downstream service responds slowly or returns errors at a high rate, the failures can cascade and degrade your entire application. Circuit breaking in Microservices Engine (MSE) monitors these downstream dependencies and automatically stops sending requests to them when a configured threshold is exceeded. After the configured circuit breaking duration elapses, MSE lets requests through again to check whether the dependency has recovered.
Circuit breaking targets weak dependencies: downstream services whose temporary unavailability your application can tolerate. For critical dependencies where failures must propagate immediately, use retry or fallback strategies instead.
How it works
A circuit breaker transitions between three states:
Closed (normal operation): All requests pass through. MSE tracks the configured metric (slow call ratio or abnormal proportion) within the statistical window.
Open (circuit tripped): When the tracked metric exceeds the configured threshold and the minimum request count is met, MSE blocks all requests for the configured circuit breaking duration. Blocked requests fail immediately instead of waiting for a timeout.
Half-open (recovery probe): After the circuit breaking duration elapses, MSE allows the next request through as a test:
If the test request succeeds, the circuit breaker returns to Closed.
If the test request fails, the circuit breaker returns to Open for another full duration.
MSE also supports progressive recovery, which gradually increases the percentage of allowed requests across multiple stages instead of relying on a single test request.
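The three states can be modeled as a small state machine. The following Python sketch is illustrative only: it is not MSE's implementation, the names `CircuitBreaker`, `trip`, `try_acquire`, and `on_probe_result` are hypothetical, and it covers only single-probe recovery. The clock is injected so that the transitions can be exercised without real waiting.

```python
import enum
from typing import Callable


class State(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Illustrative three-state breaker with single-probe recovery (hypothetical)."""

    def __init__(self, duration_s: float, clock: Callable[[], float]) -> None:
        self.duration_s = duration_s  # Circuit Breaking Duration (s)
        self.clock = clock            # injected time source, in seconds
        self.state = State.CLOSED
        self.opened_at = 0.0

    def trip(self) -> None:
        """Open the breaker, e.g. when the window metric exceeds the threshold."""
        self.state = State.OPEN
        self.opened_at = self.clock()

    def try_acquire(self) -> bool:
        """Return True if this request may be sent to the downstream service."""
        if self.state is State.CLOSED:
            return True
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.duration_s:
                return False  # blocked: fail immediately instead of waiting for a timeout
            self.state = State.HALF_OPEN
            return True       # the first request after the duration is the test probe
        return False          # HALF_OPEN: only the single probe is in flight

    def on_probe_result(self, success: bool) -> None:
        """Close on a successful probe; otherwise reopen for another full duration."""
        if self.state is not State.HALF_OPEN:
            return
        if success:
            self.state = State.CLOSED
        else:
            self.trip()
```

In a real client, `trip` would be driven by the statistical window and `on_probe_result` by the outcome of the probe request; both hooks are assumptions of this sketch.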
Prerequisites
Before you begin, make sure you have:
MSE Enterprise Edition activated
Microservices Governance enabled for your applications. For setup instructions, see:
Create a circuit breaking rule
Log on to the MSE console, and select a region in the top navigation bar.
In the left-side navigation pane, choose Microservices Governance > Application Governance.
On the Application list page, click the target application. Then open the Add Circuit Breaking Rule dialog box using either path:
Path A: In the left-side navigation pane, click API Details. On the WEB service tab, click the Client tab. Select an interface, click the Circuit Breaking tab, and then click Added fusing rule.
Path B: In the left-side navigation pane, click Traffic management. Click the Flow protection tab, then click the Fuse rule tab, and then click Added fusing rule.
In the Add Circuit Breaking Rule dialog box, configure the following parameters and click New.
Rule parameters
| Parameter | Description | Valid values |
|---|---|---|
| Interface name | The interface to protect with this rule. | -- |
| Statistical window duration | The length of the time window used to calculate metrics. | 1 second to 120 minutes |
| Minimum number of requests | The minimum number of requests required within the time window before circuit breaking can trigger. If the actual request count is below this value, the circuit breaker stays closed regardless of the metric ratio. | Positive integer |
| Threshold Type | The metric used to evaluate whether to trip the circuit breaker. | Slow call ratio (%), Abnormal proportion (%) |
| Slow call RT | The response time threshold (in milliseconds) that defines a slow call. Requests that take longer than this value are counted as slow calls. Applies only when Threshold Type is set to Slow call ratio (%). | Positive integer (ms) |
| Circuit Breaking Ratio Threshold | The percentage threshold that triggers circuit breaking. For slow call ratio, this is the percentage of slow calls. For abnormal proportion, this is the percentage of abnormal requests. | 0 to 100 (%) |
| Circuit Breaking Duration (s) | How long the circuit breaker stays open. During this period, all requests to the protected interface fail. | Positive integer (seconds) |
| Circuit Breaking Policy | The recovery strategy after the circuit breaking duration elapses. | Single detection recovery, Progressive recovery |
Note: The Statistical window duration and Minimum number of requests parameters work together. For example, setting a 1-second window with a minimum of 10 requests means low-traffic interfaces (fewer than 10 requests per second) never trigger the circuit breaker. For low-traffic interfaces, use a longer window with a lower minimum request count.
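To make the interplay between the window, the minimum request count, and the ratio threshold concrete, here is a minimal Python sketch of the trip decision for the calls collected in one statistical window. It assumes that "exceeds" means strictly greater than, that a slow call is one whose response time is greater than Slow call RT, and that the threshold-type strings, the `Call` record, and the `should_trip` helper are hypothetical names chosen for illustration, not MSE APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Call:
    rt_ms: int    # response time of the call, in milliseconds
    error: bool   # True if the call was abnormal


def should_trip(calls: List[Call], min_requests: int, threshold_type: str,
                ratio_threshold_pct: float, slow_rt_ms: Optional[int] = None) -> bool:
    """Decide whether the calls from one statistical window trip the breaker."""
    total = len(calls)
    if total == 0 or total < min_requests:
        return False  # too few requests: stay closed regardless of the ratio
    if threshold_type == "slow_call_ratio":
        if slow_rt_ms is None:
            raise ValueError("slow_rt_ms is required for slow_call_ratio")
        bad = sum(1 for c in calls if c.rt_ms > slow_rt_ms)
    elif threshold_type == "abnormal_proportion":
        bad = sum(1 for c in calls if c.error)
    else:
        raise ValueError(f"unknown threshold type: {threshold_type}")
    return bad * 100 / total > ratio_threshold_pct
```

With a minimum of 10 requests, nine slow calls in a 1-second window never trip the breaker, however slow they are. This is the low-traffic behavior described in the note.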
Recovery policies
Single detection recovery
After the circuit breaking duration elapses, MSE allows the next request through as a test probe:
If the request succeeds (for a slow call ratio rule, its response time does not exceed the slow call RT; for an abnormal proportion rule, no error occurs), the circuit breaker closes and normal traffic resumes.
If the request fails, the circuit breaker reopens for another full duration.
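The success check for the single test request can be sketched as a small predicate. This is an assumption-laden illustration: it supposes that each rule judges the probe only by its own metric (response time for slow call ratio rules, errors for abnormal proportion rules), and the function name and threshold-type strings are hypothetical. MSE's exact criteria may differ.

```python
from typing import Optional


def probe_succeeded(threshold_type: str, rt_ms: int, error: bool,
                    slow_rt_ms: Optional[int] = None) -> bool:
    """Judge the single test request sent after the circuit breaking duration."""
    if threshold_type == "slow_call_ratio":
        if slow_rt_ms is None:
            raise ValueError("slow_rt_ms is required for slow_call_ratio")
        return rt_ms <= slow_rt_ms  # not a slow call
    if threshold_type == "abnormal_proportion":
        return not error
    raise ValueError(f"unknown threshold type: {threshold_type}")
```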
Progressive recovery
Instead of a single test request, MSE gradually increases the traffic ratio across multiple recovery stages. This reduces the risk of re-tripping the circuit breaker immediately after recovery.
Configure two additional parameters:
| Parameter | Description |
|---|---|
| Number of recovery phases | The number of stages used to ramp traffic back to 100%. |
| Minimum number of passes per step | The minimum number of requests in each stage before MSE evaluates whether to proceed to the next stage. |
The allowed traffic percentage in stage k (counting from 1) is calculated as:
Traffic ratio = k × (100% / Number of recovery phases)
For example, with 3 recovery phases and a minimum of 5 requests per stage:
| Stage | Allowed traffic | Behavior |
|---|---|---|
| 1 | 33% | MSE allows 33% of requests through. After at least 5 requests pass, MSE checks the metric. If the metric is below the threshold, MSE proceeds to Stage 2. If it is above the threshold, MSE reopens the circuit breaker. |
| 2 | 67% | MSE allows 67% of requests through. Same evaluation as Stage 1. |
| 3 | 100% | All requests pass through. The circuit breaker fully closes. |
If a stage receives fewer requests than the Minimum number of passes per step, MSE still advances to the next recovery stage, and continues this way until all requests are allowed through.
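The stage ratios and the end-of-stage decision can be sketched in Python as follows. This is a minimal illustration, not MSE's implementation: `stage_ratios` and `after_stage` are hypothetical names, a metric exactly equal to the threshold is assumed to count as healthy, and reaching the final stage is treated as fully closed, matching the table.

```python
from typing import List


def stage_ratios(phases: int) -> List[float]:
    """Allowed traffic percentage for stages 1..phases: k * (100 / phases)."""
    if phases < 1:
        raise ValueError("phases must be at least 1")
    return [k * 100 / phases for k in range(1, phases + 1)]


def after_stage(stage: int, phases: int, passed: int, min_passes: int,
                metric_pct: float, threshold_pct: float) -> str:
    """Outcome at the end of recovery stage `stage` (1-based, stage < phases)."""
    if passed >= min_passes and metric_pct > threshold_pct:
        return "open"  # enough requests and still unhealthy: reopen the breaker
    # Healthy, or too few requests to judge: advance to the next stage anyway.
    next_stage = stage + 1
    if next_stage >= phases:
        return "closed"  # the final stage admits 100% of traffic
    return f"stage {next_stage}"
```

For 3 recovery phases, `[round(r) for r in stage_ratios(3)]` gives `[33, 67, 100]`, the percentages shown in the table.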
Examples
Slow call circuit breaking
Scenario: Your application calls a third-party service that occasionally responds slowly, causing timeouts that degrade your application's performance.
Configuration:
| Parameter | Value | Rationale |
|---|---|---|
| Interface name | test | The interface calling the slow downstream service. |
| Statistical window duration | 1 second | Detect degradation quickly. |
| Minimum number of requests | 10 | Avoid triggering on low-traffic interfaces. |
| Threshold Type | Slow call ratio (%) | Monitor response time, not errors. |
| Slow call RT | 1000 ms | Requests exceeding 1 second are slow calls. |
| Circuit Breaking Ratio Threshold | 80% | Trip when 80% of requests are slow. |
| Circuit Breaking Duration (s) | 10 seconds | Block all requests for 10 seconds. |
| Circuit Breaking Policy | Single detection recovery | Test one request after each circuit breaking period. |
Behavior: If at least 10 requests arrive within 1 second and more than 80% of them take longer than 1000 ms, the circuit breaker opens. All requests to the test interface fail for the next 10 seconds. After 10 seconds, MSE lets one request through. If it completes within 1000 ms, the circuit breaker closes. Otherwise, it reopens for another 10 seconds.
Abnormal request circuit breaking
Scenario: Your application displays content from a third-party service. When abnormal requests spike, the degraded content harms user experience.
Configuration:
| Parameter | Value | Rationale |
|---|---|---|
| Interface name | test | The interface calling the abnormal downstream service. |
| Statistical window duration | 1 second | Detect abnormal request spikes quickly. |
| Minimum number of requests | 10 | Avoid triggering on low-traffic interfaces. |
| Threshold Type | Abnormal proportion (%) | Monitor abnormal request ratio. |
| Circuit Breaking Ratio Threshold | 80% | Trip when 80% of requests are abnormal. |
| Circuit Breaking Duration (s) | 10 seconds | Block all requests for 10 seconds. |
| Circuit Breaking Policy | Single detection recovery | Test one request after each circuit breaking period. |
Behavior: If at least 10 requests arrive within 1 second and more than 80% of them are abnormal, the circuit breaker opens. All requests to the test interface fail for the next 10 seconds. After 10 seconds, MSE lets one request through. If it succeeds, the circuit breaker closes. Otherwise, it reopens for another 10 seconds.
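The two example rules above can be written down as data and checked against the valid values in the Rule parameters table. The following Python sketch is hypothetical throughout: `CircuitBreakingRule`, its field names, and the policy and threshold-type strings only mirror the console labels and are not an MSE API or configuration file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircuitBreakingRule:
    """Hypothetical mirror of the console fields; not an MSE API or file format."""
    interface_name: str
    window_s: int               # Statistical window duration, in seconds
    min_requests: int           # Minimum number of requests
    threshold_type: str         # "slow_call_ratio" or "abnormal_proportion"
    ratio_threshold_pct: float  # Circuit Breaking Ratio Threshold
    duration_s: int             # Circuit Breaking Duration (s)
    policy: str                 # "single_detection" or "progressive"
    slow_rt_ms: Optional[int] = None  # Slow call RT, slow call ratio rules only

    def validate(self) -> None:
        """Raise ValueError if a field falls outside the documented valid values."""
        if not 1 <= self.window_s <= 120 * 60:
            raise ValueError("window must be between 1 second and 120 minutes")
        if self.min_requests < 1 or self.duration_s < 1:
            raise ValueError("min_requests and duration_s must be positive integers")
        if not 0 <= self.ratio_threshold_pct <= 100:
            raise ValueError("ratio threshold must be between 0 and 100")
        if self.threshold_type == "slow_call_ratio":
            if self.slow_rt_ms is None or self.slow_rt_ms < 1:
                raise ValueError("slow call RT must be a positive integer (ms)")
        elif self.threshold_type != "abnormal_proportion":
            raise ValueError(f"unknown threshold type: {self.threshold_type}")
        if self.policy not in ("single_detection", "progressive"):
            raise ValueError(f"unknown policy: {self.policy}")


# The slow call and abnormal request examples, expressed with the hypothetical fields.
slow_call_rule = CircuitBreakingRule("test", 1, 10, "slow_call_ratio", 80, 10,
                                     "single_detection", slow_rt_ms=1000)
abnormal_rule = CircuitBreakingRule("test", 1, 10, "abnormal_proportion", 80, 10,
                                    "single_detection")
for rule in (slow_call_rule, abnormal_rule):
    rule.validate()
```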
Best practices
Choose the right threshold type
Use Slow call ratio when response time degradation is your primary concern, such as synchronous API calls where latency directly affects user experience.
Use Abnormal proportion when downstream errors are the primary risk, such as calls to unreliable third-party services.
Set appropriate window and threshold values
Statistical window: Shorter windows (1 to 10 seconds) detect issues faster but are more sensitive to traffic bursts. Longer windows (1 to 5 minutes) provide more stable signals for lower-traffic interfaces.
Minimum number of requests: Set this high enough to avoid false positives. For low-traffic interfaces, a minimum of 5 to 10 requests with a longer window is typically more reliable than a 1-second window with a minimum of 10 requests.
Choose a recovery policy
Single detection recovery works well for services that recover cleanly, meaning they either work or they do not.
Progressive recovery is safer for services that may be partially recovered. It prevents a sudden flood of traffic from overwhelming a service that just came back online.
Verify that a rule is active
After you save a circuit breaking rule, confirm it is working as expected:
In the left-side navigation pane, click Traffic management. On the Flow protection tab, click the Fuse rule tab to view all configured rules and their status.
Generate test traffic to the protected interface and monitor the circuit breaker metrics to confirm that the rule triggers correctly.
Related topics
Configure throttling rules to control request rates in addition to circuit breaking.