In a microservice framework, an application is used to provide services and can be
deployed on multiple instances. Some instances may become abnormal. If a consumer
calls the application in this situation and does not perceive the abnormal instances,
the call may fail, which affects service performance and availability. The outlier
instance removal feature monitors the availability of instances, removes abnormal
instances from the list of call targets, and adds them back after they recover. This
helps ensure successful service calls and improves service stability and performance.
Background information
In the following figure, a system includes Applications A, B, C, and D, where Application
A calls Applications B, C, and D. In this example, Application B has one abnormal
instance, and Applications C and D each have two abnormal instances. If Application A
does not identify the abnormal instances, some of the calls that it initiates fail. If
Applications B, C, and D have a large number of abnormal instances, the service
performance and availability of Application A may be affected.
To ensure service performance and availability, you can configure an outlier instance
removal policy. After the policy is configured, Enterprise Distributed Application
Service (EDAS) monitors the instance status of Applications B, C, and D, and dynamically
removes abnormal instances or adds recovered instances back to ensure successful service calls.
The following list describes the outlier instance removal process:
- EDAS detects whether Applications B, C, and D have abnormal instances. Then, EDAS
determines whether to remove the abnormal instances from the applications based on
the configured Upper Limit of Instance Removal Ratio.
- EDAS does not distribute the call requests of Application A to the removed instances.
- EDAS detects whether the abnormal instances are recovered based on the configured
Recovery Detection Unit Time.
- The detection interval increases linearly with the number of detections, in steps of
Recovery Detection Unit Time, which is 0.5 minutes by default. After the value of
Maximum Cumulative Number of Times Not Restored is reached, EDAS detects whether the
abnormal instances are recovered at the maximum detection interval.
- After the abnormal instances are recovered, they are added back to the instance lists of
the applications to continue processing call requests. The detection interval is reset
to the value of Recovery Detection Unit Time, for example, 0.5 minutes.
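
The recovery detection schedule described above can be sketched as follows. This is a minimal illustration, not EDAS code: the function name `detection_interval_ms` and its parameters are hypothetical, and it assumes that the n-th detection waits n unit intervals until the configured maximum count is reached.

```python
def detection_interval_ms(detection_count: int, unit_ms: int, max_times: int) -> int:
    """Return the wait before the given recovery detection, in milliseconds.

    detection_count: 1-based number of the detection (assumed numbering).
    unit_ms:         Recovery Detection Unit Time.
    max_times:       Maximum Cumulative Number of Times Not Restored.
    """
    if detection_count < 1 or unit_ms <= 0 or max_times < 1:
        raise ValueError("detection_count, unit_ms, and max_times must be positive")
    # The interval grows by one unit per detection and stops growing at the cap.
    return min(detection_count, max_times) * unit_ms


# With a 30000 ms unit and a cap of 20, the interval grows to 10 minutes and stays there.
for n in (1, 2, 20, 25):
    print(n, detection_interval_ms(n, unit_ms=30_000, max_times=20))
```

After an instance recovers, the count would start again from 1, which corresponds to the interval being reset to one unit.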
Note
- If the provider has a large number of abnormal instances and the ratio of abnormal
instances exceeds the configured Upper Limit of Instance Removal Ratio, the number
of instances actually removed is capped at the configured upper limit.
- If the provider has only one instance available, this instance is not removed even
if the error rate exceeds the configured limit.
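
The two rules in this note can be illustrated with a short sketch. The function `removable_instances` is hypothetical and only mirrors the documented behavior. It assumes that the ratio is given as an integer percentage, that the result is rounded down, and that "only one instance available" means the last remaining instance is never removed.

```python
def removable_instances(total: int, abnormal: int, max_removal_percent: int) -> int:
    """Return how many abnormal instances would actually be removed.

    total:               number of instances of the provider application.
    abnormal:            number of instances currently judged abnormal.
    max_removal_percent: Upper Limit of Instance Removal Ratio, for example 60 for 60%.
    """
    if total <= 1:
        # A single instance is kept even if its error rate exceeds the limit.
        return 0
    # Integer arithmetic rounds down, for example 6 * 60% = 3.6 -> 3.
    cap = total * max_removal_percent // 100
    # Assumption: always keep at least one instance in service.
    cap = min(cap, total - 1)
    return min(abnormal, cap)


print(removable_instances(total=6, abnormal=5, max_removal_percent=60))   # 3
print(removable_instances(total=1, abnormal=1, max_removal_percent=100))  # 0
```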
Create an outlier instance removal policy
- Log on to the EDAS console.
- In the left-side navigation pane, choose .
- In the navigation tree of Spring Cloud, click Outlier Instance Removal.
- In the top navigation bar, select a region. On the Outlier Instance Removal page, set Namespaces and click Create an Outlier Instance Removal Policy.
- In the Basic Information step on the Create Outlier Instance Removal Policy page, configure the parameters and click Next Step.
The following table describes the parameters in the Basic Information step.
| Parameter | Description |
| --- | --- |
| Namespace | Select a region and a namespace from the drop-down lists. |
| Policy Name | Enter a name for the policy. The name can be a maximum of 64 characters in length. |
| Framework | Select Spring Cloud. |
- In the Select Effective Application step on the Create Outlier Instance Removal Policy page, select the required application and click the > icon to add the application to Selected Applications. Then, click Next Step.

After the application is selected, all the abnormal instances of the applications
that are called by this application are removed. Call requests from this application
are not distributed to the removed instances.
- In the Configure Policies step on the Create Outlier Instance Removal Policy page, configure the parameters and click Next Step.

The following table describes the parameters in the Configure Policies step.
| Parameter | Description |
| --- | --- |
| Exception Type | Select Network Exception or Network Exception + Business Exception (HTTP 5xx) based on your business requirements. |
| QPS Lower Limit | Enter the lower limit of queries per second (QPS) for the statistical time window. The time window is 15 seconds for applications that run Dubbo 2.7, and 10 seconds for applications that run other Dubbo versions and for Spring Cloud applications. If the QPS in a statistical time window reaches the specified lower limit, EDAS starts to collect and analyze error rate statistics. |
| Lower Error Rate Limit | Enter the lower limit of the error rate. If the error rate of an instance of a called application exceeds the limit, the instance is removed. Default value: 50%. For example, if an instance receives 10 call requests in the statistical time window and 6 of them fail, the error rate is 60%. If this parameter is set to 50%, the instance is removed. |
| Upper Limit of Instance Removal Ratio | Enter the upper limit for the proportion of an application's instances that can be removed. After the limit is reached, no more abnormal instances are removed. For example, if an application has 6 instances in total and you set this parameter to 60%, the calculated value is 6 × 60% = 3.6, which is rounded down to 3, so at most 3 instances can be removed. If the calculated result is less than 1, no instances are removed. |
| Recovery Detection Unit Time | Set the unit interval for detecting whether abnormal instances are recovered. After abnormal instances are removed, the detection interval increases linearly with the number of detections, by the specified unit interval each time. Default value: 30000. Unit: ms. The default value equals 0.5 minutes. |
| Maximum Cumulative Number of Times Not Restored | Enter the maximum number of times that EDAS detects whether an abnormal instance is recovered. After the maximum number is reached, EDAS stops increasing the detection interval. For example, if you set Recovery Detection Unit Time to 30000 and Maximum Cumulative Number of Times Not Restored to 20, and an abnormal instance remains unrecovered after being detected 20 times, EDAS detects whether the instance is recovered at an interval of 10 minutes (20 × 30000 ms). If the instance is recovered before the maximum number is reached, the detection interval is reset to Recovery Detection Unit Time. Note: We recommend that you do not set Maximum Cumulative Number of Times Not Restored to a large value. A large value can result in a long detection interval. If an instance recovers early in a long detection interval, the recovery is not detected in a timely manner. This results in low resource utilization and delayed processing of service call requests. |
- In the Create Confirm step on the Create Outlier Instance Removal Policy page, confirm the parameter settings and click Create.
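
To make the interaction between QPS Lower Limit and Lower Error Rate Limit concrete, the following sketch evaluates one instance over one statistical time window. It is an illustration under assumptions, not the EDAS implementation: the name `exceeds_error_rate_limit` is hypothetical, and QPS is assumed to be the number of requests divided by the window length in seconds.

```python
def exceeds_error_rate_limit(
    requests: int,
    failures: int,
    window_seconds: int,
    qps_lower_limit: float,
    error_rate_limit: float = 0.5,
) -> bool:
    """Return True if an instance qualifies for removal in this time window."""
    if requests <= 0 or window_seconds <= 0:
        return False
    # Error rates are only analyzed once traffic reaches the QPS lower limit.
    qps = requests / window_seconds
    if qps < qps_lower_limit:
        return False
    # The instance qualifies only if the error rate exceeds the limit.
    return failures / requests > error_rate_limit


# Example from the parameter table: 6 of 10 requests fail in a 10-second window.
print(exceeds_error_rate_limit(10, 6, window_seconds=10, qps_lower_limit=1.0))  # True
print(exceeds_error_rate_limit(10, 5, window_seconds=10, qps_lower_limit=1.0))  # False
```

Whether a qualifying instance is actually removed still depends on the configured Upper Limit of Instance Removal Ratio.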
Verify the result
After you configure and submit an outlier instance removal policy, the outlier instance
removal feature is enabled. To check whether the policy takes effect for an application,
go to the details page of the application and view the monitoring information. In the
topology, check whether requests are still forwarded to abnormal instances. You can
also check whether Error Rate per Minute of the application is higher than the configured
Lower Error Rate Limit.
What to do next
On the Outlier Instance Removal page, you can click Edit or Delete in the Operation column to manage the policies.