In a microservice framework, if consumers cannot detect exceptions on the application instances of a provider, service calls to those instances fail. This in turn degrades the performance and even the availability of the services that the consumers provide. The outlier instance removal feature monitors the availability of application instances and dynamically adjusts the list of instances that receive calls. This ensures successful service calls and improves service stability and quality of service (QoS).

Background information

Consider a system that includes Applications A, B, C, and D, where Application A calls Applications B, C, and D. In this example, Application B has one abnormal instance, and Applications C and D each have two abnormal instances. If Application A cannot identify the abnormal instances, some of the calls that it initiates fail. If Applications B, C, and D have a large number of abnormal instances, the service performance and availability of Application A may be affected.

To ensure the service performance and availability of Application A, you can configure an outlier instance removal policy. After the policy is configured, Enterprise Distributed Application Service (EDAS) monitors the instance status of Applications B, C, and D, removes abnormal instances, and adds recovered instances back to ensure successful service calls.

The following list describes the process of outlier instance removal:

  1. EDAS detects whether Applications B, C, and D have abnormal instances. Then, EDAS determines whether to remove the abnormal instances from the applications based on the configured Upper limit of instance removal ratio parameter.
  2. EDAS does not distribute the call requests of Application A to the removed instances.
  3. EDAS detects whether the abnormal instances are recovered based on the configured Recovery detection unit time parameter.
  4. The detection interval increases linearly with the number of detections. Each time an instance is found to be still abnormal, the interval grows by the value of the Recovery detection unit time parameter, which is 0.5 minutes by default. After the number of detections reaches the value of the Maximum cumulative number of times not restored parameter, the interval stops increasing and EDAS continues to detect whether the abnormal instances are recovered at this maximum interval.
  5. After the abnormal instances are recovered, they are added to the instance lists of the applications to continue processing call requests. The detection interval is reset to the value of the Recovery detection unit time parameter, such as 0.5 minutes.
Note
  • If the provider has a large number of abnormal instances and the ratio of abnormal instances exceeds the value of the Upper limit of instance removal ratio parameter, the number of instances that are actually removed is capped at the configured upper limit.
  • If the provider has only one instance available, this instance is not removed even if the error rate exceeds the configured limit.
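
The recovery detection schedule in steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration, not EDAS code: the function name `detection_interval_ms` and the 1-based `attempt` counter are hypothetical, and the sketch assumes that the first check after removal happens one unit interval later. The defaults mirror the documented unit of 30000 ms and the example maximum of 20.

```python
def detection_interval_ms(attempt, unit_ms=30000, max_attempts=20):
    """Return the wait before the given recovery check (1-based attempt).

    Hypothetical helper: models the interval as attempt x unit_ms and stops
    increasing it once the attempt count reaches max_attempts.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(attempt, max_attempts) * unit_ms


# Intervals for the first three checks, then at and beyond the cap.
print([detection_interval_ms(n) for n in (1, 2, 3, 20, 25)])
# → [30000, 60000, 90000, 600000, 600000]
```

After an instance recovers, the counter would restart at 1, so a later removal of the same instance begins again at the unit interval.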

Create an outlier instance removal policy

  1. Log on to the EDAS console.
  2. In the left-side navigation pane, choose Microservices Governance > Spring Cloud.
  3. Click Outlier Instance Removal.
  4. On the Outlier Instance Removal page, select a region and a microservice namespace. Then, click Create an outlier removal policy.
  5. In the Basic information step on the Create Outlier Instance Removal Policy page, configure the parameters and click Next Step.
    [Figure: Create Outlier Instance Removal Policy - Basic Information]

    The following table describes the parameters in the Basic information step.

    Parameter | Description
    Microservice Space | Select a region and a namespace from the drop-down lists.
    Policy name | Enter a name for the policy. The name can be up to 64 characters in length.
    Framework | Select Spring Cloud.
  6. In the Select effective application step on the Create Outlier Instance Removal Policy page, select the required application and click the > icon to add the application to Selected Applications. Then, click Next Step.
    [Figure: Create Outlier Instance Removal Policy - Select Effective Application]

    After the application is selected, all the abnormal instances of the applications that are called by this application are removed. Call requests from this application are not distributed to the removed instances.

  7. In the Configure policies step on the Create Outlier Instance Removal Policy page, configure the parameters and click Next Step.
    [Figure: Create Outlier Instance Removal Policy - Configure Policies]

    The following table describes the parameters in the Configure policies step.

    Parameter | Description
    Exception type | Select Network exception or Network exception + business exception (HTTP 5xx) based on your business requirements.
    QPS lower limit | Enter the lower limit of queries per second (QPS) within the statistical time window. The time window is 15 seconds for applications that run Dubbo 2.7, and 10 seconds for applications that run other Dubbo versions and for Spring Cloud applications. If the QPS in a statistical time window reaches the specified lower limit, EDAS starts to collect and analyze error rate statistics.
    Lower error rate limit | Enter the lower limit of the error rate as a percentage. If the error rate on an instance of a called application exceeds the limit, the instance is removed. Default value: 50. For example, an instance receives 10 call requests in the statistical time window, and 6 call requests fail. The error rate is 60%. If this parameter is set to 50, the instance is removed.
    Upper limit of instance removal ratio | Enter the upper limit for the proportion of instances that can be removed, as a percentage. If the limit is reached, no more abnormal instances are removed. For example, an application has 6 instances in total. If you set this parameter to 60, the number of instances that can be removed is 6 × 60% = 3.6, which is rounded down to 3. If the calculated result is less than 1, no instances are removed.
    Recovery detection unit time | Set a unit interval to detect whether abnormal instances are recovered. After abnormal instances are removed, EDAS increases the detection interval by the specified unit interval each time it detects that an instance is still abnormal. Default value: 30000. Unit: ms. The default value equals 0.5 minutes.
    Maximum cumulative number of times not restored | Enter the maximum number of times that EDAS detects whether an abnormal instance is recovered before the detection interval stops increasing. For example, you set the Recovery detection unit time parameter to 30000 and the Maximum cumulative number of times not restored parameter to 20. If an abnormal instance remains unrecovered after being detected 20 times, EDAS continues to detect whether the instance is recovered at an interval of 10 minutes, which is calculated as 20 × 30000 ms. If the instance is recovered before the maximum number is reached, the detection interval is reset to the value of the Recovery detection unit time parameter.
    Note: We recommend that you do not set the Maximum cumulative number of times not restored parameter to a large value. A large value can result in a long detection interval. If an instance recovers early in a long detection interval, the recovery is not detected until the interval elapses. This results in low resource utilization and delays the processing of service call requests.
  8. In the Create Confirm step on the Create Outlier Instance Removal Policy page, confirm the settings and click Create.
    [Figure: Create Outlier Instance Removal Policy - Create Confirm]
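
To show how the parameters in the Configure policies step interact, the following Python sketch evaluates one statistical time window. It is an illustration under stated assumptions, not the EDAS implementation: the names `Policy`, `is_outlier`, `removal_cap`, and `select_for_removal` are hypothetical, the QPS is assumed to be the request count divided by the window length, the QPS lower limit of 1.0 is an arbitrary example value, and removing the instances with the highest error rates first when the cap is exceeded is an assumption that the source does not state.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    qps_lower_limit: float = 1.0   # assumed example value; no default is documented
    error_rate_limit: int = 50     # percent, documented default
    max_removal_ratio: int = 60    # percent, value from the documented example
    window_seconds: int = 10       # 10 for Spring Cloud, 15 for Dubbo 2.7


def is_outlier(requests, failures, policy):
    """True if the instance has enough traffic and its error rate exceeds the limit."""
    if requests == 0:
        return False
    if requests / policy.window_seconds < policy.qps_lower_limit:
        return False
    # Compare failures/requests > limit/100 with integers to avoid float rounding.
    return failures * 100 > policy.error_rate_limit * requests


def removal_cap(total_instances, policy):
    """Total x ratio, rounded down, while always keeping at least one instance."""
    cap = total_instances * policy.max_removal_ratio // 100
    return max(0, min(cap, total_instances - 1))


def select_for_removal(stats, policy):
    """stats maps an instance name to (requests, failures) for one time window."""
    outliers = [name for name, (req, fail) in stats.items()
                if is_outlier(req, fail, policy)]
    # Assumption: when outliers exceed the cap, remove the worst error rates first.
    outliers.sort(key=lambda name: stats[name][1] / stats[name][0], reverse=True)
    return outliers[:removal_cap(len(stats), policy)]


stats = {
    "i-1": (10, 6),  # 60% errors
    "i-2": (10, 9),  # 90% errors
    "i-3": (10, 7),  # 70% errors
    "i-4": (10, 8),  # 80% errors
    "i-5": (10, 0),
    "i-6": (10, 1),
}
print(select_for_removal(stats, Policy()))
# → ['i-2', 'i-4', 'i-3']
```

Four of the six instances exceed the 50% error rate limit, but the 60% removal ratio allows only 3 removals (6 × 60% = 3.6, rounded down), so one abnormal instance stays in the list.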

Verify the result

The outlier instance removal feature is enabled after you configure and create an outlier instance removal policy. To check whether the policy takes effect, go to the details page of the application for which you configured outlier instance removal and view the application monitoring information. For example, in the topology, you can check whether call requests are still forwarded to abnormal instances and whether the per-minute error rate of application calls is higher than the value of the Lower error rate limit parameter.

What to do next

On the Outlier Instance Removal page, you can click Edit or Delete in the Operation column to manage the policies.