
Background

The stability of microservices has always been a major concern for developers. As businesses evolve from monolithic to distributed architectures and deployment models change, dependencies between services grow increasingly complex, and business systems face serious high-availability challenges. During the epidemic, many of us may have experienced scenarios like these:

• When reserving masks online, an instantaneous traffic peak pushed the system past its maximum capacity; load soared and users could not place orders;

• Too many online course-selection requests were submitted at the same moment, and the system stopped responding;

• During online office work or teaching, too many users joined meetings at the same time, and the meetings lagged badly.

Such drops in availability seriously hurt the user experience. We therefore need to guard against unstable factors in advance, and we also need the ability to stop losses quickly when traffic surges.

Flow control and degradation: an important part of ensuring microservice stability

Many factors affect the availability of microservices, and these unstable scenarios can have serious consequences. From the perspective of microservice traffic, they fall roughly into two common scenarios:

1. The traffic to the service itself exceeds its capacity, making it unavailable. For example, a traffic surge or a batch of dispatched tasks drives the service load up until requests can no longer be processed normally.

Traffic is highly random and unpredictable: one second may be calm, and the next may bring a traffic peak (for example, at midnight when Double 11 starts). The capacity of our system, however, is always limited. If sudden traffic exceeds that capacity, requests may fail, accumulated requests may be processed slowly, CPU usage and load may soar, and the system may eventually crash. We therefore need to limit sudden traffic so that the service keeps processing as many requests as possible without being overwhelmed.
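To make "limiting sudden traffic" concrete, here is a minimal sketch of a QPS threshold enforced with a fixed one-second window. The class FixedWindowLimiter and its methods are hypothetical illustrations, not MSE or Sentinel APIs; the current time is passed in explicitly so the behavior is easy to check.

```java
/**
 * Hypothetical fixed-window QPS limiter (illustrative sketch, not Sentinel's implementation).
 * Admits at most maxQps requests per 1-second window; extra requests are rejected
 * immediately instead of piling up and driving CPU, load, and threads to exhaustion.
 */
final class FixedWindowLimiter {
    private final int maxQps;
    private long currentWindow = -1;   // index of the 1-second window being counted
    private int countInWindow = 0;     // requests admitted in that window

    FixedWindowLimiter(int maxQps) {
        if (maxQps < 0) throw new IllegalArgumentException("maxQps must be >= 0");
        this.maxQps = maxQps;
    }

    /** Returns true if the request may proceed at time nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / 1000;
        if (window != currentWindow) {  // a new second has started: reset the counter
            currentWindow = window;
            countInWindow = 0;
        }
        if (countInWindow < maxQps) {
            countInWindow++;
            return true;
        }
        return false;                   // over the threshold: reject fast
    }

    /** Convenience overload using the wall clock. */
    boolean tryAcquire() {
        return tryAcquire(System.currentTimeMillis());
    }
}
```

A fixed window can admit up to twice the threshold around a window boundary; production components such as Sentinel typically rely on sliding-window statistics to smooth this out.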

2. The service depends on other services that are unavailable, making its own call chain unavailable. For example, our service may rely on several third-party services. If a payment service misbehaves and its calls become very slow, and the caller has no effective protection or handling in place, the caller's thread pool fills up and the service itself can no longer run normally. In a distributed system, call relationships form a complex mesh, and a single service failure can trigger a chain reaction that makes the entire call chain unavailable.

A service often calls other modules, which may be another remote service, a database, or a third-party API. For example, a payment may require a remote call to an API provided by UnionPay, and querying the price of a product may require a database query. The stability of these dependencies cannot be guaranteed. If a dependency becomes unstable and its response time grows, the response time of the method calling it grows too, threads pile up, and the business's own thread pool may eventually be exhausted, making the service unavailable. Modern microservice architectures are distributed and consist of many services that call each other, forming complex call chains. The problems above are amplified along these chains: if one link in a complex chain becomes unstable, the failure may cascade layer by layer until the entire chain is unavailable. We therefore need to apply circuit breaking and degradation to unstable services, temporarily cutting off unstable calls so that local instability does not cause an avalanche across the whole system.
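"Temporarily cutting off unstable calls" is what a circuit breaker does. The sketch below is a simplified, hypothetical CircuitBreaker (not MSE or Sentinel code) that trips after a number of consecutive failures, fails fast while open, and lets a single probe call through after a cool-down. Treating a slow call (one that exceeds the caller's timeout) as a failure is an assumption the caller is expected to apply; Sentinel itself offers several strategies, such as slow-call ratio and error ratio, that this sketch does not reproduce.

```java
/**
 * Hypothetical circuit breaker sketch.
 * CLOSED: calls pass through and consecutive failures are counted.
 * OPEN: calls fail fast without touching the unstable dependency.
 * HALF_OPEN: after the cool-down, one probe call decides whether to close or reopen.
 */
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;  // consecutive failures that trip the breaker
    private final long openMillis;       // how long to stay OPEN before allowing a probe
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Returns true if a call to the dependency may be attempted at time nowMillis. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN;      // cool-down elapsed: let exactly one probe through
            return true;
        }
        return state == State.CLOSED;     // OPEN or probing HALF_OPEN: reject
    }

    /** Record a successful call: the dependency looks healthy again. */
    synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** Record a failed (or too slow) call. */
    synchronized void onFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;           // trip: stop calling the dependency for a while
            openedAt = nowMillis;
        }
    }

    synchronized State state() { return state; }
}
```

While the breaker is open, the caller should return a degraded result (a cached value, a default, or a friendly error) instead of waiting on the dependency, which is what keeps its own threads free.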

MSE service governance builds on the stability protection capabilities of Sentinel, Alibaba's open-source flow control and degradation component. Taking traffic as the entry point, it protects service stability along multiple dimensions, including flow control, concurrency control, circuit breaking and degradation, hotspot protection, and adaptive system protection, and covers major scenarios such as microservices, cloud-native gateways, and service mesh.

Having introduced the scenarios and capabilities of flow control and degradation, let's turn to today's main topic: runtime dynamic enhancement. We will show how MSE service governance enables flow control and degradation at any point, including but not limited to entry points such as Web, RPC, SQL, and Redis calls, as well as any business method or framework interface.
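Of the dimensions listed, concurrency control maps most directly onto the thread-pool exhaustion problem: instead of counting requests per second, it caps how many threads may be inside a call at once. A minimal sketch, assuming a hypothetical ConcurrencyGuard helper (not an MSE or Sentinel API) built on java.util.concurrent.Semaphore:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical concurrency-control helper: at most maxConcurrent threads may run the
 * protected action at the same time. Extra callers get the fallback immediately instead
 * of blocking, so a slow dependency cannot tie up the whole thread pool.
 */
final class ConcurrencyGuard {
    private final Semaphore permits;

    ConcurrencyGuard(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {  // non-blocking: reject rather than queue
            return fallback.get();
        }
        try {
            return action.get();
        } finally {
            permits.release();        // always return the permit, even on exceptions
        }
    }
}
```

Usage: with new ConcurrencyGuard(1), a second call that arrives while the first still holds the permit (simulated in the check below by nesting one call inside another) receives the fallback value.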

Runtime enhancement: the key to flow control and degradation at any point

How can we add flow control and degradation to an arbitrary method at runtime? Let's walk through a demo: a simple Spring Boot application in which method a is an arbitrary internal method.

At this point, method a is invisible to monitoring. We can only see the restA interface, that is, the monitoring data for GET:/a, and configure rate-limiting and degradation rules for it.

With open-source Sentinel, we would need to add the Sentinel dependency to the project and then protect the method com.alibabacloud.mse.demo.AApplication.AController#a, either by adding an annotation or by calling Sentinel's API in code.

Doing this in code has obvious drawbacks: adding the dependency means changing the code and re-releasing the application, and that cost recurs for every method that needs protection.

So how can we give com.alibabacloud.mse.demo.AApplication.AController#a rate limiting and degradation without writing a single line of code?
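The sketch below shows the shape of the coding approach, so the intrusion into business code is visible. In open-source Sentinel this typically means wrapping the method body in SphU.entry(...) and handling BlockException, or annotating the method with @SentinelResource. To stay runnable without the Sentinel jar, the sketch uses a hypothetical Guard stand-in with a simple call budget instead of real QPS statistics; Guard, BlockedException, and setBudget are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a protection library's entry API (NOT the real Sentinel classes). */
final class Guard {
    /** Thrown when a resource is blocked; plays the role Sentinel's BlockException plays. */
    static final class BlockedException extends Exception {
        BlockedException(String resource) { super("Blocked: " + resource); }
    }

    private static final Map<String, Integer> remaining = new HashMap<>();

    /** Hypothetical "rule": admit at most `budget` further calls to `resource`. */
    static synchronized void setBudget(String resource, int budget) {
        remaining.put(resource, budget);
    }

    /** Returns normally if no rule exists or budget remains; otherwise throws BlockedException. */
    static synchronized void enter(String resource) throws BlockedException {
        Integer left = remaining.get(resource);
        if (left == null) return;                      // no rule configured: always pass
        if (left <= 0) throw new BlockedException(resource);
        remaining.put(resource, left - 1);
    }
}

/** Business class: the protection code has to be written into method a itself. */
final class AController {
    static final String RESOURCE = "com.alibabacloud.mse.demo.AApplication.AController#a";

    String a() {
        try {
            Guard.enter(RESOURCE);                     // extra dependency plus a code change
        } catch (Guard.BlockedException e) {
            return "degraded";                         // hand-written fallback logic
        }
        return "a";                                    // original business logic
    }
}
```

Every protected method needs this boilerplate, and every change to it means another build and release, which is exactly the cost the runtime approach avoids.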

Configure white-screen rules at runtime

At runtime, configure a white-screen rule (a rule set up in the console, without code): for the current application, select the custom instrumentation point interface type and fill in the class and method.

The white-screen rule capability supports not only dynamic rate limiting and degradation but also access logging and request-context collection at any point.

Observe the monitoring data of the specified method

Find the target application under application governance; the monitoring data of the specified method com.alibabacloud.mse.demo.AApplication.AController#a appears under Interface Monitoring > Custom Burial Point.
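Conceptually, the rule tells the runtime "intercept this class and method" without the business code changing. MSE performs this enhancement inside the running application, and its mechanism is not described here, so the sketch below swaps in a JDK dynamic proxy (java.lang.reflect.Proxy) to illustrate rule-driven interception. Unlike in-process runtime enhancement, a proxy only works on interfaces and must be applied when the object is wired up. MethodRule, RuleWeaver, and AService are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

/** Business code: contains no protection logic at all. */
interface AService {
    String a();
    String b();
}

final class AServiceImpl implements AService {
    public String a() { return "a"; }
    public String b() { return "b"; }
}

/** Hypothetical rule mirroring the console form: class name + method name + a call budget. */
record MethodRule(String className, String methodName, int maxCalls) {}

final class RuleWeaver {
    /** Wraps target in a JDK dynamic proxy that applies the rule only to the named method. */
    @SuppressWarnings("unchecked")
    static <T> T weave(T target, Class<T> iface, MethodRule rule) {
        int[] admitted = {0};  // simple counter standing in for real flow statistics
        boolean classMatches = target.getClass().getName().equals(rule.className());
        InvocationHandler handler = (proxy, method, args) -> {
            if (classMatches && method.getName().equals(rule.methodName())) {
                synchronized (admitted) {
                    if (admitted[0] >= rule.maxCalls()) {
                        throw new IllegalStateException(
                                "Blocked by rule: " + rule.className() + "#" + rule.methodName());
                    }
                    admitted[0]++;
                }
            }
            // Unmatched methods, and matched calls within budget, go straight to the target.
            // A production version would unwrap InvocationTargetException thrown by invoke.
            return method.invoke(target, args);
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

The business class is never edited: only the rule (class name, method name, threshold) changes, which is the property the white-screen rule provides.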

Configure flow control rules

In the upper-right corner of the interface overview, click the "Add Protection Rule" button to add a flow control rule.

We can configure a flow control rule in the simplest mode, QPS. For example, a rule can limit this interface to no more than 1 call per second on a single machine.

After the rule is configured, the rate-limiting effect shows up on the monitoring page after a short while.

Rejected traffic also receives an error. MSE's built-in framework instrumentation points come with default flow control handling: for example, a rate-limited Web interface returns 429 Too Many Requests, while rate-limited DAO-layer calls and Java methods throw an exception.
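The default handling differs by point type. A minimal sketch of that dispatch, with hypothetical names (PointType, FlowLimitedException, Response, BlockHandling) that are not MSE APIs and an exception type chosen for illustration only:

```java
/** Kinds of instrumented points mentioned in the text (names are illustrative). */
enum PointType { WEB, DAO, METHOD }

/** Hypothetical exception raised when a non-web point is flow-limited. */
final class FlowLimitedException extends RuntimeException {
    FlowLimitedException(String resource) { super("Flow limited: " + resource); }
}

/** Hypothetical minimal HTTP-style response. */
record Response(int status, String body) {}

final class BlockHandling {
    static final int TOO_MANY_REQUESTS = 429;

    /** Web endpoints answer 429; DAO calls and plain Java methods throw instead. */
    static Response onBlocked(PointType type, String resource) {
        if (type == PointType.WEB) {
            return new Response(TOO_MANY_REQUESTS, "Too Many Requests");
        }
        throw new FlowLimitedException(resource);
    }
}
```

The asymmetry follows from the boundary: HTTP has a standard status code that tells clients to back off, whereas inside the process the only generic signal a method can give its caller is an exception, which the caller can catch and degrade.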

Summary

We abstract the runtime white-screen capability into the following rule model: WhiteScreenRule = Target + Action

Target:

• ResourceTarget: the target interface; Web, RPC, SQL, and any custom method are supported

• WorkloadTarget: the target instances; select all machines or specify machine IPs

• TrafficCondition: an optional traffic filter, for example only exceptional requests, slow calls, or requests carrying full-link grayscale labels

Action:

• Collect contextual diagnostic information: parameters, return values, thread context, the target object, class loader information, and so on

• Decide whether to print logs along the subsequent (downstream) call chain

• Apply rate limiting and degradation

• Tag and color specified traffic (planned)
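The Target + Action model above can be expressed as a few plain data types. This is a minimal sketch: the field names, the ANY condition (an added default meaning "no traffic filter"), and the matching logic are assumptions for illustration, not MSE's actual schema.

```java
import java.util.List;
import java.util.Set;

/** Which resource the rule targets: a Web/RPC/SQL interface or any custom method. */
record ResourceTarget(String kind, String name) {}

/** Which instances: all machines (empty set) or specific IPs. */
record WorkloadTarget(Set<String> ips) {
    boolean matches(String ip) { return ips.isEmpty() || ips.contains(ip); }
}

/** Optional filter on the traffic itself; ANY is an assumed "no filter" default. */
enum TrafficCondition { ANY, EXCEPTION, SLOW_CALL, GRAY_LABEL }

record Target(ResourceTarget resource, WorkloadTarget workload, TrafficCondition condition) {}

/** What to do when the target matches; TAG_TRAFFIC corresponds to the planned marking/coloring. */
enum Action { COLLECT_CONTEXT, PRINT_DOWNSTREAM_LOG, FLOW_CONTROL, TAG_TRAFFIC }

/** WhiteScreenRule = Target + Action (hypothetical field names). */
record WhiteScreenRule(Target target, List<Action> actions) {
    /** True if this rule covers the given resource on the instance with the given IP. */
    boolean appliesTo(String resourceName, String ip) {
        return target.resource().name().equals(resourceName) && target.workload().matches(ip);
    }
}
```

Keeping Target and Action orthogonal means one matching step decides where a rule applies, and any combination of actions (context collection, logging, flow control) can then run at that point.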

In the near future, MSE will launch a log governance model built on these rules and the dynamic enhancement capability. Beyond flow control and degradation at any point, it will help us gain insight into how full-link traffic behaves and perform real-time governance and protection.

MSE Sentinel is widely used not only in Alibaba's internal e-commerce businesses such as Taobao and Tmall, but also in Internet finance, online education, gaming, live streaming, and large state-owned enterprises. With rate limiting and degradation available for any method, we can quickly give any microservice system traffic protection and spend more time on rapid business development, entrusting system stability to MSE so that a specialized team handles specialized work.

