Technology selection: Sentinel vs Hystrix
This is the third article in a series on Sentinel's usage scenarios, technical comparisons and implementation, and developer practices.
» Part one:
Dubbo's Traffic Guard | How Sentinel achieves high availability of services through traffic throttling
» Part two:
RocketMQ's Fuse | How Sentinel Ensures Service Stability Through Uniform Requests and Cold Starts
Sentinel is a lightweight, highly available flow control component for distributed service architectures, developed by the Alibaba middleware team and officially open sourced in July this year. Sentinel takes traffic as its entry point and helps users improve service stability along several dimensions: flow control, circuit breaking and degradation, and system load protection. You may ask: what are the similarities and differences between Sentinel and Netflix Hystrix, the circuit breaking and degradation library in common use until now? This article compares Sentinel and Hystrix in terms of resource model and execution model, isolation design, circuit breaking and degradation, and the design of real-time metric statistics, in the hope of helping developers who face this technology choice.
1. General description
Let's take a look at the official introduction of Hystrix:
Hystrix is a library that helps you control the interactions between these distributed services by adding latency tolerance and fault tolerance logic. Hystrix does this by isolating points of access between the services, stopping cascading failures across them, and providing fallback options, all of which improve your system's overall resiliency.
As this shows, Hystrix focuses on fault tolerance built on isolation and circuit breaking: calls that time out or are circuit-broken fail fast, and a fallback mechanism can be supplied.
Sentinel focuses on:
Diversified flow control
System load protection
Real-time monitoring and console
As you can see, the two address rather different problems. Let's compare them in detail.
2. Common characteristics
1. Comparison of resource model and execution model
Hystrix's resource model follows the command pattern: the call to an external resource and its fallback logic are encapsulated in a command object (HystrixCommand / HystrixObservableCommand), whose underlying execution is based on RxJava. When each Command is constructed, you must specify a commandKey and a groupKey (used to distinguish resources) together with the isolation strategy (thread pool isolation or semaphore isolation). In thread pool isolation mode, you also configure the thread pool parameters (thread pool name, capacity, queuing timeout, and so on), and the Command then runs in that thread pool under the specified fault tolerance strategy. In semaphore isolation mode, you configure the maximum number of concurrent calls, and Hystrix limits the Command's concurrent executions accordingly.
Sentinel's design is simpler. Compared with the Hystrix Command's strong dependence on isolation rules, Sentinel's resource definition and rule configuration are only loosely coupled. A Hystrix Command depends so strongly on its isolation rules because those rules directly affect how the Command executes: at execution time, Hystrix resolves the Command's isolation rules to create an RxJava Scheduler and schedules the execution on it. In thread pool mode, the Scheduler is backed by the configured thread pool; in semaphore mode, it is simply a Scheduler that runs the Command on the current thread.
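The command model described above can be illustrated with a small sketch. This is not the Hystrix API: the class IsolatedCommand and its parameters are hypothetical, and plain java.util.concurrent is used in place of RxJava. It only shows the idea of bundling a call, a fallback, and a dedicated thread pool (the isolation rule) into one object.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical command object (an illustration, not Hystrix code).
final class IsolatedCommand<T> {
    private final ExecutorService pool;   // dedicated pool: the isolation rule is part of the command
    private final long timeoutMillis;     // how long the caller is willing to wait
    private final Callable<T> call;       // the call to the external resource
    private final Supplier<T> fallback;   // used when the call fails, times out, or is rejected

    IsolatedCommand(ExecutorService pool, long timeoutMillis,
                    Callable<T> call, Supplier<T> fallback) {
        this.pool = pool;
        this.timeoutMillis = timeoutMillis;
        this.call = call;
        this.fallback = fallback;
    }

    T execute() {
        Future<T> future;
        try {
            future = pool.submit(call);
        } catch (RejectedExecutionException e) {
            return fallback.get();        // pool saturated or shut down: fail fast
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback.get();
        } catch (ExecutionException | TimeoutException e) {
            future.cancel(true);          // stop waiting on a failed or slow call
            return fallback.get();
        }
    }
}
```

Because the pool is a constructor argument, the command cannot be built without deciding on its isolation settings first, which is the coupling the paragraph above refers to.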
Sentinel is different. During development, you only need to decide whether a given method or block of code needs protection; which rules protect it is a separate matter, and those rules can be changed dynamically, in real time, at any time.
Starting from version 0.1.1, Sentinel also supports annotation-based resource definitions, where the exception handling function and the fallback function are specified through annotation parameters. Sentinel offers several ways to configure rules. Besides registering rules directly in memory through the loadRules API, users can register various external data sources that supply rules dynamically. Users can then change the rule configuration according to the system's current state, and the data source pushes the changes to Sentinel, where they take effect immediately.
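To make the loose coupling between resources and rules concrete, here is a minimal sketch in plain Java. It does not use Sentinel's classes: DynamicRuleGuard, tryPass, and the per-second limit are assumptions made for illustration, and the loadRules method here merely borrows the name mentioned above. The guarded code refers to a resource only by name, while the rules can be replaced at runtime.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical guard: rules live apart from the guarded code (not Sentinel code).
final class DynamicRuleGuard {
    // resource name -> max passes per second; swapped as a whole when new rules arrive
    private final AtomicReference<Map<String, Integer>> rules =
            new AtomicReference<>(Collections.<String, Integer>emptyMap());
    // resource name -> {second currently being counted, passes in that second}
    private final Map<String, long[]> counters = new HashMap<>();

    // Stands in for an in-memory load or a push from an external data source.
    void loadRules(Map<String, Integer> newRules) {
        rules.set(Collections.unmodifiableMap(new HashMap<>(newRules)));
    }

    // Returns true if the call may proceed. Resources without a rule are never limited.
    // The clock is passed in so the behaviour is deterministic.
    synchronized boolean tryPass(String resource, long nowMillis) {
        Integer limit = rules.get().get(resource);
        if (limit == null) {
            return true;
        }
        long second = nowMillis / 1000;
        long[] counter = counters.computeIfAbsent(resource, k -> new long[] {second, 0});
        if (counter[0] != second) {       // a new second has started: reset the count
            counter[0] = second;
            counter[1] = 0;
        }
        if (counter[1] >= limit) {
            return false;                 // over the current rule's threshold
        }
        counter[1]++;
        return true;
    }
}
```

A caller would wrap its business logic in a check such as guard.tryPass("order", System.currentTimeMillis()); raising or lowering the limit later only needs another loadRules call, with no change to the guarded code.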
2. Comparison of isolation design
Isolation is one of Hystrix's core features. Hystrix provides two isolation strategies: thread pool isolation (the Bulkhead Pattern) and semaphore isolation, of which thread pool isolation is the most recommended and most commonly used. With thread pool isolation, Hystrix creates a separate thread pool for each resource, so different service calls run in different thread pools; when a pool is queuing or a call times out, the call fails fast and a fallback can be provided. The advantage is a high degree of isolation: the thread pool of one resource can be handled without affecting other resources. The cost is the relatively large overhead of thread context switching, which is felt most by low-latency calls.
In practice, however, thread pool isolation does not bring many benefits. Its most direct effect is to fragment machine resources. Consider a common scenario: Hystrix is used inside a servlet container such as Tomcat, which already runs a large number of threads of its own (perhaps dozens or hundreds). Adding the thread pools that Hystrix creates for each resource makes the total number of threads very large (hundreds of threads), so context switching becomes very expensive. In addition, the thorough isolation of thread pool mode lets Hystrix handle queuing and timeouts separately for each resource's thread pool, but that is really a problem for timeout-based circuit breaking and flow control to solve. If a component already offers timeout-based circuit breaking and flow control, thread pool isolation is much less necessary.
Hystrix's semaphore isolation limits the number of concurrent calls to a resource. It is very lightweight: it only caps the concurrent calls to a given resource and does not explicitly create thread pools, so the overhead is small while the isolation effect is still good. The drawback is that slow calls cannot be degraded automatically; they can only wait for the client's own timeout, so cascading blocking is still possible.
Sentinel provides semaphore-style isolation through its flow control mode based on the number of concurrent threads. Combined with response-time-based circuit breaking and degradation, an unstable resource can be degraded automatically when its average response time becomes high, which prevents too many slow calls from using up the concurrency quota and affecting the whole system.
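Semaphore-style isolation is easy to sketch with java.util.concurrent.Semaphore. The class below is a hypothetical illustration (ConcurrencyLimiter is not a Hystrix or Sentinel class): the call runs on the caller's own thread, and only the number of calls in flight is capped.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical concurrency cap for one resource (an illustration only).
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the action on the calling thread if a permit is free, otherwise returns the fallback.
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();        // reject immediately instead of queuing
        }
        try {
            return action.get();
        } finally {
            permits.release();            // always give the permit back
        }
    }
}
```

Note that nothing here interrupts a slow action: a permit stays occupied until the call returns, which is exactly the weakness of pure semaphore isolation described above and the reason it is paired with response-time-based degradation.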
3. Comparison of circuit breaking and degradation
The circuit breaking and degradation features of Sentinel and Hystrix are both essentially based on the Circuit Breaker Pattern. Both support circuit breaking based on the failure ratio (exception ratio): once the number of calls reaches a certain level and the failure ratio reaches the configured threshold, the circuit opens automatically. All calls to the resource are then blocked until the specified time window has passed, after which calls are tentatively resumed. As mentioned above, Sentinel also supports circuit breaking based on average response time: when a service's response time keeps climbing, the circuit opens automatically and further requests are rejected until a period of time has passed. This prevents very slow calls from causing cascading blocking.
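The failure-ratio behaviour described above can be sketched as follows. This is a simplified illustration, not the implementation of either library: RatioCircuitBreaker and its parameters are hypothetical, the counts are cumulative rather than kept in a sliding window, and there is no separate half-open probing state.

```java
// Hypothetical failure-ratio circuit breaker (simplified; not Sentinel or Hystrix code).
final class RatioCircuitBreaker {
    private final int minRequests;        // do not judge the ratio on too few calls
    private final double failureRatio;    // open when failed / total reaches this value
    private final long openMillis;        // how long calls stay blocked once open
    private int total;
    private int failed;
    private long openedAt = -1;           // -1 means the circuit is closed

    RatioCircuitBreaker(int minRequests, double failureRatio, long openMillis) {
        this.minRequests = minRequests;
        this.failureRatio = failureRatio;
        this.openMillis = openMillis;
    }

    // Returns true if a call may go through at the given time.
    synchronized boolean allow(long nowMillis) {
        if (openedAt < 0) {
            return true;
        }
        if (nowMillis - openedAt < openMillis) {
            return false;                 // still inside the blocking window
        }
        openedAt = -1;                    // window elapsed: resume calls and count afresh
        total = 0;
        failed = 0;
        return true;
    }

    // Records the outcome of a call that was allowed through.
    synchronized void record(boolean success, long nowMillis) {
        if (openedAt >= 0) {
            return;                       // ignore late results while open
        }
        total++;
        if (!success) {
            failed++;
        }
        if (total >= minRequests && (double) failed / total >= failureRatio) {
            openedAt = nowMillis;         // both conditions met: open the circuit
        }
    }
}
```

A response-time-based variant would follow the same shape, with record taking the call's duration and the trip condition comparing an average response time against a threshold.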
4. Comparison of real-time metric statistics implementations
The real-time metric statistics of both Hystrix and Sentinel are built on sliding windows. Before version 1.5, Hystrix implemented the sliding window as a ring array and updated each bucket's statistics with a combination of locks and CAS operations. Starting with 1.5, Hystrix restructured its real-time metric statistics: the statistics data structures are abstracted as reactive streams, which makes the metric information easy for consumers to use, and the underlying layer became an RxJava-based event-driven model in which an event is published whenever a service call succeeds, fails, or times out. After a series of transformations and aggregations, the result is a real-time stream of metric statistics that circuit breakers or the Dashboard can consume.
Sentinel currently abstracts metric statistics behind the Metric interface, so the underlying layer can have different implementations. The current default implementation is a sliding window based on LeapArray; other implementations, such as reactive streams, may be introduced later as needed.
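A sliding window over a ring array of buckets can be sketched in a few lines. The code below is an illustration only: SlidingWindowCounter is a hypothetical class, it uses a single lock for brevity, and it is not the LeapArray or Hystrix implementation. It assumes timestamps are non-negative and do not go backwards.

```java
import java.util.Arrays;

// Hypothetical ring-array sliding window (an illustration, not LeapArray itself).
final class SlidingWindowCounter {
    private final int bucketCount;        // number of buckets in the ring
    private final long bucketMillis;      // time span covered by one bucket
    private final long[] bucketStart;     // start time of the data held in each slot
    private final long[] counts;          // events counted in each slot

    SlidingWindowCounter(int bucketCount, long bucketMillis) {
        this.bucketCount = bucketCount;
        this.bucketMillis = bucketMillis;
        this.bucketStart = new long[bucketCount];
        this.counts = new long[bucketCount];
        Arrays.fill(bucketStart, -1);     // -1 marks a slot that has never been used
    }

    // Counts one event at the given time.
    synchronized void add(long nowMillis) {
        int index = (int) ((nowMillis / bucketMillis) % bucketCount);
        long start = nowMillis - nowMillis % bucketMillis;
        if (bucketStart[index] != start) {
            bucketStart[index] = start;   // slot holds stale data: reuse it for the new bucket
            counts[index] = 0;
        }
        counts[index]++;
    }

    // Sums the buckets that still fall inside the window ending at nowMillis.
    synchronized long total(long nowMillis) {
        long windowMillis = bucketCount * bucketMillis;
        long sum = 0;
        for (int i = 0; i < bucketCount; i++) {
            if (bucketStart[i] >= 0 && nowMillis - bucketStart[i] < windowMillis) {
                sum += counts[i];
            }
        }
        return sum;
    }
}
```

With two buckets of 500 ms each, the counter covers a one-second window; old slots are overwritten in place as time advances, so memory use stays fixed.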
3. Sentinel Features
In addition to the common features described above, Sentinel also provides the following distinctive features:
1. Lightweight and high performance
Sentinel is a full-featured, highly available flow control component. Its core library, sentinel-core, has no extra dependencies and is less than 200 KB when packaged, which makes it very lightweight. Developers can introduce sentinel-core with confidence and without worrying about dependency problems. Sentinel also provides a variety of extension points, so users can easily extend it to fit their needs and integrate those extensions seamlessly into Sentinel.
The performance cost of introducing Sentinel is very small. Only when a single machine handles more than 250,000 QPS is there a noticeable impact (about 5% - 10%); at lower single-machine QPS the cost is almost negligible.
2. Flow control
Sentinel can apply flow control to resource calls according to different call relationships and based on different runtime metrics (such as QPS, number of concurrent calls, and system load), shaping random incoming requests into an appropriate form.
Sentinel supports several traffic shaping strategies that automatically adjust traffic to an appropriate shape when QPS is too high. The commonly used ones are:
Direct rejection mode: requests that exceed the threshold are rejected immediately.
Slow-start warm-up mode: when traffic surges, the rate of passing traffic is controlled so that it rises slowly and reaches the threshold ceiling gradually over a set period. This gives a cold system time to warm up and keeps it from being overwhelmed.
Constant speed mode: implemented with the Leaky Bucket algorithm, this mode strictly controls the interval between passing requests. Excess requests queue up, and a request whose waiting time would exceed the timeout is rejected immediately.
Sentinel also supports throttling based on call relationships, including throttling by caller, throttling by call chain entry, and throttling by associated traffic. Relying on Sentinel's powerful call link statistics, it can throttle accurately along different dimensions.
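The constant speed mode among the shaping strategies above can be sketched as a pacing limiter. This is a hypothetical illustration of the leaky bucket idea, not Sentinel's own controller: PacingLimiter, its parameters, and the convention of returning a wait time are assumptions. Each request is assigned the next free time slot, and it is rejected if reaching that slot would mean waiting longer than the queue timeout. Timestamps are assumed to be non-negative.

```java
// Hypothetical constant-rate pacer (leaky bucket style; not Sentinel code).
final class PacingLimiter {
    private final long intervalMillis;    // fixed gap between two passing requests
    private final long maxQueueMillis;    // longest time a request may wait in the queue
    private long latestPassed = -1;       // time slot given to the most recent request

    PacingLimiter(long intervalMillis, long maxQueueMillis) {
        this.intervalMillis = intervalMillis;
        this.maxQueueMillis = maxQueueMillis;
    }

    // Returns how long the caller should wait before proceeding, or -1 if rejected.
    synchronized long tryAcquire(long nowMillis) {
        long expected = latestPassed + intervalMillis;   // earliest slot for this request
        if (latestPassed < 0 || expected <= nowMillis) {
            latestPassed = nowMillis;     // the slot is already free: pass at once
            return 0;
        }
        long wait = expected - nowMillis;
        if (wait > maxQueueMillis) {
            return -1;                    // would queue for too long: reject
        }
        latestPassed = expected;          // reserve the slot; the caller sleeps for 'wait'
        return wait;
    }
}
```

With an interval of 100 ms (10 requests per second) and a 250 ms queue timeout, a burst of four simultaneous requests is spread out as waits of 0, 100, and 200 ms, and the fourth is rejected.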
At present, Sentinel's support for asynchronous call links is limited; subsequent versions will focus on improving it.
3. System load protection
Sentinel provides protection at the level of the whole system, with a load protection algorithm that borrows ideas from TCP BBR. When system load is high and requests keep coming in, the system may crash and stop responding. In a cluster environment, network load balancing then forwards the traffic that this machine should have carried to other machines. If those machines are also close to their limits, the extra traffic brings them down too, and eventually the whole cluster becomes unavailable. For this situation, Sentinel provides a protection mechanism that balances the system's ingress traffic against its load, so that the system handles as many requests as it can within its capacity.
4. Real-time monitoring and control panel
Sentinel provides an HTTP API for obtaining real-time monitoring information, such as call link statistics, cluster point information, and rule information. Users of Spring Boot/Spring Cloud with the Sentinel Spring Cloud Starter can also easily obtain runtime information, such as dynamic rules, through the exposed Actuator Endpoint. In the future, Sentinel will also support standardized metric monitoring APIs, making it easy to integrate with monitoring and visualization systems such as Prometheus and Grafana.
The Sentinel console (Dashboard) provides machine discovery, rule configuration, real-time monitoring views, and call link information, which makes viewing monitoring data and configuring rules very convenient.
Sentinel has currently been adapted for Servlet, Dubbo, Spring Boot/Spring Cloud, gRPC, and others. Users only need to add the corresponding dependencies and a small amount of configuration to benefit from Sentinel's high-availability traffic protection. In the future, Sentinel will be adapted to more common frameworks and will provide cluster traffic protection for Service Mesh.
Author: middleware brother
Knowledge Base Team