Goodbye Nacos, I'm going to play Service Mesh!

Date: Oct 25, 2022


Abstract: In a Service Mesh architecture, service registration and discovery largely depend on the infrastructure (the data plane [Envoy], the control plane [Istio], and the Kubernetes cluster), so microservices do not need to implement them themselves.

The previous article walked through a hands-on example of the Service Mesh microservice architecture. In that example, using Istio together with Kubernetes, a set of services built with the Spring Boot framework implemented no service registration or discovery logic of their own. Yet once deployed to Kubernetes, they could call each other's interfaces directly by service name, and the calls could be subjected to service governance operations such as rate limiting, circuit breaking, and load balancing.

How does all this work? This article introduces the Service Mesh microservice architecture and the principles behind its service registration and discovery, and answers that question.

Traditional microservice registry implementation mechanism

Before explaining the Service Mesh service registration discovery mechanism, let's briefly review how service registration and discovery are implemented in traditional microservices represented by Spring Cloud.

In a traditional microservice system, the registry is undoubtedly the most important component. Without the unified registration and discovery function it provides, the microservice system could hardly function at all. In the Service Mesh architecture, service registration and discovery largely depend on the infrastructure (the data plane [Envoy], the control plane [Istio], and the Kubernetes cluster), and microservices do not need to handle them themselves. In the traditional Spring Cloud architecture, by contrast, service governance logic such as registration, discovery, and health checking must be managed by the microservices themselves.

Ready-made service governance components and frameworks mean there is no need to reinvent the wheel, but from the perspective of the running application, the registration and discovery logic is still a direct interaction between the microservice and the registry. From the perspective of system architecture, this approach clearly couples service governance logic with the business application. Its operating logic is shown in the following figure:

As the figure above shows, when a microservice starts, it sends its registration information to the registry (whose address is set in the application's configuration) through the integrated service discovery SDK, and the registry stores the service's basic information after receiving the registration request. Later, when a service consumer wants to call the service, the caller queries the registry for the target microservice's address list through a service discovery component (such as Ribbon), selects an address from that list according to a load-balancing strategy, and initiates the call to the target microservice.

When service nodes change, new nodes register themselves, and offline nodes are removed from the registry in a timely manner by the service detection mechanism. This registration and discovery mechanism is what keeps communication between microservices smooth. Seen this way, the health check between the registry and the services is very important: if the registry cannot promptly remove offline or faulty nodes from the list of available addresses, some calls to the microservice are likely to fail.
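The register, query, and call flow described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the Spring Cloud or Ribbon API: the class names `Registry` and `RoundRobinClient`, the service name `orders`, and the addresses are all assumptions made for the example, and round-robin stands in for whatever load strategy a real client would use.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Registry:
    """Toy in-memory registry: service name -> list of instance addresses."""
    services: dict = field(default_factory=dict)

    def register(self, name: str, address: str) -> None:
        # Called by the provider's discovery SDK at startup.
        instances = self.services.setdefault(name, [])
        if address not in instances:
            instances.append(address)

    def deregister(self, name: str, address: str) -> None:
        if address in self.services.get(name, []):
            self.services[name].remove(address)

    def lookup(self, name: str) -> list:
        # Return a copy so callers cannot mutate registry state.
        return list(self.services.get(name, []))


class RoundRobinClient:
    """Consumer side: query the registry, then pick an address round-robin."""

    def __init__(self, registry: Registry):
        self.registry = registry
        self._counter = count()

    def choose(self, name: str) -> str:
        addresses = self.registry.lookup(name)
        if not addresses:
            raise LookupError(f"no instances registered for {name!r}")
        return addresses[next(self._counter) % len(addresses)]


registry = Registry()
registry.register("orders", "10.0.0.1:8080")
registry.register("orders", "10.0.0.2:8080")
client = RoundRobinClient(registry)
picks = [client.choose("orders") for _ in range(3)]
print(picks)  # alternates between the two registered addresses
```

Note that both the registration call and the lookup live inside the application process, which is exactly the coupling between governance logic and business code that the article points out.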

So how are service health checks generally performed? There are three mainstream approaches:

1. Active detection of microservices

Active detection means that the microservice periodically sends lease renewal (heartbeat) information to the registry to show that it is still alive. In practice, this is the approach most commonly used with a registry. If the number of services is not large, or if an eventually consistent registry such as Eureka is used, active detection is the best choice. It largely avoids the problem of a stale service node appearing to be alive because its Pod IP was reused after deployment in a Kubernetes cluster, since each renewal is reported to the registry together with the service's basic information.

However, this approach also has an obvious drawback: it increases the write pressure on the registry. If a large number of services are released at the same time and many nodes change, a large number of notification events may be generated, which can significantly affect the stability of the entire microservice system.

In addition, a successful lease renewal does not necessarily mean that the service is healthy. In some special cases, a service can no longer serve external requests yet can still send its renewal information to the registry normally.
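A minimal sketch of lease-based detection, assuming a hypothetical `LeaseRegistry` class and a 30-second time-to-live (neither comes from Eureka or any real registry). Time is passed in explicitly so the behavior is deterministic; a real client would send renewals on a timer.

```python
class LeaseRegistry:
    """Toy registry that keeps an instance only while its lease is fresh."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.last_seen = {}  # (service, address) -> time of last renewal

    def renew(self, service: str, address: str, now: float) -> None:
        # Registration and renewal are the same write: the service info
        # travels with every heartbeat.
        self.last_seen[(service, address)] = now

    def evict_expired(self, now: float) -> list:
        expired = [key for key, seen in self.last_seen.items()
                   if now - seen > self.ttl]
        for key in expired:
            del self.last_seen[key]
        return expired

    def alive(self, service: str, now: float) -> list:
        self.evict_expired(now)
        return sorted(addr for (svc, addr) in self.last_seen if svc == service)


leases = LeaseRegistry(ttl_seconds=30)
leases.renew("orders", "10.0.0.1:8080", now=0)
leases.renew("orders", "10.0.0.2:8080", now=0)
leases.renew("orders", "10.0.0.1:8080", now=25)  # only one node keeps renewing
alive_at_40 = leases.alive("orders", now=40)
print(alive_at_40)  # the node that stopped renewing has been evicted
```

Every heartbeat is a write to the registry, which illustrates the write pressure mentioned above, and a node that can still call `renew` is kept even if it can no longer serve traffic.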

2. The registration center initiates a health check

The advantages and disadvantages of the active detection method of microservices have been analyzed earlier, but what if the health check is initiated by the registry?

With this approach, the microservice exposes its health check endpoint (such as /actuator/health) to the registry when it registers, and the registry periodically calls that endpoint to check whether the service node is alive.

This approach is not free of problems either. Take the Pod IP reuse problem mentioned above: if another microservice reuses the IP of a previous node, the failed node will appear to have come back to life. There are corresponding solutions, however. For example, as discussed later, Envoy performs a second check on the service name during Istio service discovery.
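Registry-initiated checking can be sketched as follows. The `ProbingRegistry` class and its `max_failures` threshold are assumptions for illustration, and the probe is an injected function so the example runs without a network; in a real deployment it would be something like an HTTP GET against the instance's health endpoint.

```python
from typing import Callable, Dict


class ProbingRegistry:
    """Toy registry that probes each registered address itself."""

    def __init__(self, probe: Callable[[str], bool], max_failures: int = 2):
        self.probe = probe                  # hypothetical: True if the node answers
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {}  # address -> consecutive failed probes

    def register(self, address: str) -> None:
        self.failures[address] = 0

    def run_checks(self) -> None:
        # One probing round; a scheduler would call this periodically.
        for address in list(self.failures):
            if self.probe(address):
                self.failures[address] = 0
            else:
                self.failures[address] += 1
                if self.failures[address] >= self.max_failures:
                    del self.failures[address]  # remove the dead node

    def healthy(self) -> list:
        return sorted(a for a, n in self.failures.items() if n == 0)


down = {"10.0.0.2:8080"}  # simulated outage
probing = ProbingRegistry(probe=lambda address: address not in down)
probing.register("10.0.0.1:8080")
probing.register("10.0.0.2:8080")
probing.run_checks()
probing.run_checks()
print(probing.healthy())  # only the reachable node remains
```

Because the probe is keyed by address alone, a different service that later comes up on the same IP would pass the check, which is the IP reuse weakness described above.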

3. Caller load balancer performs health check

The third approach is more extreme. The registry performs no probing at all, and all probing is done by the load balancer on the calling side. A common example is the automatic removal of failed nodes in gRPC. However, this approach also suffers from the IP reuse problem, so it is not very popular in practice.

Those are the three common service detection approaches in microservice registration and discovery, each with its own advantages and disadvantages. The following table compares the most widely used registries in the Spring Cloud microservice ecosystem (Eureka, Consul, and Nacos):

As the table shows, Eureka relies mainly on active detection by the service, whereas Consul and Nacos combine several detection approaches to offset the weaknesses of each as far as possible. This is why most current practice is gradually abandoning Eureka in favor of Consul or Nacos.

Besides health checks, the table also lists the consistency protocols of these registries. A brief note on the CAP theorem is in order here. CAP is an important concept in distributed systems and stands for Consistency, Availability, and Partition tolerance. According to the CAP theorem, a distributed system cannot satisfy all three at once; in general only two can be satisfied at the same time. For example, Eureka and Nacos satisfy availability and partition tolerance but not consistency, while Consul satisfies consistency and partition tolerance but not full availability.

For a registry, the consistency requirements are generally not high; eventual consistency is enough. After all, in a microservice architecture, registering and deregistering nodes and communicating between the registry and its clients all take time, so strict consistency is hard to achieve anyway. For this reason, AP systems are generally preferred when selecting a registry. This is also why, when building microservices with Spring Cloud, Nacos is the preferred open-source registry apart from in-house solutions.
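A rough sketch of caller-side detection, in which the client drops an endpoint after a failed call and retries on the next one. This is not gRPC's implementation; `EjectingBalancer`, the `send` callback, and `fake_send` are hypothetical names used only to show the idea.

```python
class EjectingBalancer:
    """Toy caller-side balancer that ejects endpoints whose calls fail."""

    def __init__(self, addresses, send):
        self.addresses = list(addresses)
        self.send = send      # hypothetical transport: send(address, request)
        self.ejected = set()  # endpoints this caller no longer trusts

    def call(self, request):
        for address in self.addresses:
            if address in self.ejected:
                continue
            try:
                return self.send(address, request)
            except ConnectionError:
                # No registry involved: the caller itself removes the node.
                self.ejected.add(address)
        raise ConnectionError("no healthy endpoint left")


def fake_send(address, request):
    # Simulated transport: the first node is unreachable.
    if address == "10.0.0.1:8080":
        raise ConnectionError(address)
    return f"{address} handled {request}"


balancer = EjectingBalancer(["10.0.0.1:8080", "10.0.0.2:8080"], fake_send)
result = balancer.call("GET /orders/1")
print(result)
print(balancer.ejected)
```

Each caller builds its own view of which nodes are healthy, so the knowledge is not shared, and an ejected address says nothing about which service currently owns that IP.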

Kubernetes service registration and automatic discovery

The preceding sections reviewed at some length the basic logic of the registry and the common open-source options in the traditional Spring Cloud microservice ecosystem. Before describing service registration and discovery in the Service Mesh architecture in detail, it is necessary to understand the Service resource in Kubernetes container orchestration.

Most popular Service Mesh solutions (such as Istio) are combined with Kubernetes clusters, and their service registration logic mainly relies on Kubernetes' internal service discovery mechanism. For example, Istio implements service discovery by watching for changes to Kubernetes Pods. This does not mean that a traditional registry cannot be used in a Service Mesh, but doing so may require adapting an existing registry or developing one in-house (an option worth considering when old and new microservice systems must coexist). For a newly designed Service Mesh microservice architecture, the best option is to choose Istio and use Kubernetes' built-in capabilities for service discovery.

Internal service registration and discovery for Service Mesh microservices in Kubernetes is built on the Service resource type. A Service is a service concept abstracted by Kubernetes: it represents a set of Pods. In Kubernetes, a Pod is the smallest unit of container orchestration. A Pod definition can contain multiple containers, the containers in a Pod can communicate with each other through local access, and they share a single Pod IP. This is a core reason why Kubernetes is called a container orchestration platform.

But a Pod has a life cycle, which means its IP address is not fixed and changes as the Pod is created and destroyed. If Pods were used directly as the targets of service calls, the frequent IP changes would make the caller's service discovery logic unstable and complicate the system. To solve this problem, Kubernetes abstracts the Service resource type. Although Pod IP addresses change, the Service name is fixed, so workloads in the Kubernetes cluster can reach the back-end Pods through the Service name without being aware of changes to their IPs. The mapping between Pods and the Service is maintained entirely by the Kubernetes cluster itself. Their relationship is shown in the following figure:

As shown in the figure above, suppose we need load-balanced access to a set of identical replicas of a service, orders. By defining a Service resource, the replicas are presented externally as a single service object, and Kubernetes assigns it a fixed IP address within the cluster. Requests then go to the Service (by name inside the cluster, or exposed through Ingress or NodePort outside the cluster), and the Service load-balances each request and forwards it to the corresponding Pod.

Although the association between a Service resource and its Pods is managed by the Kubernetes cluster itself, we still need to declare it in the resource definitions when publishing the service. Take a piece of a Kubernetes deployment file as an example:

As the highlighted part of the deployment file above shows, the Pod definition sets the labels for this group of Pods in its metadata labels, and the Service specifies the corresponding labels in its selector, so that the Service can reach the set of Pods that carry those labels.

That is the basic principle by which Kubernetes implements service registration and discovery. The same logic is used in the design and implementation of Istio, the Service Mesh microservice platform.
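The label and selector relationship can be modeled in Python to make the matching rule concrete. This is a simplified illustration of equality-based selection, not the Kubernetes API: the `Pod` and `Service` classes, the `app: orders` label, and the Pod names and IPs are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict


@dataclass
class Service:
    name: str
    selector: dict

    def endpoints(self, pods):
        # A Pod backs the Service when every selector key/value pair
        # appears in the Pod's labels.
        return [p.ip for p in pods
                if all(p.labels.get(k) == v for k, v in self.selector.items())]


pods = [
    Pod("orders-7d9f-abc", "10.244.1.5", {"app": "orders"}),
    Pod("orders-7d9f-def", "10.244.2.8", {"app": "orders"}),
    Pod("users-5c6b-xyz", "10.244.1.9", {"app": "users"}),
]
orders = Service("orders", selector={"app": "orders"})
before = orders.endpoints(pods)

# A Pod is replaced and comes back with a new IP. The Service name and
# selector are unchanged, so callers addressing "orders" are unaffected.
pods[0] = Pod("orders-7d9f-ghi", "10.244.3.2", {"app": "orders"})
after = orders.endpoints(pods)
print(before)
print(after)
```

The caller only ever refers to the name `orders`; the set of IPs behind it is recomputed from labels, which is why Pod churn does not leak into the caller's logic.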

Istio service registration discovery

Next, let's take the Service Mesh architecture represented by Istio (based on the Istio 1.9 architecture) as an example to see how it implements service registration and discovery. After the groundwork above, you probably already have some idea of the basic principle: Istio implements it by watching for changes to Kubernetes nodes and Service resources.

Next, we will refine this further and analyze, from the perspective of runtime logic, how the control plane and the data plane cooperate to implement microservice registration and discovery in Istio, as shown in the figure below:

The figure above shows the core logic of how Istio implements microservice registration and discovery on top of Kubernetes. Once Istio is installed in the Kubernetes cluster, whenever Kubernetes creates a Pod, Kube-APIServer calls the control plane's Sidecar-Injector service, which automatically modifies the application's description to inject the sidecar, so that a sidecar proxy container is created alongside the business container in the Pod. The sidecar connects to the components of the Istio control plane through the xDS protocol. Here we focus on how the Pilot control plane component implements service discovery.

First, note that in newer versions of Istio, the service discovery logic has been moved up into the control plane component Pilot. Pilot watches for changes to resources such as Service, Endpoint, and Pod in Kube-APIServer and delivers them in real time to the Pilot Agent component in each microservice's sidecar. Delivery uses the standard xDS protocol. The basic logic is that when Envoy starts, Pilot pushes all service instance information to it, and whenever there is an update, Pilot pushes the updated data to every Envoy, so that each microservice's Envoy perceives changes to service instance nodes in real time.

This ensures that each microservice's Envoy proxy is always aware of changes to service nodes in the Kubernetes cluster, and it also naturally provides health detection for the microservices. To guard against the Pod IP reuse problem, when Envoy receives changed instance information pushed by Pilot, it also performs a second check on the service name. If the service name associated with an IP differs from the previous one, Envoy promptly updates its local data to prevent incorrect calls.

With this service discovery system in place, when a service consumer calls the target microservice, the traffic is automatically intercepted by the Envoy proxy in the Pod. Envoy then applies service governance rules such as load balancing, circuit breaking, and rate limiting to the call, based on the service instance mappings it holds and the configuration delivered by the control plane.

That is the basic logic by which the Service Mesh microservice architecture represented by Istio implements service registration and discovery. The Envoy data plane and the Pilot control plane cooperate to implement service discovery automatically, and all of this is transparent to the microservices themselves!
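The push model described above (a full snapshot when a proxy connects, then updates as they occur) can be illustrated with a toy model. It does not use the xDS protocol or any Istio or Envoy API; `ControlPlane`, `Proxy`, and `on_push` are hypothetical names chosen for the sketch.

```python
class ControlPlane:
    """Toy stand-in for Pilot: holds service -> endpoints and pushes changes."""

    def __init__(self):
        self.endpoints = {}  # service name -> set of "ip:port"
        self.proxies = []

    def connect(self, proxy):
        # On proxy startup, push the full current snapshot.
        self.proxies.append(proxy)
        proxy.on_push({name: set(eps) for name, eps in self.endpoints.items()})

    def update(self, service, endpoints):
        # On a change, push the changed service to every connected proxy.
        self.endpoints[service] = set(endpoints)
        for proxy in self.proxies:
            proxy.on_push({service: set(endpoints)})


class Proxy:
    """Toy stand-in for a sidecar proxy keeping a local view of endpoints."""

    def __init__(self):
        self.view = {}
        self.pushes = 0

    def on_push(self, changes):
        self.view.update(changes)
        self.pushes += 1


pilot = ControlPlane()
pilot.update("orders", ["10.244.1.5:8080"])  # state exists before the proxy starts
envoy = Proxy()
pilot.connect(envoy)                         # first push: full snapshot
pilot.update("orders", ["10.244.1.5:8080", "10.244.3.2:8080"])  # second push
print(envoy.view["orders"])
print(envoy.pushes)
```

The application code never appears in this exchange: the local view lives in the proxy, which is what lets the microservice stay unaware of discovery.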
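The second check on the service name can be pictured as a table keyed by IP that notices when an IP changes owner. This is only a conceptual sketch of the idea stated above, not how Envoy stores or validates endpoints; `EndpointTable` and its methods are hypothetical.

```python
class EndpointTable:
    """Toy local table mapping each IP to the service believed to own it."""

    def __init__(self):
        self.owner = {}  # ip -> service name

    def apply(self, ip: str, service: str) -> bool:
        """Record a pushed (ip, service) pair; return True if the IP changed owner."""
        previous = self.owner.get(ip)
        self.owner[ip] = service  # always keep the latest owner
        return previous is not None and previous != service

    def resolve(self, service: str) -> list:
        return sorted(ip for ip, svc in self.owner.items() if svc == service)


table = EndpointTable()
first = table.apply("10.244.1.5", "orders")  # new IP, nothing to compare
reused = table.apply("10.244.1.5", "users")  # same IP now belongs to "users"
print(first, reused)
print(table.resolve("orders"), table.resolve("users"))
```

Once the IP is reassigned, calls addressed to `orders` no longer resolve to it, so a reused address cannot silently receive another service's traffic.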


This article briefly summarized how service registration and discovery logic changes from the traditional microservice architecture to Service Mesh, and along the way showed the progress in architectural design that the Service Mesh microservice architecture represented by Istio brings. I hope it has given you some inspiration and opened a window onto the era of Service Mesh microservice architecture. In the following articles, we will share more about the principles and practice of Service Mesh, so stay tuned!
