A brief analysis of microservice full-link grayscale solutions

Service publishing under a monolithic architecture

First, let's look at how a new version of a service module is released in a monolithic architecture. As shown in the following figure, the Cart service module in the application has a new version iteration:

Because the Cart service is part of the application, bringing its new version online means compiling, packaging, and deploying the entire application. A service-level release problem thus becomes an application-level release problem: the release strategy has to be applied to the new version of the application rather than to the individual service.

The industry already has mature release strategies, such as blue-green release and grayscale release. Blue-green release requires a redundant deployment of the new version of the service. Generally, the new version uses the same machine specifications and the same number of machines as the old version, so the service has two identical deployment environments. At first, only the old version serves external traffic, and the new version acts as a hot standby. When the service is upgraded, we only need to switch all traffic to the new version, and the old version becomes the hot standby. The schematic diagram of our example is shown below. The traffic switch can be performed by a Layer 4 proxy.

In a blue-green release, because all traffic is switched at once, a new-version environment must be cloned at the same machine scale as the original service, which amounts to requiring twice the original machine resources. The core idea of grayscale release is to forward a portion of online traffic to the new version, based either on request content or on a proportion of request traffic, and to gradually increase the new version's share of traffic once grayscale verification passes. It is a step-by-step release method. The schematic diagram of grayscale release for our example is shown below. Content-based or ratio-based traffic control requires a Layer 7 proxy, typically a microservice gateway.

Here, Traffic Routing is the content-based grayscale method: for example, requests that carry the header tag=gray are routed to the v2 version of the service. Traffic Shifting is the ratio-based grayscale method: online traffic is split between versions by weight, without distinguishing individual requests. Compared with blue-green release, grayscale release is better in terms of machine cost and traffic control capability, but its disadvantages are a longer release cycle and higher demands on operation and maintenance infrastructure.
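The difference between the two methods can be sketched in a few lines of Python. This is only an illustration under assumptions: the version names "v1" and "v2", the function names, and the percentage parameter are hypothetical and do not come from any particular gateway product.

```python
import random

# Hypothetical version names: "v1" is the stable version, "v2" the grayscale one.
STABLE, GRAY = "v1", "v2"


def route_by_content(headers):
    """Traffic Routing: choose a version from the request content (a header here)."""
    return GRAY if headers.get("tag") == "gray" else STABLE


def route_by_ratio(gray_percent, rng=random):
    """Traffic Shifting: send roughly gray_percent percent of requests to the gray version."""
    # rng.random() is in [0, 1), so 0 never selects gray and 100 always does.
    return GRAY if rng.random() * 100 < gray_percent else STABLE
```

With content-based routing the same request always lands on the same version, which suits verification by specific users; with ratio-based shifting the percentage can be raised step by step as confidence in the new version grows.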

Service publishing under microservice architecture

In a distributed microservice architecture, the services split out of the application are deployed, operated, and iterated independently. When a new version of a single service goes online, we no longer need to release the whole application. We only need to focus on the release process of each microservice itself, as follows:

To verify the new version of the cart service, traffic must be selectively routed, in some way, to the grayscale version of cart along the whole call chain. This is a traffic governance problem in the field of microservice governance. Common governance strategies are provider-based and consumer-based.

1. Provider-based governance policy. Configure inbound traffic rules on cart. When the user service routes a request to cart, cart's inbound traffic rules are applied.

2. Consumer-based governance policy. Configure outbound traffic rules on user. When the user service routes a request to cart, user's outbound traffic rules are applied.

In addition, these governance strategies can be combined with the blue-green release and grayscale release scenarios described above to implement true service level version releases.

What is full-link grayscale

Let's continue with the scenario of releasing the cart service in the microservice system above. Suppose a new version of the order service also needs to be released at the same time, because the new feature involves changes to both cart and order. Grayscale verification then requires that grayscale traffic pass through the grayscale versions of both cart and order, as shown in the following figure:

With the two governance strategies from the previous section, we would need to configure additional governance rules for the order service so that traffic coming from the grayscale cart is forwarded to the grayscale version of order. This seems to follow normal operational logic, but in real business scenarios the scale and number of microservices far exceed our example. A single request chain may pass through dozens of microservices, a new feature may change several microservices at once, and the dependencies between services are complex. With frequent releases and multiple versions of services being developed in parallel, the set of traffic governance rules keeps growing, which harms the maintainability and stability of the whole system.

To address these issues, developers have proposed an end-to-end grayscale release scheme, full-link grayscale, based on actual business scenarios and production experience. The full-link grayscale governance strategy focuses on the entire call chain rather than on which specific microservices the chain passes through: the perspective of traffic control shifts from individual services to the request chain. With only a small number of governance rules, multiple traffic-isolated environments can be built from the gateway through all back-end services. This helps ensure the smooth and safe release of several closely related services and the parallel development of multiple service versions, which in turn supports rapid business development.

Full-link grayscale solution

How can full-link grayscale be implemented quickly in actual business scenarios? There are currently two main approaches: physical environment isolation and logical environment isolation.

Physical environment isolation

Physical environment isolation, as the name implies, builds traffic isolation in the true sense by adding machines.

This approach requires a network-isolated, resource-independent environment in which the grayscale versions of the services are deployed. Because this environment is isolated from the production environment, other services in production cannot access the services being grayscaled. Those online services therefore also need to be deployed redundantly in the grayscale environment so that the whole call chain can forward traffic normally. In addition, dependent middleware components such as the registry must also be deployed redundantly in the grayscale environment, to keep microservices reachable from one another and to ensure that the node IP addresses obtained belong only to the current network environment.

This approach is typically used to build enterprise test and pre-release environments, but it is not flexible enough for online grayscale release and traffic diversion. Moreover, multiple coexisting versions of a microservice are common in microservice architectures, and supporting them this way means maintaining several grayscale environments by adding machines. If you have many applications, the operation, maintenance, and machine costs become excessive and far outweigh the benefits. If you have only two or three applications, this approach is still convenient and acceptable.

Logical environment isolation

The other approach is to build logical environment isolation. We only need to deploy the grayscale versions of the services that change. As traffic flows along the call chain, the gateway, middleware, and microservices it passes through identify grayscale traffic and dynamically forward it to the grayscale version of the corresponding service, as shown in the following figure:

The figure above illustrates the effect of this scheme, using different colors to represent different versions of grayscale traffic. Both the microservice gateway and the microservices themselves need to identify traffic and make dynamic decisions based on governance rules. When a service version changes, forwarding along the call chain changes in real time. Compared with grayscale environments built by adding machines, this scheme saves considerable machine cost and operation and maintenance effort, and it helps developers control online traffic over the full link quickly and precisely.

So how do we achieve full-link grayscale? From the discussion above, the following problems need to be solved:

1. Each component and service on the link can dynamically route based on the characteristics of request traffic.

2. It is necessary to group all nodes under the service to distinguish versions.

3. It is necessary to provide grayscale identification and version identification for traffic.

4. It is necessary to identify different versions of grayscale traffic.

Next, we will introduce the technologies needed to solve the above problems.

Label Routing

Label routing groups all nodes of a service by label name and label value, so that service consumers who subscribe to the service's node information can access a particular group of the service on demand, that is, a subset of all its nodes. The service consumer can use any label present on the service provider's nodes. Depending on the actual meaning of the selected label, the consumer can apply label routing to many other business scenarios.
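As a minimal sketch of this idea, the snippet below filters a node list by one label. The Node class, the addresses, and the "version" label values are hypothetical and are not tied to any specific registry or framework.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A provider node as seen by the consumer (hypothetical model)."""
    address: str
    labels: dict = field(default_factory=dict)


def select_nodes(nodes, label_name, label_value):
    """Return the subset of nodes whose label_name equals label_value."""
    return [n for n in nodes if n.labels.get(label_name) == label_value]


# Example node list; addresses and label values are made up for illustration.
nodes = [
    Node("10.0.0.1:8080", {"version": "base"}),
    Node("10.0.0.2:8080", {"version": "gray"}),
    Node("10.0.0.3:8080", {"version": "base"}),
]
gray_nodes = select_nodes(nodes, "version", "gray")
```

Because the filter takes an arbitrary label name, the same function could group nodes by zone or by environment instead of by version.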

Node marking

So how do we add different labels to service nodes? Driven by the popularity of cloud-native technology, most businesses are actively moving to containers. Taking containerized applications as an example, this section describes how to label a service's workload nodes in two scenarios: using Kubernetes Service for service discovery, and using Nacos as the registry.

In a business system that uses Kubernetes Service for service discovery, the service provider exposes its service by submitting a Service resource to the API server. The service consumer watches the Endpoint resource associated with that Service, obtains the associated business Pods from the Endpoint resource, reads the labels on each Pod, and uses them as the node's metadata. Therefore, we only need to add the label to the Pod template in the Deployment that describes the business application.

In business systems that use Nacos for service discovery, the labeling method generally depends on the microservice framework in use. If a Java application uses the Spring Cloud microservice framework, we can add the corresponding environment variable to the business container to add the label. For example, to mark a node with a grayscale version label, add spring.cloud.nacos.discovery.metadata.version=gray to the business container. When the framework registers the node with Nacos, it then adds the label version=gray to the node.

Traffic Coloring

How do the components on a request link identify different grayscale traffic? The answer is traffic coloring: adding a grayscale identifier to request traffic so that it can be distinguished. We can color traffic at the source of the request, where the front end marks each request according to user information or platform information when it initiates the request. If the front end cannot do this, we can instead add the traffic identifier dynamically at the microservice gateway, for requests that match specific routing rules. In addition, when traffic passes through a grayscale node on the link and the request does not yet carry a grayscale identifier, it needs to be colored automatically at that point. From then on, the traffic gives priority to the grayscale versions of services for the rest of the link.
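A minimal sketch of gateway-side coloring follows. The header names x-gray-tag and x-user-id, and the rule "color requests from listed users", are assumptions chosen for illustration; a real gateway would express such rules in its own configuration.

```python
GRAY_HEADER = "x-gray-tag"   # hypothetical header that carries the grayscale identifier
USER_HEADER = "x-user-id"    # hypothetical header that identifies the caller


def color_request(headers, gray_user_ids):
    """Return a copy of headers, adding a grayscale identifier when the rule matches."""
    colored = dict(headers)
    if GRAY_HEADER in colored:
        # Already colored at the source; leave the identifier untouched.
        return colored
    if colored.get(USER_HEADER) in gray_user_ids:
        colored[GRAY_HEADER] = "gray"
    return colored
```

Returning a copy rather than mutating the incoming headers keeps the rule easy to test and avoids surprising other handlers that share the same request object.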

Distributed Link Tracking

Another very important question is how to ensure that the grayscale identifier is passed all the way along the link. If the request is colored at the source, then when it passes through the gateway, the gateway acts as a proxy and forwards the request unchanged to the entry service, unless the developer has configured a request-modification policy in the gateway's routing policy. Next, the request goes from the entry service to the next microservice, and a new call request is created according to the business code logic. How do we add the grayscale identifier to this new request so that it continues to be passed down the link?

In the move from a monolithic architecture to a distributed microservice architecture, calls between services change from method calls within the same thread to calls from a service in the local process to a service in a remote process. Moreover, remote services may be deployed as multiple replicas, so the nodes a request passes through are unpredictable, and every hop may fail because of network or service faults. Distributed link tracking (distributed tracing) records in detail how a request travels along its call chain in a distributed system. The core idea is to record the nodes a request passes through and the time spent at each, using a globally unique traceid and a spanid for each hop, where the traceid must be passed along the entire link.

Borrowing the idea of distributed link tracking, we can also pass custom information, such as the grayscale identifier, along the link. The distributed link tracking products commonly used in the industry all support passing user-defined data along the link. Their data processing flow is shown in the figure that accompanies this section.
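The pass-through itself can be sketched without any tracing product, using Python's standard contextvars module to carry the identifier from the inbound request to the outbound one. The header name x-gray-tag and the two hook functions are hypothetical; real tracing libraries provide their own mechanisms for this.

```python
import contextvars

GRAY_HEADER = "x-gray-tag"  # hypothetical header that carries the grayscale identifier

# Holds the identifier for the request currently being handled.
_gray_tag = contextvars.ContextVar("gray_tag", default=None)


def on_inbound(headers):
    """Inbound hook: remember the identifier carried by the incoming request."""
    _gray_tag.set(headers.get(GRAY_HEADER))


def on_outbound(headers):
    """Outbound hook: copy the remembered identifier onto the new downstream request."""
    out = dict(headers)
    tag = _gray_tag.get()
    if tag is not None:
        out[GRAY_HEADER] = tag
    return out
```

The business code between the two hooks never touches the identifier, which is the property that lets the tag survive however many new requests the business logic creates.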

Logical environment isolation

First, the framework must support dynamic routing. For the Spring Cloud and Dubbo development frameworks, you can define a filter for outbound traffic and perform traffic identification and label routing in that filter. At the same time, distributed link tracking is needed to pass the traffic identifier along the link and to monitor traffic dynamically. In addition, a centralized traffic governance platform is needed so that developers in each business line can define their own full-link grayscale rules. The figure that accompanies this section shows the overall structure.
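The decision made inside such an outbound filter can be sketched as follows, in Python rather than in a Spring Cloud or Dubbo filter. The header name, the "base" version name, and the fallback to the base version when no matching node exists are assumptions made for this illustration.

```python
import random

GRAY_HEADER = "x-gray-tag"  # hypothetical header that carries the grayscale identifier
BASE_VERSION = "base"       # hypothetical label value of the stable version


def choose_instance(headers, instances, rng=random):
    """Pick a provider address for an outbound call.

    instances is a list of (address, version) tuples taken from service discovery.
    """
    wanted = headers.get(GRAY_HEADER, BASE_VERSION)
    candidates = [addr for addr, version in instances if version == wanted]
    if not candidates:
        # Assumption: services without a matching grayscale version serve the
        # request from their base version instead of failing the call.
        candidates = [addr for addr, version in instances if version == BASE_VERSION]
    if not candidates:
        raise LookupError("no instance available")
    return rng.choice(candidates)
```

For example, with instances [("a", "base"), ("b", "gray")], a request carrying x-gray-tag: gray is sent to "b", and an uncolored request is sent to "a". The fallback is what allows only the changed services to be deployed in a grayscale version.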

Overall, the barrier to implementing full-link grayscale is relatively high, in terms of cost and technical complexity as well as later maintenance and extension. However, it does improve application stability during releases in a more fine-grained way.
