One-stop dynamic multi-environment construction case

Introduction: For a startup team, being able to quickly adopt a one-stop solution to service governance problems is a real win. While discussing and implementing the solution, the R&D team gained a deep understanding of K8s, Nginx Ingress, and MSE. "A team like ours has no dedicated operations and maintenance staff, so it is worthwhile for every developer to understand the ins and outs of each product."
Author: Li Siyuan, Wu Liangjun

Problem background

Founded in December 2013, Zhijing Technology is a leading Internet company in the textile industry and a national high-tech enterprise. It owns business units such as "Baibu", "Quanbu", "Tiangong", "Zhijing Gold Bar", "Zhijing Textile Intelligent Manufacturing Park", and "Zhijing Smart Warehouse Logistics Park". It is committed to using big data, cloud computing, the Internet of Things, and other new-generation information technologies to connect the information, logistics, and capital flows of the textile and garment industry, to help the industry achieve collaborative, flexible, and intelligent upgrades, and to build a vertically integrated digital and intelligent service platform for textiles and garments.

Our business team within the group was established a little over two years ago, and more and more of our projects are being developed and launched in parallel. We are also at the early stage of splitting into microservices: there are currently 35 microservices, and there will be about 60 after the split. Until now, everyone shared a single set of development/testing/production environments and ran the R&D process serially. As the number of projects, the development and testing workload, and the number of microservices have grown, that approach no longer suits us. Below are the three main problems we encountered.

The project test environment is preempted

The most typical problem is that a project's test environment is often preempted by testing for defect fixes, so project testing proceeds in fits and starts and testers cannot stay focused. Testing has also become the main bottleneck for running projects in parallel, which slows project iteration.

Development joint debugging environment is unstable

To keep the development experience smooth, developers are allowed to publish to the development environment freely. Because there is only one environment, releases by different developers often interrupt each other's joint debugging. Many developers have turned to offline end-to-end joint debugging, deploying the upstream and downstream applications on their own machines. As microservices roll out, this model becomes nearly unworkable given the number of applications involved. Making code debugging convenient during development is therefore the second problem we encountered.

Lack of online grayscale environment

The third problem is also the most important. We previously had no pre-release environment in which product managers could verify functionality. Once a new feature passed testing, it went straight to the online environment. To avoid affecting customers, the R&D team often scheduled releases at night. Setting aside what this does to the team's morale, the lack of grayscale release capability online means that a new feature reaches all users as soon as it launches. If a product design defect or code bug slips through, it affects the entire user base, and the risk is large and uncontrollable.

To sum up, offline we need multiple isolated environments to support the development and testing of multiple projects, and online we need a flexible traffic routing strategy to support grayscale releases.

Solution research and exploration

Given our company's situation, our goal is for the development team not to depend on an operations and maintenance team, that is, DEV = OPS. We want to bring up a logically isolated development/project environment with one click and to support an isolated pre-release environment. For the production environment, we want to route both rule-based traffic and natural traffic for full-link grayscale verification.

Our analysis of these problems and the solutions commonly described online both point to two areas: project environment governance and service traffic governance. We briefly describe several common approaches below. In the end we chose an integrated solution that combines the full-link grayscale capability of Alibaba Cloud Microservice Engine (MSE) with the Cloud Effect application delivery platform APPSTACK.

Self-developed Ribbon implementation


We use the Spring Cloud framework. In everyday business development, calls between back-end services usually go through Feign or RestTemplate, with the Ribbon component providing load balancing. The core of grayscale is routing: we can override Ribbon's default load-balancing algorithm and add traffic routing logic before the load-balancing call, which lets us control where service traffic is forwarded.
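The idea of "routing first, then load balancing" can be sketched in a few lines. The following is a language-neutral illustration written in Python, not Ribbon's actual API: the class names, the `user-id` rule, and the instance addresses are all hypothetical and stand in for whatever rule a team would really configure.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Instance:
    """One registered service instance; tag is None for the base version."""
    address: str
    tag: Optional[str] = None


class TagAwareBalancer:
    """Round-robin balancer that applies a routing rule before balancing."""

    def __init__(self, instances, gray_user_ids):
        self._instances = list(instances)
        # Hypothetical routing rule: these users are sent to the gray version.
        self._gray_user_ids = set(gray_user_ids)
        self._counter = count()

    def _wanted_tag(self, headers):
        # Routing step: decide which version this request should reach.
        return "gray" if headers.get("user-id") in self._gray_user_ids else None

    def choose(self, headers):
        tag = self._wanted_tag(headers)
        candidates = [i for i in self._instances if i.tag == tag]
        if not candidates:
            raise LookupError(f"no instance available for tag {tag!r}")
        # Load-balancing step: plain round-robin over the filtered candidates.
        return candidates[next(self._counter) % len(candidates)]
```

Even this toy version hints at the maintenance burden discussed next: rule storage, fallback behavior, and header propagation all have to be built and operated by the team itself.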

Large companies can certainly evolve this approach from 0 to 1 and then to 100. For us, building a simple version that supports only the core routing scenarios would not be very difficult. Reaching a mature, production-ready state, however, requires dedicated technical resources for management and maintenance. In addition, because the Spring Cloud microservice framework is itself complex, call links grow longer as the number of microservices increases, and locating and resolving the related governance issues costs a great deal of time.

Physical isolation (blue-green release)


This solution builds a network-isolated environment with independent resources for the service to be grayscaled and deploys the grayscale version there. Because that environment is isolated from the base environment, other services in the base environment cannot reach the grayscale service, so those services must also be deployed redundantly in the grayscale environment for the whole call link to forward traffic normally. Other middleware dependencies, such as the registration center, must be deployed redundantly there as well, so that microservices can discover one another and the node IP addresses they obtain belong only to the current network environment. Supporting our scenarios this way means maintaining multiple grayscale environments by piling on machines, and the operations and machine costs far exceed the benefits. If there are only two or three applications, of course, this approach is still convenient and acceptable.

MSE label routing + APPSTACK application orchestration (our choice)

The documentation for these two products can be found at the link:
Cloud effect application delivery platform AppStack:
Alibaba Cloud Microservice Engine MSE Full Link Grayscale:

We assume that readers have gained a basic understanding of the two products from the articles above. In one sentence: APPSTACK is responsible for application environment management and pipeline releases, and MSE is responsible for full-link grayscale of traffic.

Important concepts of MSE full link grayscale

With reference to the MSE label routing diagram below, we introduce several important concepts: application marking, traffic coloring/automatic coloring, and identifier link transfer. The diagram also shows the core principle of our solution, which is to use domain names to identify different logically isolated environments.


core diagram

1. Application (service) marking

In the core diagram, each application carries a tag (base/gray). With these tags in place, we can define tag-based traffic rules.

When we create an MSE application, we mark the MSE application through specific annotations and environment variables.

Specific annotations:
Environment variables:


Add specific annotations and environment variables

After marking the application with annotations, for example, we can define traffic rules in MSE label routing, and the service in Nacos gains a piece of tag-related metadata (_micro.service.env_).


MSE traffic rule configuration


Metadata information in Nacos

There is also an additional way to add a container environment variable:

This approach adds a version attribute to the service metadata in Nacos. The MSE cloud-native gateway, for example, uses this version attribute for traffic management, whereas MSE full-link grayscale uses the tag defined by alicloud.service.tag.


Add gray related environment variables to the container


Gray node appears in MSE traffic rule configuration


There is metadata information of the gray environment in Nacos


MSE cloud native gateway can choose gray version

2. Traffic coloring/automatic coloring

Put simply, traffic coloring means marking traffic with a special identifier. For HTTP requests, the identifier is carried in the request header; for messages, it is carried in the message header. Here we focus on coloring HTTP traffic. Manual coloring means adding the identifier to the HTTP request explicitly: for example, when the front end calls a back-end API with the header xx:111, we say the traffic is colored. Automatic coloring applies to an HTTP request that carries no identifier: after it passes through a tagged Nacos service, the tag of that service is automatically added to the request header when the next service is called. As a simple example, if application a calls application b (gray), then when b calls the downstream application c, the request automatically carries the header x-mse-tag:gray.
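The a -> b (gray) -> c example can be modeled with a small function. This is only an illustration of the behavior described above, not MSE code: the function name is hypothetical, and the choice to leave an existing tag untouched is an assumption made for the sketch.

```python
from typing import Mapping, Optional

MSE_TAG_HEADER = "x-mse-tag"  # header name taken from the article


def outgoing_headers(incoming: Mapping[str, str], own_tag: Optional[str]) -> dict:
    """Return the tag header a service would send to the next hop."""
    headers = {}
    if MSE_TAG_HEADER in incoming:
        # Already colored (manually or by an upstream node): keep the tag.
        # Assumption for this sketch; the article does not spell this case out.
        headers[MSE_TAG_HEADER] = incoming[MSE_TAG_HEADER]
    elif own_tag is not None:
        # Automatic coloring: an uncolored request picks up this node's tag.
        headers[MSE_TAG_HEADER] = own_tag
    return headers
```

For instance, calling `outgoing_headers({}, "gray")` models application b (gray) receiving an uncolored request from a and coloring its call to c.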

The header x-mse-tag:xxx deserves special mention because it is reserved by the MSE system. It represents not only coloring but also link transfer (every node on the request link passes the tag on in turn) and a default routing rule: services tagged xxx are preferred, and if none is found, the untagged base service is selected. This default routing rule does not need to be defined explicitly.
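The default rule (tagged first, base as the fallback) amounts to a two-step selection. A minimal sketch, assuming instances are represented as hypothetical (address, tag) pairs where a tag of None means the untagged base version:

```python
from typing import Optional, Sequence, Tuple

Endpoint = Tuple[str, Optional[str]]  # (address, tag); tag None means base


def candidates_for(endpoints: Sequence[Endpoint], tag: Optional[str]) -> list:
    """Prefer endpoints carrying the tag; otherwise fall back to base ones."""
    tagged = [addr for addr, t in endpoints if tag is not None and t == tag]
    if tagged:
        return tagged
    # No endpoint has the requested tag (or no tag was requested): use base.
    return [addr for addr, t in endpoints if t is None]
```

Because of the fallback, an environment only needs to deploy the applications it actually changes; every other hop resolves to the base version.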

Our solution relies on exactly this behavior. As the core diagram shows, we put the traffic identifier in the domain name, parse it in Ingress-Nginx, and pass it along the whole link as x-mse-tag:xxx. Services tagged xxx are then preferred across the entire link, and the untagged base services act as the fallback.
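The parsing step can be illustrated as follows. The article does not give the real domain layout or the Ingress-Nginx configuration, so this sketch assumes a hypothetical scheme in which the tag is the single label in front of a base domain (for example proj1.dev.example.com); in practice this logic would live in the ingress configuration rather than in application code.

```python
from typing import Optional

# Hypothetical base domain; the article does not state the real one.
BASE_DOMAIN = "dev.example.com"


def tag_from_host(host: str) -> Optional[str]:
    """Extract the environment tag from a Host value, if it has one."""
    hostname = host.split(":", 1)[0].lower()  # drop an optional port
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None  # the base domain itself or an unrelated host: no tag
    label = hostname[: -len(suffix)]
    # Accept exactly one extra label as the tag.
    return label if label and "." not in label else None


def headers_for_upstream(host: str) -> dict:
    """Headers the ingress layer would add before forwarding the request."""
    tag = tag_from_host(host)
    return {"x-mse-tag": tag} if tag else {}
```

A request to the bare base domain gets no tag header and therefore lands on the base services.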

3. Identifier link transfer

Once traffic is colored, that is, once the request header carries a specific identifier, the question is how that identifier is passed along the call link. Take an HTTP request with the header user-id: 100 that must travel A->B->C. The header is present when A is called, and we expect it to be present when A calls B and again when B calls C. This is link transfer of the identifier. With it, we can define routing strategies for applications A, B, and C based on the value of user-id.

In MSE, identifier link transfer is enabled by defining the environment variable alicloud.service.header=x-user-id. Once the entry application A (all versions, gray and base) has this environment variable, the request header x-user-id is automatically passed along in the calls to B and C, which makes it easy to define rule-based routes at nodes A, B, and C. The special request header x-mse-tag is transferred by default: MSE passes it down layer by layer and applies the default routing rule (tagged first, base as the fallback).

The principle behind MSE identifier link transfer is as follows. It builds on the distributed tracing framework: the probe in each application intercepts the request, parses the identifier, and temporarily stores it in thread-local storage. When the application later makes an outgoing call, the probe inserts the identifier into that request.
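The intercept, store, and re-insert cycle can be sketched with Python's contextvars module standing in for thread-local storage. The hook names below are hypothetical and this is not the MSE probe; it only illustrates the mechanism the paragraph describes.

```python
import contextvars

# Headers to pass through: x-user-id mirrors alicloud.service.header=x-user-id,
# and x-mse-tag is the header the article says is always transferred.
PASS_THROUGH = ("x-mse-tag", "x-user-id")

# Per-request storage, playing the role of the probe's thread-local storage.
_carried = contextvars.ContextVar("carried_headers", default={})


def on_inbound(headers: dict) -> None:
    """Hook for an incoming request: stash the identifiers to carry."""
    _carried.set(
        {k.lower(): v for k, v in headers.items() if k.lower() in PASS_THROUGH}
    )


def on_outbound(headers: dict) -> dict:
    """Hook for an outgoing call: re-attach the stashed identifiers."""
    merged = dict(_carried.get())
    merged.update(headers)  # headers set explicitly by the caller win
    return merged
```

Business code never touches these headers, which is why the approach works without code changes in each application.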


A Brief Introduction to APPSTACK for Alibaba Cloud Effect Application Delivery

We introduced Cloud Effect APPSTACK mainly so that developers can complete the configuration MSE requires by themselves through a graphical console. Under a microservice architecture, we also want each application to have its own owner once the applications are split.


APPSTACK hides K8s details such as Deployment, Service, and Ingress, so developers work with applications, environments, and pipelines. Developers deploy each application environment through an APPSTACK pipeline, and each environment is given a different tag as MSE label routing requires.


Application deployment in multiple environments


Each environment is marked with a different tag according to the requirements of MSE label routing

We will not go into the core functions of APPSTACK here. What matters for us is application orchestration, which lets every environment of every application be deployed with the environment variables and annotations that MSE label routing requires.

Our solution

After investigating the capabilities above, we defined an abstraction of multiple environments based on our company's actual scenarios, business needs, and the characteristics of each environment. On that basis we built a one-stop dynamic multi-environment capability, with different implementations for the main scenarios.

Environment definition

Having investigated Alibaba Cloud Microservice Engine MSE label routing and Cloud Effect application orchestration APPSTACK, and with the problems described earlier in mind, we defined the environment system our R&D process requires: multiple development environments (including a base environment) + multiple project environments (including a base environment) + an (integration) test environment + a pre-release environment + a production environment that supports grayscale, as shown in the figure below.


Multiple development environments: the goal is to support joint debugging for multiple projects during development. The core requirements are dynamic isolation of each project and device-cloud interconnection. Dynamic isolation means that each project has its own joint debugging environment and only the applications it changes need to be deployed. Device-cloud interconnection means that developers can register applications running on their local machines into the MSE system for local debugging, so two developers can debug point to point to track down problems. The base development environment provides the fallback services. After each application is deployed to production, the base development environment must be updated in step so that it always runs the latest production version.

Multiple project environments: the goal is to support large, long-running projects, such as major technical overhauls and major business projects, that need to occupy a test environment for a long time for stable testing with internal and external parties. The core requirement is dynamic isolation of each project, as defined above.

Test environment: the goal is to support short, fast-moving project testing and integration testing, such as daily defect fixes or several small projects that must be integrated and released together. It is also the environment for our daily automated tests. Feature branches from the project environment must also pass the automated tests in the test environment before going online.

Pre-release environment: the goal is to let product managers verify product functions and perform acceptance in a realistic environment. The infrastructure used by the pre-release environment, such as the database, is the same as in production. This places higher demands on system design, such as maintaining forward compatibility: for the database, columns may be added but not removed, and select * must not be used in SQL. We enforce this by using DMS to constrain database schema changes and by code inspection, so we will not go into details here.

Production environment: the goal is to support full-link grayscale for both rule-based traffic and natural traffic. Rule-based traffic is traffic with explicit characteristics that MSE traffic rules can match, such as values in request headers, parameters, cookies, or the body. Natural traffic is the opposite: we specify no characteristics at all. For example, directing 1% of all traffic to the grayscale environment is what we call natural traffic.
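The difference between the two kinds of production traffic can be made concrete with a short sketch. This is an illustration under stated assumptions, not how MSE evaluates its rules: the function name, the rule callback, and the percentage parameter are all hypothetical.

```python
import random
from typing import Callable, Mapping, Optional


def route_version(
    headers: Mapping[str, str],
    gray_rule: Callable[[Mapping[str, str]], bool],
    natural_percent: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Return 'gray' or 'base' for one request."""
    if gray_rule(headers):
        # Rule-based traffic: the request matches an explicit characteristic.
        return "gray"
    # Natural traffic: send roughly natural_percent % of the rest to gray,
    # regardless of what the request looks like.
    roll = (rng or random).random() * 100
    return "gray" if roll < natural_percent else "base"
```

A rule such as `lambda h: h.get("x-user-id") == "100"` with `natural_percent=1` would send that one user plus about 1% of everyone else to the grayscale version.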

Overall, in this environment system the development and project environments involve dynamic isolation, so a base environment must be deployed to provide the complete set of services. This base environment is also the provider of the untagged (base) applications in MSE label routing.

Work flows through this environment system as follows:

1. Pull a feature branch and deploy it to the development environment for local development and front-end/back-end joint debugging, then submit it to the project environment for testing.

2. After the test team has tested it in the project environment, deploy the application to the (integration) test environment.

3. In the (integration) test environment, integrate it with other feature branches. After automated testing and a brief verification, it can be deployed to the pre-release environment.

4. The product manager performs functional acceptance in the pre-release environment. Once accepted, it can be released to the production environment for grayscale verification.

5. In the production environment, perform grayscale verification with rule-based traffic and natural traffic, then switch over all traffic once verification passes.

6. Finally, merge the feature branch into the trunk and update the development/project base environments with the latest production version.
