Application Environment Capabilities - Alibaba DevOps Practice Guide Part 15

This article is from Alibaba DevOps Practice Guide written by Alibaba Cloud Yunxiao Team

No software can run without the environment it depends on. An environment is indispensable at every step, including code writing, commissioning, testing, release, and O&M. The original requirements behind environment governance and a standardized, environment-based delivery process include resolving contention for test environments, adapting to the different application scenarios of each stage, protecting software quality, and optimizing cost and efficiency.

High efficiency is highly valued in the software industry, so it is worth asking how to improve development and testing efficiency under existing conditions. Any answer to this question has to address environments. If each developer could develop and commission in an exclusive environment without external interference, efficiency would be higher. However, with microservices now widely adopted, a single software module is often developed in parallel by multiple people for different projects. Given the hardware and maintenance costs, it is unwise to allocate a complete set of environments containing all services to each developer. Effective isolation and orchestration of environments can preserve development efficiency while reducing hardware and maintenance costs.

Software development usually requires repeated testing and verification in multiple stages. It is a process that goes from zero to one, from simple to feature-rich, and from unstable to stable. As a result, the characteristics of an environment differ from stage to stage. For example, the initial development environment is used for individual testing, the offline integration environment is used for offline integration testing, the pre-release environment is used for regression testing and acceptance before launch, and the production environment is used to serve users.

We can standardize the originally loose and messy development process by chaining different environments together through a defined process. In addition, testing, verification, and access checkpoints can be placed so that deployment to an environment starts only when the access conditions (such as quality standards) are met. A systematic, process-driven method guides developers to deliver changes in a secure, efficient, and trusted manner and avoids the security risks and chaos that arise when the delivery process has no rules. It also makes the entire R&D process visible, traceable, and measurable.

In short, two capabilities are required for the change delivery based on application environments:

Isolating test environments for parallel development of a single application by multiple developers without mutual interference
Managing the standardized environment-based change delivery process

Solutions

As developers adopt microservice architecture, the number of services increases dramatically, and the complexity of modules inside the original large application turns into the complexity of calls between many small application services. In real-world business scenarios, an entire business is usually composed of multiple small application services. Therefore, a series of upstream and downstream microservice applications generally need to be modified when a business feature is developed. Moreover, the same microservice application may exist in different service versions for different businesses under development at the same time. This increases the complexity of joint commissioning between microservices and incurs additional costs during development and testing.

Environment governance needs to reduce the system cost of joint commissioning so that developers can focus on their businesses. To support R&D better, classify environments, orchestrate isolated domains, and standardize the delivery process so that R&D engineers can deliver software products more efficiently. Alibaba has accumulated a series of best practices for test environment governance, which are detailed in the following sections.

Progressive Mechanism of Code Change in Environments

Environments fall into two categories: online and offline. An online environment is one where operations and data affect user-facing services, such as the pre-release, beta, gray release, and production environments. Offline environments are mainly provided for R&D personnel to conduct development and testing activities. Offline environments are currently divided into three categories according to how they are used: project environments, integration environments, and basic environments. This article focuses on offline environments, which are introduced below:

A project environment is an exclusive environment for a single feature in development for a single application. It is not affected by other features being developed and does not affect the use of other environments. Users can perform any commissioning or destructive testing activities in this environment.
An integration environment is an environment shared by multiple in-development features of an application. This environment is primarily used to verify whether integrating these features causes new problems or introduces new conflicts to the business.
A basic environment is an environment that provides service dependencies for the preceding project environment and integration environment. A basic environment is redeployed immediately after services are deployed in the production environment. This ensures that all services provided by the basic environment are the latest versions running in production, so the project environment and integration environment can run stably for development and testing.

The process starts when a feature branch is pulled and a project environment is assigned to it automatically. Once offline testing of the feature branch is largely complete, the project environment is converted to an integration environment, and the branch is tested together with other feature branches. After the integration environment test succeeds, the feature is deployed to the online environment. Finally, after the feature branch is merged into the master branch, the basic environment is updated with the latest production version.
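The progression described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the `Stage` names, the `FeatureBranch` class, and the branch name are hypothetical and are not part of any Alibaba tooling.

```python
from enum import Enum


class Stage(Enum):
    PROJECT = "project environment"
    INTEGRATION = "integration environment"
    ONLINE = "online environment"
    MERGED = "merged to master, basic environment updated"


# Allowed forward transitions, in the order the article describes.
NEXT_STAGE = {
    Stage.PROJECT: Stage.INTEGRATION,
    Stage.INTEGRATION: Stage.ONLINE,
    Stage.ONLINE: Stage.MERGED,
}


class FeatureBranch:
    """Tracks which stage a feature branch currently occupies (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # A project environment is assigned as soon as the branch is pulled.
        self.stage = Stage.PROJECT
        self.history = [Stage.PROJECT]

    def promote(self, verified: bool) -> Stage:
        """Move to the next stage only if verification in this stage passed."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"{self.name} has already completed delivery")
        if verified:
            self.stage = NEXT_STAGE[self.stage]
            self.history.append(self.stage)
        return self.stage


branch = FeatureBranch("feature/coupon")  # hypothetical branch name
branch.promote(verified=False)  # stays in the project environment
branch.promote(verified=True)   # project -> integration
branch.promote(verified=True)   # integration -> online
branch.promote(verified=True)   # online -> merged, basic environment refreshed
print([stage.name for stage in branch.history])
```

A failed verification leaves the branch where it is, which mirrors the idea that a change only moves forward once the current environment has done its job.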

Orchestrate Isolated Domains

As businesses keep growing, the test environments must be used simultaneously by tens of thousands of development engineers. For some core businesses, hundreds of business features are developed and tested at the same time, and most of them involve changes to multiple microservices. How can we ensure that these concurrent features are independent and do not affect each other? The solution is isolation. The environments of the different microservices changed for the same business feature are grouped into an isolated domain, and the calls related to that feature stay within the domain. This prevents services belonging to other business features from affecting the development and testing of this one. From the user's perspective, the business feature appears to have a complete set of environments of its own.

However, a business feature usually depends on many other services. Deploying all of these services for every isolated domain would be very costly. Our solution is to share. At the routing level, all isolated domains share the basic environment, and each isolated domain keeps only its own project and integration environments so that as many common services as possible are reused. This solution reduces both resource costs and environment maintenance costs.

As shown in the figure above, the user of Project Environment 1 and the user of Project Environment 2 each pull up an isolated domain for joint commissioning. Traffic is isolated between domains, so the domains do not interfere with each other. For services that have no instance in an isolated domain, calls fall back to the basic environment, which guarantees that the service is available. After a feature branch has been integrated, verified, and released to the production environment, the baseline version of the basic environment is automatically updated so that it always matches the stable production version and provides stable supporting services.
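The routing rule behind this sharing can be illustrated with a short sketch: prefer the instance inside the caller's isolated domain, and fall back to the basic environment otherwise. The registry layout, the domain identifiers, and the addresses below are all assumptions made for the example; the article does not describe how routing is actually implemented.

```python
from typing import Dict, Optional

# Key that stands for the shared basic environment (hypothetical convention).
BASE = "base"

# Hypothetical registry: service name -> {domain id -> instance address}.
REGISTRY: Dict[str, Dict[str, str]] = {
    "order": {BASE: "order.base:8080", "domain-1": "order.d1:8080"},
    "payment": {
        BASE: "payment.base:8080",
        "domain-1": "payment.d1:8080",
        "domain-2": "payment.d2:8080",
    },
    "stock": {BASE: "stock.base:8080"},
}


def resolve(service: str, domain: Optional[str]) -> str:
    """Pick the instance that should serve a call tagged with `domain`."""
    instances = REGISTRY[service]
    if domain is not None and domain in instances:
        # The service was changed for this feature, so stay inside the domain.
        return instances[domain]
    # Unchanged services are served by the shared basic environment.
    return instances[BASE]


print(resolve("order", "domain-1"))  # instance inside isolated domain 1
print(resolve("order", "domain-2"))  # domain 2 did not change "order"
print(resolve("stock", "domain-1"))  # nobody changed "stock"
```

For this to work end to end, the domain tag would have to travel with every call along the chain; how that tag is propagated is outside the scope of this sketch.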

Building Basic Environments

In a microservice architecture, a call initiated from the terminal passes through services provided by multiple applications along the call chain. However, in real-world development and joint commissioning, only a small number of applications on the chain need to be changed. If developers had to pull up the entire chain for development and joint commissioning, efficiency would be low and the cost would be huge. Therefore, the basic environment provides service support for the applications that do not need to be changed. The key to building isolated environments is to ensure the stability and availability of the basic environment. The following has been done to ensure this:

Reduce the Costs to Access the Basic Environment: Environment creation is usually a headache for developers. It involves a series of upstream and downstream configuration tasks, such as modifying the release process, adding a Dockerfile, delivering environment isolation files, preparing test domain names, and applying for the corresponding resources. Implementing the basic environment solution means building basic environments in batches, so you must be able to build a basic environment and its related components quickly with one click. This reduces the cost of accessing the basic environment.
Ensure Service Stability at the Code Level: Only the latest master branch code version is deployed in the basic environment. The system process ensures that each time an online deployment is completed and the feature branch code is merged into the master branch, the baseline version is deployed to the basic environment automatically. Users cannot directly deploy features under development to the basic environment. This guarantees the stability of services in the basic environment from the perspective of code version management.
Sustainable Traffic, Self-Recovery, and Monitoring Assurance: A stable, end-to-end traffic testing and monitoring mechanism is required to keep the services in the basic environment stable over time and to monitor the availability of the basic environment in real time. When the mechanism detects unavailable services, unattended systems attempt self-recovery. If a service is still unavailable after self-recovery, the automatic work order system sends notifications and tracks the recovery of the basic environment.
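The detect, self-recover, and escalate sequence in the last item can be sketched as a single monitoring pass. Everything here is a stand-in: the probe, restart, and ticket callables are hypothetical hooks, and a real system would run this continuously against real traffic rather than against an in-memory dictionary.

```python
from typing import Callable, Dict, List


def check_basic_environment(
    probes: Dict[str, Callable[[], bool]],
    restart: Callable[[str], None],
    open_ticket: Callable[[str], None],
) -> List[str]:
    """Probe each service once, try to recover failures, escalate the rest.

    Returns the names of services that are still unavailable.
    """
    still_down = []
    for name, probe in probes.items():
        if probe():
            continue
        restart(name)          # unattended self-recovery attempt
        if not probe():        # re-check after the recovery attempt
            open_ticket(name)  # hand over to a human through a work order
            still_down.append(name)
    return still_down


# Simulated environment: "payment" recovers after a restart, "stock" does not.
state = {"order": True, "payment": False, "stock": False}
recoverable = {"payment"}
tickets: List[str] = []


def make_probe(name: str) -> Callable[[], bool]:
    return lambda: state[name]


def fake_restart(name: str) -> None:
    if name in recoverable:
        state[name] = True


down = check_basic_environment(
    {name: make_probe(name) for name in state},
    fake_restart,
    tickets.append,
)
print(down, tickets)
```

Passing the hooks in as callables keeps the escalation logic separate from how probing, restarting, and ticketing are actually done.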

Environment-Based Delivery Process

The entire process, from pulling a feature branch for a code change to the final delivery and release of that branch, is divided into the following phases: creating the feature branch for the change, developing the branch function, commissioning and verifying the branch function automatically, conducting integration verification with multiple other branches, passing access checkpoints, submitting for pre-release verification, and officially releasing (shown in the following figure).

Among these phases, the access checkpoint before pre-release ensures that only code that meets specific quality requirements enters the pre-release acceptance stage. Low-quality code is held back in the test environment for further verification. Automatic environment deployment reduces the burden that access checkpoints place on developers: the feature branch code is continuously verified by automated means, and the quality data collected this way provides the basis for the checkpoints.

Automatic Environment Deployment

When a user creates a feature branch through the system, the system will apply for a project environment for the new branch automatically. This project environment has the same configuration as the basic environment, and resource allocation and deployment operations are performed automatically. Meanwhile, the calls and messages of this project environment are isolated from other environments. Its services will not affect the calls in other environments.

Each time new code is committed to the feature branch created by the user, the system automatically triggers deployment and automated test verification tasks. Committing changes and receiving feedback continuously shortens the feedback cycle for defects in the feature branch, which helps developers fix code defects as early as possible.

Access Checkpoints in Branch Versions

The process configuration for R&D and delivery is open, so the business side can flexibly configure the required process steps according to its needs. The business side can also complete the entire delivery process, from testing to release, by letting the pipeline advance through its steps automatically. A pipeline consists of one or more components connected in series. In Alibaba best practices, components for code quality control are placed after the components that deploy certain environments. As a result, once an environment is deployed, the testing component is triggered automatically to check whether the latest changed code meets the quality requirements.
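A pipeline of components in series, with a quality check placed right after an environment deployment, might look like the following sketch. The component names, the `test_pass_rate` field, and the `min_pass_rate` threshold are hypothetical; the article does not specify which quality metrics are used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Component:
    name: str
    run: Callable[[dict], bool]  # returns True when the step succeeds


def run_pipeline(components: List[Component], context: dict) -> Tuple[bool, List[str]]:
    """Run components in series and stop at the first one that fails."""
    executed = []
    for component in components:
        executed.append(component.name)
        if not component.run(context):
            return False, executed
    return True, executed


def deploy_test_env(context: dict) -> bool:
    context["deployed"] = True  # stand-in for a real deployment
    return True


def quality_gate(context: dict) -> bool:
    # Hypothetical checkpoint: the change must be deployed and must reach
    # the configured test pass rate before it may move on.
    return bool(context.get("deployed")) and (
        context["test_pass_rate"] >= context["min_pass_rate"]
    )


def deploy_pre_release(context: dict) -> bool:
    context["pre_release"] = True  # stand-in for a real deployment
    return True


PIPELINE = [
    Component("deploy-test-env", deploy_test_env),
    Component("quality-gate", quality_gate),
    Component("deploy-pre-release", deploy_pre_release),
]

ok, steps = run_pipeline(PIPELINE, {"test_pass_rate": 0.92, "min_pass_rate": 0.95})
print(ok, steps)  # the change is blocked before pre-release
```

Because the gate sits between the two deployment components, a change that misses the threshold never reaches the pre-release step.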

To ensure the security and stability of the online environment, check whether the latest version of the branch to be released meets the quality requirements before submitting it to the pre-release environment for integration and deployment. Code changes that do not meet the quality requirements are blocked in the test environment for correction and verification to ensure the safety of the production environment.

Combining Environments and Testing Technologies

Automated testing has been widely used as an effective verification method. However, there are many complaints about automated test cases in real-world use cases. For example:

The test case, rather than the tested code, is incorrect. A lot of time is spent troubleshooting the code and confirming that it is fine before anyone checks the test case, so troubleshooting costs are high.
The broken windows effect is obvious. One developer's carelessness may leave a test case failing. If other developers lack the motivation to repair it and the test case keeps failing for other reasons as well, expectations for test case quality drop. After this happens several times, the situation gets worse.
It is difficult to trace responsibility. You can only tell whether test cases succeeded or failed; you cannot correlate or compare the code changes between test runs. When a test case fails, users do not know which lines of code changed between its last success and this failure, so locating the failure is costly.

We need to correlate the code version and test cases from a higher perspective to solve this problem. This is mainly divided into two parts:

Normalize the Regression of Master Branch Test Cases: Every night, the system automatically creates application environments based on the master branch, deploys them in isolation, and runs the corresponding test cases. Since the master branch is less affected by in-progress changes, a successful run indicates that the test cases are valid, and the environments are then released. Otherwise, the environments are retained, and the system automatically notifies the case owners. Case owners rerun the tests and compare the results with those of the previous day to locate the problem quickly and determine whether the test cases themselves are at fault.
Compare Test Cases to Locate the Code Quickly: The system saves each feature branch code version and integration branch code version and associates it with the test cases executed against it. When a test case fails, the system can locate the code version at which the test case last passed and display the commit history since then to help the user find the cause of the failure.
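The second item, narrowing a failure down to the commits made since the last passing run, can be sketched as follows. The data shapes are assumptions: `commits` is taken to be an oldest-to-newest list of commit ids, and `results` a mapping from commit id to the recorded outcome of one test case. The commit ids are made up.

```python
from typing import Dict, List, Optional


def suspect_commits(
    commits: List[str],
    results: Dict[str, bool],
    failed_commit: str,
) -> List[str]:
    """Return the commits after the last passing run, up to the failing one.

    Commits with no recorded run are simply absent from `results`.
    """
    end = commits.index(failed_commit)
    last_pass: Optional[int] = None
    # Walk backwards from the failure to the most recent recorded pass.
    for i in range(end - 1, -1, -1):
        if results.get(commits[i]) is True:
            last_pass = i
            break
    start = 0 if last_pass is None else last_pass + 1
    return commits[start:end + 1]


history = ["a1", "b2", "c3", "d4", "e5"]           # hypothetical commit ids
recorded = {"a1": True, "b2": True, "e5": False}   # c3 and d4 were never tested
print(suspect_commits(history, recorded, "e5"))    # ['c3', 'd4', 'e5']
```

If the test case has never passed on this branch, the function returns every commit up to the failure, which signals that the test case itself deserves a look before the code does.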

The preceding process must be combined with the one-click environment activation and traffic isolation capabilities. With these two capabilities, when a test case fails, the results can be compared, and the causes can be traced to prevent the broken windows effect of test case failure.

Summary

This article described an application environment solution that builds the development environments and the basic environment for an application, and explained how to ensure the stability of the environments, standardize the change process, and improve development efficiency. Environment governance needs to be viewed comprehensively from a higher perspective. Otherwise, you may fall into a vicious circle of handling environment problems and receiving criticism every year.
