The Pain of Development: Why Is It So Difficult to Have a Stable Test Environment?

Introduction

For the production environment, accuracy and stability matter most, and we recommend an application-centric practice based on OAM and IaC. For the test environment, isolation, low cost, and stable dependencies matter most, and we recommend building isolated test environments on top of a stable environment: reuse the stable environment, and generate test environments through traffic isolation and data isolation. Through this approach to environment construction, we resolve resource conflicts in the R&D process.

The concept of an environment is familiar to most developers. A stable, predictable, and low-cost environment is something everyone wants.

As shown in the figure below, we divide environments into three categories: production, test, and development. In many cases these environments are isolated from one another; the firewall in the figure separates the offline environments from the online environment.

In practice, however, factors such as company size and development cost change how environments are used and divided.

For example, when cost is the constraint, the production environment comes first, because everything centers on providing service. The test environment comes second: before a change goes online, it must be verified in a test environment that resembles production. Only after that verification can the change be migrated to production, which ensures a stable transition of the system.

Production Environment


For the production environment, accurate and stable operation is critical, and this creates a large number of operations and governance requirements.
Where a single node may be enough in the test environment, the production environment has to account for backup, active/standby failover, traffic offloading, disaster recovery, and more, all to keep the environment running stably.
Accuracy and stability are the biggest differences between production and the other environments. They bring many configuration requirements for operations and service governance. Maintaining these configurations effectively is the reason we manage configuration as IaC based on the OAM model, as shared in the previous article.
(Editor's note: Cloud Effect AppStack is a cloud-native application delivery platform based on OAM. Through declarative definitions such as application orchestration, placeholders, and variables, enterprises can define one orchestration and deploy it to multiple environments with per-environment differences, with one-click environment creation and one-click rollback.)
The production environment contains many kinds of configuration, such as application configuration, application images, application operations configuration, and infrastructure operations configuration. These configurations and images are owned and managed by different people.
Developers modify code, and a code release changes the image and configuration; application operations staff modify the application operations configuration; infrastructure operations staff modify the infrastructure configuration. Every configuration change affects the production environment, and every such change can introduce risk.
Therefore, operating and managing the production environment should be the joint responsibility of development and operations.

Test Environment


Test environments are another important type of environment. There are two types: the integration environment and the pre-release (staging) environment. The pre-release environment is a production-like environment. The integration environment is mainly used for integration testing or functional verification; the pre-release environment is mainly used for acceptance.

The goal of the test environment is to run independent tests with as few resources as possible, which calls for isolation, reuse, and simulation.

For example, the application needs to interact with an external service. If there is a problem with the external service, you can simulate one in the test environment.
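As a rough illustration of simulating an external service, the sketch below swaps a hand-written fake in for a real dependency. The exchange-rate service, `FakeRateService`, and `convert` are all hypothetical names invented for this example; they are not part of the original article.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    currency: str
    rate: float


class FakeRateService:
    """Stand-in for a hypothetical external exchange-rate service."""

    def __init__(self, rates):
        self._rates = dict(rates)
        self.calls = []  # record requests so a test can inspect them

    def get_rate(self, currency):
        self.calls.append(currency)
        if currency not in self._rates:
            raise KeyError(f"no simulated rate for {currency}")
        return Quote(currency, self._rates[currency])


def convert(amount, currency, rate_service):
    """Application code under test; it only needs an object with get_rate()."""
    quote = rate_service.get_rate(currency)
    return round(amount * quote.rate, 2)


fake = FakeRateService({"EUR": 0.5})
print(convert(10, "EUR", fake))
```

Because `convert` receives the service as a parameter, the test environment can pass the fake while production passes a real client, and an outage of the real service no longer blocks testing.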

Take a big data product as an example. You may feel that the environment requirements for big data products are too high to build a test environment at all. Services such as Hive, Kafka, and MySQL place high demands on machines, and Hive and Kafka need many machines. In addition, Redis is required for caching and Zookeeper for service discovery. The earliest approach was a single shared test environment, which is obviously very inefficient: if 50 developers share one test environment, conflicts are so frequent that testing is almost impossible.

To solve this problem, services and applications can be layered, here into three layers. The first layer is public basic services, such as Hive and Kafka. The second is small independent services, such as Redis and Zookeeper; in a test environment a single instance of each is fine, and they can run on a virtual machine. The top layer is the application, where only the applications needed for the required testing are deployed.

The test environment is therefore managed as follows. All public services are shared basic services; every test environment depends on them, and the data of each environment is isolated through logical mechanisms (such as namespaces). Each test environment deploys its own independent Redis and Zookeeper.

The application layer deploys only the required applications, so a test environment can be deployed with very few resources. Utilization of test resources is often very low: if you build a complete environment every time, you will find that resource utilization is very low in 99.99% of cases.
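The three-layer arrangement can be sketched as a small planning function. This is only an illustration under assumed conventions: the naming scheme (`service/namespace`, `service-environment`) and the function `plan_environment` are made up here and do not come from any particular platform.

```python
SHARED_SERVICES = ("hive", "kafka")        # public basic services, deployed once
PER_ENV_SERVICES = ("redis", "zookeeper")  # small services, one instance per environment


def plan_environment(env_name, apps):
    """Describe what one test environment deploys and what it borrows."""
    return {
        "name": env_name,
        # Shared services are not deployed again; the environment only gets
        # its own namespace so that its data stays logically separate.
        "shared": {svc: f"{svc}/{env_name}" for svc in SHARED_SERVICES},
        # Only the small services and the applications under test are deployed.
        "deploy": [f"{svc}-{env_name}" for svc in PER_ENV_SERVICES]
        + [f"{app}-{env_name}" for app in apps],
    }


plan = plan_environment("feature-42", ["orders"])
print(plan["deploy"])
```

The point of the sketch is the asymmetry: the expensive layer contributes only a namespace per environment, while the per-environment deployment list stays short.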

In addition, and this is very important, the test environment should be temporary. If a test environment is treated as long-lived, users get used to regarding a particular environment as their own, for example by naming it, and other people cannot use it. That causes a lot of waste, since the time an environment is actually used each day is limited. We want test environment resources to be a pool that is reused, with environments destroyed once they are no longer needed. This also requires improving test efficiency so that more tests run in less time.
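One possible way to keep environments temporary is to hand them out as time-limited leases from a fixed-size pool. The sketch below is an assumption-laden illustration: `EnvironmentPool`, its methods, and the idea of a time-to-live are hypothetical and not described in the article, and time is passed in explicitly to keep the example deterministic.

```python
class EnvironmentPool:
    """Fixed-capacity pool of temporary test environments."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.leases = {}  # environment id -> (owner, expiry time)
        self._next_id = 0

    def reap(self, now):
        """Destroy every environment whose lease has expired."""
        expired = [e for e, (_, expiry) in self.leases.items() if expiry <= now]
        for env_id in expired:
            del self.leases[env_id]
        return expired

    def acquire(self, owner, now, ttl):
        """Lease an environment for ttl time units, or return None if the pool is full."""
        self.reap(now)
        if len(self.leases) >= self.capacity:
            return None
        self._next_id += 1
        env_id = f"env-{self._next_id}"
        self.leases[env_id] = (owner, now + ttl)
        return env_id

    def release(self, env_id):
        """Give an environment back early."""
        self.leases.pop(env_id, None)
```

With a capacity of one, a second user is refused while the first lease is live and succeeds once it has expired, which is the behavior that prevents anyone from holding an environment indefinitely.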

Development Environment

Besides the production and test environments mentioned above, the development environment is the one developers work in most. For example, the toolchains used for development and builds belong to the development environment. Here, our focus is on how to run the service smoothly on a local machine.

The ideal development environment is connected to other services in both directions. There are therefore three problems to solve: first, how the development environment accesses services in the basic environment, such as another Service; second, how other services access the service being developed; third, how to isolate requests and data from other development environments. The third is similar to the problem we met in the test environment, so similar isolation mechanisms are needed between development environments. kt-connect, open-sourced by the Cloud Effect team, is a tool designed to solve this problem.

The development environment also has its own set of tools, as shown in the figure above. You can check which ones you use frequently.

The Pain of the Test Environment

When the test environment comes up, many companies and people say that there are not enough test environments and that the test environment is unstable. What challenges do we face in the test environment, especially for distributed applications? After the move to microservices, the challenges of distributed systems have become more and more obvious, and many of them are related to the environment.

For example, an application change that has not been properly verified slips into the integration environment. Once it is there, quality cannot be guaranteed. In the integration testing phase, the relationships between applications are very complex: if one service is unstable, other links in the chain are likely to be unstable too.

As a result, daily integration tests often cannot be run properly. Because the earlier stage offers no guarantee, the changed application ends up occupying the pre-release environment, which is relatively expensive and cannot be held by one person for long. To let everyone use pre-release, its use is batched across many people, so pre-release turns into a long-lived environment. The consequence is that time spent in pre-release increases, and the whole development cycle and lead time grow. In the continuous delivery process, the test environment presents many challenges: instability, resource shortages, integration problems, and so on.

At present, most problems in the test environment come from a lack of effective service governance. Services call each other in many ways and are tightly coupled, so once one service has a problem, others are affected. While the services in an environment are changing, unstable services may be deployed at any time, and the entire environment becomes unstable as well.

The consequence of an unstable integration environment is that a large number of tests move to pre-release, and once pre-release becomes the bottleneck, they move online. Eventually every application ends up relying on the online environment for testing.

In summary, the test environment mainly faces the following two challenges:

The first is how to resolve dependencies between services. For example, A depends strongly on C: whether A's function succeeds depends on C, so after C changes, A must be verified as well to confirm that the change to C is correct.

The second is stability, of both the machines and the services themselves. Machine stability mainly means dealing effectively with hard disk failures, network failures, and the like, and doing system backup and disaster recovery well.

Service stability mainly means ensuring the availability of each individual service. If each application is 90% available, then a system that needs 10 such applications to work together is available only 0.9 to the 10th power of the time, roughly 35%, so the availability of the whole system is very low.
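The arithmetic behind this claim is easy to check. The snippet below assumes, for illustration only, that the services fail independently and that a request needs all of them; `chain_availability` is a name made up for this example.

```python
def chain_availability(per_service, count):
    """Availability of a request that needs `count` independent services,
    each available with probability `per_service`."""
    return per_service ** count


# Ten services at 90% each leave the whole chain at roughly one third.
print(f"{chain_availability(0.9, 10):.1%}")   # prints 34.9%
print(f"{chain_availability(0.99, 10):.1%}")  # prints 90.4%
```

The second line shows why per-service stability matters so much: raising each service from 90% to 99% lifts the chain from about 35% to about 90%.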

How to Ensure the Stability of the Test Environment

Above we discussed two challenges in the test environment. Any test environment needs to be stable, which also reduces the risk of having to use the online environment. So how do we ensure the stability of the test environment?
Common practices include dual-machine deployment, N+1 deployment, and isolated environments.

N+1 deployment came about to address the high resource usage of dual-machine deployment. Instances of the service are replaced one by one in a rolling fashion, so only one machine is changing at any moment while the others keep serving. This is also the default approach in Kubernetes (K8s): generally a new instance is created first, and then an old instance is removed.
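The rolling replacement can be simulated in a few lines. This is a simplified model written for this article, not how any orchestrator is implemented; it reads "N+1" as N serving instances plus one extra during each step, and `rolling_replace` is a hypothetical name.

```python
def rolling_replace(count, old="v1", new="v2"):
    """Return the fleet after every step of a one-at-a-time replacement."""
    fleet = [old] * count
    steps = []
    while old in fleet:
        fleet.append(new)         # start one new instance: N+1 are running
        steps.append(list(fleet))
        fleet.remove(old)         # stop one old instance: back to N
        steps.append(list(fleet))
    return steps


for step in rolling_replace(2):
    print(step)
```

At no step does the fleet drop below N instances or grow beyond N+1, which is the resource advantage over keeping a full second copy of the deployment.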

To keep the test system stable, we need isolation: apart from the applications we are modifying, all other applications should be as stable as possible.

At Alibaba, the team introduced the project pre-integration environment, known internally as the project environment. It is an isolated environment, created separately during the development stage for a particular feature.
To sum up, the pre-integration environment is isolated and independent of everyone else's work, and the other services it depends on come from a stable environment, so that dependencies remain stable during independent development and testing.

In the early days, the project pre-integration environment still depended on the daily integration environment. Even so, this was far better than doing nothing and putting changes directly into the daily integration environment. But we found that the daily integration environment still had problems: at the beginning, there was no guarantee that every commit had been verified in a project pre-integration environment, so the dependencies in the daily integration environment could be badly broken. In essence, the problem comes back to how we manage the daily integration environment and keep it relatively stable.

To address these problems, we introduce the concept of a stable environment. We have isolated environments, but the underlying environment they depend on is unstable. Would having a stable environment solve the problem?
What is a stable environment? It is an environment running versions that can be released online. The online environment must be stable, so our stable environment is composed of application services at the same versions as those online. Because the online services are stable, this environment is stable too, and we can create isolated environments on top of it to ensure overall stability.
Once there is a stable basic environment, every application deployed to production should also be deployed to the basic environment, so that it can serve as the dependency for test environments. With this in place, the environment created when developing an application is fully isolated and contains only the few applications closely related to the change; all other dependent services come from the basic environment.
What, then, is the basic environment? It is a stable environment. Once there is a stable integration environment, isolated environments can be created, feature testing can be done in them, and traffic to dependencies can be resolved from within them. The basic environment does carry a cost, however. Deployment cost is relatively low: the machine resources it occupies are not a big problem for large companies, though they may be for small ones. The main cost is maintenance: monitoring the basic environment and fixing its problems requires a certain investment of people.
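The way an isolated environment borrows its dependencies from the basic environment can be sketched as a lookup with a fallback. Everything here is illustrative: the registry layout, the addresses, the environment name `feature-42`, and the function `resolve` are assumptions made for this example, not an API from the article or from any tool it mentions.

```python
BASE_ENV = "base"

# Hypothetical registry: service name -> {environment name -> address}.
REGISTRY = {
    "orders": {"base": "orders.base:8080", "feature-42": "orders.f42:8080"},
    "payments": {"base": "payments.base:8080"},
}


def resolve(service, env, registry=REGISTRY):
    """Prefer the instance deployed in the caller's isolated environment;
    otherwise fall back to the stable basic environment."""
    instances = registry.get(service, {})
    if env in instances:
        return instances[env]
    if BASE_ENV in instances:
        return instances[BASE_ENV]
    raise LookupError(f"{service} is not deployed in {env} or {BASE_ENV}")


print(resolve("orders", "feature-42"))    # the modified application
print(resolve("payments", "feature-42"))  # an untouched dependency from base
```

Only `orders` had to be deployed for the feature; `payments` is served by the basic environment, which is why that environment has to stay stable.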

The maintainers of the basic environment are generally not its users, so a relatively mature mechanism is needed to keep it running stably over the long term. Let's think more broadly. If we do not build a new basic environment, which environment is the most stable? We used a firewall to separate online from offline, and we all know why: we are afraid of security risks and data pollution. But if our isolation, service routing, monitoring, and security protection are all good enough, we can use the production environment as the basic environment.

When the production environment is used as the basic environment, two important problems must be solved. The first is traffic isolation, which is the lesser problem: there are many ready-made methods, from resource-based isolation to today's traffic-based isolation. The second is data isolation, which is a big challenge. Data takes many forms; message queues differ from ordinary databases, and data warehouses differ again. There are many thorny problems here, but each can be solved case by case.

Summary
To sum up, for the production environment, accuracy and stability matter most, and we recommend an application-centric practice based on OAM and IaC. For the test environment, isolation, low cost, and stable dependencies matter most, and we recommend building isolated test environments on top of a stable environment: reuse the stable environment, and generate test environments through traffic isolation and data isolation. Through this approach to environment construction, we resolve resource conflicts in the R&D process. In the next chapter, we will focus on collaboration problems in the R&D process.
