Testing is an essential part of every software development cycle. However, if you have experienced testing a software under realistic environments, you should be familiar with the challenges to ensure reliable and stable tests. In this blog, we discuss several methodologies in order to stabilize tests and ensure a more robust way to test our systems.
Ideally, we hope that each failed test case is caused by real defects. In actual situations, a test case failure often occurs due to the following reasons:
During debugging, developers and testers come across these problems repeatedly, and, as a result, can stop noticing these problems. Some developers and testers simply judge that the problem is caused by the runtime environment and do not continue with the debugging process. As a result, many real defects are ignored.
How can we solve the problems of test stability? Many people say we can ensure test stability through environment, process control, monitoring, tool, more machines, and dedicated personnel. All these measures are acceptable. However, they all are specific solutions, having nothing to do with the methodology and the theoretical system.
In terms of methodology and the theoretical system, we can take three measures to ensure production security. These are grayscale, monitoring, and rollback. Similarly, I also have the following three measures for test stability:
"If it hurts, do it more often" have been my most frequently spoken words. This sentence is sourced from Martin Fowler. If you are interested, you can read his article Frequency Reduces Difficulty.
The benefits of running tests at a high frequency are as follows:
Frequency is not only a measure for ensuring test stability, but also a game changer for handling other engineering problems:
However, it is never an easy thing to implement the high frequency method.
High-frequency testing requires support from infrastructures. First, high-frequency testing requires resources. High-frequency implementation also puts unprecedented pressure on all aspects of infrastructure. Additionally, high-frequency testing requires certain capabilities. Take SRE's high-frequency drills for an example. If many problems are caused during this drill, it is impossible to do this frequently. The premise for frequent drills is that our isolation mechanism and recovery capabilities have reached a certain level. For test and running engineers, to ensure the effectiveness of high frequency tests, isolation and disposable test environments must be ensured.
A common concern for high-frequency test runs is that, since we do not have time to troubleshoot the failed test cases when the test is performed once a day, do we still have the time for troubleshooting when the frequency increases? My answer is positive. This is because the problems converge soon after the high-frequency test run starts, so the total number of problems you need to handle are basically the same or even reduced.
Compared with other two measures (frequency and disposable test environments), the importance of isolation is more widely accepted. Benefits of isolation include:
Isolation is divided into hard isolation and soft isolation. The specific isolation mode depends on the technology stack, architecture, and service form. However, both modes can reach the final state:
The preceding two final results have been achieved in my previous work. They do work. It is a technical challenge to implement either isolation mode in the end. Both cost reduction and complete and reliable logic isolationare technical issues.
For today's payment or e-commerce systems, will our final result be hard isolation or soft isolation? It is hard to say now. Judging from the technical feasibility, soft isolation is more likely to become our final result. Due to physical limits of the architecture, it is difficult to implement hard isolation after a certain level is reached. Breaking through the physical limits of the architecture may lead to new quality blind zones. However, hard isolation will be greatly helpful for us over a long period of time. For example, we need hard isolation when performing various unconventional tests. Technically, it is highly complex to perform unconventional tests in software isolation mode. Since the last fiscal year, I have been working on a project in my team that involves building an incremental testing environment in one click (hard isolation). I have the following reason: It is relatively easy to pull up an incremental testing environment in one click, and the key issue is automation. The router-based soft isolation solution is not ready now and cannot meet our isolation requirements in the short term.
Hard isolation and software isolation do not conflict with each other. They can be used together. For example, when we build a router-based isolation environment, a new database is built. Hard isolation is performed at the database layer to supplement the soft isolation capability, which is absent at the database layer.
In short, isolation is essential. The specific isolation solution depends on complexity, cost, effect, and other factors at each stage.
Another of my favorite sentences is: The test environment is ephemeral. This sentence comes from myself. Ephemeral means short-living. I often repeat these words to my QA team and hope that my team can always remember this principle in their daily work.
The test environment is ephemeral. Consequently, we must have the following capabilities:
With these capabilities, we can build an out-of-the-box test environment that is fast and repeatable with no labor cost. Additionally, we can create all the data required for testing and make the test environment disposable. In other words, we can create a new environment for each test and then destroy the environment after the test is completed. Next time when we need a test environment, we can build a new one. In addition to the test environment, the test executor must also be disposable.
For test environments that must be retained for a certain period of time after the test is completed, we need to set a short upper limit. For example, the following was my practice in the past:
You can use a disposable test environment to:
When the test environment is disposable, some new quality risks are introduced. If you have a set of long-term maintenance environments and the data in the environment is generated by the code of earlier versions, after the code of the new version is deployed, the old data can help us find the data compatibility problems in the new code. If the environment is disposable, no old data is available and the data compatibility may not be detected.
This risk does exist. To solve this problem, we need to work harder. We need to explore other solutions for data compatibility and other tests and methods to guarantee the quality. We even need to think about how to eliminate the data compatibility problem through the architecture design.
The three measures, namely frequency, isolation, and disposable test environments are a bit idealistic. Today, our infrastructure, architecture, and automation are still far away from the ideal state.
Even so, we need to be a bit idealistic. It is quite challenging, technically, to implement these three measures. But we are optimistic, and believe we can eventually achieve our goal. We are also practical. We can break down the goal and achieve it step by step based on actual situations.
 The use cases here mainly refer to functional test cases, including use cases for the unit testing, single-system API testing, and full-link and end-to-end testing.
 In this way, one possible negative impact in practice is that it may discourage microservice governance, including domain autonomy, independent test, and independent release capabilities.
Alibaba Clouder - April 21, 2020
ApsaraDB - March 3, 2020
Alibaba Clouder - March 26, 2020
Alibaba Clouder - August 26, 2020
yzq1989 - April 10, 2020
Alibaba Clouder - November 30, 2018
More Posts by Alibaba Tech