How to use traffic mirroring within a cluster at the service layer and across clusters at the gateway layer

What is traffic mirroring?

Microservices give us rapid development and deployment, but reducing the risk of the changes that such speed brings has become a crucial issue. Service mesh technology provides traffic mirroring, also known as traffic shadowing, which sends a copy of live traffic to a mirror service; the mirrored traffic travels outside the critical request path of the primary service. In this way, production traffic can be copied to a test cluster or to a new version under test, and that version can be tested before live traffic is directed to it. This effectively reduces the risk of version changes and brings changes to production with the lowest possible risk.

Traffic mirroring has the following advantages:

• Mirroring to a different service happens outside the critical path of the request, so any problem caused by the mirrored traffic does not affect the production environment;
• Responses to mirrored traffic are ignored. The traffic is treated as "fire and forget": any response sent back by the mirror instance is discarded, so it cannot interfere with the normal production response;
• When traffic is mirrored, the request is sent to the mirror service with "-shadow" appended to its Host/Authority header, which makes mirrored traffic distinguishable from the original;
• Testing with real-time production use cases and traffic gives a more realistic test environment and effectively reduces deployment risk.
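
The properties listed above can be illustrated with a small, self-contained Python simulation. This is only a sketch of the behavior, not how the sidecar proxy is implemented: `primary_service`, `mirror_service` and `handle_request` are hypothetical names, and a background thread stands in for the proxy's asynchronous mirror call.

```python
import threading

mirror_log = []  # what the mirror service observed


def primary_service(host, path):
    """Hypothetical stand-in for the production service."""
    return f"primary handled {path} for {host}"


def mirror_service(host, path):
    """Hypothetical stand-in for the mirror service: records the call, then fails."""
    mirror_log.append(f"{host} {path}")
    raise RuntimeError("mirror crashed")  # must never reach the client


def _mirror_quietly(host, path):
    try:
        mirror_service(host, path)
    except Exception:
        pass  # fire and forget: mirror responses and errors are dropped


def handle_request(host, path):
    """Serve from the primary and send a copy to the mirror off the critical path."""
    shadow_host = f"{host}-shadow"  # mirrored copy carries a "-shadow" host
    worker = threading.Thread(target=_mirror_quietly, args=(shadow_host, path))
    worker.start()
    # The client response depends only on the primary service.
    return primary_service(host, path), worker


response, worker = handle_request("sample-app", "/orders")
worker.join()  # only so the log is complete before printing
print(response)    # primary handled /orders for sample-app
print(mirror_log)  # ['sample-app-shadow /orders']
```

Even though the mirror raises an error, the caller still receives the primary response, and the mirror sees the request under the "-shadow" host.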

Typical Scenarios for Enabling Traffic Mirroring

The following describes several typical usage scenarios that can take advantage of traffic mirroring:

• Online traffic simulation and testing: a test cluster can be exercised with the real traffic of the production instance without affecting the critical path of normal production. For example, when an old system is being replaced with a new one, or a system has undergone a large-scale transformation, online traffic can be imported into the new system for a trial run. Experimental architecture adjustments can also be simulated and tested with online traffic.
• New version verification: the output of production traffic and mirrored traffic can be compared in real time. Because mirroring replays the full set of requests rather than a sample, shadow traffic is well suited to pre-launch drills of new services. Traditional manual testing is inherently sample-based; importing real traffic patterns reproduces the full range of online situations, such as abnormal special characters and tokens carrying malicious payloads, and reveals how the pre-release service really handles load and exceptions.
• Rolling back changes to collaborating services: when a service that receives mirrored traffic needs to modify a collaborating service, the -shadow marker added in mirroring mode lets the mirrored request be processed normally and then rolled back before it is committed, so changes made by the mirrored version do not affect the production version.
• Isolating the test database: for data-processing workloads, you can use an empty data store loaded with test data and run the mirrored traffic against it, which keeps test data isolated.
• Online troubleshooting and temporary data collection: some sudden online problems cannot always be reproduced with offline traffic. In that case you can temporarily bring up a branch service and import shadow traffic for debugging and troubleshooting without affecting online services.
• Log and behavior collection: for recommendation systems and algorithms, samples and data are central. The biggest challenge for traditional automated testing of algorithms is that user behavior data from a real environment cannot be constructed. With shadow traffic, user behavior can be saved as logs, which can be used to build simulation test samples for recommendation systems and algorithm models, and can also serve as a data source for later big-data analysis of user profiles that feeds back into recommendation services.

How to enable traffic mirroring

Mirroring is enabled in a VirtualService. A VirtualService configured for mirroring routes 100% of its traffic to the v1 subset while also mirroring the same traffic to the v1-mirroring subset. Each request sent to the v1 subset is replicated and also delivered to the v1-mirroring subset.

The quickest way to see this effect is to watch the logs of the v1-mirroring version of the application while sending some requests to the v1 version.

The response you get when calling the application comes from the v1 subset. However, you will also see the requests mirrored to the v1-mirroring subset.
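
As a minimal sketch of such a rule, the following Python snippet builds an Istio VirtualService manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The resource name and host `sample-app` are assumptions made for illustration; the `mirror` and `mirrorPercentage` fields are the ones Istio uses for mirroring.

```python
import json


def mirroring_virtual_service(name, host, primary_subset, mirror_subset,
                              mirror_percent=100.0):
    """Build a VirtualService that serves from one subset and mirrors to another."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": [host],
            "http": [{
                # All client-visible traffic is served by the primary subset.
                "route": [{
                    "destination": {"host": host, "subset": primary_subset},
                    "weight": 100,
                }],
                # A copy of each request is also sent to the mirror subset.
                "mirror": {"host": host, "subset": mirror_subset},
                "mirrorPercentage": {"value": mirror_percent},
            }],
        },
    }


# "sample-app" is a hypothetical service name, not taken from the article.
manifest = mirroring_virtual_service("sample-app", "sample-app",
                                     "v1", "v1-mirroring")
print(json.dumps(manifest, indent=2))
```

Lowering `mirror_percent` mirrors only a fraction of requests, which can help when the mirror target has less capacity than production.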

Traffic mirroring based on the service layer in the cluster

Let's demonstrate with an example: first send all traffic to the v1 version, and then use a rule to mirror that traffic to the v1-mirroring version.

Step 1: Configure and start the sample application service

Step 2: Create a routing policy
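
A routing policy that refers to the v1 and v1-mirroring subsets needs those subsets to be defined in a DestinationRule. The sketch below generates one as JSON; it is not the article's original listing, and the host `sample-app` and the `version` label values are assumptions that must match the labels on your own Pods.

```python
import json


def destination_rule(name, host, subset_labels):
    """Build a DestinationRule that maps subset names to Pod label selectors."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "DestinationRule",
        "metadata": {"name": name},
        "spec": {
            "host": host,
            "subsets": [
                {"name": subset, "labels": labels}
                for subset, labels in subset_labels.items()
            ],
        },
    }


# Hypothetical names: adjust the host and labels to your deployment.
rule = destination_rule("sample-app", "sample-app", {
    "v1": {"version": "v1"},
    "v1-mirroring": {"version": "v1-mirroring"},
})
print(json.dumps(rule, indent=2))
```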

Step 3: Send traffic request

Cross-cluster traffic mirroring based on the gateway layer

Traffic mirroring at the gateway layer is generally used to import real online traffic into a pre-release environment, so it is mostly used across clusters. Here, the production cluster (clusterA) and the test cluster (clusterB) are taken as an example. The primary request is served by the production cluster clusterA, and its gateway mirrors a copy of the traffic to clusterB.

[Figure: the gateway of clusterA mirrors production traffic to clusterB]

Step 1: Configure and start the sample application service in the test cluster

Step 2: Configure gateway routing rules in the test cluster

Step 3: Configure external access rules in the grid of the main cluster
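
One possible shape for these rules, shown here as a hedged sketch rather than the article's original configuration, is a ServiceEntry that registers the test cluster's gateway address as an external host, plus a gateway-bound VirtualService that mirrors to it. The host `test-gateway.example.com`, the gateway name `sample-gateway` and the service `sample-app` are hypothetical placeholders.

```python
import json


def service_entry(name, external_host, port=80):
    """Register an address outside the mesh so it can be used as a destination."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "ServiceEntry",
        "metadata": {"name": name},
        "spec": {
            "hosts": [external_host],
            "location": "MESH_EXTERNAL",
            "ports": [{"number": port, "name": "http", "protocol": "HTTP"}],
            "resolution": "DNS",
        },
    }


def gateway_mirror_virtual_service(name, gateway, local_host, external_host,
                                   port=80):
    """Serve from the local service and mirror each request to the external host."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": ["*"],
            "gateways": [gateway],
            "http": [{
                "route": [{
                    "destination": {"host": local_host,
                                    "port": {"number": port}},
                }],
                "mirror": {"host": external_host, "port": {"number": port}},
                "mirrorPercentage": {"value": 100.0},
            }],
        },
    }


# All names below are placeholders for illustration only.
entry = service_entry("test-cluster-gateway", "test-gateway.example.com")
vs = gateway_mirror_virtual_service("sample-app-mirror", "sample-gateway",
                                    "sample-app", "test-gateway.example.com")
print(json.dumps([entry, vs], indent=2))
```

Because the mirrored copy leaves the main cluster, the test cluster's gateway must accept the "-shadow" host that mirrored requests carry.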

View the Envoy config dump of the ingress gateway Pod in the main cluster to confirm that the mirroring rule has taken effect.
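
One way to check a saved dump is to search it for Envoy's `request_mirror_policies` route field. The helper below is a sketch that walks any JSON structure, so it does not depend on the exact layout of the dump. The sample document is a hand-written fragment for illustration, not real gateway output, and its host names are hypothetical.

```python
import json


def find_mirror_clusters(node):
    """Recursively collect cluster names from every request_mirror_policies list."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "request_mirror_policies" and isinstance(value, list):
                found.extend(p["cluster"] for p in value
                             if isinstance(p, dict) and "cluster" in p)
            else:
                found.extend(find_mirror_clusters(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_mirror_clusters(item))
    return found


# Hand-written illustrative fragment, not actual output from a gateway.
sample_dump = json.loads("""
{"configs": [{"dynamic_route_configs": [{"route_config": {"virtual_hosts": [
  {"routes": [{"route": {
    "cluster": "outbound|80||sample-app.default.svc.cluster.local",
    "request_mirror_policies": [
      {"cluster": "outbound|80||test-gateway.example.com"}
    ]}}]}
]}}]}]}
""")
print(find_mirror_clusters(sample_dump))
# ['outbound|80||test-gateway.example.com']
```

An empty list means that no route in the dump carries a mirror policy, which usually points to a rule that was not applied to the gateway.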


Traffic mirroring is primarily used to test services with actual production traffic without affecting end clients in any way. It is especially useful when we are rewriting an existing service and want to verify that the new version handles the real variety of incoming requests in the same way, or when we want to benchmark two versions of the same service against each other. It can also be used for additional out-of-band processing of requests that can be done asynchronously, such as collecting extra statistics or doing extended logging.

In summary, mirroring production traffic to a test cluster is a very effective way to reduce the risk of new deployments, whether the target environment is production or non-production. Large Internet companies have been doing this for many years. Service mesh technology provides shadow traffic at layer 7 and makes it easy to set up, whether you are creating a mirror copy within a cluster or replicating traffic across clusters. Through traffic mirroring, we can create a more realistic experimental environment in which to carry out debugging, testing, data collection, and traffic replay under real traffic, which makes going online more controllable. Moreover, managing mesh policies in a unified way through service mesh technology can unify the technology stack, free the team from complex technology stacks, and greatly reduce the team's mental burden.
