By Zeng Yuxing (Yuzeng)
Auditing & Proofreading by Zeng Yuxing (Yuzeng)
In a microservice architecture, building a complete testing system to verify new business features before launch is time-consuming, and the difficulty grows as the number of microservices continues to increase. The machine cost of the testing system is high, and the system must be maintained exclusively to keep pre-launch feature verification efficient. When the business becomes large and complex, multiple such systems are required. This is a cost and efficiency challenge that the entire industry faces. If feature verification of a new version can be completed in the production system itself before launch, the cost in human and financial resources can be reduced significantly.
In addition to the feature verification in the development phase, the introduction of canary release in the production environment can help control the risk and blast radius of the new version of the software. Canary release allocates production traffic with specific characteristics by certain proportions to the service version that needs to be verified. In this process, you can observe whether the running state of the new version meets expectations after it is launched.
Alibaba Cloud ASM Pro offers a comprehensive-procedure canary solution, built on Service Mesh, that can help solve the problems in the preceding two scenarios.
The following diagram shows the feature-oriented architecture of ASM Pro services:
The diagram shows the core capabilities of ASM Pro, including traffic tagging, tag routing, and traffic fallback. We will discuss them in detail throughout this article.
The following figure shows the common scenarios of the comprehensive-procedure canary release:
Let's take Bookinfo as an example. The inbound traffic carries the expected tag group. The Sidecar obtains the expected tag from the context (Header or Context) and routes the traffic to the corresponding tag group. If that tag group does not exist, the traffic falls back to the base group by default. You can configure the fallback policy as needed.
The tag of the inbound traffic is added by tagging the request traffic at the gateway in a way similar to using a tag plug-in. For example, you can add a tag that means canary to userids in a certain range. Considering the diversity of implementations and the selection of gateways in the actual environment, the implementation of gateways will not be discussed in this article.
The following part describes how to realize comprehensive-procedure traffic tagging and canary using ASM Pro.
Inbound refers to the inbound traffic of requests sent to the application, and outbound refers to the outbound traffic of requests sent by the application.
The preceding figure shows a typical traffic path of a business application after the mesh is enabled. The application receives an external request p1 and calls an operation of a service that it depends on. The traffic path of the request is p1->p2->p3->p4. The Sidecar forwards p1 to generate p2 and forwards p3 to generate p4. To achieve comprehensive-procedure canary, both p3 and p4 need to obtain the traffic tag from p1 so that the request is routed to the backend service instance corresponding to the tag. In addition, p3 and p4 must carry the same tag. The key is to pass the tag through without the application being aware of it. ASM Pro realizes this pass-through with the traceId used in distributed tracing analysis technologies, such as OpenTracing and OpenTelemetry.
The distributed tracing analysis technology uses a traceId to uniquely identify a complete call trace. Fanout calls issued by each application on the trace carry the source traceId through the distributed tracing analysis SDK. The implementation of the comprehensive-procedure canary solution by using ASM Pro is based on this widely adopted practice of distributed application architecture.
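As a rough illustration of what the tracing SDK does on the application side, the sketch below copies the traceId from an inbound request onto an outbound (fanout) call. It assumes the traceId travels in the x-request-id header; the function name with_trace_context is hypothetical and is not part of any SDK.

```python
from typing import Dict, Mapping

# Assumption: the traceId is carried in this header.
TRACE_HEADER = "x-request-id"


def with_trace_context(inbound: Mapping[str, str],
                       outbound: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the outbound headers that carries the inbound traceId."""
    result = dict(outbound)
    for name, value in inbound.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        if name.lower() == TRACE_HEADER:
            result[TRACE_HEADER] = value
            break
    return result
```

A real tracing SDK usually does this automatically through instrumentation, so the application code does not have to copy headers by hand.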
In the preceding figure, the Sidecar handles inbound and outbound traffic independently. It cannot perceive the correspondence between the two, and it cannot tell whether one inbound request causes multiple outbound requests. In other words, the Sidecar does not know whether the p1 and p3 requests are related.
In the comprehensive-procedure canary solution of ASM Pro, the p1 and p3 requests are associated by traceId, specifically by the trace header x-request-id in the Sidecar. The Sidecar maintains a mapping table that records the correspondence between traceIds and tags. When the Sidecar receives the p1 request, it stores the traceId and tag in the table. When the Sidecar receives the p3 request, it looks up the tag corresponding to the traceId in the mapping table and adds the tag to the p4 request. The following figure shows this implementation principle:
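The mapping table described above can be sketched in a few lines. This is only a minimal in-memory model of the idea, not the Sidecar's actual implementation. The header name x-asm-traffic-tag and the class TagTable are hypothetical names chosen for illustration.

```python
from typing import Dict

TRACE_HEADER = "x-request-id"     # trace header named in the article
TAG_HEADER = "x-asm-traffic-tag"  # hypothetical tag header, for illustration only


class TagTable:
    """In-memory traceId -> tag mapping, standing in for the Sidecar's table."""

    def __init__(self) -> None:
        self._tags: Dict[str, str] = {}

    def on_inbound(self, headers: Dict[str, str]) -> None:
        """p1: remember the tag of an inbound request, keyed by its traceId."""
        trace_id = headers.get(TRACE_HEADER)
        tag = headers.get(TAG_HEADER)
        if trace_id and tag:
            self._tags[trace_id] = tag

    def on_outbound(self, headers: Dict[str, str]) -> Dict[str, str]:
        """p3 -> p4: copy the request headers and attach the recorded tag."""
        forwarded = dict(headers)
        tag = self._tags.get(headers.get(TRACE_HEADER, ""))
        if tag is not None:
            forwarded[TAG_HEADER] = tag
        return forwarded
```

A production table would also need to evict entries once a trace completes. That is omitted here to keep the sketch short.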
In other words, the comprehensive-procedure canary feature of ASM Pro requires applications to use distributed tracing analysis technologies. An application that does not use them must be modified before it can use the canary feature. Java applications can use a Java Agent, which relies on Aspect Oriented Programming (AOP), to pass the traceId through between inbound and outbound traffic without code changes.
ASM Pro introduces the new TrafficLabel CRD to define where the Sidecar obtains the traffic tag that needs to be passed through. The following sample YAML file defines the source of the traffic tag and specifies that the tag is stored in OpenTracing (specifically, the x-trace header). In the example, the traffic tag is named trafficLabel, and its value is obtained first from $getContext(x-request-id) and then from $(localLabel), the label of the local environment.
apiVersion: istio.alibabacloud.com/v1beta1
kind: TrafficLabel
metadata:
  name: default
spec:
  rules:
  - labels:
    - name: trafficLabel
      valueFrom:
      - $getContext(x-request-id) # If Aliyun ARMS is used, the corresponding header is x-b3-traceid.
      - $(localLabel)
    attachTo:
    - opentracing
    # Protocols for which the rule takes effect. Blank means no protocol, and an asterisk (*) means all protocols.
    protocols: "*"
The CR definition consists of two parts, namely tag acquisition and storage.
The label name of the local deployment environment is ASM_TRAFFIC_TAG. In actual deployments, you can set it through the CI/CD system.
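The tag acquisition part can be pictured as a lookup with a default. The sketch below assumes that the valueFrom entries are tried in order and that the first one yielding a value wins. That ordering is an interpretation of the sample, not documented behavior, and the function name resolve_traffic_label is hypothetical.

```python
from typing import Dict, Optional


def resolve_traffic_label(trace_id: Optional[str],
                          context: Dict[str, str],
                          local_label: Optional[str]) -> Optional[str]:
    """Resolve trafficLabel from the request context, else from the local label.

    context     -- traceId -> tag entries (the role of $getContext(x-request-id))
    local_label -- value of ASM_TRAFFIC_TAG on the local workload ($(localLabel))
    """
    if trace_id is not None and trace_id in context:
        return context[trace_id]
    # Assumption: fall through to the local deployment label when the
    # context holds no tag for this trace.
    return local_label
```

For example, a request whose traceId is unknown to the context would be treated as belonging to the workload's own environment.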
After TrafficLabel is defined, we know how to tag traffic and pass tags through. However, this is not enough for the comprehensive-procedure canary feature. We also need to route based on trafficLabel, which is tag routing. In addition, we need logic such as routing fallback, so that traffic can degrade gracefully when the destination of a route does not exist.
The implementation of this feature extends the VirtualService and DestinationRule of Istio.
The custom group subset corresponds to the value of the trafficLabel.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp/*
  subsets:
  - name: myproject # Project Environment
    labels:
      env: abc
  - name: isolation # Isolated Environment
    labels:
      env: xxx # Machine Group
  - name: testing-trunk # Trunk Environment
    labels:
      env: yyy
  - name: testing # Daily Environment
    labels:
      env: zzz
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: myapp
spec:
  hosts:
  - myapp/*
  ports:
  - number: 12200
    name: http
    protocol: HTTP
  endpoints:
  - address: 0.0.0.0
    labels:
      env: abc
  - address: 184.108.40.206
    labels:
      env: xxx
  - address: 220.127.116.11
    labels:
      env: zzz
  - address: 18.104.22.168
    labels:
      env: yyy
You can use one of the following methods to specify Subset:
The global default mode corresponds to a lane, indicating a single closed environment. Environment-level fallback policies are also specified. The custom group subset corresponds to the value of the trafficLabel.
The following sample code provides an example of the configuration:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: default-route
spec:
  hosts: # take effect for all applications
  - "*/*"
  http:
  - name: default-route
    route:
    - destination:
        subset: $trafficLabel
      weight: 100
      fallback:
        case: noinstances
        target: testing-trunk
    - destination:
        host: "*/*"
        subset: testing-trunk # Trunk Environment
      weight: 0
      fallback:
        case: noavailabled
        target: testing
    - destination:
        subset: testing # Daily Environment
      weight: 0
      fallback:
        case: noavailabled
        target: mock
    - destination:
        host: "*/*"
        subset: mock # Mock Center
      weight: 0
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: projectx-route
spec:
  hosts: # only effective for myapp
  - myapp/*
  http:
  - name: dev-x-route
    match:
      trafficLabel:
      - exact: dev-x # dev Environment: x
    route:
    - destination:
        host: myapp/*
        subset: testing # Daily Environment
      weight: 100
      fallback:
        case: noinstances
        target: testing-trunk
    - destination:
        host: myapp/*
        subset: testing-trunk # Trunk Environment
      weight: 0
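To make the fallback chain of the default-route sample easier to follow, here is a small sketch of the decision it describes: try the subset named by the traffic label, then the trunk environment, then the daily environment, then the mock center. It is a simplified model rather than how the mesh evaluates routes. In particular, it treats the noinstances and noavailabled cases alike as "the subset has no instances", and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Fallback order taken from the default-route sample.
DEFAULT_CHAIN = ["testing-trunk", "testing", "mock"]


def fallback_chain(traffic_label: str) -> List[str]:
    """Subsets to try in order, starting with the one named by the label."""
    chain = [traffic_label]
    for subset in DEFAULT_CHAIN:
        if subset not in chain:
            chain.append(subset)
    return chain


def pick_subset(traffic_label: str,
                instances: Dict[str, List[str]]) -> Optional[str]:
    """Return the first subset in the chain that has at least one instance."""
    for subset in fallback_chain(traffic_label):
        if instances.get(subset):
            return subset
    return None
```

With a label of dev-x and no dev-x instances deployed, the request lands in testing-trunk if that subset has instances, and otherwise continues down the chain.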
When the local environment is dev-x, 80% of the traffic tagged for the trunk environment is routed to the trunk environment, and 20% is routed to the daily environment. If no service resources are available in the trunk environment, that traffic falls back to the daily environment.
sourceLabels is the label corresponding to the local workload.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: dev-x-route
spec:
  hosts: # effective for which applications (multi-application configuration is not supported)
  - myapp/*
  http:
  - name: dev-x-route
    match:
      trafficLabel:
      - exact: testing-trunk # Trunk Environment Label
      sourceLabels:
      - exact: dev-x # indicates that traffic comes from a project environment
    route:
    - destination:
        host: myapp/*
        subset: testing-trunk # 80% of the traffic is allocated to the trunk environment
      weight: 80
      fallback:
        case: noavailabled
        target: testing
    - destination:
        host: myapp/*
        subset: testing # 20% of the traffic is allocated to the daily environment
      weight: 20
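The weighted split with fallback in the dev-x-route sample can be modeled as follows. This is a sketch under the assumption that a destination is first chosen by weight and that its fallback target is used only when the chosen subset has nothing available. The ROUTES table and the function choose_subset are hypothetical names used for illustration.

```python
import random
from typing import List, Optional, Set, Tuple

# (subset, weight, fallback target) taken from the dev-x-route sample.
ROUTES: List[Tuple[str, int, Optional[str]]] = [
    ("testing-trunk", 80, "testing"),
    ("testing", 20, None),
]


def choose_subset(routes: List[Tuple[str, int, Optional[str]]],
                  available: Set[str],
                  rng: random.Random) -> Optional[str]:
    """Pick a subset by weight, then apply its fallback if it is unavailable."""
    subsets = [subset for subset, _, _ in routes]
    weights = [weight for _, weight, _ in routes]
    fallbacks = {subset: target for subset, _, target in routes}

    chosen = rng.choices(subsets, weights=weights, k=1)[0]
    if chosen not in available and fallbacks[chosen] is not None:
        chosen = fallbacks[chosen]
    return chosen if chosen in available else None
```

Passing a seeded random.Random makes the split reproducible in tests. If only the daily environment is available, every request ends up in testing.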
This solution relies on an identifier that is attached to the service when it is deployed. In the example, the corresponding label is ASM_TRAFFIC_TAG: xxx. A common identifier is the environment. The identifier is meta-information about the service deployment, so it depends on integration with the upstream deployment system (CI/CD). The following figure shows the general process.
Note: ASM Pro has developed its own ServiceDirectory component (please see the featured architecture diagram of ASM Pro services) to connect multiple registries and dynamically acquire deployment meta-information.
The following figure shows a typical multi-set governance feature for the development environment based on traffic tagging and tag routing. Each developer only needs to deploy services that have version updates in the corresponding Dev X environment. If you need to coordinate with other developers, you can forward the service fallback request to the required development environment by configuring fallback. In the following example, a fallback request is forwarded from B in the Dev Y environment to C in the Dev X environment.
Similarly, you can equate the Dev X environment with the online canary version environment. This can help solve the problems of comprehensive-procedure canary release in the online environment.
The traffic tagging and tag routing capabilities described in this article are general solutions. They can help solve problems related to testing environment governance and online comprehensive-procedure canary release. Because the solution is built on service mesh technologies, it is independent of development languages. It also applies to different Layer 7 protocols. Currently, the HTTP/gRPC and Dubbo protocols are supported.
Other service providers also offer solutions to realize comprehensive-procedure canary. ASM Pro has the following advantages:
The traffic tagging and tag routing capabilities can also be used in the following scenarios: