Traffic Management with Istio (2): Grayscale Release of Applications by Istio Management

This article discusses Istio's support for canary releases based on flexible rules, which allows for simple yet powerful canary testing and rollout.

During project iteration, you will inevitably need to release new versions to production. Releasing means deploying or redeploying, deploying means changing the system, and change means risk.

Gray lies between black and white. A grayscale release is a release process that moves smoothly from the old version to the new one, which helps keep the overall system stable. Problems can be found and corrected at the initial grayscale stage, which limits their impact. What we often call A/B testing or canary release are other forms of grayscale release.

Blue-Green Release

Blue-green release means keeping the old version running, deploying the new version alongside it and testing it for problems, and then switching traffic to the new version. After the switch, the old version is also upgraded to the new version. Blue-green deployment is safe and does not interrupt service.


The process is generally as follows:

  1. Deploy version 1 of the application (initial state). All external traffic goes to this version.
  2. Deploy version 2 of the application. The code for version 2 differs from version 1 (new features, bug fixes, and so on).
  3. Switch traffic from version 1 to version 2, that is, the v1:v2 traffic ratio goes from 100:0 to 0:100.
  4. If there is a problem with version 2, roll back to version 1, that is, the v1:v2 traffic ratio is switched back to 100:0.

A/B Testing

A/B testing is different from blue-green release. It is a method for measuring how an application's features perform, for example their availability, popularity, and visibility. A/B tests are usually run on specific user groups. The goal is to draw representative conclusions through scientific experiment design, representative sampling, traffic segmentation, small-traffic tests, and so on, and to ensure that those conclusions can be extrapolated to the traffic as a whole. This differs slightly from blue-green release, whose purpose is to release new application versions safely and stably, with the option to roll back if required.


Canary Release

A canary release tests a new version by exposing it to a small portion of user traffic. If all goes well, you can increase the percentage (gradually if needed) until the new version replaces the old one. If any problems occur during the process, you can abort and roll back to the old version. The simplest approach is to send a randomly selected percentage of requests to the canary version, but in more complex scenarios the selection can be based on the content of the requests, a specific range of users, or other properties.


Grayscale Release in Kubernetes

The rolling-update feature built into Kubernetes provides a progressive update process, so applications can be updated without interrupting service. The following is a brief overview of the Kubernetes upgrade strategy:

  spec:
    replicas: 5
    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 2
        maxUnavailable: 2
    minReadySeconds: 5

  1. maxSurge: the maximum number of pods that can exist above the configured replica count during the upgrade. This field can be either an absolute number or a percentage. For example, maxSurge: 1 with replicas: 5 means Kubernetes starts a new pod before deleting an old pod, so there are at most 5 + 1 pods at any point during the upgrade.
  2. maxUnavailable: the maximum number of pods that can be unavailable during the upgrade. When maxSurge is zero, this field cannot also be zero. For example, maxUnavailable: 1 means that at most one pod is unavailable at any point during the upgrade.
  3. minReadySeconds: the minimum time a newly created pod must stay ready before Kubernetes considers it available and continues the upgrade. Set it to cover the startup time of the application in the container. If this field is not set, Kubernetes treats a pod as available as soon as it is ready.

There are also several ways to achieve rolling upgrades:

  1. Using kubectl set image
  2. Using kubectl replace
  3. Using kubectl rollout

Although this mechanism works well, it only suits the deployment of properly tested versions, which is closer to a blue-green release scenario. In practice, canary releases in Kubernetes are done using two deployments whose pods share a common label. In this case we can no longer rely on the autoscaler: each deployment is controlled by its own independent autoscaler, and under different load conditions the ratio of replicas may drift from the desired ratio.
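The two-deployment approach described above can be sketched as follows. This is a minimal illustration, not a configuration from the original article: the names myapp, myapp-stable, and myapp-canary, the track label, the image references, and the port numbers are all hypothetical. The Service selects only on the shared app label, so pods from both deployments receive traffic roughly in proportion to their replica counts.

```yaml
# Hypothetical names, labels, images, and ports; for illustration only.
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp              # shared label: matches pods of BOTH deployments
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9               # roughly 90% of traffic
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
      - name: myapp
        image: example.com/myapp:v1
        ports:
        - containerPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1               # roughly 10% of traffic
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
      - name: myapp
        image: example.com/myapp:v2
        ports:
        - containerPort: 8080
```

In this sketch the only way to change the canary's share of traffic is to change the replica counts, which is the coupling between scaling and routing that this section describes.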

Whether we use one or two deployments, canary releases on plain Kubernetes have a fundamental problem: instance scaling is used to manage traffic. On the Kubernetes platform, the traffic distribution for a version and the number of replicas deployed for it are not independent. All pod replicas, regardless of version, are treated equally in the kube-proxy round-robin pool, so the only way to manage the traffic received by a specific version is to control the replica ratio. A small canary percentage therefore requires many replicas (for example, 1% requires at least 100 replicas). Even if we ignore this issue, the deployment approach is still very limited, because it only supports simple (random percentage) canary deployments. You still need another solution if you want to route requests to a canary version according to specific rules. That solution is Istio.

Grayscale Release Using Istio

The Istio traffic management model decouples traffic flow from infrastructure scaling. Operations personnel use Pilot to specify the rules that apply to traffic, instead of specifying which pods/VMs should receive traffic. This decoupling allows Istio to provide a variety of traffic management functions independent of application code. The Envoy sidecar proxy implements these functions.

When using Istio for grayscale release, traffic routing and replica deployment are two completely independent functions. The number of pods serving a version can scale flexibly with the traffic load and is completely unrelated to how traffic is routed to that version. This makes it easier to manage canary versions with automatic scaling.

Istio's routing rules are flexible enough to support fine-grained control of traffic percentages (for example, routing 1% of traffic without the need for 100 pods). You can also use other rules to control traffic (for example, routing traffic from a specific user to the canary version). For example:

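apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: addedvalues
spec:
  hosts:
  - addedvalues
  http:
  # Requests whose header value starts with "yunqi" go to the canary version (v3).
  - match:
    - headers:
        cookie:   # the header name is missing in the source; "cookie" is an assumed placeholder
          prefix: yunqi
    route:
    - destination:
        host: addedvalues
        subset: v3
  # All other requests are split evenly between v2 and v1.
  - route:
    - destination:
        host: addedvalues
        subset: v2
      weight: 50
    - destination:
        host: addedvalues
        subset: v1
      weight: 50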
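
The subsets v1, v2, and v3 referenced by the routing rule need to be defined somewhere, and in Istio that is normally done with a DestinationRule for the same host. The following is a minimal sketch of what such a rule might look like. The version pod label and its values are assumptions; the original article does not show how the addedvalues pods are labeled.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: addedvalues
spec:
  host: addedvalues
  subsets:
  # Each subset selects pods by label; the "version" label is an assumption.
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
```

Each subset maps a name used in routing rules to a group of pods, which is what lets the routing percentages stay independent of how many pods each version runs.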

Independent of AutoScaler

When using Istio for grayscale release, you no longer need to use the replica ratio to maintain the percentage distribution of traffic, and you can safely use the Kubernetes HPA (Horizontal Pod Autoscaler) to manage the replicas of both versions' deployments.
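
As a rough sketch of this setup, each version's deployment can have its own HPA. The deployment names addedvalues-v1 and addedvalues-v2, the replica bounds, and the CPU target are hypothetical values chosen for illustration, and the autoscaling/v2 API version assumes a reasonably recent Kubernetes cluster.

```yaml
# Hypothetical deployment names and thresholds; for illustration only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: addedvalues-v1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: addedvalues-v1
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: addedvalues-v2
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: addedvalues-v2
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Because Istio sets the traffic split, the two autoscalers can scale each version according to the load it actually receives without disturbing the routing percentages.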

Unified Traffic Routing Rules

The routing rules provided by Istio give a single, unified way to define traffic routing, whether for blue-green release, A/B testing, or canary release. The details are as follows:

Distribution Based on Traffic Percentage

Istio distributes traffic across versions according to the configured percentages. Each percentage must be in the range [0, 100].

  1. For example, when version v1 is configured to 0 and version v2 is configured to 100, this corresponds to the rules used in blue-green release.
  2. If the percentage for version v1 is set to x, then x% of service traffic goes to version v1 and (100-x)% goes to version v2, that is, a portion of the traffic is diverted away from version v1. These are the rules used in canary release or A/B testing.
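
The percentage rules above might be expressed as a VirtualService like the one below. This is a sketch that reuses the addedvalues service name from the earlier example; the 90/10 split is an arbitrary illustration with x = 90, and the subsets are assumed to be defined elsewhere.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: addedvalues
spec:
  hosts:
  - addedvalues
  http:
  - route:
    - destination:
        host: addedvalues
        subset: v1
      weight: 90        # x% of traffic stays on v1
    - destination:
        host: addedvalues
        subset: v2
      weight: 10        # (100 - x)% of traffic goes to v2
```

Setting the weights to 0 for v1 and 100 for v2 turns the same rule into the blue-green switch described in the first item, and restoring 100 and 0 rolls it back.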

Release Based on a Request

For a release based on request content, each request is checked against the canary rules of every version other than the default. If the rule for a version is satisfied, the request goes to that version. If no rule is satisfied, it goes to the default version. These are the rules used in canary release or A/B testing.


As mentioned, Istio supports canary release based on flexible rules, which sets it apart from the Kubernetes deployment method. The Istio service mesh provides the basic control needed to manage traffic allocation completely independently of the autoscaler. This allows for simple yet powerful canary testing and rollout.

Supporting smart routing for canary deployment is just one of the many functions of Istio, which simplifies the production and deployment of large applications based on microservices. We invite you to use Alibaba Cloud Container Service to quickly set up Istio, an open management platform for microservices that can be more easily integrated into any microservice projects you are working on.


Xi Ning Wang(王夕宁)
