The Mixerless Telemetry technology of Alibaba Cloud Service Mesh (ASM) allows you to obtain telemetry data for containers in a non-intrusive manner. The telemetry data is collected by Prometheus as monitoring metrics. Flagger is a tool that automates the release process of applications. Flagger can monitor the metrics collected by Prometheus and use them to shift traffic during a canary release. This topic describes how to use Mixerless Telemetry to implement a canary release.

Prerequisites

Application monitoring metrics are collected by Prometheus. For more information, see Use Mixerless Telemetry to observe ASM instances.

Procedure for implementing a canary release

  1. Connect ASM to Prometheus to collect application monitoring metrics.
  2. Deploy Flagger and an Istio gateway.
  3. Deploy a Flagger load tester to run acceptance tests and generate traffic to the pods of the application during the canary release.
  4. Deploy an application. In this example, the podinfo application V3.1.0 is deployed.
  5. Deploy a Horizontal Pod Autoscaler (HPA) that scales out the pods of the podinfo application when the average CPU utilization of the application reaches 99%. A minimal sketch of such an HPA is shown after this list.
  6. Implement a canary resource that progressively increases the traffic routed to the canary in fixed increments of 10%. Each increment is allowed only if the P99 latency, measured over 30-second windows, stays below 500 ms; otherwise, the metric check fails.
  7. Flagger copies the podinfo deployment to create the podinfo-primary deployment. The podinfo deployment serves the canary release version, and the podinfo-primary deployment serves the production version.
  8. Update the podinfo application to V3.1.1.
  9. Flagger monitors the metrics that are collected by Prometheus to manage traffic in the canary release. Flagger progressively increases the traffic routed to the podinfo application V3.1.1 in fixed increments of 10%, advancing only while the P99 latency, measured over 30-second windows, stays below 500 ms. In addition, the HPA scales out the pods of the podinfo application and scales in the pods of the podinfo-primary application based on the status of the canary release.
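
The HPA referenced in step 5 is deployed together with the podinfo application in the procedure below. For reference, the following is a minimal sketch of what such an HPA looks like; the replica counts and exact field values are assumptions and may differ from the manifest shipped with the Flagger podinfo kustomization:

  apiVersion: autoscaling/v2beta2
  kind: HorizontalPodAutoscaler
  metadata:
    name: podinfo
    namespace: test
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: podinfo
    # assumed replica bounds
    minReplicas: 2
    maxReplicas: 4
    metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale out when average CPU utilization reaches 99%
          averageUtilization: 99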

Procedure

  1. Use kubectl to connect to a Container Service for Kubernetes (ACK) cluster. For more information, see Connect to Kubernetes clusters by using kubectl.
  2. Run the following commands to deploy Flagger:
    alias k="kubectl --kubeconfig $USER_CONFIG"
    alias h="helm --kubeconfig $USER_CONFIG"
    
    cp $MESH_CONFIG kubeconfig
    k -n istio-system create secret generic istio-kubeconfig --from-file kubeconfig
    k -n istio-system label secret istio-kubeconfig istio/multiCluster=true
    
    h repo add flagger https://flagger.app
    h repo update
    k apply -f $FLAGGER_SRC/artifacts/flagger/crd.yaml
    h upgrade -i flagger flagger/flagger --namespace=istio-system \
        --set crd.create=false \
        --set meshProvider=istio \
        --set metricsServer=http://prometheus:9090 \
        --set istio.kubeconfig.secretName=istio-kubeconfig \
        --set istio.kubeconfig.key=kubeconfig
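
    To verify that the Flagger controller is running before you continue, you can check its deployment. This check is optional; the deployment name flagger follows from the Helm release name used above:

    # optional: confirm that the Flagger controller is up
    k -n istio-system get deploy flagger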
  3. Use kubectl to connect to an ASM instance. For more information, see Use kubectl to connect to an ASM instance.
  4. Deploy an Istio gateway.
    1. Use the following content to create the public-gateway.yaml file:
      apiVersion: networking.istio.io/v1alpha3
      kind: Gateway
      metadata:
        name: public-gateway
        namespace: istio-system
      spec:
        selector:
          istio: ingressgateway
        servers:
          - port:
              number: 80
              name: http
              protocol: HTTP
            hosts:
              - "*"
    2. Run the following command to deploy the Istio gateway:
      kubectl --kubeconfig <Path of the kubeconfig file of the ASM instance> apply -f resources_canary/public-gateway.yaml
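      To confirm that the gateway was created, you can optionally query it on the ASM instance. The fully qualified resource name is used here to avoid ambiguity with other Gateway kinds:
      kubectl --kubeconfig <Path of the kubeconfig file of the ASM instance> -n istio-system get gateways.networking.istio.io public-gateway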
  5. Run the following command to deploy a Flagger load tester in the ACK cluster:
    kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"
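    If you want to confirm that the load tester is ready, you can check its deployment. The name and namespace below assume the defaults of the tester kustomization, which installs a flagger-loadtester deployment into the test namespace:
    kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test get deploy flagger-loadtester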
  6. Run the following command to deploy the podinfo application and an HPA in the ACK cluster:
    kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> apply -k "https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main"
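    Before you create the canary resource, you can optionally verify that both the deployment and the HPA named podinfo exist in the test namespace:
    kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test get deploy,hpa podinfo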
  7. Deploy a canary resource in the ACK cluster.
    Note For more information about a canary resource, see How it works.
    1. Use the following content to create the podinfo-canary.yaml file:
      apiVersion: flagger.app/v1beta1
      kind: Canary
      metadata:
        name: podinfo
        namespace: test
      spec:
        # deployment reference
        targetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: podinfo
        # the maximum time in seconds for the canary deployment
        # to make progress before it is rolled back (default 600s)
        progressDeadlineSeconds: 60
        # HPA reference (optional)
        autoscalerRef:
          apiVersion: autoscaling/v2beta2
          kind: HorizontalPodAutoscaler
          name: podinfo
        service:
          # service port number
          port: 9898
          # container port number or name (optional)
          targetPort: 9898
          # Istio gateways (optional)
          gateways:
          - public-gateway.istio-system.svc.cluster.local
          # Istio virtual service host names (optional)
          hosts:
          - '*'
          # Istio traffic policy (optional)
          trafficPolicy:
            tls:
              # use ISTIO_MUTUAL when mTLS is enabled
              mode: DISABLE
          # Istio retry policy (optional)
          retries:
            attempts: 3
            perTryTimeout: 1s
            retryOn: "gateway-error,connect-failure,refused-stream"
        analysis:
          # schedule interval (default 60s)
          interval: 1m
          # max number of failed metric checks before rollback
          threshold: 5
          # max traffic percentage routed to canary
          # percentage (0-100)
          maxWeight: 50
          # canary increment step
          # percentage (0-100)
          stepWeight: 10
          metrics:
          - name: request-success-rate
            # minimum req success rate (non 5xx responses)
            # percentage (0-100)
            thresholdRange:
              min: 99
            interval: 1m
          - name: request-duration
            # maximum req duration P99
            # milliseconds
            thresholdRange:
              max: 500
            interval: 30s
          # testing (optional)
          webhooks:
            - name: acceptance-test
              type: pre-rollout
              url: http://flagger-loadtester.test/
              timeout: 30s
              metadata:
                type: bash
                cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
            - name: load-test
              url: http://flagger-loadtester.test/
              timeout: 5s
              metadata:
                cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"apiVersion: flagger.app/v1beta1
      kind: Canary
      metadata:
        name: podinfo
        namespace: test
      spec:
        # deployment reference
        targetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: podinfo
        # the maximum time in seconds for the canary deployment
        # to make progress before it is rollback (default 600s)
        progressDeadlineSeconds: 60
        # HPA reference (optional)
        autoscalerRef:
          apiVersion: autoscaling/v2beta2
          kind: HorizontalPodAutoscaler
          name: podinfo
        service:
          # service port number
          port: 9898
          # container port number or name (optional)
          targetPort: 9898
          # Istio gateways (optional)
          gateways:
          - public-gateway.istio-system.svc.cluster.local
          # Istio virtual service host names (optional)
          hosts:
          - '*'
          # Istio traffic policy (optional)
          trafficPolicy:
            tls:
              # use ISTIO_MUTUAL when mTLS is enabled
              mode: DISABLE
          # Istio retry policy (optional)
          retries:
            attempts: 3
            perTryTimeout: 1s
            retryOn: "gateway-error,connect-failure,refused-stream"
        analysis:
          # schedule interval (default 60s)
          interval: 1m
          # max number of failed metric checks before rollback
          threshold: 5
          # max traffic percentage routed to canary
          # percentage (0-100)
          maxWeight: 50
          # canary increment step
          # percentage (0-100)
          stepWeight: 10
          metrics:
          - name: request-success-rate
            # minimum req success rate (non 5xx responses)
            # percentage (0-100)
            thresholdRange:
              min: 99
            interval: 1m
          - name: request-duration
            # maximum req duration P99
            # milliseconds
            thresholdRange:
              max: 500
            interval: 30s
          # testing (optional)
          webhooks:
            - name: acceptance-test
              type: pre-rollout
              url: http://flagger-loadtester.test/
              timeout: 30s
              metadata:
                type: bash
                cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
            - name: load-test
              url: http://flagger-loadtester.test/
              timeout: 5s
              metadata:
                cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"
      • stepWeight: the percentage by which the traffic routed to the canary is increased at each step. In this example, the value is 10, so the canary weight advances 10% → 20% → 30% → 40% → 50% (the value of maxWeight) before promotion.
      • max: the maximum allowed P99 latency, in milliseconds. A measured P99 latency above 500 ms causes the metric check to fail, and the number of consecutive failures specified by threshold triggers a rollback.
      • interval: the time window over which the P99 latency is measured. In this example, the value is 30s.
    2. Run the following command to deploy the canary resource:
      kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> apply -f resources_canary/podinfo-canary.yaml
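      After the canary resource is deployed, Flagger creates the podinfo-primary deployment and the related services, and the canary status eventually changes to Initialized. You can optionally confirm this as follows:
      kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test get canary podinfo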
  8. Run the following command to update the podinfo application from V3.1.0 to V3.1.1:
    kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1
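    The image update triggers a new canary analysis. Besides the describe loop shown in the next section, you can optionally follow the rollout by watching the canary resource, whose printer columns report the analysis status and the current canary traffic weight:
    kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test get canary podinfo -w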

Verify whether the canary release is implemented as expected

Run the following command to view the process of progressive traffic routing:

while true; do kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test describe canary/podinfo; sleep 10s;done

Expected output:

Events:
  Type     Reason  Age                From     Message
  ----     ------  ----               ----     -------
  Warning  Synced  39m                flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  38m (x2 over 39m)  flagger  all the metrics providers are available!
  Normal   Synced  38m                flagger  Initialization done! podinfo.test
  Normal   Synced  37m                flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  36m                flagger  Starting canary analysis for podinfo.test
  Normal   Synced  36m                flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  36m                flagger  Advance podinfo.test canary weight 10
  Normal   Synced  35m                flagger  Advance podinfo.test canary weight 20
  Normal   Synced  34m                flagger  Advance podinfo.test canary weight 30
  Normal   Synced  33m                flagger  Advance podinfo.test canary weight 40
  Normal   Synced  29m (x4 over 32m)  flagger  (combined from similar events): Promotion completed! Scaling down podinfo.test

The output indicates that the traffic weight routed to the podinfo application V3.1.1 was progressively increased from 10% to 40%, after which the canary was promoted and the release completed.
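
Because Flagger writes the Istio routing rules through the istio-kubeconfig secret that you created when you deployed Flagger, you can also inspect the traffic split on the ASM instance. The following optional command shows the route weights in the virtual service that Flagger manages for the podinfo application:

kubectl --kubeconfig <Path of the kubeconfig file of the ASM instance> -n test get virtualservice podinfo -o yaml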