
Alibaba Cloud Service Mesh:Use Mixerless Telemetry to implement a canary release

Last Updated: Mar 04, 2024

When you want to release application updates to your production environment in a secure manner and fully monitor the results of the release, you can use the Mixerless Telemetry technology of Service Mesh (ASM) to implement canary releases. This technology obtains telemetry data from application containers in a non-intrusive manner and uses Prometheus to track key metrics. With Flagger, a tool that automates the release process for applications, you can monitor the access metrics in Prometheus in real time, precisely control the ratio of traffic sent to the new version, and gradually roll the new version out to the production environment. This effectively reduces the risk of deployment failures and improves release efficiency and user experience.

Prerequisites

Application metrics are collected by using Prometheus. For more information, see Use Mixerless Telemetry to observe ASM instances.
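
Before you proceed, you can confirm that Prometheus is already scraping Istio telemetry, because the request-success-rate and request-duration checks used later are computed from these metrics. The following is a minimal sketch, not part of the original procedure; it assumes that Prometheus runs as the service prometheus on port 9090 in the istio-system namespace, as configured in the Flagger deployment step below:

    # Forward the Prometheus API to localhost. The service name and namespace
    # are assumptions that match the metricsServer setting used when deploying Flagger below.
    kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n istio-system port-forward svc/prometheus 9090:9090 &

    # The query returns time series if Istio telemetry is being collected.
    curl -s 'http://localhost:9090/api/v1/query?query=istio_requests_total' | head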

Process for implementing a canary release

  1. Connect ASM to Prometheus to collect application metrics.

  2. Deploy Flagger and an Istio gateway.

  3. Deploy a Flagger load tester to detect traffic routing for the pods of your application in the canary release.

  4. Deploy your application. In this example, the podinfo application V3.1.0 is deployed.

  5. Deploy a Horizontal Pod Autoscaler (HPA) to scale out the podinfo application when its CPU utilization reaches 99%. A sketch of such an HPA is shown after this list.

  6. Deploy a canary resource to specify that the traffic routed to the podinfo application is progressively increased by a fixed step of 10%, provided that the P99 latency remains at or below 500 ms within each 30-second check interval.

  7. Flagger copies the podinfo application and generates the podinfo-primary application. The podinfo application is used as the deployment of the canary release version. The podinfo-primary application is used as the deployment of the production version.

  8. Update the podinfo application of the canary release version to V3.1.1.

  9. Flagger monitors the metrics that are collected by Prometheus to manage traffic in the canary release. Flagger progressively increases the traffic routed to the podinfo application V3.1.1 by a fixed step of 10%, provided that the P99 latency remains at or below 500 ms within each 30-second check interval. In addition, the HPA scales out the pods of the podinfo application and scales in the pods of the podinfo-primary application based on the status of the canary release.
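
The following is a minimal sketch of the HPA described in step 5. The actual manifest is deployed together with the podinfo application in the procedure below, so values such as minReplicas and maxReplicas here are illustrative assumptions rather than the exact manifest:

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: podinfo
      namespace: test
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      minReplicas: 2        # illustrative assumption
      maxReplicas: 4        # illustrative assumption
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            # Scale out when average CPU utilization reaches 99%, per step 5.
            averageUtilization: 99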

Procedure

  1. Connect to your Container Service for Kubernetes (ACK) cluster by using kubectl. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

  2. Run the following commands to deploy Flagger:

    # Define shortcuts for kubectl and helm that target the ACK cluster.
    alias k="kubectl --kubeconfig $USER_CONFIG"
    alias h="helm --kubeconfig $USER_CONFIG"
    
    # Store the kubeconfig of the ASM instance as a secret so that Flagger can manage Istio resources.
    cp $MESH_CONFIG kubeconfig
    k -n istio-system create secret generic istio-kubeconfig --from-file kubeconfig
    k -n istio-system label secret istio-kubeconfig istio/multiCluster=true
    
    # Install the Flagger CRDs, then deploy Flagger and point it at Prometheus and the ASM kubeconfig secret.
    h repo add flagger https://flagger.app
    h repo update
    k apply -f $FLAGGER_SRC/artifacts/flagger/crd.yaml
    h upgrade -i flagger flagger/flagger --namespace=istio-system \
        --set crd.create=false \
        --set meshProvider=istio \
        --set metricsServer=http://prometheus:9090 \
        --set istio.kubeconfig.secretName=istio-kubeconfig \
        --set istio.kubeconfig.key=kubeconfig
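
    Before you continue, you can verify that the Flagger deployment is ready. This check is not part of the original procedure; it assumes the k alias defined above is still in effect:

      # READY 1/1 indicates that Flagger is up.
      k -n istio-system get deploy/flagger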
  3. Use kubectl to connect to your ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.

  4. Deploy an Istio gateway.

    1. Use the following content to create a YAML file named public-gateway.yaml.

      apiVersion: networking.istio.io/v1alpha3
      kind: Gateway
      metadata:
        name: public-gateway
        namespace: istio-system
      spec:
        selector:
          istio: ingressgateway
        servers:
          - port:
              number: 80
              name: http
              protocol: HTTP
            hosts:
              - "*"
    2. Run the following command to deploy the Istio gateway:

      kubectl --kubeconfig <Path of the kubeconfig file of the ASM instance> apply -f resources_canary/public-gateway.yaml
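
      To confirm that the gateway was created, you can query it on the ASM control plane. A minimal check, not part of the original procedure, assuming the same kubeconfig as above:

        # The gateway should be listed in the istio-system namespace.
        kubectl --kubeconfig <Path of the kubeconfig file of the ASM instance> -n istio-system get gateway public-gateway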
  5. Run the following command to deploy a Flagger load tester in the ACK cluster:

    kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"
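
    The load tester is deployed to the test namespace, as the webhook URLs in the canary resource below imply. Before you continue, you can verify that it is running; a minimal check, not part of the original procedure:

      # READY 1/1 indicates that the load tester is up.
      kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test get deploy/flagger-loadtester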
  6. Run the following command to deploy the podinfo application and an HPA in the ACK cluster:

    kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> apply -k "https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main"
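
    You can verify that both the deployment and the HPA exist before you continue. A minimal check, not part of the original procedure:

      kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test get deploy/podinfo hpa/podinfo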
  7. Deploy a canary resource in the ACK cluster.

    Note

    For more information about canary resources, see How it works.

    1. Use the following content to create a YAML file named podinfo-canary.yaml:

      apiVersion: flagger.app/v1beta1
      kind: Canary
      metadata:
        name: podinfo
        namespace: test
      spec:
        # deployment reference
        targetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: podinfo
        # the maximum time in seconds for the canary deployment
        # to make progress before it is rolled back (default 600s)
        progressDeadlineSeconds: 60
        # HPA reference (optional)
        autoscalerRef:
          apiVersion: autoscaling/v2beta2
          kind: HorizontalPodAutoscaler
          name: podinfo
        service:
          # service port number
          port: 9898
          # container port number or name (optional)
          targetPort: 9898
          # Istio gateways (optional)
          gateways:
          - public-gateway.istio-system.svc.cluster.local
          # Istio virtual service host names (optional)
          hosts:
          - '*'
          # Istio traffic policy (optional)
          trafficPolicy:
            tls:
              # use ISTIO_MUTUAL when mTLS is enabled
              mode: DISABLE
          # Istio retry policy (optional)
          retries:
            attempts: 3
            perTryTimeout: 1s
            retryOn: "gateway-error,connect-failure,refused-stream"
        analysis:
          # schedule interval (default 60s)
          interval: 1m
          # max number of failed metric checks before rollback
          threshold: 5
          # max traffic percentage routed to canary
          # percentage (0-100)
          maxWeight: 50
          # canary increment step
          # percentage (0-100)
          stepWeight: 10
          metrics:
          - name: request-success-rate
            # minimum req success rate (non 5xx responses)
            # percentage (0-100)
            thresholdRange:
              min: 99
            interval: 1m
          - name: request-duration
            # maximum req duration P99
            # milliseconds
            thresholdRange:
              max: 500
            interval: 30s
          # testing (optional)
          webhooks:
            - name: acceptance-test
              type: pre-rollout
              url: http://flagger-loadtester.test/
              timeout: 30s
              metadata:
                type: bash
                cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
            - name: load-test
              url: http://flagger-loadtester.test/
              timeout: 5s
              metadata:
                cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"apiVersion: flagger.app/v1beta1
      kind: Canary
      metadata:
        name: podinfo
        namespace: test
      spec:
        # deployment reference
        targetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: podinfo
        # the maximum time in seconds for the canary deployment
        # to make progress before it is rollback (default 600s)
        progressDeadlineSeconds: 60
        # HPA reference (optional)
        autoscalerRef:
          apiVersion: autoscaling/v2beta2
          kind: HorizontalPodAutoscaler
          name: podinfo
        service:
          # service port number
          port: 9898
          # container port number or name (optional)
          targetPort: 9898
          # Istio gateways (optional)
          gateways:
          - public-gateway.istio-system.svc.cluster.local
          # Istio virtual service host names (optional)
          hosts:
          - '*'
          # Istio traffic policy (optional)
          trafficPolicy:
            tls:
              # use ISTIO_MUTUAL when mTLS is enabled
              mode: DISABLE
          # Istio retry policy (optional)
          retries:
            attempts: 3
            perTryTimeout: 1s
            retryOn: "gateway-error,connect-failure,refused-stream"
        analysis:
          # schedule interval (default 60s)
          interval: 1m
          # max number of failed metric checks before rollback
          threshold: 5
          # max traffic percentage routed to canary
          # percentage (0-100)
          maxWeight: 50
          # canary increment step
          # percentage (0-100)
          stepWeight: 10
          metrics:
          - name: request-success-rate
            # minimum req success rate (non 5xx responses)
            # percentage (0-100)
            thresholdRange:
              min: 99
            interval: 1m
          - name: request-duration
            # maximum req duration P99
            # milliseconds
            thresholdRange:
              max: 500
            interval: 30s
          # testing (optional)
          webhooks:
            - name: acceptance-test
              type: pre-rollout
              url: http://flagger-loadtester.test/
              timeout: 30s
              metadata:
                type: bash
                cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
            - name: load-test
              url: http://flagger-loadtester.test/
              timeout: 5s
              metadata:
                cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"
      • stepWeight: the percentage by which the traffic routed to the canary is increased at each step. In this example, the value is 10.

      • max: the maximum allowed P99 request latency, in milliseconds. If the P99 latency exceeds this value, the metric check fails. In this example, the value is 500.

      • interval: the time window over which the P99 latency is evaluated. In this example, the value is 30s.

      With stepWeight set to 10 and maxWeight set to 50, a healthy rollout advances the canary weight in five steps (10%, 20%, 30%, 40%, and 50%), one per one-minute analysis interval, before the new version is promoted.

    2. Run the following command to deploy the canary resource:

      kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> apply -f resources_canary/podinfo-canary.yaml
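
      After the canary resource is deployed, Flagger creates the podinfo-primary deployment described in the process above, and the canary reports the Initialized status. A minimal check, not part of the original procedure:

        kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test get canary/podinfo deploy/podinfo-primary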
  8. Run the following command to update the podinfo application from V3.1.0 to V3.1.1:

    kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1
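
    In addition to the describe loop in the next section, the canary resource reports its status and current traffic weight in the output of kubectl get, which gives a compact view of the rollout. A minimal sketch:

      # The WEIGHT column shows the traffic percentage currently routed to the canary.
      watch kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test get canary/podinfo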

Verify whether the canary release is implemented as expected

Run the following command to view the process of progressive traffic shifting:

while true; do kubectl --kubeconfig <Path of the kubeconfig file of the ACK cluster> -n test describe canary/podinfo; sleep 10s;done

Expected output:

Events:
  Type     Reason  Age                From     Message
  ----     ------  ----               ----     -------
  Warning  Synced  39m                flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  38m (x2 over 39m)  flagger  all the metrics providers are available!
  Normal   Synced  38m                flagger  Initialization done! podinfo.test
  Normal   Synced  37m                flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  36m                flagger  Starting canary analysis for podinfo.test
  Normal   Synced  36m                flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  36m                flagger  Advance podinfo.test canary weight 10
  Normal   Synced  35m                flagger  Advance podinfo.test canary weight 20
  Normal   Synced  34m                flagger  Advance podinfo.test canary weight 30
  Normal   Synced  33m                flagger  Advance podinfo.test canary weight 40
  Normal   Synced  29m (x4 over 32m)  flagger  (combined from similar events): Promotion completed! Scaling down podinfo.test

The output indicates that the traffic routed to the podinfo application V3.1.1 is progressively increased in 10% steps from 10% to 40%, after which the promotion is completed and the canary deployment is scaled down.