
Alibaba Cloud Service Mesh:Implement a canary release with Mixerless Telemetry

Last Updated:Mar 11, 2026

Canary releases let you roll out application updates gradually by shifting a small percentage of production traffic to the new version and increasing it only when metrics stay healthy. Service Mesh (ASM) uses Mixerless Telemetry to collect request metrics directly from Envoy sidecars, without the legacy Mixer component. This reduces latency and resource overhead compared to Mixer-based telemetry.

This tutorial sets up an automated canary release pipeline that combines three components:

  • Prometheus collects request success rates and latency metrics from Envoy sidecars.

  • Flagger watches those metrics and progressively shifts traffic to the new version.

  • Horizontal Pod Autoscaler (HPA) scales pods based on load during the rollout.

How it works

Flagger automates the canary release lifecycle in five stages:

  1. Detect -- A new revision is detected (for example, an image tag change).

  2. Scale up -- The canary deployment scales up and pre-rollout checks run.

  3. Analyze -- Flagger queries Prometheus at each interval for request success rate and P99 latency. If metrics meet the thresholds, canary traffic increases by a fixed step.

  4. Promote -- Once canary traffic reaches the configured maximum, Flagger copies the canary spec to the primary deployment and routes all traffic to it.

  5. Scale down -- The canary deployment scales to zero and the rollout is marked as succeeded.

If metrics fail threshold checks more than the configured number of times, Flagger routes all traffic back to the primary and marks the rollout as failed. See Automated rollback for details.

Time estimates based on this tutorial's configuration:

| Scenario | Formula | Duration |
| --- | --- | --- |
| Successful promotion | interval * (maxWeight / stepWeight) = 1 min * (50 / 10) | ~5 minutes |
| Rollback on failure | interval * threshold = 1 min * 5 | ~5 minutes |
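These estimates follow directly from the analysis settings. If you change interval, maxWeight, stepWeight, or threshold, you can sanity-check the new timings with shell arithmetic (a sketch using the values from this tutorial):

```shell
# Analysis settings from this tutorial's canary resource
interval_min=1   # interval: 1m
max_weight=50    # maxWeight
step_weight=10   # stepWeight
threshold=5      # failed checks before rollback

# Best case: one interval per traffic step up to maxWeight
promotion=$(( interval_min * (max_weight / step_weight) ))
# Worst case: one interval per failed check until the threshold is reached
rollback=$(( interval_min * threshold ))

echo "promotion ~${promotion} min, rollback ~${rollback} min"
```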

Prerequisites

Before you begin, ensure that you have:

  • An ASM instance with Mixerless Telemetry enabled and connected to Prometheus. For setup instructions, see Use Mixerless Telemetry to observe ASM instances.

  • A Container Service for Kubernetes (ACK) cluster connected to the ASM instance

  • kubectl configured for both the ACK cluster and the ASM control plane

  • Helm 3 installed

Step 1: Deploy Flagger

Connect to your ACK cluster and install Flagger with Helm.

  1. Connect to the ACK cluster with kubectl. For instructions, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

  2. Create a Kubernetes secret that stores the ASM kubeconfig so Flagger can manage Istio resources on the ASM control plane:

       # Set aliases for convenience; $USER_CONFIG is the kubeconfig file of
       # the ACK cluster, $MESH_CONFIG is the kubeconfig file of the ASM instance
       alias k="kubectl --kubeconfig $USER_CONFIG"
       alias h="helm --kubeconfig $USER_CONFIG"
    
       # Create a secret from the ASM kubeconfig
       cp $MESH_CONFIG kubeconfig
       k -n istio-system create secret generic istio-kubeconfig --from-file kubeconfig
       k -n istio-system label secret istio-kubeconfig istio/multiCluster=true
  3. Install Flagger from the official Helm chart:

       h repo add flagger https://flagger.app
       h repo update
       # $FLAGGER_SRC points to a local clone of the Flagger repository
       k apply -f $FLAGGER_SRC/artifacts/flagger/crd.yaml
       h upgrade -i flagger flagger/flagger --namespace=istio-system \
           --set crd.create=false \
           --set meshProvider=istio \
           --set metricsServer=http://prometheus:9090 \
           --set istio.kubeconfig.secretName=istio-kubeconfig \
           --set istio.kubeconfig.key=kubeconfig
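If you prefer a values file over repeated --set flags, the same installation can be expressed as follows (an equivalent restatement of the flags above, saved as values.yaml):

```yaml
crd:
  create: false
meshProvider: istio
metricsServer: http://prometheus:9090
istio:
  kubeconfig:
    secretName: istio-kubeconfig
    key: kubeconfig
```

Then install with `h upgrade -i flagger flagger/flagger --namespace=istio-system -f values.yaml`.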

Step 2: Deploy an Istio gateway

Create a gateway on the ASM control plane to expose the application to external traffic.

  1. Connect to the ASM instance with kubectl. For instructions, see Use kubectl on the control plane to access Istio resources.

  2. Save the following YAML as public-gateway.yaml:

       apiVersion: networking.istio.io/v1alpha3
       kind: Gateway
       metadata:
         name: public-gateway
         namespace: istio-system
       spec:
         selector:
           istio: ingressgateway
         servers:
           - port:
               number: 80
               name: http
               protocol: HTTP
             hosts:
               - "*"
  3. Apply the gateway:

       kubectl --kubeconfig <asm-kubeconfig-path> apply -f public-gateway.yaml

    Replace <asm-kubeconfig-path> with the path to the kubeconfig file of the ASM instance.

Step 3: Deploy the sample application

Deploy the podinfo application, an HPA, and a Flagger load tester in the ACK cluster.

  1. Deploy the Flagger load tester, which generates synthetic traffic against the canary during analysis:

       kubectl --kubeconfig <ack-kubeconfig-path> apply -k \
           "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"
  2. Deploy the podinfo application (V3.1.0) and an HPA:

       kubectl --kubeconfig <ack-kubeconfig-path> apply -k \
           "https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main"

     The HPA scales out pods when average CPU utilization exceeds 99%.

Replace <ack-kubeconfig-path> with the path to the kubeconfig file of the ACK cluster.
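The kustomize base already includes the HPA, so no extra manifest is needed. For reference, the HPA it creates looks roughly like the following (a sketch; replica counts and the API version may differ between Flagger releases):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # scale out when average CPU utilization exceeds 99%
        averageUtilization: 99
```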

Step 4: Configure the canary resource

The canary resource tells Flagger how to manage the release: which deployment to watch, what metrics to check, and how much traffic to shift at each step.

Note

For a full reference on canary resource fields, see How it works in the Flagger documentation.

  1. Save the following YAML as podinfo-canary.yaml:


       apiVersion: flagger.app/v1beta1
       kind: Canary
       metadata:
         name: podinfo
         namespace: test
       spec:
         # Deployment to watch for new revisions
         targetRef:
           apiVersion: apps/v1
           kind: Deployment
           name: podinfo
         # Max time (seconds) for the canary to make progress before rollback (default: 600)
         progressDeadlineSeconds: 60
         # HPA reference (optional)
         autoscalerRef:
           apiVersion: autoscaling/v2beta2
           kind: HorizontalPodAutoscaler
           name: podinfo
         service:
           port: 9898
           targetPort: 9898
           gateways:
           - public-gateway.istio-system.svc.cluster.local
           hosts:
           - '*'
           trafficPolicy:
             tls:
               # Set to ISTIO_MUTUAL when mTLS is enabled
               mode: DISABLE
           retries:
             attempts: 3
             perTryTimeout: 1s
             retryOn: "gateway-error,connect-failure,refused-stream"
         analysis:
           # How often Flagger checks metrics (default: 60s)
           interval: 1m
           # Max failed metric checks before rollback
           threshold: 5
           # Max traffic percentage routed to the canary (0-100)
           maxWeight: 50
           # Traffic increment per analysis cycle (0-100)
           stepWeight: 10
           metrics:
           - name: request-success-rate
             # Minimum success rate (non-5xx responses, 0-100)
             thresholdRange:
               min: 99
             interval: 1m
           - name: request-duration
             # Maximum P99 latency in milliseconds
             thresholdRange:
               max: 500
             interval: 30s
           webhooks:
             - name: acceptance-test
               type: pre-rollout
               url: http://flagger-loadtester.test/
               timeout: 30s
               metadata:
                 type: bash
                 cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
             - name: load-test
               url: http://flagger-loadtester.test/
               timeout: 5s
               metadata:
                 cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"

    Key fields in the analysis section:

     | Field | Value | Effect |
     | --- | --- | --- |
     | stepWeight | 10 | Increase canary traffic by 10% at each step |
     | maxWeight | 50 | Stop increasing at 50% and promote if metrics pass |
     | threshold | 5 | Roll back after 5 consecutive failed checks |
     | request-duration.max | 500 | P99 latency must stay below 500 ms |
     | request-success-rate.min | 99 | At least 99% of requests must return non-5xx |
  2. Apply the canary resource:

       kubectl --kubeconfig <ack-kubeconfig-path> apply -f podinfo-canary.yaml

    After you apply the canary resource, Flagger creates a podinfo-primary deployment as the stable production version and scales the original podinfo deployment to zero. Flagger scales it up again only when it detects a new revision.
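Behind the scenes, Flagger also manages an Istio VirtualService on the ASM control plane that splits traffic between the primary and canary services. At the 10% step it looks roughly like this (a sketch following Flagger's naming conventions, not output copied from a cluster):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: podinfo
  namespace: test
spec:
  gateways:
  - public-gateway.istio-system.svc.cluster.local
  hosts:
  - "*"
  http:
  - route:
    - destination:
        host: podinfo-primary
      weight: 90
    - destination:
        host: podinfo-canary
      weight: 10
```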

Step 5: Trigger the canary release

Update the podinfo container image to trigger a new canary rollout:

kubectl --kubeconfig <ack-kubeconfig-path> -n test \
    set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1

Flagger detects the image change, scales up the canary, and runs the pre-rollout acceptance test. It then begins progressive traffic shifting: 10% -> 20% -> 30% -> 40% -> promotion.
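The traffic-shifting progression can be sketched as a loop (an illustration of the weight schedule, not Flagger's actual code; assumes stepWeight divides maxWeight evenly):

```shell
step_weight=10   # stepWeight from the canary resource
max_weight=50    # maxWeight from the canary resource
weight=0

# Advance in fixed steps; promote once the next step would reach maxWeight
while [ $((weight + step_weight)) -lt "$max_weight" ]; do
  weight=$((weight + step_weight))
  echo "Advance canary weight ${weight}"
done
echo "Promotion: copy canary spec to primary and route all traffic to it"
```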

Verify the canary release

Watch the rollout progress

Run a polling loop to watch Flagger events in real time:

while true; do
  kubectl --kubeconfig <ack-kubeconfig-path> -n test describe canary/podinfo
  sleep 10s
done

A successful rollout produces events similar to:

Events:
  Type     Reason  Age                From     Message
  ----     ------  ----               ----     -------
  Warning  Synced  39m                flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  38m (x2 over 39m)  flagger  all the metrics providers are available!
  Normal   Synced  38m                flagger  Initialization done! podinfo.test
  Normal   Synced  37m                flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  36m                flagger  Starting canary analysis for podinfo.test
  Normal   Synced  36m                flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  36m                flagger  Advance podinfo.test canary weight 10
  Normal   Synced  35m                flagger  Advance podinfo.test canary weight 20
  Normal   Synced  34m                flagger  Advance podinfo.test canary weight 30
  Normal   Synced  33m                flagger  Advance podinfo.test canary weight 40
  Normal   Synced  29m (x4 over 32m)  flagger  (combined from similar events): Promotion completed! Scaling down podinfo.test

Check canary status

Get a summary of all canary resources:

kubectl --kubeconfig <ack-kubeconfig-path> get canaries --all-namespaces

Example output:

NAMESPACE   NAME      STATUS        WEIGHT   LASTTRANSITIONTIME
test        podinfo   Succeeded     0        2026-03-11T08:15:07Z

Wait for a canary to complete in a CI/CD pipeline:

kubectl --kubeconfig <ack-kubeconfig-path> -n test wait canary/podinfo --for=condition=promoted

Automated rollback

If metrics fail threshold checks during the analysis phase, Flagger automatically rolls back. It routes all traffic to the primary deployment, scales the canary to zero, and marks the rollout as failed.
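The rollback decision can be sketched as a counter over failed checks (an illustration of the logic, not Flagger's implementation; the per-interval success rates below are made up):

```shell
threshold=5          # max failed checks before rollback
min_success_rate=99  # request-success-rate.min from the canary resource
failed=0

# Simulated per-interval success rates reported by Prometheus
for rate in 98.2 96.5 88.76 92.1 95.0; do
  # awk handles the floating-point comparison
  if [ "$(awk -v r="$rate" -v m="$min_success_rate" 'BEGIN { print (r < m) }')" -eq 1 ]; then
    failed=$((failed + 1))
    echo "Halt advancement: success rate ${rate}% < ${min_success_rate}%"
  fi
done

if [ "$failed" -ge "$threshold" ]; then
  echo "Rolling back: failed checks threshold reached ${failed}"
fi
```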

To test rollback behavior, generate HTTP 500 errors while a canary analysis is active. Run the following from a pod inside the cluster (for example, by using kubectl exec in the flagger-loadtester pod), because the podinfo-canary service name resolves only inside the cluster:

# In a separate terminal, send requests that return HTTP 500
watch curl http://podinfo-canary.test:9898/status/500

When the number of failed checks reaches the threshold value (5 in this configuration), the canary events show:

Warning  Synced  flagger  Halt podinfo.test advancement success rate 88.76% < 99%
Warning  Synced  flagger  Rolling back podinfo.test failed checks threshold reached 5
Warning  Synced  flagger  Canary failed! Scaling down podinfo.test
