ACK One integrates Argo CD GitOps with Argo Rollouts to automate canary releases triggered by Git commits. This tutorial walks you through deploying the required components, setting up a GitOps-managed application, and running a canary release — either with manual promotion or automated promotion based on Prometheus metrics.
Prerequisites
Before you begin, make sure you have:
- Fleet management enabled. See Enable multi-cluster management.
- An ACK cluster created and associated with an ACK One Fleet instance. See Create an ACK managed cluster and Associate clusters with a Fleet instance.
- The kubeconfig file of the Fleet instance, with kubectl connected to the Fleet instance. Download it from the ACK One console.
- Alibaba Cloud CLI installed and configured. See Install Alibaba Cloud CLI and Configure Alibaba Cloud CLI.
- The latest Argo Rollouts kubectl plug-in installed. See Controller Installation.
If you want to use GitHub repositories, avoid creating your ACK cluster in a region in the Chinese mainland. If your cluster is already in the Chinese mainland, use a GitHub service provider. This tutorial uses a Fleet instance and an associated ACK cluster deployed in the China (Hong Kong) region.
Key concepts
GitOps is a framework that uses Git repositories as the single source of truth to manage application configuration and drive continuous deployment. For more information, see GitOps overview.
Argo Rollouts is a Kubernetes controller that provides advanced deployment strategies, including blue-green deployment, canary releases, and progressive delivery. For more information, see Argo Rollouts documentation.
Canary release is a deployment strategy that gradually shifts traffic to a new application version, starting with a small subset of users. Because traffic is controlled via the Ingress controller, you can verify the new version in production and roll back instantly by redirecting traffic — without affecting all users.
How it works
Any change to spec.template in a Rollout resource — typically an image tag update committed to Git — triggers a new canary analysis. Argo CD detects the commit, syncs the updated manifest to the cluster, and Argo Rollouts starts shifting traffic according to the steps defined in the Rollout spec.
During a canary release:
- The canary service routes traffic to the new version.
- The stable service routes traffic to the current version.
- The NGINX Ingress controller splits traffic based on the weight defined at each step.
Only changes to spec.template trigger a new canary analysis. Changes to labels, annotations, or other metadata outside spec.template do not start a rollout.
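To make the weight-based split concrete, the following Python sketch estimates how many requests reach each service at a given canary weight. It only illustrates the arithmetic: the function name split_requests is hypothetical, and because the NGINX Ingress controller selects the canary for a random share of requests, real counts fluctuate around these expected values.

```python
def split_requests(total_requests: int, canary_weight: int) -> dict:
    """Return expected request counts for a canary weight given in percent."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    canary = round(total_requests * canary_weight / 100)
    return {"canary": canary, "stable": total_requests - canary}


# Weights matching the setWeight steps used in this tutorial.
for weight in (20, 40, 60, 80, 100):
    print(weight, split_requests(1000, weight))
```

At a weight of 20, roughly 200 of every 1,000 requests are expected to reach the canary service and the remaining 800 the stable service.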
Step 2: Deploy the ack-arms-prometheus add-on in the ACK cluster
Managed Service for Prometheus (the ack-arms-prometheus add-on) collects Ingress metrics used for automated canary promotion in Step 4.
- Log on to the ACK console. In the left-side navigation pane, click Clusters.
- On the Clusters page, find your cluster and click its name. In the left-side pane, choose Operations > Add-ons.
- On the Add-ons page, click the Logs and Monitoring tab and find ack-arms-prometheus.
  - If Installed is displayed, the add-on is already active.
  - If Install is displayed, click Install.
Step 4: Perform a canary release
Trigger a canary release by updating the container image tag in rollout.yaml and pushing the change to Git. Argo CD detects the commit and Argo Rollouts starts shifting traffic.
Choose one of the following promotion methods:
- Manual promotion: review the canary yourself before advancing each traffic step.
- Automated promotion with Prometheus metrics: Argo Rollouts advances the canary automatically when the success rate threshold is met.
Option 1: Manual promotion
This approach pauses the canary after the first traffic step so you can verify the new version before proceeding.
Traffic steps: 20% → pause (indefinite) → 40% (5 min) → 60% (5 min) → 80% (5 min) → 100%
With three timed steps of 5 minutes each, promotion after the first manual approval takes approximately 15 minutes.
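The 15-minute estimate can be reproduced from the step list itself. The sketch below is a minimal illustration, assuming durations are written either as an integer number of seconds or with an s, m, or h suffix; the helper names pause_seconds and timed_seconds are hypothetical and not part of Argo Rollouts.

```python
import re

# Seconds per unit suffix handled by this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def pause_seconds(duration) -> int:
    """Convert a pause duration such as '5m', '10s', or 30 into seconds."""
    if isinstance(duration, int):
        return duration  # a bare integer is treated as seconds
    match = re.fullmatch(r"(\d+)([smh])", str(duration))
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def timed_seconds(steps) -> int:
    """Sum all timed pauses; indefinite pauses ({}) contribute nothing."""
    total = 0
    for step in steps:
        pause = step.get("pause")
        if pause and "duration" in pause:
            total += pause_seconds(pause["duration"])
    return total


# The manual-promotion steps from this option, written as Python dicts.
manual_steps = [
    {"setWeight": 20}, {"pause": {}},
    {"setWeight": 40}, {"pause": {"duration": "5m"}},
    {"setWeight": 60}, {"pause": {"duration": "5m"}},
    {"setWeight": 80}, {"pause": {"duration": "5m"}},
]

print(timed_seconds(manual_steps) // 60)  # minutes spent in timed pauses
```

The indefinite pause is excluded on purpose, because its length depends on how long you take to review the canary.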
- Update rollout.yaml with the new image tag and a manual pause after the first step, then commit and push:
  apiVersion: argoproj.io/v1alpha1
  kind: Rollout
  metadata:
    name: rollouts-demo
  spec:
    replicas: 4
    strategy:
      canary:
        canaryService: rollouts-demo-canary
        stableService: rollouts-demo-stable
        trafficRouting:
          nginx:
            stableIngress: rollouts-demo-stable
        steps:
          - setWeight: 20
          - pause: {} # Indefinite pause; advance manually by replacing {} with a duration
          - setWeight: 40
          - pause: {duration: 5m}
          - setWeight: 60
          - pause: {duration: 5m}
          - setWeight: 80
          - pause: {duration: 5m}
    revisionHistoryLimit: 2
    selector:
      matchLabels:
        app: rollouts-demo
    template:
      metadata:
        labels:
          app: rollouts-demo
      spec:
        containers:
          - name: rollouts-demo
            image: argoproj/rollouts-demo:yellow # New image tag
            ports:
              - name: http
                containerPort: 8080
                protocol: TCP
            resources:
              requests:
                memory: 32Mi
                cpu: 5m
- Watch the rollout pause at 20% traffic:
  kubectl argo rollouts get rollout rollouts-demo --watch
  Expected output: The rollout stops at the first pause: {} step because no duration is set. It advances only after you resume it.
- Resume the canary release by updating the pause duration in rollout.yaml, then commit and push:
  steps:
    - setWeight: 20
    - pause: {duration: 10s} # Replace {} with a duration to resume
  Then watch the release complete:
  kubectl argo rollouts get rollout rollouts-demo --watch
  Expected output during promotion:

Expected output after completion:

Option 2: Automated promotion with Prometheus metrics
This approach uses Managed Service for Prometheus to continuously evaluate the canary's HTTP success rate. If the success rate stays at or above 95% throughout the analysis window, the canary is promoted automatically. If more than 10 checks fall below the threshold (failureLimit: 10), the release is aborted and rolled back automatically.
Traffic steps: 20% (5 min) → 40% (5 min, analysis starts here) → 60% (5 min) → 80% (5 min) → 100%
Total promotion time: approximately 20 minutes.
Step 4a: Configure the Rollout with metric analysis
Update rollout.yaml with the new image tag and the analysis configuration, then commit and push:
# Fields under spec: in rollout.yaml (apiVersion, kind, and metadata as in Option 1)
strategy:
  canary:
    analysis:
      templates:
        - templateName: success-rate
      startingStep: 2 # Start analysis at the 40% step, after initial traffic stabilizes
      args:
        - name: service-name
          value: rollouts-demo-stable
    canaryService: rollouts-demo-canary
    stableService: rollouts-demo-stable
    trafficRouting:
      nginx:
        stableIngress: rollouts-demo-stable
    steps:
      - setWeight: 20
      - pause: {duration: 5m}
      - setWeight: 40
      - pause: {duration: 5m}
      - setWeight: 60
      - pause: {duration: 5m}
      - setWeight: 80
      - pause: {duration: 5m}
revisionHistoryLimit: 2
selector:
  matchLabels:
    app: rollouts-demo
template:
  metadata:
    labels:
      app: rollouts-demo
  spec:
    containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue # New image tag
Step 4b: Get the Managed Service for Prometheus endpoint
Managed Service for Prometheus is exposed as a Kubernetes Service at:
http://{ServiceName}.{Namespace}.svc.{ClusterDomain}:{ServicePort}
For the ack-arms-prometheus add-on deployed in the arms-prom namespace with the default cluster domain, the endpoint is:
http://arms-prom-server.arms-prom.svc.cluster.local:9090
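As a small illustration of how the template above expands, the following Python sketch fills in the placeholders. The function name service_endpoint is hypothetical, and the default cluster domain cluster.local is an assumption that holds only if your cluster has not been configured with a custom domain.

```python
def service_endpoint(service: str, namespace: str, port: int,
                     cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster URL of a Kubernetes Service."""
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}"


# Values for the ack-arms-prometheus add-on as described in this step.
print(service_endpoint("arms-prom-server", "arms-prom", 9090))
# prints http://arms-prom-server.arms-prom.svc.cluster.local:9090
```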
Step 4c: Create the AnalysisTemplate
Create analysis.yaml with the following content. The successCondition marks a measurement as successful if the ratio of 1xx and 2xx responses to all canary requests is 95% or higher over a 5-minute window.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 5m
      successCondition: result[0] >= 0.95 # A measurement passes if 95%+ of requests succeed
      failureLimit: 10 # Abort once more than 10 measurements fail
      provider:
        prometheus:
          address: http://arms-prom-server.arms-prom.svc.cluster.local:9090
          query: |
            sum(irate(nginx_ingress_controller_requests{status=~"(1|2).*",canary!="",service="{{args.service-name}}"}[5m]))
            /
            sum(irate(nginx_ingress_controller_requests{canary!="",service="{{args.service-name}}"}[5m]))
The PromQL query divides the rate of 1xx/2xx canary requests by the total canary request rate. The canary!="" label selector filters for traffic routed through the canary Ingress annotation — this ensures only canary traffic is evaluated, not stable traffic. The service label scopes the query to the specific application.
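The same check can be approximated by hand, which may help when debugging a stalled or aborted rollout. The Python sketch below rebuilds the query for a given service name and applies the 0.95 comparison to an instant-query response. It assumes the endpoint serves the standard Prometheus HTTP API path /api/v1/query. The helper names (build_query, success_rate, passes, query_prometheus) and the sample response are illustrative and not part of Argo Rollouts or the add-on.

```python
import json
import urllib.parse
import urllib.request

# Same expression as in analysis.yaml; doubled braces escape str.format.
QUERY_TEMPLATE = (
    'sum(irate(nginx_ingress_controller_requests{{status=~"(1|2).*",'
    'canary!="",service="{service}"}}[5m]))'
    " / "
    'sum(irate(nginx_ingress_controller_requests{{canary!="",'
    'service="{service}"}}[5m]))'
)


def build_query(service: str) -> str:
    """Substitute the service name, as {{args.service-name}} does."""
    return QUERY_TEMPLATE.format(service=service)


def success_rate(response: dict) -> float:
    """Extract the first sample value from an instant-query response."""
    result = response["data"]["result"]
    if not result:
        raise ValueError("query returned no samples; is traffic flowing?")
    return float(result[0]["value"][1])


def passes(rate: float, threshold: float = 0.95) -> bool:
    """Mirror the successCondition: result[0] >= 0.95."""
    return rate >= threshold


def query_prometheus(address: str, query: str) -> dict:
    """Run an instant query (assumes the standard /api/v1/query path)."""
    url = f"{address}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Hypothetical response body, shaped like a Prometheus instant-query result.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {}, "value": [1700000000, "0.97"]}],
    },
}

print(build_query("rollouts-demo-stable"))
print(passes(success_rate(sample)))
```

The address from Step 4b resolves only inside the cluster, so query_prometheus would have to run from a Pod or through a port-forward; it is defined here but not called.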
Step 4d: Generate continuous traffic for metric collection
Prometheus needs a steady request stream to evaluate the success rate. Run the following commands in a separate terminal.
- Get the Ingress external IP:
  kubectl get ingress
  Expected output:
  NAME                                        CLASS   HOSTS                 ADDRESS       PORTS   AGE
  rollouts-demo-rollouts-demo-stable-canary   nginx   rollouts-demo.local   8.217.XX.XX   80      9h
  rollouts-demo-stable                        nginx   rollouts-demo.local   8.217.XX.XX   80      9h
- Add the host mapping to your local hosts file:
  8.217.XX.XX rollouts-demo.local
- Send continuous requests to the application (about five requests per second):
  while true; do curl -s "http://rollouts-demo.local/" | grep -o "<title>.*</title>"; sleep 0.2; done
Step 4e: Watch the automated rollout
kubectl argo rollouts get rollout rollouts-demo --watch
Expected output:
To view the success rate metrics in the Alibaba Cloud console:
- Log on to the ACK console. In the left-side navigation pane, click Clusters.
- Click the cluster name. In the left-side pane, choose Operations > Prometheus Monitoring.
- On the Prometheus Monitoring page, click the Network Monitoring tab, then click Ingresses.

After the canary release completes successfully:
Step 5 (optional): Roll back a canary release
If the new version causes issues during the canary release, revert the image tag in rollout.yaml to a known-stable version and push the change to Git. Argo CD syncs the change, and Argo Rollouts shifts all traffic back to the stable version.
What a failing canary looks like
When automated promotion is enabled, a failing canary produces output similar to the following before rollback completes:
Name:            rollouts-demo
Namespace:       default
Status:          ✖ Degraded
Message:         RolloutAborted: Rollout aborted update to revision 2: Metric "success-rate" assessed Failed due to failed (11) > failureLimit (10)
Strategy:        Canary
  Step:          4/8
  SetWeight:     40
  ActualWeight:  40
Images:          argoproj/rollouts-demo:yellow (stable)
                 argoproj/rollouts-demo:blue (canary, error)
The rollout is aborted once the number of metric evaluations that fall below the 95% threshold exceeds failureLimit (10). The failed evaluations do not need to be consecutive. Argo Rollouts then shifts all traffic back to the stable version automatically.
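The abort rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions: every measurement either passes or fails against the threshold, and error or inconclusive results are ignored. The function name aborts is hypothetical.

```python
def aborts(measurements, threshold: float = 0.95, failure_limit: int = 10) -> bool:
    """Return True once failed measurements exceed failure_limit."""
    failed = 0
    for value in measurements:
        if value < threshold:
            failed += 1  # failures accumulate; they need not be consecutive
        if failed > failure_limit:
            return True
    return False


print(aborts([0.50] * 10))          # exactly 10 failures: still within the limit
print(aborts([0.50] * 11))          # the 11th failure exceeds failureLimit
print(aborts([0.50, 0.99] * 11))    # 11 non-consecutive failures also abort
```

Under this model, a failureLimit of 10 tolerates ten failed measurements and aborts on the eleventh, regardless of how many successful measurements occur in between.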
How to roll back manually
Update rollout.yaml with the stable image tag, then commit and push:
spec:
  containers:
    - name: rollouts-demo
      image: argoproj/rollouts-demo:yellow # Revert to the stable image tag
Expected output after rollback:
