ACK One Fleet Multi-Cluster Canary Release: A "Safety Valve" for AI Inference Services

This article introduces ACK One Fleet's multi-cluster canary release solution, integrated with Kruise Rollout, for safe AI inference deployments across hybrid and geo-distributed clouds.

By Caijing and Mingshan Zhao

Deploying Large Language Models (LLMs) that often exceed tens of gigabytes is high-stakes work. An unregulated release can lead to service disruptions and massive resource waste. The ACK One Fleet multi-cluster canary release capability was designed to avoid this problem.

In the era of large models, AI inference has become a core business pillar for many enterprises. However, model iterations come with significant risks: Will the new model meet performance benchmarks? Will resource consumption spike? In complex geo-distributed and hybrid cloud environments, ensuring a safe and smooth version transition is a primary challenge for engineers.

By combining the ACK One Fleet management capability with Kruise Rollout, Alibaba Cloud provides a cross-cluster canary release solution that makes every update controllable, observable, and reversible.

Why Is Multi-Cluster Canary Release "Mandatory" for AI Inference?

For AI inference services—especially LLMs—canary releases have shifted from an optional best practice to a technical requirement due to several unique attributes:

1. Extreme sensitivity to stability

• AI models suffer from slow loading times and high cold-start costs.

• A faulty deployment can cause queries per second (QPS) to drop sharply and response time (RT) to spike.

• Rollbacks are complex due to intricate dependencies between model versions and inference engine compatibility.

2. Multi-cluster as the standard

• Due to resource fragmentation, supply shortages, and data compliance, inference services are typically deployed across geo-distributed or hybrid cloud clusters.

3. Traditional deployment risks at scale

• Manually executing kubectl apply across dozens of clusters is prone to human error.

• Disparate scripts and approval workflows for each cluster make unified observability impossible.

• During an incident, it is hard to pinpoint which cluster or version is at fault.

ACK One Fleet + Kruise Rollout solves these challenges by providing a unified management layer.

ACK One Fleet: The Intelligent Orchestrator for AI Workloads

ACK One Fleet[1] is Alibaba Cloud’s enterprise-grade multi-cluster management solution. It provides end-to-end management and intelligent scheduling strategies for AI workloads, aimed at accelerating GPU provisioning and maximizing resource utilization.

1. Geo-distributed model distribution: ModelDistribution

2. Intelligent multi-cluster scheduling and distribution:

Dynamic resource scheduling: Routing workloads to clusters with sufficient capacity.
Inventory-aware scheduling: Instant provisioning of GPU power via elastic node pools.
Priority-based scheduling: Assigning cluster-level priorities to match business logic.
Multi-cluster preemption: Ensuring high-priority services remain operational.
Partial replica scheduling: Using otherwise idle capacity to further improve resource efficiency.

3. Multi-cluster HPA: Automatically scaling inference services based on global metrics to ensure stability during traffic surges.

4. Multi-cluster canary release: Providing a secure, stable, and convenient path for version updates.

5. More capabilities continue to evolve...

Kruise Rollout

Kruise Rollout[2] is an open-source progressive delivery framework from the OpenKruise community. As a bypass component, it adds progressive delivery capabilities to Kubernetes workloads for smoother, more controlled deployments. It supports several delivery modes, including canary, blue-green, multi-batch, and A/B testing, and it is compatible with the Gateway API and various Ingress implementations, so it integrates into existing infrastructure.

Key features:

• Flexible release strategies

Multi-batch update policy for Deployment, CloneSet, StatefulSet, Advanced StatefulSet, DaemonSet, and Advanced DaemonSet.
Canary release policy for Deployment.
Blue-green release policy for Deployment and CloneSet.

• Comprehensive traffic routing policies

Fine-grained, weighted traffic shifting when updating workloads.
A/B testing based on HTTP headers and cookies.
End-to-end traffic canary.

• Broad protocol support

Seamlessly integrates with NGINX, ALB, and Higress Ingress controllers.
Service mesh integration via the Gateway API.
Pluggable Lua scripts for easy extension to other Kubernetes traffic protocols, including CRDs.

ACK One Fleet: Multi-cluster canary release solution

While multi-cluster releases can be handled by cluster or by workload, this solution focuses on the latter.

• By cluster: Suited for services replicated across multiple clusters.

• By workload: Suited for services with split-replica scheduling, such as AI inference workloads deployed across multiple clusters.

ACK One Fleet implements multi-cluster canary release by integrating with Kruise Rollout, following the principle of "Centralized Fleet Orchestration, Autonomous Sub-cluster Control":

1. Unified orchestration via ACK One Fleet:

Intelligent scheduling: Define your AI inference services and PropagationPolicy in the fleet. Once the Global Scheduler completes the dispatch, the services are deployed to multiple sub-clusters based on the scheduling results. Combined with multi-cluster HPA, any scaling of replicas is reflected in the Fleet-level service. After these changes are dispatched to sub-clusters, the scaling process strictly follows the defined Rollout strategy.
Policy distribution: Create a Rollout policy in the fleet to define the release rhythm in batches. The fleet automatically propagates this policy to all target sub-clusters.

2. Autonomous control via local Kruise Rollout: The Kruise Rollout controller deployed in each sub-cluster takes over the local release process. Based on the propagated Rollout policy, it automatically manages the update pace of pods within that cluster—for example, updating 10% of replicas first, then pausing for observation.

3. Unified approval with kubectl amc[3]:

When release processes in multiple sub-clusters are paused for manual intervention, you no longer need to log in to each cluster individually. Running the kubectl amc rollout approve command at the fleet level approves all sub-clusters at once, so they proceed to the next batch together. Individual cluster approval is also supported. This approach eliminates the complexity and potential errors of repeatedly switching between cluster kubeconfig contexts.

Use ack-kruise version 1.8.3 or later for ACK clusters and Kruise Rollout version 0.6.2 or later for on-premises clusters.
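
Because each sub-cluster's Kruise Rollout controller applies the batch percentages to its own local replica count, the same policy produces different pod counts per cluster. The following is a minimal sketch of that arithmetic, not the controller's actual code. The replica split (6 pods in Beijing, 4 in Shanghai) is hypothetical, and the round-up behavior is an assumption that should be verified against the Kruise Rollout version in use.

```shell
#!/usr/bin/env bash
# Sketch: pods running the new version after each batch, per cluster.
# Assumption: a percentage step is rounded up to a whole pod.

batch_size() {
  local replicas=$1 percent=$2
  # Integer ceiling of replicas * percent / 100
  echo $(( (replicas * percent + 99) / 100 ))
}

# Hypothetical replica split produced by the Fleet scheduler
while read -r cluster replicas; do
  for pct in 10 50 100; do
    echo "$cluster ${pct}%: $(batch_size "$replicas" "$pct") of $replicas pods on the new version"
  done
done <<'EOF'
beijing 6
shanghai 4
EOF
```

Under these assumptions, the 10% batch updates one pod in each cluster, which keeps the blast radius of a bad model version small in both regions.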

Why Choose This Solution?

• Risk isolation and security control:

Limits the blast radius of new versions to specific batches and traffic segments to avoid global failures. Manual approval is supported for extra security.

• Architectural clarity:

ACK One Fleet handles service scheduling and dispatching, while Kruise Rollout manages fine-grained local rollout processes. This separation of concerns ensures a clear and logical architecture.

• Simplified O&M:

Provides a single pane of glass for global release status via the kubectl amc plugin, so operators do not have to log in to each cluster separately.

• Multi-cluster policy consistency: Ensures that the same Rollout definition governs every cluster, preventing configuration drift.

Practical Example: Deploying a Qwen Model

Suppose you have a Qwen-based text generation service deployed across two clusters, one in Beijing and one in Shanghai. You are now upgrading the model to v2 using a batch release strategy. For simplicity, the inference service YAML and its propagation policy are not shown in this example.
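
For orientation only, here is a hedged sketch of what a split-replica propagation policy for the workload might look like. It is not the authors' manifest. The policy name and output file are hypothetical, and the Divided, Weighted, and dynamicWeight fields are assumed to follow the same Karmada-style schema as the Duplicated policy used in Step 1. Check the field names against the ACK One Fleet documentation before applying anything.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical output path for the sketch manifest
MANIFEST="${MANIFEST:-qwen-inference-pp.yaml}"

cat > "$MANIFEST" <<'EOF'
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: qwen-inference-pp        # hypothetical name
  namespace: demo
spec:
  preserveResourcesOnDeletion: false
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: qwen-inference
  placement:
    replicaScheduling:
      # Assumption: split replicas across clusters by available capacity
      replicaSchedulingType: Divided
      replicaDivisionPreference: Weighted
      weightPreference:
        dynamicWeight: AvailableReplicas
EOF

echo "Wrote $MANIFEST. Review it, then apply it to the Fleet with: kubectl apply -f $MANIFEST"
```

The script only writes the file, so nothing reaches the Fleet until the manifest has been reviewed.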

Step 1: Define and propagate the Rollout policy

Define a Rollout policy in the Fleet with three batches (10% -> 50% -> 100%) and manual approval pauses between them.

Then apply a PropagationPolicy so that ACK One Fleet distributes the Rollout to the sub-clusters.

apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
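metadata:
  name: qwen-inference-rollout
  namespace: demo  # same namespace as the PropagationPolicy below
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: qwen-inference
  strategy:
    canary:
      # Update the existing Deployment in batches instead of creating a separate canary workload
      enableExtraWorkloadForCanary: false
      steps:
      - replicas: 10%
      - replicas: 50%
      - replicas: 100%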
---
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: qwen-inference-rollout-pp
  namespace: demo
spec:
  preserveResourcesOnDeletion: false
  resourceSelectors:
  - apiVersion: rollouts.kruise.io/v1beta1
    kind: Rollout
    name: qwen-inference-rollout
  placement:
    replicaScheduling:
      replicaSchedulingType: Duplicated

Step 2: Unified approval via kubectl amc

# View the Rollout status across all sub-clusters
kubectl amc get rollouts -M

# Approve the release for a specific sub-cluster
kubectl amc rollout approve rollouts/qwen-inference-rollout -m ${clusterid}

# Approve the release for all sub-clusters
kubectl amc rollout approve rollouts/qwen-inference-rollout -M
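
If you prefer to advance one cluster at a time instead of approving all of them at once, the per-cluster form of the command above can be wrapped in a small script. The following is a sketch under stated assumptions: the cluster IDs passed as arguments, the OBSERVE_SECONDS window, and the DRY_RUN switch are all hypothetical helpers, and only the kubectl amc commands shown above are used. By default it prints the commands without running them.

```shell
#!/usr/bin/env bash
# Sketch: approve the current batch cluster by cluster, with an
# observation window between clusters.
set -eu

ROLLOUT="rollouts/qwen-inference-rollout"
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print the commands
OBSERVE_SECONDS="${OBSERVE_SECONDS:-600}"   # hypothetical observation window

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

approve_in_order() {
  local first=1 cluster_id
  for cluster_id in "$@"; do
    if [ "$first" -eq 0 ]; then
      # Watch QPS and RT on the previous cluster before moving on
      run sleep "$OBSERVE_SECONDS"
    fi
    first=0
    run kubectl amc rollout approve "$ROLLOUT" -m "$cluster_id"
    run kubectl amc get rollouts -M
  done
}

# Usage: DRY_RUN=0 ./approve.sh <beijing-cluster-id> <shanghai-cluster-id>
approve_in_order "$@"
```

The script approves each cluster individually with -m and does not finish with -M. An approve-all at the end would also advance the clusters that were already approved, moving them one batch ahead of the rest.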

Conclusion

For enterprise AI inference services, stability and continuity are paramount. ACK One Fleet combines precise Kruise Rollout control with global orchestration to offer a robust, enterprise-grade release solution for multi-region and hybrid cloud deployments. This capability is a key building block of ACK One Fleet as a platform for multi-cluster AI inference.

Questions or feedback? Join our ACK One customer support group on DingTalk (Group ID: 35688562).

You can also find us in the OpenKruise community (Group ID: 23330762).

References

[1] ACK One Fleet:
https://www.alibabacloud.com/help/en/ack/distributed-cloud-container-platform-for-kubernetes/user-guide/fleet-management-overview

[2] Kruise Rollout:
https://openkruise.io/rollouts/introduction

[3] kubectl amc:
https://www.alibabacloud.com/help/en/ack/distributed-cloud-container-platform-for-kubernetes/user-guide/use-amc
