
Container Service for Kubernetes: Build a zone-disaster recovery system

Last Updated: Mar 26, 2026

The Application Load Balancer (ALB) multi-cluster gateways of Distributed Cloud Container Platform for Kubernetes (ACK One) let you build a zone-disaster recovery system that automatically shifts traffic between clusters when a fault occurs. This topic walks you through deploying an application across two availability zones (AZs), configuring an ALB multi-cluster gateway, and verifying seamless failover.

Disaster recovery models

Cloud disaster recovery falls into three models. Choose the model that matches your recovery objectives and operational complexity.

  • Zone-disaster recovery: Protects against zone-level events such as fires, network outages, and power failures. Trade-off: low latency between data centers in the same region; simpler to implement.

  • Active geo-redundancy: Protects against region-level disasters such as floods and earthquakes. Trade-off: higher inter-data-center latency, in exchange for stronger protection.

  • Three data centers across two regions: Protects against both zone-level and region-level failures. Combines the two models above and has the highest complexity.

Zone-disaster recovery includes two sub-modes:

  • Active zone-redundancy: Both clusters serve live traffic simultaneously.

  • Primary/secondary disaster recovery: One cluster handles traffic while the other stays on standby.
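
With the multi-cluster gateway configured later in this topic, the two sub-modes differ only in the cluster weights set on the Ingress. A sketch of the alb.ingress.kubernetes.io/cluster-weight annotations (cluster IDs are placeholders, and weights must sum to 100; whether a weight of 0 is accepted by your gateway version is worth verifying):

```yaml
# Active zone-redundancy: both clusters serve live traffic (for example, 50/50).
alb.ingress.kubernetes.io/cluster-weight.${cluster1-id}: "50"
alb.ingress.kubernetes.io/cluster-weight.${cluster2-id}: "50"

# Primary/secondary: the primary takes all traffic and the secondary stands by.
# alb.ingress.kubernetes.io/cluster-weight.${cluster1-id}: "100"
# alb.ingress.kubernetes.io/cluster-weight.${cluster2-id}: "0"
```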

Architectural layers

A typical enterprise application has three layers, each requiring its own disaster recovery strategy:

  • Access layer: The entry point for ingress traffic; routes requests to the backend based on forwarding rules. ACK One coverage: multi-cluster gateways, with zone-disaster recovery supported by default.

  • Application layer: Hosts and processes application workloads. ACK One coverage: multi-cluster gateways, with support for active zone-redundancy, primary/secondary disaster recovery, and geo-redundancy.

  • Data layer: Stores and serves data to the application layer. ACK One coverage: middleware-dependent (for example, ApsaraDB RDS).

Why use multi-cluster gateways

Multi-cluster gateways offer several advantages over DNS-based traffic distribution:

  • Single IP address per region: DNS-based approaches require one load balancer IP per cluster. Multi-cluster gateways use a single load balancer IP address and deploy across multiple zones within the region for built-in high availability.

  • Layer 7 request forwarding: Multi-cluster gateways route traffic at Layer 7, enabling host- and path-based rules. DNS-based approaches do not support Layer 7 request forwarding.

  • Seamless failover: When a cluster goes down, traffic shifts immediately to healthy pods in another cluster. With DNS-based approaches, clients cache DNS query results, so switching IP addresses causes temporary service interruptions until the cached records expire.

  • Centralized management: Multi-cluster gateways are region-level resources managed from the Fleet instance. No Ingress controller installation or Ingress creation is needed in each Container Service for Kubernetes (ACK) cluster.

Architecture

This tutorial uses a web application (a Deployment and a Service) to demonstrate zone-disaster recovery across two clusters.


The setup works as follows:

  • Cluster 1 and Cluster 2 are deployed in AZ 1 and AZ 2 within the China (Hong Kong) region.

  • ACK One GitOps distributes the web application to both clusters.

  • An AlbConfig on the ACK One Fleet instance creates an ALB multi-cluster gateway that spans both clusters.

  • Ingress rules on the Fleet instance route traffic across clusters by weight. When one cluster becomes unavailable, the gateway automatically redirects traffic to the other.

  • Data synchronization between clusters uses ApsaraDB RDS and has middleware dependencies.

Prerequisites

Before you begin, ensure that:

  • An ACK One Fleet instance is created, and two ACK clusters deployed in different zones of the same VPC are associated with it.

  • You can connect to the Fleet instance by using kubectl.

Step 1: Distribute the application to multiple clusters

ACK One supports two approaches for distributing an application across clusters: GitOps and the multi-cluster application distribution feature. This tutorial uses GitOps.

  1. Log on to the ACK One console. In the left-side navigation pane, choose Fleet > Multi-cluster Applications.

  2. In the upper-left corner of the Multi-cluster Applications page, click the switch icon next to the Fleet instance name and select your Fleet instance from the drop-down list.

  3. Choose Create Multi-cluster Application > GitOps to open the Create Multi-cluster Application - GitOps page.

    If GitOps is not enabled for your Fleet instance, enable it first. See Enable GitOps for the Fleet instance. To allow internet access to GitOps, see Enable public access to Argo CD.
  4. On the Create from YAML tab, paste the following ApplicationSet template into the code editor and click OK. This template deploys web-demo to all clusters associated with your Fleet instance. Alternatively, use the Quick Create tab to select clusters individually — changes made there sync automatically to this YAML.

    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: appset-web-demo
      namespace: argocd
    spec:
      template:
        metadata:
          name: '{{.metadata.annotations.cluster_id}}-web-demo'
          namespace: argocd
        spec:
          destination:
            name: '{{.name}}'
            namespace: gateway-demo
          project: default
          source:
            repoURL: https://github.com/AliyunContainerService/gitops-demo.git
            path: manifests/helm/web-demo
            targetRevision: main
            helm:
              valueFiles:
                - values.yaml
              parameters:
                - name: envCluster
                  value: '{{.metadata.annotations.cluster_name}}'
          syncPolicy:
            automated: {}
            syncOptions:
              - CreateNamespace=true
      generators:
        - clusters:
            selector:
              matchExpressions:
                - values:
                    - cluster
                  key: argocd.argoproj.io/secret-type
                  operator: In
                - values:
                    - in-cluster
                  key: name
                  operator: NotIn
      goTemplateOptions:
        - missingkey=error
      syncPolicy:
        preserveResourcesOnDeletion: false
      goTemplate: true

For other distribution methods, see Getting started with GitOps, Create a multi-cluster application, and Get started with application distribution.

Step 2: Deploy an ALB multi-cluster gateway

Create an AlbConfig on the Fleet instance to provision an ALB gateway that spans both clusters.

  1. Get the IDs of two vSwitches in the VPC where your Fleet instance resides.
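
You can find the IDs in the VPC console; as a sketch, the Alibaba Cloud CLI can also list them (assuming the CLI is installed and configured, with ${vpc-id} as a placeholder for the Fleet instance's VPC):

```shell
# List the vSwitches in the VPC and note two VSwitchIds in different zones.
aliyun vpc DescribeVSwitches --RegionId cn-hongkong --VpcId ${vpc-id}
```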

  2. Create a file named gateway.yaml with the following content. Replace ${vsw-id1} and ${vsw-id2} with the vSwitch IDs, and replace ${cluster1} and ${cluster2} with the IDs of the associated clusters.

    Note: For ${cluster1} and ${cluster2}, configure the inbound rules of their security groups to allow access from all IP addresses and ports within the vSwitch CIDR block.

    • metadata.name (required): Name of the AlbConfig.
    • alb.ingress.kubernetes.io/remote-clusters (required): Comma-separated list of cluster IDs to associate with the ALB gateway. These clusters must already be associated with the Fleet instance.
    • spec.config.name (optional): Name of the ALB instance.
    • spec.config.addressType (optional): Network type: Internet (default) for public access, or Intranet for VPC-internal access only. Internet-facing instances require an elastic IP address (EIP) and incur EIP fees. See Pay-as-you-go.
    • spec.config.zoneMappings (required): vSwitch IDs for the ALB instance. Specify vSwitches in at least two zones for high availability. The vSwitches must be in zones supported by ALB and in the same VPC as the clusters. See Regions and zones in which ALB is available and Create and manage a vSwitch.
    • spec.listeners (optional): Listener port and protocol. This example configures HTTP on port 8001. Keep the listener configuration; without it, you must create a listener manually before ALB Ingresses can route traffic.
    apiVersion: alibabacloud.com/v1
    kind: AlbConfig
    metadata:
      name: ackone-gateway-demo
      annotations:
        # Specify the IDs of the clusters that you want to associate with the ALB instance.
        alb.ingress.kubernetes.io/remote-clusters: ${cluster1},${cluster2}
    spec:
      config:
        name: one-alb-demo
        addressType: Internet
        addressAllocatedMode: Fixed
        zoneMappings:
        - vSwitchId: ${vsw-id1}
        - vSwitchId: ${vsw-id2}
      listeners:
      - port: 8001
        protocol: HTTP
    ---
    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: alb
    spec:
      controller: ingress.k8s.alibabacloud/alb
      parameters:
        apiGroup: alibabacloud.com
        kind: AlbConfig
        name: ackone-gateway-demo
  3. Apply the configuration:

    kubectl apply -f gateway.yaml
  4. Wait 1–3 minutes, then verify the gateway is created:

    kubectl get albconfig ackone-gateway-demo

    Expected output:

    NAME                  ALBID      DNSNAME                               PORT&PROTOCOL   CERTID   AGE
    ackone-gateway-demo   alb-xxxx   alb-xxxx.<regionid>.alb.aliyuncs.com                           4d9h

    Note the DNSNAME value — you need it in the verification step.

  5. Confirm both clusters are connected to the gateway:

    kubectl get albconfig ackone-gateway-demo -ojsonpath='{.status.loadBalancer.subClusters}'

    The output lists the IDs of the associated clusters.

Step 3: Configure Ingress rules for zone-disaster recovery

Multi-cluster gateways use Ingress objects on the Fleet instance to route and distribute traffic across clusters.

  1. Create a namespace named gateway-demo on the Fleet instance. This must match the namespace where the application Services are deployed.
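
For example, with kubectl connected to the Fleet instance:

```shell
# Create the namespace on the Fleet instance; the Ingress in the next step
# is created in this namespace.
kubectl create namespace gateway-demo
```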

  2. Create a file named ingress-demo.yaml with the following content. Replace ${cluster1-id} and ${cluster2-id} with the actual cluster IDs.

    Note: The weights across all alb.ingress.kubernetes.io/cluster-weight annotations must sum to 100.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        alb.ingress.kubernetes.io/listen-ports: |
         [{"HTTP": 8001}]
        alb.ingress.kubernetes.io/cluster-weight.${cluster1-id}: "20"
        alb.ingress.kubernetes.io/cluster-weight.${cluster2-id}: "80"
      name: web-demo
      namespace: gateway-demo
    spec:
      ingressClassName: alb
      rules:
      - host: alb.ingress.alibaba.com
        http:
          paths:
          - path: /svc1
            pathType: Prefix
            backend:
              service:
                name: service1
                port:
                  number: 80
  3. Apply the Ingress:

    kubectl apply -f ingress-demo.yaml -n gateway-demo

Step 4: Verify active zone-redundancy

Weighted traffic distribution

Use the following command to send requests to the application. Replace alb-xxxx.<regionid>.alb.aliyuncs.com with the DNSNAME from Step 2.

curl -H "host: alb.ingress.alibaba.com" alb-xxxx.<regionid>.alb.aliyuncs.com:8001/svc1

The listener port is 8001, matching the value set in both the AlbConfig and the Ingress annotations.

To test the traffic distribution across 500 requests:

for i in {1..500}; do curl -H "host: alb.ingress.alibaba.com" alb-xxxx.<regionid>.alb.aliyuncs.com:8001/svc1; done > res.txt

The output shows approximately 20% of responses coming from Cluster 1 (named poc-ack-1 in this example) and 80% from Cluster 2 (poc-ack-2), matching the weights configured in the Ingress.
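
As a back-of-the-envelope check, the expected per-cluster request counts follow directly from the weights (plain shell arithmetic; res.txt is the file captured above):

```shell
# Expected responses per cluster for 500 requests under the 20/80 weights.
total=500
echo "cluster1: $(( total * 20 / 100 ))"   # prints: cluster1: 100
echo "cluster2: $(( total * 80 / 100 ))"   # prints: cluster2: 400
# Tally the actual distribution from the captured responses, for example:
#   sort res.txt | uniq -c
```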


Automatic failover

To simulate a cluster failure and observe seamless failover, run the continuous request loop and then scale the pods in Cluster 2 down to 0:

for i in {1..500}; do curl -H "host: alb.ingress.alibaba.com" alb-xxxx.<regionid>.alb.aliyuncs.com:8001/svc1; sleep 1; done

After the pods in Cluster 2 are removed, the gateway automatically and seamlessly switches all traffic to Cluster 1; the requests in the loop continue to succeed without interruption.
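
One way to remove the pods (a sketch; web-demo is the assumed Deployment name from the demo chart, and the kubectl context for Cluster 2 is a placeholder):

```shell
# Simulate a zone-level failure by scaling the workload in Cluster 2 to zero.
kubectl --context ${cluster2-context} -n gateway-demo scale deployment web-demo --replicas=0
```

Scaling the Deployment back to its original replica count restores the weighted distribution.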


What's next