
Container Service for Kubernetes: Build a zone-disaster recovery system

Last Updated: Mar 26, 2026

The Application Load Balancer (ALB) multi-cluster gateways of Distributed Cloud Container Platform for Kubernetes (ACK One) let you build a zone-disaster recovery system that automatically shifts traffic between clusters when a fault occurs. This topic walks you through deploying an application across two availability zones (AZs), configuring an ALB multi-cluster gateway, and verifying seamless failover.

Disaster recovery models

Cloud disaster recovery falls into three models. Choose the model that matches your recovery objectives and operational complexity.

  • Zone-disaster recovery: Protects against zone-level events such as fires, network outages, and power failures. Trade-off: low latency between data centers in the same region; simpler to implement.

  • Active geo-redundancy: Protects against region-level disasters such as floods and earthquakes. Trade-off: higher inter-data-center latency, in exchange for stronger protection.

  • Three data centers across two regions: Protects against both zone-level and region-level failures. Combines the two models above and has the highest complexity.

Zone-disaster recovery includes two sub-modes:

  • Active zone-redundancy: Both clusters serve live traffic simultaneously.

  • Primary/secondary disaster recovery: One cluster handles traffic while the other stays on standby.
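
With the multi-cluster gateway configured later in this topic, the two sub-modes differ only in the cluster weights set on the Ingress. A sketch of the alb.ingress.kubernetes.io/cluster-weight annotations (cluster IDs are placeholders, and weights must sum to 100; whether a weight of 0 is accepted by your gateway version is worth verifying):

```yaml
# Active zone-redundancy: both clusters serve live traffic (for example, 50/50).
alb.ingress.kubernetes.io/cluster-weight.${cluster1-id}: "50"
alb.ingress.kubernetes.io/cluster-weight.${cluster2-id}: "50"

# Primary/secondary: the primary takes all traffic and the secondary stands by.
# alb.ingress.kubernetes.io/cluster-weight.${cluster1-id}: "100"
# alb.ingress.kubernetes.io/cluster-weight.${cluster2-id}: "0"
```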

Architectural layers

A typical enterprise application has three layers, each requiring its own disaster recovery strategy:

  • Access layer: The entry point for ingress traffic; routes requests to the backend based on forwarding rules. ACK One coverage: multi-cluster gateways, with zone-disaster recovery supported by default.

  • Application layer: Hosts and processes application workloads. ACK One coverage: multi-cluster gateways, with support for active zone-redundancy, primary/secondary disaster recovery, and geo-redundancy.

  • Data layer: Stores and serves data to the application layer. ACK One coverage: middleware-dependent (for example, ApsaraDB RDS).

Why use multi-cluster gateways

Multi-cluster gateways offer several advantages over DNS-based traffic distribution:

  • Single IP address per region: DNS-based approaches require one load balancer IP per cluster. Multi-cluster gateways use a single load balancer IP address and deploy across multiple zones within the region for built-in high availability.

  • Layer 7 request forwarding: Multi-cluster gateways route traffic at Layer 7, enabling host- and path-based rules. DNS-based approaches do not support Layer 7 request forwarding.

  • Seamless failover: When a cluster goes down, traffic shifts immediately to healthy pods in another cluster. With DNS-based approaches, clients cache DNS query results, so switching IP addresses causes temporary service interruptions until the cached records expire.

  • Centralized management: Multi-cluster gateways are region-level resources managed from the Fleet instance. No Ingress controller installation or Ingress creation is needed in each Container Service for Kubernetes (ACK) cluster.

Architecture

This tutorial uses a web application (a Deployment and a Service) to demonstrate zone-disaster recovery across two clusters.


The setup works as follows:

  • Cluster 1 and Cluster 2 are deployed in AZ 1 and AZ 2 within the China (Hong Kong) region.

  • ACK One GitOps distributes the web application to both clusters.

  • An AlbConfig on the ACK One Fleet instance creates an ALB multi-cluster gateway that spans both clusters.

  • Ingress rules on the Fleet instance route traffic across clusters by weight. When one cluster becomes unavailable, the gateway automatically redirects traffic to the other.

  • Data synchronization between clusters uses ApsaraDB RDS and has middleware dependencies.

Prerequisites

Before you begin, ensure that:

  • An ACK One Fleet instance is created, and two ACK clusters deployed in different zones of the same VPC are associated with it.

  • You can connect to the Fleet instance by using kubectl.

Step 1: Distribute the application to multiple clusters

ACK One supports two approaches for distributing an application across clusters: GitOps and the multi-cluster application distribution feature. This tutorial uses GitOps.

  1. Log on to the ACK One console. In the left-side navigation pane, choose Fleet > Multi-cluster Applications.

  2. In the upper-left corner of the Multi-cluster Applications page, click the switch icon next to the Fleet instance name and select your Fleet instance from the drop-down list.

  3. Choose Create Multi-cluster Application > GitOps to open the Create Multi-cluster Application - GitOps page.

    If GitOps is not enabled for your Fleet instance, enable it first. See Enable GitOps for the Fleet instance. To allow internet access to GitOps, see Enable public access to Argo CD.
  4. On the Create from YAML tab, paste the following ApplicationSet template into the code editor and click OK. This template deploys web-demo to all clusters associated with your Fleet instance. Alternatively, use the Quick Create tab to select clusters individually — changes made there sync automatically to this YAML.

    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: appset-web-demo
      namespace: argocd
    spec:
      template:
        metadata:
          name: '{{.metadata.annotations.cluster_id}}-web-demo'
          namespace: argocd
        spec:
          destination:
            name: '{{.name}}'
            namespace: gateway-demo
          project: default
          source:
            repoURL: https://github.com/AliyunContainerService/gitops-demo.git
            path: manifests/helm/web-demo
            targetRevision: main
            helm:
              valueFiles:
                - values.yaml
              parameters:
                - name: envCluster
                  value: '{{.metadata.annotations.cluster_name}}'
          syncPolicy:
            automated: {}
            syncOptions:
              - CreateNamespace=true
      generators:
        - clusters:
            selector:
              matchExpressions:
                - values:
                    - cluster
                  key: argocd.argoproj.io/secret-type
                  operator: In
                - values:
                    - in-cluster
                  key: name
                  operator: NotIn
      goTemplateOptions:
        - missingkey=error
      syncPolicy:
        preserveResourcesOnDeletion: false
      goTemplate: true

For other distribution methods, see Getting started with GitOps, Create a multi-cluster application, and Get started with application distribution.

Step 2: Deploy an ALB multi-cluster gateway

Create an AlbConfig on the Fleet instance to provision an ALB gateway that spans both clusters.

  1. Get the IDs of two vSwitches in the VPC where your Fleet instance resides.
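
You can find the IDs in the VPC console; as a sketch, the Alibaba Cloud CLI can also list them (assuming the CLI is installed and configured, with ${vpc-id} as a placeholder for the Fleet instance's VPC):

```shell
# List the vSwitches in the VPC and note two VSwitchIds in different zones.
aliyun vpc DescribeVSwitches --RegionId cn-hongkong --VpcId ${vpc-id}
```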

  2. Create a file named gateway.yaml with the following content. Replace ${vsw-id1} and ${vsw-id2} with the vSwitch IDs, and replace ${cluster1} and ${cluster2} with the IDs of the associated clusters.

    Note: For ${cluster1} and ${cluster2}, configure the inbound rules of their security groups to allow access from all IP addresses and ports within the vSwitch CIDR block.

    • metadata.name (required): Name of the AlbConfig.
    • alb.ingress.kubernetes.io/remote-clusters (required): Comma-separated list of cluster IDs to associate with the ALB gateway. These clusters must already be associated with the Fleet instance.
    • spec.config.name (optional): Name of the ALB instance.
    • spec.config.addressType (optional): Network type: Internet (default) for public access, or Intranet for VPC-internal access only. Internet-facing instances require an elastic IP address (EIP) and incur EIP fees. See Pay-as-you-go.
    • spec.config.zoneMappings (required): vSwitch IDs for the ALB instance. Specify vSwitches in at least two zones for high availability. The vSwitches must be in zones supported by ALB and in the same VPC as the clusters. See Regions and zones in which ALB is available and Create and manage a vSwitch.
    • spec.listeners (optional): Listener port and protocol. This example configures HTTP on port 8001. Keep the listener configuration; without it, you must create a listener manually before ALB Ingresses can route traffic.
    apiVersion: alibabacloud.com/v1
    kind: AlbConfig
    metadata:
      name: ackone-gateway-demo
      annotations:
        # Specify the IDs of the clusters that you want to associate with the ALB instance.
        alb.ingress.kubernetes.io/remote-clusters: ${cluster1},${cluster2}
    spec:
      config:
        name: one-alb-demo
        addressType: Internet
        addressAllocatedMode: Fixed
        zoneMappings:
        - vSwitchId: ${vsw-id1}
        - vSwitchId: ${vsw-id2}
      listeners:
      - port: 8001
        protocol: HTTP
    ---
    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: alb
    spec:
      controller: ingress.k8s.alibabacloud/alb
      parameters:
        apiGroup: alibabacloud.com
        kind: AlbConfig
        name: ackone-gateway-demo
  3. Apply the configuration:

    kubectl apply -f gateway.yaml
  4. Wait 1–3 minutes, then verify the gateway is created:

    kubectl get albconfig ackone-gateway-demo

    Expected output:

    NAME                  ALBID      DNSNAME                               PORT&PROTOCOL   CERTID   AGE
    ackone-gateway-demo   alb-xxxx   alb-xxxx.<regionid>.alb.aliyuncs.com                           4d9h

    Note the DNSNAME value — you need it in the verification step.

  5. Confirm both clusters are connected to the gateway:

    kubectl get albconfig ackone-gateway-demo -ojsonpath='{.status.loadBalancer.subClusters}'

    The output lists the IDs of the associated clusters.

Step 3: Configure Ingress rules for zone-disaster recovery

Multi-cluster gateways use Ingress objects on the Fleet instance to route and distribute traffic across clusters.

  1. Create a namespace named gateway-demo on the Fleet instance. This must match the namespace where the application Services are deployed.
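
For example, with kubectl connected to the Fleet instance:

```shell
# Create the namespace on the Fleet instance; the Ingress in the next step
# is created in this namespace.
kubectl create namespace gateway-demo
```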

  2. Create a file named ingress-demo.yaml with the following content. Replace ${cluster1-id} and ${cluster2-id} with the actual cluster IDs.

    Note: The weights across all alb.ingress.kubernetes.io/cluster-weight annotations must sum to 100.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        alb.ingress.kubernetes.io/listen-ports: |
         [{"HTTP": 8001}]
        alb.ingress.kubernetes.io/cluster-weight.${cluster1-id}: "20"
        alb.ingress.kubernetes.io/cluster-weight.${cluster2-id}: "80"
      name: web-demo
      namespace: gateway-demo
    spec:
      ingressClassName: alb
      rules:
      - host: alb.ingress.alibaba.com
        http:
          paths:
          - path: /svc1
            pathType: Prefix
            backend:
              service:
                name: service1
                port:
                  number: 80
  3. Apply the Ingress:

    kubectl apply -f ingress-demo.yaml -n gateway-demo

Step 4: Verify active zone-redundancy

Weighted traffic distribution

Use the following command to send requests to the application. Replace alb-xxxx.<regionid>.alb.aliyuncs.com with the DNSNAME from Step 2.

curl -H "host: alb.ingress.alibaba.com" alb-xxxx.<regionid>.alb.aliyuncs.com:8001/svc1

The listener port is 8001, matching the value set in both the AlbConfig and the Ingress annotations.

To test the traffic distribution across 500 requests:

for i in {1..500}; do curl -H "host: alb.ingress.alibaba.com" alb-xxxx.<regionid>.alb.aliyuncs.com:8001/svc1; done > res.txt

The output shows approximately 20% of responses coming from Cluster 1 (named poc-ack-1 in this example) and 80% from Cluster 2 (poc-ack-2), matching the weights configured in the Ingress.
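
As a back-of-the-envelope check, the expected per-cluster request counts follow directly from the weights (plain shell arithmetic; res.txt is the file captured above):

```shell
# Expected responses per cluster for 500 requests under the 20/80 weights.
total=500
echo "cluster1: $(( total * 20 / 100 ))"   # prints: cluster1: 100
echo "cluster2: $(( total * 80 / 100 ))"   # prints: cluster2: 400
# Tally the actual distribution from the captured responses, for example:
#   sort res.txt | uniq -c
```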


Automatic failover

To simulate a cluster failure and observe seamless failover, run the continuous request loop and then scale the pods in Cluster 2 down to 0:

for i in {1..500}; do curl -H "host: alb.ingress.alibaba.com" alb-xxxx.<regionid>.alb.aliyuncs.com:8001/svc1; sleep 1; done

After the pods in Cluster 2 are removed, the gateway automatically and seamlessly switches all traffic to Cluster 1; the requests in the loop continue to succeed without interruption.
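
One way to remove the pods (a sketch; web-demo is the assumed Deployment name from the demo chart, and the kubectl context for Cluster 2 is a placeholder):

```shell
# Simulate a zone-level failure by scaling the workload in Cluster 2 to zero.
kubectl --context ${cluster2-context} -n gateway-demo scale deployment web-demo --replicas=0
```

Scaling the Deployment back to its original replica count restores the weighted distribution.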


What's next