Server Load Balancer: Use ALB multi-cluster gateways of ACK One to implement geo-disaster recovery

Last Updated: Nov 21, 2025

Distributed Cloud Container Platform for Kubernetes (ACK One) supports geo-disaster recovery based on Application Load Balancer (ALB) multi-cluster gateways. Geo-disaster recovery protects your data against region-level disasters, such as floods and earthquakes. However, it may also increase the response latency, resource costs, and maintenance costs of your business. This topic describes the architecture and usage scenarios of geo-disaster recovery based on ALB multi-cluster gateways of ACK One.

Architecture

  • Create a cluster (ACK Cluster 1), an ACK One Fleet instance, and an ALB multi-cluster gateway in Region 1. Create a cluster (ACK Cluster 2) in Region 2 and install the ALB Ingress controller in ACK Cluster 2. After you install the ALB Ingress controller, an ALB instance (ALB 2) is automatically created in Region 2 for cold backup.

  • Use Global Traffic Manager (GTM) to associate the ALB multi-cluster gateway in Region 1 with ALB 2 in Region 2. This way, ALB 2 takes over when a region-level disaster occurs in Region 1.

  • The ACK One Fleet instance uses the ALB multi-cluster gateway to flexibly route traffic to ACK Cluster 1 and ACK Cluster 2 in different regions based on cluster weights or request headers. This way, ACK Cluster 1 and ACK Cluster 2 serve as backups for each other.

  • ACK Cluster 1 and ACK Cluster 2 are connected to each other by using Cloud Enterprise Network (CEN) or Virtual Private Cloud (VPC) peering connections. This way, traffic is forwarded across regions through Express Connect circuits to ensure reliability.

  • Data synchronization between regions, for example based on ApsaraDB RDS, depends on the capabilities of the middleware that you use.

Benefits

Geo-disaster recovery based on ALB multi-cluster gateways of ACK One provides the following benefits:

  • Enhanced multi-cluster traffic routing: This solution provides content-based advanced traffic routing and a health check mechanism that is more flexible than that of GTM to meet the requirements of complex scenarios.

  • Unified multi-cluster traffic management: This solution uses an ACK One Fleet instance as a unified control plane for Ingress configurations. This simplifies service extensions and application maintenance and reduces management costs.

  • No dependency on the DNS cache on clients: When errors occur on a service or a cluster, the gateway completes failover within seconds. You do not need to change the DNS resolution of the domain name, so cached DNS records on clients do not delay the failover.

Scenarios

Geo-disaster recovery based on ALB multi-cluster gateways of ACK One is suitable for the following scenarios:

  • Resources in the local region are insufficient and you want to achieve high availability across regions. For example, in the current AI boom, you may encounter a shortage of GPU resources in a single region.

  • Client applications do not require low latency but require improved multi-cluster traffic management.

This solution implements disaster recovery based on ALB multi-cluster gateways and GTM. ALB multi-cluster gateways can centrally manage and forward traffic to multiple clusters. The following list describes the scenarios in which GTM is used to switch traffic and the scenarios in which multi-cluster gateways are used to forward traffic.

  • When cluster errors or service errors occur in Region 1, or when a region-level disaster occurs in Region 2, the ALB multi-cluster gateway automatically switches traffic to the healthy cluster without changing the DNS resolution result.

  • GTM switches traffic to the ALB instance in Region 2 based on health check results only when a region-level disaster or an ALB error occurs in Region 1.

Prerequisites

Step 1: Plan the cluster network and configure cross-region communication

  1. Plan the cluster network.

    1. ACK Cluster 1 and ACK Cluster 2 reside in different regions. The node CIDR block and pod CIDR block of ACK Cluster 1 do not overlap with the node CIDR block and pod CIDR block of ACK Cluster 2.

    2. The ACK One Fleet instance and ACK Cluster 1 reside in the same region and VPC.

    3. Connect the VPC in Region 1 to the VPC in Region 2. For more information, see Network connectivity. If you use CEN, refer to Manage inter-region connections.

    4. After you connect the VPC in Region 1 to the VPC in Region 2, configure the security groups of ACK Cluster 1 and ACK Cluster 2 to allow access from the VPC where the ALB multi-cluster gateway resides to ACK Cluster 1 and ACK Cluster 2. For more information, see Configure security groups for clusters.

    For more information, see Network design for Fleet management.

  2. Associate ACK Cluster 1 and ACK Cluster 2 with the ACK One Fleet instance. For more information, see Manage associated clusters.

Step 2: Use the ALB multi-cluster gateway to implement geo-disaster recovery

The following list describes how geo-disaster recovery is implemented. For more information, see Build a zone-disaster recovery system.

  1. Use GitOps or the application distribution feature to distribute an application to ACK Cluster 1 and ACK Cluster 2. This ensures application consistency.

  2. Configure an AlbConfig on the Fleet instance to create an ALB multi-cluster gateway, and add ACK Cluster 1 and ACK Cluster 2 to the gateway.

  3. Create routing rules and Ingresses on the Fleet instance to implement geo-disaster recovery. The following code block is an example. For more information, see Traffic management in different scenarios and Configure Ingresses.

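    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        # Bind the Ingress to the HTTP listener on port 8001.
        alb.ingress.kubernetes.io/listen-ports: |
          [{"HTTP": 8001}]
        # Replace ${cluster1-id} and ${cluster2-id} with the IDs of ACK Cluster 1 and ACK Cluster 2.
        # With these weights, 20% of the traffic goes to ACK Cluster 1 and 80% goes to ACK Cluster 2.
        alb.ingress.kubernetes.io/cluster-weight.${cluster1-id}: "20"
        alb.ingress.kubernetes.io/cluster-weight.${cluster2-id}: "80"
      name: web-demo
      namespace: gateway-demo
    spec:
      ingressClassName: alb
      rules:
      - host: alb.ingress.alibaba.com
        http:
          paths:
          # Forward requests whose path starts with /svc1 to service1 on port 80.
          - path: /svc1
            pathType: Prefix
            backend:
              service:
                name: service1
                port:
                  number: 80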
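
The Ingress above references an IngressClass named alb, which in turn must point to the AlbConfig created in substep 2. The following manifests are a minimal sketch of what that AlbConfig and IngressClass might look like on the Fleet instance. The resource names (ackone-gateway-demo, one-alb-demo) and the ${...} placeholders are assumptions for illustration only, and the exact fields that your environment requires may differ, so check the AlbConfig reference before you apply them.

```yaml
# Sketch only: the names and placeholders are hypothetical.
apiVersion: alibabacloud.com/v1
kind: AlbConfig
metadata:
  name: ackone-gateway-demo
  annotations:
    # Add ACK Cluster 1 and ACK Cluster 2 to the ALB multi-cluster gateway.
    alb.ingress.kubernetes.io/remote-clusters: ${cluster1-id},${cluster2-id}
spec:
  config:
    name: one-alb-demo
    addressType: Internet
    zoneMappings:
    # vSwitches in two zones of the VPC in Region 1.
    - vSwitchId: ${vsw-id1}
    - vSwitchId: ${vsw-id2}
  listeners:
  # Must match the port in the listen-ports annotation of the Ingress.
  - port: 8001
    protocol: HTTP
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  # Must match ingressClassName in the Ingress.
  name: alb
spec:
  controller: ingress.k8s.alibabacloud/alb
  parameters:
    apiGroup: alibabacloud.com
    kind: AlbConfig
    name: ackone-gateway-demo
```

Keeping the listener port and the IngressClass name aligned with the Ingress is what ties the routing rule to the gateway. If either value differs, the Ingress is not expected to take effect on this ALB instance.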

Step 3: Regularly back up Ingresses

To ensure service continuity when errors occur in Region 1, we recommend that you regularly back up the Ingresses and AlbConfigs in the ACK One Fleet instance in Region 1. When errors occur, the backed-up Ingresses can be synchronized to ACK Cluster 2 so that ALB 2 can route traffic to the backend services in ACK Cluster 2. This keeps your business running as normal.

You can use custom backup methods. For example, you can store the backup data to Object Storage Service (OSS) buckets in Region 2.

Step 4: Configure primary/secondary disaster recovery based on GTM

After you complete the preceding steps, you can centrally manage ACK Cluster 1 and ACK Cluster 2 and implement disaster recovery. When cluster errors or service errors occur in Region 1, or when errors occur in Region 2, the ALB multi-cluster gateway automatically and seamlessly switches traffic to the healthy cluster. To avoid service interruptions when a region-level disaster or an ALB error occurs in Region 1, we recommend that you configure primary/secondary disaster recovery based on GTM. You can also perform this step after errors occur in Region 1.

Procedure:

  1. Use an ALB Ingress in ACK Cluster 2 to create an ALB instance in Region 2. For more information, see Create an AlbConfig.

  2. Specify the IP address of the ALB multi-cluster gateway in Region 1 and the domain name of the ALB instance in Region 2 in GTM. The gateway and ALB instance work in primary/secondary mode. This way, when errors occur in Region 1, GTM switches traffic to Region 2 based on health check results. For more information, see Use GTM to implement primary/secondary disaster recovery.

    Note

    When GTM switches traffic, your service may remain interrupted until the cached DNS records on clients expire.
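
Substep 1 of the procedure above can be sketched as follows. This is a minimal example under stated assumptions, not the documented configuration: it assumes that the ALB Ingress controller is already installed in ACK Cluster 2, and the resource name alb-region2 and the ${...} vSwitch placeholders are hypothetical. Apply it to ACK Cluster 2, not to the Fleet instance.

```yaml
# Sketch only: the names and placeholders are hypothetical.
apiVersion: alibabacloud.com/v1
kind: AlbConfig
metadata:
  name: alb-region2
spec:
  config:
    name: alb-region2
    addressType: Internet
    zoneMappings:
    # vSwitches in two zones of the VPC in Region 2.
    - vSwitchId: ${region2-vsw-id1}
    - vSwitchId: ${region2-vsw-id2}
  listeners:
  # Same listener port as the gateway in Region 1, so that backed-up
  # Ingresses that listen on port 8001 can be reused.
  - port: 8001
    protocol: HTTP
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  # Same name as on the Fleet instance, so that backed-up Ingresses
  # with ingressClassName: alb resolve to ALB 2.
  name: alb
spec:
  controller: ingress.k8s.alibabacloud/alb
  parameters:
    apiGroup: alibabacloud.com
    kind: AlbConfig
    name: alb-region2
```

Reusing the listener port and the IngressClass name from Region 1 is a design choice that lets the Ingresses backed up in Step 3 be applied to ACK Cluster 2 with few or no changes.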