
Container Service for Kubernetes: Use ACK One MSE multi-cluster gateways to implement zone-disaster recovery

Updated: Nov 25, 2025

Multi-cluster gateways are cloud-native gateways provided by ACK One for multi-cloud and multi-cluster environments. They use managed Microservices Engine (MSE) Ingresses and the Ingress API to manage Layer 7 north-south traffic, and they support automatic zone-disaster recovery and header-based phased releases, which simplifies multi-cluster application operations and maintenance (O&M) and reduces costs. Combined with the capabilities of ACK One GitOps, you can quickly build an active zone-redundancy or primary/secondary disaster recovery system. Note that this solution covers application traffic only and does not include data disaster recovery.

Disaster recovery overview

Disaster recovery solutions in the cloud include the following:

  • Zone-disaster recovery (intra-city, cross-AZ)

    Zone-disaster recovery includes active zone-redundancy and primary/secondary disaster recovery. The network latency between data centers in the same region is low. Therefore, zone-disaster recovery is suitable for protecting data against zone-level disasters, such as fires, network interruptions, or power outages.

  • Cross-region active-active disaster recovery

    A cross-region active-active solution has higher network latency between data centers, but it protects against region-level disasters, such as earthquakes and floods.

  • Three data centers across two regions

    This solution combines dual-center zone-disaster recovery and cross-region disaster recovery. It offers the benefits of both and is suitable for scenarios that require high application and data continuity and availability.

In practice, zone-disaster recovery is easier to implement for data than cross-region disaster recovery. Therefore, it remains an important and widely used solution.

Benefits

A disaster recovery plan that uses multi-cluster gateways has the following advantages over a plan that uses DNS-based traffic distribution:

  • A DNS-based plan requires multiple Server Load Balancer (SLB) IP addresses, with one for each cluster. A multi-cluster gateway-based plan requires only one SLB IP address at the region level and provides high availability across multiple zones in the same region by default.

  • A multi-cluster gateway-based plan supports Layer 7 routing and forwarding. A DNS-based plan does not.

  • In a DNS-based plan, clients often cache IP addresses, which can cause brief service interruptions during an IP address switch. A multi-cluster gateway-based plan can smoothly fail over traffic to the backend services of another cluster.

  • Multi-cluster gateways are region-level resources. You can perform all operations, such as creating gateways and Ingress resources, on the fleet instance. You do not need to install an Ingress controller or create Ingress resources in each ACK cluster. This provides global traffic management at the region level and reduces multi-cluster management costs.

Architecture

ACK One multi-cluster gateways are implemented using managed MSE Ingresses. When combined with the multi-cluster application distribution capability of ACK One GitOps, you can quickly implement a zone-disaster recovery system for your applications. This topic uses an example of deploying a sample application with GitOps to two ACK clusters (Cluster 1 and Cluster 2) in different zones of the China (Hong Kong) region to show how to implement active zone-redundancy and primary/secondary disaster recovery.

The sample application in this topic is a web application that consists of Deployment and Service resources. The following figure shows the architecture of the zone-disaster recovery plan that uses a multi-cluster gateway.

  • Create an MSE gateway using an MseIngressConfig resource in the ACK One fleet.

  • Create two ACK clusters, Cluster 1 and Cluster 2, in two different availability zones (AZs), AZ 1 and AZ 2, within the same region.

  • Use ACK One GitOps to distribute the application to the created Cluster 1 and Cluster 2.

  • After you create the multi-cluster gateway, you can set traffic rules to route traffic by weight or to a specific cluster based on the request header. If one cluster becomes abnormal, traffic is automatically routed to the other cluster.

Prerequisites

Step 1: Use GitOps to deploy an application to multiple clusters

Deploy using the ArgoCD UI

  1. Log on to the ACK One console. In the left-side navigation pane, choose Fleet > Multi-cluster Applications.

  2. On the Multi-cluster GitOps page, click GitOps Console.

    Note
    • If GitOps is not enabled for the ACK One fleet instance, click Enable GitOps to log on to the GitOps Console.

    • To access GitOps over the public network, see Enable public access to Argo CD.

  3. Add the application repository.

    1. In the navigation pane on the left of the Argo CD UI, click Settings. Then, choose Repositories > + Connect Repo.

    2. In the panel that appears, configure the following parameters and click CONNECT.

      • Choose your connection method: VIA HTTP/HTTPS

      • CONNECT REPO USING HTTP/HTTPS:

        • Type: git

        • Project: default

        • Repository URL: https://github.com/AliyunContainerService/gitops-demo.git

        • Skip server verification: Select this checkbox.

      After the repository is connected, the CONNECTION STATUS of the Git repository changes to Successful.

  4. Create an application.

    On the Applications page in ArgoCD, click + NEW APP and configure the following parameters.

    • GENERAL

      • Application Name: Enter a custom application name.

      • Project Name: default

    • SYNC POLICY: Select a sync policy as needed. Valid values:

      • Manual: When changes are made in the Git repository, you must manually sync the changes to the destination cluster.

      • Automatic: The ArgoCD server checks for changes in the Git repository every 3 minutes and automatically deploys them to the destination cluster.

    • SYNC OPTIONS: Select AUTO-CREATE NAMESPACE.

    • SOURCE

      • Repository URL: Select the Git repository that you added in the preceding step from the drop-down list. In this example, select https://github.com/AliyunContainerService/gitops-demo.git.

      • Revision: Select Branches: gateway-demo.

      • Path: manifests/helm/web-demo

    • DESTINATION

      • Cluster URL: Select the URL of Cluster 1 or Cluster 2. You can also click URL on the right and select a cluster by its CLUSTER NAME.

      • Namespace: This topic uses gateway-demo. Application resources, such as Services and Deployments, are created in this namespace.

    • Helm

      • Parameters: Set the envCluster value to cluster-demo-1 or cluster-demo-2 to distinguish which cluster's backend processes the requests.
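The console fields above map one-to-one onto fields of the Argo CD `Application` resource, which is the same resource the CLI section below creates with YAML. The following sketch shows that mapping using the example values from this topic; the function name `build_application` is purely illustrative.

```python
# Sketch: build the Argo CD Application manifest that corresponds to the
# console parameters above. Values in comments name the matching UI field.
def build_application(name, cluster_url, env_cluster):
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",                          # Project Name
            "source": {
                "repoURL": "https://github.com/AliyunContainerService/gitops-demo.git",  # Repository URL
                "targetRevision": "gateway-demo",          # Revision (branch)
                "path": "manifests/helm/web-demo",         # Path
                "helm": {"parameters": [
                    {"name": "envCluster", "value": env_cluster}  # Helm parameter
                ]},
            },
            "destination": {
                "server": cluster_url,                     # Cluster URL
                "namespace": "gateway-demo",               # Namespace
            },
            "syncPolicy": {"syncOptions": ["CreateNamespace=true"]},  # AUTO-CREATE NAMESPACE
        },
    }

app = build_application("app-demo-cluster1", "https://1.1.XX.XX:6443", "cluster-demo-1")
```

Applying the resulting manifest with kubectl is roughly equivalent to clicking + NEW APP with these values.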

Deploy using the ArgoCD CLI

  1. Enable GitOps on the ACK One fleet instance. For more information, see Enable GitOps on an ACK One fleet instance.

  2. Access ArgoCD. For more information, see Access ArgoCD using the ArgoCD CLI.

  3. Create and publish the application.

    1. Run the following command to add the Git repository.

      argocd repo add https://github.com/AliyunContainerService/gitops-demo.git --name ackone-gitops-demos

      Expected output:

      Repository 'https://github.com/AliyunContainerService/gitops-demo.git' added
    2. Run the following command to view the list of added Git repositories.

      argocd repo list

      Expected output:

      TYPE  NAME  REPO                                                       INSECURE  OCI    LFS    CREDS  STATUS      MESSAGE  PROJECT
      git         https://github.com/AliyunContainerService/gitops-demo.git  false     false  false  false  Successful           default
    3. Run the following command to view the list of clusters.

      argocd cluster list

      Expected output:

      SERVER                          NAME                                        VERSION  STATUS      MESSAGE                                                  PROJECT
      https://1.1.XX.XX:6443      c83f3cbc90a****-temp01   1.22+    Successful
      https://2.2.XX.XX:6443      c83f3cbc90a****-temp02   1.22+    Successful
      https://kubernetes.default.svc  in-cluster                                           Unknown     Cluster has no applications and is not being monitored.
    4. Use an Application to create and publish the demo application to the destination cluster.

      1. Create an apps-web-demo.yaml file with the following content.

        Replace repoURL with your actual repository address.

        View the apps-web-demo.yaml file

        apiVersion: argoproj.io/v1alpha1
        kind: Application
        metadata:
          name: app-demo-cluster1
          namespace: argocd
        spec:
          destination:
            namespace: gateway-demo
            # https://1.1.XX.XX:6443
            server: ${cluster1_url}
          project: default
          source:
            helm:
              releaseName: "web-demo"
              parameters:
              - name: envCluster
                value: cluster-demo-1
              valueFiles:
              - values.yaml
            path: manifests/helm/web-demo
            repoURL: https://github.com/AliyunContainerService/gitops-demo.git
            targetRevision: gateway-demo
          syncPolicy:
            syncOptions:
            - CreateNamespace=true
        ---
        apiVersion: argoproj.io/v1alpha1
        kind: Application
        metadata:
          name: app-demo-cluster2
          namespace: argocd
        spec:
          destination:
            namespace: gateway-demo
            server: ${cluster2_url}
          project: default
          source:
            helm:
              releaseName: "web-demo"
              parameters:
              - name: envCluster
                value: cluster-demo-2
              valueFiles:
              - values.yaml
            path: manifests/helm/web-demo
            repoURL: https://github.com/AliyunContainerService/gitops-demo.git
            targetRevision: gateway-demo
          syncPolicy:
            syncOptions:
            - CreateNamespace=true
      2. Run the following command to deploy the application.

        kubectl apply -f apps-web-demo.yaml
      3. Run the following command to view the list of applications.

        argocd app list

        Expected output:

        NAME                      CLUSTER                 NAMESPACE  PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                                       PATH                     TARGET
        argocd/app-demo-cluster1  https://1.1.XX.XX:6443             default  Synced  Healthy  Auto        <none>      https://github.com/AliyunContainerService/gitops-demo.git  manifests/helm/web-demo  gateway-demo
        argocd/app-demo-cluster2  https://2.2.XX.XX:6443             default  Synced  Healthy  Auto        <none>      https://github.com/AliyunContainerService/gitops-demo.git  manifests/helm/web-demo  gateway-demo

Step 2: Use kubectl in the fleet instance to create a multi-cluster gateway

Create an MseIngressConfig object in the ACK One fleet to create a multi-cluster gateway and add the associated clusters to it.

  1. Obtain and record the vSwitch ID of the ACK One fleet instance. For more information, see Obtain a vSwitch ID.

  2. Create a gateway.yaml file with the following content.

    Note
    • Replace ${vsw-id} with the vSwitch ID obtained in the preceding step, and replace ${cluster1} and ${cluster2} with the IDs of the associated clusters you want to add.

    • For associated clusters ${cluster1} and ${cluster2}, you must configure the inbound rules of their security group to allow access from all IP addresses and ports of the vSwitch CIDR block.

    apiVersion: mse.alibabacloud.com/v1alpha1
    kind: MseIngressConfig
    metadata:
      annotations:
        mse.alibabacloud.com/remote-clusters: ${cluster1},${cluster2}
      name: ackone-gateway-hongkong
    spec:
      common:
        instance:
          replicas: 3
          spec: 2c4g
        network:
          vSwitches:
          - ${vsw-id}
      ingress:
        local:
          ingressClass: mse
      name: mse-ingress

    Parameter

    Description

    mse.alibabacloud.com/remote-clusters

    The clusters to add to the multi-cluster gateway. Enter the IDs of the clusters that are already associated with the fleet instance.

    spec.name

    The name of the gateway instance.

    spec.common.instance.spec

    Optional. The instance type of the gateway. The default value is 4c8g.

    spec.common.instance.replicas

    Optional. The number of gateway replicas. The default is 3.

    spec.ingress.local.ingressClass

    Optional. The IngressClass that the gateway listens on. The value mse means the gateway watches all Ingresses in the fleet instance whose ingressClassName field is set to mse.

  3. Run the following command to deploy the multi-cluster gateway.

    kubectl apply -f gateway.yaml
  4. Run the following command to verify that the multi-cluster gateway is created and in the listening state.

    kubectl get mseingressconfig ackone-gateway-hongkong

    Expected output:

    NAME                      STATUS      AGE
    ackone-gateway-hongkong   Listening   3m15s

    The gateway status in the output is Listening. This indicates that the cloud-native gateway is created, running, and automatically listening for Ingress resources with the IngressClass mse in the cluster.

    The status of a gateway created from an MseIngressConfig changes in the following order: Pending, Running, and Listening. State description:

    • Pending: The cloud-native gateway is being created. This process may take about 3 minutes.

    • Running: The cloud-native gateway is created and running.

    • Listening: The cloud-native gateway is running and listens on Ingresses.

    • Failed: The cloud-native gateway is invalid. You can check the message in the Status field to troubleshoot the issue.
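As a quick illustration of how a wait script might interpret these states, the following sketch contains the decision logic only; actually polling with kubectl is omitted.

```python
# Sketch: interpret MseIngressConfig status values while waiting for a gateway.
# Pending and Running mean "keep polling"; Listening means the gateway is
# ready; Failed means stop and inspect the message in the Status field.
IN_PROGRESS = {"Pending", "Running"}

def gateway_ready(status: str) -> bool:
    """True once the gateway is listening for Ingresses."""
    return status == "Listening"

def keep_polling(status: str) -> bool:
    """True while gateway creation is still in progress."""
    return status in IN_PROGRESS
```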

  5. Run the following command to confirm that the associated clusters were added successfully.

    kubectl get mseingressconfig ackone-gateway-hongkong -ojsonpath="{.status.remoteClusters}"

    Expected output:

    [{"clusterId":"c7fb82****"},{"clusterId":"cd3007****"}]

    The output includes the specified cluster IDs and no Failed message. This indicates that the associated clusters have been successfully added to the multi-cluster gateway.
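If you script this check, the jsonpath output is plain JSON and can be validated directly. A sketch using the masked IDs from the expected output above (the function name `clusters_joined` is illustrative):

```python
import json

# Sketch: validate the .status.remoteClusters jsonpath output. Every expected
# cluster ID must be present, and no entry may carry a Failed message.
def clusters_joined(jsonpath_output: str, expected_ids: list) -> bool:
    entries = json.loads(jsonpath_output)
    joined = {e["clusterId"] for e in entries}
    no_failures = all("Failed" not in e.get("message", "") for e in entries)
    return no_failures and all(cid in joined for cid in expected_ids)

# Masked sample from the expected output above.
output = '[{"clusterId":"c7fb82****"},{"clusterId":"cd3007****"}]'
print(clusters_joined(output, ["c7fb82****", "cd3007****"]))  # True
```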

Step 3: Use Ingresses to implement zone-disaster recovery

Multi-cluster gateways use Ingresses to manage traffic across multiple clusters. You can create Ingress objects in the ACK One fleet instance to implement active zone-redundancy and primary/secondary disaster recovery.

Important

Make sure that you have created the gateway-demo namespace in the fleet instance. The Ingress created here must be in the same namespace as the Service in the demo application deployed in the previous step, which is gateway-demo.

Active zone-redundancy disaster recovery

Create an Ingress to implement active zone-redundancy

Create the following Ingress object in the ACK One fleet instance to implement active zone-redundancy.

By default, traffic is load balanced across all replicas of the Service backend with the same name in the two clusters added to the multi-cluster gateway, so the traffic split follows the replica ratio. For example, if the replica ratio of Cluster 1 to Cluster 2 is 9:1, 90% of the traffic is routed to Cluster 1 and 10% to Cluster 2. If all backends in Cluster 1 become abnormal, 100% of the traffic is automatically routed to Cluster 2. The following figure shows the topology.
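This replica-weighted behavior can be illustrated with a short simulation. This is a sketch of the routing logic only, not the gateway's actual implementation:

```python
import random

# Sketch of the gateway's default behavior: traffic is load balanced across
# every healthy replica in both clusters, so the split follows the replica
# ratio; a cluster with zero healthy replicas receives no traffic.
def pick_cluster(replicas: dict) -> str:
    healthy = {c: n for c, n in replicas.items() if n > 0}
    clusters = list(healthy)
    return random.choices(clusters, weights=[healthy[c] for c in clusters])[0]

# 9:1 replica ratio -> roughly 90% of requests land on Cluster 1.
counts = {"cluster-demo-1": 0, "cluster-demo-2": 0}
for _ in range(1000):
    counts[pick_cluster({"cluster-demo-1": 9, "cluster-demo-2": 1})] += 1

# Cluster 1 scaled to 0 -> every request fails over to Cluster 2.
assert pick_cluster({"cluster-demo-1": 0, "cluster-demo-2": 1}) == "cluster-demo-2"
```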

  1. Create an ingress-demo.yaml file with the following content.

    In the following code, the backend service service1 is exposed through the /svc1 routing rule under the domain name example.com.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: web-demo
    spec:
      ingressClassName: mse
      rules:
      - host: example.com
        http:
          paths:
          - path: /svc1
            pathType: Exact
            backend:
              service:
                name: service1
                port: 
                  number: 80
  2. Run the following command to deploy the Ingress in the ACK One fleet instance.

    kubectl apply -f ingress-demo.yaml -n gateway-demo

Verify the canary release version

In an active zone-redundancy scenario, you can verify the canary release version of an application without affecting your services.

Create a new application in an existing cluster that differs from the original only in the names and selectors of its Service and Deployment. Then, use a request header to verify the new application.

  1. Create a new-app.yaml file in Cluster 1 with the following content. Only the names and selectors of the Service and Deployment differ from the original application.

    apiVersion: v1       
    kind: Service
    metadata:
      name: service1-canary-1
      namespace: gateway-demo
    spec:
      ports:
      - port: 80
        protocol: TCP
        targetPort: 8080
      selector:
        app: web-demo-canary-1
      sessionAffinity: None
      type: ClusterIP
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-demo-canary-1
      namespace: gateway-demo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: web-demo-canary-1
      template:
        metadata:
          labels:
            app: web-demo-canary-1
        spec:
          containers:
            - env:
                - name: ENV_NAME
                  value: cluster-demo-1-canary
              image: 'registry-cn-hangzhou.ack.aliyuncs.com/acs/web-demo:0.6.0'
              imagePullPolicy: Always
              name: web-demo
  2. Run the following command to deploy the new application in Cluster 1.

    kubectl apply -f new-app.yaml
  3. Create a new-ingress.yaml file with the following content.

    Create a header-based Ingress in the fleet instance. Add annotations to enable the canary feature and specify a header. Use the header canary-dest: cluster1 in requests to route traffic to the canary release version for verification.

    • nginx.ingress.kubernetes.io/canary: Set to "true" to enable the canary release feature.

    • nginx.ingress.kubernetes.io/canary-by-header: The header key for requests to this cluster.

    • nginx.ingress.kubernetes.io/canary-by-header-value: The header value for requests to this cluster.

      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        name: web-demo-canary-1
        namespace: gateway-demo
        annotations:
          nginx.ingress.kubernetes.io/canary: "true"
          nginx.ingress.kubernetes.io/canary-by-header: "canary-dest"
          nginx.ingress.kubernetes.io/canary-by-header-value: "cluster1"
      spec:
        ingressClassName: mse
        rules:
        - host: example.com
          http:
            paths:
            - path: /svc1
              pathType: Exact
              backend:
                service:
                  name: service1-canary-1
                  port: 
                    number: 80
  4. Run the following command to deploy the header-based Ingress in the fleet instance.

    kubectl apply -f new-ingress.yaml

Verify active zone-redundancy

Change the number of replicas deployed to Cluster 1 to 9 and the number of replicas deployed to Cluster 2 to 1. The expected result is that traffic is routed to Cluster 1 and Cluster 2 at a 9:1 ratio by default. When Cluster 1 becomes abnormal, all traffic is automatically routed to Cluster 2.

Run the following command to obtain the public IP address of the multi-cluster gateway.

kubectl get ingress web-demo -n gateway-demo -ojsonpath="{.status.loadBalancer}"
  • Default traffic is routed to both clusters by ratio

    Run the following command to check the traffic routing.

    Replace XX.XX.XX.XX with the public IP address of the multi-cluster gateway that you obtained in the preceding step.

    for i in {1..100}; do curl -H "host: example.com" XX.XX.XX.XX/svc1; sleep 1; done

    Expected output: The output shows that traffic is load balanced to Cluster 1 and Cluster 2 at a ratio of 9:1.

  • All traffic fails over to Cluster 2 when Cluster 1 is abnormal

    If you scale the replica count of the Deployment in Cluster 1 to 0, the following output is returned. This shows that traffic has been automatically failed over to Cluster 2.

  • Route requests to the canary release version using a header

    Run the following command to check the traffic routing.

    Replace XX.XX.XX.XX with the IP address obtained after the application and Ingress are deployed as described in the Verify the canary release version section.

    for i in {1..100}; do curl -H "host: example.com" -H "canary-dest: cluster1" xx.xx.xx.xx/svc1; sleep 1;  done  

    Expected output: The output shows that requests with the header canary-dest: cluster1 are all routed to the canary release version in Cluster 1.
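Taken together, the default and canary Ingresses behave like the following routing rule. This is an illustrative sketch, not MSE's implementation; the header key and value come from the canary annotations configured above.

```python
# Sketch: requests whose header matches the canary Ingress's
# canary-by-header / canary-by-header-value pair are routed to the canary
# backend; all other requests follow the default weighted routing.
CANARY_HEADER = "canary-dest"  # nginx.ingress.kubernetes.io/canary-by-header
CANARY_VALUE = "cluster1"      # nginx.ingress.kubernetes.io/canary-by-header-value

def route(headers: dict) -> str:
    if headers.get(CANARY_HEADER) == CANARY_VALUE:
        return "service1-canary-1"  # backend of the canary Ingress
    return "service1"               # backend of the default Ingress

print(route({"canary-dest": "cluster1"}))  # service1-canary-1
print(route({}))                           # service1
```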

Primary/secondary disaster recovery

Create the following Ingress object in the ACK One fleet instance to implement primary/secondary disaster recovery.

When the backends of both clusters are normal, traffic is routed only to the service backend of Cluster 1. If the backend of Cluster 1 becomes abnormal, the gateway automatically routes traffic to Cluster 2. The following figure shows the topology.

Create an Ingress to implement primary/secondary disaster recovery

  1. Create an ingress-demo-cluster-one.yaml file with the following content.

    Add the mse.ingress.kubernetes.io/service-subset and mse.ingress.kubernetes.io/subset-labels annotations to the Ingress object to expose the backend Service service1 through the /service1 routing rule under the domain name example.com. For more information about the annotations supported by MSE Ingresses, see Annotations supported by MSE Ingress gateways.

    • mse.ingress.kubernetes.io/service-subset: The name of the service subset. We recommend that you define a readable value related to the destination cluster.

    • mse.ingress.kubernetes.io/subset-labels: Specifies the ID of the destination cluster. Replace ${cluster1-id} in the following YAML with the ID of Cluster 1.

      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        annotations:
          mse.ingress.kubernetes.io/service-subset: cluster-demo-1
          mse.ingress.kubernetes.io/subset-labels: |
            topology.istio.io/cluster ${cluster1-id}
        name: web-demo-cluster-one
      spec:
        ingressClassName: mse
        rules:
        - host: example.com
          http:
            paths:
            - path: /service1
              pathType: Exact
              backend:
                service:
                  name: service1
                  port: 
                    number: 80
  2. Run the following command to deploy the Ingress in the ACK One fleet instance.

    kubectl apply -f ingress-demo-cluster-one.yaml -n gateway-demo

Implement cluster-level canary releases

Multi-cluster gateways support creating header-based Ingresses to route requests to a specific cluster. You can use this capability, along with the full-replica load balancing capability of the Ingress object described in the Create an Ingress to implement primary/secondary disaster recovery section, to implement header-based canary releases.

  1. Create the Ingress as described in Create an Ingress to implement primary/secondary disaster recovery.

  2. Create an Ingress with header-related annotations in the ACK One fleet instance to implement cluster-level canary releases. When the request header matches the Ingress configuration, the request is routed to the backend of the canary release version.

    1. Create an ingress-demo-cluster-gray.yaml file with the following content.

      In the YAML file for the following Ingress object, replace ${cluster2-id} with the ID of the destination cluster (Cluster 2). In addition to the mse.ingress.kubernetes.io/service-subset and mse.ingress.kubernetes.io/subset-labels annotations, add the following annotations to expose the service1 backend service for the /service1 routing rule of the example.com domain name.

      • nginx.ingress.kubernetes.io/canary: Set to "true" to enable the canary release feature.

      • nginx.ingress.kubernetes.io/canary-by-header: The header key for requests to this cluster.

      • nginx.ingress.kubernetes.io/canary-by-header-value: The header value for requests to this cluster.

      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        annotations:
          mse.ingress.kubernetes.io/service-subset: cluster-demo-2
          mse.ingress.kubernetes.io/subset-labels: |
            topology.istio.io/cluster ${cluster2-id}
          nginx.ingress.kubernetes.io/canary: "true"
          nginx.ingress.kubernetes.io/canary-by-header: "app-web-demo-version"
          nginx.ingress.kubernetes.io/canary-by-header-value: "gray"
        name: web-demo-cluster-gray
      spec:
        ingressClassName: mse
        rules:
        - host: example.com
          http:
            paths:
            - path: /service1
              pathType: Exact
              backend:
                service:
                  name: service1
                  port: 
                    number: 80
    2. Run the following command to deploy the Ingress in the ACK One fleet instance.

      kubectl apply -f ingress-demo-cluster-gray.yaml -n gateway-demo

Verify primary/secondary disaster recovery

For easier verification, change the number of replicas deployed to both Cluster 1 and Cluster 2 to 1. The expected result is that default traffic is routed only to Cluster 1, traffic with the canary header is routed to Cluster 2, and when the application in Cluster 1 is abnormal, default traffic is also routed to Cluster 2.

Run the following command to obtain the public IP address of the multi-cluster gateway.

kubectl get ingress web-demo-cluster-one -n gateway-demo -ojsonpath="{.status.loadBalancer}"
  • Default traffic is routed to Cluster 1

    Run the following command to check if default traffic is routed to Cluster 1.

    Replace XX.XX.XX.XX with the public IP address of the multi-cluster gateway that you obtained in the preceding step.

    for i in {1..100}; do curl -H "host: example.com" xx.xx.xx.xx/service1; sleep 1;  done

    Expected output: The output shows that all default traffic is routed to Cluster 1.

  • Route requests to the canary release version using a header

    Run the following command to check if requests with the header are routed to the canary release version.

    Replace XX.XX.XX.XX with the public IP address of the multi-cluster gateway that you obtained in the preceding step.

    for i in {1..50}; do curl -H "host: example.com" -H "app-web-demo-version: gray" xx.xx.xx.xx/service1; sleep 1;  done

    Expected output: The output shows that requests with the header app-web-demo-version: gray are all routed to Cluster 2 (the canary release version).

  • Traffic fails over to Cluster 2 when Cluster 1 is abnormal

    If you scale the replica count of the Deployment in Cluster 1 to 0, the following output is returned. This shows that traffic has been automatically failed over to Cluster 2.
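The three behaviors verified above amount to the following routing rule. This is an illustrative sketch, not MSE's implementation; the subset names are the ones used in the Ingress annotations in this section.

```python
# Sketch: primary/secondary routing with a cluster-level canary header.
# - Requests with app-web-demo-version: gray go to the Cluster 2 subset
#   (matched by the canary Ingress).
# - All other traffic goes to the Cluster 1 subset (primary) and fails over
#   to Cluster 2 when Cluster 1 has no healthy backends.
def route(headers: dict, cluster1_healthy: bool) -> str:
    if headers.get("app-web-demo-version") == "gray":
        return "cluster-demo-2"
    return "cluster-demo-1" if cluster1_healthy else "cluster-demo-2"

print(route({}, cluster1_healthy=True))    # cluster-demo-1
print(route({}, cluster1_healthy=False))   # cluster-demo-2
```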

References