Alibaba Cloud Service Mesh: Disaster recovery scenario for multiple ACK clusters in different VPCs (based on CEN for VPC network connectivity)

Last Updated: Mar 11, 2026

When you run microservices across multiple regions, a single-region outage can disrupt all traffic unless you have a cross-region recovery mechanism in place. Service Mesh (ASM) addresses this with two geolocation-based traffic management capabilities:

  • Inter-region failover: Automatically reroutes traffic to a healthy region when services in the primary region become unavailable.

  • Inter-region traffic distribution: Splits traffic across regions based on configurable weight percentages for proactive load balancing.

This guide walks through both capabilities using the Bookinfo sample application deployed across two Container Service for Kubernetes (ACK) clusters in separate Virtual Private Clouds (VPCs), connected through Cloud Enterprise Network (CEN).

How it works

When you add multiple ACK clusters in different regions to a single ASM instance, ASM acts as the unified control plane for cross-region traffic management:

  1. Client requests arrive at the ingress gateway in the primary region.

  2. ASM evaluates geolocation-based load balancing rules to determine the target cluster.

  3. For failover, outlier detection identifies unhealthy endpoints and redirects traffic to the backup region.

  4. For traffic distribution, ASM splits traffic across regions according to configured weight percentages.

CEN provides the underlying network connectivity between VPCs in different regions, enabling pod-to-pod communication across clusters.

The following diagram illustrates the architecture used in this guide:

China (Hangzhou)                     China (Shanghai)
+--------------------------+        +--------------------------+
|  ack-hangzhou            | --CEN--|  ack-shanghai            |
|  (reviews v1, v3)        |        |  (reviews v2)            |
|  vpc-hangzhou            |        |  vpc-shanghai            |
+--------------------------+        +--------------------------+
        ^                                    ^
        |                                    |
   ASM instance ----------- manages --------+
   (vpc-hangzhou2)

Failover path: When the reviews service in ack-hangzhou becomes unavailable, ASM reroutes all traffic to the reviews v2 service in ack-shanghai.

Traffic distribution path: ASM splits traffic between ack-hangzhou and ack-shanghai based on configured weights (for example, 90%/10%).

Capability | Behavior | Use when
-----------|----------|---------
Inter-region failover | Reroutes all traffic to a backup region only when the primary region fails. | You run services actively in one region with a standby backup.
Inter-region traffic distribution | Splits traffic across regions based on fixed weight percentages at all times. | You run services actively across multiple regions to share production load.

Prerequisites

Before you begin, make sure you have:

  • An Alibaba Cloud account with permissions to create VPC, ACK, ASM, and CEN resources

  • kubectl installed locally

  • Basic familiarity with Kubernetes and Istio traffic management concepts (VirtualService, DestinationRule, Gateway)

This guide creates all required infrastructure from scratch. If you already have multi-region ACK clusters connected through CEN, skip to Step 3: Add clusters to ASM and create an ingress gateway.

Plan non-overlapping CIDR blocks

All VPCs, vSwitches, and cluster networks must use non-overlapping CIDR blocks to avoid routing conflicts when CEN connects the VPC networks. For detailed planning guidance, see Plan CIDR blocks for multiple clusters on the data plane.

The following tables show the example configurations used throughout this guide.

VPC configuration

Object       | VPC name      | Region      | IPv4 CIDR block
-------------|---------------|-------------|----------------
Cluster      | vpc-hangzhou  | cn-hangzhou | 20.0.0.0/8
Cluster      | vpc-shanghai  | cn-shanghai | 21.0.0.0/8
Service Mesh | vpc-hangzhou2 | cn-hangzhou | 192.168.0.0/16

vSwitch configuration

Important

No two vSwitches can share the same CIDR block. Overlapping CIDR blocks cause route conflicts when CEN connects the VPC networks.

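Object       | vSwitch name          | VPC           | IPv4 CIDR block
-------------|-----------------------|---------------|----------------
Cluster      | vpc-hangzhou-switch-1 | vpc-hangzhou  | 20.0.0.0/16
Cluster      | vpc-shanghai-switch-1 | vpc-shanghai  | 21.0.0.0/16
Service Mesh | vpc-hangzhou-switch-2 | vpc-hangzhou2 | 192.168.0.0/24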

Pod and Service CIDR blocks

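Cluster name | Region      | VPC          | Pod CIDR    | Service CIDR
-------------|-------------|--------------|-------------|--------------
ack-hangzhou | cn-hangzhou | vpc-hangzhou | 10.0.0.0/16 | 172.16.0.0/16
ack-shanghai | cn-shanghai | vpc-shanghai | 10.1.0.0/16 | 172.17.0.0/16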
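
Before creating any resources, it can help to check a CIDR plan for overlaps mechanically. The following is a minimal sketch, not part of the ASM tooling: the function names ip_to_int, cidr_overlap, and report_overlaps are hypothetical helpers written for this illustration. It treats two blocks as overlapping when they share the same network under the shorter of the two prefixes. Compare blocks of the same kind only, because a vSwitch CIDR block always sits inside the CIDR block of its own VPC.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Return 0 (true) when two CIDR blocks overlap, 1 otherwise.
# Two blocks overlap exactly when they match under the shorter prefix.
cidr_overlap() (
  p1=${1#*/}; p2=${2#*/}
  p=$(( $p1 < $p2 ? $p1 : $p2 ))
  mask=$(( (0xFFFFFFFF << (32 - $p)) & 0xFFFFFFFF ))
  a=$(( $(ip_to_int "${1%/*}") & $mask ))
  b=$(( $(ip_to_int "${2%/*}") & $mask ))
  [ "$a" -eq "$b" ]
)

# Compare every pair in the argument list and print each overlap found.
# Exits with status 1 if at least one pair overlaps.
report_overlaps() (
  status=0
  while [ $# -gt 1 ]; do
    first=$1; shift
    for other in "$@"; do
      if cidr_overlap "$first" "$other"; then
        echo "overlap: $first $other"
        status=1
      fi
    done
  done
  exit $status
)
```

For example, `report_overlaps 20.0.0.0/8 21.0.0.0/8 192.168.0.0/16` checks the three VPC blocks from the table above, and `report_overlaps 10.0.0.0/16 10.1.0.0/16` checks the two Pod CIDR blocks.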

Step 1: Create clusters in different regions

  1. Create VPCs and vSwitches in the China (Hangzhou) and China (Shanghai) regions using the CIDR blocks listed above. See Create a VPC and a vSwitch and Create a vSwitch.

  2. Create an ACK managed cluster in each region using the corresponding VPC. See Create an ACK managed cluster.

  3. Create an ASM instance in the China (Hangzhou) region. See Create an ASM instance.

Step 2: Connect VPC networks through CEN

CEN connects the VPC networks between the two ACK clusters and between each cluster and the ASM instance, enabling cross-region pod-to-pod communication.

Create a CEN instance and transit routers

  1. Log on to the CEN console and create a CEN instance. See Create a CEN instance.

  2. On the CEN Instances page, click the CEN instance name. On the Basic Information tab, click Create Transit Router. Create two transit routers:

    • Region: China (Shanghai), Name: shanghai-router

    • Region: China (Hangzhou), Name: hangzhou-router

Attach VPCs to transit routers

Repeat the following steps for each transit router:

  1. Click the transit router ID.

  2. On the Intra-region Connections tab, click Create Connection.

  3. Set Instance Type to Virtual Private Cloud (VPC) and select the VPC that corresponds to the transit router's region under Network Instance.

  4. Keep other settings at their defaults and click OK.

Configure inter-region bandwidth

  1. Click the name of a transit router, then click Create Connection.

  2. In the Connection With Peer Network Instance dialog box, set Region to the local transit router's region and Peer Region to the remote region. For example, set Region to China (Hangzhou) and Peer Region to China (Shanghai). For parameter details, see Inter-region connections.

  3. After creation, verify the connection appears on the Inter-region Connections tab.

Add security group rules

Allow cross-cluster pod traffic by adding the peer cluster's Pod CIDR block to each cluster's security group.

Note

The following steps apply to clusters using the Flannel network plugin. For clusters using the Terway network plugin, use the cluster vSwitch CIDR block instead of the Pod CIDR block. Find the vSwitch CIDR in the IPv4 CIDR Block column on the vSwitch page of the VPC console.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. Get the Pod CIDR block for each cluster:

    1. On the Clusters page, select the China (Shanghai) region. Click the ack-shanghai cluster name. On the Cluster Information page, click the Basic Information tab to find the Pod CIDR block.

    2. Repeat for the ack-hangzhou cluster in the China (Hangzhou) region.

  3. Add the peer cluster's Pod CIDR to each cluster's security group:

    1. On the Cluster Information page of each cluster, click the Basic Information tab. Click the security group ID next to Control Plane Security Group.

    2. On the Inbound tab, click Add Rule.

    3. Set Protocol Type to All and Source to the Pod CIDR block of the peer cluster. Keep other defaults and click Save.

  4. Verify connectivity by logging on to a node in each cluster and running ping against a node in the other cluster. See Log on to nodes.
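
The ping check in the last step can be wrapped in a small helper when several peer addresses need testing. This is only a sketch: check_peers is a hypothetical function name, and the addresses you pass are the private IP addresses of nodes in the other cluster, which this guide does not list. It assumes a ping implementation that accepts the -c count option.

```shell
# Ping each peer address three times and report the result.
# Usage: check_peers IP [IP ...]
# Exits with status 1 if any address does not answer.
check_peers() (
  failed=0
  for ip in "$@"; do
    if ping -c 3 "$ip" > /dev/null 2>&1; then
      echo "reachable: $ip"
    else
      echo "unreachable: $ip"
      failed=1
    fi
  done
  exit $failed
)
```

Run it on a node in ack-hangzhou with the IP address of a node in ack-shanghai, then repeat in the other direction.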

Step 3: Add clusters to ASM and create an ingress gateway

  1. Add the ack-hangzhou and ack-shanghai clusters to the ASM instance. See Add a cluster to an ASM instance.

  2. Create an ingress gateway by applying the following YAML to the ASM instance:

       apiVersion: istio.alibabacloud.com/v1beta1
       kind: IstioGateway
       metadata:
         annotations:
           asm.alibabacloud.com/managed-by-asm: 'true'
         name: ingressgateway
         namespace: istio-system
       spec:
         gatewayType: ingress
         dnsPolicy: ClusterFirst
         externalTrafficPolicy: Local
         hostNetwork: false
         ports:
         - name: http
           port: 80
           protocol: TCP
           targetPort: 80
         - name: https
           port: 443
           protocol: TCP
           targetPort: 443
         replicaCount: 1
         resources:
           limits:
             cpu: '2'
             memory: 2G
           requests:
             cpu: 200m
             memory: 256Mi
         rollingMaxSurge: 100%
         rollingMaxUnavailable: 25%
         runAsRoot: true
         serviceType: LoadBalancer

Step 4: Deploy the Bookinfo application

Important

The following steps require switching between kubeconfig contexts for different clusters. Configure the kubeconfig of both clusters in a single config file and use kubectl config use-context to switch between them. Tools like kubecm or kubectx simplify multi-cluster kubeconfig management.
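
One way to build the single config file is to let kubectl merge the individual files. The sketch below assumes each kubeconfig was downloaded to its own file; merge_kubeconfigs and the file names in the usage note are hypothetical. It relies on kubectl reading a colon-separated list of files from the KUBECONFIG environment variable and on kubectl config view --flatten to write one self-contained file.

```shell
# Merge several kubeconfig files into one flattened file.
# Usage: merge_kubeconfigs OUTPUT FILE1 [FILE2 ...]
merge_kubeconfigs() (
  out=$1; shift
  # Join the remaining arguments with ":" as kubectl expects in KUBECONFIG.
  joined=$(IFS=:; printf '%s' "$*")
  KUBECONFIG=$joined kubectl config view --flatten > "$out"
)
```

For example, `merge_kubeconfigs merged.yaml hangzhou.yaml shanghai.yaml asm.yaml` writes merged.yaml. After pointing KUBECONFIG at that file, `kubectl config get-contexts` lists the available contexts and `kubectl config use-context <name>` selects one.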

Deploy the application

Deploy Bookinfo in both the ack-hangzhou and ack-shanghai clusters:

kubectl apply -f bookinfo.yaml
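
If both clusters are available as contexts in one kubeconfig, the same manifest can be applied to each without switching the current context. The following sketch uses the --context option of kubectl. The function name apply_to_contexts is hypothetical, and the context names in the usage note are assumptions: your contexts are named whatever your kubeconfig calls them.

```shell
# Apply a manifest to every listed kubeconfig context.
# Usage: apply_to_contexts MANIFEST CONTEXT [CONTEXT ...]
# Stops at the first context where the apply fails.
apply_to_contexts() (
  manifest=$1; shift
  for ctx in "$@"; do
    echo "applying $manifest to $ctx"
    kubectl --context "$ctx" apply -f "$manifest" || exit 1
  done
)
```

For example, `apply_to_contexts bookinfo.yaml ack-hangzhou ack-shanghai` deploys to both clusters in turn.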

Create routing rules

Switch kubectl to the ASM instance context and apply the following routing rules.

  1. Save the following YAML as asm.yaml:

       apiVersion: networking.istio.io/v1alpha3
       kind: Gateway
       metadata:
         name: bookinfo-gateway
       spec:
         selector:
           istio: ingressgateway
         servers:
         - port:
             number: 80
             name: http
             protocol: HTTP
           hosts:
           - "*"
       ---
       apiVersion: networking.istio.io/v1alpha3
       kind: VirtualService
       metadata:
         name: bookinfo
       spec:
         hosts:
         - "*"
         gateways:
         - bookinfo-gateway
         http:
         - match:
           - uri:
               exact: /productpage
           - uri:
               prefix: /static
           - uri:
               exact: /login
           - uri:
               exact: /logout
           - uri:
               prefix: /api/v1/products
           route:
           - destination:
               host: productpage
               port:
                 number: 9080
       ---
       apiVersion: networking.istio.io/v1alpha3
       kind: DestinationRule
       metadata:
         name: productpage
       spec:
         host: productpage
         subsets:
         - name: v1
           labels:
             version: v1
       ---
       apiVersion: networking.istio.io/v1alpha3
       kind: DestinationRule
       metadata:
         name: reviews
       spec:
         host: reviews
         subsets:
         - name: v1
           labels:
             version: v1
         - name: v2
           labels:
             version: v2
         - name: v3
           labels:
             version: v3
       ---
       apiVersion: networking.istio.io/v1alpha3
       kind: DestinationRule
       metadata:
         name: ratings
       spec:
         host: ratings
         subsets:
         - name: v1
           labels:
             version: v1
         - name: v2
           labels:
             version: v2
         - name: v2-mysql
           labels:
             version: v2-mysql
         - name: v2-mysql-vm
           labels:
             version: v2-mysql-vm
       ---
       apiVersion: networking.istio.io/v1alpha3
       kind: DestinationRule
       metadata:
         name: details
       spec:
         host: details
         subsets:
         - name: v1
           labels:
             version: v1
         - name: v2
           labels:
             version: v2

  2. Apply the routing rules:

       kubectl apply -f asm.yaml

Verify the deployment

  1. Get the ingress gateway address.

  2. Open http://<ingress-gateway-ip>/productpage in a browser and refresh the page 10 times. The Bookinfo application distributes requests across v1, v2, and v3 of the reviews service in roughly equal proportions (1:1:1).
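
The gateway address from step 1 can also be read with kubectl. This is a sketch under stated assumptions: gateway_ip is a hypothetical helper, the default Service name istio-ingressgateway is an assumption about how the gateway's Service is named (check with kubectl get service -n istio-system), and the context name is whatever your kubeconfig uses for the cluster that hosts the gateway.

```shell
# Print the external IP address of a LoadBalancer Service.
# Usage: gateway_ip CONTEXT [SERVICE] [NAMESPACE]
gateway_ip() (
  ctx=$1
  svc=${2:-istio-ingressgateway}   # assumed Service name; verify in your cluster
  ns=${3:-istio-system}
  kubectl --context "$ctx" -n "$ns" get service "$svc" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
```

For example, `gateway_ip ack-hangzhou` prints the address to use in place of <ingress-gateway-ip>.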

Step 5: Configure inter-region failover

Inter-region failover reroutes traffic to a backup region when the local region's service becomes unavailable. The following steps demonstrate failover by scaling down the reviews service in one cluster and verifying that traffic shifts to the other.

Simulate a regional failure

Scale the reviews Deployment in the ack-hangzhou cluster to zero replicas:

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the ack-hangzhou cluster name. In the left-side navigation pane, choose Workloads > Deployments.

  3. On the Deployments page, set Namespace to default. Click Scale in the Actions column for the reviews deployment.

  4. Set Desired Number Of Pods to 0 and click OK.
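
The same scale-down can be done from the command line. The sketch below assumes the reviews Deployments carry the label app=reviews, as they do in the upstream Istio Bookinfo manifest. If your bookinfo.yaml labels them differently, adjust the selector. scale_reviews is a hypothetical helper and the context name is an assumption.

```shell
# Scale every Deployment labeled app=reviews in the default namespace of one cluster.
# Usage: scale_reviews CONTEXT REPLICAS
scale_reviews() (
  kubectl --context "$1" -n default scale deployment -l app=reviews --replicas="$2"
)
```

For example, `scale_reviews ack-hangzhou 0` simulates the failure, and `scale_reviews ack-hangzhou 1` restores the service afterwards.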

Configure outlier detection

Add outlier detection to the reviews DestinationRule so that unhealthy endpoints are ejected and failover is triggered.

  1. On the ASM instance details page, choose Traffic Management Center > DestinationRule in the left-side navigation pane.

  2. Click Edit YAML in the Actions column of the reviews DestinationRule.

  3. Add the following trafficPolicy block under spec and click OK:

       spec:
         # ... existing content ...
         trafficPolicy:
           connectionPool:
             http:
               maxRequestsPerConnection: 1
           outlierDetection:
             baseEjectionTime: 1m
             consecutive5xxErrors: 1
             interval: 1s

     The following table explains each parameter and its role in failover behavior:

     Parameter | Value | Description
     ----------|-------|------------
     maxRequestsPerConnection | 1 | Limits each connection to a single request. This disables keep-alive, forcing new connections that can be routed to healthy endpoints.
     baseEjectionTime | 1m | Keeps an unhealthy endpoint ejected for 1 minute before re-evaluating.
     consecutive5xxErrors | 1 | Ejects an endpoint after a single 5xx error.
     interval | 1s | Runs the ejection scan every 1 second.

Enable geolocation-based failover

ASM version 1.22.6.66 and later

  1. On the ASM instance details page, choose ASM Instance > Base Information in the left-side navigation pane.

  2. Click Configure a Geolocation-based Load Balancing next to Geolocation-based Load Balancing.

  3. Click Specify priority rules for regions. Set Region in which the failure occurs to cn-shanghai and The region to which the traffic is preferentially routed to cn-hangzhou.

  4. Click Add. Set Region in which the failure occurs to cn-hangzhou and The region to which the traffic is preferentially routed to cn-shanghai.

  5. Click Save Configuration.

ASM versions earlier than 1.22.6.66

  1. On the Base Information page, click Setting next to Geolocation-based Load Balancing.

  2. In the Geolocation-based Failover dialog box, configure the failover mapping:

    • When Policy is cn-shanghai, set the Failover to field to cn-hangzhou.

    • When Policy is cn-hangzhou, set the Failover to field to cn-shanghai.

  3. Click Confirm.

Verify failover

Run the following command to send 10 requests to the Bookinfo application and count responses from the v2 reviews service:

for ((i=1;i<=10;i++)); do
  curl http://<ingress-gateway-ip>/productpage 2>&1 | grep full.stars
done | wc -l

Replace <ingress-gateway-ip> with the IP address of the ingress gateway in the ack-hangzhou cluster. The request uses port 80, the default HTTP port.

Expected output:

20

Each request routed to the v2 reviews service returns two lines containing full stars. An output of 20 confirms that all 10 requests reached the v2 reviews service in the ack-shanghai cluster, which means failover is working.
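
Counting lines works, but it depends on every response containing exactly two matching lines. A per-request check is slightly more robust. The following is a sketch: count_star_responses and assert_all_failover are hypothetical helpers, and the "full stars" marker is the same indicator of the v2 reviews service that this guide uses above.

```shell
# Count how many of N responses contain the "full stars" marker.
# Usage: count_star_responses URL N
count_star_responses() (
  url=$1; total=$2; hits=0; i=0
  while [ "$i" -lt "$total" ]; do
    if curl -s "$url" | grep -q "full stars"; then
      hits=$((hits + 1))
    fi
    i=$((i + 1))
  done
  echo "$hits"
)

# Succeed only when every one of N responses carries the marker.
# Usage: assert_all_failover URL N
assert_all_failover() (
  hits=$(count_star_responses "$1" "$2")
  echo "$hits of $2 responses contained star ratings"
  [ "$hits" -eq "$2" ]
)
```

For example, `assert_all_failover "http://<ingress-gateway-ip>/productpage" 10` returns a zero exit status only if all 10 responses came back with star ratings.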

Step 6: Configure inter-region traffic distribution

Important

Inter-region traffic distribution requires ASM version 1.22.6.66 or later.

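Inter-region traffic distribution splits traffic across regions based on configurable weight percentages. Unlike failover, which activates only during outages, traffic distribution actively balances load across regions at all times. If the reviews Deployment in ack-hangzhou is still scaled to zero from Step 5, scale it back up before you continue; otherwise the local region has no reviews endpoints to receive its share of traffic.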

The following table summarizes the intended traffic split configured below:

Source region | Destination region  | Traffic percentage
--------------|---------------------|-------------------
cn-hangzhou   | cn-hangzhou (local) | 90%
cn-hangzhou   | cn-shanghai         | 10%

Configure traffic distribution

Note

Geolocation-based load balancing defaults to failover mode. If failover is currently enabled, click Disable in the upper-right corner of the configuration page before enabling traffic distribution.

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the ASM instance name. In the left-side navigation pane, choose ASM Instance > Base Information.

  3. Click Configure a Geolocation-based Load Balancing next to Geolocation-based Load Balancing.

  4. Click Configure a traffic distribution rule. Set Source to cn-hangzhou, Destination to cn-shanghai, and Traffic Percentage to 10%.

  5. Click Save Configuration.

Verify traffic distribution

Run the following command to send 10 requests and observe the distribution:

for ((i=1;i<=10;i++)); do
  curl http://<ingress-gateway-ip>/productpage 2>&1 | grep full.stars
done

Replace <ingress-gateway-ip> with the IP address of the ingress gateway in the ack-hangzhou cluster. The request uses port 80, the default HTTP port.

Expected output:

<!-- full stars: -->
<!-- full stars: -->

Out of 10 requests, approximately 9 reach the v1 reviews service in ack-hangzhou (no full stars output), and 1 reaches the v2 reviews service in ack-shanghai (2 lines of full stars). This is consistent with the configured 90%/10% split, although with only 10 requests the observed ratio can vary from run to run.
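
A larger sample gives a steadier estimate of the split than 10 requests. The sketch below sends N requests and reports how many responses did and did not contain the "full stars" marker, along with the share that did, rounded down to a whole percent. split_report is a hypothetical helper. It assumes, as this guide does, that responses with star ratings came from ack-shanghai, and it expects N to be greater than zero.

```shell
# Send N requests and summarize how many responses contained the "full stars" marker.
# Usage: split_report URL N   (N must be greater than 0)
split_report() (
  url=$1; total=$2; remote=0; i=0
  while [ "$i" -lt "$total" ]; do
    if curl -s "$url" | grep -q "full stars"; then
      remote=$((remote + 1))
    fi
    i=$((i + 1))
  done
  local_hits=$((total - remote))
  echo "with stars: $remote, without stars: $local_hits, star share: $((remote * 100 / total))%"
)
```

For example, `split_report "http://<ingress-gateway-ip>/productpage" 100` should report a star share near 10% when the configured weights are in effect.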

FAQ

Why does adding a cluster to ASM fail after connecting VPC networks through CEN?

The most likely cause is a missing or misconfigured inter-region data transfer plan in CEN. Without proper inter-region bandwidth, the ASM control plane in one region cannot reach the data plane cluster in another region, even though intra-region VPC connectivity works.

To fix this, verify and reconfigure the inter-region connection settings in CEN as described in Step 2: Connect VPC networks through CEN. Make sure inter-region bandwidth is allocated between the two transit routers.