
Disaster Recovery Solutions Based on Multi-Cluster Gateways of ACK One

This article describes how to use the multi-cluster gateway of ACK One to implement zone-disaster recovery of public cloud applications.

By Jing Cai

Overview

In most cases, the business architecture of an enterprise can be divided into the following layers from the top down: access layer, application layer, and data layer.

Access layer: serves as an entry point for ingress traffic. This layer routes ingress traffic to the backend application layer based on forwarding rules.

Application layer: hosts applications. This layer processes ingress traffic and sends the results back to the upper layer.

Data layer: stores data. This layer provides data and storage services for the application layer.

When you build a disaster recovery system for your business, you must enforce recovery measures on each layer.

Access layer: Cross-AZ high availability (HA) is supported. Active zone-disaster recovery and cross-region disaster recovery can also be implemented by controlling how traffic is routed to the application layer.

Application layer: The application layer must be deployed in multiple clusters across availability zones or in multiple regions.

Data layer: Disaster recovery and data synchronization must be implemented on the data layer.

This article describes how to use the multi-cluster gateway of Distributed Cloud Container Platform for Kubernetes (ACK One) to implement zone-disaster recovery of public cloud applications, active zone-disaster recovery of hybrid cloud applications, and cross-region disaster recovery.

ACK One

ACK One is an enterprise-grade distributed cloud container platform launched by Alibaba Cloud. It is designed for scenarios such as hybrid cloud, multi-cluster, distributed computing, and disaster recovery, and provides centralized management capabilities for multiple clusters. ACK One registered clusters can be used to connect Kubernetes clusters that run on other public clouds or on-premises to the Container Service for Kubernetes (ACK) console. ACK One Fleet also provides unified application distribution, traffic management, observability, operational management, and security management for the registered clusters, and for the on-cloud ACK and ACK Edge clusters.

The multi-cluster gateway of ACK One is a service provided by Alibaba Cloud for application disaster recovery and north-south traffic management in hybrid cloud or multi-cluster environments. The service helps you quickly implement zone-disaster recovery or cross-region disaster recovery for hybrid cloud and multi-cluster applications, and facilitates multi-cluster traffic management.

The multi-cluster gateway of ACK One provides capabilities by hosting multi-cluster Ingress controllers on the Fleet instance and processing multi-cluster Ingress in a unified manner. The following section describes the main process:

• Create a Fleet instance.

• Associate a cluster: Associate ACK clusters or registered clusters with the Fleet instance to implement centralized management.

• Create a multi-cluster gateway: Use AlbConfig or MseIngressConfig to create an ALB multi-cluster gateway or an MSE multi-cluster gateway on the Fleet instance.

• Create an Ingress: Create an Ingress on the Fleet instance, bind it to a Service in the sub-clusters, and configure forwarding rules or routes for that Service.

• Use the multi-cluster gateway to access the Service: Use the domain name or IP address of the gateway to access the Service in the sub-clusters.

[Figure 1: main process of the ACK One multi-cluster gateway]

ACK One multi-cluster gateways provide the following benefits:

• Fully managed and O&M-free gateways.

• Reduce the number of gateways and costs. ACK One multi-cluster gateways serve as region-level multi-cluster gateways for layer-7 north-south traffic management.

• Simplify traffic management in multi-cluster environments. You can configure forwarding rules for multi-cluster Ingresses on the Fleet instance instead of configuring the rules in each cluster.

• Designed for cross-zone HA.

• Provide millisecond-level fallback. If a backend error occurs in a cluster, multi-cluster gateways smoothly redirect traffic to other backends.

1. Active Zone-disaster Recovery of Public Cloud Applications

Active zone-disaster recovery is a solution selected by most customers. Compared with the active-standby zone-disaster recovery solution, the active zone-disaster recovery solution has the following advantages:

• Higher resource utilization and lower costs.

• Higher service quality and stronger fault tolerance: Running more Service replicas across zones improves service quality and response speed and handles traffic peaks better. No switchover is needed when a fault occurs, so faults do not cause switchover-related service interruptions. In addition, system updates or maintenance can be performed without service interruption.

• Enhanced scalability: If a zone has insufficient resources, you can quickly scale the application in other zones that have available resources.

ACK One allows you to use ALB multi-cluster gateways and MSE multi-cluster gateways to implement cross-AZ disaster recovery. The following figure shows the architecture.

[Figure 2: architecture of active zone-disaster recovery based on ACK One multi-cluster gateways]

1.  Create Cluster 1 and Cluster 2 in AZ 1 and AZ 2 in the same region.

2.  Use ACK One GitOps to distribute the Service to Cluster 1 and Cluster 2.

3.  Create multi-cluster gateways by using ACK One Fleet instances.

4.  After a multi-cluster gateway is created, you can create an Ingress on the Fleet instance to implement zone-disaster recovery. When a cluster is abnormal, traffic is automatically rerouted to a healthy cluster. Multi-cluster gateways also provide the following capabilities:

a) Load balancing and forwarding traffic based on the total number of replicas across multiple clusters.

b) Load balancing and forwarding traffic based on specified weights.

c) Forwarding traffic based on HTTP headers, which facilitates canary releases.

d) Automatic switching of traffic in milliseconds or seconds in case of application or cluster failures.

5.  Data synchronization, for example for ApsaraDB RDS, depends on the capabilities of the middleware itself.
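Capabilities a), b), and d) in step 4 can be illustrated with a small model. The following Python sketch is not ACK One code: `ClusterBackend` and `traffic_shares` are hypothetical names, and the rule "use explicit weights when every healthy cluster has one, otherwise use replica counts" is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterBackend:
    """One cluster's view of a Service (hypothetical model, not an ACK One type)."""

    name: str
    ready_replicas: int
    healthy: bool = True
    weight: int | None = None  # optional operator-specified weight


def traffic_shares(backends: list[ClusterBackend]) -> dict[str, float]:
    """Return the fraction of traffic that each cluster receives.

    Unhealthy clusters receive no traffic. If every healthy cluster has an
    explicit weight, shares follow the weights; otherwise they follow the
    number of ready replicas (an assumption of this sketch).
    """
    healthy = [b for b in backends if b.healthy and b.ready_replicas > 0]
    if not healthy:
        return {}
    if all(b.weight is not None for b in healthy):
        scores = {b.name: float(b.weight) for b in healthy}
    else:
        scores = {b.name: float(b.ready_replicas) for b in healthy}
    total = sum(scores.values())
    if total == 0:
        return {}
    return {name: score / total for name, score in scores.items()}
```

With three ready replicas in Cluster 1 and one in Cluster 2, the sketch sends 75% of traffic to Cluster 1. Marking Cluster 1 as unhealthy moves all traffic to Cluster 2, which mirrors the automatic rerouting described in step 4.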

Comparison with Disaster Recovery Based on DNS Traffic Distribution

Compared with the active zone-disaster recovery solutions based on DNS traffic distribution, the active zone-disaster recovery system based on ACK One multi-cluster gateways has the following advantages:

  • Region-level global load balancing and centralized management of multi-cluster layer-7 north-south traffic: This reduces the number of gateways and costs. DNS-based solutions do not support some cross-cluster routing capabilities, such as the session persistence feature required by QUIC 0-RTT.
  • Millisecond-level and second-level failovers eliminate DNS caching issues.

    • Multi-cluster gateway based on ACK One: If the service in a cluster fails, traffic can be rerouted to other clusters in milliseconds or seconds. Failover is smoother than DNS-based traffic distribution.
    • When a failure occurs and the IP address is switched, DNS-based solutions often lead to service unavailability for minutes due to client caching. To mitigate the caching issue, the value of Time to Live (TTL) is reduced, which results in a significant increase in DNS access requests and leads to higher operational costs.
  • Simplified management: Manage Ingress configurations and services in one control plane (Fleet). This makes it easier to extend and maintain services or applications and reduces management costs.
  • Transparent cluster migration during cluster update or rebuild: Traffic is migrated to a healthy cluster based on rules and then forwarded back after the update or rebuild is complete.
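The TTL trade-off described for DNS-based solutions can be made concrete with simple arithmetic. The following Python sketch uses a deliberately simplified model: it assumes that every client re-resolves the domain name exactly once per TTL and honors the TTL, which real resolvers and clients do not always do. The function names and the sample numbers are hypothetical.

```python
def dns_queries_per_day(clients: int, ttl_seconds: int) -> int:
    """Approximate daily lookups if every client re-resolves once per TTL."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return clients * 86_400 // ttl_seconds


def worst_case_stale_seconds(ttl_seconds: int, detection_seconds: int) -> int:
    """Upper bound on how long a TTL-honoring client keeps the failed IP.

    The record is switched only after the failure is detected, and a client
    that resolved just before the switch caches the old IP for a full TTL.
    """
    return detection_seconds + ttl_seconds


# Hypothetical example: 10,000 clients, failure detected after 30 seconds.
for ttl in (600, 60):
    print(
        f"TTL={ttl}s: {dns_queries_per_day(10_000, ttl):,} queries/day, "
        f"up to {worst_case_stale_seconds(ttl, 30)}s on the failed IP"
    )
```

Under these assumptions, reducing the TTL from 600 seconds to 60 seconds shortens the worst-case stale window from 630 seconds to 90 seconds but multiplies the query volume by ten.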

The following figure shows the architecture of common DNS-based zone-disaster recovery solutions.

[Figure 3: architecture of common DNS-based zone-disaster recovery solutions]

2. Active Zone-disaster Recovery of Hybrid Cloud Applications

ACK One also supports ALB multi-cluster gateways and MSE multi-cluster gateways to implement a hybrid cloud or multi-cloud zone-disaster recovery system. This allows you to quickly build disaster recovery capabilities on Alibaba Cloud for on-premises services and to improve service capacity by using the elasticity of the cloud.

The following network requirements must be met:

  • The on-cloud virtual private cloud (VPC) must be able to communicate with both the node CIDR block and the pod CIDR block of the on-premises cluster.
  • If the on-premises cluster uses an overlay network plug-in:

    • The ALB multi-cluster gateway must reach the backends through NodePort Services in the on-premises cluster.
    • The MSE multi-cluster gateway cannot provide a mature solution for connecting the VPC and the pod CIDR block. It requires routing traffic to a fixed node, which may cause single points of failure and bottlenecks.
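Before connecting the networks, it can help to confirm that the CIDR blocks involved do not overlap, because overlapping ranges generally cannot be routed between the VPC and the on-premises cluster. This check is an assumption added here as a common prerequisite and is not taken from the ACK One documentation. The following Python sketch uses only the standard `ipaddress` module; the labels and CIDR values are hypothetical.

```python
from __future__ import annotations

import ipaddress
from itertools import combinations


def overlapping_cidrs(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return the pairs of labels whose CIDR blocks overlap."""
    networks = {label: ipaddress.ip_network(cidr) for label, cidr in cidrs.items()}
    return [
        (label_a, label_b)
        for (label_a, net_a), (label_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical address plan for illustration only.
plan = {
    "vpc": "10.0.0.0/16",
    "onprem-nodes": "192.168.0.0/24",
    "onprem-pods": "172.16.0.0/16",
}
print(overlapping_cidrs(plan))  # an empty list means no conflicts were found
```

An empty result only shows that the address plan has no conflicts. Routes over the Express Connect circuit or another link still have to be configured for the node and pod CIDR blocks.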

The following figure shows the architecture of the active zone-disaster recovery system, which is based on ACK One ALB multi-cluster gateways (MSE multi-cluster gateways share the same architecture):

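[Figure 4: architecture of active zone-disaster recovery of hybrid cloud applications based on ACK One ALB multi-cluster gateways]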

  1. Connect a Kubernetes cluster deployed in a data center or on a third-party public cloud to ACK as a registered cluster (AZ 2). Use an Express Connect circuit to connect the data center to the VPC for communication.
  2. Create an ACK One Fleet instance in the same region and VPC as the registered cluster, and create an ACK cluster in AZ 1.
  3. Use ACK One GitOps to distribute the Service to Cluster 1 and IDC Cluster.
  4. Create multi-cluster gateways by using ACK One Fleet instances.
  5. After the multi-cluster gateway is created, you can create an Ingress on the Fleet instance to implement zone-disaster recovery. When an exception occurs in a cluster, traffic is automatically rerouted to a healthy cluster.
  6. Data synchronization, for example for MySQL or ApsaraDB RDS, depends on the capabilities of the middleware itself.

3. Geo-disaster Recovery

Geo-disaster recovery protects against region-wide disasters. However, it comes with higher latency, higher fees, and higher maintenance costs. The geo-disaster recovery system based on ACK One multi-cluster gateways and the geo-disaster recovery system based on DNS suit different scenarios. The following sections describe their architectures and their respective scenarios.

Geo-disaster Recovery Solutions Based on ACK One Multi-cluster Gateways

ACK One allows you to use ALB multi-cluster gateways to implement a geo-disaster recovery solution. This solution is suitable for the following scenarios:

• Cross-region HA is required and resources in the local region are insufficient. For example, in the current AI boom, GPU resources are extremely scarce.

• Client applications do not require low latency but require improved multi-cluster traffic management.

The following figure shows this architecture.

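[Figure 5: architecture of geo-disaster recovery based on ACK One ALB multi-cluster gateways and GTM]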

  1. Create a cluster (ACK Cluster 1), an ACK One Fleet instance, and an ALB multi-cluster gateway in Region 1. Create a cluster (ACK Cluster 2) in Region 2 and install the ALB Ingress controller in ACK Cluster 2. After you install the ALB Ingress controller, an ALB instance (ALB 2) is automatically created in Region 2 for cold backup.
  2. Global Traffic Manager (GTM) is used to connect the ALB multi-cluster gateway in Region 1 to the ALB instance in Region 2. If Region 1 goes down, you can switch to Region 2.
  3. On the Fleet instance, multi-cluster gateways are used to implement flexible layer-7 traffic forwarding, such as QUIC 0-RTT and header-based forwarding between cross-region clusters, and can provide automatic fallback to Cluster 1 when Region 2 goes down.
  4. ACK Cluster 1 and ACK Cluster 2 are connected to each other by using Cloud Enterprise Network (CEN) or Virtual Private Cloud (VPC) peering connections. This way, traffic is forwarded across regions by using Express Connect circuits to ensure reliability.
  5. Data synchronization, for example for ApsaraDB RDS, depends on the capabilities of the middleware itself.
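The header-based forwarding and automatic fallback mentioned in step 3 can be sketched as a small routing function. The following Python code is an illustrative model, not the gateway's implementation: `pick_cluster`, the rule format, and the `x-canary` header are hypothetical.

```python
from __future__ import annotations


def pick_cluster(
    headers: dict[str, str],
    rules: list[tuple[str, str, str]],
    default: str,
    healthy: set[str],
) -> str | None:
    """Choose a target cluster for one request.

    Each rule is (header_name, expected_value, target_cluster). The first
    matching rule whose target is healthy wins. Otherwise the request goes
    to the default cluster, then to any healthy cluster, then nowhere.
    """
    # HTTP header names are case-insensitive, so compare them in lowercase.
    normalized = {name.lower(): value for name, value in headers.items()}
    for header, expected, target in rules:
        if normalized.get(header.lower()) == expected and target in healthy:
            return target
    if default in healthy:
        return default
    return min(healthy) if healthy else None


rules = [("x-canary", "true", "cluster-2")]
both = {"cluster-1", "cluster-2"}
print(pick_cluster({"X-Canary": "true"}, rules, "cluster-1", both))
print(pick_cluster({"X-Canary": "true"}, rules, "cluster-1", {"cluster-1"}))
```

The first call routes the canary request to `cluster-2`. The second call models Region 2 being down, so the same request falls back to `cluster-1`.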

The following section describes the benefits of cross-region DR solutions based on ACK One multi-cluster gateways:

• Enhanced multi-cluster traffic routing: This solution provides content-based advanced traffic routing and a health check mechanism that is more flexible than that of GTM to meet the requirements of complex scenarios.

• Centralized multi-cluster traffic management: This solution uses an ACK One Fleet instance as a unified control plane for Ingress configurations. This simplifies service extensions and application maintenance and reduces management costs.

• Mitigation of DNS client cache issues: In disaster recovery scenarios, service exceptions or cluster exceptions occur more frequently than region-wide outages. For these exceptions, this solution does not need to switch IP addresses, and failover completes within milliseconds or seconds.

The architecture of this solution implements disaster recovery based on ALB multi-cluster gateways and GTM. ALB multi-cluster gateways can manage and forward traffic to multiple clusters in a centralized manner.

• If cluster errors or service exceptions occur in Region 1, or errors occur in Region 2, the ALB multi-cluster gateway automatically redirects traffic to healthy clusters without the need to switch the DNS IP address.

• GTM switches the IP address based on health check only when Region 1 is down or the ALB service in Region 1 is down.
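The division of labor between the ALB multi-cluster gateway and GTM can be summarized as a decision function. The following Python sketch only restates the two rules above; the `Action` enum, the `failover_action` function, and its boolean inputs are hypothetical and do not correspond to any GTM or ALB API.

```python
from enum import Enum


class Action(Enum):
    NONE = "serve normally"
    GATEWAY_REROUTE = "ALB multi-cluster gateway reroutes traffic; DNS unchanged"
    GTM_SWITCH = "GTM switches the IP address to the ALB instance in Region 2"


def failover_action(
    region1_up: bool,
    alb1_up: bool,
    cluster1_healthy: bool,
    cluster2_healthy: bool,
) -> Action:
    """Decide which layer reacts to a failure (illustrative model only)."""
    # Only the loss of Region 1 or of its ALB service requires a DNS-level switch.
    if not region1_up or not alb1_up:
        return Action.GTM_SWITCH
    # Cluster or service failures are absorbed by the gateway itself.
    # This sketch does not model the case where no healthy cluster remains.
    if not cluster1_healthy or not cluster2_healthy:
        return Action.GATEWAY_REROUTE
    return Action.NONE


print(failover_action(True, True, False, True).value)
print(failover_action(False, True, True, True).value)
```

The first call models a cluster failure in Region 1 and is handled by the gateway. The second call models a Region 1 outage and is the only case in which clients depend on a DNS change.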

DNS-based Geo-disaster Recovery Solutions

The advantage of the DNS-based geo-disaster recovery solution is that GTM operates at the global level, which makes it suitable for scenarios such as nearby access.

The following figure shows the architecture of the DNS-based geo-disaster recovery solution.

[Figure 6: architecture of the DNS-based geo-disaster recovery solution]

  1. Create Cluster 1 and Cluster 2 in different regions and configure an ALB, NLB, or SLB instance in each cluster.
  2. Use ACK One GitOps to distribute the Service to Cluster 1 and Cluster 2.
  3. GTM is connected to the ALB instance, NLB instance, or SLB instance that provides the backend service in two ACK clusters to implement geo-disaster recovery. When an exception occurs in a cluster, GTM automatically switches the IP address to reroute traffic to another healthy cluster.
  4. Data synchronization, for example for ApsaraDB RDS, depends on the capabilities of the middleware itself.

Summary

In summary, the multi-cluster gateway of ACK One can help you quickly build an active zone-disaster recovery system, a hybrid cloud zone-disaster recovery system, and a geo-disaster recovery system. It fails over smoothly in milliseconds or seconds, makes multi-cluster services easier to manage and scale, and reduces management costs. For more information, see Overview of multi-cluster gateways and Multi-cluster disaster recovery.
