The following architecture is recommended for a multi-active disaster recovery solution on a hybrid cloud.

Key points of the architecture

In a hybrid cloud-based multi-active architecture, remote redundant resources ensure that services keep running properly even in extreme circumstances. When you build an architecture on a hybrid cloud that integrates your on-premises IDC and off-premises IDC, pay attention to the following points:
  • Unit node isolation:

    Before you deploy an active geo-redundancy solution, you must address the latency that comes with geographic distribution. If users submit requests that modify the same row in the databases of unit nodes in different regions, latency can lead to inconsistent or inaccurate data. Latency also accumulates when a single operation requires multiple requests for data in a unit node and that node has to call a chain of interdependent services. Handling these issues case by case is time-consuming, which is why unit node isolation is at the core of an active geo-redundancy solution. Specifically, each unit node has independent read/write access permissions, and unit nodes in different regions cannot modify the same row. To isolate unit nodes, you must divide data among them based on a chosen dimension.

  • Properly categorizing data into unit nodes:
    Analyze your business before you decide how to manage the read and write access permissions of each unit node. For example, placing orders is the most important process for an e-commerce business. To reduce restructuring costs and improve user experience, the optimal choice is to categorize data into unit nodes based on the user ID.

    In this case, read and write operations on a buyer's orders are performed in the corresponding unit node, and that data is not read or written across unit nodes. However, data that is not tied to a buyer may be distributed across unit nodes. For example, a seller's changes to product data may involve multiple unit nodes. If necessary, you can use read/write splitting to ensure the eventual consistency of buyers' and sellers' data. If eventual consistency cannot meet your needs, you must ensure that data can be read or written across unit nodes.
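
As a rough illustration of categorizing data by user ID, the following Python sketch maps each buyer to exactly one unit node with a stable hash, so that a given buyer's orders are always read and written in the same unit. The unit names and the functions `unit_for_user` and `can_write_locally` are hypothetical and are not part of any Alibaba Cloud API. The sketch also assumes a fixed list of units; a production system might prefer a lookup table or consistent hashing so that adding a unit node moves fewer users.

```python
import hashlib

# Hypothetical unit node names; the source does not name any.
UNIT_NODES = ["unit-onprem", "unit-cloud"]


def unit_for_user(user_id: str, units=UNIT_NODES) -> str:
    """Map a user ID to exactly one unit node using a stable hash.

    hashlib is used instead of the built-in hash() because hash() is
    randomized per process for strings, which would break routing.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(units)
    return units[index]


def can_write_locally(user_id: str, local_unit: str) -> bool:
    """Return True only in the unit node that owns this user's rows."""
    return unit_for_user(user_id) == local_unit


if __name__ == "__main__":
    for uid in ("buyer-1001", "buyer-1002", "buyer-1003"):
        print(uid, "->", unit_for_user(uid))
```

A request that arrives at a unit node for which `can_write_locally` returns False would be forwarded to the owning unit rather than written locally, which keeps two regions from modifying the same row.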

To provide the optimal service experience, the hybrid cloud-based multi-active architecture must be further optimized based on the service scenario and the way the application is implemented. For assistance with architecture design, you can contact our Professional Services. Taking a simple IT system as an example, the following section describes how to build a hybrid cloud-based active-active architecture in which instances are deployed in more than two Alibaba Cloud zones.

Recommended architecture

Architecture description:
  • We recommend that you activate Express Connect, build a hybrid cloud through a leased line, and deploy your application system in your on-premises IDC and off-premises IDC in exactly the same manner. In this way, you can deploy a hybrid cloud-based active-active solution that consists of on-premises and off-premises IDCs.
  • Both IDCs provide services to achieve load balancing.
    • Access side:

      Use intelligent DNS to distribute traffic to both IDCs and make your application stateless. Deploy your application system in both IDCs in exactly the same manner to overcome the carrier's regional restrictions and split traffic by region.

    • Application deployment:

      Deploy your application system in the on-premises and off-premises IDCs in exactly the same manner. In each IDC, mount the application cluster to the SLB instance in the IDC, which distributes the traffic to a node in the application cluster.

    • Cache side:
      We recommend that you use ApsaraDB for Redis for the application cache. Alibaba Cloud ApsaraDB for Redis is compatible with the open source Redis protocol. When the instances deployed in the on-premises and off-premises IDCs are both ApsaraDB for Redis instances, two-way read-write synchronization is supported. A conflict-free replicated data type (CRDT) mechanism is used to detect and resolve data conflicts and ensure data consistency. When open source Redis is used in the on-premises IDC, only one-way synchronization is available: the ApsaraDB for Redis instance receives read-write synchronization information from the Redis instance in the on-premises IDC.
      Note: When open source Redis is used in the on-premises IDC, the on-premises Redis instance may be incompatible with the enhanced cache processing capability of the ApsaraDB for Redis instance. The ApsaraDB for Redis instance can read data from, but cannot write data to, the on-premises Redis instance. We recommend that you deploy an ApsaraDB for Redis instance in your on-premises IDC.
    • Data side:

      Application data is stored in off-premises and on-premises databases, and data is synchronized between the databases through Data Transmission Service (DTS) to keep the two sides consistent with each other.
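
The source does not describe which conflict-free replicated data types ApsaraDB for Redis uses internally. Purely as a generic illustration of the idea mentioned for the cache side, the following Python sketch shows one of the simplest CRDTs, a grow-only counter. Each replica increments only its own slot, and replicas merge by taking the per-slot maximum, so both sides converge to the same value regardless of the order in which they exchange state. The class `GCounter` and the replica names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by per-slot max."""

    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        """Increase this replica's own slot; other slots are never touched."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        """Fold in another replica's state; safe to repeat or reorder."""
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        """Total across all replicas seen so far."""
        return sum(self.counts.values())


# Two replicas accept writes independently, then synchronize both ways.
onprem = GCounter("onprem")
cloud = GCounter("cloud")
onprem.increment(3)
cloud.increment(2)
onprem.merge(cloud)
cloud.merge(onprem)
print(onprem.value(), cloud.value())
```

Because the merge is commutative, associative, and idempotent, replaying or reordering synchronization messages does not change the result, which is the property that makes two-way synchronization workable without a central coordinator.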

Advantages of the architecture

  • Multiple IDCs: Alibaba Cloud deploys multiple IDCs around the world. You can purchase Alibaba Cloud products and deploy them in the nearest or most appropriate region.
  • Stability: Each region and each product is stable. After multiple rounds of iteration, SLB, ECS, ApsaraDB for Redis, ApsaraDB for RDS, and other key Alibaba Cloud products now provide excellent disaster recovery capabilities. Fine-grained disaster recovery control can be achieved through additional functional product modules.
  • Scalability: You can scale your existing services out, in, up, or down, or purchase additional services based on your needs.