For any enterprise, whether or not its services run on the cloud, service stability and continuity are crucial. To reduce the impact of uncontrollable factors on normal service operations, you must improve the availability and disaster recovery capabilities of your products. Even if your products are already highly available, improving service availability and disaster recovery capabilities remains an important ongoing task.

To improve service availability and disaster recovery capabilities, many users take advantage of these cloud products: Elastic Compute Service (ECS), Server Load Balancer (SLB), ApsaraDB for RDS, and Object Storage Service (OSS).

Zone

Zones are physical areas in the same region that have independent power grids and networks. The network latency is lower for ECS instances in the same zone.

Intranet communication is available across different zones in the same region, and faults are isolated between zones. Whether to deploy ECS instances in the same zone is a tradeoff that depends on factors such as network latency and disaster recovery requirements.

  • If your applications require high disaster recovery capabilities, we recommend that you deploy your ECS instances in different zones of the same region.
  • If your applications require low network latency between instances, we recommend that you deploy your ECS instances in the same zone.
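
The two placement choices above can be illustrated with a minimal Python sketch. It is not an Alibaba Cloud API call: the function name `assign_zones`, the instance names, and the zone IDs are all hypothetical, and the sketch only shows how instances might be spread evenly across zones or kept in one zone.

```python
from itertools import cycle


def assign_zones(instance_names, zones):
    """Assign instances to zones round-robin so each zone gets a near-equal share."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return {name: next(zone_cycle) for name in instance_names}


# Disaster recovery first: spread instances across two zones (illustrative IDs).
spread = assign_zones(["web-1", "web-2", "web-3", "web-4"], ["zone-a", "zone-b"])

# Low latency first: keep all instances in a single zone.
colocated = assign_zones(["web-1", "web-2"], ["zone-a"])
```

With the spread layout, losing one zone still leaves half of the instances running; with the colocated layout, all instances share the same low-latency network but also the same failure domain.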

In the Region List, you can view the number of zones in each region. Alternatively, you can use the Region List API in OpenAPI Explorer to view the list of all zones.

Product introduction

ECS

ECS is a basic cloud computing service provided by Alibaba Cloud. An ECS instance is a virtual computing environment that incorporates a CPU, memory, operating system, disks, bandwidth, and other basic server components. It is the operating entity presented to each user.

You can create ECS instances at any time according to your business needs, without having to purchase hardware in advance. As your service grows, you can resize the disks and increase the bandwidth of your ECS instances. When you no longer need an ECS instance, you can release it to reduce costs.

ECS instances by themselves do not provide high availability or disaster recovery capabilities. Instead, you achieve these capabilities through the architecture that you build around them.

SLB

SLB is a traffic distribution control service that distributes traffic to multiple backend ECS instances based on routing algorithms. SLB extends application service capabilities and enhances application availability.

SLB sets a virtual service address to virtualize ECS instances into an application service pool with high performance and high availability. Then, it distributes requests from clients to ECS instances in the ECS instance pool based on the routing algorithms.
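
To make the idea of a routing algorithm concrete, the following is a minimal sketch of weighted round robin, the scheduling method used later in this document's listener example. It uses the generic "smooth" weighted round-robin technique and is an illustration only, not SLB's actual implementation. The class name and backend names are hypothetical.

```python
class WeightedRoundRobin:
    """Smooth weighted round robin over a fixed set of backends."""

    def __init__(self, weights):
        # weights: mapping of backend name -> positive integer weight
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be a non-empty mapping of positive values")
        self._weights = dict(weights)
        self._current = {name: 0 for name in self._weights}
        self._total = sum(self._weights.values())

    def next_backend(self):
        # Raise every backend's running score by its weight, pick the highest,
        # then lower the winner's score by the total weight.
        for name, weight in self._weights.items():
            self._current[name] += weight
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen


balancer = WeightedRoundRobin({"ecs-a": 2, "ecs-b": 1})
picks = [balancer.next_backend() for _ in range(6)]
print(picks)  # ['ecs-a', 'ecs-b', 'ecs-a', 'ecs-a', 'ecs-b', 'ecs-a']
```

A backend with weight 2 receives twice as many requests as a backend with weight 1, and the requests are interleaved rather than sent in bursts.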

The following features allow SLB to improve the availability and disaster recovery capabilities of ECS instances:

  • SLB is deployed in clusters. Each cluster has multiple backend ECS instances to eliminate single points of failure (SPOFs). This means that SLB is not affected if one or several backend ECS instances fail.

    The Layer-4 SLB (LVS) service, Layer-7 SLB (Tengine) service, control system, and other key components in the SLB system are all deployed in clusters to improve their scalability and availability.

  • Currently, most SLB instances are multi-zone instances, with primary and secondary instances located in the Internet data centers (IDCs) of different zones in the same city. When the IDC that hosts the primary instance fails, services can quickly fail over to the secondary instance, which supports disaster recovery and high availability of services. Click here for more information on the distribution of multiple zones in each region.
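
The primary/secondary failover described above can be reduced to a small decision rule. The sketch below is a simplified model under stated assumptions, not the SLB control system: the function name `pick_active_zone` and the `healthy` mapping (zone name to the latest health probe result) are hypothetical.

```python
def pick_active_zone(primary, secondary, healthy):
    """Return the zone that should serve traffic.

    `healthy` maps a zone name to the result of its latest health probe.
    The primary zone is preferred whenever it is healthy.
    """
    if healthy.get(primary, False):
        return primary
    if healthy.get(secondary, False):
        return secondary
    raise RuntimeError("neither the primary nor the secondary zone is healthy")


# Normal operation: traffic stays in the primary zone.
print(pick_active_zone("zone-b", "zone-a", {"zone-b": True, "zone-a": True}))   # zone-b

# Primary zone fails: traffic fails over to the secondary zone.
print(pick_active_zone("zone-b", "zone-a", {"zone-b": False, "zone-a": True}))  # zone-a
```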

ApsaraDB for RDS

ApsaraDB for RDS is a stable, reliable, and scalable online database service. Based on the distributed file system and high-performance storage of Alibaba Cloud, ApsaraDB for RDS supports MySQL, SQL Server, PostgreSQL, and PPAS (Postgres Plus Advanced Server, a database highly compatible with Oracle) engines. It provides a complete set of solutions for disaster recovery, backup, monitoring, migration, and other functions, allowing you to focus on services rather than database O&M.

  • For more information about the basic edition of ApsaraDB for RDS, click here.
  • In the dual-host high-availability edition of ApsaraDB for RDS, primary and secondary instances can be deployed in the same zone. When the primary instance experiences a fault, the service fails over to the secondary instance, providing high availability and disaster recovery capabilities.
  • In multi-zone ApsaraDB for RDS, primary and secondary instances are deployed in different zones.
  • You can use Data Transmission Service (DTS) to synchronize and migrate data between ApsaraDB for RDS instances.

OSS

OSS is a massive, secure, cost-effective, and highly reliable cloud storage service provided by Alibaba Cloud. You can upload and download data for any application, anytime, anywhere by calling APIs. In addition, you can perform simple data management operations in the web console. OSS can store any type of file and is therefore suitable for websites, enterprises, and developers of all kinds. You are billed only for the capacity that you actually use, allowing you to focus on your core services.

Files are chunked for storage. By default, three replicas of each chunk are saved on chunkserver nodes in different racks. In the Apsara Distributed File System cluster, services are not affected if up to one master node and two chunkserver nodes fail, or if multiple KVServer and WS nodes fail.
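
The rule of keeping three replicas on chunkserver nodes in different racks can be sketched as follows. This is an illustrative placement routine only and says nothing about how the Apsara Distributed File System actually selects nodes. The function name, rack names, and server names are hypothetical.

```python
import random


def place_replicas(servers_by_rack, replicas=3, seed=None):
    """Pick one chunkserver from each of `replicas` distinct racks."""
    if len(servers_by_rack) < replicas:
        raise ValueError("not enough racks to keep every replica on a different rack")
    rng = random.Random(seed)
    # Choose distinct racks first, then one server inside each chosen rack.
    racks = rng.sample(sorted(servers_by_rack), replicas)
    return [(rack, rng.choice(servers_by_rack[rack])) for rack in racks]


layout = {
    "rack-1": ["cs-1", "cs-2"],
    "rack-2": ["cs-3"],
    "rack-3": ["cs-4"],
    "rack-4": ["cs-5"],
}
placement = place_replicas(layout, seed=0)
```

Because no two replicas share a rack, the failure of a single rack can remove at most one of the three copies of a chunk.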

The following sections describe in detail the architectures, and the steps to construct them, for services with high availability and disaster recovery capabilities.

Multi-zone SLB instances + ECS instances in different zones

In the following figure, ECS instances in different zones are attached to one SLB instance. When Zone A works normally, user access traffic follows the path of the blue solid line shown in the figure. When a fault occurs in Zone A, user access traffic is redirected to the path of the black dotted line. This prevents a fault in a single zone from making the service unavailable, and lets you reduce latency through the choice of zones for the different products.

Perform the following steps to construct this architecture:

  1. Log on to the Alibaba Cloud console and click Server Load Balancer. On the page that appears, click Create Server Load Balancer.

    Here, we use the China (Beijing) region as an example and purchase a multi-zone instance, with Zone B as the primary zone and Zone A as the secondary zone.

  2. Create ECS instances in both the primary and secondary zones of the SLB instance.
    Create one test instance in Zone A and one in Zone B of the China (Beijing) region. In this example, we use the default security group and a VPC network, with a configuration of 1 core, 2 GB of memory, and CentOS 7.2.
  3. Create listeners and add backend servers (ECS instances).
    1. In the SLB console, locate the instance you created, and click Manage.
    2. Click Backend Server and select Excluded Servers. Then, find your instance and click Add.
    3. After completing the process, you can view your ECS instances and their weights on the Included Servers page.
    4. Click the Listener tab on the left. On the tab page that appears, click Add Listener. Set listener attributes as needed. In this example, we use the Layer-4 TCP mode, set both the listener port and the backend forwarding port to 80, and use the default weighted round robin method. We also enable session persistence and use the default timeout period of 1,000 seconds.
    5. Set the health check mode to TCP and the backend check port to 80.
    6. After completing these steps, you can view the added listener and its status on the Listener tab page.
      Note: You only need to deploy the relevant service on the ECS instances and have it listen on port 80. Then, resolve the domain name to the public IP address of the SLB instance so that the SLB instance can forward requests to the backend ECS instances and provide service.
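
Before you add an ECS instance as a backend server, it can help to confirm that its service accepts TCP connections on the check port. The sketch below shows what a TCP-mode health check amounts to in principle: a connection attempt that either succeeds or fails within a timeout. It is a minimal illustration, not the probe that SLB runs. The function name `tcp_health_check` and the 2-second default timeout are assumptions.

```python
import socket


def tcp_health_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Replace the address with the IP address of your own ECS instance.
    print(tcp_health_check("127.0.0.1", 80))
```

If the function returns False for an instance, check that the service is running and that the security group allows inbound traffic on port 80.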

Multi-zone SLB instances + ECS instances in different zones + highly available ApsaraDB for RDS instances

The following figure shows the multi-zone ApsaraDB for RDS architecture.

In regions where multi-zone ApsaraDB for RDS is not supported, you can create an ApsaraDB for RDS instance in each zone and use the instance in the secondary zone as the backup database. The backup database is synchronized with the ApsaraDB for RDS instance in the primary zone.

Perform the following steps to construct the multi-zone RDS architecture:

  1. After deploying a multi-zone SLB instance and multiple ECS instances in different zones, purchase ApsaraDB for RDS instances.
  2. Select a region that supports multi-zone ApsaraDB for RDS, as shown in the following figure.
  3. After purchasing ApsaraDB for RDS instances, you can view them in the console.

    In addition, you can view the high availability information of ApsaraDB for RDS instances and switch between primary and secondary instances in the console, as shown in the following figure.

The following steps describe an example in which an ApsaraDB for RDS instance is deployed in each zone:

  1. Purchase a dual-host high-availability ApsaraDB for RDS instance in Zone A and another in Zone B.
  2. Create a DTS synchronization task.

High availability - remote disaster recovery

When the service is deployed across multiple zones in the same city and an environment is also deployed in a remote region, the resulting architecture greatly increases service availability and provides remote disaster recovery.

Note: Configure DNS resolution to specify the region that ultimately serves user access, and use DTS to synchronize data between the ApsaraDB for RDS instances.