This topic describes the high-availability architecture of Server Load Balancer (SLB). It covers the system design and the product configurations you can use to meet different business needs. You can also use SLB together with Alibaba Cloud DNS to achieve cross-region disaster recovery. SLB provides a service availability of 99.99% for multi-zone deployments and 99.90% for single-zone deployments.
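
To put these percentages in concrete terms, the following Python sketch converts an availability figure into the maximum downtime it permits over a period. It assumes a 365-day year and a 30-day month; the actual measurement period and any exclusions are defined by the SLB service level agreement, not by this calculation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # assumes a 365-day year
MINUTES_PER_MONTH = 30 * 24 * 60   # assumes a 30-day month


def allowed_downtime_minutes(availability_pct: float, period_minutes: int) -> float:
    """Return the downtime (in minutes) that an availability percentage permits over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1 - availability_pct / 100) * period_minutes


for label, pct in [("multi-zone", 99.99), ("single-zone", 99.90)]:
    per_year = allowed_downtime_minutes(pct, MINUTES_PER_YEAR)
    per_month = allowed_downtime_minutes(pct, MINUTES_PER_MONTH)
    print(f"{label} ({pct}%): {per_year:.1f} min/year, {per_month:.1f} min/month")
```

Under these assumptions, 99.99% allows roughly 52.6 minutes of downtime per year (about 4.3 minutes per month), while 99.90% allows roughly 8.76 hours per year (about 43.2 minutes per month).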

High availability of the SLB system

SLB is deployed in clusters and synchronizes sessions among its node servers. This protects the SLB system from single points of failure (SPOFs), improves redundancy, and keeps the service stable. Layer-4 SLB uses the open source software Linux Virtual Server (LVS) and Keepalived to implement load balancing. Layer-7 SLB uses Tengine, a web server project based on Nginx that adds advanced features for high-traffic websites.

Requests from the Internet reach the LVS cluster through Equal-Cost Multi-Path (ECMP) routing. Each LVS machine in the cluster synchronizes its sessions to the other LVS machines through multicast packets, so that session state is shared across the whole LVS cluster. The LVS cluster also performs health checks on the Tengine cluster and removes abnormal machines, which keeps layer-7 SLB available.
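
The following Python sketch is a toy model of why session synchronization matters under ECMP; it is not SLB's implementation. ECMP is approximated by hashing a flow's 5-tuple over the list of live LVS nodes (real routers use vendor-specific hash functions), and multicast synchronization is approximated by copying each new session entry to every node. The node and server names are hypothetical. When the LVS node that owned a flow fails, ECMP rehashes the flow to a surviving node, which still finds the session and forwards packets to the same Tengine server.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

# A flow is identified by its 5-tuple: (src_ip, src_port, dst_ip, dst_port, protocol).
FiveTuple = tuple[str, int, str, int, str]


def ecmp_pick(flow: FiveTuple, nodes: list[str]) -> str:
    """Choose an LVS node by hashing the flow's 5-tuple (stand-in for a router's ECMP hash)."""
    key = "|".join(map(str, flow)).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return nodes[digest % len(nodes)]


@dataclass
class LvsCluster:
    nodes: list[str]
    # Per-node session tables: flow -> Tengine server chosen for that flow.
    sessions: dict[str, dict[FiveTuple, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for node in self.nodes:
            self.sessions.setdefault(node, {})

    def establish(self, flow: FiveTuple, tengine: str) -> str:
        """Record a new session on the ECMP-selected node and replicate it to all nodes."""
        owner = ecmp_pick(flow, self.nodes)
        for node in self.nodes:  # models multicast session synchronization
            self.sessions[node][flow] = tengine
        return owner

    def fail(self, node: str) -> None:
        """Remove a failed LVS node; ECMP stops sending traffic to it."""
        self.nodes.remove(node)
        del self.sessions[node]

    def lookup(self, flow: FiveTuple) -> tuple[str, Optional[str]]:
        """Return the node that now receives the flow and the Tengine server it has on record."""
        node = ecmp_pick(flow, self.nodes)
        return node, self.sessions[node].get(flow)


cluster = LvsCluster(["lvs-1", "lvs-2", "lvs-3"])
flow = ("203.0.113.7", 50432, "198.51.100.20", 443, "tcp")
owner = cluster.establish(flow, "tengine-a")
cluster.fail(owner)
print(cluster.lookup(flow))
```

If the replication loop in `establish` wrote only to the owner node, `lookup` would return `None` after the failover, which corresponds to a broken connection.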

Best practice:

Session synchronization keeps persistent connections from being affected when a server in the cluster fails. However, short-lived connections, and connections that have not yet triggered session synchronization (for example, because the three-way handshake has not completed), may still be affected by a server failure in the cluster. To reduce the impact of such failures on user access, add retry logic to your service.
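
A minimal Python sketch of such retry logic follows. It retries an operation on connection-level errors with exponential backoff and random jitter; the function name, the default number of attempts, and the delay values are illustrative assumptions, not SLB settings. Retry only idempotent requests (for example, reads), because retrying a request that already took effect on the server can apply it twice.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    op: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.2,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying on transient connection errors with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: wait a random time of up to base_delay * 2**attempt seconds.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable")
```

In a real client, `op` would wrap a call to the SLB address. The `sleep` parameter is injectable so that the retry logic can be tested without actually waiting.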

High availability of a single SLB instance

To provide more reliable services, SLB is deployed across multiple zones in most regions. If the primary zone becomes unavailable, SLB switches to the secondary zone and restores its service capabilities within 30 seconds. When the primary zone becomes available again, SLB automatically switches back to it.
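
The following Python sketch models this switchover as a small state machine: traffic is served from the primary zone while it is available, moves to the secondary zone when the primary zone becomes unavailable, and returns to the primary zone once it recovers. It is a conceptual model only; SLB performs the switchover itself, and the class, the zone names, and the `zone_available` input are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ZonePair:
    primary: str
    secondary: str
    active: str = ""

    def __post_init__(self) -> None:
        # Traffic starts in the primary zone.
        self.active = self.active or self.primary

    def on_zone_status(self, zone_available: dict[str, bool]) -> str:
        """Update the active zone from zone-level availability and return it."""
        if zone_available.get(self.primary, False):
            self.active = self.primary  # primary is up: serve from it (this also covers switching back)
        elif zone_available.get(self.secondary, False):
            self.active = self.secondary  # primary is down: fail over to the secondary zone
        # If neither zone is available, keep the current value; no zone can serve traffic.
        return self.active
```

The input deliberately describes whole zones rather than individual instances, because the switch is driven by zone-level unavailability.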

Note: The primary zone and the secondary zone provide zone-level disaster tolerance. An SLB instance switches to the secondary zone only when Alibaba Cloud detects that the current zone is unavailable due to a power outage or an optical cable failure. A failure of the instance itself does not trigger a switch.

Best practice:
  1. We recommend that you create an SLB instance in a region with multiple zones for disaster tolerance.
  2. We recommend that you deploy ECS instances in both the primary and secondary zones for disaster recovery. To minimize access latency, set the zone that hosts most of your ECS instances as the primary zone.
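
As a planning aid for these recommendations, the following Python sketch counts ECS instances per zone, rejects a layout that leaves either zone of the pair empty, and suggests the zone with more instances as the primary zone. The function and zone names are hypothetical; in practice you would read instance zones from your own inventory.

```python
from collections import Counter


def suggest_zone_roles(ecs_zones: list[str], zone_pair: tuple[str, str]) -> tuple[str, str]:
    """Return (primary, secondary) for an SLB zone pair based on where ECS instances run.

    ecs_zones lists the zone of each ECS instance behind the SLB instance.
    """
    counts = Counter(ecs_zones)
    empty = [zone for zone in zone_pair if counts[zone] == 0]
    if empty:
        # Both zones should host ECS instances for disaster recovery.
        raise ValueError(f"no ECS instances in zone(s): {', '.join(empty)}")
    # Prefer the zone with more instances as primary to minimize access latency;
    # on a tie, max() keeps the first zone listed in zone_pair.
    primary = max(zone_pair, key=lambda zone: counts[zone])
    secondary = zone_pair[1] if primary == zone_pair[0] else zone_pair[0]
    return primary, secondary
```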

High availability of multiple SLB instances

If a single SLB instance cannot meet your availability requirements, you can configure multiple SLB instances. You can use Alibaba Cloud DNS to schedule requests across these instances, or implement cross-region disaster recovery through global SLB.

Best practice:

You can deploy SLB instances and ECS instances in multiple zones of a region or in multiple regions and schedule access requests by using Alibaba Cloud DNS.
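
The following Python sketch illustrates the kind of scheduling decision DNS makes across several SLB instances: it drops unhealthy instances, prefers instances in the client's region, and chooses among the remaining ones by weight. It is a hypothetical model, not the Alibaba Cloud DNS API; the IP addresses come from documentation-reserved ranges and the region names are placeholders. Clients and resolvers cache DNS answers until the record's TTL expires, so a DNS-based switch is not instantaneous.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SlbEndpoint:
    ip: str          # public IP address of one SLB instance
    region: str      # region the SLB instance is deployed in
    weight: int      # relative share of traffic
    healthy: bool = True


def resolve(endpoints: Sequence[SlbEndpoint], client_region: Optional[str] = None,
            rng: Optional[random.Random] = None) -> str:
    """Return the IP address of one healthy SLB instance, preferring the client's region."""
    if rng is None:
        rng = random.Random()
    candidates = [e for e in endpoints if e.healthy and e.weight > 0]
    if not candidates:
        raise LookupError("no healthy SLB endpoint to return")
    # Prefer SLB instances in the client's region; fall back to any healthy region.
    local = [e for e in candidates if e.region == client_region]
    pool = local or candidates
    chosen = rng.choices(pool, weights=[e.weight for e in pool], k=1)[0]
    return chosen.ip
```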

High availability of backend ECS instances

SLB checks the service availability of backend ECS instances by performing health checks. Health checks improve the overall availability of frontend services and reduce the impact on your service when backend servers become abnormal.

When SLB detects that an ECS instance is unhealthy, it stops sending requests to that instance and distributes them to other healthy ECS instances. It resumes sending requests to the instance only after the instance recovers to a healthy state. For more information, see Health check overview.
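
The following Python sketch models this behavior: a backend is marked unhealthy only after a number of consecutive failed probes, is marked healthy again only after a number of consecutive successful probes, and requests are distributed round-robin across the backends that are currently healthy. The threshold parameters, their default values, and the class names are assumptions for illustration; see Configure health checks for the settings SLB actually provides.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True
    consecutive_failures: int = 0
    consecutive_successes: int = 0


class HealthCheckedPool:
    """Round-robin pool that only routes to backends whose health checks pass."""

    def __init__(self, names: list[str], unhealthy_threshold: int = 3, healthy_threshold: int = 3) -> None:
        self.backends = {name: Backend(name) for name in names}
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self._next = 0

    def record_probe(self, name: str, ok: bool) -> None:
        """Apply one health check result to a backend."""
        b = self.backends[name]
        if ok:
            b.consecutive_successes += 1
            b.consecutive_failures = 0
            if not b.healthy and b.consecutive_successes >= self.healthy_threshold:
                b.healthy = True  # recovered: resume sending requests to it
        else:
            b.consecutive_failures += 1
            b.consecutive_successes = 0
            if b.healthy and b.consecutive_failures >= self.unhealthy_threshold:
                b.healthy = False  # stop sending requests to this backend

    def pick(self) -> str:
        """Return the next healthy backend in round-robin order."""
        healthy = [b.name for b in self.backends.values() if b.healthy]
        if not healthy:
            raise LookupError("no healthy backend ECS instance")
        name = healthy[self._next % len(healthy)]
        self._next += 1
        return name
```

Requiring several consecutive results in each direction keeps a single dropped probe from removing a server, and keeps a server that is still flapping from receiving traffic too early.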

Best practice:

You must enable and correctly configure the health check function. For more information, see Configure health checks.