ENS: Service continuity

Last Updated: Jun 18, 2026

Edge Node Service (ENS) reduces access latency by deploying applications closer to end users. To maintain reliable service delivery, you also need to plan for high availability. This topic covers key capabilities for building resilient edge computing applications.

Shared responsibilities

Service continuity in the cloud is a shared responsibility between Alibaba Cloud and its customers.

Alibaba Cloud ensures the stability of ENS and guarantees that its availability meets the commitments in the service level agreement (SLA).

Customers are responsible for designing their service architecture so that failover can be performed when necessary to maintain service continuity.

We recommend that you implement the following solutions to achieve service continuity for your edge computing applications.

Best practices

Multi-instance disaster recovery

To achieve high availability, applications must handle heavy loads and avoid service interruptions caused by single points of failure (SPOFs). You can use Edge Load Balancer (ELB) to distribute traffic across multiple ENS instances. If an ENS instance fails, ELB redirects traffic to the remaining healthy instances to maintain service continuity. For more information, see What is ELB?
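The following minimal sketch illustrates the idea of distributing requests across instances and skipping a failed one. It is a toy simulation, not the ELB API: the class `RoundRobinBalancer`, its methods, and the instance names are hypothetical, and round-robin is only one possible scheduling policy.

```python
class RoundRobinBalancer:
    """Toy stand-in for a load balancer (hypothetical, not the ELB API)."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0  # round-robin cursor

    def mark_down(self, instance):
        # Take a failed instance out of rotation.
        self.healthy.discard(instance)

    def mark_up(self, instance):
        # Return a recovered instance to rotation.
        if instance in self.instances:
            self.healthy.add(instance)

    def pick(self):
        # Try each instance at most once, starting from the cursor.
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


balancer = RoundRobinBalancer(["ens-a", "ens-b", "ens-c"])
first_round = [balancer.pick() for _ in range(3)]

balancer.mark_down("ens-b")  # simulate an instance failure
after_failure = [balancer.pick() for _ in range(4)]
```

After `ens-b` is marked down, every later request goes to `ens-a` or `ens-c`, so the failure of one instance does not interrupt the service.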

Cross-region primary/secondary disaster recovery

Application primary/secondary switchover

When you deploy an application on an edge node, deploy backups on other edge nodes or in other Alibaba Cloud regions to guard against region-level faults.

If a region-level fault occurs, Global Traffic Manager (GTM) can automatically redirect the domain name to applications in other regions, ensuring service continuity. For more information, see What is Global Traffic Manager?

You can deploy the secondary service in other ENS regions or nearby Alibaba Cloud regions. Note that after traffic fails over to the secondary service, access latency for users may increase.

Data backup and restoration

Traffic redirection helps prevent service interruptions during region-level failures. However, data services in the failed region may become unavailable.

To keep your services running correctly after a failover, design a data synchronization solution that replicates data from the primary region to the secondary region during normal operation.

For example, you can perform the following operations:

  • Write data to the storage services in both the primary region and the secondary region. This keeps the data nearly identical across regions, but may increase write latency.

  • Write data to the primary region first, then asynchronously replicate it to the secondary region. This avoids additional write latency, but data in the secondary region may lag behind the primary region during a failover.
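The two options can be contrasted with a small in-memory sketch. All names here (`RegionStore`, `write_sync`, `AsyncReplicator`) are hypothetical stand-ins for your own storage services, and the sketch assumes writes never fail partway, which a real design would need to handle.

```python
from collections import deque


class RegionStore:
    """In-memory stand-in for a regional storage service (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def put(self, key, value):
        self.data[key] = value


def write_sync(primary, secondary, key, value):
    # Dual write: return only after both regions hold the value,
    # so the caller pays the latency of the slower region.
    primary.put(key, value)
    secondary.put(key, value)


class AsyncReplicator:
    """Acknowledge after the primary write; replicate to the secondary later."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = deque()  # writes the secondary has not seen yet

    def write(self, key, value):
        self.primary.put(key, value)
        self.pending.append((key, value))

    def flush(self):
        # Drain the backlog in write order; return how many entries were sent.
        sent = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary.put(key, value)
            sent += 1
        return sent


primary, secondary = RegionStore("primary"), RegionStore("secondary")
write_sync(primary, secondary, "order:1", "paid")

replicator = AsyncReplicator(primary, secondary)
replicator.write("order:2", "pending")
lag_before_flush = len(replicator.pending)  # replication lag, in entries
replicator.flush()
```

Anything still in `pending` when a failover happens is the data that the secondary region would be missing, which is the trade-off of the asynchronous option.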

You also need to design a restoration mechanism. After the primary region recovers, synchronize any new data recorded by the secondary service during the fault period back to the primary region. This prevents data loss when users are switched back to the primary service.
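As an illustration of such a restoration step, the sketch below copies records that the secondary service wrote during the fault period back into the primary store. The function `merge_back`, the `(value, updated_at)` record layout, and the last-writer-wins conflict policy are all assumptions made for this example; choose a conflict policy that fits your data.

```python
def merge_back(primary, secondary, fault_start):
    """Copy records written to the secondary at or after fault_start into the primary.

    Each store maps key -> (value, updated_at). A record is copied only when it
    is newer than the primary's copy (last-writer-wins, an assumed policy).
    Returns the sorted list of keys that were copied.
    """
    copied = []
    for key, (value, updated_at) in secondary.items():
        if updated_at < fault_start:
            continue  # written before the fault, so already replicated
        current = primary.get(key)
        if current is None or current[1] < updated_at:
            primary[key] = (value, updated_at)
            copied.append(key)
    return sorted(copied)


# Timestamps are plain integers here; the fault starts at t=200.
primary_data = {"cart:1": ("2 items", 100), "cart:2": ("1 item", 110)}
secondary_data = {
    "cart:1": ("3 items", 205),  # updated during the fault
    "cart:2": ("1 item", 110),   # unchanged
    "cart:3": ("5 items", 210),  # created during the fault
}
restored = merge_back(primary_data, secondary_data, fault_start=200)
```

Only the records touched during the fault period are copied, so the primary region catches up before users are switched back to it.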

Deployment architecture

Combine the preceding practices to maximize service availability. The following figure shows a deployment architecture that uses ELB, primary/secondary switchover, and data backup and restoration.

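[Figure: deployment architecture that uses ELB, primary/secondary switchover, and data backup and restoration]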

The architecture works as follows. Adapt each capability to your business requirements.

  • The primary service is deployed on an edge node in Switzerland. It uses multiple instances with ELB to prevent service interruptions caused by SPOFs.

  • The secondary service is deployed on a nearby edge node in Germany and also uses multiple instances with ELB. Alternatively, you can deploy the secondary service in a nearby Alibaba Cloud region. During normal operation, data is synchronized from the primary region to the secondary region to keep the data consistent when a failover occurs.

  • GTM is integrated into the domain name resolution system.

    • GTM periodically performs health checks on the primary service at the frequency you specify.

    • If the primary service is healthy, the domain name resolves to the primary service.

    • If the primary service becomes unhealthy and the number of failed health checks reaches the threshold you specify, GTM switches the domain name to the secondary service for automatic failover.

  • During a primary service failure, traffic is redirected to the secondary service and data is recorded in its storage service. After the fault is resolved, synchronize data from the secondary service back to the primary service.
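The health-check logic described above can be sketched as a small state machine. This is not the GTM API: the class `FailoverMonitor`, its parameters, and the host names are hypothetical, and the sketch assumes that the threshold counts consecutive failed checks and that a single healthy check switches resolution back to the primary service.

```python
class FailoverMonitor:
    """Toy model of threshold-based failover (hypothetical, not the GTM API)."""

    def __init__(self, primary, secondary, failure_threshold):
        self.primary = primary
        self.secondary = secondary
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active = primary  # target the domain name currently resolves to

    def record_check(self, primary_healthy):
        # Feed in one health-check result and return the active target.
        if primary_healthy:
            self.consecutive_failures = 0
            self.active = self.primary
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = self.secondary
        return self.active


monitor = FailoverMonitor(
    primary="primary.example.com",
    secondary="secondary.example.com",
    failure_threshold=3,
)
checks = [True, False, False, False, True]
targets = [monitor.record_check(result) for result in checks]
```

With a threshold of 3, the first two failed checks leave traffic on the primary service and the third triggers the switch. In practice you may want to delay switching back until data from the secondary service has been synchronized to the primary service.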