Edge Node Service (ENS) lets you run workloads close to end users, but proximity alone does not guarantee business continuity. This topic explains the HA capabilities available on the edge cloud and shows how to build a highly available application within a single node and across multiple nodes.
Choose a strategy
ENS supports two complementary high availability (HA) patterns. Choose based on your failure scope and recovery objectives:
| Pattern | Use this when... | How it works |
|---|---|---|
| Local high availability | Instance-level failures or traffic spikes are your primary concern | Distributes traffic and data across multiple instances on the same edge node |
| Cross-node high availability | Single-node or single-region outages are a risk you must survive | Replicates services and data across two or more edge nodes |
Use both patterns together for comprehensive coverage.
Local high availability
Local HA protects your workload from instance-level failures and traffic spikes within a single edge node.
Disaster recovery for multiple compute instances
To withstand heavy loads and eliminate single points of failure (SPOFs), distribute your workload across multiple ENS instances and use Edge Load Balancer (ELB) to manage traffic.
Traffic distribution
Deploy your application on multiple ENS instances and configure ELB to distribute incoming traffic across them.
Health checks and failover
ELB continuously monitors backend ENS instances using health checks. The full failover lifecycle works as follows:
1. ELB detects an unhealthy instance.
2. ELB stops routing new requests to that instance and redirects traffic to the remaining healthy instances.
3. Once the instance recovers, ELB automatically resumes sending requests to it (failback).
For more information about ELB, see ELB.
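The detect, redirect, and failback lifecycle can be sketched with a toy round-robin balancer. This is a minimal illustration, not the ELB implementation: the `Backend` and `RoundRobinBalancer` classes and the instance names are hypothetical, and health status is set by hand where a real load balancer would probe each backend periodically.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Toy model of a load balancer that skips unhealthy backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0  # round-robin cursor

    def mark(self, name, healthy):
        # Stand-in for the result of a periodic health check.
        for backend in self.backends:
            if backend.name == name:
                backend.healthy = healthy

    def route(self):
        # Try each backend at most once, starting from the cursor.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend.healthy:
                return backend.name
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer([Backend("ens-a"), Backend("ens-b"), Backend("ens-c")])

normal = [lb.route() for _ in range(3)]          # all three instances serve
lb.mark("ens-b", False)                          # health check fails
during_failure = [lb.route() for _ in range(4)]  # ens-b is skipped
lb.mark("ens-b", True)                           # instance recovers
after_recovery = [lb.route() for _ in range(3)]  # ens-b serves again (failback)
```

Because requests are only steered away from an instance once it is marked unhealthy, the health check interval and failure threshold determine how long failed requests can occur before traffic is redirected.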
High-availability virtual IP addresses (HAVIPs)
Some applications require a stable IP address for external services. For example, Keepalived and Heartbeat use ARP (Address Resolution Protocol) to announce IP addresses and keep them stable during failover.
HAVIPs enable the same pattern on the edge cloud. A HAVIP floats between instances on the same node, so the private IP address exposed to external services stays unchanged even when the active instance fails.
For more information, see What is an HAVIP?
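The floating behavior can be modeled in a few lines. The sketch below assumes a priority-based election similar to the one Keepalived performs with VRRP, where the highest-priority live instance holds the address. The `Havip` class, the instance names, the priorities, and the IP address are all hypothetical and do not correspond to an ENS API.

```python
class Havip:
    """Toy model of a virtual IP that floats between instances on one node."""

    def __init__(self, address, priorities):
        self.address = address              # the stable private IP clients use
        self.priorities = dict(priorities)  # instance name -> election priority
        self.alive = set(self.priorities)   # instances currently able to hold the IP

    def fail(self, instance):
        self.alive.discard(instance)

    def recover(self, instance):
        if instance in self.priorities:
            self.alive.add(instance)

    def holder(self):
        # The highest-priority live instance announces the address.
        if not self.alive:
            return None
        return max(self.alive, key=lambda name: self.priorities[name])


vip = Havip("10.0.0.100", {"ens-primary": 100, "ens-backup": 90})

before = (vip.address, vip.holder())   # primary holds the address
vip.fail("ens-primary")
during = (vip.address, vip.holder())   # backup takes over, same address
vip.recover("ens-primary")
after = (vip.address, vip.holder())    # primary preempts on recovery
```

In this model the recovered primary takes the address back immediately. Whether preemption is desirable depends on the application, and tools such as Keepalived let you disable it.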
Data disaster recovery
ENS provides three storage options — disks, NAS (Network Attached Storage), and Edge Object Storage (EOS) — each with built-in redundancy.
Multi-replica redundancy
Within a node, data on disks and NAS is stored in three replicas across the storage cluster, providing availability and reliability of at least 99.9999%. EOS uses erasure coding for data durability and availability.
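To put the 99.9999% figure in perspective, the following sketch converts an availability percentage into a downtime budget. It assumes the figure is read as time-based availability over a 365-day year, which is an interpretation and not something this topic specifies. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def max_downtime_seconds(availability_percent, period_seconds=SECONDS_PER_YEAR):
    """Return the downtime allowed in a period at the given availability."""
    return period_seconds * (1 - availability_percent / 100)


six_nines = max_downtime_seconds(99.9999)  # roughly 31.5 seconds per year
three_nines = max_downtime_seconds(99.9)   # roughly 8.76 hours per year
```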
Snapshots
Create snapshots to back up disk data at a point in time. Snapshots capture the current state of all data blocks on a disk. Use them to:
- Restore data after accidental deletion or corruption
- Build development and testing environments that mirror production
- Create custom images for batch deployment across nodes
Cross-node high availability
A single edge node is still a potential point of failure. To survive node-level or region-level outages, deploy your workload on two or more edge nodes and configure failover between them.
The overall approach is:
1. Plan your topology: deploy your application on multiple nodes in active-active or active/standby mode.
2. Connect the nodes: use Edge Network Acceleration (ENA) to build a secure internal network between nodes.
3. Route traffic: use Alibaba Cloud DNS to direct public traffic and DNS for Multicloud Integration for internal service failover.
4. Replicate data: replicate images and snapshots across nodes so you can restore on any node.
Cross-node network connectivity
ENA connects virtual private clouds (VPCs) across different regions and edge nodes over a high-speed, secure private network. It supports:
- Accelerated connections between edge nodes
- Accelerated connections between data centers
- Accelerated connections between the internal network and Alibaba Cloud central cloud
- Accelerated connections between different types of public clouds
For details, visit the ENA product page.
Cross-node service high availability
Choose a deployment mode
Before configuring traffic management, select the mode that fits your requirements:
| Mode | Use this when... | Trade-off |
|---|---|---|
| Active-active | You want all nodes to handle traffic simultaneously and can tolerate partial capacity loss if one node fails | Maximum throughput; no idle capacity |
| Active/standby | You need a clear, deterministic recovery path and can accept the standby node being idle during normal operation | Simpler failover logic; standby capacity is reserved |
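The trade-off between the two modes can be made concrete with a small capacity calculation. This is a simplified sketch: the function name, node names, and per-node capacities are hypothetical, and it assumes that in active/standby mode exactly one node serves traffic at a time.

```python
def serving_capacity(nodes, mode, failed=()):
    """Return the total requests/s the deployment can serve.

    nodes: dict of node name -> capacity, in priority order (first = primary).
    failed: names of nodes that are currently down.
    """
    healthy = [cap for name, cap in nodes.items() if name not in failed]
    if mode == "active-active":
        # Every healthy node serves traffic.
        return sum(healthy)
    if mode == "active/standby":
        # Only the highest-priority healthy node serves traffic.
        return healthy[0] if healthy else 0
    raise ValueError(f"unknown mode: {mode}")


nodes = {"node-1": 1000, "node-2": 1000}

aa_normal = serving_capacity(nodes, "active-active")                     # 2000
aa_failed = serving_capacity(nodes, "active-active", failed={"node-1"})  # 1000
as_normal = serving_capacity(nodes, "active/standby")                    # 1000
as_failed = serving_capacity(nodes, "active/standby", failed={"node-1"}) # 1000
```

With two equally sized nodes, active-active doubles normal throughput but loses half of it when a node fails, while active/standby serves the same capacity before and after failover. If you choose active-active, size each node so that the surviving nodes can carry the peak load you must sustain during an outage.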
Public traffic failover with Global Traffic Manager
Global Traffic Manager (GTM), provided by Alibaba Cloud DNS, controls how public DNS resolves your domain across nodes. Configure the following in GTM:
- Domain name: the domain that your business exposes to external users.
- Address pools: group the elastic IP addresses (EIPs) from each node into separate address pools.
- Load balancing policy: set how GTM selects addresses within a pool, for example by weight.
- Address working mode: choose Intelligently Returned or Always Online for each address.
- Health check: define the protocol and port GTM uses to probe each address.
- Access policy: configure intelligent DNS resolution, primary and standby address pool sets, and the switchover policy between them.
  - For active/standby disaster recovery: designate one pool as primary and another as standby.
  - For active-active load balancing: mark all pools as available address pools.
GTM includes two configuration templates, Primary/Secondary Disaster Recovery and Multi-active Load Balancing, to accelerate setup. For more information, see Global Traffic Manager 3.0.
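The primary/standby switchover logic can be sketched as follows. This is a conceptual model, not the GTM API: the `Address` class and `resolve` function are hypothetical, the IP addresses come from documentation ranges, and the sketch assumes that an Always Online address is returned regardless of its health check result while an Intelligently Returned address is returned only when healthy.

```python
from dataclasses import dataclass


@dataclass
class Address:
    ip: str
    healthy: bool = True         # latest health check result
    always_online: bool = False  # assumption: ignores health checks when True

    def available(self):
        return self.always_online or self.healthy


def resolve(primary_pool, standby_pool):
    """Return the IPs a DNS query would receive in active/standby mode."""
    for pool in (primary_pool, standby_pool):
        ips = [addr.ip for addr in pool if addr.available()]
        if ips:
            return ips  # first pool with any available address wins
    return []


primary = [Address("203.0.113.10"), Address("203.0.113.11")]  # EIPs on node 1
standby = [Address("198.51.100.10")]                          # EIP on node 2

normal = resolve(primary, standby)       # both primary addresses

for addr in primary:
    addr.healthy = False                 # node 1 goes down
failed_over = resolve(primary, standby)  # standby pool takes over

primary[0].healthy = True                # one primary address recovers
failed_back = resolve(primary, standby)  # traffic returns to the primary pool
```

DNS-based failover takes effect only as cached records expire, so the record TTL bounds how quickly clients follow a switchover.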
Internal service failover with DNS for Multicloud Integration
For internal services accessed by domain name, deploy DNS for Multicloud Integration in private mode. It provides all-in-one intelligent resolution services for internal networks, including:
- Internal DNS high availability: primary/secondary deployment across nodes. If the primary DNS fails, the cross-node secondary DNS takes over automatically.
- Internal service routing: configure DNS resolution records and policies to implement active-active or active/standby routing for internal services.
- Intelligent resolution for both internal and external domain names
- Disaster recovery and scheduling between primary and secondary data centers
- Unified management of Alibaba Cloud DNS
- A replacement for common open-source DNS services
Cross-node data disaster recovery
Images
Edge cloud system images are stored in Alibaba Cloud central cloud and inherit its high availability and reliability. To make your own environment available across all edge nodes, create a custom image from an ENS instance. When you deploy the same service on another node, pull the image from the central cloud to create a new ENS instance.
For more information, see Images.
Snapshots
In addition to local disk backups, ENS supports cross-node snapshot replication. Replicate snapshots from one node to another so that if a node fails, you can restore your services and data on a different node.
Databases
ENS does not provide native database services. Deploy your own database and protect it using one of the following approaches:
- Use the database engine's built-in replication or disaster recovery mechanism to synchronize data across nodes.
- Use ENS snapshots to back up database data to other edge nodes.
Shared responsibilities
High availability on the edge cloud is a shared responsibility:
- Alibaba Cloud maintains the stability of ENS and ensures availability meets the service level agreement (SLA).
- You design your application architecture to support failover and ensure business continuity when failures occur.
Implement the solutions described in this topic to meet your availability targets.