Edge Node Service (ENS) lets you run workloads close to end users, but proximity alone does not guarantee business continuity. This topic explains the high availability (HA) capabilities available on the edge cloud and shows how to build a highly available application within a single node and across multiple nodes.

Choose a strategy

ENS supports two complementary high availability (HA) patterns. Choose based on your failure scope and recovery objectives:

Pattern	Use this when...	How it works
Local high availability	Instance-level failures or traffic spikes are your primary concern	Distributes traffic and data across multiple instances on the same edge node
Cross-node high availability	Single-node or single-region outages are a risk you must survive	Replicates services and data across two or more edge nodes

Combine both patterns to protect against failures at the instance level and at the node level.

Local high availability

Local HA protects your workload from instance-level failures and traffic spikes within a single edge node.

Disaster recovery for multiple compute instances

To withstand heavy loads and eliminate single points of failure (SPOFs), distribute your workload across multiple ENS instances and use Edge Load Balancer (ELB) to manage traffic.

Traffic distribution

Deploy your business on multiple ENS instances and configure ELB to distribute incoming traffic across them.

Health checks and failover

ELB continuously monitors backend ENS instances using health checks. The full failover lifecycle works as follows:

ELB detects an unhealthy instance.
ELB stops routing new requests to that instance and redirects traffic to the remaining healthy instances.
Once the instance recovers, ELB automatically resumes sending requests to it (failback).

For more information about ELB, see ELB.
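The failover lifecycle above can be modeled as a small state machine. The following Python sketch is an illustrative model only, not ELB's implementation. The `BalancerModel` and `Backend` classes, the round-robin selection, and the thresholds (three consecutive failed or successful probes) are assumptions chosen for the example. See the ELB documentation for the actual health check settings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Backend:
    """A backend instance tracked by the hypothetical balancer model."""
    name: str
    healthy: bool = True
    consecutive_failures: int = 0
    consecutive_successes: int = 0


@dataclass
class BalancerModel:
    """Illustrative health-check-driven failover and failback (not ELB code).

    The thresholds are assumed values for this sketch.
    """
    backends: List[Backend]
    unhealthy_threshold: int = 3
    healthy_threshold: int = 3
    _next: int = field(default=0, init=False)

    def record_probe(self, name: str, ok: bool) -> None:
        """Apply one health check result to the named backend."""
        backend = next(b for b in self.backends if b.name == name)
        if ok:
            backend.consecutive_successes += 1
            backend.consecutive_failures = 0
            # Failback: a recovered instance rejoins the rotation.
            if not backend.healthy and backend.consecutive_successes >= self.healthy_threshold:
                backend.healthy = True
        else:
            backend.consecutive_failures += 1
            backend.consecutive_successes = 0
            # Failover: stop sending new requests to a failing instance.
            if backend.healthy and backend.consecutive_failures >= self.unhealthy_threshold:
                backend.healthy = False

    def pick(self) -> str:
        """Round-robin over the backends currently marked healthy."""
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends")
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return choice.name
```

Consider two hypothetical instances, `ens-a` and `ens-b`. After three failed probes for `ens-a`, every call to `pick()` returns `ens-b`. After three successful probes, `ens-a` rejoins the rotation.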

High-availability virtual IP addresses (HAVIPs)

Some applications must expose a fixed IP address to the services that call them. In traditional deployments, tools such as Keepalived and Heartbeat keep this address stable during failover. They move a virtual IP address to the healthy server and announce the change with ARP (Address Resolution Protocol).

HAVIPs bring the same pattern to the edge cloud. An HAVIP floats between instances on the same node, so the private IP address that external services use stays the same even when the active instance fails.

For more information, see What is an HAVIP?
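A floating IP address can be thought of as the result of an election. The address stays fixed while ownership moves to the highest-priority healthy instance, which is the idea behind Keepalived's VRRP priorities. The Python below is a conceptual model only. `FloatingIP`, `elect_owner`, the priority values, and the private address 192.168.0.100 are hypothetical. In practice, the election is typically performed by software such as Keepalived running on the instances.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def elect_owner(priorities: Dict[str, int], healthy: Dict[str, bool]) -> Optional[str]:
    """Pick the healthy instance with the highest priority (ties broken by name)."""
    candidates = [name for name, up in healthy.items() if up and name in priorities]
    if not candidates:
        return None
    return max(candidates, key=lambda name: (priorities[name], name))


@dataclass
class FloatingIP:
    """A private IP address whose owner can change while the address itself cannot."""
    address: str
    owner: Optional[str] = None

    def reconcile(self, priorities: Dict[str, int], healthy: Dict[str, bool]) -> Optional[str]:
        """Move the address to whichever instance currently wins the election."""
        self.owner = elect_owner(priorities, healthy)
        return self.owner
```

This model is preemptive: a recovered higher-priority instance reclaims the address on the next `reconcile` call. Keepalived can also be configured not to preempt, using its `nopreempt` option.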

Data disaster recovery

ENS provides three storage options — disks, NAS (Network Attached Storage), and Edge Object Storage (EOS) — each with built-in redundancy.

Multi-replica redundancy

Within a node, data on disks and NAS is stored in three replicas across the storage cluster, providing availability and reliability of at least 99.9999%. EOS uses erasure coding for data durability and availability.

Snapshots

Create snapshots to back up disk data at a point in time. Snapshots capture the current state of all data blocks on a disk. Use them to:

Restore data after accidental deletion or corruption
Build development and testing environments that mirror production
Create custom images for batch deployment across nodes

Cross-node high availability

A single edge node is still a potential point of failure. To survive node-level or region-level outages, deploy your workload on two or more edge nodes and configure failover between them.

The overall approach is:

Plan your topology: deploy your business system on multiple nodes using active-active or active/standby mode.
Connect the nodes: use Edge Network Acceleration (ENA) to build a secure internal network between nodes.
Route traffic: use Global Traffic Manager (GTM) in Alibaba Cloud DNS to route public traffic, and DNS for Multicloud Integration to fail over internal services.
Replicate data: replicate images and snapshots across nodes so you can restore on any node.

Cross-node network connectivity

ENA connects virtual private clouds (VPCs) across different regions and edge nodes over a high-speed, secure private network. It supports:

Accelerated connections between edge nodes
Accelerated connections between data centers
Accelerated connections between the internal network and the Alibaba Cloud central cloud
Accelerated connections between different public clouds

For details, visit the ENA product page.

Cross-node service high availability

Choose a deployment mode

Before configuring traffic management, select the mode that fits your requirements:

Mode	Use this when...	Trade-off
Active-active	You want all nodes to handle traffic simultaneously and can tolerate partial capacity loss if one node fails	Maximum throughput; no idle capacity
Active/standby	You need a clear, deterministic recovery path and can accept the standby node being idle during normal operation	Simpler failover logic; standby capacity is reserved
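The trade-off in the table comes down to capacity arithmetic. Suppose load is spread evenly across N active nodes and the deployment must survive the loss of k of them. Each node can then run at no more than (N - k) / N of its capacity during normal operation, so two active nodes must each stay below 50% utilization. The Python helpers below are a minimal sketch of this calculation under the even-distribution assumption. The function names and example load figures are hypothetical.

```python
def max_safe_utilization(nodes: int, tolerated_failures: int = 1) -> float:
    """Highest steady-state per-node utilization that still lets the surviving
    nodes absorb the failed nodes' load, assuming load is split evenly."""
    if nodes <= tolerated_failures:
        raise ValueError("need more nodes than tolerated failures")
    return (nodes - tolerated_failures) / nodes


def required_capacity_per_node(total_peak_load: float, nodes: int,
                               tolerated_failures: int = 1) -> float:
    """Capacity each node needs so that the surviving nodes can carry the full peak load."""
    if nodes <= tolerated_failures:
        raise ValueError("need more nodes than tolerated failures")
    return total_peak_load / (nodes - tolerated_failures)
```

For example, `required_capacity_per_node(1200, 3)` returns 600.0. Three active nodes, each sized for 600 requests per second, can therefore absorb a 1,200 requests-per-second peak after one node fails. With one active and one standby node (`nodes=2`), the result equals the full peak load: each node must be able to carry all traffic alone.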

Public traffic failover with Global Traffic Manager

Global Traffic Manager (GTM), provided by Alibaba Cloud DNS, controls how public DNS resolves your domain across nodes. Configure the following in GTM:

Domain name: the domain that your business exposes to external users.
Address pools: group the elastic IP addresses (EIPs) from each node into separate address pools.
Load balancing policy: set how GTM selects addresses within a pool — for example, by weight.
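Address working mode: choose Intelligently Returned or Always Online for each address. An Intelligently Returned address is returned only while it passes health checks. An Always Online address is returned regardless of health check results.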
Health check: define the protocol and port GTM uses to probe each address.
Access policy: configure intelligent DNS resolution, primary and standby address pool sets, and the switchover policy between them.
- For active/standby disaster recovery: designate one pool as primary and another as standby.
- For active-active load balancing: mark all pools as available address pools.

GTM provides two configuration templates, Primary/Secondary Disaster Recovery and Multi-active Load Balancing, to speed up setup. For more information, see Global Traffic Manager 3.0.
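The following Python sketch models a simplified resolver decision to show how the access policy and address working modes interact. It is not GTM's algorithm. Several parts are assumptions: the `Address` and `AddressPool` classes, the rule that traffic switches to the standby pool set only when no address in the primary set is available, and the example IP addresses, which come from documentation ranges. The `always_online` flag stands in for the Always Online working mode.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Address:
    ip: str
    healthy: bool = True         # latest health check result
    always_online: bool = False  # True models Always Online; False models Intelligently Returned
    weight: int = 1

    def available(self) -> bool:
        """Always Online addresses ignore health checks; others must be healthy."""
        return self.always_online or self.healthy


@dataclass
class AddressPool:
    name: str
    addresses: List[Address]

    def available_addresses(self) -> List[Address]:
        return [a for a in self.addresses if a.available()]


def answer_set(primary: List[AddressPool], standby: List[AddressPool]) -> List[Address]:
    """Return available addresses from the primary set, else from the standby set.

    Active/standby: one pool in each set. Active-active: all pools in `primary`.
    """
    for pool_set in (primary, standby):
        answers = [a for pool in pool_set for a in pool.available_addresses()]
        if answers:
            return answers
    return []


def weighted_pick(addresses: List[Address], rng: random.Random) -> Optional[str]:
    """Choose one address with probability proportional to its weight."""
    candidates = [a for a in addresses if a.weight > 0]
    if not candidates:
        return None
    return rng.choices(candidates, weights=[a.weight for a in candidates], k=1)[0].ip
```

In an active/standby setup with hypothetical pools `node_a` and `node_b`, marking the primary address unhealthy makes `answer_set([node_a], [node_b])` return the standby node's address. Setting `always_online=True` on the primary address keeps it in the answer even while its health check fails.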

Internal service failover with DNS for Multicloud Integration

For internal services accessed by domain name, deploy DNS for Multicloud Integration in private mode. It provides all-in-one intelligent resolution services for internal networks, including:

Internal DNS high availability: primary/secondary deployment across nodes. If the primary DNS fails, the cross-node secondary DNS takes over automatically.
Internal service routing: configure DNS resolution records and policies to implement active-active or active/standby routing for internal services.
Intelligent resolution of both internal and external domain names
Disaster recovery and scheduling for primary and secondary data centers
Unified management of Alibaba Cloud DNS
Replacement for common open-source DNS services
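On the client side, primary/secondary DNS failover usually depends on the stub resolver trying each configured name server in order. glibc does this with the `nameserver` entries in /etc/resolv.conf. The following Python sketch models that ordering with plain functions instead of real DNS queries. The resolver functions, the domain name `db.internal.example`, and the address 10.0.1.5 are hypothetical.

```python
from typing import Callable, List, Sequence

# A resolver maps a domain name to a list of IP addresses, or raises on failure.
Resolver = Callable[[str], List[str]]


def resolve_with_failover(name: str, resolvers: Sequence[Resolver]) -> List[str]:
    """Query resolvers in order (primary first) and return the first successful answer."""
    errors = []
    for resolver in resolvers:
        try:
            return resolver(name)
        except (TimeoutError, ConnectionError) as exc:
            # Treat timeouts and connection failures as "this DNS server is down".
            # Other exceptions (for example, a missing record) propagate, because
            # a negative answer is not a server failure.
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} resolvers failed for {name!r}")
```

Putting the secondary server on a different node keeps name resolution working when the primary node fails. This matches the cross-node primary/secondary deployment described above.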

Cross-node data disaster recovery

Images

Edge cloud system images are stored in the Alibaba Cloud central cloud and inherit its high availability and reliability. To make a custom image available to all edge nodes, create it from an ENS instance. When you deploy the same service on another node, pull the image from the central cloud and use it to create a new ENS instance.

For more information, see Images.

Snapshots

In addition to local disk backups, ENS supports cross-node snapshot replication. Replicate snapshots from one node to another so that if a node fails, you can restore your business and data on a different node.

Databases

ENS does not provide native database services. Deploy your own database and protect it using one of the following approaches:

Use the database engine's built-in replication or disaster recovery mechanism to synchronize data across nodes.
Use ENS snapshots to back up database data to other edge nodes.

Shared responsibilities

High availability on the edge cloud is a shared responsibility:

Alibaba Cloud maintains the stability of ENS and ensures that its availability meets the service level agreement (SLA).
You design your application architecture to support failover and ensure business continuity when failures occur.

Implement the solutions described in this topic to meet your availability targets.