ENS: High availability solutions for edge cloud

Last Updated: Apr 01, 2026

Edge Node Service (ENS) lets you run workloads close to end users, but proximity alone does not guarantee business continuity. This topic explains the HA capabilities available on the edge cloud and shows how to build a highly available application within a single node and across multiple nodes.

Choose a strategy

ENS supports two complementary high availability (HA) patterns. Choose based on your failure scope and recovery objectives:

| Pattern | Use this when... | How it works |
| --- | --- | --- |
| Local high availability | Instance-level failures or traffic spikes are your primary concern | Distributes traffic and data across multiple instances on the same edge node |
| Cross-node high availability | Single-node or single-region outages are a risk you must survive | Replicates services and data across two or more edge nodes |

Use both patterns together for comprehensive coverage.

Local high availability

Local HA protects your workload from instance-level failures and traffic spikes within a single edge node.

Disaster recovery for multiple compute instances

To withstand heavy loads and eliminate single points of failure (SPOFs), distribute your workload across multiple ENS instances and use Edge Load Balancer (ELB) to manage traffic.

Traffic distribution

Deploy your application on multiple ENS instances and configure ELB to distribute incoming traffic across them.

Health checks and failover

ELB continuously monitors backend ENS instances using health checks. The full failover lifecycle works as follows:

  1. ELB detects an unhealthy instance.

  2. ELB stops routing new requests to that instance and redirects traffic to the remaining healthy instances.

  3. Once the instance recovers, ELB automatically resumes sending requests to it (failback).
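The lifecycle above can be sketched as a small simulation. This is a minimal illustration of the general pattern (skip unhealthy backends, then resume when they recover), not ELB's actual implementation. The `Backend` and `RoundRobinBalancer` names, the instance names, and the round-robin policy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """A hypothetical backend instance with a health flag set by probes."""
    name: str
    healthy: bool = True


@dataclass
class RoundRobinBalancer:
    """Toy balancer: round-robin over backends, skipping unhealthy ones."""
    backends: list[Backend]
    _cursor: int = 0

    def route(self) -> str:
        # Try each backend at most once per request.
        for _ in range(len(self.backends)):
            backend = self.backends[self._cursor % len(self.backends)]
            self._cursor += 1
            if backend.healthy:
                return backend.name
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer([Backend("ens-a"), Backend("ens-b")])

lb.backends[0].healthy = False          # step 1: a probe marks ens-a unhealthy
print([lb.route() for _ in range(3)])   # step 2: traffic goes only to ens-b

lb.backends[0].healthy = True           # step 3: ens-a recovers (failback)
print([lb.route() for _ in range(2)])   # both instances receive traffic again
```

A production load balancer would typically require several consecutive failed or successful probes before changing an instance's state; that threshold logic is omitted here for brevity.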

For more information about ELB, see ELB.

High-availability virtual IP addresses (HAVIPs)

Some applications require a stable IP address for external services. For example, Keepalived and Heartbeat use ARP (Address Resolution Protocol) to announce IP addresses and keep them stable during failover.

HAVIPs enable the same pattern on the edge cloud. A HAVIP floats between instances on the same node, so the private IP address exposed to external services stays unchanged even when the active instance fails.
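As a rough mental model, the floating behavior resembles the priority-based election that tools such as Keepalived perform: the live instance with the highest priority holds the address. The sketch below simulates only that election. The `Instance` class, the priorities, and the instance names are hypothetical, and the code does not call any ENS or HAVIP API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """A hypothetical instance taking part in the election."""
    name: str
    priority: int
    alive: bool = True


def havip_holder(instances: list[Instance]) -> Optional[str]:
    """Return the live instance with the highest priority, or None if all are down."""
    candidates = [i for i in instances if i.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda i: i.priority).name


primary = Instance("ens-primary", priority=100)
backup = Instance("ens-backup", priority=90)

print(havip_holder([primary, backup]))  # the primary holds the address
primary.alive = False
print(havip_holder([primary, backup]))  # the address floats to the backup
```

Clients keep using the same IP address in both cases; only the instance answering for it changes.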

For more information, see What is an HAVIP?

Data disaster recovery

ENS provides three storage options — disks, NAS (Network Attached Storage), and Edge Object Storage (EOS) — each with built-in redundancy.

Multi-replica redundancy

Within a node, data on disks and NAS is stored in three replicas across the storage cluster, providing availability and reliability of at least 99.9999%. EOS uses erasure coding for data durability and availability.
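To see why extra replicas help, consider a simplified model in which each replica fails independently with the same probability; data is lost only if every replica fails. The failure probability used below is an illustrative assumption, not an ENS measurement, and this calculation is not how the availability figure above is derived. Real storage clusters also face correlated failures, which this model ignores.

```python
def all_replicas_lost(per_replica_failure: float, replicas: int) -> float:
    """Probability that every replica fails, assuming independent failures."""
    if not 0.0 <= per_replica_failure <= 1.0:
        raise ValueError("per_replica_failure must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return per_replica_failure ** replicas


# Hypothetical per-replica failure probability over some time window.
P_FAIL = 0.001

for n in (1, 2, 3):
    print(f"{n} replica(s): loss probability {all_replicas_lost(P_FAIL, n):.3e}")
```

Under these assumptions, each added replica multiplies the loss probability by the per-replica failure probability, which is why three copies are far more durable than one.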

Snapshots

Create snapshots to back up disk data at a point in time. Snapshots capture the current state of all data blocks on a disk. Use them to:

  • Restore data after accidental deletion or corruption

  • Build development and testing environments that mirror production

  • Create custom images for batch deployment across nodes

Cross-node high availability

A single edge node is still a potential point of failure. To survive node-level or region-level outages, deploy your workload on two or more edge nodes and configure failover between them.

The overall approach is:

  1. Plan your topology: deploy your business system on multiple nodes using active-active or active/standby mode.

  2. Connect the nodes: use Edge Network Acceleration (ENA) to build a secure internal network between nodes.

  3. Route traffic: use Alibaba Cloud DNS to direct public traffic and DNS for Multicloud Integration for internal service failover.

  4. Replicate data: replicate images and snapshots across nodes so you can restore on any node.

Cross-node network connectivity

ENA connects virtual private clouds (VPCs) across different regions and edge nodes over a high-speed, secure private network. It supports:

  • Accelerated connections between edge nodes

  • Accelerated connections between data centers

  • Accelerated connections between the internal network and Alibaba Cloud central cloud

  • Accelerated connections between different types of public clouds

For details, visit the ENA product page.

Cross-node service high availability

Choose a deployment mode

Before configuring traffic management, select the mode that fits your requirements:

| Mode | Use this when... | Trade-off |
| --- | --- | --- |
| Active-active | You want all nodes to handle traffic simultaneously and can tolerate partial capacity loss if one node fails | Maximum throughput; no idle capacity |
| Active/standby | You need a clear, deterministic recovery path and can accept the standby node being idle during normal operation | Simpler failover logic; standby capacity is reserved |
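The capacity trade-off can be made concrete with simple arithmetic. The sketch below assumes that load is spread evenly across nodes and that all nodes are the same size; the function names and the 900 requests-per-second peak are hypothetical values chosen for illustration.

```python
def surviving_capacity_fraction(nodes: int, failed: int = 1) -> float:
    """Fraction of total capacity left after `failed` equally sized nodes go down."""
    if nodes <= 0 or not 0 <= failed <= nodes:
        raise ValueError("need nodes > 0 and 0 <= failed <= nodes")
    return (nodes - failed) / nodes


def per_node_capacity_needed(peak_load: float, nodes: int,
                             tolerated_failures: int = 1) -> float:
    """Capacity each node needs so the survivors can still carry the peak load."""
    survivors = nodes - tolerated_failures
    if survivors <= 0:
        raise ValueError("at least one node must survive")
    return peak_load / survivors


PEAK = 900.0  # hypothetical peak load in requests per second

print(surviving_capacity_fraction(3))      # share of capacity left after one failure
print(per_node_capacity_needed(PEAK, 3))   # three active-active nodes
print(per_node_capacity_needed(PEAK, 2))   # two nodes: each must carry the full peak
```

With two nodes, each node must be sized for the full peak in either mode. Active-active differs mainly in that both nodes serve traffic during normal operation instead of one sitting idle.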

Public traffic failover with Global Traffic Manager

Global Traffic Manager (GTM), provided by Alibaba Cloud DNS, controls how public DNS resolves your domain across nodes. Configure the following in GTM:

  1. Domain name: the domain that your business exposes to external users.

  2. Address pools: group the elastic IP addresses (EIPs) from each node into separate address pools.

  3. Load balancing policy: set how GTM selects addresses within a pool — for example, by weight.

  4. Address working mode: choose Intelligently Returned or Always Online for each address.

  5. Health check: define the protocol and port GTM uses to probe each address.

  6. Access policy: configure intelligent DNS resolution, primary and standby address pool sets, and the switchover policy between them.

    • For active/standby disaster recovery: designate one pool as primary and another as standby.

    • For active-active load balancing: mark all pools as available address pools.
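The difference between the two access policies can be sketched as resolution logic over address pools. This is a simplified model of the behavior described above, not the GTM API or its exact switchover rules. The `Address` and `Pool` classes, the pool names, and the IP addresses (taken from documentation-only ranges) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """A hypothetical EIP with the result of its latest health check."""
    ip: str
    healthy: bool = True


@dataclass
class Pool:
    """A hypothetical address pool grouping the EIPs of one edge node."""
    name: str
    addresses: list[Address]

    def healthy_ips(self) -> list[str]:
        return [a.ip for a in self.addresses if a.healthy]


def resolve_active_standby(primary: Pool, standby: Pool) -> list[str]:
    """Answer from the primary pool; use the standby only if the primary has no healthy address."""
    return primary.healthy_ips() or standby.healthy_ips()


def resolve_active_active(pools: list[Pool]) -> list[str]:
    """Answer with every healthy address from every pool."""
    return [ip for pool in pools for ip in pool.healthy_ips()]


node_a = Pool("node-a", [Address("203.0.113.10"), Address("203.0.113.11")])
node_b = Pool("node-b", [Address("198.51.100.20")])

print(resolve_active_standby(node_a, node_b))  # primary addresses only
print(resolve_active_active([node_a, node_b]))  # addresses from both nodes

for address in node_a.addresses:               # simulate a node-a outage
    address.healthy = False
print(resolve_active_standby(node_a, node_b))  # switches to the standby pool
```

In this model, switchover happens only when the whole primary pool is unhealthy. A real policy may switch earlier, for example when the number of healthy addresses drops below a threshold.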

GTM includes two configuration templates, Primary/Secondary Disaster Recovery and Multi-active Load Balancing, to accelerate setup. For more information, see Global Traffic Manager 3.0.

Internal service failover with DNS for Multicloud Integration

For internal services accessed by domain name, deploy DNS for Multicloud Integration in private mode. It provides all-in-one intelligent resolution services for internal networks, including:

  • Internal DNS high availability: primary/secondary deployment across nodes. If the primary DNS fails, the cross-node secondary DNS takes over automatically.

  • Internal service routing: configure DNS resolution records and policies to implement active-active or active/standby routing for internal services.

  • Intelligent resolution for both internal and external domain names

  • Disaster recovery and scheduling of primary/secondary data centers

  • Unified management of Alibaba Cloud DNS

  • A replacement for common open-source DNS services

Cross-node data disaster recovery

Images

Edge cloud system images are stored in Alibaba Cloud central cloud and inherit its high availability and reliability. To make your own environment available on every edge node, create a custom image from an ENS instance. When you deploy the same service on another node, pull that image from the central cloud to create a new ENS instance.

For more information, see Images.

Snapshots

In addition to local disk backups, ENS supports cross-node snapshot replication. Replicate snapshots from one node to another so that if a node fails, you can restore your business and data on a different node.
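When a node fails, the practical question is which replicated snapshot to restore from. The sketch below picks the most recent copy that lives on a surviving node. The `SnapshotCopy` record, the snapshot IDs, and the node names are hypothetical bookkeeping for the example; the code does not call any ENS API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SnapshotCopy:
    """A hypothetical record of one snapshot copy stored on one edge node."""
    snapshot_id: str
    node: str
    created_at: datetime


def pick_restore_source(copies: list[SnapshotCopy],
                        failed_nodes: set[str]) -> Optional[SnapshotCopy]:
    """Return the newest snapshot copy on a node that is still available, or None."""
    usable = [c for c in copies if c.node not in failed_nodes]
    return max(usable, key=lambda c: c.created_at, default=None)


copies = [
    SnapshotCopy("snap-001", "node-a", datetime(2026, 3, 30, 2, 0)),
    SnapshotCopy("snap-001", "node-b", datetime(2026, 3, 30, 2, 0)),
    SnapshotCopy("snap-002", "node-a", datetime(2026, 3, 31, 2, 0)),
]

# node-a is down, so the newest usable copy is the replica on node-b.
print(pick_restore_source(copies, failed_nodes={"node-a"}))
```

In this example the latest snapshot was never replicated off node-a, so the restore point falls back a day. Replicating every snapshot you depend on keeps that gap small.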

Databases

ENS does not provide native database services. Deploy your own database and protect it using one of the following approaches:

  • Use the database engine's built-in replication or disaster recovery mechanism to synchronize data across nodes.

  • Use ENS snapshots to back up database data to other edge nodes.

Shared responsibilities

High availability on the edge cloud is a shared responsibility:

  • Alibaba Cloud maintains the stability of ENS and ensures availability meets the service level agreement (SLA).

  • You design your application architecture to support failover and ensure business continuity when failures occur.

Implement the solutions described in this topic to meet your availability targets.