ApsaraMQ for MQTT: Cross-region disaster recovery for instances

Last Updated: Dec 17, 2025

You can use the disaster recovery feature of ApsaraMQ for MQTT to switch the traffic of an instance to a secondary instance in disaster scenarios such as network fluctuations and data center failures. This improves the resilience and availability of your ApsaraMQ for MQTT service. This topic describes the background, working mechanism, and permission compatibility of the feature, and explains how to switch traffic during disaster recovery.

Background information

Introduction

The disaster recovery feature lets you create ApsaraMQ for MQTT instances in multiple regions. When an exception occurs in one of the instances, you can switch its traffic to an instance in another region to keep your service available.

Positioning

The disaster recovery feature of ApsaraMQ for MQTT addresses only permission compatibility during the traffic switchover between instances. It does not synchronize metadata or permission data. You must create the same metadata and configure the same permissions on the secondary instance yourself. Messages on each instance are isolated. When you send messages to a device, the backend server checks the route table to determine the instance to which the messages are forwarded.

Limits

You can use the disaster recovery feature only if you use ApsaraMQ for MQTT Enterprise Platinum Edition instances. To use the feature, submit a ticket.

Working mechanism

The following example assumes that you want to switch traffic between two ApsaraMQ for MQTT instances that you purchased in the China (Hangzhou) and China (Shanghai) regions.

image
  1. The edge device connects to the cloud-based ApsaraMQ for MQTT instance using the endpoint for the corresponding region or the endpoint for global disaster recovery.

  2. The cloud-based ApsaraMQ for MQTT instance is compatible with the device parameters. By default, the cloud-based ApsaraMQ for MQTT instance is mapped to another one in the region where the edge device resides to provide services. Note that both instances must belong to the same Alibaba Cloud account.

  3. You must deploy the backend server in multiple regions and enable virtual private cloud (VPC) connectivity between the regions by using a service such as Cloud Enterprise Network (CEN). To maintain a global route table, the backend server must subscribe to messages on the ApsaraMQ for MQTT instances in all regions, especially device status notifications. The device status notifications tell you which ApsaraMQ for MQTT instance a device is connected to. When you send messages to the device, the backend server queries the route table to find the connected instance, then pushes the messages to the device through that instance.

  4. Each ApsaraMQ for MQTT instance provides an internal IP address that can be used to connect VPCs in different regions.

Note
  • During instance switchover, only the public endpoint, also known as the virtual IP address (VIP), is switched.

  • ApsaraMQ for MQTT does not impose a limit on the number of instances or regions that are involved in a disaster recovery task. You can use three or more instances based on your needs.
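The global route table described in step 3 can be sketched as follows. This is a minimal illustration, not the product's implementation: the class name, the `connect` and `disconnect` event types, and the event fields are assumptions made for the example, and the real device status notification format may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Route:
    instance_id: str  # instance that reported the device as online
    updated_at: int   # event timestamp in milliseconds


class GlobalRouteTable:
    """Maps each device (client ID) to the instance it is connected to.

    Hypothetical helper: fed by status events that the backend server
    receives from its subscriptions on every regional instance.
    """

    def __init__(self) -> None:
        self._routes: Dict[str, Route] = {}

    def on_status_event(self, instance_id: str, client_id: str,
                        event_type: str, timestamp_ms: int) -> None:
        current = self._routes.get(client_id)
        # Ignore events older than the state that is already recorded.
        if current is not None and timestamp_ms < current.updated_at:
            return
        if event_type == "connect":
            self._routes[client_id] = Route(instance_id, timestamp_ms)
        elif event_type == "disconnect":
            # A late disconnect from the old instance must not erase
            # a newer connection that was made to another instance.
            if current is not None and current.instance_id == instance_id:
                del self._routes[client_id]

    def lookup(self, client_id: str) -> Optional[str]:
        """Return the instance to push messages through, or None if offline."""
        route = self._routes.get(client_id)
        return route.instance_id if route else None
```

Before the backend server pushes a message, it calls `lookup` with the client ID of the device and publishes through the returned instance. A result of `None` means that no instance currently reports the device as online.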

Permission compatibility

User

The disaster recovery feature of ApsaraMQ for MQTT does not support data synchronization, including permission data. As a result, you must configure the same permissions on the involved instances, including permissions on topics and groups.
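Because permissions are not synchronized, it can help to compare the permission sets of the instances before you rely on a switchover. The following sketch assumes that you have already exported the permissions of each instance into a dictionary. The data layout and the function name are hypothetical, and how you export the permissions is outside the scope of the example.

```python
from typing import Dict, Set, Tuple

# (account, topic or group) -> set of granted actions; hypothetical layout
Permissions = Dict[Tuple[str, str], Set[str]]


def diff_permissions(primary: Permissions,
                     secondary: Permissions) -> Dict[str, Permissions]:
    """Report grants that exist on only one of the two instances."""
    missing: Permissions = {}
    extra: Permissions = {}
    for key in primary.keys() | secondary.keys():
        on_primary = primary.get(key, set())
        on_secondary = secondary.get(key, set())
        if on_primary - on_secondary:
            missing[key] = on_primary - on_secondary
        if on_secondary - on_primary:
            extra[key] = on_secondary - on_primary
    return {"missing_on_secondary": missing, "extra_on_secondary": extra}
```

If both result dictionaries are empty, the two exports match. Any entry under `missing_on_secondary` is a grant that devices would lose after their traffic is switched to the secondary instance.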

image

ApsaraMQ for MQTT

ApsaraMQ for MQTT provides services based on regions and instances. By default, parameters such as the instance ID, username, and password are configured on an ApsaraMQ for MQTT device. Because an ApsaraMQ for MQTT instance in another region cannot identify these parameters, compatibility handling is required during a domain name switchover. The following figure demonstrates how the instance ID is switched from the primary instance to the secondary instance. After the instance ID is switched from Instance A to Instance B, the device uses the context of Instance B to connect and receive messages.

image

Disaster recovery domain names

In normal cases, a device uses the instance domain name to access an instance in the nearest region. Each instance domain name corresponds to one instance. When an instance switchover is required, you can point the domain name of the primary instance to the VIP of a secondary instance without affecting the device. A global disaster recovery domain name is also provided. When an exception occurs in the primary instance, the system automatically removes the VIP of the primary instance from the global disaster recovery domain name. If you do not want to access an instance in the nearest region, use the global disaster recovery domain name.

image

Switchover methods

API switching

ApsaraMQ for MQTT provides the DisasterDowngrade and DisasterRecovery API operations for you to manually perform instance switchover. During the switchover, the domain name of the primary instance is repointed to the VIP of the secondary instance, and the VIP of the primary instance is removed from the global disaster recovery domain name.
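The effect of a switchover on name resolution can be modeled as follows. This is a toy model of the state change only, not a call to the DisasterDowngrade or DisasterRecovery operations, whose request parameters are not covered here. The class, the domain names, and the IP addresses are hypothetical, and the model assumes that recovery simply reverses the downgrade.

```python
from typing import Dict, List


class DisasterRecoveryDns:
    """Toy model of instance domain names and the global domain name."""

    def __init__(self, instance_vips: Dict[str, str]) -> None:
        # The VIP that each instance domain name points to by default.
        self._home_vips = dict(instance_vips)
        # The VIP that each instance domain name currently resolves to.
        self.instance_records = dict(instance_vips)
        # The global domain name resolves to the VIPs of healthy instances.
        self.global_record: List[str] = list(instance_vips.values())

    def downgrade(self, primary_domain: str, secondary_domain: str) -> None:
        """Model of a switchover from the primary to the secondary instance."""
        primary_vip = self._home_vips[primary_domain]
        self.instance_records[primary_domain] = self._home_vips[secondary_domain]
        if primary_vip in self.global_record:
            self.global_record.remove(primary_vip)

    def recover(self, primary_domain: str) -> None:
        """Model of switching back (assumed to reverse the downgrade)."""
        primary_vip = self._home_vips[primary_domain]
        self.instance_records[primary_domain] = primary_vip
        if primary_vip not in self.global_record:
            self.global_record.append(primary_vip)
```

In this model, a device that resolves either the primary instance domain name or the global domain name after a downgrade receives only the VIP of the secondary instance.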

Inspection switching

To improve the reliability of the disaster recovery architecture, ApsaraMQ for MQTT provides a global inspection mechanism that checks whether disaster recovery is required for an instance. If it is, the system automatically switches the traffic of the instance to another instance.

Device connection

A device uses the DNS-resolved VIP to access an instance, regardless of the instance status. Therefore, the device is not affected during the instance switchover. To force a device to switch instances, you must actively disconnect the device from the primary instance and reconnect it to the new VIP after the domain name is resolved again. Note that the domain name is resolved again only after the DNS cache expires.
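A device-side check for a forced switchover might look like the following sketch. It re-resolves the domain name and reports whether the IP address of the current connection is still among the results. The function names are hypothetical, and the check assumes that the device knows the IP address it is connected to. The resolver is injectable so that the logic can be exercised without network access.

```python
import socket
from typing import Callable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Resolve a hostname to its current set of IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) pair.
    return {info[4][0] for info in infos}


def should_reconnect(hostname: str, connected_ip: str,
                     resolver: Callable[[str], Set[str]] = resolve_ipv4) -> bool:
    """Return True if the domain name no longer resolves to the connected IP."""
    return connected_ip not in resolver(hostname)
```

When `should_reconnect` returns `True`, the device closes its current connection and connects again by using the domain name. The result depends on the DNS caches between the device and the authoritative server, so the check only changes after the cached record expires.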