Compared with open source Apache RocketMQ, Alibaba Cloud ApsaraMQ for RocketMQ is more stable and secure and provides a more comprehensive O&M system. You can migrate Apache RocketMQ clusters to ApsaraMQ for RocketMQ instances to improve your service experience. This topic describes the migration solution provided by ApsaraMQ for RocketMQ and the working mechanism of the migration solution.

Comparison between Apache RocketMQ and ApsaraMQ for RocketMQ

Compared with Apache RocketMQ, ApsaraMQ for RocketMQ has advantages in terms of technical architecture, scalability, O&M, and enterprise-level capabilities. The following table provides a comparison between Apache RocketMQ and ApsaraMQ for RocketMQ.

Storage scalability
  • Self-managed Apache RocketMQ cluster: No resource pool is provided. A coupled storage-computing architecture is used.
  • ApsaraMQ for RocketMQ 5.x instance: Large-scale resource pools are provided based on the cloud infrastructure of Alibaba Cloud. Storage is decoupled from computing.

API and SDK access
  • Self-managed Apache RocketMQ cluster: Apache RocketMQ SDKs are supported.
  • ApsaraMQ for RocketMQ 5.x instance:
    • Apache RocketMQ SDKs are supported.
    • Alibaba Cloud ONS SDKs are supported.

Technical architecture
  • Self-managed Apache RocketMQ cluster:
    • The technology is implemented by using local disks.
    • Storage cannot be scaled. If the storage space is insufficient, the system clears data in advance.
    • Multiple replicas lead to high storage costs.
  • ApsaraMQ for RocketMQ 5.x instance:
    • The technology is implemented by using serverless cloud storage.
    • Storage space can be used based on your business requirements.
    • The storage fee is calculated based on the pay-as-you-go billing method. If an ApsaraMQ for RocketMQ instance and an Apache RocketMQ cluster have the same number of replicas, the costs for the ApsaraMQ for RocketMQ instance are only a third of the costs for the Apache RocketMQ cluster.

Computing scalability
  • Self-managed Apache RocketMQ cluster:
    • The number of machines is planned based on cluster usage.
    • You must reserve computing resources in advance. Computing resources are difficult to scale in.
    • Because scaling is slow, the cluster cannot scale in time to handle bursty traffic.
  • ApsaraMQ for RocketMQ 5.x instance:
    • Computing resources are scalable based on the resource pools that are provided by Alibaba Cloud.
    • Planned scaling: You can upgrade or downgrade the edition of an instance based on your business requirements. The upgrade or downgrade takes effect in a few minutes.
    • Unplanned scaling: The elastic transactions per second (TPS) feature is provided. The feature allows you to use a specific amount of additional TPS when the specification limit is exceeded. This helps you save costs because you do not need to reserve a large specification to handle infrequent bursty traffic.

O&M complexity
  • Self-managed Apache RocketMQ cluster:
    • O&M is performed by running commands, which leads to higher costs and risks.
    • No observability and monitoring systems are provided.
  • ApsaraMQ for RocketMQ 5.x instance:
    • Fully managed Platform as a Service (PaaS) allows you to deploy, operate, and manage resources without managing machines.
    • The service is ready for immediate use. The service supports features such as dashboard diagnostics, tracing, monitoring, and alerting.

Stability guarantee
  • Self-managed Apache RocketMQ cluster: O&M is performed by users and requires the assistance of senior technical support personnel.
  • ApsaraMQ for RocketMQ 5.x instance: Service Level Agreements (SLAs) that guarantee service capabilities are provided.
    • Data reliability: up to 10 nines.
    • Service availability: up to 99.99%.

Enterprise-level capabilities
  • Self-managed Apache RocketMQ cluster: Enterprise-level capabilities are developed by users and require the assistance of senior technical support personnel.
  • ApsaraMQ for RocketMQ 5.x instance: The service can be used out-of-the-box and provides capabilities such as end-to-end canary release, message routing, message replication, extract, transform, and load (ETL), event integration, and event analysis.

Systematic disaster recovery
  • Self-managed Apache RocketMQ cluster: O&M is performed by users and requires the assistance of senior technical support personnel.
  • ApsaraMQ for RocketMQ 5.x instance: The following disaster recovery solutions are provided:
    • Zone active-active
    • Geo-disaster recovery
    • Remote active-active
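
As a rough illustration of what the 99.99% availability figure in the comparison means in practice, the following Python sketch converts an availability ratio into the maximum downtime that it allows. The function name and the 365-day measurement window are assumptions made for this example. The window that the actual SLA uses is not stated in this topic.

```python
# Assumption: a non-leap year is used as the measurement window.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def max_downtime_minutes(availability: float,
                         period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget, in minutes, for an availability ratio."""
    return (1.0 - availability) * period_minutes


# 99.99% availability leaves a budget of roughly 52.56 minutes per year.
print(round(max_downtime_minutes(0.9999), 2))
```

The same function can be called with a shorter window, for example `max_downtime_minutes(0.9999, 30 * 24 * 60)` for a 30-day month.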

Working mechanism

Basic requirements

In most cases, Apache RocketMQ is used in core systems such as order processing and online payment. The upstream and downstream applications that rely on the messaging service require the service to be highly stable. We recommend that you take the necessary precautions when you migrate and replace Apache RocketMQ clusters. The following items describe the basic requirements for designing a migration solution:
  • Uninterrupted messaging services

    Make sure that the upper-layer messaging applications are not affected and that the migration does not cause a large number of errors or failures.

  • No large number of duplicate messages

    Make sure that the migration does not produce a large number of duplicate messages. This way, users do not need to process duplicate messages in the system.

  • No noticeable latency in messaging

    Make sure that end-to-end messaging latency does not change noticeably during migration. This way, messages are still delivered promptly while the migration is in progress.

Solution design

To meet the preceding migration requirements, ApsaraMQ for RocketMQ provides a migration tool that allows you to efficiently migrate your self-managed Apache RocketMQ cluster to an ApsaraMQ for RocketMQ instance without affecting your business. You can use the tool to migrate messages and metadata such as topics, groups, and consumption progress.

  • Metadata migration: The migration tool reads metadata in a self-managed Apache RocketMQ cluster and copies the metadata to an ApsaraMQ for RocketMQ instance. This way, metadata is created and synchronized.
  • Message migration: The migration tool uses the built-in routing control component of ApsaraMQ for RocketMQ to control the routing information of topics at the backend and dynamically switch the read and write traffic of clients. The traffic switch does not affect your business.

    Figure: Message migration

    Assume that Topic A in the source cluster in the preceding figure has eight read partitions and eight write partitions.

    After you create the metadata migration task, a topic that is named Topic A is also created in the destination instance. Topic A in the destination instance has the same number of read partitions and write partitions as Topic A in the source cluster.

    During message migration, the routing control component of ApsaraMQ for RocketMQ controls the routing information of the source cluster and the destination instance and returns the readable and writable partitions to clients based on the migration stage. Examples:
    • Reads from both the source cluster and the destination instance: The information about all 16 read partitions, eight in the source cluster and eight in the destination instance, is returned.
    • Writes only to the destination instance: The information about the eight write partitions in the destination instance is returned.
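
The stage-based routing described above can be sketched in Python as follows. This is a minimal illustration, not the routing control component itself: the stage names, the `Partition` type, and the `route` function are hypothetical, and the split into exactly three stages is an assumption. Only the partition counts for the switching stage (16 read partitions and eight write partitions) come from the example above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical stage names; this topic does not name the stages.
    BEFORE_SWITCH = auto()  # clients read from and write to the source only
    SWITCHING = auto()      # clients read from both sides, write to the destination
    AFTER_SWITCH = auto()   # clients read from and write to the destination only


@dataclass(frozen=True)
class Partition:
    owner: str  # "source" or "destination"
    index: int


def make_partitions(owner: str, count: int) -> list[Partition]:
    return [Partition(owner, i) for i in range(count)]


def route(stage: Stage, count: int = 8) -> tuple[list[Partition], list[Partition]]:
    """Return (read_partitions, write_partitions) for one topic."""
    source = make_partitions("source", count)
    destination = make_partitions("destination", count)
    if stage is Stage.BEFORE_SWITCH:
        return source, source
    if stage is Stage.SWITCHING:
        # Reading both sides drains messages that remain in the source,
        # while new messages are written only to the destination.
        return source + destination, destination
    return destination, destination


reads, writes = route(Stage.SWITCHING)
print(len(reads), len(writes))  # 16 8
```

Because clients only consume the routing result, moving a topic from one stage to the next changes where traffic flows without a client-side change.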

Benefits

The migration solution provided by ApsaraMQ for RocketMQ is implemented based on the metadata proxy component for routing messages. This component is developed by Alibaba Cloud and supports topic-specific message routing, messaging scheduling, and traffic switch management. The migration solution provides the following benefits:

  • Uninterrupted messaging services

    The migration solution supports traffic switching without affecting your business. During traffic switching, messages can still be sent and received. In rare cases, message latency and duplication may occur.

  • No additional resources required

    You do not need to scale out your cluster or deploy multiple clusters. You only need to configure parameters to perform updates or changes.

  • Minimal business impact and easy implementation

    You only need to manually change the endpoint of the application and restart the application once. Subsequent traffic switches take effect based on the dynamic configurations on the ApsaraMQ for RocketMQ broker. You can upgrade the upstream and downstream applications of the messaging service without sorting out messaging dependencies. The operation has little impact on your business, so the upstream and downstream applications can easily cope with the migration.

  • Support for canary release and rollback

    Migration tasks are accurate to the topic level. You can perform a topic-specific canary release. If business risks arise during migration, you can roll back.
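
A topic-level canary release with rollback can be pictured with the following Python sketch. It is an illustration under stated assumptions rather than the behavior of the migration tool: the `canary_migrate` function, the `healthy` callback, and the "source"/"destination" labels are all hypothetical names introduced for this example.

```python
from __future__ import annotations

from typing import Callable


def canary_migrate(topics: list[str],
                   healthy: Callable[[str], bool]) -> dict[str, str]:
    """Switch topics one at a time and roll back a topic that fails its check.

    Returns a mapping from each topic to the side that serves it afterwards.
    """
    target = {topic: "source" for topic in topics}
    for topic in topics:
        target[topic] = "destination"  # switch only this topic
        if not healthy(topic):
            target[topic] = "source"   # roll back this topic
            break                      # stop the rollout; later topics stay put
    return target


# Hypothetical health check: the "payments" topic reports a problem.
result = canary_migrate(["orders", "payments", "logs"],
                        lambda topic: topic != "payments")
print(result)
```

In this sketch, topics that were switched before the failure stay on the destination, the failing topic returns to the source, and the remaining topics are never touched.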