Redis is a popular in-memory database used in a wide variety of business scenarios. The open-source version of Redis provides a distributed cluster solution, Redis Cluster, that can be used to expand memory capacity and sustain high-speed performance. A cluster architecture inevitably involves elastic scaling of data shards and data migration between shards. However, the data migration capability of Redis Community Edition clusters has long been a pain point for developers and O&M personnel.
To overcome the shortcomings of data migration in Redis Community Edition, Alibaba Cloud Tair has launched a new generation of non-inductive data migration architecture, meaning migration that the business does not perceive, based on the principle of slot replication. This architecture has provided long-term, stable service on Alibaba Cloud.
Tair is a cloud-native in-memory database developed by Alibaba Cloud. It is fully compatible with Redis and provides rich data models and enterprise-level capabilities to help customers build real-time online services. Tair also supports persistent memory, a newer type of storage medium that reduces cost by more than 30% compared to memory while persisting data and delivering performance close to that of memory. Tair is widely used by customers in industries such as finance, manufacturing, healthcare, and the internet for high-speed query and computing scenarios.
This article describes how data migration works in open-source Redis Community Edition, the early enhancements that Alibaba Cloud Tair for Redis made to community-style data migration in clusters, and the principles behind its evolution to a new generation of data migration based on slot replication.
The open-source Redis Cluster (7.0) uses a decentralized architecture with no central control node. The cluster topology and other metadata are exchanged between nodes through the gossip protocol. The slot is the smallest unit of data in the cluster topology, and each node owns a subset of the slots. Data migration means moving slots between nodes.
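To make the slot concept concrete, here is a minimal sketch of how a key is mapped to one of the 16384 slots in open-source Redis Cluster (CRC16 of the key, modulo 16384, with `{hash tag}` support). The function name `key_slot` is chosen here for illustration only; it is not part of any Redis client API.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in open-source Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the hash slot for a key, honoring the {hash tag} convention."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is used.
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx seeded with 0 is the CRC16 (XMODEM) variant Redis uses.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


if __name__ == "__main__":
    print(key_slot(b"foo"))
    # Keys sharing a hash tag land in the same slot, so they migrate together.
    print(key_slot(b"{user1}:profile") == key_slot(b"{user1}:cart"))
```

Because every key belongs to exactly one slot, moving a slot between nodes moves all keys that hash to it.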
The open-source Redis cluster migrates slots in Per Key mode: a single slot is migrated by traversing the keys in it and migrating them a few at a time.
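As a rough illustration of Per Key migration, the sketch below builds the sequence of community Redis commands typically issued to move one slot. It only produces command strings and does not talk to a server. The helper name `per_key_migration_plan`, the `[source]`/`[target]` labels, the batch size, and the timeout are assumptions made for this example, and the key list is passed in rather than fetched with `CLUSTER GETKEYSINSLOT`.

```python
from typing import Iterable, List


def per_key_migration_plan(slot: int, source_id: str, target_id: str,
                           target_host: str, target_port: int,
                           keys: Iterable[str], batch: int = 2,
                           timeout_ms: int = 5000) -> List[str]:
    """Build the command sequence for migrating one slot key by key."""
    key_list = list(keys)
    plan = [
        # Mark the slot as being imported on the target and migrating on the source.
        f"[target] CLUSTER SETSLOT {slot} IMPORTING {source_id}",
        f"[source] CLUSTER SETSLOT {slot} MIGRATING {target_id}",
    ]
    for i in range(0, len(key_list), batch):
        chunk = key_list[i:i + batch]
        # Ask the source for the next keys in the slot, then move them.
        plan.append(f"[source] CLUSTER GETKEYSINSLOT {slot} {batch}")
        plan.append(
            f'[source] MIGRATE {target_host} {target_port} "" 0 {timeout_ms} '
            f"KEYS {' '.join(chunk)}"
        )
    # Once the slot is empty on the source, assign it to the target.
    plan.append(f"[target] CLUSTER SETSLOT {slot} NODE {target_id}")
    plan.append(f"[source] CLUSTER SETSLOT {slot} NODE {target_id}")
    return plan


if __name__ == "__main__":
    for line in per_key_migration_plan(42, "src-node-id", "dst-node-id",
                                       "10.0.0.2", 6379, ["k1", "k2", "k3"]):
        print(line)
```

Each `MIGRATE` call in this sequence blocks the source node while it runs, which is where the cost of the Per Key approach comes from.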
This process also introduces obvious problems:
The enhancements that Tair for Redis made to Per Key data migration focus on three areas: reducing synchronous blocking, mitigating the blocking caused by big key migration, and shortening the interaction time of migration. The goal is to increase migration speed and reduce the impact that the business perceives. The specific improvements are described below.
When the source node executes the native MIGRATE command, it performs three operations: it dumps the key, transfers the payload to the target node, and waits for the target node to restore the key.
These three operations are executed within one synchronous blocking command, so other business requests cannot be served for the entire duration of the command.
Tair for Redis uses a dedicated migration state machine in its kernel to split these three steps into three asynchronous processes, which reduces the duration of synchronous blocking.
When Per Key migration encounters a big key, the time consumed by the three phases of dumping the key, transferring the payload, and restoring the key increases accordingly. Making the phases asynchronous does not reduce the time that each individual operation takes.
Before migrating a key, Tair for Redis therefore checks whether it is a big key. If it is, the key is decomposed into chunks, and the three phases of dumping, transferring, and restoring are completed chunk by chunk. Each phase takes less time, which effectively reduces how long access to other keys is blocked, but the overall time needed to migrate the big key becomes longer.
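The chunking idea can be sketched with plain Python containers. This is only an illustrative model and not Tair's implementation: the two dictionaries stand in for the source and target nodes, and `BIG_KEY_THRESHOLD`, `CHUNK_SIZE`, and `migrate_key` are hypothetical names and values chosen for the example.

```python
from typing import Dict, Iterator, List

BIG_KEY_THRESHOLD = 4  # hypothetical: lists longer than this count as big keys
CHUNK_SIZE = 2         # hypothetical: elements moved per dump/transfer/restore round


def split_chunks(values: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` elements."""
    for i in range(0, len(values), size):
        yield values[i:i + size]


def migrate_key(name: str, source: Dict[str, List[str]],
                target: Dict[str, List[str]]) -> int:
    """Move one list-valued key; return the number of transfer rounds used."""
    values = source[name]
    is_big = len(values) > BIG_KEY_THRESHOLD
    size = CHUNK_SIZE if is_big else max(len(values), 1)
    restored: List[str] = []
    rounds = 0
    for chunk in split_chunks(values, size):
        payload = list(chunk)     # dump: serialize only this chunk
        restored.extend(payload)  # transfer + restore on the target side
        rounds += 1               # other requests could be served between rounds
    target[name] = restored
    del source[name]
    return rounds


if __name__ == "__main__":
    src = {"small": ["a", "b"], "big": ["1", "2", "3", "4", "5"]}
    dst: Dict[str, List[str]] = {}
    print(migrate_key("small", src, dst), migrate_key("big", src, dst))
```

A small key moves in a single round, while the big key takes several short rounds, which mirrors the trade-off described above: shorter blocking per step, longer total migration time.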
The optimization and improvement cannot solve all the pain points. Here are pain points that still exist:
After continuous exploration and evolution, Alibaba Cloud Tair for Redis cluster has introduced a new generation of data migration technology based on slot replication.
Primary and secondary Redis instances synchronize data through data replication. Building on primary/secondary replication, Tair for Redis has implemented slot replication (Slot Mig) in its kernel. In essence, it replicates a subset of slots between two nodes to synchronize their data, and adds a central controller (CS) that precisely and dynamically controls Slot Wait, a write prohibition on the migrating slots that lasts only milliseconds. When the cluster topology is switched, client requests are redirected to the new data node through MOVED semantics. As a result, business read and write requests are not lost, and the migration process does not affect the user's business.
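On the client side, MOVED redirection can be handled roughly as sketched below. The reply format `MOVED <slot> <host>:<port>` is the one used by Redis Cluster. The names `Moved`, `parse_moved`, and `apply_redirect`, and the dictionary used as a slot table, are assumptions made for this example rather than the API of any particular client library.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Moved(NamedTuple):
    slot: int
    host: str
    port: int


def parse_moved(error: str) -> Optional[Moved]:
    """Parse a 'MOVED <slot> <host>:<port>' error reply, or return None."""
    parts = error.lstrip("-").split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, sep, port = parts[2].rpartition(":")
    if not sep or not parts[1].isdigit() or not port.isdigit():
        return None
    return Moved(int(parts[1]), host, int(port))


def apply_redirect(slot_table: Dict[int, Tuple[str, int]], error: str) -> bool:
    """Update the cached slot owner from a MOVED reply; True if it was one."""
    moved = parse_moved(error)
    if moved is None:
        return False
    slot_table[moved.slot] = (moved.host, moved.port)
    return True


if __name__ == "__main__":
    table = {3999: ("127.0.0.1", 6380)}
    apply_redirect(table, "MOVED 3999 127.0.0.1:6381")
    print(table[3999])  # the client would retry the command against this node
```

After updating its slot table, a cluster-aware client retries the request on the new owner, which is why the topology switch appears to the application as at most a brief redirect.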
As mentioned above, here are the benefits of the entire migration process:
Let's summarize the features of the new generation of non-inductive data migration of Alibaba Cloud Tair for Redis cluster and the data migration of the open-source Redis Community Edition.
The non-inductive data migration technology of Alibaba Cloud Tair for Redis clusters has already been applied to the specification change feature (scaling nodes out or in) on Alibaba Cloud. You are welcome to purchase and try it.