This topic lists the drawbacks of the scaling solutions available for open source Redis clusters and ApsaraDB for Redis cluster instances and describes the imperceptible scaling solution available for ApsaraDB for Redis Enhanced Edition (Tair) cluster instances.

Open source Redis clusters often require elastic scaling of data nodes and migration of data between shards. However, the common scaling solutions suffer from slow key migration, unavailable multi-key commands, inability to migrate Lua scripts, stalls or even high availability (HA) switchovers triggered by large key migration, and a complex rollback process.

To address these issues, ApsaraDB for Redis Enhanced Edition (Tair) developed a new architecture based on slot replication that makes data migration imperceptible to clients. The architecture also optimizes the thread-scheduling algorithm within instances so that instances can be managed efficiently and precisely. Together, these mechanisms form the basis of imperceptible scaling.

Common scaling solutions for open source Redis clusters and their drawbacks

  • Elastic scaling of open source Redis clusters

    Open source Redis clusters use a gossip protocol to exchange cluster state, such as node health and slot ownership. Data is migrated with the slot as the smallest unit: the keys in a slot are traversed and moved to the target node one key at a time. This scaling solution has the following issues:
    • Low stability
      • Commands that involve multiple keys in the same slot may fail (for example, with a TRYAGAIN error) while the slot is being migrated, because some of the keys may already be on the target node while others are still on the source node.
      • Lua scripts are not replicated along with the data during migration. As a result, Lua scripts may be missing on the target node after the data migration is complete.
      • During data migration, moving large keys may cause stalls or even errors that trigger a high availability (HA) switchover.
    • High O&M difficulty
      • If an error occurs during data migration, you must restore the database data manually. This process is difficult, time-consuming, and error-prone.
      • Scaling a cluster takes a long time because data is migrated key by key. As a result, scaling is usually scheduled for off-peak hours, and business may still be affected.
  • Scaling based on data synchronization and migration components

    This solution relies on middleware components, rather than the built-in migration mechanism of open source Redis clusters, to migrate data. For example, to perform a scaling operation, you can create a new cluster, use a data synchronization component to migrate data to it, and then use a load balancer component to switch access paths to the new cluster. This scaling solution has the following issues:
    • It takes a long time to synchronize full data.
    • The costs are high because you must create two sets of resources to perform scaling operations.
    • Clients are disconnected when the load balancer component switches access paths. The switch takes a long time to take effect, and the service may be unavailable for up to 10 seconds.
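The key-by-key migration of open source Redis clusters described above can be sketched as a small in-memory simulation. Only `key_slot` follows the documented Redis Cluster rule (CRC16 of the key, or of a non-empty `{hash tag}`, modulo 16384); the `migrate_slot_key_by_key` and `mget_during_migration` helpers and the `TryAgainError` class are hypothetical stand-ins, not Redis internals. The sketch shows why a multi-key command on a single slot can fail partway through a migration.

```python
# Toy simulation of key-by-key slot migration in open source Redis Cluster.
# Only key_slot() follows the documented Redis Cluster rule; the migration
# helpers and TryAgainError are hypothetical stand-ins, not Redis internals.

NUM_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant), the checksum Redis Cluster uses for slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to a slot; a non-empty {hash tag} is hashed instead of the key."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % NUM_SLOTS


class TryAgainError(Exception):
    """Hypothetical stand-in for the TRYAGAIN error of a split multi-key request."""


def mget_during_migration(source: dict, keys: list):
    """Serve a multi-key read on the source node of a migrating slot."""
    present = [k for k in keys if k in source]
    if len(present) == len(keys):
        return [source[k] for k in keys]  # all keys still local
    if not present:
        return "ASK"  # all keys moved: client is redirected to the target
    raise TryAgainError("keys of this slot are split across source and target")


def migrate_slot_key_by_key(source: dict, target: dict, slot: int, on_step=None):
    """Move every key of `slot` from source to target, one key per step."""
    for key in [k for k in source if key_slot(k) == slot]:
        target[key] = source.pop(key)
        if on_step is not None:
            on_step()


# Usage: both keys share the {42} hash tag, so they live in the same slot.
keys = ["user:{42}:name", "user:{42}:email"]
source = {keys[0]: "Alice", keys[1]: "alice@example.com"}
target = {}
errors, answers = [], []


def probe():
    """Issue a multi-key read after every migrated key."""
    try:
        answers.append(mget_during_migration(source, keys))
    except TryAgainError as exc:
        errors.append(str(exc))


migrate_slot_key_by_key(source, target, key_slot(keys[0]), on_step=probe)
print(len(errors), answers)  # prints: 1 ['ASK']
```

In Redis Cluster, the source node of a migrating slot replies with an ASK redirection when all requested keys have already moved and with a TRYAGAIN error when they are split across the two nodes. The toy model mirrors that split but omits the MIGRATING and IMPORTING slot states.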

Imperceptible scaling solution for ApsaraDB for Redis cluster instances and its drawbacks

The imperceptible scaling solution available for ApsaraDB for Redis cluster instances can address the preceding issues of open source Redis clusters. However, the solution also has the following issues:
Note This solution supports ApsaraDB for Redis cluster instances that use cloud disks.
  • Data migration during scaling operations affects instances, which may become read-only for a few seconds.
    Note Data migration may be required when you perform scaling operations. During data migration, when the amount of incremental data that remains to be synchronized falls below a threshold, the instance becomes read-only. After the remaining incremental data is synchronized, clients can reconnect to the instance by using a new endpoint. If a write request is sent to the instance while it is in the read-only state, the request is rejected and a read-only error is returned. Clients usually cannot handle read-only errors transparently, which may affect your business.
  • Data migration during scaling operations competes for resources with regular workloads. As a result, scaling operations are usually scheduled for off-peak hours, which makes them less flexible.
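The threshold-triggered read-only behavior described in the note above can be illustrated with a toy simulation. The `ThresholdCutover` class, its parameters, and the tick-based timing are hypothetical and do not reflect the actual ApsaraDB for Redis implementation; the sketch only shows why writes that arrive after the instance turns read-only surface to clients as errors.

```python
# Toy model of a threshold-triggered read-only cutover. All parameters and
# names are hypothetical; they only illustrate the behavior described above.
from dataclasses import dataclass, field


@dataclass
class ThresholdCutover:
    backlog: int  # incremental writes not yet synchronized to the new shard
    threshold: int  # go read-only once the backlog drops to this value
    sync_per_tick: int  # writes the migration can synchronize per tick
    read_only: bool = False
    rejected: list = field(default_factory=list)

    def write(self, key: str) -> str:
        if self.read_only:
            self.rejected.append(key)  # the client receives a read-only error
            return "ERR read-only"
        self.backlog += 1  # every accepted write adds to the backlog
        return "OK"

    def tick(self) -> bool:
        """Synchronize one batch; return True when the endpoint can be switched."""
        self.backlog = max(0, self.backlog - self.sync_per_tick)
        if not self.read_only and self.backlog <= self.threshold:
            self.read_only = True  # stop accepting writes to drain the rest
        return self.read_only and self.backlog == 0


def run(sim: ThresholdCutover, writes_per_tick: int, max_ticks: int = 1_000) -> int:
    """Drive client writes and migration ticks until the switch; return tick count."""
    for t in range(max_ticks):
        for i in range(writes_per_tick):
            sim.write(f"key:{t}:{i}")
        if sim.tick():
            return t + 1
    raise RuntimeError("sync never caught up with the write rate")


sim = ThresholdCutover(backlog=100, threshold=10, sync_per_tick=20)
ticks = run(sim, writes_per_tick=5)
print(ticks, len(sim.rejected))  # prints: 7 5
```

The loop converges only when `sync_per_tick` exceeds the client write rate, which is why the migration has to compete with regular traffic for resources. Every write issued during the read-only window is returned to the client as an error.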

Imperceptible scaling solution for ApsaraDB for Redis Enhanced Edition (Tair)

The cluster architecture of ApsaraDB for Redis Enhanced Edition (Tair) is built on the new management architecture of ApsaraDB for Redis. It uses centralized components to manage instances efficiently and precisely, which enables imperceptible scaling.

Note This solution supports performance-enhanced instances that use cloud disks and persistent memory-optimized instances that use cloud disks of ApsaraDB for Redis Enhanced Edition (Tair). For more information, see Performance-enhanced instances and Persistent memory-optimized instances.
Figure: Imperceptible scaling solution of ApsaraDB for Redis Enhanced Edition (Tair)
This solution has the following benefits:
  • Imperceptible scaling

    When an ApsaraDB for Redis Enhanced Edition (Tair) cluster instance is being scaled, your clients are not affected, your business is not interrupted, and the instance does not stay in the read-only state. You can scale an ApsaraDB for Redis Enhanced Edition (Tair) cluster instance at any time.

    The key to imperceptible scaling is minimizing the read-only period of an instance during data migration. ApsaraDB for Redis Enhanced Edition (Tair) cluster instances dynamically estimate the time required to migrate the remaining incremental data and keep the read-only period within milliseconds. Because this period is far shorter than the TCP retransmission timeout, which is measured in hundreds of milliseconds, clients theoretically never perceive the read-only state. During this brief read-only period, write requests for the keys being migrated are not rejected; instead of being written, they are cached on the instance. After the data migration is complete, clients receive redirection messages. At the same time, the management system and the database engine work together to update instance information immediately. This process ensures that scaling operations are imperceptible to clients.

  • Smooth scaling

    ApsaraDB for Redis Enhanced Edition (Tair) optimizes the thread-scheduling algorithm within cluster instances to implement fine-grained management of data migration tasks. The thread execution efficiency of data migration can be adjusted from 10% up to a maximum of 80%, and you can specify a custom value within this range. This way, data is migrated as fast as possible without affecting your business. ApsaraDB for Redis Enhanced Edition (Tair) cluster instances also support fine-grained scaling that does not increase response time (RT), which prevents high availability switchovers caused by network jitter and ensures high data reliability.

  • Efficient and easy O&M

    ApsaraDB for Redis Enhanced Edition (Tair) cluster instances can address scaling issues of open source Redis clusters by using the following methods:

    • Pre-backup in the background: The full data of your instance can be replicated in advance in the background without affecting online services. This prevents stalls caused by large key migration.
    • Rollback with a few clicks: You can roll back instances with a few clicks if exceptions occur during scaling.
    • Data migration by slot: Data can be migrated by slot. This ensures that commands that involve multiple keys in the same slot can run as expected.
    • Lua script replication: During data migration, Lua scripts can be replicated to prevent Lua script loss.
    • Horizontal scaling: Up to 256 shards can be added to or deleted from a single instance.
  • Low costs

    Compared with solutions that require a middleware component, this solution reduces costs because you do not need to create two sets of resources.
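The estimate-then-cutover behavior and the adjustable migration efficiency described above can be illustrated with a toy simulation. It is a sketch under stated assumptions, not the Tair implementation: the `EstimatedCutover` class, the throughput figure, the 1 ms read-only budget, and the interpretation of the 10%-80% efficiency as a multiplier on migration throughput are all hypothetical. It shows the ordering of events: the instance enters the read-only state only when the estimated time to drain the remaining incremental data fits within a millisecond budget, and writes that arrive in that window are cached and replayed instead of being rejected.

```python
# Toy model of the estimate-then-cutover idea with write buffering.
# The 10%-80% range comes from the text above; every other name and number
# (throughput, 1 ms budget, 100-byte writes) is a hypothetical assumption.
from dataclasses import dataclass, field

MIN_SHARE, MAX_SHARE = 0.10, 0.80


def clamp_share(requested: float) -> float:
    """Keep a user-chosen migration efficiency within the 10%-80% range."""
    return min(MAX_SHARE, max(MIN_SHARE, requested))


@dataclass
class EstimatedCutover:
    backlog_bytes: float  # incremental data not yet migrated
    full_speed_bytes_per_ms: float  # hypothetical migration throughput at 100%
    share: float  # requested migration efficiency
    budget_ms: float = 1.0  # hypothetical read-only budget
    read_only: bool = False
    buffered: list = field(default_factory=list)
    target: dict = field(default_factory=dict)

    def rate(self) -> float:
        return self.full_speed_bytes_per_ms * clamp_share(self.share)

    def estimated_drain_ms(self) -> float:
        """Estimate how long the remaining incremental data takes to migrate."""
        return self.backlog_bytes / self.rate()

    def write(self, key: str, value, size: int) -> str:
        if self.read_only:
            self.buffered.append((key, value))  # cached, not rejected
            return "QUEUED"
        self.backlog_bytes += size
        return "OK"

    def step_ms(self) -> bool:
        """Advance 1 ms of migration; return True once the cutover completes."""
        self.backlog_bytes = max(0.0, self.backlog_bytes - self.rate())
        if not self.read_only and self.estimated_drain_ms() <= self.budget_ms:
            self.read_only = True  # only when the drain fits the budget
        if self.read_only and self.backlog_bytes == 0:
            for key, value in self.buffered:  # replay cached writes on the target
                self.target[key] = value
            self.buffered.clear()
            return True
        return False


sim = EstimatedCutover(backlog_bytes=10_000, full_speed_bytes_per_ms=1_000, share=0.5)
elapsed_ms = 0
while True:
    sim.write(f"k{elapsed_ms}", elapsed_ms, size=100)
    elapsed_ms += 1
    if sim.step_ms():
        break
print(elapsed_ms, sim.target)  # prints: 25 {'k24': 24}
```

Compared with a fixed backlog threshold, tying the cutover to an estimated drain time keeps the read-only window close to the budget even when the migration rate changes. In this run the window lasts a single 1 ms step, and the one write that arrives during it is cached and replayed rather than failing.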