Global Distributed Cache for Redis, also known as Global Replica, is an active geo-redundancy database system that is developed by Alibaba Cloud based on ApsaraDB for Redis. Global Distributed Cache for Redis supports business scenarios in which multiple sites in different regions provide services at the same time. It helps enterprises replicate the active geo-redundancy architecture of Alibaba.

Background information

As your business rapidly grows to cover a wide range of areas, cross-region, long-distance access introduces high latency that degrades user experience. The Global Distributed Cache for Redis feature of Alibaba Cloud helps you reduce the high latency caused by cross-region access. Global Distributed Cache for Redis has the following benefits:

  • You can directly create child instances or specify the child instances that need to be synchronized without the need to implement redundancy in your business logic. This greatly reduces the complexity of business design and allows you to focus on the development of upper-layer business.
  • The geo-replication capability is provided for you to implement geo-disaster recovery or active geo-redundancy.

This feature applies to cross-region data synchronization scenarios and global business deployment in industries such as multimedia, gaming, and e-commerce.

Scenarios

  • Active geo-redundancy: Multiple sites in different regions provide services at the same time. Active geo-redundancy is a type of high-availability architecture. Unlike the traditional disaster recovery design, all sites provide services at the same time in the active geo-redundancy architecture. This means that applications can be connected to nearby nodes.
  • Disaster recovery: Global Distributed Cache for Redis can synchronize data among child instances in a two-way manner. This makes the feature applicable to scenarios such as zone-disaster recovery, three data centers across two regions, and three-region disaster recovery.
  • Load balancing: In specific scenarios such as large promotional events, ultra-high queries per second (QPS) and a large amount of access traffic are expected. In such scenarios, you can balance the load among child instances to overcome the load limit of a single instance.
  • Data synchronization: Global Distributed Cache for Redis can synchronize data among the child instances of a distributed instance in a two-way manner. This makes the feature suitable for scenarios such as data analysis and testing.
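The nearby-connection idea behind active geo-redundancy can be sketched as follows. The region names and endpoint hostnames are illustrative placeholders, not real instance addresses; in practice, each endpoint would be the connection address of one child instance, and the selected endpoint would be passed to a Redis client.

```python
# Hypothetical child-instance endpoints of one distributed instance.
# The region IDs and hostnames below are placeholders for illustration.
CHILD_INSTANCES = {
    "cn-hangzhou":    "r-bp1example.redis.rds.aliyuncs.com",
    "cn-beijing":     "r-2zeexample.redis.rds.aliyuncs.com",
    "ap-southeast-1": "r-t4nexample.redis.rds.aliyuncs.com",
}

def nearest_endpoint(app_region: str, instances: dict) -> str:
    """Pick the child instance in the application's own region so that
    reads and writes stay local; fall back to the first configured
    endpoint when no child instance exists in that region."""
    return instances.get(app_region, next(iter(instances.values())))
```

Because every child instance is readable and writable and data is synchronized in a two-way manner, the application can use whichever endpoint this function returns without special-casing a "primary" region.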

Billing

You are not charged for creating a distributed instance. Only child instances in the distributed instance are billed based on the billing standards for regular ApsaraDB for Redis instances. For more information, see Billable items and prices.

Supported instance series

Performance-enhanced instances of ApsaraDB for Redis Enhanced Edition (Tair)

Architecture

Architecture of Global Distributed Cache for Redis

In the architecture of Global Distributed Cache for Redis, a distributed instance is a logical collection of distributed child instances and synchronization channels. Data is synchronized in real time among the child instances by using the synchronization channels. A distributed instance consists of the following components:

Child instances
A child instance is the basic service unit that constitutes a distributed instance. Each child instance is an independent ApsaraDB for Redis instance. All child instances are readable and writable. Data is synchronized in real time among child instances in a two-way manner. A distributed instance supports geo-replication. You can create child instances in different regions to implement geo-disaster recovery or active geo-redundancy.
Note A child instance must be a performance-enhanced instance of ApsaraDB for Redis Enhanced Edition (Tair).
Synchronization channels
A synchronization channel is a one-way link that is used to synchronize data in real time from one child instance to another. Two opposite synchronization channels are required to implement two-way replication between two child instances.
Note In addition to the append-only file (AOF) mechanism of native Redis, Global Distributed Cache for Redis also records information such as the server-id and opid of each write operation. Global Distributed Cache for Redis transmits these binlogs over synchronization channels to synchronize data.
Channel manager
The channel manager manages the lifecycle of synchronization channels and handles exceptions that occur in child instances, such as a switchover between the primary and secondary databases and the rebuilding of the secondary database.
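The backloop control that keeps two opposite channels from echoing writes back and forth can be sketched in a few lines. The record layout below is a simplified illustration, assuming each binlog entry carries the server-id of the instance that generated it and a monotonically increasing opid; the field names are hypothetical, not the actual wire format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BinlogRecord:
    server_id: str  # ID of the child instance where the write originated
    opid: int       # monotonically increasing operation ID on that instance
    command: tuple  # e.g. ("SET", "key", "value")

def should_forward(record: BinlogRecord, local_server_id: str) -> bool:
    """Backloop control: a channel forwards only records that originated
    on its local instance, so a write that arrived via replication from a
    peer is never sent back to that peer in a loop."""
    return record.server_id == local_server_id
```

With this rule, a write made on instance A is forwarded over the A-to-B channel, applied on B, and then dropped by the B-to-A channel because its server-id is still A.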

Benefits

High synchronization reliability
  • Supports resumable transmission and tolerates day-level synchronization interruptions. This overcomes the limits of the native Redis architecture on incremental synchronization across data centers or regions.
  • Exceptions that occur in child instances, such as a switchover between the primary and secondary databases and the rebuilding of the secondary database, are automatically handled.
High synchronization performance
  • High throughput
    • For child instances in the standard architecture, a synchronization channel supports up to 50,000 transactions per second (TPS) in one direction.
    • For child instances in the cluster or read/write splitting architecture, the throughput scales linearly with the number of Redis shards or nodes.
  • Low latency
    • For synchronization between regions on the same continent, the latency ranges from about 100 milliseconds to a few seconds, and the average latency is about 1.2 seconds.
    • For synchronization between regions on different continents, the latency is about 1 to 5 seconds. The latency is determined by the throughput and round-trip time (RTT) of the links.
High synchronization correctness
  • Binlogs are synchronized to the peer instance in the order in which they are generated.
  • Backloop control prevents binlogs from being synchronized in a loop.
  • An exactly-once mechanism ensures that each synchronized binlog is executed only once.
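The exactly-once guarantee can be illustrated with a small sketch, assuming each binlog record carries a server-id and a monotonically increasing opid (as described in the Architecture section) and that records from one source arrive in opid order. The dictionary-based record format is a hypothetical simplification.

```python
def apply_once(record: dict, applied_opids: dict, db: dict) -> bool:
    """Apply a binlog record exactly once. `applied_opids` tracks the
    highest opid already applied per source instance; because records
    from one source arrive in opid order, any record at or below that
    watermark is a duplicate (e.g. resent after a resumed channel)."""
    last = applied_opids.get(record["server_id"], -1)
    if record["opid"] <= last:
        return False  # duplicate after a retry/resume; skip it
    cmd, key, value = record["command"]
    if cmd == "SET":  # only SET is handled in this sketch
        db[key] = value
    applied_opids[record["server_id"]] = record["opid"]
    return True
```

A per-source watermark like this is what makes day-level interruptions safe: after the channel resumes, replayed records are recognized and skipped rather than applied twice.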

Comparison between Global Distributed Cache for Redis and the two-way synchronization solution of DTS

The following figure compares Global Distributed Cache for Redis with the two-way synchronization solution of Data Transmission Service (DTS) in the scenario where data is synchronized in one direction. The following figure also shows the phases that cause latency.

Figure 1. Architecture comparison for data synchronization in one direction
Architecture and latency comparison for data synchronization in one direction

The overall performance of Global Distributed Cache for Redis is better than that of the two-way synchronization solution of DTS. The following table describes the compared items.

  • Cost
    • Global Distributed Cache for Redis: You are not charged for creating a distributed instance. Only child instances in the distributed instance are billed based on the billing standards for regular ApsaraDB for Redis instances. For more information, see Billable items and prices.
    • Two-way synchronization solution of DTS: You are charged for the data synchronization link. For more information, see DTS pricing.
  • Latency
    • Global Distributed Cache for Redis: The latency is stable with only small fluctuations. The replicator in the T2 phase shown in the preceding figure has independent resources. Even if a large amount of data is written to the source, the replicator can still quickly obtain the data to be synchronized. The latency in the T1 and T2 phases stays at about 400 milliseconds in most cases.
    • Two-way synchronization solution of DTS: The latency fluctuates with the amount of data that is written to the source. Binlogs accumulate in the T1 phase shown in the preceding figure, and the T2 phase does not have a Service Level Agreement (SLA) guarantee. If a large amount of data is written to the source, the latency in the T1 phase increases from 10 milliseconds to 400 milliseconds, or even several seconds. This affects the efficiency of pulling data and the performance of the entire link.
  • Number of synchronization destinations
    • Global Distributed Cache for Redis: Data can be synchronized among up to three instances.
    • Two-way synchronization solution of DTS: Data can be synchronized among more instances. DTS can synchronize data from one instance to multiple instances because the pulled binlogs can be consumed by multiple instances at the same time.
  • Scenarios
    • Global Distributed Cache for Redis: This feature is suitable for customers that need to write a large amount of data to the source and have high requirements on the average latency, for example, to implement active geo-redundancy or modular business.
      Note Cross-region synchronization is affected by the latency of carrier networks. We recommend that you configure your business system to write data to multiple sources at the same time if your business requires instant responses.
    • Two-way synchronization solution of DTS: This solution is suitable for scenarios in which only a small amount of data is written and data is read from nearby nodes, such as the cache update scenario.
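The recommendation to write data to multiple sources at the same time can be sketched as follows. The client objects here are stubs standing in for Redis clients connected to individual child instances; the stub class and region names are illustrative, not part of any real API.

```python
def write_to_all(clients: dict, key: str, value: str) -> dict:
    """Write the same key to every child instance and report per-region
    success, so the business gets an instant local result even before
    cross-region synchronization catches up. `clients` maps a region
    name to any object with a set(key, value) method."""
    results = {}
    for region, client in clients.items():
        try:
            client.set(key, value)
            results[region] = True
        except Exception:
            results[region] = False  # region unreachable; sync or retry later
    return results

class StubClient:
    """Illustrative stand-in for a Redis client bound to one child instance."""
    def __init__(self, reachable: bool = True):
        self.reachable = reachable
        self.store = {}
    def set(self, key, value):
        if not self.reachable:
            raise ConnectionError("child instance unreachable")
        self.store[key] = value
```

Because all child instances are writable and backloop control prevents replication loops, writing the same value to several sources is safe; a failed region simply catches up through its synchronization channels once it recovers.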