Frequently queried keys in Redis are called hotkeys. If they are not handled properly, hotkeys may cause serious service problems. This topic describes solutions to hotkey problems.



There are two causes of hotkey problems:

  • The size of user consumption data is much greater than that of production data, as in the cases of hot sale items, hot news, hot issue comments, and celebrity broadcasts.

    Hotkey problems tend to occur unexpectedly, for example, during price promotions on popular commodities in the Double 11 Shopping Festival. When one of these commodities is browsed or purchased tens of thousands of times, a large number of requests target the same key, which causes a hotkey problem. Similarly, hotkey problems tend to occur in scenarios where there are more reads than writes, such as hot news, hot issue comments, and celebrity broadcasts.

  • Access to the hotkey is much higher than access to other Redis keys. Most of the traffic is therefore concentrated on a single Redis instance, which may reach a performance bottleneck.

    When data is stored on the server, it is usually split into shards. A request for a key is served by the server that holds the corresponding shard. When the traffic for that key exceeds the performance threshold of the server, the hotkey problem occurs.

Impact of hotkey problems

  • The traffic is concentrated and reaches the upper limit of the physical network adapter.
  • Too many requests queue up, crashing the sharding service of the cache.
  • The database is overloaded. A service avalanche occurs.

As mentioned above, when the number of hotkey requests on a server exceeds the upper limit of the network adapter on the server, the server stops providing other services due to the excessive concentration of traffic. If the distribution of hotspots is too dense, a large number of hotkeys are cached. When the cache capacity is exhausted, the sharding service of the cache crashes. After the cache service crashes, new requests fall through to the backend database. Because the database handles far fewer requests per second than the cache, it is easily exhausted by a large number of requests. The exhaustion of the database leads to a service avalanche and a dramatic degradation of performance.

Common solutions

The common solutions modify the server or the client to improve performance.

Server cache solution

The client sends requests to the server. The server is a multi-threaded service with a local cache space that uses the LRU eviction policy. When the server is congested, it returns the requests directly instead of forwarding them to the database. Only after the congestion clears does the server forward client requests to the database and write the data back to the cache. In this solution, the cache is both accessed and rebuilt by the server.

However, this solution has the following problems:

  • Cache building problem of the multi-thread service when the cache fails
  • Cache building problem when the cache is missing
  • Dirty reading problem
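
The local LRU cache described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of any particular server: `LRUCache`, `read_through`, and `load_from_db` are hypothetical names, and the sketch is single-threaded, so it does not address the concurrent cache-building or dirty reading problems listed above.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """A tiny LRU cache: the least recently used entry is evicted first."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: str, value: str) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


def read_through(cache: LRUCache, key: str,
                 load_from_db: Callable[[str], str]) -> str:
    """Serve from the local cache; on a miss, load from the database
    (hypothetical callback) and write the value back to the cache."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.put(key, value)
    return value
```

With a capacity of 2, reading keys a, b, a, c in that order evicts b, because a was used more recently than b when c arrived.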

"MemCache + Redis" solution

In this solution, a separate cache is deployed on the client to resolve the hotkey problem. The client first accesses the service layer and then the cache layer of the same server. This solution has the following advantages: nearby access, high speeds, and no bandwidth limit. However, it has the following disadvantages:

  • Wasted memory resources
  • Dirty reading problem

Local cache solution

Using the local cache incurs the following problems:

  • Hotspots must be detected in advance.
  • The cache capacity is limited.
  • The inconsistency duration is long.
  • Hotkeys may be missed.

If traditional hotkey solutions are all defective, how can the hotkey problems be resolved?

ApsaraDB for Redis solution for hotkey problems

Read/write splitting solution

The following describes the functions of different nodes in the architecture:

  • Load balancing is implemented at the SLB layer.
  • Read/write splitting and automatic routing are implemented at the proxy layer.
  • Write requests are processed by the master node.
  • Read requests are processed by the read-only node.
  • High availability (HA) is implemented on the replica node and the master node.

In practice, the client sends requests to SLB, and SLB distributes these requests to multiple proxies. The proxies identify and classify the requests and then route them. For example, a proxy sends all write requests to the master node and all read requests to a read-only node. The read-only nodes can be scaled out to solve the problem of hotkey reading. Read/write splitting supports flexible scaling for hotkey reading and can store a large number of hotkeys. It is also client-friendly.
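
The routing step at the proxy layer can be sketched as follows. This is only an illustration under stated assumptions: `ReadWriteRouter` and the node names are hypothetical, `READ_COMMANDS` is a small hand-picked subset of Redis read commands, and round-robin is one possible way to spread reads across read-only nodes. The source does not say which policy the proxy actually uses.

```python
from itertools import cycle
from typing import Iterator, List

# Assumption: a small illustrative subset of read-only Redis commands.
# Anything not listed is treated as a write and sent to the master.
READ_COMMANDS = {"GET", "MGET", "EXISTS", "HGET", "LRANGE", "TTL"}


class ReadWriteRouter:
    """Routes reads to read-only nodes and everything else to the master."""

    def __init__(self, master: str, read_only_nodes: List[str]) -> None:
        if not read_only_nodes:
            raise ValueError("at least one read-only node is required")
        self.master = master
        # Round-robin over the read-only nodes (an assumed policy).
        self._readers: Iterator[str] = cycle(read_only_nodes)

    def route(self, command: str) -> str:
        parts = command.split()
        if not parts:
            raise ValueError("empty command")
        if parts[0].upper() in READ_COMMANDS:
            return next(self._readers)
        return self.master


router = ReadWriteRouter("master", ["read-only-1", "read-only-2"])
targets = [router.route(c) for c in ("SET k1 v1", "GET k1", "GET k1")]
```

Defaulting unknown commands to the master is the conservative choice in this sketch, because sending a write to a read-only node would fail, whereas sending a read to the master only costs some capacity.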

Hotspot data solution

In this solution, hotkeys are discovered and stored to resolve the hotkey problem. The client sends requests to SLB, and SLB distributes them to a proxy. The proxy then routes the requests to the backend Redis nodes.

A cache is added on the server. Specifically, a local cache is added to the proxy. This cache uses the LRU algorithm to cache hotspot data. A hotspot data calculation module is added to the background database node to return the hotspot data.

The proxy architecture has the following benefits:

  • The proxy caches the hotspot data locally, and its reading capability is horizontally scalable.
  • The database node regularly calculates the hotspot data set.
  • The database feeds the hotspot data back to the proxy.
  • The proxy architecture is completely transparent to the client, and no client-side changes are required.

Process hotkeys

Read hotspot data

The processing of hotkeys is divided into two jobs: writing and reading. During the data writing process, SLB receives data K1 and writes it to a Redis database through a proxy. If the backend hotspot module determines that K1 is a hotkey, the proxy caches K1 locally. The next time the client requests K1, the proxy answers directly without going to Redis. Because the proxy can be scaled out horizontally, the read capacity for hotspot data grows with the number of proxies.
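
The read path for K1 can be sketched as follows. All names here are hypothetical: `FakeRedis` is an in-memory stand-in for the backend whose replies carry an `is_hot` flag, and `HotkeyProxy` is a simplified proxy. Two details are assumptions made for the sketch and are not stated in the source: the local cache is a plain dictionary rather than a bounded LRU cache, and a write through the proxy invalidates the local copy.

```python
from typing import Dict, Optional, Set, Tuple


class FakeRedis:
    """In-memory stand-in for the backend (hypothetical).
    get() returns (value, is_hot) to mimic the hotspot feedback."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.hot_keys: Set[str] = set()
        self.reads = 0  # number of reads that reached the backend

    def set(self, key: str, value: str) -> None:
        self.data[key] = value

    def get(self, key: str) -> Tuple[Optional[str], bool]:
        self.reads += 1
        return self.data.get(key), key in self.hot_keys


class HotkeyProxy:
    """Caches a value locally once the backend marks its key as hot."""

    def __init__(self, backend: FakeRedis) -> None:
        self.backend = backend
        self.local_cache: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.backend.set(key, value)
        # Assumption: drop the local copy so this proxy does not serve
        # the old value after a write that passed through it.
        self.local_cache.pop(key, None)

    def read(self, key: str) -> Optional[str]:
        if key in self.local_cache:
            return self.local_cache[key]  # served without using Redis
        value, is_hot = self.backend.get(key)
        if is_hot and value is not None:
            self.local_cache[key] = value
        return value
```

Note that invalidating on write only helps the proxy that handled the write. Other proxies may still hold the old value for a while, which is one reason this kind of solution cannot ensure complete data consistency.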

Discover hotspot data

The database first counts the requests that occur in a cycle. When the number of requests for a key reaches the threshold, the database identifies the key as a hotkey and stores it in an LRU list. When a client sends a request to the proxy and the proxy forwards it to Redis, Redis enters the feedback phase: if the requested key is a hotkey, Redis marks the data that it returns.

The database uses the following methods to calculate the hotspots:

  • Hotspot statistics based on statistical thresholds.
  • Hotspot statistics based on the statistical cycle.
  • Statistics collection method based on the version number without resetting the initial value.

Calculating hotspots on the database has little impact on the performance and only occupies a small amount of memory.
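
Threshold-based and cycle-based counting can be sketched as follows. This is a simplified illustration with hypothetical names (`HotkeyDetector`, `record`, `end_cycle`). It resets the counters at the end of each cycle, so it does not reproduce the version-number method mentioned above, and keeping a key in the hot list across cycles until it is evicted is an assumption of the sketch.

```python
from collections import Counter, OrderedDict
from typing import List


class HotkeyDetector:
    """Counts requests per key within a cycle and keeps keys that reach
    the threshold in a bounded LRU list."""

    def __init__(self, threshold: int, max_hotkeys: int) -> None:
        if threshold <= 0 or max_hotkeys <= 0:
            raise ValueError("threshold and max_hotkeys must be positive")
        self.threshold = threshold
        self.max_hotkeys = max_hotkeys
        self._counts: Counter = Counter()
        self._hot: "OrderedDict[str, bool]" = OrderedDict()

    def record(self, key: str) -> bool:
        """Count one request and report whether the key is currently hot."""
        self._counts[key] += 1
        if self._counts[key] >= self.threshold:
            self._hot[key] = True
            self._hot.move_to_end(key)  # most recently used hotkey
            if len(self._hot) > self.max_hotkeys:
                self._hot.popitem(last=False)  # evict the coldest hotkey
        elif key in self._hot:
            self._hot.move_to_end(key)
        return key in self._hot

    def end_cycle(self) -> None:
        """Start a new statistical cycle (simplification: reset counters)."""
        self._counts.clear()

    def hotkeys(self) -> List[str]:
        return list(self._hot)
```

The return value of `record` plays the role of the feedback mark: a proxy that receives `True` for a key knows it may cache that key locally.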

Comparison of two solutions

The preceding analysis shows that compared with the traditional solutions, Alibaba Cloud has made significant improvements in resolving the hotkey problem. The read/write splitting solution and the hotspot data solution can be expanded horizontally. These two solutions are transparent to the client, though they cannot ensure complete data consistency. The read/write splitting solution supports storing a larger amount of hotspot data, while the proxy-based hotspot data solution is more cost-effective.