Tair (Redis® OSS-Compatible): Large keys and hotkeys

Last Updated: Mar 31, 2025

Large keys and hotkeys may cause service performance degradation, poor user experience, and even system failures. This topic describes how to quickly identify and optimize large keys and hotkeys, analyzes their causes and impacts, and provides preventive measures to minimize their impact on your business.

Step 1: Identify large keys and hotkeys

Alibaba Cloud console tools

Tair and Redis provide the top key statistics and offline key analysis features in the console to help you quickly identify large keys and hotkeys.

Method | Limits | Description | Procedure

Use the top key statistics feature (recommended)

This feature is available only for Redis Open-Source Edition instances that run Redis 5.0 or later and Tair (Enterprise Edition) DRAM-based and persistent memory-optimized instances.

  • Displays the top three large keys and hotkeys of each data type in each shard in real time.

  • Allows you to view the historical information of large keys and hotkeys within the last four days.

  1. Log on to the console and go to the Instances page. In the top navigation bar, select the region in which the instance that you want to manage resides. Then, find the instance and click the instance ID.

  2. In the left-side navigation pane, choose CloudDBA > Real-time Key Statistics or Offline Key Analysis.

Use the offline key analysis feature

The offline key analysis feature is unavailable for ESSD/SSD-based instances.

  • Allows you to analyze Redis Database (RDB) backup files in a customized manner. You can view the statistics of keys in an instance, such as the memory usage, distribution, and time-to-live (TTL) of keys.

  • Is not suited to rapid analysis. Large RDB files take longer to analyze.

  • Does not provide hotkey information.

If your instance cannot use the preceding features, use the following methods.

Other methods to identify large keys and hotkeys

Method | Advantages and disadvantages | Description

Use the --bigkeys, --memkeys, and --hotkeys options of redis-cli

  • Advantages: This method is convenient, fast, and secure.

  • Disadvantages: This method does not support custom analysis, provides limited precision, and does not allow for rapid analysis. It must iterate through all existing keys in the instance, which may affect the performance of the instance.

The --bigkeys, --memkeys, and --hotkeys options of redis-cli retrieve the overall statistics of keys and the largest key or hottest key of each data type.

Differences between the options:

  • --bigkeys: collects information about large keys and returns the number of elements for collection types such as sets or lists.

  • --memkeys: collects information about large keys and returns the memory usage of keys of all data types.

  • --hotkeys: collects information about hotkeys.

Supported data types: STRING, LIST, HASH, SET, ZSET, and STREAM.

Sample --bigkeys command:

    redis-cli -h r-***************.redis.rds.aliyuncs.com -a <password> --bigkeys

Analyze a specific key by using built-in commands

  • Advantages: This method has little impact on online services.

  • Disadvantages: The returned serialized length of a key is not equal to the actual length of the key in the memory. This method provides limited precision and is for reference only.

The following section lists low-risk commands for analyzing keys of various data types to determine whether a key is a large key:

  • For a STRING key, run the STRLEN command. This command returns the length (number of bytes) of a string value stored at the key.

  • For a LIST key, run the LLEN command. This command returns the number of elements in the list stored at the key.

  • For a HASH key, run the HLEN command. This command returns the number of fields in the key.

  • For a SET key, run the SCARD command. This command returns the number of members in the key.

  • For a ZSET key, run the ZCARD command. This command returns the number of members in the key.

  • For a STREAM key, run the XLEN command. This command returns the number of entries in the key.

Note

The DEBUG OBJECT and MEMORY USAGE commands consume large amounts of resources when they are run. In addition, the time complexity of these commands is O(N), which indicates that these commands may block instances. Therefore, we recommend that you do not use these commands.
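
The following Python code is a minimal sketch of how the low-risk commands listed above can be combined to check a single key. It assumes a client object that exposes execute_command(*args), as redis-py clients do. The threshold values are hypothetical and are not defined in this topic.

```python
# Maps each data type (as returned by the TYPE command) to the low-risk
# command that reports its size.
SIZE_COMMANDS = {
    "string": "STRLEN",  # bytes
    "list": "LLEN",      # elements
    "hash": "HLEN",      # fields
    "set": "SCARD",      # members
    "zset": "ZCARD",     # members
    "stream": "XLEN",    # entries
}

# Hypothetical thresholds: bytes for strings, element counts for other types.
DEFAULT_THRESHOLDS = {"string": 10 * 1024, "default": 5000}


def key_size(client, key):
    """Return (type, size) for a key, or (type, None) for an unsupported type."""
    key_type = client.execute_command("TYPE", key)
    if isinstance(key_type, bytes):
        key_type = key_type.decode()
    command = SIZE_COMMANDS.get(key_type)
    if command is None:
        return key_type, None
    return key_type, client.execute_command(command, key)


def is_large_key(key_type, size, thresholds=None):
    """Compare a size against the threshold configured for its data type."""
    if size is None:
        return False
    thresholds = DEFAULT_THRESHOLDS if thresholds is None else thresholds
    return size > thresholds.get(key_type, thresholds["default"])
```

Because these commands return lengths or element counts rather than memory usage, the result is for reference only.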

Identify hotkeys at the business layer

  • Advantages: This method can identify hotkeys in a timely and accurate manner.

  • Disadvantages: This method requires additional business code, which increases code complexity. In addition, this method may degrade performance.

With this method, you add code to the business layer that records the requests sent to instances and asynchronously analyzes the collected statistics.
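
The following Python code is a minimal sketch of such business-layer recording. The class name KeyAccessRecorder and its methods are hypothetical. The sketch assumes that the application calls record() each time it accesses a key and that a background job periodically calls drain_top() to analyze the collected counts.

```python
import threading
from collections import Counter


class KeyAccessRecorder:
    """Counts key accesses in the business layer for later analysis."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def record(self, key):
        """Call this next to every request that is sent to the instance."""
        with self._lock:
            self._counts[key] += 1

    def drain_top(self, n=10):
        """Return the n most-accessed keys and reset the counters.

        Intended to be called periodically from a background thread so that
        the analysis does not run on the request path.
        """
        with self._lock:
            counts, self._counts = self._counts, Counter()
        return counts.most_common(n)
```

Dividing the returned counts by the length of the sampling interval gives an approximate QPS for each key.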

Identify large keys in a customized manner by using the redis-rdb-tools project

  • Advantages: This method supports customized analysis without affecting online services.

  • Disadvantages: This method does not allow for rapid analysis, and it takes longer to analyze large RDB files.

redis-rdb-tools is an open-source tool written in Python. It supports custom analysis of Redis Database (RDB) files. After you download the RDB files, you can analyze the memory usage of all keys in the instance based on your business requirements.

Identify hotkeys by using the MONITOR command

  • Advantages: This method is convenient and secure.

  • Disadvantages: This method consumes CPU, memory, and network resources, provides limited precision, and does not allow for rapid analysis.

The MONITOR command can display the statistics of all requests related to an instance, including statistics about time, clients, commands, and keys.

In case of an emergency, you can run the MONITOR command and export the output to a file. After you stop the MONITOR command, analyze and classify the requests in the output to identify the hotkeys generated during the emergency period.

Note

The MONITOR command significantly degrades the performance of the instance. We recommend that you use the MONITOR command only in special cases.
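
The following Python code is a minimal sketch of how exported MONITOR output can be classified. It assumes the output format of open-source Redis, in which each line contains a timestamp, the database and client address in brackets, and the command with its arguments in double quotation marks. It also assumes that the first argument of a command is the key, which does not hold for every command.

```python
import re
from collections import Counter

# Matches one double-quoted token, allowing backslash-escaped characters.
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def count_keys(lines):
    """Count how often each key appears in exported MONITOR output.

    Assumption: the first argument after the command name is the key.
    Commands without arguments, such as PING, are skipped.
    """
    counts = Counter()
    for line in lines:
        tokens = _QUOTED.findall(line)
        if len(tokens) >= 2:
            counts[tokens[1]] += 1
    return counts
```

Calling count_keys(open_file).most_common(10) on the exported file lists the keys that were requested most often during the capture period.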

Step 2: Optimize large keys and hotkeys

Large keys

Solution | Scenario | Suggestion

Clean up expired data

A large amount of accumulated expired data, such as uncleaned incremental data in HASH keys.

Use a combination of the HSCAN and HDEL commands to clear invalid data. This prevents the instance from being blocked when a large amount of data is cleared.
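
The following Python code is a minimal sketch of the HSCAN and HDEL combination. It assumes a client whose hscan(key, cursor=..., count=...) method returns a (cursor, fields) pair and whose hdel(key, *fields) method returns the number of deleted fields, as redis-py clients do. The is_invalid callback is hypothetical and stands for your own rule that decides which fields are invalid.

```python
def purge_hash_fields(client, key, is_invalid, batch_size=100):
    """Delete invalid fields from a HASH key in small batches.

    is_invalid(field, value) is a caller-supplied rule (hypothetical).
    Each loop iteration scans one batch and deletes only the invalid fields
    in it, so no single command has to process the whole key.
    Returns the number of deleted fields.
    """
    deleted = 0
    cursor = 0
    while True:
        cursor, fields = client.hscan(key, cursor=cursor, count=batch_size)
        stale = [field for field, value in fields.items() if is_invalid(field, value)]
        if stale:
            deleted += client.hdel(key, *stale)
        if cursor == 0:  # A cursor of 0 means that the scan is complete.
            break
    return deleted
```

A smaller batch_size shortens each command at the cost of more round trips.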

Compress large keys

Data such as logs and configurations involving compressible data formats such as JSON and XML.

  • Use a compression algorithm such as GZIP or Snappy to compress data during serialization.

  • Use a binary serialization protocol, such as Protocol Buffers.

Note

Compression and decompression operations consume additional CPU resources and may affect processing performance.
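
The following Python code is a minimal sketch of compression during serialization. It uses GZIP and JSON from the Python standard library. The function names pack and unpack are hypothetical.

```python
import gzip
import json


def pack(obj):
    """Serialize an object to compact JSON and compress it with GZIP."""
    text = json.dumps(obj, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


def unpack(blob):
    """Decompress a GZIP blob and deserialize the JSON it contains."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

The application stores the bytes returned by pack as the value of a STRING key and calls unpack on the value that it reads back. The compression ratio depends on how repetitive the data is.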

Split large keys

Frequently accessed keys of data types such as HASH and ZSET, for example, leaderboards.

  • Split large keys based on business logic, such as user ID and time range.

  • Use sharding keys, such as user:1001:shard1 and user:1001:shard2.

Splitting large keys can effectively avoid data skew.
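
The following Python code is a minimal sketch of how a sharding key can be derived. It assumes that the members of a large key are distributed by a CRC32 hash of the member name. The hash function and the shard count are illustrative choices, not requirements of Tair or Redis.

```python
import zlib


def shard_key(base_key, member, shard_count):
    """Return the sharding key that stores the given member.

    The same member always maps to the same sharding key, so reads and
    writes for that member go to the same place.
    """
    index = zlib.crc32(member.encode("utf-8")) % shard_count + 1
    return f"{base_key}:shard{index}"
```

For example, with a shard count of 2, every member of user:1001 is stored in either user:1001:shard1 or user:1001:shard2. Changing the shard count later changes the mapping, so existing members must be redistributed.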

Dump large keys

STRING-type large files or BLOBs.

Dump unsuitable data to other storage solutions such as Object Storage Service (OSS) and delete the data from the instance.

  • Redis Open-Source Edition 4.0 and later: You can run the UNLINK command to safely delete large keys or super large keys. This command removes the key from the keyspace immediately and reclaims its memory asynchronously in the background, which prevents the instance from being blocked.

  • Redis Open-Source Edition versions earlier than 4.0: You can run the SCAN command to read some data and then delete the data. To prevent an instance from being blocked, we recommend that you do not delete a large number of keys at a time.

Hotkeys

Solution | Scenario | Suggestion

Replicate hotkeys for cluster instances

A hotkey is stored as a whole in a single shard. Requests cannot be distributed by migrating part of the data.

Replicate the hotkey in the data shard to generate identical keys and migrate these new keys to other data shards. For example, you can replicate a hotkey named foo in a data shard to generate three identical hotkeys named foo2, foo3, and foo4. Then, you can migrate foo2, foo3, and foo4 to other data shards to reduce the pressure on the data shard that contains foo.

Note

The disadvantage of this solution is that you need to modify the code to maintain multiple replicas, and it is difficult to ensure data consistency among multiple replicas. For example, update operations need to be synchronized to all replicas. We recommend that you use this solution as a temporary solution to alleviate urgent issues.
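
The following Python code is a minimal sketch of the code changes that this solution requires. It follows the naming in the preceding example (foo, foo2, foo3, and foo4) and assumes a client that provides get(name) and set(name, value), as redis-py clients do. The function names are hypothetical. The sketch does not migrate the replicas to other data shards and does not make the writes atomic.

```python
import random


def replica_names(key, copies):
    """Return the hotkey and its replicas, such as foo, foo2, foo3, and foo4."""
    return [key] + [f"{key}{i}" for i in range(2, copies + 1)]


def read_hot(client, key, copies, rng=random):
    """Read from a randomly chosen replica to spread read requests."""
    return client.get(rng.choice(replica_names(key, copies)))


def write_hot(client, key, value, copies):
    """Write to every replica. A failure midway leaves the replicas inconsistent."""
    for name in replica_names(key, copies):
        client.set(name, value)
```

Between the first and last write in write_hot, readers may see different values on different replicas, which illustrates the consistency issue described in the preceding note.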

Enable read/write splitting

Read operations are more frequently performed than write operations.

If the read load remains high after you enable read/write splitting, you can further alleviate the load by increasing the number of read replicas.

Note

If a large number of requests are sent to a read/write splitting instance, master-replica synchronization inevitably introduces some latency, and stale data may be read from the instance. Therefore, read/write splitting is not the optimal solution for scenarios that have high requirements for read and write capabilities and data consistency.

Enable the proxy query cache feature

The same command is repeatedly issued to query the same key.

After you enable the proxy query cache feature, Tair and Redis use algorithms to identify hotkeys. Hotkeys are keys that receive more than 5,000 queries per second (QPS). Proxy nodes cache only the request and response data of a hotkey, instead of the entire key. If a proxy node receives a duplicate request within the validity period of the cached data, the proxy node directly returns the response of the request to the client without the need to interact with backend data shards.
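
The following Python code is a minimal sketch that illustrates the caching behavior described above. It is not the implementation used by proxy nodes. The class name QueryCache is hypothetical, and the sketch caches every request instead of identifying hotkeys by QPS.

```python
import time


class QueryCache:
    """Caches the response of a request for a fixed validity period."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # request -> (expiry time, response)

    def fetch(self, request, backend):
        """Return the cached response if it is still valid.

        request must be hashable, such as ("GET", "foo").
        backend is a callable that sends the request to the data shard.
        """
        now = self._clock()
        entry = self._entries.get(request)
        if entry is not None and entry[0] > now:
            return entry[1]
        response = backend(request)
        self._entries[request] = (now + self._ttl, response)
        return response
```

Within the validity period, duplicate requests are answered from the cache and do not reach the backend, which also means that a change to the key is not visible until the cached entry expires.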

Step 3: Prevent large keys and hotkeys from affecting business

Causes of large keys and hotkeys

In Tair and Redis, keys serve as the smallest unit of data distribution. Each key is stored in a specific data shard and cannot be split. Large keys and hotkeys may occur due to a variety of reasons, such as insufficient workload planning, accumulation of invalid data, and traffic spikes.

Category | Cause

Large keys

  • Inappropriate use of Tair and Redis may result in excessively large key values. For example, if a STRING key is used to store a large binary file, the size of the key may be larger than necessary.

  • Insufficient workload planning: Before a feature is released, a failure to sufficiently plan for workloads can result in problems. For example, members may not be properly split between keys and some keys may have more members than required.

  • Accumulation of invalid data: This occurs when invalid data is not deleted on a regular basis. For example, the number of members of a HASH key constantly increases when invalid data is not cleared in a timely manner.

  • Code failures: If a consumer application that reads from a LIST key fails, members are added to the key but never removed, so the key keeps growing.

Hotkeys

  • Unexpected traffic spikes: Unexpected traffic spikes may occur for a variety of reasons, such as high product popularity, hot news, a large number of "likes" flooding in from the viewers of a livestream, or battles between multiple large teams in a game.

Impacts of large keys and hotkeys

Category | Impact

Large keys

  • It takes longer for the client to run commands.

  • When an instance reaches its memory usage limit (maxmemory), operations may be blocked, important keys may be evicted, or out-of-memory (OOM) errors may occur.

  • The memory usage of a data shard in a cluster instance far exceeds that of other data shards, which results in imbalanced memory usage across data shards in the instance.

  • When a read request is made for a large key, the response time may increase and other services may be affected. This is because the bandwidth of the instance to which the key belongs is exhausted.

  • The primary database may be blocked for an extended period of time while a large key is being deleted. This may lead to a synchronization failure or a master-replica switchover.

Hotkeys

  • Hotkeys consume large amounts of CPU resources and may also increase network bandwidth usage. This can adversely affect other requests and lead to a decrease in overall system performance.

  • Request skews may take place for cluster instances. Request skews occur when one data shard in an instance receives a large number of requests while other data shards in the instance remain idle. In this situation, the maximum number of connections to a data shard may be reached and new connections to the shard may be rejected.

  • During flash sales, overselling may occur if the key corresponding to a commodity receives more requests than can be handled by the instance.

  • A cache breakdown occurs if a hotkey receives more requests than can be handled by the instance. In this case, a large number of requests are directly sent to the backend storage, and a backend storage breakdown may occur. This affects other services.

Prevention policies

Policy | Description

Configure alert settings

Specify appropriate alert thresholds for metrics, such as CPU utilization, memory usage, and connection usage of an instance. For example, you can specify 70% as the alert threshold for the memory usage of an instance and 20% as the alert threshold for the memory usage increase of the instance over a 1-hour period. When an alert is triggered, identify and optimize the large keys and hotkeys according to the guidelines in Step 1 and Step 2 of this topic to address them before they affect business.
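
The following Python code is a minimal sketch of the two example thresholds. It assumes that the 20% increase is measured relative to the memory usage one hour earlier. This interpretation and the function name memory_alerts are assumptions. In practice, you configure alert rules in your monitoring service rather than in application code.

```python
def memory_alerts(used_now, used_hour_ago, max_memory,
                  usage_threshold=0.70, growth_threshold=0.20):
    """Return the names of the example alerts that would be triggered.

    All three memory values must use the same unit, such as bytes.
    Assumption: growth is relative to the usage one hour earlier.
    """
    alerts = []
    if used_now / max_memory >= usage_threshold:
        alerts.append("memory_usage")
    if used_hour_ago > 0:
        growth = (used_now - used_hour_ago) / used_hour_ago
        if growth >= growth_threshold:
            alerts.append("memory_growth")
    return alerts
```

For example, an instance that grows from 400 MB to 600 MB of a 1,000 MB limit within one hour stays below the usage threshold but triggers the growth alert.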

Use Tair (Enterprise Edition)

Tair (Enterprise Edition) provides the TairHash data structure for scenarios involving large keys of the HASH type. TairHash allows you to specify TTL and version numbers for fields. The appropriate use of TairHash can significantly reduce the O&M workload, simplify business code, and effectively address the issues caused by large keys and hotkeys.