Large keys and hotkeys can degrade service performance, harm user experience, and even cause system failures. This topic describes how to quickly identify and optimize large keys and hotkeys, explains their causes and impacts, and provides preventive measures to minimize their effect on your business.
Step 1: Identify large keys and hotkeys
Alibaba Cloud console tools
Tair and Redis provide the top key statistics and offline key analysis features in the console to help you quickly identify large keys and hotkeys.
Method | Limits | Description | Procedure |
Use the top key statistics feature (recommended) | This feature is available only for Redis Open-Source Edition instances that run Redis 5.0 or later and Tair (Enterprise Edition) DRAM-based and persistent memory-optimized instances. | | |
Use the offline key analysis feature | The offline key analysis feature is unavailable for ESSD/SSD-based instances. | | |
If your instance cannot use the preceding features, use the following methods.
Other methods to identify large keys and hotkeys
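One generic approach, sketched here as an assumption rather than as a method taken from this topic, is to walk the keyspace with SCAN and measure each key with MEMORY USAGE. The sketch assumes a redis-py style client that exposes `scan_iter` and `memory_usage`. The function name `find_large_keys` and the 10 MB default threshold are hypothetical. For collection types, MEMORY USAGE samples nested values by default, so the reported sizes are estimates.

```python
from typing import List, Tuple


def find_large_keys(client, threshold_bytes: int = 10 * 1024 * 1024,
                    scan_count: int = 100) -> List[Tuple[str, int]]:
    """Return (key, size_in_bytes) pairs at or above the threshold, largest first.

    `client` is assumed to behave like redis.Redis: scan_iter() yields key
    names incrementally (SCAN), and memory_usage() wraps MEMORY USAGE.
    """
    found = []
    # SCAN iterates in small steps, so it does not block the instance like KEYS.
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)
        # memory_usage() returns None if the key expired between the two calls.
        if size is not None and size >= threshold_bytes:
            found.append((key, size))
    found.sort(key=lambda item: item[1], reverse=True)
    return found
```

With a real instance, you would pass something like `redis.Redis(host=..., port=...)` as `client`. Running such a scan during off-peak hours limits the extra load that it adds.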
Step 2: Optimize large keys and hotkeys
Large keys
Solution | Scenario | Suggestion |
Clean up expired data | A large amount of accumulated expired data, such as uncleaned incremental data in HASH keys. | Use a combination of the HSCAN and HDEL commands to clear invalid data. This prevents the instance from being blocked when a large amount of data is cleared. |
Compress large keys | Data such as logs and configurations in compressible formats such as JSON and XML. | Note: Compression and decompression consume additional CPU resources and may affect processing performance. |
Split large keys | Frequently accessed data types like HASH and ZSET, such as leaderboards. | Splitting large keys can effectively avoid data skew. |
Dump large keys | STRING-type large files or BLOBs. | Dump unsuitable data to other storage solutions such as Object Storage Service (OSS) and delete the data from the instance. |
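The cleanup and split suggestions in the table above can be sketched as follows. This is a minimal illustration, assuming a redis-py style client that exposes `hscan` and `hdel`. The function names, the `is_invalid` callback, the batch size, and the bucket count of 16 are hypothetical and are not taken from this topic.

```python
import zlib
from typing import Callable


def purge_hash_fields(client, key: str,
                      is_invalid: Callable[[object, object], bool],
                      batch: int = 100) -> int:
    """Delete invalid fields from a HASH in small batches (HSCAN + HDEL).

    Each round trip touches roughly `batch` fields, so the instance is not
    blocked the way it could be by deleting a huge key in one command.
    Returns the number of fields removed.
    """
    removed = 0
    cursor = 0
    while True:
        # hscan returns the next cursor and a dict of field -> value.
        cursor, fields = client.hscan(key, cursor=cursor, count=batch)
        doomed = [f for f, v in fields.items() if is_invalid(f, v)]
        if doomed:
            removed += client.hdel(key, *doomed)
        if cursor == 0:  # a cursor of 0 means the iteration is complete
            break
    return removed


def bucket_key(base_key: str, field: str, buckets: int = 16) -> str:
    """Map one field of a logical large HASH to one of several smaller keys.

    crc32 is used because it is stable across processes, unlike hash().
    """
    index = zlib.crc32(field.encode("utf-8")) % buckets
    return f"{base_key}:{index}"
```

A caller would then write a field with something like `client.hset(bucket_key("leaderboard", member), member, score)` instead of writing to a single `leaderboard` key. On a cluster instance, the smaller keys generally hash to different slots, which is what spreads the data.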
Hotkeys
Solution | Scenario | Suggestion |
Replicate hotkeys for cluster instances | A hotkey is stored as a whole in a single shard, so its requests cannot be distributed by migrating part of the data. | Replicate the hotkey to generate identical keys, and migrate the copies to other data shards. For example, you can replicate a hotkey named foo to generate three identical keys named foo2, foo3, and foo4, and then migrate foo2, foo3, and foo4 to other data shards to reduce the pressure on the data shard that contains foo. Note: This solution requires you to modify your code to maintain multiple replicas, and data consistency among the replicas is difficult to ensure because every update must be synchronized to all replicas. We recommend that you use this solution only as a temporary measure to alleviate urgent issues. |
Use read/write splitting | Read operations are performed more frequently than write operations. | If the read load remains high after you enable read/write splitting, you can further alleviate the load by increasing the number of read replicas. Note: If a large number of requests are sent to a read/write splitting instance, master-replica synchronization inevitably introduces some latency, and stale data may be read from the instance. Therefore, read/write splitting is not the optimal solution for scenarios that have high requirements for read and write capabilities and data consistency. |
Use the proxy query cache feature | The same command is repeatedly issued to query the same key. | After you enable the proxy query cache feature, Tair and Redis use algorithms to identify hotkeys, which are keys that receive more than 5,000 queries per second (QPS). Proxy nodes cache only the request and response data of a hotkey, not the entire key. If a proxy node receives a duplicate request within the validity period of the cached data, the proxy node returns the cached response directly to the client without interacting with backend data shards. |
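The hotkey replication approach in the table above can be sketched on the client side as follows. This is an illustration under stated assumptions: the client is assumed to expose redis-py style `set` and `get`, the helper names are hypothetical, and the copies follow the foo2, foo3, foo4 naming from the example. Whether the copies end up on different data shards depends on how each key name maps to a shard, which this sketch does not control.

```python
import random
from typing import List


def replica_keys(hot_key: str, copies: int = 3) -> List[str]:
    """Return the original key followed by its copies, e.g. foo, foo2, foo3, foo4."""
    return [hot_key] + [f"{hot_key}{i}" for i in range(2, copies + 2)]


def write_all(client, hot_key: str, value, copies: int = 3) -> None:
    """Apply an update to every replica of the hotkey.

    The writes are separate commands, not one atomic operation, so readers
    may briefly see different values on different replicas.
    """
    for key in replica_keys(hot_key, copies):
        client.set(key, value)


def read_any(client, hot_key: str, copies: int = 3):
    """Read from a randomly chosen replica to spread the read load."""
    return client.get(random.choice(replica_keys(hot_key, copies)))
```

The non-atomic loop in `write_all` is exactly the consistency burden that the note in the table describes, which is why this pattern suits short-term relief rather than a permanent design.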
Step 3: Prevent large keys and hotkeys from affecting business
Causes of large keys and hotkeys
In Tair and Redis, keys serve as the smallest unit of data distribution. Each key is stored in a specific data shard and cannot be split. Large keys and hotkeys may occur due to a variety of reasons, such as insufficient workload planning, accumulation of invalid data, and traffic spikes.
Category | Cause |
Large keys |
|
Hotkeys |
|
Impacts of large keys and hotkeys
Category | Impact |
Large keys |
|
Hotkeys |
|
Prevention policies
Policy | Description |
Configure alerts | Specify appropriate alert thresholds for metrics such as the CPU utilization, memory usage, and connection usage of an instance. For example, you can specify 70% as the alert threshold for the memory usage of an instance and 20% as the alert threshold for the memory usage increase of the instance over a 1-hour period. When an alert is triggered, identify and optimize the large keys and hotkeys by following Step 1 and Step 2 of this topic before they affect your business. |
Use Tair (Enterprise Edition) | Tair (Enterprise Edition) provides the TairHash data structure for scenarios involving large keys of the HASH type. TairHash allows you to specify TTL and version numbers for fields. The appropriate use of TairHash can significantly reduce the O&M workload, simplify business code, and effectively address the issues caused by large keys and hotkeys. |
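The example thresholds in the alerting policy above (70% memory usage and a 20% increase over one hour) can be expressed as a small check. This is a sketch only: the function name and return format are hypothetical, and it assumes that the 20% increase is measured as a share of the instance's maximum memory, which this topic does not specify. In practice, such rules are typically configured in a monitoring service rather than in application code.

```python
from typing import List


def memory_alerts(used_now: int, used_hour_ago: int, max_memory: int,
                  usage_threshold: float = 0.70,
                  growth_threshold: float = 0.20) -> List[str]:
    """Return the names of the memory alert rules that are currently triggered.

    All sizes are in bytes. Growth is computed relative to max_memory
    (an assumption; a relative-to-previous-usage definition is also possible).
    """
    alerts = []
    usage = used_now / max_memory
    growth = (used_now - used_hour_ago) / max_memory
    if usage >= usage_threshold:
        alerts.append("memory_usage")
    if growth >= growth_threshold:
        alerts.append("memory_growth_1h")
    return alerts
```

For example, an instance that grew from 50% to 75% of its maximum memory within an hour would trigger both rules, which is the signal to start the identification steps in Step 1.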