ApsaraDB for Redis is a high-performance database service. This topic describes the development and O&M standards that you can follow to design a more efficient business system and better use ApsaraDB for Redis. The standards are developed by Alibaba Cloud based on years of experience and are applicable to the following scenarios: business deployment, key design, SDK usage, command usage, and O&M management.

Understand the performance limits of ApsaraDB for Redis

Figure 1. Performance limits of ApsaraDB for Redis
Resource category Description
Computing resources Wildcard characters, concurrent Lua scripts, one-to-many Pub/Sub commands, and hotkeys consume a large amount of computing resources. For cluster instances, these items can also cause skewed requests and underutilization of data shards.
Storage resources Streaming jobs and big keys consume a large amount of storage resources. For cluster instances, these items can also cause data skew and underutilization of data shards.
Network resources Database-wide scans that use the KEYS command and full reads of big keys and big values, such as running the HGETALL command on a large hash, consume a large amount of network resources and can block the processing of other requests.
Notice The high-concurrency capability of ApsaraDB for Redis does not translate into high throughput. For example, storing big values in ApsaraDB for Redis does not significantly improve access performance. Instead, it degrades the overall performance of the instance.

For cluster instances, hotkeys, big keys, and big values can also cause skewed storage or skewed requests. In a production environment, avoid reaching the performance limits of ApsaraDB for Redis. The following tables describe the business deployment, key design, SDK usage, command usage, and O&M management standards for ApsaraDB for Redis. These standards help you design a more efficient business system and better use the capabilities of ApsaraDB for Redis.

Business deployment standards

Importance Standard Description
★★★★★

Determine whether your scenario is a high-speed cache or an in-memory database.

  • High-speed cache: In cache-only scenarios, we recommend that you disable append-only file (AOF) persistence to reduce overhead. Do not depend heavily on the data in the cache, because the data may be evicted. For example, after the memory of an ApsaraDB for Redis instance is full, the eviction policy is triggered to reclaim space for new data, and latency increases with the amount of data to be written.
    Notice To use the data flashback feature, you must enable AOF persistence. For more information, see Use data flashback to restore data by point in time.
  • In-memory database: We recommend that you choose persistent memory-optimized instances of ApsaraDB for Redis Enhanced Edition (Tair), which offer command-level persistence. In addition, you can monitor memory usage by configuring alerts for the instances. For more information, see Alert settings.
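The following minimal sketch illustrates the cache-only pattern described above: a cache miss is treated as a normal event, and the value is reloaded from the source of truth. FakeCache, cache_aside_get, and load_from_db are hypothetical names; in practice the cache object would be your Redis client.

```python
from typing import Callable, Dict, Optional


class FakeCache:
    """Hypothetical stand-in for a cache client whose entries may be evicted."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def evict(self, key: str) -> None:
        # Simulates the eviction policy reclaiming memory at any time.
        self._data.pop(key, None)


def cache_aside_get(cache: FakeCache, key: str,
                    load_from_db: Callable[[str], str]) -> str:
    """Return the cached value, reloading it from the source of truth on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # the database, not the cache, owns the data
        cache.set(key, value)
    return value
```

Because the application never assumes that a key is still present, eviction only costs an extra database read instead of causing an error.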
★★★★★ Deploy your business close to ApsaraDB for Redis. For example, you can deploy your business in an Elastic Compute Service (ECS) instance that is created in the same virtual private cloud (VPC) as your ApsaraDB for Redis instances. ApsaraDB for Redis is a high-performance database service. However, if you deploy your business server far from ApsaraDB for Redis instances, and the business server and instances are connected over the Internet, the performance of ApsaraDB for Redis is reduced due to network latency.
Note For cross-region deployment, you can use the geo-replication capability of Global Distributed Cache for Redis, also known as Global Replica, to implement geo-disaster recovery or active geo-redundancy, reduce network latency, and simplify business design. For more information, see Overview.
★★★★☆ Create a separate ApsaraDB for Redis instance for each service. Do not share one instance among different services. For example, do not use the same instance for both a high-speed cache service and an in-memory database service. Otherwise, the eviction policy, slow queries, and FLUSHDB commands of one service affect the other services.
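★★★★☆ Configure an appropriate eviction policy. The eviction policy determines which keys are evicted when the instance reaches its memory limit. The default eviction policy is volatile-lru, which evicts the least recently used keys among the keys that have an expiration time set. For more information about eviction policies, see Supported parameters.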
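To illustrate how volatile-lru chooses which key to evict, the following toy model evicts the least recently used key among keys that have an expiration time set, and rejects the write when no such key exists. It is a simplified sketch under stated assumptions: it counts keys instead of bytes and uses exact LRU, whereas Redis approximates LRU by sampling keys. VolatileLRUModel is a hypothetical name.

```python
from collections import OrderedDict
from typing import Optional, Set


class VolatileLRUModel:
    """Toy model of volatile-lru that counts keys instead of bytes."""

    def __init__(self, max_keys: int) -> None:
        self.max_keys = max_keys
        self.data: "OrderedDict[str, str]" = OrderedDict()  # oldest access first
        self.volatile: Set[str] = set()  # keys that have an expiration time

    def get(self, key: str) -> Optional[str]:
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key: str, value: str, has_ttl: bool = False) -> None:
        if key not in self.data and len(self.data) >= self.max_keys:
            self._evict_one()
        self.data[key] = value
        self.data.move_to_end(key)
        if has_ttl:
            self.volatile.add(key)
        else:
            self.volatile.discard(key)

    def _evict_one(self) -> None:
        # Only keys with an expiration time are candidates under volatile-lru.
        victim = next((k for k in self.data if k in self.volatile), None)
        if victim is None:
            # With no candidates, the write is rejected with an out-of-memory error.
            raise MemoryError("OOM: no key with an expiration time to evict")
        del self.data[victim]
        self.volatile.discard(victim)
```

Keys written without an expiration time are never evicted under this policy, which is why an instance full of such keys rejects writes instead of freeing memory.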
★★★☆☆ Manage stress testing data and duration. ApsaraDB for Redis does not automatically delete stress testing data. To prevent impacts on your business, you must manage stress testing data and duration yourself.

Key design standards

Importance Standard Description
★★★★★ Keep values to an appropriate size. We recommend that each value be smaller than 10 KB. Values that are too large can cause data skew, hotkeys, and high bandwidth or CPU utilization. Keeping values small from the beginning prevents these issues.
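A client-side guard can enforce the 10 KB recommendation before a value is written. This sketch assumes JSON serialization and reads 10 KB as 10,240 bytes; encode_value and MAX_VALUE_BYTES are hypothetical names.

```python
import json

MAX_VALUE_BYTES = 10 * 1024  # assumption: "10 KB" read as 10,240 bytes


def encode_value(obj: object) -> bytes:
    """Serialize obj to compact JSON and reject values that are too large."""
    payload = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    if len(payload) >= MAX_VALUE_BYTES:
        raise ValueError(
            f"value is {len(payload)} bytes; the guideline is < {MAX_VALUE_BYTES}"
        )
    return payload
```

Checking the serialized size, rather than the in-memory object, measures what is actually sent over the network and stored.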
★★★★★ Configure appropriate key names that have appropriate length.
  • Key names:
    • Use readable strings as key names. If you want to combine a database name, table name, and field name in a key name, we recommend that you use colons (:) to separate them. For example, you can use project:user:001 as a key name.
    • Shorten key names without compromising their ability to describe your business. For example, username can be shortened to u.
    • In ApsaraDB for Redis, braces {} are recognized as hash tags. If you use cluster instances, use braces in key names correctly to avoid data skew. For more information, see keys-hash-tags.
      Note For a cluster instance, if you run a command that operates on multiple keys, such as RENAME, without using hash tags to ensure that the keys reside in the same data shard, the command fails.
  • Length: We recommend that you configure key names to be no more than 128 bytes in length. The shorter, the better.
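The following sketch builds colon-separated key names and computes the hash slot of a key by using the open-source Redis Cluster rule: CRC16 (XMODEM) of the hash tag, or of the whole key if the key has no non-empty tag, modulo 16384. It assumes that cluster instances route keys by the same rule; make_key, hash_tag, and key_slot are hypothetical helper names.

```python
def make_key(*parts: str) -> str:
    """Join readable name parts with colons, for example project:user:001."""
    return ":".join(parts)


def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def hash_tag(key: str) -> str:
    """Return the hashed part: the text inside the first {...} if non-empty,
    otherwise the whole key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    return crc16(hash_tag(key).encode("utf-8")) % 16384


src = make_key("{user:001}", "profile")
dst = make_key("{user:001}", "profile", "old")
print(key_slot(src) == key_slot(dst))  # True: both keys hash only "user:001"
```

Keys that share a hash tag always map to the same slot, which is what allows multi-key commands such as RENAME to run. The same property is the source of skew: if too many keys share one tag, their data and requests concentrate on one shard.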
★★★★★ For complex data structures that support sub-keys, avoid including excessive sub-keys in one key. We recommend that each key contain fewer than 1,000 sub-keys.
Note Common complex data structures include hashes, sets, Zsets, GEO structures, streams, and structures that are provided only by performance-enhanced instances of ApsaraDB for Redis Enhanced Edition (Tair), such as TairHash, TairBloom, and TairGIS.
The time complexity of some commands, such as HGETALL, is O(N), where N is the number of sub-keys, so the more sub-keys a key contains, the longer these commands take to run. If you frequently run commands whose time complexity is O(N) or higher on such keys, issues such as slow queries, data skew, and hotkeys occur.
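One common way to stay under 1,000 sub-keys is to split one logical hash into several smaller hashes and route each field by a hash of its name. The following is a sketch of that general technique, not an ApsaraDB for Redis feature; bucket_key and the bucket count are hypothetical and would be sized for your own data.

```python
import zlib


def bucket_key(base_key: str, field: str, buckets: int) -> str:
    """Return the name of the smaller hash that stores `field`.

    With N fields in total, each bucket holds roughly N / buckets fields, so
    choose `buckets` to keep that number well below 1,000.
    """
    if buckets < 1:
        raise ValueError("buckets must be >= 1")
    index = zlib.crc32(field.encode("utf-8")) % buckets
    return f"{base_key}:{index}"
```

With 16 buckets, a field such as u1001 is stored in one of the hashes user:profiles:0 through user:profiles:15, and both writes and reads compute the same bucket. Reading the whole logical hash then requires visiting every bucket, so this layout suits workloads that mostly access individual fields.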
★★★★☆ Serialize values into readable, language-neutral structures. The bytecode of a programming language may change when the language version changes. Therefore, if you store native objects, such as Java objects and C# objects, in ApsaraDB for Redis instances, upgrading your software stack may become difficult. We recommend that you serialize values into readable structures.
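For example, the following sketch stores a record as JSON instead of in a language-specific binary format such as Python's pickle or Java's built-in serialization. JSON is an assumed choice here; any readable, language-neutral format serves the same purpose. User is a hypothetical record type.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    """Hypothetical record type stored in the cache."""
    id: str
    name: str
    level: int


def serialize_user(user: User) -> str:
    # JSON text is readable and does not depend on any runtime's object layout.
    return json.dumps(asdict(user), sort_keys=True)


def deserialize_user(raw: str) -> User:
    return User(**json.loads(raw))
```

A service written in another language, or a later version of this one, can read the stored value without sharing the original class definition.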
★★★★☆ Understand how active expiration policies affect the memory usage of TairHash, and adjust these policies based on your scenario. The TairHash data structure available in performance-enhanced instances of ApsaraDB for Redis Enhanced Edition (Tair) supports efficient and dynamic expiration policies that free up memory. However, these policies increase the memory usage of TairHash data. For more information, see TairHash memory consumption and expiration policies.

SDK usage standards

Importance Standard Description
★★★★★ Use JedisPool or JedisCluster clients to connect to ApsaraDB for Redis instances.
Note We recommend that you use TairJedis clients to connect to performance-enhanced instances of ApsaraDB for Redis Enhanced Edition (Tair), because TairJedis clients support the encapsulation of new data structures. For more information, see TairJedis client.
If you use a single Jedis connection instead of a connection pool, the client cannot automatically reconnect to ApsaraDB for Redis instances after a connection times out. For more information about how to use JedisPool clients to connect to ApsaraDB for Redis instances, see Jedis client, JedisPool optimization, and JedisCluster.
★★★★☆ Do not use Lettuce clients. Lettuce clients do not automatically reconnect to ApsaraDB for Redis instances after multiple requests time out. If failures occur in ApsaraDB for Redis instances and cause failover on proxy servers or data shards, connection timeouts may occur and Lettuce clients cannot reconnect to ApsaraDB for Redis instances. To prevent these risks, we recommend that you use Jedis clients. For more information, see Jedis client.
★★★★☆ Design appropriate fault tolerance mechanisms for your clients. Network fluctuations and high usage of resources may cause connection timeouts or slow queries. In this situation, you must design appropriate fault tolerance mechanisms for your clients.
★★★★☆ Set sufficiently long retry intervals for your clients. If retry intervals are too short, for example, shorter than 200 milliseconds, a large number of retries may occur within a short period of time and result in a service avalanche. For more information, see Retry mechanisms for Redis clients.
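The following sketch combines a minimum retry interval of 200 milliseconds with exponential backoff and random jitter so that many clients do not retry at the same moment. call_with_retry is a hypothetical helper, and the exception types are assumptions: Redis client libraries usually raise their own timeout and connection exceptions, which you would substitute here.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(op: Callable[[], T], max_attempts: int = 3,
                    base_delay: float = 0.2, max_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, retrying transient failures with exponential backoff and jitter.

    base_delay defaults to 0.2 s so that no retry happens sooner than 200 ms.
    TimeoutError and ConnectionError are placeholders for your client's
    exception types.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        try:
            return op()
        except (TimeoutError, ConnectionError):
            attempt += 1
            if attempt >= max_attempts:
                raise  # let the caller's fallback logic take over
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying at the same instant.
            sleep(delay + random.uniform(0, delay / 2))
```

Capping the number of attempts matters as much as the interval: after the last attempt, the error reaches the caller, where the fault tolerance logic described above can degrade gracefully instead of retrying indefinitely.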

Command usage standards

Importance Standard Description
★★★★★ Avoid range queries, such as running the KEYS * command. Instead, use multiple point queries or run the SCAN command to reduce latency. Range queries may cause service interruptions, slow queries, or congestion.
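The following sketch iterates over matching keys with SCAN instead of a single KEYS call. The client interface, scan(cursor, match, count) returning a (cursor, keys) pair, is modeled on the redis-py client; scan_keys is a hypothetical helper, and FakeScanClient is a hypothetical in-memory stand-in used only for the demonstration.

```python
from typing import Iterator, List, Optional, Tuple


def scan_keys(client, pattern: str, count: int = 100) -> Iterator[str]:
    """Yield keys that match pattern, fetching a small batch per SCAN call."""
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if cursor == 0:  # cursor 0 signals that the iteration is complete
            break


class FakeScanClient:
    """Hypothetical in-memory stand-in that pages through keys like SCAN."""

    def __init__(self, keys: List[str]) -> None:
        self._keys = sorted(keys)

    def scan(self, cursor: int = 0, match: Optional[str] = None,
             count: Optional[int] = None) -> Tuple[int, List[str]]:
        count = count or 10
        page = self._keys[cursor:cursor + count]
        next_cursor = cursor + count if cursor + count < len(self._keys) else 0
        if match is not None and match.endswith("*"):
            # Only prefix patterns are supported by this fake.
            page = [k for k in page if k.startswith(match[:-1])]
        return next_cursor, page


client = FakeScanClient([f"user:{i}" for i in range(25)] + ["order:1"])
print(len(set(scan_keys(client, "user:*", count=10))))  # 25
```

Each call returns only a small batch, so no single command blocks the instance for long. SCAN may return the same key more than once, so deduplicate the results if your logic requires it.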
★★★★★ Use extended data structures to perform complex operations. For more information, see Integration with multiple Redis modules. Do not use Lua scripts. Lua scripts consume a large number of computing and memory resources and do not support multi-threading acceleration. Overly complex or improper Lua scripts may result in the exhaustion of resources.
★★★★☆ Use pipelines to reduce round-trip time. If you want to send multiple commands to a server and your client does not need the response to each command before it sends the next one, you can use a pipeline to send the commands in a single batch. Take note of the following items when you use pipelines:
  • A connection that runs a pipeline is occupied exclusively by that pipeline until the pipeline completes. We recommend that you use a dedicated connection for pipeline operations to separate them from conventional operations.
  • Each pipeline must contain a proper number of commands. We recommend that you use each pipeline to send no more than 100 commands.
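The following sketch splits a long list of commands into pipelines of at most 100 commands each. The client interface (pipeline(transaction=False), execute_command, and execute) is modeled on the redis-py client and is an assumption; run_in_pipelines is a hypothetical helper. In line with the recommendation above, pass it a client reserved for pipeline operations.

```python
from typing import Any, Iterable, List, Sequence


def run_in_pipelines(client: Any, commands: Iterable[Sequence[Any]],
                     batch_size: int = 100) -> List[Any]:
    """Send commands through pipelines of at most batch_size commands each."""
    results: List[Any] = []
    batch: List[Sequence[Any]] = []

    def flush() -> None:
        if not batch:
            return
        pipe = client.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC
        for args in batch:
            pipe.execute_command(*args)  # buffered locally, not sent yet
        results.extend(pipe.execute())  # one round trip for the whole batch
        batch.clear()

    for cmd in commands:
        batch.append(cmd)
        if len(batch) >= batch_size:
            flush()
    flush()  # send the final, possibly smaller, batch
    return results
```

For example, 250 commands are sent as three round trips of 100, 100, and 50 commands instead of 250 separate round trips, while no single pipeline holds the connection for too long.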
★★★★☆ Use transaction commands. For more information, see Transaction command group. When you use transaction commands, take note of the following limits:
  • Transactions do not support rollback. If a command fails at run time, the other commands in the transaction are still executed.
  • If you want to run transaction commands on cluster instances, use hash tags to ensure that the keys to be managed are distributed to the same hash slot. You must also prevent skewed storage that hash tags may cause.
  • Do not encapsulate transaction commands in Lua scripts, because compiling and loading such scripts consumes a large amount of computing resources.
★★★★☆ Do not use the Pub and Sub command group to distribute a large number of messages. For more information, see Pub and Sub command group. The Pub and Sub command group does not support data persistence or acknowledgment mechanisms that ensure data reliability. For example, do not use these commands to distribute a message larger than 1 KB to more than 100 subscriber clients. Otherwise, server resources may be exhausted, and subscriber clients may not receive the message.
Note ApsaraDB for Redis optimizes Pub and Sub commands to improve performance and load balancing. In cluster instances, proxy servers calculate hash values based on channel names and route the commands to the corresponding data nodes.

O&M management standards

Importance Standard Description
★★★★★ Understand the impacts of different instance management operations. Configuration changes and restarts affect the status of an ApsaraDB for Redis instance. For example, connections to the instance may be interrupted for a few seconds. Before you perform these operations, make sure that you understand their impacts. For more information, see Instance states and impacts.
★★★★★ Verify the error handling capabilities or disaster recovery logic of a client. ApsaraDB for Redis can monitor the health status of nodes. If a master node in an instance becomes unavailable, ApsaraDB for Redis automatically triggers a master-replica switchover. The roles of master and replica nodes are switched over to ensure the high availability of the instance. Before a client is officially released, we recommend that you manually trigger the master-replica switchover. This can help you verify the error handling capabilities or disaster recovery logic of the client. For more information, see Manually switch workloads from a master node to a replica node.
★★★★★ Disable time-consuming or high-risk commands. In a production environment, misuse of commands may cause problems. For example, the FLUSHALL command deletes all data, and the KEYS command may cause network congestion. To improve the stability and efficiency of services, disable these commands to minimize risks. For more information, see Disable high-risk commands.
★★★★☆ Handle pending events at the earliest opportunity. To enhance user experience and provide improved service performance and stability, Alibaba Cloud occasionally generates pending events to upgrade the hardware and software of specific servers or replace network facilities. For example, a pending event is generated when the minor version of databases needs to be updated. After you receive an event notification from Alibaba Cloud, you can check the impacts of the event and change the scheduled time of the event to meet your business requirements. For more information, see Query and manage pending events.
★★★★☆ Configure alerts for core metrics and better monitor the status of your instances. Configure alerts for core metrics such as CPU utilization, memory utilization, and bandwidth utilization to monitor the status of your instances in real time. For more information, see Alert settings.
★★★★☆ Use O&M features provided by ApsaraDB for Redis to check the status of instances or troubleshoot resource consumption exceptions on a regular basis.
  • Analyze slow query logs: Slow query logs help you locate slow queries and the IP addresses of the clients that send the query requests. Slow query logs provide a reliable basis for addressing timeouts.
  • Query monitoring data: ApsaraDB for Redis supports various performance metrics. These metrics allow you to gain insights into the status of ApsaraDB for Redis instances and troubleshoot issues at the earliest opportunity.
  • Create a diagnostic report: Diagnostic reports help you evaluate the status of ApsaraDB for Redis instances, such as performance level, skewed requests, and slow queries. Diagnostic reports also help you identify anomalies on ApsaraDB for Redis instances.
  • Use the cache analysis feature to display details about big keys: The cache analysis feature helps you identify big keys of ApsaraDB for Redis instances. You can also learn the memory usage, distribution, and expiration time of big keys.
  • Query real-time hotkeys: The real-time hotkey query feature helps you identify hotkeys of ApsaraDB for Redis instances and allows you to further optimize your databases.
★★★☆☆ Enable the audit log feature and evaluate audit logs. After you enable the audit log feature, audit information about write operations is recorded. ApsaraDB for Redis also allows you to query, export, and analyze audit logs online. These features help you monitor the security and performance of your ApsaraDB for Redis instances. For more information, see Enable the new audit log feature.
Notice After you enable the audit log feature, the performance of ApsaraDB for Redis instances may degrade by 5% to 15%. The actual performance degradation varies based on the number of write operations or audit operations. If your business expects a large number of write operations, we recommend that you enable the audit log feature only when you perform O&M operations, such as troubleshooting. This helps you prevent performance degradation.