Applications may encounter temporary faults caused by the network or the runtime environment, such as transient network jitter, temporarily unavailable services, and timeouts caused by busy services. You can configure an automatic retry mechanism to ride out these temporary failures and increase the likelihood that operations succeed.

Causes for temporary failures

Cause: Description
High availability mechanism triggered: ApsaraDB for Redis monitors the health status of nodes. If the master node of an instance fails, ApsaraDB for Redis automatically triggers a master-replica switchover, which swaps the roles of the master and replica nodes to keep the instance highly available. During the switchover, the client may encounter the following temporary failures:
  • Transient disconnections that last a few seconds.
  • A read-only state that lasts up to 30 seconds (to avoid the risks of data loss and dual writes caused by the master-replica switchover).
Note For more information, see Causes and impacts of master-replica switchovers.
Request jams caused by slow queries: Operations with a time complexity of O(n) can run as slow queries and jam the requests queued behind them. In this case, other requests initiated by the client may temporarily fail.
Complex network environments: A complex network environment between the client and the Redis server may cause problems such as occasional network jitter and data retransmission. In this case, requests initiated by the client may temporarily fail.

Recommended retry rules

Retry rule: Description
Only retry idempotent operations: A timeout may occur in any of the following phases:
  • A command is sent by the client but has not reached ApsaraDB for Redis.
  • The command reaches ApsaraDB for Redis, but the execution times out.
  • The command is executed on ApsaraDB for Redis, but a timeout event occurs when the result is returned to the client.

A retry may cause an operation to be repeated on ApsaraDB for Redis. Therefore, not all operations are suitable for a retry mechanism. We recommend that you retry only idempotent operations, such as SET commands. No matter how many times you run the SET a b command, the value of a is b if at least one execution succeeds, and is left unchanged if every execution fails. By contrast, the LPUSH mylist a command is not idempotent: if you run it multiple times, mylist may contain multiple a elements.
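The difference can be illustrated without a Redis server. The following minimal sketch models a string key and a list with plain Java collections and replays each command three times, as a retry loop might do when the replies are lost. The class and method names are hypothetical, and no Redis client is involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a Redis server; illustrative only, not a Redis client.
class IdempotencyDemo {
    final Map<String, String> strings = new HashMap<>();
    final Map<String, List<String>> lists = new HashMap<>();

    // Models SET key value: overwrites, so repeating it leaves the same state.
    void set(String key, String value) {
        strings.put(key, value);
    }

    // Models LPUSH key value: prepends, so every repeat adds another element.
    void lpush(String key, String value) {
        lists.computeIfAbsent(key, k -> new ArrayList<>()).add(0, value);
    }

    public static void main(String[] args) {
        IdempotencyDemo server = new IdempotencyDemo();
        for (int attempt = 0; attempt < 3; attempt++) {
            server.set("a", "b");
            server.lpush("mylist", "a");
        }
        System.out.println(server.strings.get("a"));           // b
        System.out.println(server.lists.get("mylist").size()); // 3
    }
}
```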

Appropriate number and interval of retries: Adjust the number and interval of retries based on business requirements and actual scenarios. Otherwise, the following issues may occur:
  • If the number of retries is too low or the interval is too long, operations may not complete in time and the application may fail.
  • If the number of retries is too high or the interval is too short, the retries consume more system resources, and the resulting request jams may cause the server to fail.

Common retry interval methods include immediate retry, fixed-time retry, exponentially increasing time retry, and random retry.
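The four interval methods can be sketched as small functions that return the wait time before a given attempt. This is a minimal illustration rather than part of any client library: the class name, method names, and the sample values (100 ms base, 1,000 ms cap) are assumptions, and the random variant is shown here as a uniform pick below the exponential delay, which is only one possible way to randomize.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: computes how long to wait before a retry attempt.
// Attempts are numbered from 1, and delays are assumed to be non-negative.
class RetryIntervals {
    // Immediate retry: no wait.
    static long immediate() {
        return 0L;
    }

    // Fixed-time retry: the same wait before every attempt.
    static long fixed(long baseMillis) {
        return baseMillis;
    }

    // Exponentially increasing retry: base * 2^(attempt - 1), capped at maxMillis.
    static long exponential(int attempt, long baseMillis, long maxMillis) {
        // Bound the shift so that typical millisecond values cannot overflow.
        int shift = Math.min(Math.max(attempt - 1, 0), 30);
        return Math.min(maxMillis, baseMillis << shift);
    }

    // Random retry: a uniform wait between 0 and the exponential delay (inclusive).
    static long randomized(int attempt, long baseMillis, long maxMillis) {
        long ceiling = exponential(attempt, baseMillis, maxMillis);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.println("attempt " + attempt + ": wait "
                    + exponential(attempt, 100, 1000) + " ms");
        }
        // Waits printed: 100, 200, 400, 800, 1000 (the last one is capped).
    }
}
```

Randomizing the wait spreads out retries from many clients, which may help avoid a burst of simultaneous requests right after a switchover completes.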

Avoid retry nesting: Nested retries may cause repeated or even unlimited retries.
Record retry exceptions and generate failure reports: We recommend that you configure the system to write retry logs at the WARN level, and only when the final retry fails.
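The last two rules can be combined in a single, non-nested retry loop. The following is a minimal sketch under stated assumptions: the SimpleRetry class and its call method are hypothetical, java.util.logging stands in for whatever logging framework the application uses (its WARNING level corresponds to WARN), and every RuntimeException is treated as retryable, whereas a real application would typically retry only timeouts and connection errors.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: one flat retry loop that logs only the final failure.
class SimpleRetry {
    private static final Logger LOG = Logger.getLogger(SimpleRetry.class.getName());

    // Runs the operation up to maxAttempts times, waiting intervalMillis between attempts.
    static <T> T call(Supplier<T> operation, int maxAttempts, long intervalMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure, but do not log intermediate attempts
                if (attempt < maxAttempts) {
                    sleep(intervalMillis);
                }
            }
        }
        // All attempts failed: log once at WARNING level, then rethrow.
        LOG.log(Level.WARNING, "operation failed after " + maxAttempts + " attempts", last);
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated timeout");
            }
            return "OK";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // OK after 3 attempts
    }
}
```

To avoid nesting, the operation passed to such a helper should not itself retry. For example, wrapping a client that already retries internally multiplies the total number of attempts.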

Jedis client

  • In JedisPool mode, Jedis does not provide a retry mechanism. We recommend that you use TairJedis, a client built on Jedis that provides a retry class for quickly implementing retry policies.
    Note If you use performance-enhanced instances of ApsaraDB for Redis Enhanced Edition (Tair), this client also allows you to use the data structures developed by Alibaba Cloud. For more information about the data structures, see Commands supported by extended data structures of ApsaraDB for Redis Enhanced Edition (Tair).
  • In JedisCluster mode, you can specify the maxAttempts parameter to define the number of retries in case of a failure. The default value is 5.

An example of retry settings on the Jedis client:

//Add a dependency.
<dependency>
  <groupId>com.aliyun.tair</groupId>
  <artifactId>alibabacloud-tairjedis-sdk</artifactId>
  <version>Enter the latest version number</version>
</dependency>

//Run the SET command with up to five automatic retries and a maximum total retry period of 10 seconds. The wait time between retries increases roughly exponentially. If the command still fails, an exception is thrown.

int maxRetries = 5; //Specify the maximum number of retries.
Duration maxTotalRetriesDuration = Duration.ofSeconds(10); //Specify the maximum retry period. Unit: seconds.
try {
    String ret = new JedisRetryCommand<String>(jedisPool, maxRetries, maxTotalRetriesDuration) {
        @Override
        public String execute(Jedis connection) {
            return connection.set("key", "value");
        }
    }.runWithRetries();
} catch (JedisException e) {
    // Thrown after maxRetries attempts have been made or the maximum total retry period (maxTotalRetriesDuration) has elapsed.
    e.printStackTrace();
}
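For JedisCluster mode, the maxAttempts parameter mentioned above is passed to the JedisCluster constructor. The following configuration fragment is a sketch that assumes a Jedis version which provides the JedisCluster(HostAndPort, int, int) constructor. The endpoint, timeout, and attempt count are placeholder values.

```java
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

class JedisClusterRetryExample {
    public static void main(String[] args) throws Exception {
        HostAndPort node = new HostAndPort("127.0.0.1", 6379); // placeholder endpoint
        int timeoutMillis = 2000; // placeholder timeout, in milliseconds
        int maxAttempts = 3;      // overrides the default value of 5

        try (JedisCluster cluster = new JedisCluster(node, timeoutMillis, maxAttempts)) {
            // SET is idempotent, so repeating it on failure is safe.
            cluster.set("key", "value");
        }
    }
}
```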

Redisson client

The Redisson client provides two parameters to control the retry logic:

  • retryAttempts: the number of retries. Default value: 3.
  • retryInterval: the retry interval. Default value: 1,500 milliseconds.

An example of retry settings on the Redisson client:

Config config = new Config();
config.useSingleServer()
    .setTimeout(1000)
    .setRetryAttempts(3)
    .setRetryInterval(1500) //ms
    .setAddress("redis://127.0.0.1:6379");
RedissonClient connect = Redisson.create(config);

StackExchange.Redis client

The StackExchange.Redis client supports only connection retries. An example of retry settings on the StackExchange.Redis client:

var conn = ConnectionMultiplexer.Connect("redis0:6380,redis1:6380,connectRetry=3");
Note For more information about API-level retry policies, see Polly.

Lettuce client

Although the Lettuce client does not provide parameters for retrying a command after it times out, it offers the following command execution reliability modes, which you can use to implement retry policies:

  • at-most-once execution: The command is executed once at most. If the client is disconnected and then reconnected, the command may be lost.
  • at-least-once execution (default): At least one successful execution of the command is ensured, which means that the client may attempt the command multiple times. If this mode is used and a master-replica switchover occurs on the ApsaraDB for Redis instance, a large number of commands that await retry may accumulate on the client. After the switchover is complete, the CPU utilization of the ApsaraDB for Redis instance may surge.
Note For more information, see Client-Options and Command execution reliability.

Lettuce derives the mode from the autoReconnect client option, as the following expression shows:

clientOptions.isAutoReconnect() ? Reliability.AT_LEAST_ONCE : Reliability.AT_MOST_ONCE;
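Because the mode follows the autoReconnect option, an application can select at-most-once execution by disabling automatic reconnection. The following configuration fragment is a sketch with a placeholder endpoint. With this setting, the connection is not re-established after a disconnection, so the application is assumed to create a new connection itself.

```java
import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;

class LettuceReliabilityExample {
    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://127.0.0.1:6379"); // placeholder endpoint

        // autoReconnect(false) selects at-most-once execution: commands are not
        // queued and replayed after the connection is lost.
        client.setOptions(ClientOptions.builder().autoReconnect(false).build());

        try (StatefulRedisConnection<String, String> connection = client.connect()) {
            connection.sync().set("key", "value");
        } finally {
            client.shutdown();
        }
    }
}
```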