Retry Mechanism in Java Fault-tolerant Programming

By Shuyang

Fault-tolerant programming is a programming concept aimed at ensuring the reliability and stability of applications. It incorporates the following measures:

Exception handling: Capture and handle exceptions to prevent application crashes.
Error handling: Handle errors by checking error codes and taking appropriate actions, such as retrying or rolling back.
Retry mechanism: When an error occurs, attempt to re-execute the code block until it succeeds or reaches the maximum number of attempts.
Backup mechanism: Switch to a backup system when the primary system fails to maintain normal application operation.
Log records: Record errors and exceptions for troubleshooting purposes.

Fault-tolerant programming is an essential programming concept that improves application reliability, stability, and code robustness.

1. The Importance of Retry

In the development of business technology, it is crucial to design a system architecture that is reusable, scalable, and orchestratable. This directly determines the efficiency of iterating business requirements. At the same time, business and technical support personnel should also adopt a pessimistic perspective. In a distributed environment, occasional fluctuations in HSF services caused by single-point issues are not uncommon. Common hardware and software problems include system fluctuations, single points of failure, service timeouts, service exceptions, middleware fluctuations, network timeouts, and configuration errors. Ignoring these exceptions directly weakens the service's robustness and can potentially impact user experience, lead to user complaints, and even result in system failures. Therefore, when designing solutions and implementing technologies, it is essential to fully consider various failure scenarios and employ defensive programming accordingly.

Calls to third-party interfaces fail often. The usual responses are to retry the failed call or to store the failure for later processing. However, retrying does not suit every scenario. A call that fails parameter validation will fail again no matter how many times it is repeated, and before retrying a read or write operation you need to consider whether the operation is idempotent. Retrying is appropriate when a remote call times out or the network is briefly interrupted, and allowing several retries increases the likelihood that the call eventually succeeds. To simplify troubleshooting and failure-rate calculation, record the number of failures and whether each failure was stored successfully. These records also make it easier to count and schedule retry tasks.
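The distinction between failures that are worth retrying and failures that are not can be sketched as a small classifier. This is only an illustration: the class name RetryDecision and the choice of exception types are assumptions made for this example, not part of any library discussed in this article.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.TimeoutException;

final class RetryDecision {

    private RetryDecision() {
    }

    /**
     * Returns true only for failures that are likely to be transient.
     * The exception types listed here are an illustrative assumption.
     */
    static boolean isRetryable(Throwable error, boolean idempotent) {
        // Never retry a non-idempotent operation: a repeated write could be applied twice.
        if (!idempotent) {
            return false;
        }
        // Invalid input fails the same way every time, so retrying is pointless.
        if (error instanceof IllegalArgumentException) {
            return false;
        }
        // Timeouts and connection problems are typical transient failures.
        return error instanceof SocketTimeoutException
                || error instanceof ConnectException
                || error instanceof TimeoutException;
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(new SocketTimeoutException("read timed out"), true));  // true
        System.out.println(isRetryable(new IllegalArgumentException("bad id"), true));        // false
        System.out.println(isRetryable(new SocketTimeoutException("read timed out"), false)); // false
    }
}
```

A real system would probably derive the idempotent flag from the operation being called rather than pass it in by hand.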

This article summarizes elegant retry techniques in the face of service failures, such as AOP and CGLIB. It also provides an analysis of the source code of retry tools and components, along with some important notes.

2. How to Retry

2.1 Simple Method

Test demo:

// MAX_TIMES and postCommentsService are fields of the enclosing class.
public Integer sampleRetry(int code) {
    System.out.println("sampleRetry, time: " + LocalTime.now());
    int times = 0;
    while (times < MAX_TIMES) {
        try {
            // Return as soon as the call succeeds; otherwise the loop would never exit.
            return postCommentsService.retryableTest(code);
        } catch (Exception e) {
            times++;
            System.out.println("Number of retries: " + times);
            if (times >= MAX_TIMES) {
                // Store the failure record so that a scheduled task can retry it later.
                // do something record...
                throw new RuntimeException(e);
            }
        }
    }
    return null;
}
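The same loop can be pulled out into a reusable helper so that the business call and the retry logic are no longer mixed. The following is a minimal sketch under that assumption; SimpleRetry, callWithRetry, and the flaky task are hypothetical names introduced for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

final class SimpleRetry {

    private SimpleRetry() {
    }

    /** Runs the task until it succeeds or maxAttempts is reached, then rethrows the last failure. */
    static <T> T callWithRetry(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call(); // success ends the loop immediately
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical flaky task: fails twice, then succeeds.
        String result = callWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls.get() + " calls"); // ok after 3 calls
    }
}
```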

2.2 Dynamic Proxy Mode

In certain scenarios, it may not be appropriate or possible for one object to reference another directly. In these cases, a proxy object can act as an intermediary between the client and the target object. The benefit of a proxy is that the retry logic lives in one place and applies to every method called through the proxy, so the target class does not need to change. Note that a JDK dynamic proxy can only proxy the interfaces that the target class implements.

Usage

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class DynamicProxyTest implements InvocationHandler {
    // Maximum number of attempts; the value is chosen for illustration.
    private static final int MAX_TIMES = 3;
    private final Object subject;

    public DynamicProxyTest(Object subject) {
        this.subject = subject;
    }

    /**
     * Obtain a dynamic proxy.
     *
     * @param realSubject the real object to be proxied
     */
    public static Object getProxy(Object realSubject) {
        // Pass in the real object to be proxied. Methods are ultimately invoked on this object.
        InvocationHandler handler = new DynamicProxyTest(realSubject);
        return Proxy.newProxyInstance(handler.getClass().getClassLoader(),
                realSubject.getClass().getInterfaces(), handler);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        int times = 0;
        while (times < MAX_TIMES) {
            try {
                // Every call made through the proxy is routed to this invoke method,
                // which forwards it to the real object.
                return method.invoke(subject, args);
            } catch (Exception e) {
                times++;
                System.out.println("Number of retries: " + times);
                if (times >= MAX_TIMES) {
                    // Store the failure record so that a scheduled task can retry it later.
                    // do something record...
                    throw new RuntimeException(e);
                }
            }
        }

        return null;
    }
}

Test Demo

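public Integer V2Retry(int code) {
    RetryableTestServiceImpl realService = new RetryableTestServiceImpl();
    // A JDK proxy implements the target's interfaces, so cast to the interface, not to the
    // implementation class. RetryableTestService is an assumed name for that interface.
    RetryableTestService proxyService = (RetryableTestService) DynamicProxyTest.getProxy(realService);
    return proxyService.retryableTest(code);
}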
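For readers who want to run the idea without the surrounding project, here is a self-contained sketch of a retrying JDK dynamic proxy. GreetingService, FlakyGreetingService, and RetryProxyDemo are hypothetical names invented for this example; only java.lang.reflect.Proxy and InvocationHandler come from the JDK.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

/** Hypothetical service interface; JDK proxies can only implement interfaces. */
interface GreetingService {
    String greet(String name);
}

/** Hypothetical implementation that fails on its first two calls. */
final class FlakyGreetingService implements GreetingService {
    private int calls = 0;

    @Override
    public String greet(String name) {
        calls++;
        if (calls < 3) {
            throw new IllegalStateException("transient failure #" + calls);
        }
        return "hello " + name;
    }

    int calls() {
        return calls;
    }
}

final class RetryProxyDemo {
    private static final int MAX_TIMES = 3; // illustrative value

    private RetryProxyDemo() {
    }

    /** Wraps target in a proxy that retries every interface method up to MAX_TIMES times. */
    @SuppressWarnings("unchecked")
    static <T> T withRetry(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            Throwable last = null;
            for (int attempt = 1; attempt <= MAX_TIMES; attempt++) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    // Reflection wraps the target's exception; unwrap it before retrying.
                    last = e.getCause();
                }
            }
            throw last;
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        GreetingService service = withRetry(new FlakyGreetingService(), GreetingService.class);
        System.out.println(service.greet("retry")); // hello retry
    }
}
```

The caller only sees the GreetingService interface, which is why the proxy must be cast to an interface type rather than to the implementation class.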

2.3 Generating Retry Proxy Using Bytecode Technology

CGLIB (Code Generation Library) is a bytecode generation library that can extend Java classes and implement interfaces at runtime. It generates a subclass of the target class to act as its proxy, so behavior can be added without modifying the original class and without requiring the class to implement an interface. The technique is widely used in AOP, ORM, and caching frameworks. Because the proxy is a generated subclass, final classes and final methods cannot be proxied this way.

Usage

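import java.lang.reflect.Method;
import net.sf.cglib.proxy.Enhancer;
import net.sf.cglib.proxy.MethodInterceptor;
import net.sf.cglib.proxy.MethodProxy;

public class CglibProxyTest implements MethodInterceptor {
    // Maximum number of attempts; the value is chosen for illustration.
    private static final int MAX_TIMES = 3;
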
    @Override
    public Object intercept(Object o, Method method, Object[] objects, MethodProxy methodProxy) throws Throwable {
        int times = 0;
        while (times < MAX_TIMES) {
            try {
                //Call the parent class method through the proxy subclass.
                return methodProxy.invokeSuper(o, objects);
            } catch (Exception e) {
                times++;

                if (times >= MAX_TIMES) {
                    throw new RuntimeException(e);
                }
            }
        }
        return null;
    }

    /**
     * Obtain the proxy class.
     * @param clazz class information
     * @return result of the proxy class
     */
    public Object getProxy(Class clazz){
        Enhancer enhancer = new Enhancer();
        //The class of the target object.
        enhancer.setSuperclass(clazz);
        enhancer.setCallback(this);
        //Create a subclass instance of the target object class as a proxy through bytecode.
        return enhancer.create();
    }

}

Test Demo

public Integer CglibRetry(int code) {
    // The CGLIB proxy is a subclass of the target class, so casting to the class itself works.
    RetryableTestServiceImpl proxyService = (RetryableTestServiceImpl) new CglibProxyTest().getProxy(RetryableTestServiceImpl.class);
    return proxyService.retryableTest(code);
}

2.4 Retrying on HSF Call Timeout

In daily development, momentary jitter when calling a third-party HSF service is common. To reduce the impact of call timeouts on your business, you can enable HSF synchronous retry where the characteristics of the business and the downstream service allow it. By default, an HSF interface does not retry when a call times out. Within the @HSFConsumer annotation, a retries parameter sets the number of retries on failure. The default value of this parameter is 0.

@HSFConsumer(serviceVersion = "1.0.0", serviceGroup = "hsf", clientTimeout = 2000, methodSpecials = {
        @ConsumerMethodSpecial(methodName = "methodA", clientTimeout = "100", retries = "2"),
        @ConsumerMethodSpecial(methodName = "methodB", clientTimeout = "200", retries = "1")})
private XxxHSFService xxxHSFServiceConsumer;

Principle of Retry on HSFConsumer Timeout

The following figure shows the process of calling an HSF service:

Retries on an HSF timeout happen in AsyncToSyncInvocationHandler#invokeType(..). If the retries parameter is greater than 0, the retry() method is used, and a retry is triggered only by a TimeoutException.

Source Code Analysis

private RPCResult invokeType(Invocation invocation, InvocationHandler invocationHandler) throws Throwable {
        final ConsumerMethodModel consumerMethodModel = invocation.getClientInvocationContext().getMethodModel();
        String methodName = consumerMethodModel.getMethodName(invocation.getHsfRequest());

        final InvokeMode invokeType = getInvokeType(consumerMethodModel.getMetadata(), methodName);
        invocation.setInvokeType(invokeType);

        ListenableFuture<RPCResult> future = invocationHandler.invoke(invocation);

        if (InvokeMode.SYNC == invokeType) {
            if (invocation.getBroadcastFutures() != null && invocation.getBroadcastFutures().size() > 1) {
                //broadcast
                return broadcast(invocation, future);
            } else if (consumerMethodModel.getExecuteTimes() > 1) {
                //retry
                return retry(invocation, invocationHandler, future, consumerMethodModel.getExecuteTimes());
            } else {
                //normal
                return getRPCResult(invocation, future);
            }
        } else {
            // pseudo response, should be ignored
            HSFRequest request = invocation.getHsfRequest();
            Object appResponse = null;
            if (request.getReturnClass() != null) {
                appResponse = ReflectUtils.defaultReturn(request.getReturnClass());
            }
            HSFResponse hsfResponse = new HSFResponse();
            hsfResponse.setAppResponse(appResponse);

            RPCResult rpcResult = new RPCResult();
            rpcResult.setHsfResponse(hsfResponse);
            return rpcResult;
        }
    }

As the code above shows, only synchronous calls are retried. If the configured number of executions of the consumer method is greater than 1 (consumerMethodModel.getExecuteTimes() > 1), the call is handed to the retry method:

private RPCResult retry(Invocation invocation, InvocationHandler invocationHandler,
                            ListenableFuture<RPCResult> future, int executeTimes) throws Throwable {

        int retryTime = 0;

        while (true) {
            retryTime++;
            if (retryTime > 1) {
                future = invocationHandler.invoke(invocation);
            }

            int timeout = -1;
            try {
                timeout = (int) invocation.getInvokerContext().getTimeout();
                RPCResult rpcResult = future.get(timeout, TimeUnit.MILLISECONDS);

                return rpcResult;
            } catch (ExecutionException e) {
                throw new HSFTimeOutException(getErrorLog(e.getMessage()), e);
            } catch (TimeoutException e) {
                //retry only when timeout
                if (retryTime < executeTimes) {
                    continue;
                } else {
                    throw new HSFTimeOutException(getErrorLog(e.getMessage()), timeout + "", e);
                }
            } catch (Throwable e) {
                throw new HSFException("", e);
            }
        }
    }

The HSF consumer timeout retry principle is based on a simple while loop with try-catch.
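The same pattern can be reproduced with plain java.util.concurrent classes. The sketch below is not HSF code: TimeoutRetryDemo and callWithTimeoutRetry are hypothetical names, and cancelling the timed-out attempt is an addition of this example. It only illustrates a while loop that resubmits the call when waiting for the result times out and gives up on any other failure.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

final class TimeoutRetryDemo {

    private TimeoutRetryDemo() {
    }

    /** Submits the task up to executeTimes times, resubmitting only when the wait times out. */
    static <T> T callWithTimeoutRetry(ExecutorService pool, Callable<T> task,
                                      long timeoutMillis, int executeTimes) throws Exception {
        int attempt = 0;
        while (true) {
            attempt++;
            Future<T> future = pool.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // stop the slow attempt (an addition of this sketch)
                if (attempt >= executeTimes) {
                    throw e; // attempts exhausted
                }
                // otherwise loop and submit the task again
            } catch (ExecutionException e) {
                // The task itself failed. This is not a timeout, so there is no retry.
                throw new IllegalStateException("call failed", e.getCause());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        AtomicInteger calls = new AtomicInteger();
        try {
            // Hypothetical remote call: the first attempt is slow, later attempts are fast.
            String result = callWithTimeoutRetry(pool, () -> {
                if (calls.incrementAndGet() == 1) {
                    Thread.sleep(2000);
                }
                return "ok";
            }, 200, 3);
            System.out.println(result + " after " + calls.get() + " calls");
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Because the caller blocks for up to the full timeout on every attempt, the worst-case latency grows with the number of attempts.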

Defects

Retries only occur when the method is called synchronously.
The retry method is called only when a TimeoutException occurs in the HSF interface.
If the retries parameter is set for a method of an HSF consumer and the call times out, the HSF SDK retries automatically. Because the retry is a while loop with try-catch that blocks the calling thread, a slow downstream interface combined with a large retries value lengthens the response time: in the worst case the caller waits for the full timeout on every attempt. In extreme cases, the HSF thread pool may be fully occupied. The automatic retry feature of HSF is therefore a basic capability that is not recommended for large-scale use.

2.5 Spring Retry

Spring Retry, a project in the Spring ecosystem, provides declarative retry support that standardizes how specific operations are retried. It is well suited to business scenarios that need retries, such as network requests and database access. With Spring Retry, you can set up retry policies with annotations instead of writing lengthy retry code, which makes it easy to use and understand.

POM dependencies

<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-aspects</artifactId>
</dependency>

Enable @Retryable

After adding the spring-retry dependency, add the @EnableRetry annotation to the Spring Boot startup class.

@EnableRetry
@SpringBootApplication(scanBasePackages = {"me.ele.camp"}, excludeName = {"me.ele.oc.orm.OcOrmAutoConfiguraion"})
@ImportResource({"classpath*:sentinel-tracer.xml"})
public class Application {

    public static void main(String[] args) {
        System.setProperty("APPID", "alsc-info-local-camp");
        System.setProperty("project.name", "alsc-info-local-camp");
        // Start the Spring Boot application.
        SpringApplication.run(Application.class, args);
    }
}

Add the @Retryable annotation to the service implementation class

    @Override
    @Retryable(value = BizException.class, maxAttempts = 6)
    public Integer retryableTest(Integer code) {
        System.out.println("retryableTest, time: " + LocalTime.now());
        if (code == 0) {
            // Throwing BizException is what triggers the retry.
            throw new BizException("Exception", "Exception");
        }
        BaseResponse<Object> objectBaseResponse = ResponseHandler.serviceFailure(ResponseErrorEnum.UPDATE_COMMENT_FAILURE);

        System.out.println("retryableTest, correct!");
        return 200;
    }

    // Called after all attempts have failed with a BizException.
    @Recover
    public Integer recover(BizException e) {
        System.out.println("Callback method executed!");
        // Add logs to the database or call the remaining methods.
        return 404;
    }

The code shows that the @Retryable annotation is added to the implementation method. @Retryable has the following parameters that can be configured:

value	The exception types that trigger a retry. A retry is triggered only when one of the specified exceptions is thrown.
include	Same as value. This parameter is empty by default. If the exclude parameter is also empty, all exceptions trigger a retry.
exclude	The exception types that do not trigger a retry.
maxAttempts	The maximum number of attempts, including the first call. The default value is 3.
backoff	The retry delay policy, configured with the @Backoff annotation. The default delay is 1000 (unit: milliseconds).
multiplier	An attribute of @Backoff that specifies the delay multiplier. The default value is 0, which means a fixed delay between attempts (one second with the default delay). For example, with delay = 2000 and multiplier = 1.5, the first retry happens after 2 seconds, the second after 3 seconds, and the third after 4.5 seconds.
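The delay arithmetic behind a multiplier can be checked with a few lines of plain Java. This sketch only mirrors the calculation (each wait is the previous wait times the multiplier); it is not Spring Retry's implementation, it ignores options such as a maximum delay, and BackoffSchedule is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class BackoffSchedule {

    private BackoffSchedule() {
    }

    /**
     * Computes the wait before each retry: the first wait is delayMillis and every
     * later wait is the previous one times multiplier. A multiplier that is not
     * greater than 1 is treated here as a fixed delay.
     */
    static List<Long> delays(long delayMillis, double multiplier, int retries) {
        List<Long> result = new ArrayList<>();
        double current = delayMillis;
        for (int i = 0; i < retries; i++) {
            result.add(Math.round(current));
            if (multiplier > 1.0) {
                current *= multiplier;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(delays(2000, 1.5, 3)); // [2000, 3000, 4500]
        System.out.println(delays(1000, 0, 3));   // [1000, 1000, 1000]
    }
}
```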

Spring Retry also offers the @Recover annotation, which marks the method that handles the failure once the @Retryable attempts are exhausted. If you do not need such a callback, simply omit it. In that case, the exception is thrown to the caller when the attempts are exhausted and the call has still not succeeded. The parameter BizException e tells Spring Retry which failure the callback handles: when all attempts have failed with a BizException, the callback method is invoked with that exception.

Notes:

• The first parameter of the method annotated with @Recover must be of the exception type thrown by the @Retryable method. Otherwise, the callback is not recognized.

• The return type of the method annotated with @Recover must be the same as that of the method annotated with @Retryable.

• The callback method and the retry method must be written in the same implementation class.

• Because it is based on AOP, it does not support self-calls within the class.

• Do not catch the exception with try-catch inside the @Retryable method. The exception must propagate out of the method, because a retry is triggered only by a Throwable that leaves the method.

Principle

Sequence diagram of a Spring Retry call:

The basic principle of Spring Retry is to introduce AOP capability through the @EnableRetry annotation. When the Spring container starts, it scans all methods with @Retryable and @CircuitBreaker annotations and generates a PointCut and Advice for them. When a method call occurs, Spring delegates the call to the interceptor RetryOperationsInterceptor, which implements the backoff retry on failure and the degradation recovery method. This design pattern makes the implementation of the retry logic simple and makes full use of the AOP capabilities provided by the Spring framework, thus achieving an efficient and elegant retry mechanism.

Defects

While Spring Retry implements retries elegantly, it still has two unfriendly design choices:

Firstly, a retry can be triggered only by a Throwable subclass, which means that retries are designed around catching exceptions. If you want to retry based on a returned data object instead, you have to convert that condition into a Throwable subclass.

Secondly, the decision to retry is based on the Exception instance thrown from doWithRetry rather than on a returned value, which does not match the usual design in which an assertion inspects a result.

Spring Retry advocates annotated retries of methods. The retry logic is executed synchronously, and the failure of retries refers to a Throwable exception. If you are trying to determine whether a retry is needed based on the state of the returned value, you may have to judge the returned value by yourself and then explicitly throw an exception.
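Converting a return value into an exception might look like the following sketch. It uses a tiny hand-written loop as a stand-in for an exception-driven retry framework, so it runs without Spring. ResultNotReadyException, requireSuccess, and retryOnException are hypothetical names introduced for illustration.

```java
import java.util.function.Supplier;

/** Hypothetical exception that marks a result as not acceptable yet. */
final class ResultNotReadyException extends RuntimeException {
    ResultNotReadyException(String message) {
        super(message);
    }
}

final class ResultBasedRetry {

    private ResultBasedRetry() {
    }

    /** Turns an unacceptable return value into an exception so that exception-driven retry can see it. */
    static int requireSuccess(int statusCode) {
        if (statusCode != 200) {
            throw new ResultNotReadyException("unexpected status " + statusCode);
        }
        return statusCode;
    }

    /** Minimal stand-in for an exception-driven retry framework. */
    static int retryOnException(Supplier<Integer> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ResultNotReadyException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return requireSuccess(call.get());
            } catch (ResultNotReadyException e) {
                last = e; // the bad result is now visible as an exception
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] responses = {503, 503, 200}; // hypothetical sequence of status codes
        int[] index = {0};
        int status = retryOnException(() -> responses[index[0]++], 5);
        System.out.println(status + " after " + index[0] + " calls"); // 200 after 3 calls
    }
}
```

With Spring Retry, the equivalent of requireSuccess would sit inside the @Retryable method, and the thrown exception type would be listed in the annotation.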

2.6 Guava Retrying

Guava Retrying is a third-party library built as an extension to Guava, the core Java library developed by Google. It provides a general-purpose way to retry arbitrary Java code, with configurable stop, retry, and exception-handling behavior that is enhanced by Guava's predicate matching. The library supports a variety of retry policies, such as specifying the number of attempts and the wait interval between them, and it uses predicates to decide whether a retry should be performed. Because its conditions are expressed as Guava predicates, it fits naturally into code that already uses Guava.

POM Dependencies

<dependency>
    <groupId>com.github.rholder</groupId>
    <artifactId>guava-retrying</artifactId>
    <version>2.0.0</version>
</dependency>

Test Demo

public static void main(String[] args) {
    Callable<Boolean> callable = new Callable<Boolean>() {
        @Override
        public Boolean call() throws Exception {
            // do something useful here
            System.out.println("call...");
            throw new RuntimeException();
        }
    };

    Retryer<Boolean> retryer = RetryerBuilder.<Boolean>newBuilder()
            // Retry conditions. Several are listed to show the available options;
            // a retry is triggered when any of them matches.
            .retryIfException()
            .retryIfRuntimeException()
            .retryIfExceptionOfType(Exception.class)
            .retryIfException(Predicates.equalTo(new Exception()))
            .retryIfResult(Predicates.equalTo(false))
            // Wait policy: wait 1s between attempts.
            .withWaitStrategy(WaitStrategies.fixedWait(1, TimeUnit.SECONDS))
            // Stop policy: stop after 6 attempts.
            .withStopStrategy(StopStrategies.stopAfterAttempt(6))
            // Time limit: a single attempt cannot exceed 2s.
            .withAttemptTimeLimiter(AttemptTimeLimiters.fixedTimeLimit(2, TimeUnit.SECONDS))
            // Register a custom listener, which is called back after each attempt.
            .withRetryListener(new MyRetryListener())
            .build();
    try {
        retryer.call(callable);
    } catch (Exception ee) {
        ee.printStackTrace();
    }
}

If you need additional processing when an attempt is made, such as sending an alert email, you can use RetryListener. After each attempt, Guava Retrying calls back the registered listeners. You can register multiple RetryListeners, and they are called in the order of registration.

public class MyRetryListener implements RetryListener {
    @Override
    public <V> void onRetry(Attempt<V> attempt) {
        // The attempt number, starting from 1.
        System.out.print("[retry]time=" + attempt.getAttemptNumber());
        // The delay since the first attempt.
        System.out.print(",delay=" + attempt.getDelaySinceFirstAttempt());

        // Retry result: terminated with exceptions or returned normally.
        System.out.print(",hasException=" + attempt.hasException());
        System.out.print(",hasResult=" + attempt.hasResult());

        // The cause of the exception.
        if (attempt.hasException()) {
            System.out.print(",causeBy=" + attempt.getExceptionCause().toString());
            // do something useful here
        } else {
            // The normally returned result.
            System.out.print(",result=" + attempt.getResult());
        }
        System.out.println();
    }
}

RetryerBuilder is a builder that creates Retryer instances. It lets you customize the retry conditions (the retry sources) and configure the number of attempts, the time limit of each attempt, and the wait interval.

A retry condition can be based on a thrown exception or on a custom predicate over the result. Multiple conditions can be registered at the same time, and a retry is triggered when any of them matches.

• retryIfException: A retry is triggered when a runtime exception or a checked exception is thrown. It is not triggered when an error is thrown.

• retryIfRuntimeException: A retry is triggered only when a runtime exception is thrown. It is not triggered when a checked exception or an error is thrown.

• retryIfExceptionOfType: A retry is triggered only when an exception of the specified type is thrown, for example a runtime exception such as NullPointerException or IllegalStateException, or a custom exception.

• retryIfResult: A retry is triggered when the value returned by the Callable matches the specified predicate.

StopStrategy: The stop policy, which decides when to stop retrying. The following strategies are provided:

StopAfterDelayStrategy	Sets a maximum allowed execution time, for example 10s. Regardless of how many attempts have been made, once the total retry time exceeds the maximum, the task is terminated and a RetryException is thrown.
NeverStopStrategy	Never stops retrying. It is used when you need to poll until the expected result is returned.
StopAfterAttemptStrategy	Sets the maximum number of attempts. Once that number is reached, retrying stops and a RetryException is thrown.

WaitStrategy: The wait policy, which controls the interval between attempts. The following strategies are provided:

FixedWaitStrategy	The fixed wait interval strategy.
RandomWaitStrategy	The random wait interval strategy. You can provide a minimum and maximum interval, and the wait interval is a random value within this range.
IncrementingWaitStrategy	The incremental wait interval strategy. You can provide an initial value and step size, and the wait interval increases as the number of attempts increases.
ExponentialWaitStrategy	The exponential wait interval strategy. The wait interval grows exponentially with the number of attempts.
FibonacciWaitStrategy	The Fibonacci wait interval strategy. The wait interval grows according to the Fibonacci sequence.
ExceptionWaitStrategy	The exception wait interval strategy. The wait interval depends on the exception that was thrown.
CompositeWaitStrategy	The composite wait interval strategy. It combines several wait strategies, and the wait interval is the sum of their intervals.
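To make the difference between the interval strategies concrete, the sketch below computes the wait before each retry for four of them. It is plain Java written for this article, not the library's implementation, and the exact formulas (for example where the exponential sequence starts) are assumptions. WaitIntervals and its methods are hypothetical names.

```java
final class WaitIntervals {

    private WaitIntervals() {
    }

    /** Fixed wait: the same interval before every retry. */
    static long fixed(long millis) {
        return millis;
    }

    /** Incrementing wait: initial, initial + step, initial + 2 * step, ... */
    static long incrementing(long initialMillis, long stepMillis, int attempt) {
        return initialMillis + stepMillis * (attempt - 1);
    }

    /** Exponential wait: base, 2 * base, 4 * base, ... capped at maxMillis. */
    static long exponential(long baseMillis, long maxMillis, int attempt) {
        int shift = Math.min(attempt - 1, 30); // bound the shift so the factor stays small
        return Math.min(maxMillis, baseMillis * (1L << shift));
    }

    /** Fibonacci wait: base times 1, 1, 2, 3, 5, ... capped at maxMillis. */
    static long fibonacci(long baseMillis, long maxMillis, int attempt) {
        long previous = 0;
        long current = 1;
        for (int i = 1; i < attempt; i++) {
            long next = previous + current;
            previous = current;
            current = next;
        }
        return Math.min(maxMillis, baseMillis * current);
    }

    public static void main(String[] args) {
        // Attempts are numbered from 1, as in the listener example.
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.println("attempt " + attempt
                    + " fixed=" + fixed(1000)
                    + " incrementing=" + incrementing(1000, 500, attempt)
                    + " exponential=" + exponential(100, 10_000, attempt)
                    + " fibonacci=" + fibonacci(100, 10_000, attempt));
        }
    }
}
```

Exponential and Fibonacci waits both back off quickly, which is why a cap on the maximum interval is usually worth configuring.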

Advantage

The Guava Retryer tool is similar to Spring Retry in that it wraps normal retry logic by defining the role of the retryer. However, Guava Retryer has a more advanced strategy definition. It not only supports setting the number of retries and retry frequency control, but also allows the definition of multiple exceptions or custom objects as retry sources, providing more flexibility. This makes Guava Retryer suitable for a wider range of business scenarios, such as network requests and database access. Additionally, Guava Retryer is highly extensible and can easily be integrated with other Guava libraries.

3. Commonalities and Principles of Elegant Retries

Both Spring Retry and Guava Retrying are thread-safe retry tools that support retry logic in concurrent business scenarios and ensure the correctness of retries. Both support a retry wait interval, differentiated retry strategies, and a retry timeout, which further improve the effectiveness of retries and the stability of the process.

Furthermore, both Spring Retry and Guava Retryer utilize the command design pattern to delegate the retry object and complete the corresponding logical operation. They both internally encapsulate the retry logic. This design pattern makes it easy to extend and modify the retry logic, while also improving code reusability.

4. Summary

Some business logic depends on components that are unstable. In such cases, a retry mechanism is needed to obtain the desired result by re-executing the logic instead of terminating immediately. Typical examples are remote interface access, data loading, and data upload verification.

Different exception scenarios require different retry methods, and it is important to decouple the normal logic from the retry logic. When setting up a retry strategy, consider the following questions. When is the appropriate time to retry? Should the retry be synchronous and blocking, or asynchronous with a delay? Can retries be switched off quickly so that calls fail fast? What is the impact on user experience if a failure is not retried? Take these factors into account when setting the timeout, the retry strategy, the retry scenarios, and the number of retries.
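As an illustration of the asynchronous option, the following sketch retries a task on a scheduler after a delay, so the calling thread is not blocked while waiting. It is a minimal example under stated assumptions: AsyncRetry and retryAsync are hypothetical names, and only classes from java.util.concurrent are used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AsyncRetry {

    private AsyncRetry() {
    }

    /** Runs the task on the scheduler and retries failures after delayMillis without blocking the caller. */
    static <T> CompletableFuture<T> retryAsync(ScheduledExecutorService scheduler, Callable<T> task,
                                               int maxAttempts, long delayMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(scheduler, task, maxAttempts, delayMillis, 1, result);
        return result;
    }

    private static <T> void attempt(ScheduledExecutorService scheduler, Callable<T> task, int maxAttempts,
                                    long delayMillis, int attemptNumber, CompletableFuture<T> result) {
        long delay = attemptNumber == 1 ? 0 : delayMillis; // the first attempt runs immediately
        scheduler.schedule(() -> {
            try {
                result.complete(task.call());
            } catch (Exception e) {
                if (attemptNumber >= maxAttempts) {
                    result.completeExceptionally(e); // give up and report the last failure
                } else {
                    attempt(scheduler, task, maxAttempts, delayMillis, attemptNumber + 1, result);
                }
            }
        }, delay, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        try {
            // Hypothetical flaky task: fails twice, then succeeds.
            CompletableFuture<String> future = retryAsync(scheduler, () -> {
                if (calls.incrementAndGet() < 3) {
                    throw new IllegalStateException("transient failure");
                }
                return "ok";
            }, 5, 50);
            System.out.println(future.get(5, TimeUnit.SECONDS) + " after " + calls.get() + " calls");
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

The caller receives a CompletableFuture at once and can attach callbacks instead of waiting, at the cost of more complex error handling than the synchronous loop.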

This article only covers a small part of the retry mechanism. In actual applications, an appropriate failure retry scheme should be adopted based on the specific situation.

Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
