Microservices Engine: Implementing graceful shutdown based on MSE XXL-JOB

Last Updated: Apr 23, 2025

MSE XXL-JOB adds graceful shutdown functionality while maintaining compatibility with the open-source version. When an application shuts down, it first notifies the scheduling server to stop dispatching new tasks, waits for existing tasks to complete, and then safely shuts down the application, ensuring business continuity during restarts. This topic describes how to enable the graceful shutdown feature based on Alibaba Cloud Task Scheduling XXL-JOB to help you handle actual business restart or shutdown scenarios.

Overview of scheduled task graceful shutdown

In actual business scenarios, scheduled tasks within application processes run continuously at fixed intervals. When an application restarts during deployment, running scheduled tasks are forcibly interrupted, which can leave data incomplete, sharply reduce the scheduling success rate, and ultimately corrupt business data. The following issues may occur:

  • Task execution interruption: When a task is running and the application process shuts down, business processing is interrupted, which may lead to incomplete business data.

  • Task scheduling failure: During the restart process, the scheduler distributes tasks to nodes that are shutting down, causing scheduling failures that affect overall processing efficiency.

Therefore, in scenarios where scheduled task scheduling is used, graceful shutdown of scheduled tasks is necessary to ensure smooth business operation during rolling deployments and restarts.

Implementing graceful shutdown based on open-source XXL-JOB

Open-source XXL-JOB execution principles and graceful shutdown issues

Currently, in the continuous task scheduling process of open-source XXL-JOB, the task execution side cannot effectively implement graceful shutdown, so the open-source version requires custom modifications. Before making them, analyze the overall process of XXL-JOB task distribution and task execution. The process involves two modules: XXL-JOB Admin and XXL-JOB Executor.

The following sections explain the main interaction and execution process of XXL-JOB and highlight the logic that affects graceful shutdown, as a reference for custom modifications. Currently, the following issues exist:

Issue 1: Delayed traffic removal from offline nodes

When an executor goes offline, the scheduling center does not update its list of available executors in time. Tasks that are dispatched to the offline node in the meantime fail.

Logic process 1: Executor registration
  • After a business application starts with the XXL-JOB SDK dependency, it initializes an ExecutorRegistryThread thread that continuously reports heartbeats to the scheduling center.

  • Upon receiving the heartbeats, the scheduling center uses JobRegistryHelper to write the registered executor information to the database table xxl_job_registry.

  • There is a thread in JobRegistryHelper that periodically queries and updates the address_list (the list actually used) in the xxl_job_group table.

Logic process 2: Selecting online execution machines
  • After a task is triggered by the scheduling thread, it is handed over to XxlJobTrigger to complete the triggering action.

  • Before triggering execution, it reads the available executor list from the address_list in the xxl_job_group table.

  • Through ExecutorRouter, it selects a machine from the above machine list according to the corresponding routing policy.

  • After selecting a machine, it distributes the task to the corresponding IP node through an RPC request. If an offline node is selected, the task triggering will fail.

Summary: Executor registration writes to xxl_job_registry, but the trigger reads the executor list from address_list in xxl_job_group, and the two are synchronized only by an asynchronous periodic refresh. As a result, the list of available online machines lags behind the actual state of the executors.
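
The delay can be illustrated with a small in-memory model. The following is only a sketch and not XXL-JOB code: the class and method names are hypothetical, the Set stands in for the xxl_job_registry table, and the List stands in for address_list in the xxl_job_group table.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the registry table and the cached address list (hypothetical names). */
class StaleAddressListDemo {
    // Stands in for xxl_job_registry: changed immediately by heartbeats and removals.
    private final Set<String> registry = new LinkedHashSet<>();
    // Stands in for xxl_job_group.address_list: changed only by the periodic refresh.
    private List<String> addressList = new ArrayList<>();

    void register(String address) { registry.add(address); }

    void registryRemove(String address) { registry.remove(address); }

    // Stands in for the periodic refresh thread in JobRegistryHelper.
    void refreshAddressList() { addressList = new ArrayList<>(registry); }

    // Stands in for the trigger reading the executor list before routing.
    List<String> addressesSeenByTrigger() { return new ArrayList<>(addressList); }

    public static void main(String[] args) {
        StaleAddressListDemo admin = new StaleAddressListDemo();
        admin.register("10.0.0.1:9999");
        admin.register("10.0.0.2:9999");
        admin.refreshAddressList();

        admin.registryRemove("10.0.0.2:9999");              // executor goes offline
        System.out.println(admin.addressesSeenByTrigger()); // still lists 10.0.0.2:9999
        admin.refreshAddressList();                         // next periodic refresh
        System.out.println(admin.addressesSeenByTrigger()); // offline node is gone
    }
}
```

Any task triggered between the removal and the next refresh can still be routed to the offline node.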

Issue 2: Forced interruption of task execution

Currently, when the XXL-JOB Executor exits, it directly interrupts the thread that is running a task and treats the task as failed. In addition, all execution requests waiting in the queue are discarded and treated as failures.

Logic process 1: Distributing tasks to corresponding machines for execution
  • After the business application executor receives the corresponding task, it creates a JobThread thread for each task based on the task ID for task execution.

  • This task execution request is added to the current task thread's execution queue, and different blocking strategies will have different handling methods.

  • The JobThread continuously reads trigger records from the queue and executes the corresponding JobHandler to complete business logic processing.

  • After the task execution is completed, the execution result information is submitted to the execution response queue of TriggerCallbackThread, and then it continues with the next execution.

  • When the executor stops, it executes the XxlJobExecutor.destroy method, which interrupts running threads and clears scheduling requests waiting in the queue.

Logic process 2: Task execution result feedback
  • TriggerCallbackThread continuously runs and loads the current execution result queue, batch distributing execution results to the scheduling center.

  • If sending response results to the scheduling center fails, they are written to a local file for retry.

  • After receiving the execution results, the scheduling center updates the execution records in the database.

Summary: In the above shutdown process, removeJobThread directly interrupts running task threads, and tasks waiting in the thread queue are directly ignored and treated as failures.
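
The effect of a forced stop can be reproduced with a toy worker thread. This is a sketch under stated assumptions, not the XXL-JOB implementation: the class name, the queue, and the simulated 10-second job are all hypothetical. Three triggers are queued, the first one is running when the stop happens, and the method reports how many were completed, interrupted, and discarded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a job thread that is stopped by interruption (hypothetical names). */
class ForcedStopDemo {
    /** Returns {completed, interrupted, discarded} for three queued triggers. */
    static int[] runForcedStop() {
        BlockingQueue<Runnable> triggerQueue = new LinkedBlockingQueue<>();
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger interrupted = new AtomicInteger();
        AtomicBoolean stop = new AtomicBoolean(false);
        CountDownLatch firstStarted = new CountDownLatch(1);

        for (int i = 0; i < 3; i++) {
            triggerQueue.add(() -> {
                firstStarted.countDown();
                try {
                    Thread.sleep(10_000); // simulated long-running business logic
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    interrupted.incrementAndGet(); // business logic cut short
                }
            });
        }

        Thread jobThread = new Thread(() -> {
            while (!stop.get()) {
                Runnable trigger = triggerQueue.poll();
                if (trigger == null) break;
                trigger.run();
            }
        }, "toy-job-thread");
        jobThread.start();

        List<Runnable> discarded = new ArrayList<>();
        try {
            firstStarted.await();            // the first trigger is now running
            stop.set(true);                  // forced stop: no further triggers are taken
            jobThread.interrupt();           // the running trigger is interrupted
            triggerQueue.drainTo(discarded); // queued triggers are thrown away
            jobThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return new int[] {completed.get(), interrupted.get(), discarded.size()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runForcedStop())); // prints [0, 1, 2]
    }
}
```

None of the three triggers completes: one is interrupted in the middle of its work and two never start.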

Implementing graceful shutdown for open-source XXL-JOB

Based on the analysis of open-source XXL-JOB execution principles, we can implement graceful shutdown functionality based on the open-source code. The three core steps of application graceful shutdown are first removing traffic, then waiting for running business to complete, and finally shutting down.

In Spring Boot mode, the com.xxl.job.core.executor.XxlJobExecutor#destroy method in the XXL-JOB Core module is called back automatically when the application process exits, and it performs some cleanup actions for the application executor. However, the current logic does not fully implement graceful shutdown. Based on the preceding analysis, the following modifications are needed:

Step 1: Remove traffic from application nodes

  • First, in the XxlJobExecutor#destroy method, there is a stopEmbedServer() method that stops heartbeat registration and sends a registryRemove request to the scheduling center to remove the current execution node.

  • After receiving the request, the scheduling server removes the current node from the xxl_job_registry table in the database. However, as explained in the principle analysis, the actual list used is the address_list in the xxl_job_group table (which is not synchronized and updated), so the traffic removal action is not truly completed at this point.

  • The scheduling server needs to be modified to implement traffic removal. Modification points to choose from (choose one):

    • Add subsequent processing to the JobRegistryHelper.registryRemove method to directly refresh the address_list in the xxl_job_group table, or implement refresh logic in freshGroupRegistryInfo.

    • Modify the XxlJobTrigger#trigger() method to adjust how addressList is read from group. For automatic registration, read the address list directly from the xxl_job_registry table.

After completing the above modifications, the first step of traffic removal is completed.
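
The first option (refreshing the address list as part of the removal) can be sketched with an in-memory model. This is an illustration only: the class and method names are hypothetical, the Set stands in for xxl_job_registry, and the List stands in for address_list in xxl_job_group. A real modification would update the database table inside the scheduling server instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy registry in which a removal refreshes the address list at once (hypothetical names). */
class EagerRefreshRegistry {
    // Stands in for xxl_job_registry.
    private final Set<String> registry = new LinkedHashSet<>();
    // Stands in for xxl_job_group.address_list, which the trigger reads.
    private volatile List<String> addressList = Collections.emptyList();

    synchronized void register(String address) {
        registry.add(address);
    }

    // Stands in for the periodic refresh; still needed to pick up new registrations.
    synchronized void refreshAddressList() {
        addressList = Collections.unmodifiableList(new ArrayList<>(registry));
    }

    synchronized void registryRemove(String address) {
        if (registry.remove(address)) {
            // The added step: do not wait for the next periodic refresh.
            refreshAddressList();
        }
    }

    List<String> addressesSeenByTrigger() {
        return addressList;
    }
}
```

With this change, a node that sends a removal request disappears from the list seen by the trigger as soon as the request is processed.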

Step 2: Wait for running business to complete

  • Modify the next step in the XxlJobExecutor#destroy method to wait for all pending tasks to complete. Refer to the following code:

    public void destroy(){
    
        // destroy executor-server
        stopEmbedServer();
    
        // destroy jobThreadRepository
        // Iterate over a snapshot so that the live map is not traversed while waiting
        for (JobThread jobThread : new ArrayList<JobThread>(jobThreadRepository.values())) {
            // Wait for all task queues to complete execution
            while (jobThread.isRunningOrHasQueue()) {
                try {
                    TimeUnit.SECONDS.sleep(1L);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting for this thread
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        jobHandlerRepository.clear();
    
        // destroy JobLogFileCleanThread
        JobLogFileCleanThread.getInstance().toStop();
    
        // destroy TriggerCallbackThread
        TriggerCallbackThread.getInstance().toStop();
    
    }
  • Wait for all execution results in the response queue to be sent. The current open-source implementation of the TriggerCallbackThread.getInstance().toStop() method processes the results one last time after interrupting the response result thread, so no additional processing is needed.

At this point, waiting for running tasks to complete is finished. Additionally, you can customize the shutdown process for different task types as needed.
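
The wait loop in the modified destroy method has no upper bound, so a stuck task can block the shutdown until the process is killed. One possible refinement is a bounded wait. The following helper is a sketch, not part of XXL-JOB: the class name, method name, and parameters are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Polls a "still busy" condition with an upper time limit (hypothetical helper). */
class DrainWaiter {
    /**
     * Returns true if busy reported false before the timeout elapsed,
     * and false on timeout or if the waiting thread was interrupted.
     */
    static boolean awaitDrained(BooleanSupplier busy, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (busy.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while work was still pending
            }
            try {
                TimeUnit.MILLISECONDS.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger pending = new AtomicInteger(3);
        // The condition reports "busy" three times and then "drained".
        boolean drained = awaitDrained(() -> pending.getAndDecrement() > 0, 1_000, 10);
        System.out.println(drained); // prints true
    }
}
```

Inside destroy, the condition could be passed as jobThread::isRunningOrHasQueue, with a timeout shorter than the forced-termination timeout of the deployment script.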

Step 3: Application process shutdown

  • We recommend that you stop the application with kill -15 in the deployment script, which triggers the JVM shutdown hook and therefore the destroy callback described above. Add a timeout after which the process is forcibly terminated, based on your business requirements.

  • You can integrate task scheduling graceful shutdown with SpringBoot Actuator functionality and shut down the application through the /actuator/shutdown interface.

Prerequisites

  • The engine version is 2.1.0 or later. For more information, see XXL-JOB engine version.

  • The dependency related to the SchedulerX plugin package is added to the pom.xml file of the client. For more information, see XXL-JOB plugin version.

    <dependency>
      <groupId>com.aliyun.schedulerx</groupId>
      <artifactId>schedulerx3-plugin-xxljob</artifactId>
      <version>Latest version</version>
    </dependency>

Procedure

To enable graceful shutdown end to end, complete the following two steps. Choose the variant that matches your application framework and deployment scenario.

Step 1: Perform initial integration with executors by using different frameworks

Different deployment modes of business applications require different methods for initial integration.

Mode 1: SpringBoot business application (recommended)

If you integrate a business application with XXL-JOB executors by using Spring Boot, the system can automatically perform initial integration for graceful shutdown. Perform the following steps:

  1. Add the SchedulerX plugin Maven dependency. For more information, see XXL-JOB plugin version.

    <dependency>
      <groupId>com.aliyun.schedulerx</groupId>
      <artifactId>schedulerx3-plugin-xxljob</artifactId>
      <version>Latest version</version>
    </dependency>
  2. Add the application configuration parameter and enable graceful shutdown. For detailed parameter descriptions, see Configuration parameter description.

    # Configure graceful shutdown
    xxl.job.executor.shutdownMode=WAIT_ALL

Mode 2: Spring business application

If your business application is a web application started through the Spring framework, add the POM dependency and the application startup parameter as described in Mode 1: SpringBoot business application (recommended). In addition, register the XxlJobExecutorEnhancerInitializer initializer by adding the following configuration to web.xml:

<web-app>
  <context-param>
        <!-- Spring ApplicationContextInitializer is used to improve the capabilities of XXL-JOB executors. -->
        <param-name>globalInitializerClasses</param-name>
        <param-value>com.aliyun.schedulerx.xxljob.enhance.XxlJobExecutorEnhancerInitializer</param-value>
    </context-param>
</web-app>

Mode 3: Frameless Java business application

If you integrate a business application with XXL-JOB executors in frameless mode (the application is started as a pure Java program, as described in the use cases of XXL-JOB executors), you can perform the initial integration for graceful shutdown with custom code. The business application still needs the POM dependency and the application startup parameter described in Mode 1: SpringBoot business application (recommended). Then make the following changes:

  • Before starting the Executor, add: EnhancerLoader.load(xxlJobProp) to load the enhancement functionality.

  • Before starting the Executor, add: Runtime.getRuntime().addShutdownHook(...) to add a shutdown Hook implementation for the current application.

Sample code

public static void main(String[] args) {
    try {
        // load executor prop
        Properties xxlJobProp = FrameLessXxlJobConfig.loadProperties("xxl-job-executor.properties");

        // Load the enhancement features for XXL-job executors during the initial integration.
        EnhancerLoader.load(xxlJobProp);

        // start xxl-job executor
        FrameLessXxlJobConfig.getInstance().initXxlJobExecutor(xxlJobProp);

        // Add a shutdown hook for graceful shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(){
            @Override
            public void run() {
                FrameLessXxlJobConfig.getInstance().destroyXxlJobExecutor();
            }
        });
        // Blocks until interrupted
        while (true) {
            try {
                TimeUnit.HOURS.sleep(1);
            } catch (InterruptedException e) {
                break;
            }
        }
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    } finally {
        // destroy
        FrameLessXxlJobConfig.getInstance().destroyXxlJobExecutor();
    }
}
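
In the sample above, destroyXxlJobExecutor() is reachable from both the shutdown hook and the finally block, so it can be invoked twice on some exit paths. If that matters for your application, a once-only guard is one way to handle it. The following wrapper is a sketch with a hypothetical class name and is not part of the plugin.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Runs the wrapped action at most once, no matter how often run() is called. */
class OnceOnly implements Runnable {
    private final AtomicBoolean done = new AtomicBoolean(false);
    private final Runnable action;

    OnceOnly(Runnable action) {
        this.action = action;
    }

    @Override
    public void run() {
        // Only the first caller flips the flag and runs the action.
        if (done.compareAndSet(false, true)) {
            action.run();
        }
    }

    // Intended use in the sample's main method (shown as comments because
    // FrameLessXxlJobConfig belongs to the application, not to this sketch):
    //   Runnable destroyOnce = new OnceOnly(
    //           () -> FrameLessXxlJobConfig.getInstance().destroyXxlJobExecutor());
    //   Runtime.getRuntime().addShutdownHook(new Thread(destroyOnce));
    //   ... finally { destroyOnce.run(); }
}
```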

Step 2: Application shutdown processing

Self-built deployment, shutdown using kill -15

A self-built CD process usually contains a node that stops the application process. At this node, you can use a stop.sh script to terminate the process. The script must contain the logic for the graceful shutdown of the application: send kill -15, wait for the process to exit, and force termination only after a timeout.

Sample script content for application process shutdown:

# The process ID information will be written to the app.pid file after the application is successfully started
PID="{Application deployment path}/app.pid"
FORCE=1
if [ -f ${PID} ]; then
  TARGET_PID=`cat ${PID}`
  kill -15 ${TARGET_PID}
  loop=1
  while(( $loop<=5 ))
  do
    ## Use a health check to confirm that the current application process is terminated. The logic can be customized based on the application characteristics.
    health
    if [ $? == 0 ]; then
      echo "check $loop times, current app has not stop yet."
      sleep 5s
      let "loop++"
    else
      FORCE=0
      break
    fi
  done
  if [ $FORCE -eq 1 ]; then
    echo "App(pid:${TARGET_PID}) stop timeout, forced termination."
    kill -9 ${TARGET_PID}
  fi
  rm -rf ${PID}
  echo "App(pid:${TARGET_PID}) stopped successfully."
fi

K8s containerized deployment, shutdown using PreStop

The lifecycle management feature of Kubernetes pods can implement graceful shutdown automatically. You can also use a preStop hook to implement custom graceful shutdown logic, either by running a script through exec or by sending an HTTP request.

  • No modification required: If the application process is the main process (PID 1) in a container, the system automatically sends a SIGTERM signal to the main process to gracefully shut down the application.

  • Custom preStop: If there are complex multi-process relationships within the container, you can configure a preStop script to customize the application process termination using kill -15 PID or by calling a pre-set stop.sh script to terminate the application process.

Important

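The terminationGracePeriodSeconds parameter of the Pod controls the maximum waiting time for graceful shutdown (default is 30s). This period also covers the time spent in the preStop hook, so set it to a value longer than the longest job drain time that you expect plus any wait performed by the preStop script.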

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: my-app-image:latest
        lifecycle:
          preStop:
            exec:
              # command: ["/bin/sh", "-c", "kill -15 PID && sleep 30"]
              command: ["/bin/sh", "-c", "script path/stop.sh"]

Automatic integration with Alibaba Cloud application release platform

Stay tuned.

Configuration parameter description

You must configure the following parameter to enable graceful shutdown for your application. This parameter provides two modes for you to implement graceful shutdown.

# Graceful shutdown mode, WAIT_ALL: wait for all; WAIT_RUNNING: wait for running.
# If you do not configure this parameter, the original logic of XXL-JOB is applied so that graceful shutdown is disabled by default.
xxl.job.executor.shutdownMode=WAIT_ALL

| Shutdown mode | Description |
| --- | --- |
| Wait for all (WAIT_ALL) | In this mode, an application exits only after all jobs, including running jobs and queued jobs, are complete. This mode is recommended. |
| Wait for running (WAIT_RUNNING) | In this mode, an application exits after the running jobs to which threads are allocated are complete. Queued jobs are dropped. |
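
The difference between the two modes can be illustrated with a toy worker that has one running job and several queued jobs when shutdown begins. This is a sketch of the semantics described above, not the plugin's implementation: the class, enum, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy illustration of WAIT_ALL versus WAIT_RUNNING (hypothetical names). */
class ShutdownModeDemo {
    enum ShutdownMode { WAIT_ALL, WAIT_RUNNING }

    /** Returns {completed, dropped} when shutdown starts while the first job is running. */
    static int[] shutdown(ShutdownMode mode, int jobs) {
        if (jobs < 1) throw new IllegalArgumentException("jobs must be >= 1");
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch firstJobRunning = new CountDownLatch(1);
        CountDownLatch shutdownRequested = new CountDownLatch(1);

        for (int i = 0; i < jobs; i++) {
            queue.add(() -> {
                firstJobRunning.countDown();
                awaitQuietly(shutdownRequested); // job stays "running" until shutdown begins
                completed.incrementAndGet();
            });
        }
        Thread worker = new Thread(() -> {
            Runnable job;
            while ((job = queue.poll()) != null) {
                job.run();
            }
        }, "toy-job-thread");
        worker.start();

        List<Runnable> dropped = new ArrayList<>();
        awaitQuietly(firstJobRunning);     // one job is running, the rest are queued
        if (mode == ShutdownMode.WAIT_RUNNING) {
            queue.drainTo(dropped);        // queued jobs are dropped
        }
        shutdownRequested.countDown();     // let the running job finish
        try {
            worker.join();                 // exit only after the worker has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {completed.get(), dropped.size()};
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(shutdown(ShutdownMode.WAIT_ALL, 3)));     // prints [3, 0]
        System.out.println(Arrays.toString(shutdown(ShutdownMode.WAIT_RUNNING, 3))); // prints [1, 2]
    }
}
```

In this model, WAIT_ALL trades a longer shutdown for no lost triggers, while WAIT_RUNNING exits sooner at the cost of the queued triggers.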