Microservices Engine: Implement graceful shutdown using MSE XXL-JOB

Last Updated: Mar 11, 2026

During rolling deployments and application restarts, running XXL-JOB tasks can be interrupted mid-execution, causing incomplete business data and scheduling failures. Microservices Engine (MSE) XXL-JOB provides built-in graceful shutdown through the SchedulerX plug-in: it drains traffic from an executor, waits for running jobs to finish, and then stops the process. No modifications to the open-source XXL-JOB server code are required.

How it works

When graceful shutdown is enabled, MSE XXL-JOB follows this sequence before stopping an executor:

  1. Deregister the executor -- The executor sends a registryRemove request to the MSE XXL-JOB server and immediately removes itself from the active executor list. The server updates address_list in the xxl_job_group table in real time, so no new jobs are dispatched to this node.

  2. Wait for running jobs -- Depending on the shutdown mode, the executor waits for all running jobs (and optionally queued jobs in the JobThread queue) to complete.

  3. Report execution results -- The TriggerCallbackThread flushes all pending execution results back to the server.

  4. Stop the process -- The JVM shutdown hook completes and the application process exits.
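The ordering of these four steps can be sketched as a small in-memory model. This is only an illustration of the sequence, not the plug-in's implementation: GracefulShutdownSketch, activeExecutors, and pendingResults are hypothetical names, and a single-threaded ExecutorService stands in for one JobThread and its queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the shutdown ordering; not the plug-in's real code.
class GracefulShutdownSketch {
    // Stand-in for the server-side list of active executors.
    final Set<String> activeExecutors = ConcurrentHashMap.newKeySet();
    // Stand-in for execution results waiting to be reported.
    private final List<String> pendingResults = new ArrayList<>();
    // A single worker models one JobThread with its queue.
    private final ExecutorService jobThread = Executors.newSingleThreadExecutor();
    private final String address;

    GracefulShutdownSketch(String address) {
        this.address = address;
        activeExecutors.add(address);
    }

    void submitJob(String name, long millis) {
        jobThread.submit(() -> {
            try {
                Thread.sleep(millis); // simulated business logic
                synchronized (pendingResults) {
                    pendingResults.add(name + ":SUCCESS");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** Runs the four steps in order and returns a trace of what happened. */
    List<String> shutdown(long timeoutSeconds) {
        List<String> trace = new ArrayList<>();
        // 1. Deregister so that no new jobs are routed to this node.
        activeExecutors.remove(address);
        trace.add("deregistered");
        // 2. Stop accepting work, then wait for running and queued jobs.
        jobThread.shutdown();
        boolean drained = false;
        try {
            drained = jobThread.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        trace.add(drained ? "jobs-finished" : "wait-aborted");
        // 3. Flush the pending execution results.
        int reported;
        synchronized (pendingResults) {
            reported = pendingResults.size();
            pendingResults.clear();
        }
        trace.add("reported:" + reported);
        // 4. The process can now exit.
        trace.add("stopped");
        return trace;
    }
}
```

The point of the model is the order: the node leaves the active list before the wait starts, and results are flushed only after the last job has finished.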

Why open-source XXL-JOB needs enhancement

Open-source XXL-JOB has two issues that prevent clean shutdowns:

| Issue | Root cause | Impact |
| --- | --- | --- |
| Jobs dispatched to offline nodes | The address_list in the xxl_job_group table is updated on a periodic schedule by JobRegistryHelper, not in real time. After an executor deregisters from xxl_job_registry, the trigger still reads the stale address_list. | Scheduling failures during deployments. |
| Running jobs force-killed | When an executor stops, XxlJobExecutor#destroy calls removeJobThread, which immediately interrupts the JobThread and discards all queued requests. | Incomplete data and failed job records. |

How open-source XXL-JOB processes jobs internally

The job distribution and execution process in open-source XXL-JOB involves two modules: XXL-JOB Admin and XXL-JOB Executor.

Executor registration:

  1. After the XXL-JOB SDK starts, the business application initializes the ExecutorRegistryThread thread, which continuously sends heartbeat messages to XXL-JOB Admin.

  2. Upon receipt of heartbeat messages, XXL-JOB Admin writes executor information to the xxl_job_registry database table through JobRegistryHelper.

  3. A thread in JobRegistryHelper periodically queries and updates the address_list field in the xxl_job_group table, which provides a list of registered executors.
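The lag between the two tables can be illustrated with an in-memory model. RegistrySketch and its fields are hypothetical stand-ins for the xxl_job_registry table and the address_list field. The refresh is called by hand here, whereas XXL-JOB Admin runs it on a periodic thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical single-threaded model; not XXL-JOB Admin code.
class RegistrySketch {
    // Stand-in for rows in xxl_job_registry (written on each heartbeat).
    private final TreeSet<String> registry = new TreeSet<>();
    // Stand-in for address_list in xxl_job_group (rebuilt periodically).
    private List<String> addressList = new ArrayList<>();

    void heartbeat(String address) {
        registry.add(address);
    }

    void registryRemove(String address) {
        registry.remove(address); // address_list is NOT touched here
    }

    // Invoked on a timer in the real server; invoked manually in this sketch.
    void refreshAddressList() {
        addressList = new ArrayList<>(registry);
    }

    List<String> addressList() {
        return new ArrayList<>(addressList);
    }
}
```

Between a registryRemove call and the next refreshAddressList call, addressList() still returns the removed node, which is the window in which jobs can be dispatched to an offline executor.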

Online executor selection:

  1. After a scheduling thread triggers a job, XxlJobTrigger runs the job.

  2. Before running the job, XxlJobTrigger reads a list of executors from the address_list field in the xxl_job_group table.

  3. ExecutorRouter selects an executor from the list based on the specified routing policy.

  4. XxlJobTrigger sends an RPC request to distribute the job to the selected node. If the selected node is offline, the job fails to be triggered.
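The effect of a stale entry on routing can be shown with a minimal round-robin router. This is a sketch under assumptions, not the ExecutorRouter implementation: RoundRobinRouteSketch is a hypothetical class, and the onlineNodes set simulates whether an RPC to the chosen address would succeed.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin router; not XXL-JOB's ExecutorRouter code.
class RoundRobinRouteSketch {
    private final AtomicInteger counter = new AtomicInteger();

    // Picks the next address from the list in rotation.
    String route(List<String> addressList) {
        if (addressList.isEmpty()) {
            throw new IllegalStateException("address_list is empty");
        }
        int index = Math.floorMod(counter.getAndIncrement(), addressList.size());
        return addressList.get(index);
    }

    // Returns true if the routed node is online, i.e. the trigger would succeed.
    boolean trigger(List<String> addressList, Set<String> onlineNodes) {
        String target = route(addressList);
        return onlineNodes.contains(target);
    }
}
```

With an address list of two nodes where the first has already gone offline, every second trigger fails until the list is refreshed.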

Job execution and result feedback:

  1. After the executor receives a job request, it creates a JobThread thread for each job based on the job ID.

  2. Each trigger request is added to the queue of the corresponding JobThread. How a request is handled while the thread is busy depends on the blocking policy of the job.

  3. The JobThread continuously reads trigger requests from the queue and runs the corresponding JobHandler to execute the business logic.

  4. After a job ends, the JobThread thread submits the execution information to the execution response queue of TriggerCallbackThread and proceeds to the next job.

  5. When the executor stops, it calls the XxlJobExecutor.destroy method, which stops the JobThread and clears its queue of scheduling requests.

The TriggerCallbackThread runs continuously: it drains the queue of execution results and sends them to XXL-JOB Admin in batches. If sending fails, it writes the results to local disk and retries later.
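The queue-based flow described above can be modeled with two blocking queues. The following is a simplified sketch, not the real JobThread: JobThreadSketch, pushTrigger, stopAndDiscard, and awaitCallback are hypothetical names, and trigger requests are reduced to plain strings.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical model of a per-job worker thread; not XXL-JOB's JobThread.
class JobThreadSketch extends Thread {
    private final BlockingQueue<String> triggerQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> callbackQueue = new LinkedBlockingQueue<>();
    private final Consumer<String> jobHandler;
    private volatile boolean toStop;

    JobThreadSketch(Consumer<String> jobHandler) {
        this.jobHandler = jobHandler;
        setDaemon(true);
    }

    // Called when a trigger request arrives for this job.
    void pushTrigger(String param) {
        triggerQueue.add(param);
    }

    @Override
    public void run() {
        while (!toStop) {
            try {
                String param = triggerQueue.poll(100, TimeUnit.MILLISECONDS);
                if (param == null) {
                    continue; // nothing queued, poll again
                }
                jobHandler.accept(param);            // business logic
                callbackQueue.add(param + ":done");  // hand the result to the callback side
            } catch (InterruptedException e) {
                return; // interrupted by stopAndDiscard()
            }
        }
    }

    // Models the default stop: interrupt the thread and drop queued requests.
    int stopAndDiscard() {
        toStop = true;
        interrupt();
        int discarded = triggerQueue.size();
        triggerQueue.clear();
        return discarded;
    }

    // Returns the next execution result, or null if none arrives in time.
    String awaitCallback(long timeoutMillis) {
        try {
            return callbackQueue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

In this model, stopAndDiscard() shows why an immediate stop loses work: anything still in triggerQueue never reaches the handler and therefore never produces a callback.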

Implement graceful shutdown with open-source XXL-JOB

If you use open-source XXL-JOB without the MSE SchedulerX plug-in, you can implement graceful shutdown manually by following three steps: remove traffic, wait for jobs in the queue to complete, and then shut down the application.

In Spring Boot mode, the com.xxl.job.core.executor.XxlJobExecutor#destroy method is invoked automatically when the application process exits. However, its default logic does not fully implement graceful shutdown, so the following modifications are required.

Step 1: Remove traffic from application nodes

The stopEmbedServer() method in XxlJobExecutor#destroy stops the heartbeat registration mechanism and sends the registryRemove request to XXL-JOB Admin to remove the current executor. However, the address_list field in the xxl_job_group table is not synchronized in real time, so traffic is not effectively removed.

To fix this, modify the XXL-JOB Admin server using one of the following methods:

  • Add processing logic to the JobRegistryHelper.registryRemove method to update the address_list field in the xxl_job_group table. You can also implement the update logic in the freshGroupRegistryInfo method.

  • Modify the XxlJobTrigger#trigger() method so that, for executor groups that use automatic registration, it reads the executor addresses directly from the xxl_job_registry table instead of from address_list.
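The second option can be sketched as a small address-resolution helper. This is an assumption-laden illustration rather than a patch for XxlJobTrigger: TriggerAddressSketch, resolveAddresses, and the autoRegistered flag are hypothetical, and the live registry is passed in as a plain collection instead of being queried from the database.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; the real change would live in XxlJobTrigger#trigger().
class TriggerAddressSketch {
    /**
     * Chooses which addresses the trigger should route over.
     * For auto-registered groups the live registry wins over the cached list.
     */
    static List<String> resolveAddresses(boolean autoRegistered,
                                         List<String> cachedAddressList,
                                         Collection<String> liveRegistry) {
        if (!autoRegistered) {
            // Manually configured groups keep using the stored list as-is.
            return new ArrayList<>(cachedAddressList);
        }
        List<String> live = new ArrayList<>(liveRegistry);
        Collections.sort(live); // stable order so routing is predictable
        return live;
    }
}
```

Reading the registry at trigger time trades one extra lookup per trigger for a list that no longer contains an executor once its registryRemove request has been processed.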

Step 2: Wait for jobs in the queue to complete

Modify the XxlJobExecutor#destroy method to wait for all jobs in the queue to complete:

public void destroy() {

    // Stop the embedded server: ends heartbeat registration and sends registryRemove
    stopEmbedServer();

    // Wait for every JobThread to drain before the executor is torn down
    if (jobThreadRepository.size() > 0) {
        List<Integer> keyList = new ArrayList<>(jobThreadRepository.keySet());
        for (Integer jobId : keyList) {
            JobThread jobThread = jobThreadRepository.get(jobId);
            // Wait for all jobs in the queue to complete.
            while (jobThread != null && jobThread.isRunningOrHasQueue()) {
                try {
                    TimeUnit.SECONDS.sleep(1L);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting instead of looping on
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
    }
    jobHandlerRepository.clear();

    // destroy JobLogFileCleanThread
    JobLogFileCleanThread.getInstance().toStop();

    // destroy TriggerCallbackThread (also reports the remaining execution results)
    TriggerCallbackThread.getInstance().toStop();

}

TriggerCallbackThread.getInstance().toStop() sends the remaining execution results to XXL-JOB Admin as part of stopping the TriggerCallbackThread, so no additional processing is required for result feedback.

Step 3: Stop application processes

To stop application processes, use kill -15 in the application deployment script to trigger the JVM shutdown hook. If the process does not exit within a timeout that fits your business requirements, the script can then forcefully stop it. Alternatively, in a Spring Boot application you can trigger shutdown through the Spring Boot Actuator /actuator/shutdown endpoint.
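The wait loop in Step 2 has no upper bound, whereas a deployment script usually force-stops the process after a fixed timeout. One way to keep the two aligned is to bound the wait inside the executor as well. The following is a minimal sketch of such a helper. BoundedWaitSketch and awaitIdle are hypothetical names, and the busy check is passed in as a BooleanSupplier so that the sketch stays independent of XXL-JOB classes.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper for a time-bounded drain; not part of XXL-JOB.
class BoundedWaitSketch {
    /**
     * Polls until busy reports false or the timeout elapses.
     * Returns true if the work drained in time, false otherwise.
     */
    static boolean awaitIdle(BooleanSupplier busy, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (busy.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while still busy
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // treat interruption as "stop waiting"
            }
        }
        return true;
    }
}
```

Inside the modified destroy method this could be called as BoundedWaitSketch.awaitIdle(jobThread::isRunningOrHasQueue, timeoutMillis, 1000), with the timeout chosen to be shorter than the force-stop timeout of the deployment script.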

Prerequisites

Before you begin, add the following dependency to your pom.xml:

<dependency>
  <groupId>com.aliyun.schedulerx</groupId>
  <artifactId>schedulerx3-plugin-xxljob</artifactId>
  <version>Latest version</version>
</dependency>

Integrate the plug-in with your executor

Choose the integration method that matches your application framework.

Spring Boot (recommended)

The plug-in registers graceful shutdown automatically through Spring Boot auto-configuration. No additional code is needed.

Add the shutdown mode to your application.properties:

xxl.job.executor.shutdownMode=WAIT_ALL

Or in application.yml:

xxl:
  job:
    executor:
      shutdownMode: WAIT_ALL

Spring (non-Boot)

For Spring web applications, add the Maven dependency from the Prerequisites section and the application.properties configuration from the Spring Boot section above, then register the plug-in initializer in your web.xml:

<web-app>
  <context-param>
    <param-name>globalInitializerClasses</param-name>
    <param-value>com.aliyun.schedulerx.xxljob.enhance.XxlJobExecutorEnhancerInitializer</param-value>
  </context-param>
</web-app>

No framework (plain Java)

For applications that start executors without a framework, load the plug-in enhancements manually and register a JVM shutdown hook.

Sample code

public static void main(String[] args) {
    try {
        // Load executor configuration
        Properties xxlJobProp = FrameLessXxlJobConfig.loadProperties("xxl-job-executor.properties");

        // Load the SchedulerX graceful shutdown enhancements
        EnhancerLoader.load(xxlJobProp);

        // Start the executor
        FrameLessXxlJobConfig.getInstance().initXxlJobExecutor(xxlJobProp);

        // Register a shutdown hook so kill -15 triggers graceful shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
            FrameLessXxlJobConfig.getInstance().destroyXxlJobExecutor()
        ));

        // Block the main thread
        while (true) {
            try {
                TimeUnit.HOURS.sleep(1);
            } catch (InterruptedException e) {
                break;
            }
        }
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    } finally {
        FrameLessXxlJobConfig.getInstance().destroyXxlJobExecutor();
    }
}

Make sure xxl-job-executor.properties includes the shutdown mode:

xxl.job.executor.shutdownMode=WAIT_ALL
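Because a misspelled value could leave the executor without graceful shutdown, a frameless application may want to check the property at startup. The following sketch shows one way to do that. ShutdownModeCheck and its methods are hypothetical and not part of the plug-in. Only the property key and the two valid values come from this document, and whether the plug-in itself rejects unknown values is not stated here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical startup check; not part of the SchedulerX plug-in.
class ShutdownModeCheck {
    static final String KEY = "xxl.job.executor.shutdownMode";
    static final Set<String> VALID = new HashSet<>(Arrays.asList("WAIT_ALL", "WAIT_RUNNING"));

    // Parses properties-file text; a real application would load the file instead.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /**
     * Returns the configured mode, or null if the key is absent or blank.
     * Throws IllegalArgumentException for any other value.
     */
    static String requireValidMode(Properties props) {
        String value = props.getProperty(KEY);
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        String mode = value.trim();
        if (!VALID.contains(mode)) {
            throw new IllegalArgumentException(KEY + " must be one of " + VALID + " but was " + mode);
        }
        return mode;
    }
}
```

A null result means the mode is not configured, in which case the default open-source behavior applies.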

Stop the application gracefully

The shutdown mode only takes effect when the JVM receives a SIGTERM signal (equivalent to kill -15). Choose the method that matches your deployment environment.

Note

If your Spring Boot application has Spring Boot Actuator enabled, you can also trigger graceful shutdown through the /actuator/shutdown endpoint.

Self-managed CD pipeline

Create a stop.sh script that sends SIGTERM, waits for the application to exit, and falls back to SIGKILL on timeout:

#!/bin/bash
# Path to the PID file written at application startup
PID_FILE="{Application deployment path}/app.pid"
FORCE=1
if [ -f "${PID_FILE}" ]; then
  TARGET_PID=$(cat "${PID_FILE}")
  # Send SIGTERM so that the JVM shutdown hook runs
  kill -15 "${TARGET_PID}"
  loop=1
  # Check up to 5 times, 5 seconds apart
  while [ "${loop}" -le 5 ]; do
    # kill -0 only tests whether the process still exists.
    # Replace with your own health check logic if needed.
    if kill -0 "${TARGET_PID}" 2>/dev/null; then
      echo "check ${loop} times, current app has not stopped yet."
      sleep 5
      loop=$((loop + 1))
    else
      FORCE=0
      break
    fi
  done
  if [ "${FORCE}" -eq 1 ]; then
    echo "App(pid:${TARGET_PID}) stop timeout, forced termination."
    kill -9 "${TARGET_PID}"
  fi
  rm -f "${PID_FILE}"
  echo "App(pid:${TARGET_PID}) stopped successfully."
fi

Kubernetes

Kubernetes sends SIGTERM to PID 1 in the container by default, which triggers the JVM shutdown hook automatically.

If the application is not PID 1 (for example, in a multi-process container), configure a preStop hook to send the signal explicitly:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Default: 30. Set this to a value longer than the maximum expected job execution time.
      terminationGracePeriodSeconds: 30
      containers:
      - name: my-app-container
        image: my-app-image:latest
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "kill -15 <app-pid> && sleep 30"]

Important

Set terminationGracePeriodSeconds to a value longer than the maximum expected job execution time. The preStop hook runs within this grace period, so the value must also cover the sleep in the hook. If the grace period expires before jobs finish, Kubernetes sends SIGKILL and force-terminates the pod, which defeats graceful shutdown. The default value is 30 seconds.

Alibaba Cloud application release platform

Automatic graceful shutdown integration for the Alibaba Cloud application release platform will be available soon.

Shutdown mode reference

| Mode | Behavior | When to use |
| --- | --- | --- |
| WAIT_ALL (recommended) | Waits for all running jobs and queued jobs to complete before exiting. | Most production workloads. No job is lost. |
| WAIT_RUNNING | Waits for currently running jobs to complete. Queued jobs are dropped. | Latency-sensitive deployments where fast restarts take priority over queued jobs. |
| Not configured | Uses the default open-source XXL-JOB behavior. Running jobs are interrupted and queued jobs are discarded. | Not recommended for production. |
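The difference between the modes comes down to which jobs are allowed to finish. The following sketch encodes the table as a pure function. It illustrates the documented semantics only: ShutdownModeSketch, its Mode enum, and jobsAllowedToFinish are hypothetical names and do not reflect how the plug-in is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical encoding of the documented mode semantics.
class ShutdownModeSketch {
    enum Mode { WAIT_ALL, WAIT_RUNNING, NOT_CONFIGURED }

    /**
     * Returns the jobs that complete before the executor exits.
     * runningJob may be null when the thread is idle.
     */
    static List<String> jobsAllowedToFinish(Mode mode, String runningJob, List<String> queuedJobs) {
        List<String> finished = new ArrayList<>();
        if (mode == Mode.NOT_CONFIGURED) {
            return finished; // running job interrupted, queue discarded
        }
        if (runningJob != null) {
            finished.add(runningJob); // both waiting modes let the running job finish
        }
        if (mode == Mode.WAIT_ALL) {
            finished.addAll(queuedJobs); // only WAIT_ALL also drains the queue
        }
        return finished;
    }
}
```

For one running job and two queued jobs, WAIT_ALL lets all three finish, WAIT_RUNNING lets only the running job finish, and the unconfigured case lets none finish.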

Configure the shutdown mode in your application properties:

# Graceful shutdown mode. Valid values: WAIT_ALL, WAIT_RUNNING.
# If not configured, the default open-source XXL-JOB behavior applies (no graceful shutdown).
xxl.job.executor.shutdownMode=WAIT_ALL

Or in application.yml:

xxl:
  job:
    executor:
      # Graceful shutdown mode. Valid values: WAIT_ALL, WAIT_RUNNING.
      shutdownMode: WAIT_ALL
