Microservices Engine: Job FAQ

Last Updated: May 08, 2026

This document explains how to troubleshoot common job management issues in SchedulerX.

Troubleshooting missing Spring beans

  1. In application management, connect to the worker and verify that the startup mode is Spring or Spring Boot.

  2. Register the JobProcessor as a Spring bean, for example, by adding the @Component annotation.

  3. Check your pom dependencies. If you have a dependency on spring-boot-devtools, you must exclude it.

  4. If the JobProcessor or process method uses AOP annotations, upgrade the SchedulerX agent to the latest version. Earlier versions do not support AOP.

  5. An extra proxy layer can cause a bean type mismatch. To diagnose this, place a breakpoint in the DefaultListableBeanFactory class. The beanDefinitionNames member variable lists all beans registered in Spring. Check this list to see if a bean is proxied by an aspect. This can occur if an incorrect third-party library is indirectly imported. If so, exclude the library.

If the preceding solutions do not resolve the issue, debug the ThreadContainer.start method. If a Class.forName error occurs for a class that you know exists, your application may use a framework that causes class loader inconsistencies. You can resolve this by calling SchedulerxWorker.setClassLoader.
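When a Class.forName call fails for a class that exists, it can help to check which class loaders can actually see that class. The following is a minimal sketch that uses only the Java standard library. The class ClassLoaderProbe and its methods are hypothetical names for illustration and are not part of the SchedulerX agent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper; not part of the SchedulerX agent.
class ClassLoaderProbe {

    /** Returns true if the given loader can load the named class. */
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            // initialize=false: only resolve the class, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Checks the class against the loaders most often involved in mismatches. */
    static Map<String, Boolean> probe(String className) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        result.put("context", canLoad(className, Thread.currentThread().getContextClassLoader()));
        result.put("system", canLoad(className, ClassLoader.getSystemClassLoader()));
        result.put("defining", canLoad(className, ClassLoaderProbe.class.getClassLoader()));
        return result;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println(name + " -> " + probe(name));
    }
}
```

If the results differ between loaders, the loader that can see your processor class is a likely candidate to pass to SchedulerxWorker.setClassLoader.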

Job fails with "Unable to make field private"

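MapReduce jobs rely on serialization and deserialization frameworks that access private fields through reflection. Java 9 introduced the module system, and JDK 16 and later deny this kind of access to JDK internals by default, so you must open the package explicitly. Add the following parameter to your JVM arguments: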

--add-opens java.base/java.lang=ALL-UNNAMED
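To confirm whether the parameter has taken effect in a given JVM, you can reproduce the same kind of reflective access outside SchedulerX. The following is a minimal sketch under the assumption that the failing access targets the java.lang package, as in the parameter above. AddOpensProbe is a hypothetical class name.

```java
import java.lang.reflect.Field;

// Hypothetical probe; mimics the reflective access a serialization framework performs.
class AddOpensProbe {

    /** Tries to open a private field of java.lang.String and reports the outcome. */
    static String probe() {
        try {
            Field value = String.class.getDeclaredField("value");
            value.setAccessible(true); // fails when java.base does not open java.lang
            return "open";
        } catch (NoSuchFieldException e) {
            return "closed: field not found";
        } catch (RuntimeException e) {
            // On recent JDKs this is java.lang.reflect.InaccessibleObjectException
            return "closed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

Run the class once without and once with --add-opens java.base/java.lang=ALL-UNNAMED and compare the results. If the error message in your job names a different package, that package is the one to open.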

Job fails with "submit jobInstanceId to worker timeout"

You can ignore this error if it occurs only during an application release or happens infrequently.

If the error persists on the same workerAddr, it indicates a disconnected persistent connection between the server and the agent. Restart the worker node or upgrade the SchedulerX agent. After the upgrade, the agent can automatically restore disconnected connections.

Job fails with "used space beyond 90.0%!"

The disk is full. You need to clear disk space on the ECS instance or container.
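To check how close a partition is to the limit before and after cleanup, you can compute the used percentage yourself. The following is a minimal sketch that uses the standard java.io.File API. DiskUsageCheck is a hypothetical name, and the way the SchedulerX agent calculates disk usage may differ from this.

```java
import java.io.File;

// Hypothetical check; the agent's own calculation may differ.
class DiskUsageCheck {

    /** Returns the used percentage (0-100) of the partition that holds the given path. */
    static double usedPercent(File path) {
        long total = path.getTotalSpace();
        if (total <= 0) {
            return 0.0; // the path does not name a partition, or its size is unknown
        }
        long usable = path.getUsableSpace();
        return 100.0 * (total - usable) / total;
    }

    public static void main(String[] args) {
        File target = new File(args.length > 0 ? args[0] : ".");
        double used = usedPercent(target);
        System.out.printf("%s used: %.1f%%%n", target.getAbsolutePath(), used);
        if (used > 90.0) {
            System.out.println("Above the 90% threshold named in the error message");
        }
    }
}
```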

Job fails with "ClassNotFoundException"

This error indicates that the class was not found on the worker running the job. Ensure that the Class full path configured for the Java job is the fully qualified name of the class, not an abbreviation.

For example, a correct fully qualified path is com.aliyun.schedulerx.example.processor.DtsSimpleJob.

If the configured Class full path is correct, the class is missing on the worker. This is typically caused by deploying the wrong package or by the application connecting to another user's worker. You can log on to the worker and use a decompiler to investigate.
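Before reaching for a decompiler, you can check whether a class name resolves at all and which JAR or directory it is loaded from. The following is a minimal sketch that uses only the Java standard library. ClassLocator is a hypothetical helper that you would run with the same classpath as the worker.

```java
import java.security.CodeSource;

// Hypothetical helper for checking what is deployed on a worker.
class ClassLocator {

    /** Describes where a fully qualified class name resolves from, if at all. */
    static String locate(String className) {
        try {
            Class<?> clazz = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source
            return source == null ? "JDK runtime" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // An abbreviated name does not resolve; the fully qualified name does
        System.out.println(locate("ArrayList"));
        System.out.println(locate("java.util.ArrayList"));
    }
}
```

For your own processor class, the returned location shows which package was actually deployed, which helps to spot a wrong build.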

Job fails with "jobInstance=xxx don't update progress more than 60s"

The server forcibly terminates a job if its running worker stops or is being released and fails to report progress for more than 60 seconds. If you confirm that the issue was caused by the worker or that the worker no longer exists, no action is required.

Job fails without an error message

  • Symptom:

    A job fails, but no error message is displayed.

  • Possible cause:

    A worker failure or a failure in the business logic.

  • Solution:

    • On the Instances page, go to the Task instance List tab. Find the target job instance and click Details in the Actions column to open the Task instance details panel. The panel shows the address of the worker on which the job failed.

    • Log on to the worker and open the ~/logs/schedulerx/worker.log log file.

      Run grep <Instance ID> worker.log to view logs related to the instance. If there is an ERROR-level exception, check the stack trace for the specific cause.

    • If the error description is empty, the issue is likely within your business logic, which has not returned a failure message. You should investigate your business code to identify the cause.

    • If the error description contains a framework exception, join the DingTalk group (ID: 23103656) to contact SchedulerX technical support.

Troubleshooting job failures

  • For a standalone job that throws an exception, go to the Instances page, click the Task instance List tab, find the target job instance, and click Details in the Actions column to view the error message.

  • If the job does not throw an exception, or if it is a distributed job, use Log Service to troubleshoot the issue. Log Service requires a Professional Edition application.

  • For a Basic Edition application, log on to the worker node and check the SchedulerX and business logs.

Job gets stuck

  • Symptom:

    A scheduled job is stuck in a running state and never finishes.

  • Possible cause:

    • An issue in your business logic.

    • An issue in SchedulerX.

  • Solution: You can troubleshoot business logic issues by following these steps. For other issues, join the DingTalk group (ID: 23103656) to contact SchedulerX technical support.

    • For Professional Edition applications: Use ThreadDump in the console to obtain a job's exception stack trace. This feature is available for agent versions 1.4.2 and later.

    • For Basic Edition applications: Log on to the stuck worker node and use the jstack command to view the stack trace. Run the following command:

      jstack <pid> | grep <job instance ID> -A 20
      $ jstack 29191 | grep 58903617 -A 200
      "Schedulerx-Container-Thread-58903617-0" #4093 prio=5 os_prio=0 tid=xxx nid=xxx waiting on condition [0x00002ad443101000]
         java.lang.Thread.State: WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)
      - parking to wait for  <0x000000072b837ae0> (a java.util.concurrent.CountDownLatch$Sync)
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
      at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
      at com.aliyun.ticket.importworker.WOImportJob.startImport(WOImportJob.java:353)
      at com.aliyun.ticket.importworker.WOImportJob.doImportForAll(WOImportJob.java:242)
      at com.aliyun.ticket.importworker.WOImportJob.process(WOImportJob.java:163)
      at com.alibaba.schedulerx.worker.container.ThreadContainer.start(ThreadContainer.java:90)
      at com.alibaba.schedulerx.worker.container.ThreadContainer.run(ThreadContainer.java:60)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:756)
      
      "Container-Batch-Statues-Retrieve-Thread-58903617" #4092 prio=5 os_prio=0 tid=xxx nid=xxx waiting on condition [0x00002ad44fe0e000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      at java.lang.Thread.sleep(Native Method)
      at com.alibaba.schedulerx.worker.batch.BaseReqHandler$2.run(BaseReqHandler.java:74)
      at java.lang.Thread.run(Thread.java:756)
      
      "TDDL-Druid-ConnectionPool-DestroyScheduler--2-thread-237" #4052 daemon prio=5 os_prio=0 tid=xxx nid=xxx waiting on condition [0x00002ad462e00]
         java.lang.Thread.State: TIMED_WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)
      - parking to wait for  <0x00000007436c7190> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
      at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:1129)
      at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:809)
      at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)

Troubleshooting slow jobs

Enable Tracing Analysis in the Professional Edition. For more information, see Integrate Tracing Analysis.

Job instance limit is reached

  • Symptom:

    On the Job Management page, clicking Run once displays the prompt: The number of running job instances has reached the upper limit. Please try again later.

  • Possible cause:

    • An instance of this job is already running.

    • The number of running job instances has reached the maximum concurrency configured for the job.

  • Solution:

    • If the concurrency setting is appropriate, no action is required. On the Job Management page, you can view the running job instances by clicking More > Historical records.

    • If the concurrency limit is too restrictive, click Edit in the Actions column for the job and increase the Instance concurrency in the advanced settings.

Unfinished jobs: Queued or skipped?

  • By default, instance concurrency is 1, meaning jobs run serially. If a long-running job has not finished by its next scheduled time, the new run is discarded, not queued.

  • If the instance concurrency is set to 2, and the previous run has not finished, another instance can still be started at the next scheduled time. A maximum of two job instances can run concurrently.
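The discard-instead-of-queue behavior can be pictured as a pool of slots that a trigger either obtains immediately or gives up on. The following is a minimal model built on java.util.concurrent.Semaphore. InstanceConcurrencyModel is a hypothetical class for illustration and does not reflect how SchedulerX is implemented internally.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model of the behavior described above; not SchedulerX code.
class InstanceConcurrencyModel {

    private final Semaphore slots;

    InstanceConcurrencyModel(int instanceConcurrency) {
        this.slots = new Semaphore(instanceConcurrency);
    }

    /** Called at each scheduled time. Returns false when the run is discarded. */
    boolean trigger() {
        // tryAcquire never blocks, so a trigger without a free slot is dropped, not queued
        return slots.tryAcquire();
    }

    /** Called when a running instance finishes. */
    void finish() {
        slots.release();
    }

    public static void main(String[] args) {
        InstanceConcurrencyModel serial = new InstanceConcurrencyModel(1);
        System.out.println(serial.trigger()); // true: the first run starts
        System.out.println(serial.trigger()); // false: the previous run is still active
        serial.finish();
        System.out.println(serial.trigger()); // true: the slot is free again
    }
}
```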

Create a one-time job

To create a one-time job, select one_time as the time type. SchedulerX 2.0 supports this feature, but these jobs do not retain execution records.

View the history of a one-time job

A one_time job is automatically destroyed after it runs to prevent data accumulation, and no history is retained. If you need to save execution records, you can enable Log Service, which keeps the execution logs of all jobs for the last two weeks to facilitate troubleshooting. For more information about how to enable Log Service, see Application Management.

Configure second-level scheduling

SchedulerX supports second-level scheduling through the second_delay time type, which runs the job a specified number of seconds after the previous run completes. The cron and fix_rate time types do not support second-level scheduling.
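Because second_delay counts from the end of the previous run, the start times drift with the run duration instead of following a fixed grid. The following is a minimal timeline model of that rule. SecondDelayTimeline is a hypothetical class, and the durations are made-up sample values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical timeline model; not SchedulerX code.
class SecondDelayTimeline {

    /**
     * Computes start times (in seconds) when each run begins a fixed delay
     * after the previous run completes.
     */
    static List<Long> startTimes(long delaySeconds, long[] runDurationsSeconds) {
        List<Long> starts = new ArrayList<>();
        long clock = 0;
        for (long duration : runDurationsSeconds) {
            starts.add(clock);
            clock += duration + delaySeconds; // next start = end of this run + delay
        }
        return starts;
    }

    public static void main(String[] args) {
        // Runs take 2s, 5s and 1s; the delay is 3s
        System.out.println(startTimes(3, new long[] {2, 5, 1})); // [0, 5, 13]
    }
}
```

This is the same fixed-delay rule that ScheduledExecutorService.scheduleWithFixedDelay applies in the JDK.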

Job is not scheduled

If a standalone job is not scheduled at a specific time, check if there are any workers in the worker list and confirm that not all workers are busy. If no workers are available, troubleshoot based on the "no workers available" or "all workers are busy" state. For more information, see What to do if no workers are available? and What to do if all workers are busy?.

We recommend configuring an alarm for when no workers are available for a job. For more information, see Job Management.

Set a timeout in SchedulerX

SchedulerX supports timeouts for an entire job, but not for individual tasks. You can dynamically modify the timeout in the console. For more information, see Job Management.

Instance continues to run after being stopped

  • Symptom: An instance continues to run after it is stopped.

  • Possible cause: When a job instance is stopped, SchedulerX sends a kill message to the agent. The agent then stops dispatching new tasks and destroys the instance context and its thread pools. However, tasks already in progress are not forcibly stopped. Their threads are only interrupted, so a task that does not respond to the interrupt runs to completion.

  • Solution:

    • In most cases, no action is required. Wait for the tasks to finish.

    • If you need to immediately terminate all running tasks upon stopping the instance, you must modify your task processing logic to handle the interrupt state of the current thread.
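Handling the interrupt state usually means checking the flag between units of work and treating InterruptedException as a request to stop. The following is a minimal sketch that uses only the Java standard library. InterruptAwareTask and processItems are hypothetical names, and the sleep call stands in for your business work. In a real job, the loop would live in your process method.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical task logic; in a real job this loop would live in your process method.
class InterruptAwareTask {

    /** Processes items until done or until the thread is interrupted. Returns the progress count. */
    static int processItems(int totalItems, AtomicInteger progress) {
        for (int i = 0; i < totalItems; i++) {
            // Cooperative check: stop promptly once a stop request interrupts the thread
            if (Thread.currentThread().isInterrupted()) {
                break;
            }
            try {
                Thread.sleep(10); // stands in for one unit of business work
            } catch (InterruptedException e) {
                // Blocking calls clear the flag when they throw; restore it and stop
                Thread.currentThread().interrupt();
                break;
            }
            progress.incrementAndGet();
        }
        return progress.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger progress = new AtomicInteger();
        Thread worker = new Thread(() -> processItems(1_000, progress));
        worker.start();
        Thread.sleep(100);
        worker.interrupt(); // simulates the interrupt sent when the instance is stopped
        worker.join();
        System.out.println("Stopped after " + progress.get() + " of 1000 items");
    }
}
```

Swallowing InterruptedException without stopping or restoring the flag is the usual reason a task keeps running after the instance is stopped.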

Configure advanced job settings

For more information, see Advanced parameters for job management.

All workers are busy

On the application management page, view the instances to locate busy workers. Then, click Busy to see which metrics have exceeded the threshold.

The busy threshold is configured on the Application Management page by clicking Edit Application Group.

In the Instance Busy Configuration section of the Basic Configurations step, you can set thresholds for load5 (default: 0), Memory Usage (default: 90%), and Disk Usage (default: 95%). You can also use the Enable Busy Worker Check switch to control whether the busy check is enabled.

If a worker is busy due to high load, check if your application is deployed in a container (Kubernetes). If so, you need to configure the following two parameters. Otherwise, the collected CPU usage may be inaccurate. For more information, see Connect a Spring Boot application to SchedulerX.

| Parameter | Description | Value | Initial version |
| --- | --- | --- | --- |
| spring.schedulerx2.enableCgroupMetrics | Specifies whether to use cgroup to collect metrics for agent instances. You must manually enable this in container (Kubernetes) environments. | true/false. Default: false. | 1.2.2.2 |
| spring.schedulerx2.cgroupPathPrefix | The cgroup path within the container. | Default: /sys/fs/cgroup/cpu/. You do not need to set this parameter if this path exists. | 1.2.2.2 |
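As an illustration, the two parameters might be set as follows. This fragment assumes a Spring Boot application that reads its configuration from an application.properties file. The path shown is the default listed above.

```properties
# Collect metrics from cgroup when the agent runs in a container (Kubernetes)
spring.schedulerx2.enableCgroupMetrics=true
# Only needed when the cgroup path in your container differs from the default
spring.schedulerx2.cgroupPathPrefix=/sys/fs/cgroup/cpu/
```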

Integrate Tracing Analysis

Job scheduling supports end-to-end tracing analysis. For more information, see Integrate Tracing Analysis.

Job stuck or slow during application release

  • Symptom: Job execution is stuck or slow during an application release.

  • Possible cause: For distributed jobs, if a worker that is processing tasks goes offline, its tasks are redistributed, and the system polls to check if the worker is online. This process can slow down the overall execution.

  • Solution: Upgrade the agent to the latest version. Version 1.7.9 and later include optimizations for this issue.

Instance parameters for one-time runs

On the Job Management page, click Run once in the Actions column to execute a scheduled job one time. In the dialog box that appears, you can specify a worker and set Instance Parameters. Instance Parameters is optional and is used primarily for testing.

Instance parameters vs. job parameters

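Instance parameters and job parameters are two different concepts. Job parameters are part of the job's configuration, whereas instance parameters are specified for a single run, for example when you click Run once. Which of the two your code reads is determined by your business logic.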

Getting job or instance parameters

The following code shows how to get the parameters:

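// JSON is assumed to be fastjson; adjust the import if you use another library
import com.alibaba.fastjson.JSON;
import com.alibaba.schedulerx.common.domain.InstanceStatus;
import com.alibaba.schedulerx.worker.domain.JobContext;
import com.alibaba.schedulerx.worker.processor.JavaProcessor;
import com.alibaba.schedulerx.worker.processor.ProcessResult;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class JavaDemoProcessor extends JavaProcessor {

    private static final Logger LOGGER = LoggerFactory.getLogger("schedulerxLog");

    @Override
    public ProcessResult process(JobContext jobContext) throws InterruptedException {

        LOGGER.info(JSON.toJSONString(jobContext));
        // Get job parameters (configured on the job)
        String jobParameters = jobContext.getJobParameters();
        // Get instance parameters (set for a single run)
        String instanceParameters = jobContext.getInstanceParameters();
        LOGGER.info("Job parameters: {}", jobParameters);
        LOGGER.info("Instance parameters: {}", instanceParameters);
        return new ProcessResult(InstanceStatus.SUCCESS);
    }

}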

Retrieving instance parameters with a fallback to job parameters

The following code uses the instance parameters when they are set and falls back to the job parameters otherwise:

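import com.alibaba.schedulerx.common.domain.InstanceStatus;
import com.alibaba.schedulerx.worker.domain.JobContext;
import com.alibaba.schedulerx.worker.processor.JavaProcessor;
import com.alibaba.schedulerx.worker.processor.ProcessResult;
// StringUtils is assumed to come from Apache Commons Lang 3
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class JavaDemoProcessor extends JavaProcessor {

    private static final Logger LOGGER = LoggerFactory.getLogger("schedulerxLog");

    @Override
    public ProcessResult process(JobContext jobContext) throws InterruptedException {
        String params;
        if (StringUtils.isNotBlank(jobContext.getInstanceParameters())) {
            // Instance parameters take precedence when they are set for this run
            params = jobContext.getInstanceParameters();
        } else {
            // Otherwise fall back to the parameters configured on the job
            params = jobContext.getJobParameters();
        }
        LOGGER.info("JavaDemoProcessor params:{}", params);
        return new ProcessResult(InstanceStatus.SUCCESS);
    }

}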