SchedulerX: FAQ about job management

Last Updated: Jan 23, 2024

This topic provides answers to some commonly asked questions about job management in SchedulerX.

What do I do if a job fails and the error message "submit jobInstanceId to worker timeout" is returned?

If this issue occurs during application release or occurs occasionally, you can ignore it.

If this issue persists and the reported errors all show the same workAddr value, the persistent connection between the server and the agent on that worker has been disconnected. Restart the worker node or upgrade the SchedulerX agent to the latest version. The persistent connection then recovers automatically.

What do I do if a job fails and the error message "used space beyond 90.0%!" is returned?

The disk usage on the worker exceeds 90%. Free up disk space on the Elastic Compute Service (ECS) instance or container.

What do I do if a job fails and the error message "ClassNotFoundException" is returned?

This error message indicates that the worker that runs the job cannot find the class. Make sure that the value of the Processor class name parameter configured for the Java job is the fully qualified name of the class, including the package.

If the Processor class name is correct, the class does not exist on the worker. The common causes are that the wrong package was deployed or that the application is connected to another worker. You can log on to the worker and decompile the deployed package to check whether it contains the class.
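
The difference between a simple class name and a fully qualified one can be illustrated with a general JVM lookup. The following is a minimal sketch, not SchedulerX code: the class ClassNameCheck and the method isLoadable are hypothetical names, and it only assumes that a class is resolved by name from the classpath in a way comparable to Class.forName.

```java
final class ClassNameCheck {

    /** Returns true if the given class name can be loaded from the current classpath. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // The same exception type that appears in the job error message.
            return false;
        }
    }

    public static void main(String[] args) {
        // A fully qualified name (package + class) can be resolved.
        System.out.println(isLoadable("java.util.ArrayList"));
        // The simple name alone cannot, even though the class exists.
        System.out.println(isLoadable("ArrayList"));
    }
}
```

Running a check like this on the worker with the configured class name can help tell a naming mistake apart from a missing package.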

What do I do if a job fails and the error message "jobInstance=xxx don't update progress more than 60s" is returned?

If a worker that is running a job fails to function normally and does not update the progress for more than 60 seconds, the server forcibly terminates the job instance on that worker. If the issue is caused by a worker failure or the worker no longer exists, you can ignore the issue.

What do I do if a job fails and no error message is returned?

Problem description

A job fails to run, and no error message is returned.

Cause

The possible causes include worker or business logic failures.

Solution

  • On the Execution List page, find the desired job and click Details in the Operation column. In the Task instance details panel, view the details of the task and check the failed worker.

    If no task details are available, the job is a simple job. The value of the WorkAddr parameter on the Basic information tab indicates the worker that runs the job.

  • Log on to the worker and open the ~/logs/schedulerx/worker.log file.

    Run grep <instance ID> worker.log to view the logs related to the instance. If an error occurs, check the stack trace of the error.

  • If the error description is empty, the issue is caused by a business logic failure that returned no error message. In this case, troubleshoot the business logic.
  • If the error description indicates a framework exception, join the DingTalk group (ID: 23103656) to contact SchedulerX technical support.

How do I identify the cause of a job failure?

  • If an exception is reported for a standalone job, open the Execution List page and click the Task instance List tab. Then, find the job and click Details in the Operation column to view the error message.
  • If no exception is reported or the job is a distributed job, and you use SchedulerX Professional Edition, use Log Service for troubleshooting.
  • In Basic Edition, log on to the worker node and view the SchedulerX logs and the business logs for troubleshooting.

What do I do if a job is stuck during running?

Problem description

The job remains in the running state and cannot be completed.

Cause

  • A business issue may occur.
  • A SchedulerX issue may occur.

Solution

For a business issue, perform the following operations. For other issues, join the DingTalk group (ID: 23103656) to contact SchedulerX technical support.

  • In Professional Edition, you can use the View Stack feature in the console to query the stack and troubleshoot the job exception. This feature requires agent version 1.4.2 or later.
  • In Basic Edition, you can log on to the worker node on which the job is stuck and run the jstack command to view the stack.
    jstack <pid> | grep <job instance ID> -A 20

How do I troubleshoot slow job running?

Use Professional Edition and enable the Tracing Analysis feature. For more information, see Integrate tracing analysis.

What do I do if the number of instances that run a job reaches the upper limit?

Problem description

When I click Run once in the Operation column of a job on the Task Management page, the system returns a message indicating that the number of instances that run the job has reached the upper limit.

Cause

  • Instances of the job are already running.
  • The number of running job instances has reached the value of the Instance concurrency parameter configured for the job.

Solution

If the configured value of the Instance concurrency parameter is proper, you can ignore this issue. You can open the Task Management page, find the desired job, click the more icon in the Operation column, and click Historical records to view the job instances that are running.

If the configured value of the Instance concurrency parameter is improper, open the Task Management page, find the desired job, and click Edit in the Operation column. Then, set the Instance concurrency parameter in Advanced Configuration.

If a job does not complete as scheduled, will the job be queued to run next time or not?

The default value of the Instance concurrency parameter is 1, which indicates that only one instance of a job can run at a time. If a run takes a long time and has not completed by the next scheduling time, the next run is discarded. It is neither run nor queued.

If the Instance concurrency parameter is set to 2, a new instance can start at the next scheduling time even if the previous instance has not completed, because up to two instances of the job can run at the same time.
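
The discard-instead-of-queue behavior described above can be modeled with a counting semaphore. This is a minimal sketch of the idea only, not how SchedulerX is implemented. The class InstanceConcurrencyGate and its methods are hypothetical names used for illustration.

```java
import java.util.concurrent.Semaphore;

/** Models an instance concurrency cap: triggers beyond the cap are discarded, not queued. */
final class InstanceConcurrencyGate {

    private final Semaphore slots;

    InstanceConcurrencyGate(int instanceConcurrency) {
        this.slots = new Semaphore(instanceConcurrency);
    }

    /** Called at each scheduling time. Returns false if the trigger is discarded. */
    boolean tryStart() {
        // tryAcquire() never blocks, so a rejected trigger is dropped rather than queued.
        return slots.tryAcquire();
    }

    /** Called when a running instance completes and frees its slot. */
    void finish() {
        slots.release();
    }

    public static void main(String[] args) {
        InstanceConcurrencyGate gate = new InstanceConcurrencyGate(1);
        System.out.println(gate.tryStart()); // first trigger starts an instance
        System.out.println(gate.tryStart()); // previous instance still running: discarded
        gate.finish();
        System.out.println(gate.tryStart()); // slot is free again: starts
    }
}
```

With a cap of 2, the second call to tryStart() in this model would succeed and only a third overlapping trigger would be discarded.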

How do I create a one-time job?

SchedulerX 2.0 allows you to create one-time jobs by setting the Time type parameter to one_time. No execution records are kept for one-time jobs.

How do I view the historical records of a one-time job after it is run?

After a one-time job is run, the job is automatically destroyed and no historical records are kept. This prevents data accumulation. If you want to save historical records, you can enable Log Service to retain the execution logs of all jobs in the last two weeks for troubleshooting. For more information about how to enable Log Service, see Application management.

How do I perform second-delay scheduling?

SchedulerX supports second-delay scheduling. To use it, set the Time type parameter of the job to second_delay. The job then runs repeatedly, and each run starts after the configured delay following the previous run. Jobs whose Time type parameter is set to cron or fix_rate do not support second-delay scheduling.
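
The following sketch contrasts the two timing models by computing start times. It assumes that second_delay measures the delay from the end of the previous run and that fix_rate triggers at a fixed period regardless of run time. Both are assumptions made for illustration and are not confirmed by this topic. The class DelayScheduleSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DelayScheduleSketch {

    /** Start times in seconds when each run begins delaySeconds after the previous run ends. */
    static List<Long> secondDelayStarts(long firstStart, long runSeconds, long delaySeconds, int runs) {
        List<Long> starts = new ArrayList<>();
        long start = firstStart;
        for (int i = 0; i < runs; i++) {
            starts.add(start);
            // The next run waits for this run to finish, then for the delay.
            start = start + runSeconds + delaySeconds;
        }
        return starts;
    }

    /** Start times in seconds when runs are triggered every periodSeconds, independent of run time. */
    static List<Long> fixedRateStarts(long firstStart, long periodSeconds, int runs) {
        List<Long> starts = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            starts.add(firstStart + i * periodSeconds);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Each run takes 3 seconds, with a 5-second delay between runs.
        System.out.println(secondDelayStarts(0, 3, 5, 3)); // [0, 8, 16]
        // A fixed 5-second period for comparison.
        System.out.println(fixedRateStarts(0, 5, 3));      // [0, 5, 10]
    }
}
```

Under these assumptions, a delay-based schedule never overlaps with itself, because the run time is always added before the next start.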

What do I do if a job is not scheduled at a point in time?

If a standalone job is not scheduled at a point in time, check whether workers exist in the worker list and whether all workers are busy. If no worker is available, troubleshoot by performing the operations in What do I do if the system displays the message "no workers available"?. If all workers are busy, troubleshoot by performing the operations in What do I do if all workers are busy?.

We recommend that you turn on the No machine alarm available switch. For more information, see Job management.

How do I set the timeout period in SchedulerX?

SchedulerX allows you to set the timeout period for jobs but not for tasks. You can change the timeout period in the console. For more information, see Job management.

Why is a job still run after the instance is stopped?

Problem description

A job continues to run after its instance is stopped.

Cause

After a job instance is stopped, SchedulerX sends a Kill message to the agent. After receiving the Kill message, the agent stops delivering and running tasks that are not executed, destroys the context of the instance, and destroys all thread pools of the instance. However, the tasks that are being executed are not stopped, and only the corresponding threads are interrupted. Therefore, the tasks will continue to run until they are complete.

Solution

  • In most cases, you can wait until the tasks are completed and do not need to handle the issue.
  • If you want to stop all running tasks immediately after you stop an instance, modify the task processing logic so that it checks the interrupt status of the current thread and exits when the thread has been interrupted.
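
One way to make task logic respond to the interrupt is to check the interrupt flag of the current thread between units of work. The following is a minimal sketch in plain Java that does not use the SchedulerX API. The class InterruptAwareTask and the method processItems are hypothetical, and the counter increment stands in for real business logic.

```java
final class InterruptAwareTask {

    /** Processes up to total items and stops early if the current thread is interrupted. */
    static int processItems(int total) {
        int processed = 0;
        for (int i = 0; i < total; i++) {
            // Check the flag between units of work so that a stop request can take effect.
            if (Thread.currentThread().isInterrupted()) {
                break;
            }
            processed++; // placeholder for one unit of business logic
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            int done = processItems(Integer.MAX_VALUE);
            System.out.println("stopped after " + done + " items");
        });
        worker.start();
        Thread.sleep(50);
        // Stands in for the interrupt that is issued when the instance is stopped.
        worker.interrupt();
        worker.join();
    }
}
```

If the business logic calls blocking methods such as Thread.sleep, those methods throw InterruptedException when the thread is interrupted, and the task can treat that exception as a signal to exit as well.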

How do I set advanced configurations for job management?

For more information, see Advanced parameters for job management.

What do I do if all workers are busy?

On the Application Management page, find the desired application and click View instances in the Operation column. Then, you can identify workers in the Busy state, and move the pointer over Busy to view the metrics that exceed the threshold.

To configure the thresholds, open the Application Management page, find the desired application, click Edit in the Operation column, and set the parameters in the Edit application grouping panel.

If a worker is busy because the load exceeds the specified threshold, check whether the application is deployed in a Kubernetes container. If you deploy the application in a Kubernetes container, configure the following parameters. Otherwise, the collected CPU utilization may be incorrect. For more information, see Connect a Spring Boot application to SchedulerX.

| Key | Description | Value | Initial version |
| --- | --- | --- | --- |
| spring.schedulerx2.enableCgroupMetrics | Specifies whether to use control groups (cgroups) to collect the metrics of the agent. You must manually enable this feature in a Kubernetes environment. | Valid values: true and false. Default value: false. | 1.2.2.2 |
| spring.schedulerx2.cgroupPathPrefix | Specifies the cgroup path in the container. | The default path is /sys/fs/cgroup/cpu/. If the path already exists, you do not need to configure this parameter. | 1.2.2.2 |
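
For a Spring Boot application, these keys would typically go into the application configuration file. The following fragment is a sketch that assumes an application.properties file. The file name and the decision to set the path explicitly are assumptions, and the values are taken from the parameter descriptions above.

```properties
# Collect agent metrics from cgroups. Must be enabled manually in Kubernetes (default: false).
spring.schedulerx2.enableCgroupMetrics=true
# Optional. This is the default path; omit the line if the path already exists in the container.
spring.schedulerx2.cgroupPathPrefix=/sys/fs/cgroup/cpu/
```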

How do I integrate Tracing Analysis?

SchedulerX supports the end-to-end tracing analysis feature. For more information, see Integrate tracing analysis.

What do I do when the running of a job gets stuck or slows down during application release?

Problem description

During application release, the running of a job is stuck or slows down.

Cause

For a distributed job, if a worker that processes a task goes offline, SchedulerX redistributes the task and performs polling to identify online workers. This slows down the entire process.

Solution

Upgrade the agent to version 1.7.9 or later, in which this behavior is optimized.

What do I do when the system asks me to enter instance parameters after I click Run once?

When you click Run once in the Operation column of a scheduling job on the Task Management page, the job is run once. The Instance Parameters field in the displayed dialog box is optional and is used for testing.

Does the code obtain instance parameters or job parameters when I click Run once and enter instance parameters?

Instance parameters are different from job parameters. Which one your code obtains depends on which one your business code reads: jobContext.getJobParameters() or jobContext.getInstanceParameters().

How do I obtain job parameters or instance parameters?

The following sample code shows how to obtain job parameters and instance parameters:

@Component
public class JavaDemoProcessor extends JavaProcessor {

    private static final Logger LOGGER = LoggerFactory.getLogger("schedulerxLog");

    @Override
    public ProcessResult process(JobContext jobContext) throws InterruptedException {

        LOGGER.info(JSON.toJSONString(jobContext));
        // Obtain job parameters.
        String jobParameters = jobContext.getJobParameters();
        // Obtain instance parameters.
        String instanceParameters = jobContext.getInstanceParameters();
        LOGGER.info("Job parameters: " + jobParameters);
        LOGGER.info("Instance parameters: " + instanceParameters);
        return new ProcessResult(InstanceStatus.SUCCESS);
    }

}

How do I allow the code to obtain instance parameters by default if I specify instance parameters and obtain job parameters if I do not specify instance parameters?

The following sample code shows one way to implement this fallback. It uses the instance parameters if they are specified and the job parameters otherwise:

@Component
public class JavaDemoProcessor extends JavaProcessor {

    private static final Logger LOGGER = LoggerFactory.getLogger("schedulerxLog");

    @Override
    public ProcessResult process(JobContext jobContext) throws InterruptedException {
        String params = null;
        if (StringUtils.isNotBlank(jobContext.getInstanceParameters())) {
            params = jobContext.getInstanceParameters();
        } else {
            params = jobContext.getJobParameters();
        }
        LOGGER.info("JavaDemoProcessor params:{}", params);
        return new ProcessResult(InstanceStatus.SUCCESS);
    }

}