
SchedulerX: Sharding models for different programming languages

Last Updated: Mar 21, 2024

You can use SchedulerX to trigger jobs at scheduled times, orchestrate jobs in workflows, and update the output of jobs. SchedulerX also provides sharding models for Java, Python, Shell, and Go to support big data computing scenarios.

Background information

Sharding models include the static sharding model and the dynamic sharding model.

  • Static sharding: This model is suitable when the number of shards is fixed and known in advance. For example, you can use this model to process 1,024 sharded tables in parallel on multiple workers.

  • Dynamic sharding: This model is suitable for distributed computing scenarios in which the data volume is not known in advance. For example, you can use this model to process a large, continuously updated table in batches. SchedulerX provides its own MapReduce model as the main framework for dynamic sharding. This framework is not open source.
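The difference between the two models can be sketched in plain Java. This self-contained example is illustrative only and does not use any SchedulerX API: the class and method names, the table_NNNN naming scheme, the shard count of 8, and the modulo assignment are all assumptions. Static sharding fixes the shard count in advance and assigns each table to a shard; dynamic sharding derives the number of batches at run time from the current ID range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of static vs. dynamic sharding; not a SchedulerX API.
public class ShardingSketch {

    // Static sharding: the shard count is fixed in advance.
    // Table i (of tableCount tables) is assigned to shard (i % shardCount).
    static List<String> tablesForShard(int shardIndex, int shardCount, int tableCount) {
        List<String> tables = new ArrayList<>();
        for (int i = shardIndex; i < tableCount; i += shardCount) {
            tables.add(String.format("table_%04d", i));
        }
        return tables;
    }

    // Dynamic sharding: the number of batches is not configured up front.
    // It is derived at run time from the current ID range, so it grows as the table grows.
    // Each batch is {startId, endIdExclusive}.
    static List<long[]> batches(long minId, long maxId, long batchSize) {
        List<long[]> result = new ArrayList<>();
        for (long start = minId; start <= maxId; start += batchSize) {
            result.add(new long[] {start, Math.min(start + batchSize, maxId + 1)});
        }
        return result;
    }

    public static void main(String[] args) {
        // 1,024 tables spread over 8 shards: each shard gets 128 tables.
        List<String> shard3 = tablesForShard(3, 8, 1024);
        System.out.println(shard3.size() + " tables, first=" + shard3.get(0));
        // IDs 1 to 2,500 in batches of 1,000 produce three batches.
        System.out.println(batches(1, 2500, 1000).size() + " batches");
    }
}
```

Running main prints `128 tables, first=table_0003` and `3 batches`.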

Features

The sharding models have the following features:

  • The static sharding model is compatible with elastic-job.

  • The sharding models support Java, Python, Shell, and Go.

  • High availability: Because the sharding models are built on the MapReduce model, they also provide high availability. If a worker fails, the master worker automatically fails over its shards to other workers.

  • Traffic throttling: Because the sharding models are built on the MapReduce model, they also support traffic throttling, which limits the number of tasks that run concurrently on a single worker. For example, if you have 1,000 shards and 10 workers, you can allow each worker to run at most five shards in parallel. The remaining shards wait in a queue.

  • Automatic resharding: Because the sharding models are built on the MapReduce model, they also support automatic resharding, which automatically reruns failed tasks.
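Traffic throttling and automatic reruns can be approximated on a single worker with standard java.util.concurrent classes. The following sketch is a simplification under stated assumptions, not the SchedulerX implementation: a fixed-size thread pool caps concurrency at five shards, queued shards wait for a free thread, and a shard whose task returns false is retried on the same worker up to a hypothetical maxAttempts limit. Failover to another worker is not modeled. ThrottledShardRunner and runShards are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

// Hypothetical single-worker sketch of throttling and reruns; not SchedulerX code.
public class ThrottledShardRunner {

    // Runs shards with at most maxConcurrency shards in flight.
    // A shard whose task returns false is rerun, up to maxAttempts tries in total.
    // Returns the number of shards that still failed after all attempts.
    static int runShards(List<Integer> shardIds, int maxConcurrency, int maxAttempts,
                         IntPredicate task) {
        // A fixed-size pool caps concurrency; extra shards wait in the pool's queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrency);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (int shardId : shardIds) {
                futures.add(pool.submit(() -> {
                    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                        if (task.test(shardId)) {
                            return true; // shard succeeded
                        }
                    }
                    return false; // retries exhausted
                }));
            }
            int failed = 0;
            for (Future<Boolean> future : futures) {
                try {
                    if (!future.get()) {
                        failed++;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for shards", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Shard task threw an exception", e.getCause());
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> shards = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            shards.add(i);
        }
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ConcurrentHashMap<Integer, Integer> attempts = new ConcurrentHashMap<>();

        int failed = runShards(shards, 5, 3, shardId -> {
            // Track how many shards run at the same time.
            peak.accumulateAndGet(running.incrementAndGet(), Math::max);
            try {
                // Simulated flaky shard: fails on the first attempt, succeeds on the second.
                return attempts.merge(shardId, 1, Integer::sum) > 1;
            } finally {
                running.decrementAndGet();
            }
        });
        System.out.println("failed=" + failed + ", peak concurrency<=5: " + (peak.get() <= 5));
    }
}
```

In main, every simulated shard fails on its first attempt and succeeds on the second, so the program prints `failed=0, peak concurrency<=5: true`. A bounded thread pool is used here because it expresses "at most N in parallel, the rest wait in a queue" directly.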

You can configure the high availability and traffic throttling features in the advanced settings of a job when you create the job. For more information, see the Create a job section of the "Job management" topic and Advanced parameters for job management.

Note

Only agents of version 1.1.0 and later support the multi-language sharding models.

Create a Java sharding job

  1. Log on to the SchedulerX console.

  2. In the top navigation bar, select a region.

  3. In the left-side navigation pane, click Task Management.

  4. On the Jobs page, select the namespace in which you want to create the job from the Namespace drop-down list, and then click Create task.

  5. In the Basic configuration step of the Create task wizard, set the Execution mode parameter to Shard run and specify Sharding parameters. Then, click Next Step.

    Each sharding parameter uses the format Shard index=Shard parameter. Separate multiple sharding parameters with commas (,), or enter one sharding parameter per line. Example: Shard index 1=Shard parameter 1,Shard index 2=Shard parameter 2,...


  6. In your application code, create a class that extends JavaProcessor. Call JobContext.getShardingId() to obtain the shard index and JobContext.getShardingParameter() to obtain the corresponding sharding parameter.

    Sample code:

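    import com.alibaba.schedulerx.worker.domain.JobContext;
    import com.alibaba.schedulerx.worker.processor.JavaProcessor;
    import com.alibaba.schedulerx.worker.processor.ProcessResult;
    import org.springframework.stereotype.Component;

    // @Component registers the processor as a Spring bean.
    @Component
    public class HelloWorldProcessor extends JavaProcessor {
        @Override
        public ProcessResult process(JobContext context) throws Exception {
            // Each shard receives its own index and the parameter configured for that index.
            System.out.println("Shard index=" + context.getShardingId() + ", Shard parameter=" + context.getShardingParameter());
            // Returning true marks this shard as successful.
            return new ProcessResult(true);
        }
    }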
  7. After the job runs, go to the Instances page, find the job, and click Details in the Operation column to view the execution details of each shard.
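SchedulerX itself splits the Sharding parameters value and hands each entry to the matching shard through getShardingId() and getShardingParameter(). The following self-contained sketch only illustrates the expected input format: entries of the form index=parameter, separated by commas or line breaks. The class name, the method name, the sample values (hangzhou, shanghai, beijing), and the assumption that shard indexes are numeric are hypothetical. This simplified parser also does not support commas inside parameter values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser that mirrors the documented "index=parameter" format.
public class ShardingParameterParser {

    // Accepts "1=a,2=b" or one "index=parameter" entry per line and returns
    // a map from shard index to sharding parameter, in input order.
    static Map<Long, String> parse(String text) {
        Map<Long, String> shards = new LinkedHashMap<>();
        for (String entry : text.split("[,\\r\\n]+")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines and leading separators
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected index=parameter: " + trimmed);
            }
            long index = Long.parseLong(trimmed.substring(0, eq).trim());
            shards.put(index, trimmed.substring(eq + 1).trim());
        }
        return shards;
    }

    public static void main(String[] args) {
        String commaSeparated = "1=hangzhou,2=shanghai,3=beijing";
        String onePerLine = "1=hangzhou\n2=shanghai\n3=beijing";
        System.out.println(parse(commaSeparated));
        System.out.println(parse(commaSeparated).equals(parse(onePerLine)));
    }
}
```

Running main prints `{1=hangzhou, 2=shanghai, 3=beijing}` followed by `true`, showing that the comma-separated and one-per-line forms describe the same shards.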

Create a Python sharding job

To use a Python application for distributed batch processing, you only need to install the SchedulerX agent. The scripts themselves are maintained in SchedulerX.

  1. Download and deploy the SchedulerX agent, which runs script jobs.

  2. Create a Python sharding job in SchedulerX. For more information, see the Create a job section of the "Job management" topic.

    In the script, sys.argv[1] is the shard index and sys.argv[2] is the sharding parameter.

    Each sharding parameter uses the format Shard index=Shard parameter. Separate multiple sharding parameters with commas (,), or enter one sharding parameter per line. Example: Shard index 1=Shard parameter 1,Shard index 2=Shard parameter 2,...

  3. After the job runs, go to the Instances page, find the job, and click Details in the Operation column to view the execution details of each shard.

Create a Shell sharding job or a Go sharding job

You can create Shell sharding jobs and Go sharding jobs in the same way as Python sharding jobs. For more information, see the Create a Python sharding job section of this topic.