Configure a scheduler for an E-HPC cluster

A scheduler distributes the jobs in a cluster, plans job priorities, and allocates compute node resources, such as vCPUs, memory, and nodes, on demand. You can estimate the node resources that a job needs and its completion time based on the job size, and then configure the scheduler parameters of the cluster to improve resource utilization. This topic describes how to configure scheduler parameters in the Elastic High Performance Computing (E-HPC) console.

Limits

The scheduler of the cluster must be Slurm or PBS. The E-HPC console does not support other schedulers.

Procedure

1. Open the Scheduler page.
   a. Log on to the E-HPC console.
   b. In the top navigation bar, select a region.
   c. In the left-side navigation pane, choose Resource Management > Scheduler.
2. Select a cluster from the Cluster drop-down list and a scheduler from the Scheduler drop-down list.
3. Configure the scheduler information and click Submit in the upper-right corner.

Slurm

A Slurm scheduler supports the following parameters:

Parameter

Description

Primary Scheduling Cycle

The scheduling timer, which indicates how often scheduling is initiated.

For example, a cluster contains only one node, which has 1 vCPU, and the primary scheduling cycle is set to 20s. You submit Job A and Job B, each of which requires 1 vCPU and takes 30s to run. The jobs then run as follows:

0s: Scheduling starts. Job A starts running and Job B is pending.
20s: Scheduling is triggered, but no idle resources can be assigned to Job B. Therefore, Job A is still running, and Job B is still pending.
30s: Job A is complete and its resources become idle. However, scheduling is not triggered at this point, so Job B cannot obtain the idle resources and is still pending.
40s: The scheduling is triggered again. Job B starts running.
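The timeline above can be reproduced with a small simulation. This is a minimal sketch rather than the scheduler's actual algorithm: it assumes that all jobs are submitted at 0s, that jobs start only on scheduling ticks, and that jobs are considered strictly in submission order. The names `Job` and `simulate_cycle` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    vcpus: int
    runtime: int  # seconds


def simulate_cycle(jobs, total_vcpus, cycle):
    """Return {job name: (start, end)} when jobs may start only on scheduling ticks."""
    if any(job.vcpus > total_vcpus for job in jobs):
        raise ValueError("a job requests more vCPUs than the node has")
    pending = list(jobs)  # submission order; all jobs are submitted at t=0
    running = []          # (end_time, vcpus) of jobs that have started
    schedule = {}
    t = 0
    while pending:
        # Release the vCPUs of jobs that have finished by this tick.
        running = [(end, v) for end, v in running if end > t]
        free = total_vcpus - sum(v for _, v in running)
        # Strict order: stop at the first job that does not fit.
        while pending and pending[0].vcpus <= free:
            job = pending.pop(0)
            free -= job.vcpus
            running.append((t + job.runtime, job.vcpus))
            schedule[job.name] = (t, t + job.runtime)
        t += cycle  # nothing happens until the next scheduling tick
    return schedule


# The example from the text: one node with 1 vCPU, a 20s cycle, two 30s jobs.
print(simulate_cycle([Job("A", 1, 30), Job("B", 1, 30)], total_vcpus=1, cycle=20))
# {'A': (0, 30), 'B': (40, 70)}
```

Job A finishes at 30s, but Job B starts only at the 40s tick. In this model, the 10s gap is the cost of a 20s cycle.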

Backfill Scheduling Cycle

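The backfill scheduling timer, which indicates how often backfill scheduling is initiated. When backfill scheduling is triggered, the scheduler does not strictly follow the priority order. Instead, it preferentially starts lightweight jobs that fit into idle resources to keep CPU utilization high.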

For example, a cluster contains only one node, which has 8 vCPUs, and the backfill scheduling cycle is set to 10s. You submit high-priority Job A and Job B, each of which requires 6 vCPUs and takes 60 minutes to run, and then submit low-priority Job C, which requires 2 vCPUs and takes 40 minutes to run. The jobs then run as follows:

0s: Scheduling starts. Job A starts running, and Job B and Job C are pending. Job B has a higher priority than Job C, so Job C does not start even though enough idle resources are available for it, because backfill scheduling has not been triggered yet.
10s: Backfill scheduling is triggered. The scheduler determines that lightweight Job C can run ahead of high-priority Job B to achieve higher CPU utilization. As a result, Job A is running, Job B is pending, and Job C is running.
40 min: Job A is running, Job B is pending, and Job C is complete.
60 min: Job A is complete, Job B is running, and Job C is complete.
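The backfill timeline above can be sketched in the same way. This is a simplified model under stated assumptions, not the scheduler's implementation: the primary pass runs in strict priority order, the backfill pass starts any pending job that fits into the idle vCPUs, and the first backfill pass runs one backfill cycle after 0s. The 20s primary cycle is an assumption, because the example does not state it. Real backfill schedulers typically also check that a backfilled job does not delay higher-priority jobs, which this sketch does not model. `Job` and `simulate_backfill` are hypothetical names.

```python
from dataclasses import dataclass
from math import gcd


@dataclass
class Job:
    name: str
    vcpus: int
    runtime: int   # seconds
    priority: int  # larger value = higher priority


def simulate_backfill(jobs, total_vcpus, primary_cycle, backfill_cycle):
    """Return {job name: start time in seconds} for a single-node cluster."""
    if any(job.vcpus > total_vcpus for job in jobs):
        raise ValueError("a job requests more vCPUs than the node has")
    # Stable sort: submission order is kept among jobs of equal priority.
    pending = sorted(jobs, key=lambda job: -job.priority)
    running = []  # (end_time, vcpus) of jobs that have started
    starts = {}
    step = gcd(primary_cycle, backfill_cycle)
    t = 0
    while pending:
        # Release the vCPUs of jobs that have finished by this tick.
        running = [(end, v) for end, v in running if end > t]
        free = total_vcpus - sum(v for _, v in running)
        if t % primary_cycle == 0:
            # Primary pass: strict priority order, stop at the first misfit.
            while pending and pending[0].vcpus <= free:
                job = pending.pop(0)
                free -= job.vcpus
                running.append((t + job.runtime, job.vcpus))
                starts[job.name] = t
        if t > 0 and t % backfill_cycle == 0:
            # Backfill pass: any pending job that fits may start.
            for job in list(pending):
                if job.vcpus <= free:
                    pending.remove(job)
                    free -= job.vcpus
                    running.append((t + job.runtime, job.vcpus))
                    starts[job.name] = t
        t += step
    return starts


jobs = [
    Job("A", 6, 60 * 60, priority=2),
    Job("B", 6, 60 * 60, priority=2),
    Job("C", 2, 40 * 60, priority=1),
]
print(simulate_backfill(jobs, total_vcpus=8, primary_cycle=20, backfill_cycle=10))
# {'A': 0, 'C': 10, 'B': 3600}
```

Job C starts at 10s and uses the 2 vCPUs that Job A leaves idle. Job B starts at 3600s (60 minutes), after Job A is complete.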

PBS

A PBS scheduler supports the following parameters:

Section	Parameter	Description
Global Configurations	Time reserved for historical assignments	The retention period of historical jobs. After the retention period expires, the data of the jobs is destroyed.
	Scheduling Cycle	The interval at which scheduling is triggered. If no other operation (such as submitting a job or restarting the scheduling service) triggers scheduling, scheduling is triggered once per scheduling cycle.
	Maximum Jobs	The maximum number of jobs that are allowed for the cluster.
	Maximum Queuing Jobs	The maximum number of queuing jobs that are allowed in the cluster.
Queue Configurations	Queue	Select the queue that you want to configure.
	Resource Limits	Click Add Limit and configure the following limits: User (the users that you want to limit); CPU (the maximum number of vCPUs that the user can use in the selected queue); Memory (the maximum amount of memory that the user can use in the selected queue, for example, `1 gb` or `200 mb`); Node (the maximum number of nodes that the user can use in the selected queue); Maximum Jobs (the maximum number of jobs that the user can submit in the selected queue).
	User Mapping	Click Add New User. In the dialog box that appears, select a user and click OK. Note: By default, the selected queue can be used by all users of the cluster. After you set user mapping, the selected queue can be used only by the selected users.
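To make the Resource Limits row more concrete, the following sketch shows how a per-user limit set could be compared against a resource request. It is purely illustrative: the scheduler enforces the limits itself, and the function names, dictionary keys, and the 1024-based interpretation of `mb` and `gb` are assumptions made for this example.

```python
import re

# Assumed binary multiples for size suffixes.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_size(text):
    """Convert a size string such as '1 gb' or '200 mb' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmgt]?b)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def exceeded_limits(limits, request):
    """Return the names of the limits that the request exceeds."""
    over = []
    for key, limit in limits.items():
        value = request.get(key)
        if value is None:
            continue  # the request does not use this resource
        if key == "memory":
            limit, value = parse_size(limit), parse_size(value)
        if value > limit:
            over.append(key)
    return over


# Hypothetical limits for one user in one queue, and a request to check.
limits = {"cpu": 16, "memory": "1 gb", "node": 2, "max_jobs": 10}
request = {"cpu": 8, "memory": "200 mb", "node": 3, "max_jobs": 1}
print(exceeded_limits(limits, request))
# ['node']
```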