Capacity Scheduler is a built-in scheduler in Apache YARN. The YARN service that is deployed in an E-MapReduce (EMR) cluster uses Capacity Scheduler as the default scheduler. Capacity Scheduler is a multi-tenant, hierarchical resource scheduler. Each child queue is allocated resources based on its configured capacity.

Prerequisites

An EMR Hadoop cluster is created. For more information, see Create a cluster.

Precautions

After you turn on Enable Resource Queue, you can no longer configure cluster resources on the capacity-scheduler tab in the Service Configuration section of the Configure tab on the YARN service page. Existing configurations are synchronized to the Cluster Resources page. If you want to configure cluster resources on the Configure tab of the YARN service page, turn off Enable Resource Queue on the Cluster Resources page.

Configure Capacity Scheduler

  1. Go to the Cluster Resources page.
    1. Log on to the Alibaba Cloud EMR console.
    2. In the top navigation bar, select the region where your cluster resides. The region of a cluster cannot be changed after the cluster is created.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page, find your cluster and click Details in the Actions column.
    5. In the left-side navigation pane of the page that appears, click Cluster Resources.
  2. Enable Capacity Scheduler.
    1. On the Cluster Resources page, turn on Enable Resource Queue.
    2. Select Capacity Scheduler for Queue Type.
      Note If you use the cluster resource management feature for the first time, the Queue Type parameter is set to Capacity Scheduler by default.
    3. Click Save.
  3. In the upper-right corner of the Cluster Resources page, click Queue Settings.
  4. Configure queue information on the Queue Settings tab.
    • Find your queue and click Edit in the Actions column to modify resource queue information.
    • Find your queue and choose More > Create Child Queue in the Actions column to create a child queue.

      You cannot create a child queue for the default queue.

    root is a level-1 queue. It is the parent queue of all other queues and manages all resources of YARN. By default, only the default queue is available within the root queue.
    Notice
    • The Capacity values of all child queues at the same level within the same parent queue are percentages and must add up to 100. For example, if the root queue contains two child queues, default and department, the Capacity values of default and department must add up to 100. If the department queue contains the child queues market and dev, the Capacity values of market and dev must also add up to 100.
    • If you do not specify a queue when you submit a job, the job is submitted to the default queue.
    • After you create a level-2 queue within the root queue, you only need to click Deploy to make the configuration take effect.
    • After you create or modify a level-3 queue within the root queue, you must restart ResourceManager to make the configuration take effect.
    • You must restart ResourceManager after you modify the name of a queue.
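
The capacity rule in the preceding notice can be checked before you enter values in the console. The following Python snippet is a minimal sketch, not part of EMR or YARN: the queue names come from the example above, while the capacity numbers, the `QUEUES` dictionary, and the `invalid_parents` function are hypothetical and exist only for illustration.

```python
# Hypothetical queue tree that mirrors the example in the notice.
# Keys are parent queues; values map each child queue to its Capacity (%).
# The numbers are made up for illustration.
QUEUES = {
    "root": {"default": 40, "department": 60},
    "root.department": {"market": 50, "dev": 50},
}


def invalid_parents(queues):
    """Return {parent: total} for parents whose child capacities do not sum to 100."""
    bad = {}
    for parent, children in queues.items():
        total = sum(children.values())
        if abs(total - 100) > 1e-6:
            bad[parent] = total
    return bad


if __name__ == "__main__":
    # An empty dict means every level satisfies the rule.
    print(invalid_parents(QUEUES))
```

If a parent queue is reported, adjust the Capacity values of its child queues until they add up to 100 before you save the queue settings.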

Switch the scheduler type

After Enable Resource Queue is turned on, you can perform the following steps to switch the scheduler type:
Notice After the switchover is complete, you must restart ResourceManager to make the configuration take effect.
  1. In the upper-right corner of the Cluster Resources page, click Select Scheduler.
  2. Select the required scheduler for Queue Type.
  3. Click Save.
  4. Restart ResourceManager.
    1. In the upper-right corner of the Cluster Resources page, choose Actions > Restart ResourceManager.
    2. In the Cluster Activities dialog box, configure the parameters and click OK.
    3. In the Confirm message, click OK.
      When a success message appears, the scheduler type is switched.

Disable the cluster resource management feature

Note After you disable the cluster resource management feature, you cannot perform operations on the Cluster Resources page. If you want to use the cluster resource management feature again, turn on Enable Resource Queue on the Cluster Resources page or configure the xml-direct-to-file-content parameter on the capacity-scheduler tab in the Service Configuration section of the Configure tab on the YARN service page.
  1. On the Cluster Resources page, turn off Enable Resource Queue.
  2. In the Disable Resource Queue dialog box, click OK.

Submit a job

  • If you do not specify a queue when you submit a job, the job is submitted to the default queue.
  • If you specify a queue, it must be a child queue. Jobs cannot be submitted to a parent queue.
  • You must use the mapreduce.job.queuename parameter to specify the queue to which you want to submit a job. Example:
    `hadoop jar /usr/lib/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar pi -Dmapreduce.job.queuename=test 2 2`
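
The submission rules above can be wrapped in a small helper that refuses parent queues and assembles the same command. This is a sketch under stated assumptions: the JAR path and the `pi` arguments are copied from the example above, while `PARENT_QUEUES` and `build_pi_command` are hypothetical names introduced here, and the set of parent queues must be maintained by you to match your own queue settings.

```python
# JAR path copied from the example command above.
EXAMPLES_JAR = (
    "/usr/lib/hadoop-current/share/hadoop/mapreduce/"
    "hadoop-mapreduce-examples-2.8.5.jar"
)

# Hypothetical set of parent queues; jobs cannot be submitted to these.
PARENT_QUEUES = {"root", "department"}


def build_pi_command(queue, parent_queues=PARENT_QUEUES, maps=2, samples=2):
    """Build the argument list for the pi example, targeting a child queue."""
    if queue in parent_queues:
        raise ValueError(f"'{queue}' is a parent queue; specify a child queue")
    return [
        "hadoop", "jar", EXAMPLES_JAR, "pi",
        f"-Dmapreduce.job.queuename={queue}",  # queue that receives the job
        str(maps), str(samples),
    ]


if __name__ == "__main__":
    # Prints the command line; pass the list to subprocess.run to execute it
    # on a node where the hadoop client is installed.
    print(" ".join(build_pi_command("test")))
```

Returning a list rather than a single string keeps the arguments separate, so the result can be passed directly to `subprocess.run` without shell quoting.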