YARN is a distributed resource management system and a core component of Hadoop. It manages the resources of a Hadoop cluster, and schedules and monitors the tasks that run in the cluster.

Background information

The following figure shows the architecture of YARN.
YARN consists of the following components:
  • ResourceManager: manages and schedules cluster resources and allocates resources for various types of tasks that are running on YARN.

    For a non-high availability (HA) Hadoop cluster, ResourceManager is deployed on the master node of the cluster. For an HA Hadoop cluster, ResourceManagers are deployed on all the master nodes of the cluster.

  • NodeManager: manages and monitors node resources and runs tasks on nodes.

    NodeManagers are deployed on core and task nodes of a Hadoop cluster.

  • ApplicationMaster: manages the execution of a single application.

    For example, ApplicationMaster requests resources from ResourceManager, assigns the obtained resources to the tasks of the application, and communicates with NodeManagers to start and monitor the tasks.

  • YARN client: submits tasks.

    YARN clients are deployed on master, core, and task nodes of a Hadoop cluster.

  • Job History Server: parses the metrics of a MapReduce task and displays the task running status.
  • App Timeline Server: collects the metrics of an application task and displays the task running status.
  • Web App Proxy Server: redirects requests to the URL of a task, which reduces the risk of web-based attacks.
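
The interaction among the YARN client, ResourceManager, ApplicationMaster, and NodeManagers can be illustrated with a small toy model. The following Python sketch is not the YARN API. All class names, method names, node names, and memory sizes are hypothetical and chosen only for illustration. The sketch also simplifies scheduling to a "first node with enough free memory" rule, whereas a real cluster uses a configurable scheduler.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeManager:
    """Toy stand-in for a NodeManager: tracks free memory on one node."""
    name: str
    free_mb: int
    containers: List[str] = field(default_factory=list)

    def launch(self, app_id: str, mb: int) -> str:
        # Reserve memory on this node and record the new container.
        self.free_mb -= mb
        container = f"{app_id}@{self.name}#{len(self.containers)}"
        self.containers.append(container)
        return container


class ResourceManager:
    """Toy stand-in for ResourceManager: allocates cluster resources."""

    def __init__(self, nodes: List[NodeManager]):
        self.nodes = list(nodes)

    def allocate(self, app_id: str, mb: int) -> Optional[str]:
        # Simplified rule (an assumption): use the first node that fits.
        for node in self.nodes:
            if node.free_mb >= mb:
                return node.launch(app_id, mb)
        return None  # no node can satisfy the request


class ApplicationMaster:
    """Toy stand-in for an ApplicationMaster: requests resources per task."""

    def __init__(self, app_id: str, rm: ResourceManager):
        self.app_id = app_id
        self.rm = rm

    def run_tasks(self, task_sizes_mb: List[int]) -> List[Optional[str]]:
        return [self.rm.allocate(self.app_id, mb) for mb in task_sizes_mb]


def submit(rm: ResourceManager, app_id: str,
           task_sizes_mb: List[int]) -> List[Optional[str]]:
    """Toy stand-in for a YARN client that submits an application."""
    return ApplicationMaster(app_id, rm).run_tasks(task_sizes_mb)
```

For example, calling submit(ResourceManager([NodeManager("core-1", 2048), NodeManager("task-1", 1024)]), "app1", [1024, 1024, 1024, 1024]) places two tasks on core-1 and one task on task-1, and returns None for the fourth task because no node has enough free memory left.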

Benefits

YARN in a Hadoop cluster has the following benefits:
  • By default, YARN is deployed in HA mode in an HA Hadoop cluster.
  • Operations and maintenance (O&M) are simplified.

    You can add NodeManagers, decommission NodeManagers, and perform a rolling restart on NodeManagers in the E-MapReduce (EMR) console.

  • Monitoring and alerting are supported.

    YARN can monitor various metrics and report alerts based on alert rules.

  • Graceful decommission of NodeManagers is supported.

    If graceful decommission is enabled, YARN waits for a specified period of time for the running tasks on a node to complete before it decommissions the NodeManager.
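
The graceful decommission behavior described above can be modeled as a small state machine. The following Python sketch is a simplified model, not YARN code. The Node class, the step_decommission function, and the timeout values are hypothetical. The sketch assumes that a node is decommissioned as soon as it has no running tasks, or when the waiting period elapses, whichever comes first.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of a node that hosts a NodeManager."""
    running_tasks: int
    state: str = "RUNNING"


def step_decommission(node: Node, elapsed_s: int, timeout_s: int) -> str:
    """Advance a node one step through a simplified graceful decommission.

    elapsed_s is the time since decommissioning was requested.
    timeout_s is the maximum time to wait for running tasks (assumed).
    """
    if node.state == "DECOMMISSIONED":
        return node.state
    if node.running_tasks == 0 or elapsed_s >= timeout_s:
        # Nothing left to wait for, or the waiting period is over.
        node.state = "DECOMMISSIONED"
    else:
        # Tasks are still running: keep waiting.
        node.state = "DECOMMISSIONING"
    return node.state
```

In this model, a node with two running tasks stays in the DECOMMISSIONING state until both tasks finish or the waiting period ends. Without graceful decommission, the same node would move to DECOMMISSIONED immediately and its running tasks would be interrupted.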