
E-MapReduce:Configurations of YARN resources

Last Updated:Mar 19, 2024

For E-MapReduce (EMR) clusters whose minor version is earlier than V3.49.0 or V5.15.0, fixed default heap memory sizes are used for YARN components. For EMR clusters of minor version V3.49.0 or later, or V5.15.0 or later, the default heap memory sizes for YARN components are dynamically adjusted based on the instance types and the services deployed in the cluster. This topic describes the heap memory configurations for YARN components and the cluster resource configurations for YARN.

Note
  • After an EMR cluster is initialized, if the memory allocated to YARN components is too small, check whether too many services are deployed in the cluster. EMR allocates resources based on the services deployed in a cluster, so deploying a large number of services can reduce the memory available to YARN components. Also check whether the specifications of the Elastic Compute Service (ECS) instances in the node group are too low to meet the memory requirements of the deployed services.

  • You can adjust the parameter settings of YARN components in the EMR console after an EMR cluster is created.

Configurations of the heap memory size for YARN components

On the Configure tab of the YARN service page in the EMR console, configure the parameters. The following table describes the parameters.

| Component name | Configuration file | Parameter | Effective scope | Remarks |
| --- | --- | --- | --- | --- |
| ResourceManager | yarn-env.sh | YARN_RESOURCEMANAGER_HEAPSIZE | Cluster | The minimum value is 1024. If a large number of small jobs exist, you can increase the heap memory size. If you increase the heap memory size, you must restart the ResourceManager component to make the modification take effect. |
| NodeManager | yarn-env.sh | YARN_NODEMANAGER_HEAPSIZE | Cluster | If full garbage collection (GC) occurs because the Shuffle Service component occupies a large amount of NodeManager memory, you can increase the heap memory size. If you increase the heap memory size, you must restart the NodeManager component to make the modification take effect. |
| WebAppProxyServer | yarn-env.sh | YARN_PROXYSERVER_HEAPSIZE | Cluster | If you adjust the value of this parameter, you must restart the WebAppProxyServer component to make the modification take effect. |
| TimelineServer | yarn-env.sh | YARN_TIMELINESERVER_HEAPSIZE | Cluster | If you adjust the value of this parameter, you must restart the TimelineServer component to make the modification take effect. |
| TimelineServer | yarn-env.sh | -XX:MaxDirectMemorySize in YARN_TIMELINESERVER_OPTS | Cluster | The maximum size of the direct memory for TimelineServer. The minimum value is 512m. If you adjust the value of this parameter, you must restart the TimelineServer component to make the modification take effect. |
| MRHistoryServer | mapred-env.sh | HADOOP_JOB_HISTORYSERVER_HEAPSIZE | Cluster | If you adjust the value of this parameter, you must restart the MRHistoryServer component to make the modification take effect. |
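As a sketch, the heap parameters above map to environment variables in yarn-env.sh (and mapred-env.sh for MRHistoryServer). The sizes below are illustrative placeholders only; in EMR you set these values on the Configure tab of the YARN service page rather than editing the files on cluster nodes by hand.

```shell
# Illustrative values only -- in EMR, set these on the Configure tab of the
# YARN service page; do not hand-edit yarn-env.sh on cluster nodes.

# ResourceManager heap (MB). Minimum 1024; raise it if many small jobs run.
export YARN_RESOURCEMANAGER_HEAPSIZE=4096

# NodeManager heap (MB). Raise it if the Shuffle Service triggers full GC.
export YARN_NODEMANAGER_HEAPSIZE=4096

# WebAppProxyServer and TimelineServer heaps (MB).
export YARN_PROXYSERVER_HEAPSIZE=2048
export YARN_TIMELINESERVER_HEAPSIZE=2048

# TimelineServer direct memory; the minimum value is 512m.
export YARN_TIMELINESERVER_OPTS="${YARN_TIMELINESERVER_OPTS} -XX:MaxDirectMemorySize=512m"
```

MRHistoryServer is configured separately: HADOOP_JOB_HISTORYSERVER_HEAPSIZE lives in mapred-env.sh, not yarn-env.sh. Remember that each change takes effect only after the corresponding component is restarted.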

Configurations of cluster resources for YARN

On the Configure tab of the YARN service page in the EMR console, configure the parameters. The following table describes the parameters.

| Parameter | Description | Configuration file | Effective scope | Remarks |
| --- | --- | --- | --- | --- |
| yarn.scheduler.maximum-allocation-mb | The maximum amount of memory that a single container can request from the scheduler. | yarn-site.xml | Cluster | If jobs in the cluster need large single containers, you can increase the value. However, an excessively large value may cause resource fragmentation. If you adjust the value of this parameter, you must restart the ResourceManager component to make the modification take effect. |
| yarn.scheduler.minimum-allocation-mb | The minimum amount of memory that a single container can request from the scheduler. | yarn-site.xml | Cluster | In most cases, you do not need to adjust the value of this parameter. If you adjust it, you must restart the ResourceManager component to make the modification take effect. |
| yarn.scheduler.maximum-allocation-vcores | The maximum number of vCPUs that a single container can request from the scheduler. | yarn-site.xml | Cluster | The default value is 32. If jobs in the cluster need large single containers, you can increase the value. However, an excessively large value may cause resource fragmentation. If you adjust the value of this parameter, you must restart the ResourceManager component to make the modification take effect. |
| yarn.scheduler.minimum-allocation-vcores | The minimum number of vCPUs that a single container can request from the scheduler. | yarn-site.xml | Cluster | The default value is 1. In most cases, you do not need to adjust the value of this parameter. If you adjust it, you must restart the ResourceManager component to make the modification take effect. |
| yarn.nodemanager.resource.memory-mb | The amount of memory available to the NodeManager component. | yarn-site.xml | Node group | Configure this parameter based on your cluster deployment. If you adjust the value of this parameter, you must restart the NodeManager component to make the modification take effect. Important: When you configure this parameter, you must select a node group. |
| yarn.nodemanager.resource.cpu-vcores | The number of vCPUs available to the NodeManager component. | yarn-site.xml | Node group | The default value is the number of vCPUs of the instance type used by the node group; for instance types with high memory specifications, the default is twice the number of vCPUs. Adjust the value based on your cluster deployment. If you adjust the value of this parameter, you must restart the NodeManager component to make the modification take effect. Important: When you configure this parameter, you must select a node group. |
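For reference, the parameters above correspond to properties in yarn-site.xml. The values below are placeholders for illustration only; on EMR you change them on the Configure tab of the YARN service page (selecting Node Group Configuration for the two yarn.nodemanager.resource.* parameters) instead of editing the file by hand.

```xml
<!-- Illustrative values only; configure these via the EMR console. -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>28672</value> <!-- largest memory a single container may request -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>32</value> <!-- default -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value> <!-- default -->
</property>

<!-- Node-group scope: set per node group in the console. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>28672</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>
</property>
```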

When you configure the yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores parameters on the Configure tab of the YARN service page, select Node Group Configuration from the drop-down list to the right of the search box. An EMR cluster can contain node groups that use different ECS instance types, but all nodes in a node group use the same instance type. Configuring NodeManager resources at the node-group level lets low-specification nodes work as expected, keeps high-specification nodes fully utilized during scheduling, and spares you from modifying the NodeManager resource configurations node by node.

EMR allows you to configure the yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb parameters when you create a cluster or the first time you scale out a cluster by adding a node group. Keep the two values consistent: a container request larger than yarn.scheduler.maximum-allocation-mb is rejected by the scheduler, and a request larger than the yarn.nodemanager.resource.memory-mb of every node group can never be placed on a node. Keeping the values consistent ensures that your jobs can be scheduled as expected.
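To see how the two limits interact, consider whether a single container request of a given size can run. All values below are hypothetical; substitute the actual settings of your node groups.

```shell
# Sketch: can a container request of request_mb ever run?
max_alloc_mb=28672   # yarn.scheduler.maximum-allocation-mb (cluster)
node_mb=28672        # yarn.nodemanager.resource.memory-mb of a node group
request_mb=16384     # memory requested by a single container

if [ "$request_mb" -gt "$max_alloc_mb" ]; then
  # The ResourceManager rejects requests above the scheduler maximum.
  echo "rejected: request exceeds yarn.scheduler.maximum-allocation-mb"
elif [ "$request_mb" -gt "$node_mb" ]; then
  # The request is accepted but no node is large enough to place it.
  echo "pending forever: no node can hold the container"
else
  echo "schedulable"
fi
```

This is why, after you raise yarn.nodemanager.resource.memory-mb (for example, by upgrading node specifications), you may also want to raise yarn.scheduler.maximum-allocation-mb so that larger containers become usable, as described in the next bullet.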

  • When you upgrade the specifications of a node group or change the value of the yarn.nodemanager.resource.memory-mb parameter, the value of the yarn.scheduler.maximum-allocation-mb parameter is not automatically changed. You can manually change the value of the yarn.scheduler.maximum-allocation-mb parameter based on your business requirements.

  • To prevent your jobs from being affected, the first time you configure the yarn.scheduler.maximum-allocation-mb parameter for a new node group, the ResourceManager component is not automatically restarted. To make the configuration take effect, you must manually restart the ResourceManager component.

    Note

    A restart of the ResourceManager component may cause jobs to fail. We recommend that you restart the component during off-peak hours.