E-MapReduce: JVM memory tuning

Last Updated: Mar 26, 2026

As the number of files in your Hadoop Distributed File System (HDFS) cluster grows, the Java Virtual Machine (JVM) heap memory on the NameNode and DataNodes must grow with it. Without enough heap, new writes fail. This topic shows you how to calculate the required heap size and apply the configuration in the E-MapReduce (EMR) console.

Prerequisites

Before you begin, ensure that you have:

  • Access to the EMR console with permissions to modify cluster service configurations

  • The current file count and data block count, as shown in the HDFS web user interface (UI). For instructions on opening the HDFS web UI, see Access the web UIs of open source components.
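If you prefer to read the two counts programmatically, the following is a minimal sketch rather than an EMR-documented procedure. It assumes that the NameNode web server exposes Apache Hadoop's JSON metrics endpoint at /jmx, and that the FSNamesystem bean reports FilesTotal and BlocksTotal. In Apache Hadoop, FilesTotal counts files and directories together, so treat it as an upper bound on the file count. The host name and port in the docstring are hypothetical, and your cluster may require authentication that this sketch does not handle.

```python
import json
from urllib.request import urlopen

# Assumption: the NameNode serves JMX metrics as JSON at /jmx and the
# FSNamesystem bean carries the FilesTotal and BlocksTotal attributes.
FSNAMESYSTEM_BEAN = "Hadoop:service=NameNode,name=FSNamesystem"


def parse_counts(jmx_json: str) -> tuple[int, int]:
    """Return (file_count, block_count) from a NameNode JMX JSON document."""
    for bean in json.loads(jmx_json).get("beans", []):
        if bean.get("name") == FSNAMESYSTEM_BEAN:
            return int(bean["FilesTotal"]), int(bean["BlocksTotal"])
    raise ValueError("FSNamesystem bean not found in JMX output")


def fetch_counts(namenode_http_address: str) -> tuple[int, int]:
    """Fetch the counts from a live NameNode.

    namenode_http_address is a placeholder such as
    'http://namenode-host:9870' (hypothetical host and port).
    """
    url = f"{namenode_http_address}/jmx?qry={FSNAMESYSTEM_BEAN}"
    with urlopen(url, timeout=10) as response:
        return parse_counts(response.read().decode("utf-8"))
```

Only parse_counts is needed if you save the JSON from the web UI yourself. fetch_counts is a convenience wrapper for clusters where the endpoint is reachable without authentication.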

Adjust the NameNode heap size

Calculate the recommended heap size

Use the following formula:

Recommended memory size = (Number of files in millions + Number of data blocks in millions) × 512 MB

Example: A cluster has 10 million files. All files are small to medium sized (each fits within one block), so the block count also equals 10 million. Recommended heap size: (10 + 10) × 512 MB = 10,240 MB.

The following table shows recommended heap sizes for common file counts, assuming most files fit within one block.

Number of files    Recommended memory size (MB)
10,000,000         10,240
20,000,000         20,480
50,000,000         51,200
100,000,000        102,400
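The NameNode formula can be sketched as a small helper, which is useful when files span several blocks and the block count differs from the file count. The function name and the rounding up to a whole megabyte are illustrative choices, not part of the EMR documentation.

```python
import math

MB_PER_MILLION_OBJECTS = 512  # multiplier from the NameNode formula


def namenode_heap_mb(file_count: int, block_count: int) -> int:
    """Recommended NameNode heap in MB for the given file and block counts."""
    if file_count < 0 or block_count < 0:
        raise ValueError("counts must be non-negative")
    objects_in_millions = (file_count + block_count) / 1_000_000
    # Round up so the result never falls below the formula's value.
    return math.ceil(objects_in_millions * MB_PER_MILLION_OBJECTS)


if __name__ == "__main__":
    # One block per file, matching the assumption behind the table.
    for files in (10_000_000, 20_000_000, 50_000_000, 100_000_000):
        print(f"{files:>11,} files -> {namenode_heap_mb(files, files):,} MB")
```

With one block per file, the helper returns the same values as the table. Pass a larger block_count if your files span multiple blocks.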

Apply the configuration

The procedure differs depending on whether your cluster uses high availability (HA).

HA cluster

  1. Log on to the EMR console.

  2. Find the target cluster and click Services in the Actions column.

  3. On the Services tab, find the HDFS service and click Configure.

  4. On the Configure tab, search for hadoop_namenode_heapsize.

  5. Set the value based on your calculation.

  6. Restart the NameNode for the change to take effect.

Non-HA cluster

  1. Log on to the EMR console.

  2. Find the target cluster and click Services in the Actions column.

  3. On the Services tab, find the HDFS service and click Configure.

  4. On the Configure tab, search for hadoop_namenode_heapsize and hadoop_secondary_namenode_heapsize.

  5. Set the values based on your calculation.

  6. Restart each service whose value you changed (the NameNode, the Secondary NameNode, or both) for the change to take effect.

Adjust the DataNode heap size

Calculate the recommended heap size

The heap demand on each DataNode depends on how many block replicas that node holds, not on the total file count. Use the following formulas, in which the factor 3 is the number of replicas stored for each block (triplicate storage):

Number of replicas per DataNode = Number of data blocks × 3 / Number of DataNodes
Recommended memory size = Number of replicas per DataNode in millions × 2,048 MB

The recommended value already accounts for JVM kernel overhead and peak-hour job memory, so in most cases you can use it without further adjustment.

Example: A cluster uses triplicate storage, runs on Elastic Compute Service (ECS) instances of the big data instance family, and has 6 core nodes. With 10 million files and 10 million data blocks (all small to medium sized):

  • Replicas per DataNode: 10,000,000 × 3 / 6 = 5,000,000

  • Recommended heap size: 5 × 2,048 MB = 10,240 MB

The following table shows recommended heap sizes based on the number of replicas per DataNode, assuming most files fit within one block.

Number of replicas per DataNode    Recommended memory size (MB)
1,000,000                          2,048
2,000,000                          4,096
5,000,000                          10,240
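The two DataNode formulas can be combined in a short sketch. The function names are illustrative. The replication_factor parameter is an assumption added for clusters that do not use triplicate storage. It defaults to 3, the value fixed in the formula above.

```python
import math

MB_PER_MILLION_REPLICAS = 2048  # multiplier from the DataNode formula


def replicas_per_datanode(block_count: int, datanode_count: int,
                          replication_factor: int = 3) -> float:
    """Average number of block replicas stored on each DataNode."""
    if datanode_count <= 0:
        raise ValueError("datanode_count must be positive")
    return block_count * replication_factor / datanode_count


def datanode_heap_mb(block_count: int, datanode_count: int,
                     replication_factor: int = 3) -> int:
    """Recommended DataNode heap in MB, rounded up to a whole megabyte."""
    replicas = replicas_per_datanode(block_count, datanode_count,
                                     replication_factor)
    return math.ceil(replicas / 1_000_000 * MB_PER_MILLION_REPLICAS)


if __name__ == "__main__":
    # The worked example: 10 million blocks on 6 core nodes.
    print(datanode_heap_mb(10_000_000, 6))
```

For the worked example (10 million blocks on 6 core nodes), the sketch gives 5,000,000 replicas per DataNode and a heap of 10,240 MB.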

Apply the configuration

  1. Log on to the EMR console.

  2. Find the target cluster and click Services in the Actions column.

  3. On the Services tab, find the HDFS service and click Configure.

  4. On the Configure tab, search for hadoop_datanode_heapsize in the Configuration Filter section.

  5. Set the value based on your calculation.

  6. Restart the DataNodes for the change to take effect.