When you store data in Object Storage Service (OSS) or OSS-HDFS, your Hadoop cluster needs fewer DataNodes. Decommissioning DataNodes on core nodes and replacing them with task nodes reduces storage costs and O&M overhead while keeping full compute capacity.
How decommissioning works: HDFS moves the DataNode into a Decommission in progress state and replicates all its data blocks to remaining nodes. Once all blocks are fully replicated elsewhere, the DataNode transitions to Decommissioned. Decommissioning time scales with the amount of data stored on the node — this is expected behavior.
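As a rough back-of-envelope sketch of why larger nodes take longer, you can estimate migration time from the node's DFS Used value and an assumed replication throughput. Both numbers below are made-up assumptions for illustration, not measured values:

```shell
# Hypothetical estimate only; replace both values with your own observations.
dfs_used_gb=800          # DFS Used on the node, visible in `hdfs dfsadmin -report`
throughput_gb_per_min=2  # assumed sustained cluster-wide replication throughput
echo $((dfs_used_gb / throughput_gb_per_min)) minutes   # prints "400 minutes"
```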
Prerequisites
Before you begin, ensure that you have:
- SSH access to the cluster. See Log on to a cluster.
- Enough remaining core nodes so that the node count exceeds the Hadoop Distributed File System (HDFS) replication factor. If the number of core nodes equals the HDFS replication factor, decommissioning is blocked.
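The node-count prerequisite can be sketched as a quick check. `can_decommission` is a hypothetical helper, not part of HDFS; in practice the two inputs would come from `hdfs dfsadmin -report` (live DataNode count) and `hdfs getconf -confKey dfs.replication`:

```shell
# Hypothetical helper, not part of HDFS.
#   live_nodes:  current DataNode count
#   repl_factor: value of dfs.replication
#   to_remove:   DataNodes you plan to decommission
can_decommission() {
  local live_nodes=$1 repl_factor=$2 to_remove=$3
  if [ $((live_nodes - to_remove)) -ge "$repl_factor" ]; then
    echo ok
  else
    echo blocked
  fi
}
can_decommission 4 3 1   # prints "ok": 3 nodes remain, still satisfying a replication factor of 3
can_decommission 3 3 1   # prints "blocked": only 2 nodes would remain
```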
Usage notes
- Earlier EMR versions only: If a service such as ZooKeeper or HDFS JournalNode runs on emr-worker-1 (HA clusters) or on emr-worker-1 and emr-worker-2 (non-HA clusters), you can decommission DataNodes on those nodes but cannot release them or reduce their count.
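To see whether such a service is present on a node before planning a release, you can grep the Java process list from `jps`. The sample output below is illustrative; on a real node, run `jps` directly. ZooKeeper appears as `QuorumPeerMain`:

```shell
# Illustrative `jps` output; real PIDs and process lists will differ.
sample_jps='1234 DataNode
2345 QuorumPeerMain
3456 JournalNode'
# QuorumPeerMain is the ZooKeeper server; JournalNode is the HDFS JournalNode.
# Any match means the node cannot be released, only have its DataNode decommissioned.
echo "$sample_jps" | grep -E 'QuorumPeerMain|JournalNode'
```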
Decommission DataNodes
Step 1: Log on to the cluster and switch to the hdfs user
Log on to your cluster in SSH mode, then run:
su hdfs
Step 2: (HA clusters only) Identify your NameNodes
Skip this step for non-HA clusters.
hdfs haadmin -getAllServiceState
The output lists all NameNodes and their current state. Note the hostnames — you need to edit dfs.exclude on each NameNode in the next step.
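A small awk one-liner can pull the bare hostname and state out of each line. The sample output below is illustrative (the `master-1-*` names and masked cluster ID are placeholders, not guaranteed to match your cluster):

```shell
# Illustrative output of `hdfs haadmin -getAllServiceState`; real hostnames differ.
sample_state='master-1-1.c-7dfd6ac2b7c9****:8020    active
master-1-2.c-7dfd6ac2b7c9****:8020    standby'
# Strip the RPC port from the first column to recover each NameNode hostname.
echo "$sample_state" | awk '{split($1, a, ":"); print a[1], $2}'
```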
Step 3: Edit the dfs.exclude file
The dfs.exclude file path differs by cluster type. Find your cluster type below and follow the corresponding instructions.
For HA clusters, log on to each NameNode and edit the file on each one. For non-HA clusters, edit the file on the current NameNode only.
EMR DataLake cluster
touch /etc/emr/hdfs-conf/dfs.exclude
vim /etc/emr/hdfs-conf/dfs.exclude
Add the hostname of each DataNode to decommission, one per line:
core-1-3.c-7dfd6ac2b7c9****
core-1-4.c-7dfd6ac2b7c9****
EMR Hadoop cluster
touch /etc/ecm/hadoop-conf/dfs.exclude
vim /etc/ecm/hadoop-conf/dfs.exclude
Add the hostname of each DataNode to decommission, one per line:
emr-worker-3.cluster-****
emr-worker-4.cluster-****
To decommission additional DataNodes later, append their hostnames to the end of the file. Do not delete existing entries.
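If you append hostnames from a script, a small helper can make the step idempotent. This is a hypothetical convenience function, not part of EMR; the default path below is the DataLake one, so adjust it to /etc/ecm/hadoop-conf/dfs.exclude on EMR Hadoop clusters:

```shell
# Hypothetical helper: append a DataNode hostname to dfs.exclude only if it
# is not already listed, so existing entries are never lost or duplicated.
EXCLUDE_FILE=${EXCLUDE_FILE:-/etc/emr/hdfs-conf/dfs.exclude}
add_to_exclude() {
  local host=$1
  grep -qxF "$host" "$EXCLUDE_FILE" 2>/dev/null || echo "$host" >> "$EXCLUDE_FILE"
}
# Usage: add_to_exclude core-1-5.c-7dfd6ac2b7c9****
```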
Step 4: Trigger decommissioning
Run the following on one NameNode (HA clusters) or the current NameNode (non-HA clusters):
hdfs dfsadmin -refreshNodes
HDFS automatically starts migrating data blocks off the listed nodes.
Step 5: Verify decommissioning status
hdfs dfsadmin -report
Check the Decommission Status field for each node you are decommissioning:
| Status | Meaning |
|---|---|
| Decommission in progress | HDFS is still replicating data blocks off this node. |
| Decommissioned | All data has been migrated; the node is fully decommissioned. |
Run the command periodically until all target nodes show Decommissioned.
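Instead of checking by eye, you can grep the report for nodes still in progress and use the count as a poll condition. The report excerpt below is illustrative; addresses and hostnames will differ on your cluster:

```shell
# Illustrative excerpt of the dfsadmin -report output; real values differ.
sample_report='Name: 192.168.**.**:50010 (core-1-3.c-7dfd6ac2b7c9****)
Decommission Status : Decommission in progress
Name: 192.168.**.**:50010 (core-1-4.c-7dfd6ac2b7c9****)
Decommission Status : Decommissioned'
# Number of target nodes still migrating blocks; poll until this reaches 0.
echo "$sample_report" | grep -c 'Decommission in progress'   # prints 1
```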