
E-MapReduce:Decommission DataNodes

Last Updated:Mar 17, 2025

As data lake technologies become more widely adopted, more and more E-MapReduce (EMR) users store data in fully managed storage services such as Object Storage Service (OSS) and OSS-HDFS. As a result, a Hadoop cluster requires only a minimal number of DataNodes. By decommissioning the DataNodes deployed on core nodes in an EMR cluster and deploying more task nodes, you can separate compute from storage, optimize costs, and reduce the O&M load. This topic describes how to decommission DataNodes.

Usage notes

  • If the number of core nodes is equal to the number of replicas in Hadoop Distributed File System (HDFS), the core nodes cannot be decommissioned.

  • In earlier versions of EMR, if a service such as ZooKeeper or HDFS JournalNode is deployed on the emr-worker-1 node of a high availability (HA) cluster, or on the emr-worker-1 and emr-worker-2 nodes of a non-HA cluster, you can only decommission the DataNode processes on those nodes. You cannot release the nodes themselves or decrease the number of nodes.
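Before editing any files, it can help to confirm that enough DataNodes will remain after decommissioning. The following is a minimal sketch; the helper names (`count_datanodes`, `can_decommission`) are illustrative, not part of EMR or Hadoop, and the parsing assumes the standard `hdfs dfsadmin -report` output in which each DataNode entry begins with a `Name:` line.

```shell
#!/bin/sh
# Illustrative helper: count DataNode entries in an
# "hdfs dfsadmin -report" dump (each entry starts with "Name:").
count_datanodes() {
  grep -c '^Name:'
}

# Illustrative helper: true if at least "replication" DataNodes
# remain after removing "to_remove" of the "live" nodes.
can_decommission() {
  live=$1; to_remove=$2; replication=$3
  [ $((live - to_remove)) -ge "$replication" ]
}

# On a live cluster you would obtain the inputs like this
# (requires an HDFS client; shown as an assumption, not run here):
#   REPLICATION=$(hdfs getconf -confKey dfs.replication)
#   LIVE=$(hdfs dfsadmin -report -live | count_datanodes)

# Example: 4 live DataNodes, removing 2, replication factor 3.
if can_decommission 4 2 3; then echo safe; else echo "not safe"; fi   # → not safe
```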

Procedure

  1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.

  2. Run the following command to switch to the hdfs user:

    su hdfs
  3. Optional: Run the following command to view the number of NameNodes:

    Note

    This step is required only for an HA cluster. Skip it for a non-HA cluster.

    hdfs haadmin -getAllServiceState
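For an HA cluster, `hdfs haadmin -getAllServiceState` prints one line per NameNode in the form `<host>:<port>  <state>`. The sketch below shows one way to pick out the active NameNode's hostname from that output; the `active_namenode` helper is illustrative, not an EMR or Hadoop command.

```shell
#!/bin/sh
# Illustrative helper: extract the hostname of the active NameNode
# from "hdfs haadmin -getAllServiceState" output, assuming lines of
# the form "<host>:<port>   <state>".
active_namenode() {
  awk '$2 == "active" {sub(/:.*/, "", $1); print $1}'
}

# On a live HA cluster (requires an HDFS client; not run here):
#   hdfs haadmin -getAllServiceState | active_namenode
```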
  4. Edit the dfs.exclude file.

    Important
    • For an HA cluster, log on to each NameNode in SSH mode and edit the dfs.exclude file on the NameNode.

    • For a non-HA cluster, edit the dfs.exclude file on the current NameNode.

    • EMR DataLake cluster

      1. Run the following commands to create the dfs.exclude file if it does not exist and open it for editing:

        touch /etc/emr/hdfs-conf/dfs.exclude
        vim /etc/emr/hdfs-conf/dfs.exclude
      2. Add the hostnames of the DataNodes that you want to decommission to the file.

        core-1-3.c-7dfd6ac2b7c9****
        core-1-4.c-7dfd6ac2b7c9****
        Note

        To decommission more DataNodes later, append their hostnames to the end of the file. Do not delete the existing hostnames.

    • EMR Hadoop cluster

      1. Run the following commands to create the dfs.exclude file if it does not exist and open it for editing:

        touch /etc/ecm/hadoop-conf/dfs.exclude
        vim /etc/ecm/hadoop-conf/dfs.exclude
      2. Add the hostnames of the DataNodes that you want to decommission to the file.

        emr-worker-3.cluster-****
        emr-worker-4.cluster-****
        Note

        To decommission more DataNodes later, append their hostnames to the end of the file. Do not delete the existing hostnames.
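Because existing entries in dfs.exclude must be preserved, a small idempotent append can reduce the risk of duplicates or accidental deletions when scripting this step. The `add_to_exclude` helper below is a sketch, not an EMR tool; the file path varies by cluster type (`/etc/emr/hdfs-conf/dfs.exclude` for DataLake clusters, `/etc/ecm/hadoop-conf/dfs.exclude` for Hadoop clusters), as shown above.

```shell
#!/bin/sh
# Illustrative helper: append a hostname to dfs.exclude only if it is
# not already listed, preserving all existing entries.
add_to_exclude() {
  host=$1; file=$2
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
  grep -qxF "$host" "$file" || echo "$host" >> "$file"
}

# Usage on a DataLake cluster (hostnames are examples):
#   add_to_exclude core-1-3.c-7dfd6ac2b7c9**** /etc/emr/hdfs-conf/dfs.exclude
```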

  5. Decommission the specified DataNodes.

    Run the following command as the hdfs user on one of the NameNodes in an HA cluster or the current NameNode in a non-HA cluster. Then, HDFS automatically decommissions the specified DataNodes.

    hdfs dfsadmin -refreshNodes
  6. Run the following command to check whether the decommissioning is complete:

    hdfs dfsadmin -report

    If the Decommission Status of a DataNode is Decommission in progress, the DataNode is still being decommissioned. If the status is Decommissioned, decommissioning is complete and the node's data has been replicated to other DataNodes.
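If you want to wait for decommissioning to finish in a script rather than re-running the report by hand, you can poll the report until no DataNode remains in progress. The `in_progress_count` helper below is a sketch that parses the standard `Decommission Status : ...` lines in the report output; it is not an EMR or Hadoop command.

```shell
#!/bin/sh
# Illustrative helper: count DataNodes whose status line reads
# "Decommission Status : Decommission in progress" in a report dump.
in_progress_count() {
  grep -c 'Decommission Status : Decommission in progress'
}

# Polling loop on a live cluster (requires an HDFS client; not run here):
#   while [ "$(hdfs dfsadmin -report | in_progress_count)" -gt 0 ]; do
#     sleep 30
#   done
```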