E-MapReduce:Scale in a node group

Last Updated:Jan 22, 2024

If a large share of the resources in your E-MapReduce (EMR) cluster sits idle and you want to scale in a core node group or a subscription task node group, you must scale in the node group manually. Manual scale-in helps keep the Hadoop Distributed File System (HDFS) data in the core node group stable. Subscription task node groups cannot be scaled in automatically.

Precautions

  • The operations in this topic cannot be rolled back. The components of a service cannot be recovered after you unpublish the components.

  • This topic describes the best practices for scale-in operations. We recommend that you evaluate the impacts on your business before you scale in a node group and proceed with caution. This can help prevent job scheduling failures and data security risks.

Preparations

Based on the service load of your cluster, select the nodes whose components you want to unpublish. Pay attention to the specifications of these nodes so that removing them does not overload the remaining nodes. You can check the load of your cluster by using one of the following methods:

Important

Take note of the following items before you scale in a node group:

  • If the number of core nodes is equal to the number of Hadoop Distributed File System (HDFS) replicas, you cannot remove the core nodes.

  • If your cluster is a non-high-availability cluster, you cannot remove the emr-worker-1 or emr-worker-2 node.

  • If your cluster is a high availability cluster but the number of master nodes in the cluster is 2, you cannot remove the emr-worker-1 node.
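
The first restriction above can be checked before any component is unpublished. The following is a minimal sketch, not an official EMR tool: the function name can_remove_core_node is hypothetical, and both numbers are passed in as arguments. On a cluster node, the replication factor could be read with hdfs getconf -confKey dfs.replication.

```shell
#!/usr/bin/env bash
# Sketch: report whether a core node may be removed, given the current
# number of core nodes ($1) and the number of HDFS replicas ($2).
# can_remove_core_node is a hypothetical helper name.

can_remove_core_node() {
  local core_nodes="$1" replicas="$2"
  # The topic states that removal is not possible when the two numbers
  # are equal. Treating "fewer core nodes than replicas" as blocked too
  # is an assumption of this sketch.
  if [ "$core_nodes" -gt "$replicas" ]; then
    echo "ok"
  else
    echo "blocked"
  fi
}

# Example (values are illustrative):
#   can_remove_core_node 5 "$(hdfs getconf -confKey dfs.replication)"
```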

Unpublish the components of a service deployed in a cluster

If the YARN, HDFS, SmartData, HBase, or StarRocks service is deployed in your cluster, you must unpublish the components of these services before you release the Elastic Compute Service (ECS) instances in the cluster. Otherwise, jobs that run in the cluster may fail and data security risks may occur.

Unpublish the NodeManager component of the YARN service

  1. Go to the Status tab of the YARN service page.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the desired cluster and click Services in the Actions column.

    4. On the Services tab, find the YARN service and click Status.

  2. Unpublish the NodeManager component that is deployed on the desired nodes.

    1. In the Components section of the Status tab, find NodeManager, move the pointer over the More icon in the Actions column, and then select Unpublish.

    2. In the dialog box that appears, configure the Execution Scope and Execution Reason parameters and click OK.

    3. In the Confirm message, click OK.

  3. In the upper-right corner of the Services tab, click Operation History to view the operation progress.
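
In addition to the Operation History page, the node state can be cross-checked from the command line. The sketch below filters the output of yarn node -list -all by node state. It assumes that an unpublished NodeManager is reported by YARN in the DECOMMISSIONED state and that the second column of the output is the node state. The function name nodes_in_state is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the hostnames of YARN nodes that are in a given state.
# Reads the output of "yarn node -list -all" from standard input.
# Assumed column layout: Node-Id, Node-State, Node-Http-Address, containers.

nodes_in_state() {
  local state="$1"
  # $1 is Node-Id in host:port form, $2 is Node-State.
  awk -v state="$state" '$2 == state { split($1, id, ":"); print id[1] }'
}

# Example, run on a cluster node:
#   yarn node -list -all | nodes_in_state DECOMMISSIONED
```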

Unpublish the DataNode component of the HDFS service

  1. Log on to the master node of your cluster in SSH mode. For more information, see Log on to a cluster.

  2. Switch to the hdfs user and view the number of NameNodes.

    sudo su - hdfs
    hdfs haadmin -getAllServiceState
  3. Log on to the nodes on which NameNode is deployed in SSH mode and add the nodes whose DataNode component you want to unpublish to the dfs.exclude file. We recommend that you add only one node at a time.

    • Hadoop clusters

      touch /etc/ecm/hadoop-conf/dfs.exclude
      vim /etc/ecm/hadoop-conf/dfs.exclude

      In vim, press o to open a new line, and then enter the hostname of each node whose DataNode component you want to unpublish, one hostname per line.

      emr-worker-3.cluster-xxxxx
      emr-worker-4.cluster-xxxxx
    • Non-Hadoop clusters

      touch /etc/taihao-apps/hdfs-conf/dfs.exclude
      vim /etc/taihao-apps/hdfs-conf/dfs.exclude

      In vim, press o to open a new line, and then enter the hostname of each node whose DataNode component you want to unpublish, one hostname per line.

      core-1-3.c-0894dxxxxxxxxx
      core-1-4.c-0894dxxxxxxxxx
  4. Switch to the hdfs user on a node on which NameNode is deployed and run the following commands. Then, HDFS automatically starts to unpublish the DataNode component.

    sudo su - hdfs
    hdfs dfsadmin -refreshNodes
  5. Confirm the result.

    Run the following command to check whether the DataNode component is unpublished:

    hdfs dfsadmin -report

    If the Decommission Status of the node is Decommissioned, the data of the DataNode component has been migrated to other nodes and the DataNode component is unpublished.
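
If you prefer not to edit dfs.exclude by hand, the edit in step 3 can be scripted. The following is a minimal sketch under the assumption that the exclude file holds one hostname per line. The function name add_to_exclude is hypothetical. It appends a hostname only if the hostname is not already listed, so running it twice does not create duplicates.

```shell
#!/usr/bin/env bash
# Sketch: add one hostname to an HDFS exclude file without duplicating it.
# add_to_exclude is a hypothetical helper name.

add_to_exclude() {
  local file="$1" host="$2"
  touch "$file"
  # If the file is non-empty and does not end with a newline, add one so
  # that the new hostname starts on its own line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  # -x matches whole lines only, -F treats the hostname as a literal string.
  if ! grep -qxF -- "$host" "$file"; then
    printf '%s\n' "$host" >> "$file"
  fi
}

# Example for a non-Hadoop cluster, run on a NameNode host:
#   add_to_exclude /etc/taihao-apps/hdfs-conf/dfs.exclude core-1-3.c-0894dxxxxxxxxx
#   hdfs dfsadmin -refreshNodes
```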

Unpublish the JindoStorageService component of the SmartData service (Hadoop clusters)

  1. Go to the Status tab of the SmartData service.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the desired cluster and click Services in the Actions column.

    4. On the Services tab, find the SmartData service and click Status.

  2. Unpublish the JindoStorageService component that is deployed on a desired node.

    1. In the Components section of the Status tab, find JindoStorageService, move the pointer over the More icon in the Actions column, and then select Unpublish.

    2. In the dialog box that appears, configure the Execution Scope and Execution Reason parameters and click OK.

    3. In the Confirm message, click OK.

  3. In the upper-right corner of the Services tab, click Operation History to view the operation progress.

Unpublish the HRegionServer component of the HBase service

  1. Go to the Status tab of the HBase service page.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the desired cluster and click Services in the Actions column.

    4. On the Services tab, find the HBase service and click Status.

  2. Unpublish the HRegionServer component that is deployed on a desired node.

    1. In the Components section of the Status tab, find HRegionServer, move the pointer over the More icon in the Actions column, and then select Unpublish.

    2. In the dialog box that appears, configure the Execution Reason parameter and click OK. In the Confirm message, click OK.

  3. Click Operation History in the upper-right corner to view the operation progress.

Remove BE nodes of the StarRocks service

  1. Log on to your cluster and use a client to access the cluster. For more information, see Getting started.

  2. Run the following command to remove backend (BE) nodes:

    ALTER SYSTEM DECOMMISSION backend "be_ip:be_heartbeat_service_port";

    Replace the following parameters based on your business requirements.

    • be_ip: You can find the desired node and obtain its internal IP address on the Nodes tab.

    • be_heartbeat_service_port: The default value is 9050. You can run the show backends command to obtain the service port.

    If the BE nodes are removed too slowly, you can run the DROP command to forcefully remove them.

    Important

    If you run the DROP command to remove the BE nodes, make sure that the system contains three replicas.

    ALTER SYSTEM DROP backend "be_ip:be_heartbeat_service_port";
  3. Run the following command to check the status of the BE nodes:

    show backends;

    If the value in the SystemDecommissioned column is true, the BE nodes are being removed. If the value in the TabletNum column is 0, the system cleans up the metadata.

    If the BE nodes no longer appear in the output of the show backends command, the nodes are successfully removed.
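
When several BE nodes have to be removed one after another, the statement from step 2 can be generated instead of typed. The sketch below only builds the SQL text. The function name decommission_sql is hypothetical, and the mysql invocation in the comment assumes that the StarRocks frontend accepts MySQL client connections on port 9030 as the root user, which may differ in your cluster.

```shell
#!/usr/bin/env bash
# Sketch: build the DECOMMISSION statement for one BE node.
# $1 = internal IP address of the BE node
# $2 = heartbeat service port (defaults to 9050, the default named in this topic)
# decommission_sql is a hypothetical helper name.

decommission_sql() {
  local be_ip="$1" port="${2:-9050}"
  printf 'ALTER SYSTEM DECOMMISSION backend "%s:%s";\n' "$be_ip" "$port"
}

# Example (fe_host, port 9030, and the root user are assumptions):
#   mysql -h fe_host -P 9030 -uroot -e "$(decommission_sql 192.168.0.12)"
```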

Unpublish the DataNode component of the HBase-HDFS service

  1. Log on to the master node of your cluster in SSH mode. For more information, see Log on to a cluster.

  2. Run the following commands to switch to the hdfs user and set the environment variable:

    sudo su - hdfs
    export HADOOP_CONF_DIR=/etc/taihao-apps/hdfs-conf/namenode
  3. Run the following command to view the information about the NameNode:

    hdfs dfsadmin -report
  4. Log on to the nodes on which NameNode is deployed in SSH mode and add the nodes whose DataNode component you want to unpublish to the dfs.exclude file. We recommend that you add only one node at a time.

    touch /etc/taihao-apps/hdfs-conf/dfs.exclude
    vim /etc/taihao-apps/hdfs-conf/dfs.exclude

    In vim, press o to open a new line, and then enter the hostname of each node whose DataNode component you want to unpublish, one hostname per line.

    core-1-3.c-0894dxxxxxxxxx
    core-1-4.c-0894dxxxxxxxxx
  5. Switch to the hdfs user on a node on which NameNode is deployed and run the following commands. Then, HDFS automatically starts to unpublish the DataNode component.

    sudo su - hdfs
    export HADOOP_CONF_DIR=/etc/taihao-apps/hdfs-conf/namenode
    hdfs dfsadmin -refreshNodes
  6. Confirm the result.

    Run the following command to check whether the DataNode component is unpublished:

    hdfs dfsadmin -report

    If the Decommission Status of the node is Decommissioned, the data of the DataNode component has been migrated to other nodes and the DataNode component is unpublished.
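
The report can be long on clusters with many DataNodes, so it may help to extract the status of a single node. The following sketch reads the output of hdfs dfsadmin -report from standard input and prints the Decommission Status of one host. It assumes the usual report layout, in which each DataNode entry contains a Hostname: line followed by a Decommission Status : line. The function name decommission_status is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the Decommission Status of one DataNode from the output of
# "hdfs dfsadmin -report" supplied on standard input.
# decommission_status is a hypothetical helper name.

decommission_status() {
  local host="$1"
  awk -v host="$host" '
    # Remember which DataNode entry is currently being read.
    $1 == "Hostname:" { current = $2 }
    # Print the status of the first entry that matches the requested host.
    !found && /^Decommission Status/ && current == host {
      sub(/^Decommission Status *: */, "")
      print
      found = 1
    }
  '
}

# Example, run as the hdfs user with HADOOP_CONF_DIR set as in step 2:
#   hdfs dfsadmin -report | decommission_status core-1-3.c-0894dxxxxxxxxx
```

A node can be released once this prints Decommissioned. A value of Decommission in progress means that data is still being migrated.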

Release a node

Important

You can log on to the ECS console to release the nodes in a node group of your EMR cluster. If you want to perform this operation as a RAM user, you must have the required ECS permissions. We recommend that you attach the AliyunECSFullAccess policy to the RAM user.

  1. Go to the Nodes tab.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the desired cluster and click Nodes in the Actions column.

  2. On the Nodes tab, find the node that you want to release and click the ID of the node.

    The Instances page in the ECS console appears.

  3. Release the node in the ECS console. For more information, see Release an instance.

References

For information about how to scale in a task node group that contains pay-as-you-go instances or preemptible instances, see Scale in a cluster.