
E-MapReduce: Scale in a node group

Last Updated: Mar 26, 2026

When cluster utilization stays low for an extended period, you can remove nodes from a Core or Task node group to reduce costs.

Pay-as-you-go Task node groups can be scaled in directly from the EMR console. For all other node group types (pay-as-you-go Core, subscription Task, and subscription Core), follow the steps in this topic.

Limitations

  • HDFS data loss risk: If the number of Core nodes equals the number of Hadoop Distributed File System (HDFS) replicas, do not scale in the Core node group.

  • Old-version HA clusters: If your cluster is an old-version high availability (HA) Hadoop cluster with 2 master nodes, do not remove the emr-worker-1 node. ZooKeeper is deployed on that node.

Usage notes

  • This operation cannot be rolled back. Components cannot be recovered after they are unpublished.

  • Evaluate the impact on your workloads before proceeding. Skipping this assessment can cause job scheduling failures and data security risks.

How it works

Manually shrinking a node group is a three-stage process:

  1. Select a node to remove: identify underutilized nodes by using YARN metrics or the YARN web UI.

  2. Unpublish components on the node: safely decommission each service component deployed on the node. This step drains in-flight data and migrates it to the remaining nodes before the node is removed.

  3. Release the ECS instance: after all components are unpublished, release the corresponding ECS instance from the ECS console.

Skipping step 2 can interrupt running jobs and put data at risk.

Prerequisites

Before you begin, ensure that you have:

  • Access to the EMR console and SSH access to the master node

  • (If you use a RAM user) The AliyunECSFullAccess policy attached to the RAM user, which is required to release ECS instances

Step 1: Identify a node to remove

Select a node based on resource utilization. Use one of the following methods.

Method 1: EMR console monitoring

  1. Go to the Metric Monitoring page.

  2. On the YARN-Queues dashboard, check the AvailableVCores metric. A consistently high value means that many CPU cores sit idle, in which case consider scaling in the Core or Task node group.

  3. On the YARN-NodeManagers dashboard, check the AvailableGB metric per node. A node with persistently high available memory is a good candidate for removal.

Note

Check other metrics in the context of your workload before making a decision.
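You can also pull the same cluster-wide figures from the YARN ResourceManager REST API. The following call is a minimal sketch; it assumes the ResourceManager web UI listens on the default port 8088, and <rm-host> is a placeholder for the address of your active ResourceManager:

    # Query cluster-wide YARN metrics (available vCores, available memory, and so on).
    curl -s http://<rm-host>:8088/ws/v1/cluster/metrics

In the response, the availableVirtualCores and availableMB fields report the same idle capacity as the dashboards above.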

Method 2: YARN web UI

  1. Open the YARN Resource Manager UI and check overall queue resource utilization. Underutilized queues indicate capacity to scale in.


  2. Go to the Nodes page, sort the node list, and identify the node with the largest amount of available memory. That node is a good candidate for removal.

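If you prefer the command line over the web UI, the ResourceManager exposes the same per-node statistics over REST. This is an illustrative call, again assuming the default web port 8088:

    # List all NodeManagers with their state, used memory, and available memory.
    curl -s http://<rm-host>:8088/ws/v1/cluster/nodes

Nodes with a persistently large availMemoryMB value are the same scale-in candidates you would identify on the Nodes page.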

Important

For old-version Hadoop clusters, note the following restrictions:

  • Non-HA clusters: do not remove emr-worker-1 or emr-worker-2.

  • HA clusters with 2 master nodes: do not remove emr-worker-1.

Step 2: View components deployed on the node

Before unpublishing components, check which ones are deployed on the target node. Go to the Nodes page in the EMR console.


Unpublish each of the following components if they are present on the target node:

Component            Service                      Action
NodeManager          YARN                         Unpublish NodeManager (YARN)
DataNode             HDFS                         Unpublish DataNode (HDFS)
Backend (BE)         StarRocks                    Unpublish Backend (StarRocks)
HRegionServer        HBase                        Unpublish HRegionServer (HBase)
DataNode             HBase-HDFS                   Unpublish DataNode (HBase-HDFS)
JindoStorageService  SmartData (Hadoop clusters)  Unpublish JindoStorageService (SmartData)

Unpublish NodeManager (YARN)

  1. Go to the Status tab of the YARN service page.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select your region and resource group.

    3. On the EMR on ECS page, find the target cluster and click Services in the Actions column.

    4. On the Cluster Services page, click Status in the YARN service area.

  2. Unpublish the NodeManager component on the target node.

    1. In the Components list, find NodeManager and choose Unpublish in the Actions column.

    2. In the dialog box, set Execution Scope to Specific Machine, enter an Execution Reason, and click OK.

    3. Click OK to confirm.

  3. Click Operation History in the upper-right corner to track progress.
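To confirm that YARN no longer schedules containers on the node, you can also check node states from the command line. This is a quick sanity check rather than an official verification step:

    # List all nodes, including decommissioned and shut-down ones.
    yarn node -list -all

Once the unpublish operation completes, the target node should no longer appear in the RUNNING state.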

Unpublish DataNode (HDFS)

  1. Log on to the master node via SSH. For details, see Log on to a cluster.

  2. Switch to the hdfs user and list all NameNodes:

    sudo su - hdfs
    hdfs haadmin -getAllServiceState
  3. Log on to a NameNode via SSH and add the hostname of the target node to the dfs.exclude file. Add one node at a time. (A non-interactive alternative to the vim edit is shown after this procedure.)

    • Old-version Hadoop clusters

      touch /etc/ecm/hadoop-conf/dfs.exclude
      vim /etc/ecm/hadoop-conf/dfs.exclude

      In vim, press o to insert a new line and enter the hostname:

      emr-worker-3.cluster-xxxxx
      emr-worker-4.cluster-xxxxx
    • New-version clusters

      touch /etc/taihao-apps/hdfs-conf/dfs.exclude
      vim /etc/taihao-apps/hdfs-conf/dfs.exclude

      In vim, press o to insert a new line and enter the hostname:

      core-1-3.c-0894dxxxxxxxxx
      core-1-4.c-0894dxxxxxxxxx
  4. On the NameNode, switch to the hdfs user and refresh the node list. HDFS begins decommissioning the DataNode automatically.

    sudo su - hdfs
    hdfs dfsadmin -refreshNodes
  5. Verify that decommissioning is complete:

    hdfs dfsadmin -report

    When the status shows Decommissioned, all data has been migrated to other nodes and the DataNode is unpublished.
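As an alternative to editing dfs.exclude interactively in vim (step 3), you can append the hostname from the shell. The following one-liner is a sketch for new-version clusters; run it from an account with sudo privileges, adjust the path to /etc/ecm/hadoop-conf/dfs.exclude on old-version Hadoop clusters, and replace the example hostname with your target node:

    # Append the target node's hostname to the exclude file without opening an editor.
    echo 'core-1-3.c-0894dxxxxxxxxx' | sudo tee -a /etc/taihao-apps/hdfs-conf/dfs.exclude

Then refresh the node list with hdfs dfsadmin -refreshNodes, exactly as in step 4.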

Unpublish Backend (StarRocks)

  1. Log on to the cluster and connect to StarRocks by using a MySQL client. For details, see Quick Start.

  2. Decommission the Backend (BE) node gracefully:

    ALTER SYSTEM DECOMMISSION backend "be_ip:be_heartbeat_service_port";

    Replace the placeholders as needed:

    Placeholder                Description                                    Where to find it
    be_ip                      Internal IP address of the BE node             Nodes page in the EMR console
    be_heartbeat_service_port  Heartbeat port of the BE node (default: 9050)  Output of show backends;

    If DECOMMISSION is slow, force-remove the BE node using DROP:

    Important

    Only use DROP if the system has three complete replicas.

    ALTER SYSTEM DROP backend "be_ip:be_heartbeat_service_port";
  3. Check the removal status:

    show backends;


    • SystemDecommissioned = true: the node is being removed.

    • TabletNum = 0: metadata cleanup is complete.

    • Node no longer listed: removal is successful.
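If you want to poll the removal status without keeping an interactive session open, StarRocks speaks the MySQL protocol, so a one-shot client call works. This sketch assumes the frontend (FE) query port is the StarRocks default of 9030 and that you connect as the root user; field names in the output can vary slightly across StarRocks versions:

    # Check BE removal progress non-interactively.
    mysql -h <fe_ip> -P 9030 -u root -e 'SHOW BACKENDS\G' | grep -E 'IP|Host|SystemDecommissioned|TabletNum'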

Unpublish HRegionServer (HBase)

  1. Go to the Status tab of the HBase service page.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select your region and resource group.

    3. On the EMR on ECS page, find the target cluster and click Services in the Actions column.

    4. On the Cluster Services page, click Status in the HBase service area.

  2. Stop the HRegionServer component on the target node.

    1. In the Components section, click Stop in the Actions column for HRegionServer.

    2. In the dialog box, set Execution Scope to Specific Machine, enter an Execution Reason, and click OK.

    3. Click OK to confirm.

  3. Click Operation History in the upper-right corner to track progress.
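To double-check that the RegionServer is gone, you can query the cluster status from the HBase shell. The -n (non-interactive) flag shown here may not exist on very old HBase releases; in that case, run the same status command inside an interactive hbase shell:

    # Print live and dead servers; the stopped RegionServer should no longer be listed as live.
    echo "status 'simple'" | hbase shell -n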

Unpublish DataNode (HBase-HDFS)

  1. Log on to the master node via SSH. For details, see Log on to a cluster.

  2. Switch to the hdfs user and set the environment variable for HBase-HDFS:

    sudo su - hdfs
    export HADOOP_CONF_DIR=/etc/taihao-apps/hdfs-conf/namenode
  3. Check NameNode status:

    hdfs dfsadmin -report
  4. Log on to a NameNode via SSH and add the target node hostname to the dfs.exclude file. Add one node at a time.

    touch /etc/taihao-apps/hdfs-conf/dfs.exclude
    vim /etc/taihao-apps/hdfs-conf/dfs.exclude

    In vim, press o to insert a new line and enter the hostname:

    core-1-3.c-0894dxxxxxxxxx
    core-1-4.c-0894dxxxxxxxxx
  5. On the NameNode, switch to the hdfs user, set the environment variable, and refresh the node list:

    sudo su - hdfs
    export HADOOP_CONF_DIR=/etc/taihao-apps/hdfs-conf/namenode
    hdfs dfsadmin -refreshNodes
  6. Verify that decommissioning is complete:

    hdfs dfsadmin -report

    When the status shows Decommissioned, the DataNode is unpublished.
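Decommissioning can take a while on nodes that hold many blocks. A small convenience loop, assuming a standard Linux environment with watch installed (and HADOOP_CONF_DIR still pointing at the HBase-HDFS configuration), keeps an eye on progress:

    # Re-run the report every 30 seconds and show each node's decommission status.
    watch -n 30 "hdfs dfsadmin -report | grep -E 'Name:|Decommission Status'"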

Unpublish JindoStorageService (SmartData)

  1. Go to the Status tab of the SmartData service page.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select your region and resource group.

    3. On the EMR on ECS page, find the target cluster and click Services in the Actions column.

    4. On the Cluster Services page, click Status in the SmartData service area.

  2. Unpublish the JindoStorageService component on the target node.

    1. In the Components list, find JindoStorageService and choose Unpublish in the Actions column.

    2. In the dialog box, set Execution Scope to Specific Machine, enter an Execution Reason, and click OK.

    3. Click OK to confirm.

  3. Click Operation History in the upper-right corner to track progress.

Step 3: Release the node

Important

To remove a node from an EMR cluster, release its corresponding ECS instance from the ECS console. If you perform this operation as a RAM user, the AliyunECSFullAccess policy must be attached to that RAM user.

  1. Go to the Nodes page.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select your region and resource group.

    3. On the EMR on ECS page, find the target cluster and click Nodes in the Actions column.

  2. On the Nodes page, click the ECS instance ID of the node to release. You are redirected to the ECS console.

  3. Release the instance. For details, see Release an instance.
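If you manage resources from the command line, the Alibaba Cloud CLI offers an alternative to the console flow above. This is an illustrative call, assuming the CLI is installed and configured with credentials that carry the required ECS permissions; replace the instance ID with the one shown on the Nodes page:

    # Release an ECS instance by ID; Force allows releasing an instance that is still in the Running state.
    aliyun ecs DeleteInstance --InstanceId i-bp1xxxxxxxxxxxxx --Force true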

What's next

  • To scale in a Task node group that uses pay-as-you-go or preemptible instances directly from the console, see Scale in a cluster.

  • To add nodes to a Core or Task node group, see Scale out a cluster.

  • To automatically adjust cluster capacity based on workload, see Auto Scaling.