If a large amount of resources remains idle in your E-MapReduce (EMR) cluster, you can scale in a core node group or a subscription task node group. Both operations must be performed manually: manual scale-in ensures the stability of Hadoop Distributed File System (HDFS) data in the core node group, and subscription task node groups cannot be automatically scaled in.
Precautions
The operations in this topic cannot be rolled back. After you unpublish the components of a service, the components cannot be recovered.
This topic describes the best practices for scale-in operations. We recommend that you evaluate the impacts on your business before you scale in a node group and proceed with caution. This can help prevent job scheduling failures and data security risks.
Preparations
Select the nodes whose components you want to unpublish based on the service load of your cluster. Focus on the specifications of the nodes whose components you want to unpublish to prevent high cluster loads. You can check the load of your cluster by using one of the following methods:
Method 1: View metrics in the EMR console. For more information, see View service metrics.
Method 2: View the status of a service by accessing the web UI of the service. For more information, see Access the web UIs of open source components.
Take note of the following items before you scale in a node group:
If the number of core nodes is equal to the number of HDFS replicas, you cannot remove core nodes.
If your cluster is a non-high-availability cluster, you cannot remove the emr-worker-1 or emr-worker-2 node.
If your cluster is a high-availability cluster and the number of master nodes in the cluster is 2, you cannot remove the emr-worker-1 node.
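The first constraint above is simple arithmetic: the number of core nodes you can remove is the current node count minus the HDFS replication factor. The sketch below uses illustrative sample values; on a real cluster you could obtain the replication factor with `hdfs getconf -confKey dfs.replication` and the node count from the Nodes tab.

```shell
# Illustrative pre-check: a core node group cannot shrink below the HDFS
# replication factor. Both values below are samples, not read from a cluster.
replication=3   # e.g. from: hdfs getconf -confKey dfs.replication
core_nodes=4    # current size of the core node group
removable=$((core_nodes - replication))
echo "core nodes that can be removed: $removable"
```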
Unpublish the components of a service deployed in a cluster
If the YARN, HDFS, SmartData, HBase, or StarRocks service is deployed in your cluster, you must unpublish the components of these services before you release the Elastic Compute Service (ECS) instances in the cluster. Otherwise, jobs that run in the cluster may fail and data security risks may occur.
Unpublish the NodeManager component of the YARN service
Go to the Status tab of the YARN service page.
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.
On the EMR on ECS page, find the desired cluster and click Services in the Actions column.
On the Services tab, find the YARN service and click Status.
Unpublish the NodeManager component that is deployed on desired nodes.
In the Components section of the Status tab, find NodeManager, move the pointer over the More icon in the Actions column, and then select Unpublish.
In the dialog box that appears, configure the Execution Scope and Execution Reason parameters and click OK.
In the Confirm message, click OK.
In the upper-right corner of the Services tab, click Operation History to view the operation progress.
Unpublish the DataNode component of the HDFS service
Log on to the master node of your cluster in SSH mode. For more information, see Log on to a cluster.
Switch to the hdfs user and view the number of NameNodes.
sudo su - hdfs
hdfs haadmin -getAllServiceState
Log on to the nodes on which NameNode is deployed in SSH mode and add the nodes whose DataNode component you want to unpublish to the dfs.exclude file. We recommend that you add only one node at a time.
Hadoop clusters
touch /etc/ecm/hadoop-conf/dfs.exclude
vim /etc/ecm/hadoop-conf/dfs.exclude
Press o after you run the vim command to start a new line, and then enter the hostnames of the nodes whose DataNode component you want to unpublish. Example:
emr-worker-3.cluster-xxxxx
emr-worker-4.cluster-xxxxx
Non-Hadoop clusters
touch /etc/taihao-apps/hdfs-conf/dfs.exclude
vim /etc/taihao-apps/hdfs-conf/dfs.exclude
Press o after you run the vim command to start a new line, and then enter the hostnames of the nodes whose DataNode component you want to unpublish. Example:
core-1-3.c-0894dxxxxxxxxx
core-1-4.c-0894dxxxxxxxxx
Switch to the hdfs user on a node on which NameNode is deployed and run the following commands. Then, HDFS automatically starts to unpublish the DataNode component.
sudo su - hdfs
hdfs dfsadmin -refreshNodes
Confirm the result.
Run the following command to check whether the DataNode component is unpublished:
hadoop dfsadmin -report
If the Decommission Status of a node is Decommissioned, the data on the DataNode has been migrated to other nodes and the DataNode component is unpublished.
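To confirm the result programmatically, the report output can be filtered for each node's Decommission Status. The report text below is an abbreviated, hypothetical sample so that the sketch runs standalone; on a real cluster you would capture the output of `hadoop dfsadmin -report` instead.

```shell
# Minimal sketch: list the decommission status of each DataNode in the report.
# The sample text below imitates two entries of `hadoop dfsadmin -report`.
report='Name: 192.168.0.3:9866 (core-1-3.c-0894dxxxxxxxxx)
Decommission Status : Decommissioned
Name: 192.168.0.4:9866 (core-1-4.c-0894dxxxxxxxxx)
Decommission Status : Decommission in progress'
# On a real cluster: report=$(hadoop dfsadmin -report)
statuses=$(printf '%s\n' "$report" \
  | awk '/^Name:/ {host=$2}
         /^Decommission Status/ {sub(/^Decommission Status : /, ""); print host" "$0}')
printf '%s\n' "$statuses"
```

A node is safe to release only after its line reads Decommissioned.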
Unpublish the JindoStorageService component of the SmartData service (Hadoop clusters)
Go to the Status tab of the SmartData service.
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.
On the EMR on ECS page, find the desired cluster and click Services in the Actions column.
On the Services tab, find the SmartData service and click Status.
Unpublish the JindoStorageService component that is deployed on a desired node.
In the Components section of the Status tab, find JindoStorageService, move the pointer over the More icon in the Actions column, and then select Unpublish.
In the dialog box that appears, configure the Execution Scope and Execution Reason parameters and click OK.
In the Confirm message, click OK.
In the upper-right corner of the Services tab, click Operation History to view the operation progress.
Unpublish the HRegionServer component of the HBase service
Go to the Status tab of the HBase service page.
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.
On the EMR on ECS page, find the desired cluster and click Services in the Actions column.
On the Services tab, find the HBase service and click Status.
Unpublish the HRegionServer component that is deployed on a desired node.
In the Components section of the Status tab, find HRegionServer, move the pointer over the More icon in the Actions column, and then select Unpublish.
In the dialog box that appears, configure the Execution Reason parameter and click OK. In the Confirm message, click OK.
Click Operation History in the upper-right corner to view the operation progress.
Remove BE nodes of the StarRocks service
Log on to your cluster and use a client to access the cluster. For more information, see Getting started.
Run the following command to remove backend (BE) nodes:
ALTER SYSTEM DECOMMISSION backend "be_ip:be_heartbeat_service_port";
Replace the following parameters based on your business requirements:
be_ip: the internal IP address of the desired node. You can obtain the IP address on the Nodes tab.
be_heartbeat_service_port: the heartbeat service port. Default value: 9050. You can run the show backends command to obtain the port.
If the BE nodes are removed at a slow speed, you can run the DROP command to forcefully remove the BE nodes.
Important: If you run the DROP command to remove the BE nodes, make sure that the system contains three replicas.
ALTER SYSTEM DROP backend "be_ip:be_heartbeat_service_port";
Run the following command to check the status of the BE nodes:
show backends;
If the value in the SystemDecommissioned column is true, the BE nodes are being removed. When the value in the TabletNum column reaches 0, all tablets have been migrated and the system cleans up the metadata.
If the BE nodes are no longer displayed in the command output, the nodes are removed.
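The check above can be scripted. The rows below are a trimmed, hypothetical sample of `show backends;` output reduced to the relevant columns (a real row contains many more), and the BE IDs are made up for illustration.

```shell
# Sketch: a BE node whose TabletNum has dropped to 0 has migrated all of its
# tablets and is about to be cleaned up. Sample rows below are hypothetical.
rows='10003 SystemDecommissioned=true TabletNum=42
10004 SystemDecommissioned=true TabletNum=0'
# On a real cluster, obtain the rows from: show backends;
done_ids=$(printf '%s\n' "$rows" | awk '/TabletNum=0$/ {print $1}')
echo "BE nodes fully drained: $done_ids"
```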
Unpublish the DataNode component of the HBase-HDFS service
Log on to the master node of your cluster in SSH mode. For more information, see Log on to a cluster.
Run the following commands to switch to the hdfs user and set the environment variable:
sudo su - hdfs
export HADOOP_CONF_DIR=/etc/taihao-apps/hdfs-conf/namenode
Run the following command to view the status of the HDFS cluster:
hdfs dfsadmin -report
Log on to the nodes on which NameNode is deployed in SSH mode and add the nodes whose DataNode component you want to unpublish to the dfs.exclude file. We recommend that you add only one node at a time.
touch /etc/taihao-apps/hdfs-conf/dfs.exclude
vim /etc/taihao-apps/hdfs-conf/dfs.exclude
Press o after you run the vim command to start a new line, and then enter the hostnames of the nodes whose DataNode component you want to unpublish. Example:
core-1-3.c-0894dxxxxxxxxx
core-1-4.c-0894dxxxxxxxxx
Switch to the hdfs user on a node on which NameNode is deployed and run the following commands. Then, HDFS automatically starts to unpublish the DataNode component.
sudo su - hdfs
export HADOOP_CONF_DIR=/etc/taihao-apps/hdfs-conf/namenode
hdfs dfsadmin -refreshNodes
Confirm the result.
Run the following command to check whether the DataNode component is unpublished:
hadoop dfsadmin -report
If the Decommission Status of a node is Decommissioned, the data on the DataNode has been migrated to other nodes and the DataNode component is unpublished.
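As in the earlier HDFS section, the confirmation step can be sketched as a report check. Here `fetch_report` echoes canned sample text so that the sketch runs standalone; on a real cluster it would run `hadoop dfsadmin -report` instead, and the loop body would repeat until no node remains in progress.

```shell
# Hedged sketch: decide whether any excluded DataNode is still decommissioning.
# fetch_report stands in for `hadoop dfsadmin -report`; its output is a sample.
fetch_report() {
  printf 'Name: 192.168.0.3:9866 (core-1-3.c-0894dxxxxxxxxx)\n'
  printf 'Decommission Status : Decommissioned\n'
}
pending=$(fetch_report | grep -c 'Decommission in progress' || true)
if [ "$pending" -eq 0 ]; then
  result='all excluded DataNodes decommissioned'
else
  result="still waiting on $pending node(s)"
fi
echo "$result"
```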
Release a node
You can log on to the ECS console to release the nodes in a node group of your EMR cluster. If you want to perform this operation as a RAM user, you must have the required ECS permissions. We recommend that you attach the AliyunECSFullAccess policy to the RAM user.
Go to the Nodes tab.
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.
On the EMR on ECS page, find the desired cluster and click Nodes in the Actions column.
On the Nodes tab, find the node that you want to release and click the ID of the node.
The Instances page in the ECS console appears.
Release the node in the ECS console. For more information, see Release an instance.
References
For information about how to scale in a task node group that contains pay-as-you-go instances or preemptible instances, see Scale in a cluster.