When you use E-MapReduce (EMR), unexpected operations may make a cluster unstable or unavailable. This topic describes the limits of EMR. Take note of them to avoid these issues.

Notice: When you use EMR, perform all operations in the EMR console. We recommend that you do not perform operations in the Elastic Compute Service (ECS) console, because doing so may cause cluster instability or abnormalities. If you perform an operation that is prohibited in this topic, you are responsible for the consequences.

High-risk operations (prohibited)

| Operation | Possible result | Suggestion |
| --- | --- | --- |
| Delete or modify the hosts file that is stored in the /etc directory. | The services that run on the nodes of your cluster can no longer be found, which causes service exceptions. | Only add information to the hosts file. Do not delete or change existing entries. |
| Modify parameters in component configuration files in the ECS console. | After specific services are restarted, the modified parameter settings are overwritten. | Modify the parameters in the EMR console. |
| Redeploy ECS instances in the ECS console. | The EMR service is affected. | None |
| Attach disks to the nodes of your EMR cluster in the ECS console. | The disks are unavailable because EMR cannot recognize or initialize them. | Add data disks in the EMR console. |
| Detach disks from the nodes of your EMR cluster in the ECS console. | Data may be lost because EMR is unaware of the detach operation. | None |
| Remove core nodes in the ECS console. | Data is lost, and jobs that run on the removed nodes fail. | None |
| Remove master nodes in the ECS console. | High availability (HA) cluster: the switchover of HDFS NameNode HA, YARN ResourceManager, or HBase HMaster fails. In this case, you must purchase a new EMR cluster and migrate data or tasks to it. Non-HA cluster: the cluster becomes unavailable, and you cannot migrate data or tasks. | None |
| Remove task nodes in the ECS console. | The jobs that run on the removed nodes fail. | None |
| Stop the MySQL service of the master node. (Applies when Type is set to Built-in MySQL at cluster creation.) | The MySQL service deployed on the emr-header-1 node is associated with Hive MetaStore, Oozie, and Ranger. If you stop the MySQL service, the associated components cannot access their databases. | None |
| Change the password of the root user that is used to access the MySQL service deployed on the emr-header-1 node. (Applies when Type is set to Built-in MySQL at cluster creation.) | Associated components such as Hue or Ranger fail. | None |
| Modify the security group to which ECS instances belong while an EMR cluster is running. | The network connection between nodes becomes abnormal, and components become unavailable. | None |
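
The hosts file entry above permits adding information but not deleting or modifying it. One way to respect that is to make every change append-only. The following is a minimal sketch, not an EMR tool: the function name `append_host_entry` and the example IP address and hostname are hypothetical, and it assumes the standard hosts file layout of an IP address followed by one or more hostnames.

```python
from pathlib import Path


def append_host_entry(hosts_path, ip, hostname):
    """Append 'ip hostname' to a hosts file without touching existing lines.

    Returns True if a line was appended, False if the hostname is already
    mapped (in which case the file is left unchanged).
    """
    path = Path(hosts_path)
    text = path.read_text()
    for line in text.splitlines():
        # Ignore comments; the first field is the IP, the rest are hostnames.
        fields = line.split("#", 1)[0].split()
        if hostname in fields[1:]:
            return False
    with path.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the last existing line intact
        f.write(f"{ip} {hostname}\n")
    return True


# Hypothetical usage (requires write access to the file):
# append_host_entry("/etc/hosts", "10.0.0.5", "my-db.internal")
```

Because the file is opened in append mode and existing lines are only read, the entries that the cluster relies on stay as they were.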

FAQ

| Problem description | Solution |
| --- | --- |
| Insufficient disk capacity | Increase the capacity of a single disk or add core nodes in the EMR console. EMR clusters do not support the addition of disks. |
| Excess disk capacity | Purchase a new cluster and release the original one. EMR clusters do not support scale-down of disk capacity. For more information, see Create a cluster. |
| Insufficient computing capabilities | Add task nodes in the EMR console. For more information, see Scale out a cluster. |
| Excess computing capabilities | Resolve this issue based on the billing method of your cluster. Pay-as-you-go cluster: remove one or more task nodes from the cluster in the EMR console. Subscription cluster: stop the NodeManager of YARN on a specific task node, change the billing method of the ECS instance that serves as the task node to pay-as-you-go in the ECS console, and then release the instance. |
| Outdated component versions | Purchase a cluster of a later version. You cannot update a specific component of an existing cluster. For more information, see Create a cluster. |
| Conversion from a non-HA cluster to an HA cluster | Non-HA clusters cannot be converted to HA clusters. We recommend that you purchase an HA cluster. |
| Deployment of third-party software or services on EMR | We recommend that you use bootstrap actions to install third-party software or services when you create a cluster. If you manually install third-party software or services after you create a cluster, you must reinstall them when you add nodes. |
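
Because manual installations must be repeated on every node you add, it helps if the installation script is idempotent, so that it is safe to run on any node at any time. The following is a minimal sketch under stated assumptions: Python is available on the node, the helper name `ensure_installed` is hypothetical, and the tool name and install command in the usage comment are placeholders rather than EMR-specific values.

```python
import shutil
import subprocess


def ensure_installed(tool, install_cmd, which=shutil.which, run=subprocess.run):
    """Run install_cmd only when `tool` is not already on PATH.

    Returns True if an installation was attempted, False if it was skipped.
    The `which` and `run` parameters exist so the check can be exercised
    without actually installing anything.
    """
    if which(tool) is not None:
        return False  # already installed; nothing to do
    run(install_cmd, check=True)  # raises CalledProcessError on failure
    return True


# Hypothetical usage inside an installation script:
# ensure_installed("htop", ["yum", "install", "-y", "htop"])
```

A script built this way can be used during cluster creation and rerun by hand on nodes that you add later, without reinstalling software that is already present.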