E-MapReduce: Limits

Last Updated: May 11, 2024

This topic describes the limits of E-MapReduce (EMR). Unexpected operations can make a cluster unstable or unavailable. Take note of the information in this topic to avoid these issues.

Important

When you use EMR, perform all operations in the EMR console. We recommend that you do not perform operations in the Elastic Compute Service (ECS) console, because such operations may cause cluster instability or exceptions. If you perform any of the prohibited operations described in this topic, you bear the consequences and the responsibility.

High-risk operations (prohibited)

Operation

Possible result

Suggestion

Delete or modify the hosts file that is stored in the /etc directory.

You cannot find the services that run on the nodes of your cluster, which causes service exceptions.

Only add information to the hosts file. Do not delete or modify existing entries.

Modify parameters in component configuration files in the ECS console.

When the affected services are restarted, the modified parameter settings are overwritten.

Modify the parameters in the EMR console.

Redeploy ECS instances in the ECS console or by calling the API operation provided by ECS.

The EMR service is affected.

None

Attach disks to the nodes of your EMR cluster in the ECS console or by calling the API operation provided by ECS.

The disks are unavailable because EMR cannot recognize and initialize the disks.

None

Detach disks from the nodes of your EMR cluster in the ECS console or by calling the API operation provided by ECS.

This may cause data loss because EMR is unaware of the disk detaching operation.

None

Remove core nodes in the ECS console or by calling the API operation provided by ECS.

This causes data loss and causes the jobs that run on the removed nodes to fail.

None

Remove master nodes in the ECS console or by calling the API operation provided by ECS.

  • For a high availability (HA) cluster, if you remove master nodes, the switchover of HDFS NameNode, YARN ResourceManager, or HBase HMaster fails. In this case, you must purchase a new EMR cluster and migrate your data and tasks to it.

  • For a non-HA cluster, if you remove the master node, the cluster becomes unavailable, and you cannot migrate data or tasks.

None

Remove task nodes in the ECS console or by calling the API operation provided by ECS.

The jobs that you run on the removed nodes fail.

None

Stop the MySQL service on the master node. (This applies if Type is set to Built-in MySQL when you create the EMR cluster.)

The MySQL service deployed on the master-1-1 node is associated with Hive MetaStore, Oozie, and Ranger. If you stop the MySQL service, these components cannot access their databases.

None

Change the password of the root user that is used to access the MySQL service deployed on the master-1-1 node. (This applies if Type is set to Built-in MySQL when you create the EMR cluster.)

Associated components, such as Hue and Ranger, fail.

None

Modify the security group to which the ECS instances belong, the virtual private cloud (VPC) in which the ECS instances are deployed, or the vSwitch of the ECS instances in the ECS console or by calling the API operation provided by ECS.

  • The network connection between nodes is abnormal.

  • Components become unavailable.

None

Change the billing method in the ECS console or by calling the API operation provided by ECS.

After you change the billing method, you cannot change it back to the original one.

Change the billing method in the EMR console. For more information, see Switch from pay-as-you-go to subscription.

Important

You cannot change the billing method from subscription to pay-as-you-go in the EMR console.

Delete agent-related directories.

EMR clusters cannot run as expected.

None
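
The suggestion for the hosts file in the preceding table (add information instead of deleting or modifying the file) can be sketched as an append-only update. The following Python sketch is illustrative and is not an EMR tool. The function names, the sample IP address, and the sample host names are hypothetical. The sketch assumes that the hosts file is /etc/hosts and that the script has permission to write to it.

```python
from pathlib import Path


def append_host_entry(hosts_text: str, ip: str, hostname: str) -> str:
    """Return hosts_text with 'ip hostname' appended.

    Existing lines are never changed. If the host name is already
    mapped, the text is returned as-is.
    """
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into IP address and names.
        fields = line.split("#", 1)[0].split()
        if hostname in fields[1:]:
            return hosts_text  # already present: leave the entry alone
    # Make sure the new entry starts on its own line.
    separator = "" if (not hosts_text or hosts_text.endswith("\n")) else "\n"
    return f"{hosts_text}{separator}{ip}\t{hostname}\n"


def add_to_hosts_file(ip: str, hostname: str, path: str = "/etc/hosts") -> None:
    """Append an entry to the hosts file at path (assumed: /etc/hosts)."""
    hosts_path = Path(path)
    original = hosts_path.read_text()
    updated = append_host_entry(original, ip, hostname)
    if updated != original:
        # Open in append mode so that existing content is not rewritten.
        with hosts_path.open("a") as hosts_file:
            hosts_file.write(updated[len(original):])
```

For example, `add_to_hosts_file("10.0.0.5", "db.example.internal")` would append one line for the hypothetical host and leave the entries that EMR maintains untouched.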

FAQ

Problem description

Solution

Insufficient disk capacity

Increase the capacity of a single disk or add core nodes in the EMR console. You cannot add disks to the nodes of an EMR cluster.

Excess disk capacity

EMR clusters do not support scale-down of disk capacity. Purchase a new cluster and release the original one. For more information, see Create a cluster.

Insufficient computing capabilities

Add task nodes in the EMR console. For more information, see Scale out a cluster.

Excess computing capabilities

Resolve this issue based on the billing method of your cluster.

  • For a pay-as-you-go cluster, remove one or more task nodes from the cluster in the EMR console.

  • For a subscription cluster, stop the NodeManager of YARN on a specific task node, change the billing method of the ECS instance that serves as the task node to pay-as-you-go in the ECS console, and then release the instance.

Outdated component versions

You cannot upgrade individual components of an existing cluster. Purchase a cluster of a later version. For more information, see Create a cluster.

Conversion from a non-HA cluster to an HA cluster

Non-HA clusters cannot be converted to HA clusters. We recommend that you purchase an HA cluster.

Deployment of third-party software or services on EMR

We recommend that you use bootstrap actions to install third-party software or third-party services when you create a cluster.

If you manually install third-party software or third-party services after you create a cluster, you must reinstall the software or services when you add nodes.
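
The solution for excess computing capabilities in the preceding FAQ depends on the billing method of the cluster. As a rough sketch, the branching can be written as a checklist lookup. The function name and the billing method strings are hypothetical labels chosen for illustration, and the function only returns the steps described in this topic. It does not call any EMR or ECS API.

```python
from typing import List


def scale_in_steps(billing_method: str) -> List[str]:
    """Return the steps from this topic for removing excess task nodes."""
    if billing_method == "pay-as-you-go":
        return [
            "Remove one or more task nodes from the cluster in the EMR console.",
        ]
    if billing_method == "subscription":
        return [
            "Stop the NodeManager of YARN on the task node.",
            "In the ECS console, change the billing method of the ECS "
            "instance that serves as the task node to pay-as-you-go.",
            "Release the instance.",
        ]
    # Any other value is outside what this topic describes.
    raise ValueError(f"unknown billing method: {billing_method!r}")


if __name__ == "__main__":
    for number, step in enumerate(scale_in_steps("subscription"), start=1):
        print(f"{number}. {step}")
```

Keeping the subscription steps in order matters: the NodeManager is stopped before the instance is released, so that jobs are not scheduled on a node that is about to disappear.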