This topic provides answers to some frequently asked questions about cluster management.

How do I fix the error "The specified zone is not available for purchase"?

Pay-as-you-go Elastic Compute Service (ECS) instances are unavailable in the zone you selected. We recommend that you select a different zone.

How do I fix the error "The request processing has failed due to some unknown error, exception or failure"?

Try again later. If the error persists, submit a ticket. You can also submit a ticket immediately after the error is reported.

How do I fix the error "The Node Controller is temporarily unavailable"?

Wait a moment and create a cluster again.

How do I fix the error "No quota or zone is available"?

ECS instances in the zone you selected are insufficient. Create a cluster again, but select a different zone or use the default zone when you create the cluster.

How do I fix the error "The specified InstanceType is not authorized for use"?

Pay-as-you-go instances with advanced configurations (eight or more vCPUs per instance) require an application. Submit a ticket to apply. The supported specifications are 8 vCPUs with 16 GiB of memory, 8 vCPUs with 32 GiB of memory, 16 vCPUs with 32 GiB of memory, and 16 vCPUs with 64 GiB of memory.

How do I fix the error "The specified instance Type exceeds the maximum limit for the PostPaid instances"?

Possible causes:
  • Your quota for pay-as-you-go instances has been reached.
  • You do not have the permissions to create the current type of instance.
Solutions:
  • Submit an application to increase your quota.
  • Go to the ECS console and grant your account the permissions to create the current type of instance.

How do I log on to a core node?

  1. On the master node, run the following command to switch to the hadoop user:
    su hadoop
  2. Log on to the core node in password-free mode.
    ssh emr-worker-1
  3. Run the following sudo command to obtain the root permissions:
    sudo su - root

Why do I still receive renewal notifications after I have renewed my cluster?

Cause: You are charged for both E-MapReduce (EMR) resources and ECS instances. You probably renewed only ECS instances.

Solution: Go to the Cluster Overview page for your EMR cluster. Select Renewal from the Renewal drop-down list in the upper-right corner. On the page that appears, check the expiration time of the ECS instances and EMR resources.

Do EMR clusters support auto-renewal?

Yes, both EMR resources and ECS instances can be automatically renewed.

How do I apply for a refund for an EMR cluster?

Submit a ticket to apply for a refund.

Do I need to handle a cluster creation failure?

No, you do not need to handle cluster creation failures. No computing resources are created if you fail to create a cluster. The cluster record is automatically removed from the cluster list in the EMR console after three days.

What is the division of work in an EMR cluster?

A standard EMR cluster consists of a single master node and multiple core nodes. Only core nodes store data and perform computing. For example, assume that a cluster consists of three instances, each with four vCPUs and 8 GiB of memory. One instance serves as the master node, and the other two serve as core nodes. The available computing resources of this cluster are therefore two instances, each with four vCPUs and 8 GiB of memory.
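
The resource arithmetic in this example can be sketched as follows. The node counts and specifications are the example's values; substitute your own cluster's numbers.

```shell
# Example cluster: 3 instances, each with 4 vCPUs and 8 GiB of memory.
TOTAL_NODES=3
MASTER_NODES=1
VCPUS_PER_NODE=4
MEM_GIB_PER_NODE=8

# Only core nodes contribute to available computing resources.
CORE_NODES=$((TOTAL_NODES - MASTER_NODES))
echo "available compute instances: $CORE_NODES"
echo "available vCPUs: $((CORE_NODES * VCPUS_PER_NODE))"
echo "available memory: $((CORE_NODES * MEM_GIB_PER_NODE)) GiB"
```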

How do I handle disk exceptions in a Kafka cluster?

Cause: The disk is full or damaged.

Solution:
  • If your disk is full, perform the following operations:
    1. Log on to the server.
    2. Find the fully occupied disk and delete unnecessary data to free up some of the disk space. Take note of the following rules:
      • Do not delete Kafka data directories. Otherwise, you may lose all of your data.
      • Do not delete internal Kafka topics, such as __consumer_offsets and the schema topic.
      • Find the topics that occupy a large amount of space or that you no longer need. From some partitions of those topics, delete historical log segments together with the corresponding .index and .timeindex files.
    3. Restart the Kafka broker.
  • If your disk is damaged, perform the following operations:
    • If no more than 25% of the disks on a machine are damaged, you do not need to take any action.
    • If more than 25% of the disks on a machine are damaged, submit a ticket.
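
As a sketch, the disk-full cleanup might look like the following shell commands. The log directory path and the 7-day retention window are assumptions; adjust them to your cluster, and never delete data from internal topics such as __consumer_offsets.

```shell
# Sketch only: paths and retention window are assumptions. Review the output
# of each command before deleting anything.
LOG_DIR=${LOG_DIR:-/mnt/disk1/kafka/log}   # hypothetical Kafka data directory

if [ -d "$LOG_DIR" ]; then
  # 1. Rank partition directories by disk usage to find the biggest consumers.
  du -sk "$LOG_DIR"/* 2>/dev/null | sort -rn | head

  # 2. List rolled-over segments older than 7 days, together with their
  #    .index and .timeindex files. The newest (active) segment must be kept.
  find "$LOG_DIR" -mindepth 2 -maxdepth 2 -type f \
    \( -name '*.log' -o -name '*.index' -o -name '*.timeindex' \) \
    -mtime +7 -print
  # 3. Re-run the find command with -delete only after reviewing its output.
fi
```

After freeing up space, restart the Kafka broker as described in step 3.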

How do I fix the error "The specified DataDisk Size beyond the permitted range, or the capacity of snapshot exceeds the size limit of the specified disk category"?

The disk size you specified is too small. Set the disk size to 40 GB or larger.

How do I fix the error "Your account does not have enough balance"?

The account balance is insufficient. Top up your account and try again.

How do I fix the error "The maximum number of Pay-As-You-Go instances is exceeded: create ecs vcpu quota per region limited by user quota [xxx]"?

Your quota for pay-as-you-go ECS instances has been reached. You can release some existing pay-as-you-go ECS instances or submit an application to increase your quota.

Can I install additional software on the master node of an EMR cluster?

We recommend that you do not install additional software. The installation may affect the stability and reliability of the cluster.

Do services on each node automatically start when I power on the server? Are services automatically resumed after they are interrupted unexpectedly?

Yes, services automatically start when you power on the server, and services are automatically resumed after they are interrupted unexpectedly.

What is the port number of the HBase Thrift server?

The port number of the HBase Thrift server is 9099.

Does EMR support preemptible instances?

If you enable the auto scaling feature for a cluster, you can use preemptible instances. For more information, see Manage auto scaling.

How do I fix the error FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient?

This Hive error is reported when Unified Metabases is selected for Type during cluster creation.

The cluster fails to be created because the cluster does not have an elastic IP address. Submit a ticket.

Why can I not set EMR Version to EMR-3.4.3 when I create a cluster?

EMR is updated periodically. Some earlier versions are deprecated. Check whether the versions of services such as Hive and Spark in the available EMR versions meet your business requirements. If you want to use a deprecated EMR version, submit a ticket.

What are the differences between EMR and MaxCompute?

Both are big data processing solutions. EMR is a big data platform built entirely on open source technologies and is fully compatible with open source software. MaxCompute is a platform developed by Alibaba Cloud and is not open source. It encapsulates the underlying details to provide easy-to-use features and low operations and maintenance costs.

Does EMR support automatic storage balancing? How do I manually rebalance storage?

Automatic storage balancing is not supported. You can perform the following steps to manually rebalance the storage of a cluster:
  1. Log on to the EMR console.
  2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  3. Click the Cluster Management tab.
  4. Find the cluster whose storage you want to rebalance, and click Details in the Actions column.
  5. In the left-side navigation pane, choose Cluster Service > HDFS.
  6. In the upper-right corner of the page that appears, choose Actions > Rebalance.
  7. In the Cluster Activities dialog box, specify the related parameters and click OK.
  8. In the Confirm message, click OK.

How do I apply for an instance type with advanced configurations?

Submit a ticket.

What do I do if the disk capacity of an EMR cluster is insufficient?

Expand the capacity of a disk or add core nodes to the EMR cluster. You are not allowed to add disks to a node of an EMR cluster.

What do I do if the disk capacity of an EMR cluster is excessively large?

Purchase a new cluster and release the original one. You are not allowed to scale down the disk capacity of an EMR cluster. For more information, see Create a cluster.

What do I do if the computing capability of an EMR cluster is excessively low?

Add task nodes to the cluster in the EMR console. For more information, see Scale out a cluster.

What do I do if the computing capability of an EMR cluster is excessively high?

Resolve this issue based on the billing method of your cluster.
  • For a pay-as-you-go cluster, remove one or more task nodes from the cluster in the EMR console.
  • For a subscription cluster, stop the NodeManager of YARN on a specific task node, change the billing method of the ECS instance that serves as the task node to pay-as-you-go in the ECS console, and then release the instance.

What do I do if the version of a component in an EMR cluster does not meet my business requirements?

Purchase a cluster of a later version. You are not allowed to update a specific component of an existing cluster. For more information, see Create a cluster.

How do I convert a non-high-availability (non-HA) cluster into an HA cluster?

Non-HA clusters cannot be converted into HA clusters. We recommend that you purchase an HA cluster.

How do I deploy third-party software or third-party services on EMR?

We recommend that you use bootstrap actions to install third-party software or third-party services when you create a cluster. If you manually install third-party software or third-party services after you create a cluster, you must reinstall the software or services when you add nodes.

What do I do if an ECS instance of an EMR cluster reports an alert that an error occurred and the instance must be re-deployed?

Submit a ticket.

I upgraded the node configurations of a node group to increase the memory and vCPUs. Why are the added resources not applied to the YARN service?

The memory and vCPU resources of the YARN service on an EMR cluster are the sum of the resources of the NodeManager on each node. However, the memory and vCPU configurations of the NodeManager on each node do not automatically change after you upgrade the node configurations. You can manually modify the configurations and restart the NodeManager on each node. This way, the total memory and vCPU resources of the YARN service increase. For more information about how to modify the parameters of a service, see Manage parameters for services.

You must modify the following parameters:
  • Memory: yarn.nodemanager.resource.memory-mb
  • vCPU: yarn.nodemanager.resource.cpu-vcores
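
For example, on nodes that were upgraded to 16 vCPUs and 64 GiB of memory, the settings in yarn-site.xml might look like the following. The exact values are assumptions; leave headroom for the operating system and other services on each node.

```xml
<!-- yarn-site.xml on each upgraded node; the values below are illustrative. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- 56 GiB for YARN containers, reserving memory for the OS and daemons -->
  <value>57344</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>
</property>
```

Restart the NodeManager on each node after you save the change.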

Why does a job remain in the ACCEPTED state when sufficient resources are available in the YARN service?

When you run a job on YARN, an ApplicationMaster starts and applies for the required resources. However, the yarn.scheduler.capacity.maximum-am-resource-percent parameter limits the percentage of YARN resources that ApplicationMasters can use. The default value of this parameter is 0.25, which indicates 25%. If your jobs do not consume a large number of resources, we recommend that you set the yarn.scheduler.capacity.maximum-am-resource-percent parameter to a value in the range of 0.5 to 0.8. After you modify the parameter, you do not need to restart the YARN service. Run the yarn rmadmin -refreshQueues command on a node to apply the change. For more information about how to modify the parameters of a service, see Manage parameters for services.
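
As a sketch, the corresponding entry in capacity-scheduler.xml might look like the following. The value 0.5 is an illustrative choice within the recommended range.

```xml
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <!-- allow ApplicationMasters to use up to 50% of cluster resources -->
  <value>0.5</value>
</property>
```

Then run yarn rmadmin -refreshQueues to apply the change without restarting the YARN service.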

What are the differences between the two job submission modes Worker Node and Header/Gateway Node?

On the Edit Job page, click Job Settings in the upper-right corner. In the Job Settings panel, click the Advanced Settings tab. In the Mode section, select Worker Node or Header/Gateway Node for Job Submission.

Worker Node: In this mode, you cannot specify a node to submit the job. Before your actual task starts and is monitored, a task named Launcher starts. Therefore, resources for two ApplicationMasters are required. In most cases, the application IDs of the two tasks are consecutive.

Header/Gateway Node: In this mode, you can specify a node to submit the job. Only the actual tasks start. However, if the number of tasks that you submit is excessively large, the node may be overloaded. If the memory of the node is insufficient, some tasks cannot start.

How do I allow ECS instances in the classic network and ECS instances of EMR clusters in a virtual private cloud (VPC) to access each other?

Alibaba Cloud provides two network types: classic network and VPC. EMR clusters run in VPCs, but the business systems of many users run in the classic network. To connect these two types of networks, Alibaba Cloud launched the ClassicLink solution. For more information, see ClassicLink overview.

To implement this solution, perform the following steps:
  1. Create a vSwitch for which a specific CIDR is configured. For more information, see ClassicLink overview.
  2. Create an EMR cluster. Select the created vSwitch when you create the cluster.
  3. In the ECS console, connect the required instances of the classic network to the VPC where the EMR cluster resides.
  4. Configure security group rules.

How do I access the web UIs of open source services?

You can access the web UIs of open source services on the Public Connect Strings page in the EMR console. For more information, see Access the web UIs of open source components.

How do I isolate the Object Storage Service (OSS) data of different Resource Access Management (RAM) users?

To isolate the OSS data of different RAM users, perform the following steps in the RAM console:
  1. Log on to the RAM console by using your Alibaba Cloud account.
  2. Create a RAM user.
    1. In the left-side navigation pane, choose Identities > Users.
    2. On the Users page, click Create User.
      Note You can create multiple RAM users at the same time.
    3. On the Create User page, specify Logon Name and Display Name.
    4. In the Access Mode section, select Console Access or OpenAPI Access.
      • Console Access: If you select this access mode, you must configure the basic settings for logon security. These settings specify whether to use a system-generated logon password or a custom logon password, whether to reset the password on the next logon, and whether to enable multi-factor authentication (MFA).
      • OpenAPI Access: If you select this option, an AccessKey pair is automatically created for the RAM user. The RAM user can call API operations or use other development tools to access Alibaba Cloud resources.
      Note To ensure the security of your Alibaba Cloud account, we recommend that you select only one access mode for the RAM user. This way, the RAM user cannot use an AccessKey pair to access Alibaba Cloud resources after the RAM user leaves the organization.
    5. Click OK.
  3. Create permission policies.
    1. In the left-side navigation pane, choose Permissions > Policies.
    2. Click Create Policy.
    3. On the Create Custom Policy page, specify Policy Name.
    4. Select Script for Configuration Mode. For more information about how to configure a permission policy in Script mode, see Policy structure and syntax. In this example, two policies are created based on different environments. You can select one of the scripts to create a policy based on your business environment.
      Test environment (test-bucket):

      {
        "Version": "1",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "oss:ListBuckets"
            ],
            "Resource": [
              "acs:oss:*:*:*"
            ]
          },
          {
            "Effect": "Allow",
            "Action": [
              "oss:ListObjects",
              "oss:GetObject",
              "oss:PutObject",
              "oss:DeleteObject"
            ],
            "Resource": [
              "acs:oss:*:*:test-bucket",
              "acs:oss:*:*:test-bucket/*"
            ]
          }
        ]
      }

      Production environment (prod-bucket):

      {
        "Version": "1",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "oss:ListBuckets"
            ],
            "Resource": [
              "acs:oss:*:*:*"
            ]
          },
          {
            "Effect": "Allow",
            "Action": [
              "oss:ListObjects",
              "oss:GetObject",
              "oss:PutObject"
            ],
            "Resource": [
              "acs:oss:*:*:prod-bucket",
              "acs:oss:*:*:prod-bucket/*"
            ]
          }
        ]
      }
      After the preceding permission policies are attached to a RAM user, the RAM user is subject to the following limits in the EMR console:
      • When the RAM user creates a cluster, job, or workflow, all buckets are displayed on the OSS file page. However, the RAM user can access only authorized buckets.
      • Only the data in authorized buckets can be viewed.
      • A job can read and write data only from and to an authorized bucket.
    5. Click OK.
  4. Optional: Authorize the RAM user.
    If the created RAM user is not authorized, perform the following steps to authorize the RAM user:
    1. In the left-side navigation pane, choose Identities > Users.
    2. On the Users page, find the RAM user that you want to authorize and click Add Permissions in the Actions column.
    3. In the Add Permissions panel, click the policies that you want to attach to the RAM user and click OK.
    4. Click Complete.
  5. Optional: Grant console logon permissions to the RAM user.
    If you have not granted console logon permissions to the RAM user that you created, perform the following steps to grant the permissions to the RAM user:
    1. In the left-side navigation pane, choose Identities > Users.
    2. On the Users page, find the RAM user to which you want to grant console logon permissions and click the logon name of the RAM user.
    3. In the Console Logon Management section of the page that appears, click Enable Console Logon.
    4. In the Modify Logon Settings panel, select Enabled for Console Password Logon.
    5. Click OK.
  6. Log on to the EMR console by using the RAM user.
    1. Log on to the Alibaba Cloud Management Console by using the RAM user.
    2. Click the More icon in the upper-left corner and choose Products and Services > E-MapReduce.

How do I connect EMR clusters that belong to different VPCs of the same account?

Each VPC is an isolated network environment. You can customize CIDR blocks, create subnets, and configure route tables and gateways for VPCs. You can create EMR clusters in different VPCs and use Express Connect to allow the VPCs to communicate with each other.

You must configure the following VPC-related parameters when you create an EMR cluster in a VPC:
  • VPC: the VPC to which the cluster belongs. If no VPCs are available, create a VPC in the VPC console. You can create a maximum of two VPCs by using one account. To create more than two VPCs, submit a ticket.
  • VSwitch: the vSwitch to which the cluster belongs. The vSwitch is used to support the communication of the ECS instances of the cluster. If no vSwitches are available, log on to the VPC console and click vSwitch in the left-side navigation pane to create a vSwitch. vSwitches are deployed in zones. When you create a vSwitch, you must select the zone in which you want to create a cluster.
  • Security Group Name: the security group to which the cluster belongs. Security groups in the classic network cannot be used. Only security groups in the specified VPC can be used. For security purposes, only the security groups created in EMR are available in the drop-down list. To create a security group, you need only to enter a name in the Security Group Name field.
The following example describes how to create a Hive cluster and an HBase cluster that belong to different VPCs and use Cloud Enterprise Network to allow the Hive cluster to access the HBase cluster.
  1. Create the Hive and HBase clusters.

    In the EMR console, create Hive cluster C1 and HBase cluster C2 in the China (Hangzhou) region. C1 belongs to VPC 1 and C2 belongs to VPC 2. For more information about how to create a cluster, see Create a cluster.

  2. Connect the two VPCs.

    For more information, see Create a CEN instance.

  3. Log on to the HBase cluster in SSH mode and run the following command in the HBase shell to create a table:
    create 'testfromHbase','cf'
  4. Log on to the Hive cluster in SSH mode and perform the following operations to enable communication between the clusters:
    1. Add the following information to the hosts file:
      $zk_ip emr-cluster    # Replace $zk_ip with the IP address of the ZooKeeper node in the HBase cluster.
    2. Run the following commands in the Hive shell to access the HBase cluster:
      set hbase.zookeeper.quorum=172.*.*.111,172.*.*.112,172.*.*.113;
      CREATE EXTERNAL TABLE IF NOT EXISTS testfromHive (rowkey STRING, pageviews Int, bytes STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2') TBLPROPERTIES ('hbase.table.name' = 'testfromHbase');
      If the java.net.SocketTimeoutException error appears, add rules to the security group of the HBase cluster to open all ports that the Hive cluster requires to access the HBase cluster.

What do I do if the "Failed to get schema version" error is reported when the MetaStore client is initialized?

The following figure shows the error details.

Check the security settings of the ApsaraDB RDS for MySQL instance. Make sure that the IP addresses of the ECS instances that are used for your EMR cluster are added to the whitelist of the ApsaraDB RDS for MySQL instance. For more information, see Configure an IP address whitelist for an ApsaraDB RDS for MySQL instance.

What do I do if Hive metadata contains Chinese characters, such as Chinese characters in column comments and partition names?

Perform the following operations in the ApsaraDB RDS for MySQL database to encode the related fields in the UTF-8 format:
  1. Change the character set of the COMMENT column in the COLUMNS_V2 table:
    alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
  2. Change the character set of the PARAM_VALUE column in the TABLE_PARAMS table:
    alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
  3. Change the character set of the PARAM_VALUE column in the PARTITION_PARAMS table:
    alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
  4. Change the character set of the PKEY_COMMENT column in the PARTITION_KEYS table:
    alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
  5. Change the character set of the PARAM_VALUE column in the INDEX_PARAMS table:
    alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;