E-MapReduce: What do I do if the services of a cluster fail to start because I do not have access permissions on the root storage directory of the cluster?

Last Updated: Jul 21, 2023

This topic describes the causes of and solutions to startup failures of specific services of a cluster that occur when you do not have access permissions on the root storage directory of the cluster. The root storage directory of a cluster is specified by the fs.defaultFS configuration item of the Hadoop-Common service in the cluster.
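
You can confirm which root storage directory is configured by reading the value on any node of the cluster:

    # Print the configured root storage directory of the cluster
    hdfs getconf -confKey fs.defaultFS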

Problem description

In the Health Check Items section of the Status tab of the Hadoop-Common service in the E-MapReduce (EMR) console, the check item in the abnormal state shows the message [hadoop_fs_availability] DefaultFS is unable to access. When you move the pointer over the warning icon, the system displays the message fs.defaultFS is unable to access. Check the configuration items and select a storage address in which you have access permissions.

Scenarios in which this issue may occur:

  • When you create a cluster, you select the OSS-HDFS service, and select a bucket for which the OSS-HDFS service is activated as the root storage directory for the cluster. However, in the Basic Configuration step of the procedure for configuring the cluster, you use an Elastic Compute Service (ECS) application role that does not have access permissions on Object Storage Service (OSS). As a result, you cannot access the bucket when the cluster is running.

  • When you create a cluster, you pass the address of a bucket on which you do not have access permissions to the fs.defaultFS configuration item of the Hadoop-Common service by using the custom software configuration method.

  • You set the fs.defaultFS configuration item of the Hadoop-Common service of an existing cluster to a bucket on which you do not have access permissions.
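
In all of these scenarios, you can verify the problem from the master node. The following sketch reads the configured value and attempts to list the directory; if this issue applies, the list command fails with a permission error:

    # Read the configured root storage directory and try to list it
    ROOT_DIR=$(hdfs getconf -confKey fs.defaultFS)
    hadoop fs -ls "$ROOT_DIR"/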

Causes and solutions

The causes and solutions vary by service.

YARN

Cause

You do not have access permissions on the bucket that is configured as the root storage directory of the cluster. As a result, when the YARN service is started in the cluster, the ResourceManager component cannot create required data directories, such as the Node Labels directory, and the MRHistoryServer component cannot create directories such as the aggregated log directory.

Solution

  1. Log on to the master node of the cluster. For more information, see Log on to a cluster.

  2. Run the following commands to create data directories and grant users the required permissions on the directories:

    # Switch to the hadoop user
    sudo su hadoop
    # Node Labels store directory used by ResourceManager
    STORE_DIR=$(hdfs getconf -confKey yarn.node-labels.fs-store.root-dir)
    hadoop fs -mkdir -p $STORE_DIR
    hadoop fs -chmod 775 $STORE_DIR
    hadoop fs -chown hadoop:hadoop $STORE_DIR
    # Staging directory for MapReduce ApplicationMasters
    STAGING_DIR=$(hdfs getconf -confKey yarn.app.mapreduce.am.staging-dir)
    hadoop fs -mkdir -p $STAGING_DIR
    hadoop fs -chmod 777 $STAGING_DIR
    hadoop fs -chown hadoop:hadoop $STAGING_DIR
    # History directory used by MRHistoryServer
    hadoop fs -mkdir -p $STAGING_DIR/history
    hadoop fs -chmod 775 $STAGING_DIR/history
    hadoop fs -chown hadoop:hadoop $STAGING_DIR/history
    # Remote directory for aggregated application logs
    LOG_DIR=$(hdfs getconf -confKey yarn.nodemanager.remote-app-log-dir)
    hadoop fs -mkdir -p $LOG_DIR
    hadoop fs -chmod 1777 $LOG_DIR
    hadoop fs -chown hadoop:hadoop $LOG_DIR
  3. Restart the YARN service. For more information, see Restart a service.

    In the Components section of the Status tab of the YARN service in the EMR console, you can check whether the YARN service starts as expected.
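
    For a quick command-line check, you can also list the NodeManagers that are registered with ResourceManager:

    # All nodes should be reported in the RUNNING state
    yarn node -list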

Hive

Cause

You do not have access permissions on the bucket that is configured as the root storage directory of the cluster. As a result, when the Hive service is started in the cluster, the HiveServer component cannot create required data directories, such as the Hive warehouse directory.

Solution

  1. Log on to the master node of the cluster. For more information, see Log on to a cluster.

  2. Run the following commands to create data directories and grant users the required permissions on the directories:

    # Create the Hive warehouse directory and grant the hive user
    # the required permissions
    hadoop fs -mkdir -p /user/hive/warehouse
    hadoop fs -chown hive /user/hive
    hadoop fs -chown hive /user/hive/warehouse
    hadoop fs -chmod 751 /user/hive
    hadoop fs -chmod 1771 /user/hive/warehouse

    In the Components section of the Status tab of the Hive service in the EMR console, you can check whether the HiveServer component starts as expected.
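
    You can also run a quick smoke test against HiveServer2. The following sketch assumes that HiveServer2 listens on its default port 10000 on the current node; adjust the connection string for your cluster:

    # List databases through HiveServer2 (default port 10000 assumed)
    beeline -u jdbc:hive2://localhost:10000 -e "show databases;"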

Spark

Cause

You do not have access permissions on the bucket that is configured as the root storage directory of the cluster. As a result, when the Spark service is started in the cluster, the Spark History directory cannot be properly created.

Solution

  1. Log on to the master node of the cluster. For more information, see Log on to a cluster.

  2. Run the following command to create the Spark History directory:

    # Create the Spark History directory
    hadoop fs -mkdir /spark-history

    In the Components section of the Status tab of the Spark service in the EMR console, you can check whether the SparkHistoryServer and SparkThriftServer components start as expected.
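
    To confirm that the directory was created, you can run a quick test:

    # Exit code 0 indicates that the directory exists
    hadoop fs -test -d /spark-history && echo "/spark-history exists"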

Tez

Cause

You do not have access permissions on the bucket that is configured as the root storage directory of the cluster. As a result, when the Tez service is started in the cluster, the library files that Tez requires cannot be uploaded to the corresponding storage directories.

Solution

  1. Log on to the master node of the cluster. For more information, see Log on to a cluster.

  2. Run the following commands to repackage the Tez libraries and upload them to the cluster storage:

    # Resolve the Tez installation directory and version
    tez_dir=$(readlink $TEZ_HOME)
    tez_version=$(basename $tez_dir)

    # Package the Tez JAR files into a tarball
    cd /tmp
    mkdir -p $tez_version/lib
    cp $TEZ_HOME/*.jar $tez_version
    cp $TEZ_HOME/lib/*.jar $tez_version/lib
    tar czf $tez_version.tar.gz $tez_version

    # Upload the tarball to the cluster storage
    hadoop fs -mkdir -p /apps/$tez_version
    hadoop fs -rm -f /apps/$tez_version/$tez_version.tar.gz
    hadoop fs -put $tez_version.tar.gz /apps/$tez_version/

    # Clean up the local temporary files
    rm -fr $tez_version*

    In the Health Check Items section of the Status tab of the Tez service in the EMR console, you can check whether the status of the tez_env_status check item is normal.
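
    You can also confirm that the tarball was uploaded. The following check assumes that it runs in the same shell as the previous step, so that the tez_version variable is still set:

    # The uploaded tarball should appear in the listing
    hadoop fs -ls /apps/$tez_version/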

Flink

Cause

You do not have access permissions on the bucket that is configured as the root storage directory of the cluster. As a result, when the Flink service is started in the cluster, the Flink History directory cannot be properly created and Flink jobs that are started based on the default settings may not be able to write checkpoints or savepoints to external storage systems.

Solution

  1. Log on to the master node of the cluster. For more information, see Log on to a cluster.

  2. Run the following commands to create directories required by Flink and grant users the required permissions on the directories:

    # Create the checkpoint, job, and savepoint directories
    hdfs dfs -mkdir -p /flink/flink-checkpoints /flink/flink-jobs /flink/flink-savepoints
    # Make the directories writable by the users that run Flink jobs
    hdfs dfs -chmod -R 777 /flink
  3. Restart the Flink service. For more information, see Restart a service.

    In the Components section of the Status tab of the Flink service in the EMR console, you can check whether the FlinkHistoryServer component starts as expected. You can also start a sample Flink job to check whether checkpoints of the Flink job are written to external storage systems.
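
    A minimal sketch of such a test, assuming a standard Flink example JAR under $FLINK_HOME (the JAR path varies by Flink version) and that checkpointing is enabled for the job:

    # Submit a built-in example job in detached mode (the JAR path is
    # an assumption; adjust it for your Flink version)
    flink run -d $FLINK_HOME/examples/streaming/TopSpeedWindowing.jar
    # If checkpointing is enabled, checkpoint data should appear here
    hdfs dfs -ls /flink/flink-checkpoints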

HBase

Cause

You do not have access permissions on the bucket that is configured as the root storage directory of the cluster. As a result, when the HBase service is started in the cluster, the HBase data storage directory cannot be properly created.

Solution

  1. Log on to the master node of the cluster. For more information, see Log on to a cluster.

  2. Run the following commands to create a data storage directory and grant users the required permissions on the directory:

    # Create the HBase root directory and grant the hbase user
    # the required permissions
    hadoop fs -mkdir -p /hbase
    hadoop fs -chown hbase:hadoop /hbase
    hadoop fs -chmod 755 /hbase
  3. Restart the HBase service. For more information, see Restart a service.

    In the Components section of the Status tab of the HBase service in the EMR console, you can check whether the HBase service starts as expected.
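
    You can also check the cluster status from the command line:

    # The HBase master and region servers should be reported as online
    echo "status" | hbase shell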