This topic describes how to build a Hadoop pseudo-distributed environment on an Elastic
Compute Service (ECS) instance that runs a Linux operating system.
Prerequisites
- An ECS Linux instance is created. For more information, see Create an instance by using the wizard.
In this topic, an ECS instance that has the following configurations is used:
- Instance type: ecs.g6.large
- Operating system: CentOS 7.7 64-bit public image
- Network type: Virtual Private Cloud (VPC)
- IP address: public IP address
Note The commands used may vary based on the actual operating system and software versions
of your instance. If your software versions or operating system differs from the preceding
versions, adjust the commands accordingly.
- The ECS instance is added to security groups that contain rules to allow traffic on
ports 8088 and 50070 used by Hadoop. For more information, see Add a security group rule.
Background information
Hadoop is an open source distributed framework from Apache, written in Java, that efficiently
stores and processes large datasets across clusters of servers. It allows users to develop
distributed programs without needing to understand the underlying layers. Hadoop Distributed
File System (HDFS) and MapReduce are vital components of Hadoop.
- HDFS is a distributed file system that allows distributed storage and retrieval of
application data.
- MapReduce is a distributed computing framework that distributes computing jobs across
servers in a Hadoop cluster. Computing jobs are split into map and reduce tasks, which are
scheduled for distributed processing. In Hadoop 1.x, the JobTracker scheduled these tasks;
in Hadoop 2.x and later, YARN handles this scheduling.
For more information, visit the Apache Hadoop website.
Step 1: Install Java Development Kit (JDK)
- Connect to the ECS instance.
- Run the following command to download the JDK 1.8 installation package:
wget https://download.java.net/openjdk/jdk8u41/ri/openjdk-8u41-b04-linux-x64-14_jan_2020.tar.gz
- Run the following command to decompress the downloaded installation package:
tar -zxvf openjdk-8u41-b04-linux-x64-14_jan_2020.tar.gz
- Run the following command to move and rename the folder to which the JDK 1.8 installation
files are extracted.
In this example, the folder is renamed java8. You can specify a different name for the folder based on your business requirements.
mv java-se-8u41-ri/ /usr/java8
- Run the following commands to configure Java environment variables.
If the folder to which the JDK 1.8 installation files are extracted is not named java8, replace java8 in the following commands with the actual folder name:
echo 'export JAVA_HOME=/usr/java8' >> /etc/profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
source /etc/profile
- Run the following command to check whether JDK is installed:
java -version
A command output similar to the following one indicates that JDK 1.8 is installed:
openjdk version "1.8.0_41"
OpenJDK Runtime Environment (build 1.8.0_41-b04)
OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode)
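You can optionally verify that the JAVA_HOME variable points to the new installation, because java -version may resolve a preinstalled runtime instead. Run the following commands:
echo $JAVA_HOME
$JAVA_HOME/bin/java -version
The first command should print /usr/java8, and the second command should report the same version as the preceding output.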
Step 2: Install Hadoop
- Run the following command to download the Hadoop installation package:
wget https://mirrors.bfsu.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
- Run the following commands to decompress the Hadoop installation package and move it to the /opt/hadoop path:
tar -zxvf hadoop-2.10.1.tar.gz -C /opt/
mv /opt/hadoop-2.10.1 /opt/hadoop
- Run the following commands to configure Hadoop environment variables:
echo 'export HADOOP_HOME=/opt/hadoop/' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/sbin' >> /etc/profile
source /etc/profile
- Run the following commands to modify the yarn-env.sh and hadoop-env.sh configuration files:
echo "export JAVA_HOME=/usr/java8" >> /opt/hadoop/etc/hadoop/yarn-env.sh
echo "export JAVA_HOME=/usr/java8" >> /opt/hadoop/etc/hadoop/hadoop-env.sh
- Run the following command to check whether Hadoop is installed:
hadoop version
A command output similar to the following one indicates that Hadoop is installed:
Hadoop 2.10.1
Subversion https://github.com/apache/hadoop -r 1827467c9a56f133025f28557bfc2c562d78e816
Compiled by centos on 2020-09-14T13:17Z
Compiled with protoc 2.5.0
From source with checksum 3114edef868f1f3824e7d0f68be03650
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-2.10.1.jar
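If the hadoop command is not found, the environment variables may not be loaded in your current shell. You can run source /etc/profile again, or run the following commands to check whether the paths are configured:
echo $HADOOP_HOME
echo $PATH | tr ':' '\n' | grep hadoop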
Step 3: Configure Hadoop
- Modify the core-site.xml configuration file of Hadoop.
- Run the following command to open the core-site.xml file:
vim /opt/hadoop/etc/hadoop/core-site.xml
- Press the I key to enter the edit mode.
- In the <configuration> section, add the following content:
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop/tmp</value>
    <description>location to store temporary files</description>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
- Press the Esc key to exit the edit mode, and then enter :wq to save and close the file.
- Modify the hdfs-site.xml configuration file of Hadoop.
- Run the following command to open the hdfs-site.xml file:
vim /opt/hadoop/etc/hadoop/hdfs-site.xml
- Press the I key to enter the edit mode.
- In the <configuration> section, add the following content:
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/tmp/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/tmp/dfs/data</value>
</property>
- Press the Esc key to exit the edit mode, and then enter :wq to save and close the file.
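You can optionally verify that Hadoop picks up the new settings without opening the files again. The following commands query individual configuration keys and should return hdfs://localhost:9000 and 1, respectively:
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.replication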
Step 4: Configure password-free SSH logon
- Run the following command to create a public key and a private key:
ssh-keygen -t rsa
A command output similar to the following one indicates that the public and private
keys are created:
[root@iZbp1chrrv37a2kts7sydsZ ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:gjWO5mgARst+O5VUaTnGs+LxVhfmCJnQwKfEBTro2oQ root@iZbp1chrrv37a2kts7s****
The key's randomart image is:
+---[RSA 2048]----+
| . o+Bo= |
|o o .+.# o |
|.= o..B = + . |
|=. oO.o o o |
|Eo..=o* S . |
|.+.+o. + |
|. +o. . |
| . . |
| |
+----[SHA256]-----+
- Run the following commands to add the public key to the authorized_keys file:
cd .ssh
cat id_rsa.pub >> authorized_keys
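Hadoop starts its daemons over SSH, so the logon to localhost must work without a password. You can optionally run the following command to test the connection; it should not prompt for a password. If it does, check that the permissions on the .ssh directory and the authorized_keys file are restrictive enough, for example, 700 and 600:
ssh localhost exit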
Step 5: Start Hadoop
- Run the following command to initialize the NameNode:
hadoop namenode -format
- Run the following commands in sequence to start Hadoop:
start-dfs.sh
At the prompts that appear, enter yes.
start-yarn.sh
A command output similar to the following one is returned:
[root@iZbp1chrrv37a2kts7s**** .ssh]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/yarn-root-resourcemanager-iZbp1chrrv37a2kts7sydsZ.out
localhost: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-iZbp1chrrv37a2kts7sydsZ.out
- Run the following command to view the processes that are started:
jps
The following processes are started:
[root@iZbp1chrrv37a2kts7s**** .ssh]# jps
11620 DataNode
11493 NameNode
11782 SecondaryNameNode
11942 ResourceManager
12344 Jps
12047 NodeManager
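You can optionally run a quick smoke test of HDFS by creating a directory and listing the root of the file system. The /user/root path in this example is arbitrary and used only for the test:
hadoop fs -mkdir -p /user/root
hadoop fs -ls /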
- Use a browser to access
http://<Public IP address of the ECS instance>:8088
and http://<Public IP address of the ECS instance>:50070.
If the Hadoop pseudo-distributed environment is built, the YARN web UI and the NameNode web UI are displayed.
Notice Make sure that the security group rules of the ECS instance allow inbound traffic on ports 8088 and 50070 used by Hadoop. Otherwise, the Hadoop pseudo-distributed environment cannot be accessed. For more information, see Add a security group rule.
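To verify that MapReduce jobs run end to end, you can submit one of the example jobs that ships with the Hadoop distribution. The following command assumes that the example JAR is in the default location of the Hadoop 2.10.1 package, and estimates the value of pi with 2 map tasks and 10 samples per map:
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar pi 2 10
If the job completes, the last line of the command output contains the estimated value of pi.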