
How to Set Up a Single-Node Hadoop File System Cluster on Ubuntu

In this article, we will show you how to set up the Hadoop Distributed File System (HDFS) on a single-node cluster running Ubuntu.

Hadoop is a free, open-source, scalable, and fault-tolerant framework written in Java for storing and processing large data sets across clusters of machines. It has three main components: HDFS for storage, MapReduce for data processing, and YARN for resource management and job scheduling.

Since Hadoop is written in Java, you first need to install Java on your server. Run the following command as root:

apt-get install default-jdk -y
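To confirm the installation and find the directory that JAVA_HOME should point to later, you can resolve the `java` binary on the PATH. This is a minimal sketch: the helper name `detect_java_home` is hypothetical, and it assumes the fully resolved binary sits at `<jdk>/bin/java`, which is how Ubuntu's JDK packages are usually laid out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JDK directory that JAVA_HOME should point to.
# Assumes the fully resolved java binary lives at <jdk>/bin/java.
detect_java_home() {
  local java_bin
  java_bin="$(command -v java)" || { echo "java not found on PATH" >&2; return 1; }
  # Follow the symlink chain (e.g. /usr/bin/java -> /etc/alternatives/java -> ...)
  java_bin="$(readlink -f "$java_bin")"
  # Drop the trailing bin/java to get the JDK root
  dirname "$(dirname "$java_bin")"
}
```

Run `java -version` to check the installed version, then `detect_java_home`; on a default Ubuntu install it typically prints a directory under /usr/lib/jvm.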

Next, create a dedicated user account for Hadoop and set up SSH key-based authentication for it. The Hadoop start-up scripts use SSH to launch the daemons, even on a single node.
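One way to do this on Ubuntu, run as root, is sketched below. It assumes the user is named `hadoop` (matching the chown command later), creates it with `adduser` non-interactively, and generates a passphrase-less RSA key authorized for logins to the same machine. The `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences that print each command instead of executing it.

```shell
#!/usr/bin/env bash
# Sketch: create the hadoop user and passwordless SSH to localhost (run as root).
# DRY_RUN=1 prints each command instead of executing it.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then echo "+ $*"; else "$@"; fi
}

setup_hadoop_user() {
  local user="${1:-hadoop}"
  # No password prompt; set one afterwards with passwd if you need it
  run adduser --disabled-password --gecos "" "$user" || return 1
  # RSA key pair with an empty passphrase, stored in the new user's ~/.ssh
  run su - "$user" -c "mkdir -p -m 700 ~/.ssh && ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa" || return 1
  # Authorize that key for SSH logins to this same machine
  run su - "$user" -c "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"
}
```

After `setup_hadoop_user hadoop`, switch with `su - hadoop` and run `ssh localhost` once to accept the host key; it should log you in without a password.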

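Next, download a Hadoop release from the official Apache Hadoop website and extract the archive. This guide uses Hadoop 3.1.0, so adjust the directory name in the following commands if you download a different version.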
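A minimal sketch of the download-and-extract step, assuming you fetch the 3.1.0 release used throughout this guide from the Apache release archive (archive.apache.org keeps older releases). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: download and unpack a Hadoop release tarball from the Apache archive.
# The default version matches the hadoop-3.1.0 directory used in this guide.
hadoop_tarball_url() {
  local v="$1"
  echo "https://archive.apache.org/dist/hadoop/common/hadoop-${v}/hadoop-${v}.tar.gz"
}

download_hadoop() {
  local v="${1:-3.1.0}"
  # wget saves the file as hadoop-<v>.tar.gz in the current directory
  wget "$(hadoop_tarball_url "$v")" || return 1
  # Unpacks into ./hadoop-<v>/
  tar -xzf "hadoop-${v}.tar.gz"
}
```

`download_hadoop 3.1.0` leaves a hadoop-3.1.0 directory in the current directory, ready to be moved to /opt.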

Next, move the extracted directory to /opt/hadoop:

mv hadoop-3.1.0 /opt/hadoop

Next, give the hadoop user ownership of the /opt/hadoop directory:

chown -R hadoop:hadoop /opt/hadoop/

Next, set the Hadoop environment variables and load them into your shell. Then log in as the hadoop user and create the directories that HDFS will use to store NameNode and DataNode data:

mkdir -p /opt/hadoop/hadoopdata/hdfs/namenode
mkdir -p /opt/hadoop/hadoopdata/hdfs/datanode
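The environment variables are not spelled out above. The sketch below appends a commonly used set for a Hadoop 3.x installation in /opt/hadoop to a shell start-up file; the variable list is a convention rather than something this guide specifies, and `write_hadoop_env` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: append a common set of Hadoop variables to a shell start-up file.
# Assumes Hadoop lives in /opt/hadoop; the variable list is a convention.
write_hadoop_env() {
  local rc_file="${1:-$HOME/.bashrc}" hadoop_home="${2:-/opt/hadoop}"
  # Unquoted heredoc: ${hadoop_home} expands now, \$VAR is written literally
  cat >> "$rc_file" <<EOF
export HADOOP_HOME=${hadoop_home}
export HADOOP_MAPRED_HOME=\$HADOOP_HOME
export HADOOP_COMMON_HOME=\$HADOOP_HOME
export HADOOP_HDFS_HOME=\$HADOOP_HOME
export HADOOP_YARN_HOME=\$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native
export PATH=\$PATH:\$HADOOP_HOME/sbin:\$HADOOP_HOME/bin
EOF
}
```

As the hadoop user, run `write_hadoop_env ~/.bashrc /opt/hadoop && source ~/.bashrc`. Hadoop's scripts also need JAVA_HOME; a common place to set it is /opt/hadoop/etc/hadoop/hadoop-env.sh, as `export JAVA_HOME=<your JDK directory>`.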

Next, configure Hadoop, starting with the core-site.xml file. This file holds core settings shared by all Hadoop components, such as the URI (host and port) of the default file system and the size of the read/write buffers.

nano /opt/hadoop/etc/hadoop/core-site.xml

Make the following changes:
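The property values for this file are not shown. A common single-node setting is the default file system URI, fs.defaultFS. The sketch below writes a minimal core-site.xml with it, assuming the NameNode should listen on hdfs://localhost:9000 (a widely used example value, not one stated in this guide). It replaces the whole file, so back up the original first if you have edited it. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal core-site.xml for a single-node cluster.
# Assumption: NameNode RPC endpoint hdfs://localhost:9000 (example value).
write_core_site() {
  local conf_dir="${1:-/opt/hadoop/etc/hadoop}" fs_uri="${2:-hdfs://localhost:9000}"
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <!-- Default file system URI: clients and daemons reach the NameNode here -->
    <name>fs.defaultFS</name>
    <value>${fs_uri}</value>
  </property>
</configuration>
EOF
}
```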


Save the file, then open the hdfs-site.xml file. This file contains the block replication factor and the local file-system paths of the NameNode and DataNode directories.

nano /opt/hadoop/etc/hadoop/hdfs-site.xml

Make the following changes:
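A sketch of hdfs-site.xml that matches the directories created earlier: a replication factor of 1, since a single-node cluster has only one DataNode, and NameNode and DataNode directories under /opt/hadoop/hadoopdata/hdfs. The property names are the standard Hadoop 3.x ones; the helper name is hypothetical, and the function overwrites the existing file.

```shell
#!/usr/bin/env bash
# Sketch: write hdfs-site.xml for one DataNode and local storage directories.
write_hdfs_site() {
  local conf_dir="${1:-/opt/hadoop/etc/hadoop}" data_root="${2:-/opt/hadoop/hadoopdata/hdfs}"
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <!-- One copy of each block: there is only one DataNode -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${data_root}/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${data_root}/datanode</value>
  </property>
</configuration>
EOF
}
```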




Save the file, then open the mapred-site.xml file.

nano /opt/hadoop/etc/hadoop/mapred-site.xml

Make the following changes:
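For mapred-site.xml, the usual single-node change is to run MapReduce jobs on YARN by setting mapreduce.framework.name. This sketch assumes that is the only property you need; the helper name is hypothetical, and the function overwrites the existing file.

```shell
#!/usr/bin/env bash
# Sketch: write mapred-site.xml so MapReduce jobs are submitted to YARN.
write_mapred_site() {
  local conf_dir="${1:-/opt/hadoop/etc/hadoop}"
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <!-- Run MapReduce on YARN instead of the local job runner -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}
```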


Save the file, then open the yarn-site.xml file:

nano /opt/hadoop/etc/hadoop/yarn-site.xml

Make the following changes:
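For yarn-site.xml, a typical single-node setting enables the MapReduce shuffle auxiliary service in the NodeManager. Again a hypothetical helper that overwrites the file, assuming that one property is enough for your setup.

```shell
#!/usr/bin/env bash
# Sketch: write yarn-site.xml enabling the MapReduce shuffle service.
write_yarn_site() {
  local conf_dir="${1:-/opt/hadoop/etc/hadoop}"
  cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- NodeManager auxiliary service that serves map output to reducers -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}
```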


Save and close the file when you are finished.

Hadoop is now installed and configured. It's time to initialize the HDFS file system by formatting the NameNode:

hdfs namenode -format

Next, change to the /opt/hadoop/sbin directory and start the Hadoop cluster:

cd /opt/hadoop/sbin/
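Hadoop's sbin directory provides start-dfs.sh, which starts the HDFS daemons, and start-yarn.sh, which starts the YARN daemons. A minimal sketch, assuming you start them in that order as the hadoop user; `run`, `DRY_RUN`, and `start_hadoop` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: start HDFS, then YARN (run as the hadoop user).
# DRY_RUN=1 prints the commands instead of executing them.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then echo "+ $*"; else "$@"; fi
}

start_hadoop() {
  local sbin="${1:-/opt/hadoop/sbin}"
  # NameNode, DataNode and SecondaryNameNode
  run "$sbin/start-dfs.sh" || return 1
  # ResourceManager and NodeManager
  run "$sbin/start-yarn.sh"
}
```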


Next, check the status of the services using the following command:
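A common way to check which Hadoop daemons are running is the JDK's `jps` tool, which lists Java processes as `<pid> <MainClass>`. The hypothetical helper below reads that output and reports any of the five single-node daemons that are missing.

```shell
#!/usr/bin/env bash
# Sketch: read jps-style lines ("<pid> <ClassName>") on stdin and report
# which single-node daemons are absent. Returns 1 if any are missing.
check_daemons() {
  local running missing=0 d
  running="$(awk '{print $2}')"
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x: whole-line match, so "NameNode" does not match "SecondaryNameNode"
    if ! grep -qx "$d" <<< "$running"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | check_daemons && echo "all daemons running"`.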


Now that Hadoop is installed and running, you can access its services through a web browser. By default, the Hadoop NameNode web interface listens on port 9870; open it by visiting that port on your server's address in your web browser.
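From the server itself, you can also probe the web interfaces with curl. This sketch assumes the default ports: 9870 for the NameNode, as stated above, and 8088 for the YARN ResourceManager. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list and probe the default Hadoop 3.x web UIs on a host.
web_ui_urls() {
  local host="${1:-localhost}"
  printf 'http://%s:9870/\n' "$host"   # NameNode
  printf 'http://%s:8088/\n' "$host"   # YARN ResourceManager
}

probe_web_uis() {
  local url code
  while read -r url; do
    # -s silences progress, -o discards the body, -w prints only the status code
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
    echo "$url -> HTTP $code"
  done < <(web_ui_urls "${1:-localhost}")
}
```

`probe_web_uis` prints one status line per UI; a code of 000 means curl could not connect.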

To test the Hadoop file system cluster, create a directory in HDFS and copy a file from the local file system into HDFS storage. For details, see How to Setup Hadoop Cluster Ubuntu 16.04.
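A minimal sketch of that test, assuming the hdfs command is on the PATH and using a hypothetical /test directory in HDFS. The `run` wrapper and `DRY_RUN` variable are hypothetical and print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: create an HDFS directory, copy a local file into it, and list it.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then echo "+ $*"; else "$@"; fi
}

hdfs_smoke_test() {
  [[ -n "${1:-}" ]] || { echo "usage: hdfs_smoke_test FILE [HDFS_DIR]" >&2; return 2; }
  local local_file="$1" hdfs_dir="${2:-/test}"
  run hdfs dfs -mkdir -p "$hdfs_dir" || return 1
  # -f overwrites the file if a previous run already copied it
  run hdfs dfs -put -f "$local_file" "$hdfs_dir/" || return 1
  run hdfs dfs -ls "$hdfs_dir"
}
```

For example, `hdfs_smoke_test /etc/hosts` should end with a listing that shows /test/hosts.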


Alibaba Clouder

