By Sajid Qureshi, Alibaba Community Blog author.
In this tutorial, you will learn how to install Hadoop and its components on a multinode cluster using Apache Ambari.
Apache Ambari is an open-source tool that allows you to deploy, manage, and monitor a Hadoop cluster. You can also use Ambari to integrate Hadoop with your existing infrastructure, which is especially useful for enterprise-level infrastructure.
Moreover, Ambari's dashboard lets you check and review cluster health, for example through heatmaps, and view MapReduce jobs. Ambari has a simple, user-friendly interface where you can complete a range of management, configuration, and monitoring tasks.
For this tutorial, you will need the following items:
Note: For this tutorial, we will be using four Alibaba Cloud ECS instances, named node01, node02, node03, and node04, to deploy a Hadoop cluster. The Ambari server will be installed on node01, and an Ambari agent will run on every node in the Hadoop cluster.
Before you do anything else, it's recommended that you refresh the package index and upgrade all installed packages before installing any new package on the system. To do that, run the following commands:
sudo apt-get update
sudo apt-get upgrade
Apache Ambari doesn't require any special hardware or software configuration, so you can go straight to installing it on node01. First, add the Ambari repository to your system using the following command:
sudo wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu18/2.x/updates/126.96.36.199/ambari.list
Next, import the repository's signing key using the following command:
sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
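The preparation steps on node01 can be collected into one small script. This is a minimal sketch, not the author's script: the function names, the DRY_RUN switch, and the release argument are hypothetical, and it assumes you pass the Ambari release that matches the HDP stack you plan to install, since that release number forms part of the repository path.

```shell
#!/bin/sh
# Sketch of the node01 preparation steps from this section.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = Ambari release used in the repository path (assumption: choose the
# release that matches the HDP stack you plan to install).
prepare_ambari_repo() {
  local version="${1:?usage: prepare_ambari_repo <ambari-release>}"
  local url="http://public-repo-1.hortonworks.com/ambari/ubuntu18/2.x/updates/${version}/ambari.list"

  run sudo apt-get update        # refresh the package index
  run sudo apt-get upgrade -y    # upgrade installed packages first
  run sudo wget -O /etc/apt/sources.list.d/ambari.list "$url"
  run sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
  run sudo apt-get update        # make apt aware of the new repository
}
```

Running `DRY_RUN=1 prepare_ambari_repo <release>` first lets you review the exact commands before executing them for real.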
After this, refresh the package index so that apt picks up the new repository, and then install the Apache Ambari server:
sudo apt-get update
sudo apt-get install ambari-server
Note that the installation may take some time. Once it is complete, you will need to configure and set up the Ambari server.
To do so, run the following command:
sudo ambari-server setup
You'll be prompted to customize the user account for the Ambari server daemon; you can choose whichever option suits your environment. If the Java Development Kit (JDK) is not already installed on your system, you'll also be asked to choose one:
Checking JDK...
[1] Oracle JDK 1.8 + Java Cryptography Extension (JCE) Policy Files 8
[2] Custom JDK
==============================================================================
Enter choice (1): 1
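If you would rather not answer the setup prompts by hand, the setup can also be run non-interactively. This is a sketch under the assumption that your Ambari release supports the `-s` (silent, accept defaults) and `-j` (existing Java home) options of `ambari-server setup`; check `ambari-server setup --help` on your version. The helper name and DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: runs `ambari-server setup` without prompts.
# Set DRY_RUN=1 to print the command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# -s accepts the default answer to every setup prompt.
# -j <path> points setup at an already-installed JDK instead of the
# default Oracle JDK 1.8 download.
setup_ambari_silently() {
  local java_home="${1:-}"
  if [ -n "$java_home" ]; then
    run sudo ambari-server setup -s -j "$java_home"
  else
    run sudo ambari-server setup -s
  fi
}
```

Silent mode takes every default, so use the interactive setup instead if you need a non-default daemon user or database.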
By default, Apache Ambari uses a PostgreSQL database to store its configuration data. Now that the configuration is complete, start the Ambari server with the following command:
sudo ambari-server start
After running this command, you should see the following output:
Server started listening on 8080
DB configs consistency check: no errors and warnings were found.
Ambari Server 'start' completed successfully.
You can then check the status of the Ambari server using the following command:
sudo ambari-server status
The output should include the line:
Ambari Server running
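Because the server can take a little while to come up after `ambari-server start`, a script that continues with later steps may want to wait for the status line shown above. A minimal sketch, assuming the status output contains the text "Ambari Server running" once the server is up; the function names, attempt count, and 5-second interval are hypothetical choices.

```shell
#!/bin/sh
# Succeeds if the given `ambari-server status` output reports a running server.
ambari_is_running() {
  printf '%s\n' "$1" | grep -q 'Ambari Server running'
}

# Polls `ambari-server status` until the server reports it is running.
# $1 = maximum number of attempts (default 30), 5 seconds apart.
wait_for_ambari() {
  local attempts="${1:-30}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if ambari_is_running "$(sudo ambari-server status 2>&1)"; then
      echo "Ambari server is up"
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  echo "Ambari server did not report 'running' in time" >&2
  return 1
}
```

Note that "Ambari Server not running" does not match the pattern, so a stopped server is correctly reported as not running.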
With the Ambari server running, you can access its web interface on port 8080. Open a web browser and visit
http://YourDomain.com:8080, replacing YourDomain.com with your server's actual domain name or IP address. On Alibaba Cloud ECS, make sure the instance's security group allows inbound traffic on port 8080.
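Ambari also serves a REST API on the same port, under /api/v1, which is handy for scripted checks alongside the web interface. A minimal sketch, assuming curl is installed and the default admin/admin credentials are still in place; `AMBARI_CREDENTIALS` is a hypothetical override variable, and the helper names are illustrative only.

```shell
#!/bin/sh
# Builds an Ambari REST API URL: $1 = Ambari host, $2 = path under /api/v1.
ambari_api_url() {
  printf 'http://%s:8080/api/v1/%s\n' "$1" "$2"
}

# GETs an API resource using HTTP basic auth. Credentials default to
# admin:admin; AMBARI_CREDENTIALS (user:password) is a hypothetical override.
# Ambari requires X-Requested-By on modifying requests; it is harmless on GETs.
ambari_get() {
  curl --silent --show-error --fail \
    -u "${AMBARI_CREDENTIALS:-admin:admin}" \
    -H 'X-Requested-By: ambari' \
    "$(ambari_api_url "$1" "$2")"
}
```

For example, `ambari_get YourDomain.com clusters` returns the list of clusters, which stays empty until you finish the install wizard.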
Next, log in to Ambari with the default credentials, using admin as both the username and the password. You should then see the Apache Ambari homepage, which should look something like this:
Click the Launch Install Wizard button to proceed. You'll be prompted to enter a name for your Hadoop cluster; after doing so, click Next. Then select the HDP 3.1 stack and click Next again.
On the next page, you will be prompted to enter the host details. Enter the FQDN (fully qualified domain name) of each host, then provide the SSH private key used when creating these instances and the username used to log in to these nodes. Finally, click Register and Confirm.
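Host registration fails if a name you enter does not match the host's own FQDN, so it can help to ask each node for its name first. This is a sketch, not part of the wizard: `SSH_KEY` and `SSH_USER` are hypothetical variables standing for the same private key and login user you give the wizard, and the FQDN check is a simple syntactic test (dot-separated labels of letters, digits, and hyphens).

```shell
#!/bin/sh
# Succeeds if $1 looks like an FQDN: at least two dot-separated labels,
# each made of letters, digits and inner hyphens.
is_fqdn() {
  printf '%s\n' "$1" |
    grep -Eq '^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?)+$'
}

# Prints the FQDN each host reports for itself (hostname -f) over SSH.
# SSH_KEY and SSH_USER are hypothetical placeholders; SSH_USER defaults to root.
print_host_fqdns() {
  if [ -z "${SSH_KEY:-}" ]; then
    echo "set SSH_KEY to the instances' private key" >&2
    return 1
  fi
  for host in "$@"; do
    fqdn=$(ssh -i "$SSH_KEY" -o BatchMode=yes "${SSH_USER:-root}@$host" hostname -f) || {
      echo "cannot reach $host" >&2
      continue
    }
    if is_fqdn "$fqdn"; then
      echo "$fqdn"
    else
      echo "$host: '$fqdn' is not an FQDN" >&2
    fi
  done
}
```

The names it prints are the ones to paste into the host list, for example `SSH_KEY=~/.ssh/id_rsa print_host_fqdns node01 node02 node03 node04`.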
The next page shows the host registration status. Once all host entries are green, continue to the next page, where you will choose the services to install on your Hadoop cluster from the stack you selected earlier. Based on your selections, Ambari lists any dependent services that also need to be included in the cluster.
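Registration can also be confirmed outside the browser through the REST API's /api/v1/hosts resource, which lists each registered host under "host_name". A minimal sketch using grep and sed instead of a JSON parser, so it assumes each "host_name" key and value appear together as in Ambari's usual output; the function names are hypothetical.

```shell
#!/bin/sh
# Reads an Ambari /api/v1/hosts JSON response on stdin and prints
# one registered host name per line.
registered_hosts() {
  grep -o '"host_name" *: *"[^"]*"' | sed 's/.*: *"\([^"]*\)"/\1/'
}

# $1 = newline-separated registered hosts; remaining args = expected FQDNs.
# Prints every expected host that has not registered yet.
missing_hosts() {
  local registered="$1"
  shift
  for h in "$@"; do
    printf '%s\n' "$registered" | grep -qxF "$h" || echo "$h"
  done
}
```

For example: `missing_hosts "$(curl -s -u admin:admin http://YourDomain.com:8080/api/v1/hosts | registered_hosts)" node01.example.com node02.example.com`, where the node names are placeholders for your own FQDNs. Empty output means every expected host has registered.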
On the following page, you will need to assign master services to nodes. Select the nodes where master services such as the NameNode, Secondary NameNode, and ZooKeeper will run. Make sure that primary and secondary services, such as the NameNode and the Secondary NameNode, are not placed on the same machine, whereas ZooKeeper will run on all the machines.
Next, choose where the DataNodes and NodeManagers will run. In this tutorial, we run a DataNode and a NodeManager on all the machines, but you can choose a different layout to suit your needs.
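Before clicking through the assignment pages, it can help to write the intended layout down and check it against the rule above. This is a planning sketch only; it does not talk to Ambari. It uses Ambari's component names (NAMENODE, SECONDARY_NAMENODE, ZOOKEEPER_SERVER, DATANODE, NODEMANAGER), while the function names and example hosts are hypothetical.

```shell
#!/bin/sh
# Fails if the NameNode and Secondary NameNode would share a machine.
check_namenode_placement() {
  if [ "$1" = "$2" ]; then
    echo "NameNode and Secondary NameNode must be on different hosts" >&2
    return 1
  fi
}

# Prints one "COMPONENT host" line per placement for the layout used in
# this tutorial: $1 = NameNode host, $2 = Secondary NameNode host,
# remaining args = all hosts (ZooKeeper, DataNode, NodeManager on each).
print_layout() {
  local nn="$1" snn="$2"
  shift 2
  check_namenode_placement "$nn" "$snn" || return 1
  echo "NAMENODE $nn"
  echo "SECONDARY_NAMENODE $snn"
  for h in "$@"; do
    for c in ZOOKEEPER_SERVER DATANODE NODEMANAGER; do
      echo "$c $h"
    done
  done
}
```

For the four nodes in this tutorial, `print_layout node01 node02 node01 node02 node03 node04` produces a checklist to compare against the wizard's assignment pages.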
On the next page, you can customize the services for your cluster and change any configuration value. This page also flags configuration errors and required values that are still missing; you will need to resolve them before proceeding.
After this, review all the services and the configuration setup. If everything looks good, proceed and start the installation. Note that this may take some time.
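Once the installation finishes, service states can also be checked through the REST API with a request such as `/api/v1/clusters/<cluster-name>/services?fields=ServiceInfo/state`. A minimal grep/awk sketch, not a JSON parser: it assumes each service entry lists "service_name" before "state", as Ambari's responses normally do, and the function name and cluster name in the usage line are hypothetical.

```shell
#!/bin/sh
# Reads an Ambari services response on stdin and prints
# "SERVICE STATE" for each service, one per line.
service_states() {
  grep -Eo '"(service_name|state)" *: *"[^"]*"' |
    sed 's/^"\([a-z_]*\)" *: *"\([^"]*\)"$/\1 \2/' |
    awk '$1 == "service_name" { name = $2 } $1 == "state" { print name, $2 }'
}
```

For example, `curl -s -u admin:admin 'http://YourDomain.com:8080/api/v1/clusters/MyCluster/services?fields=ServiceInfo/state' | service_states | awk '$2 != "STARTED"'` lists services that are not started. Client-only services normally report INSTALLED rather than STARTED, so review the list rather than treating every entry as a failure.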
Once the installation is complete, click Next to proceed to the summary page, then click Complete to finish the process. Finally, you will see the cluster homepage with charts like this:
In this tutorial, you've learned how to install Hadoop and its components on a multinode cluster using Apache Ambari. Hopefully, you can now deploy, manage, and monitor your Hadoop cluster with Ambari.