How to Access Hadoop Components for Big Data

In this tutorial, you will learn how to access Hadoop components more easily with Hadoop User Experience (Hue).

When working with big data, it is important to know how to access Hadoop components easily. Easy access makes day-to-day work simpler and more productive.

Hadoop User Experience (Hue) is an open-source web interface for analyzing data with Hadoop ecosystem applications. Hue provides interfaces for interacting with HDFS, MapReduce, Hive, and even Impala queries. You can install it on any server, and users can then open Hue directly in a browser, which can make Hadoop development workflows more productive. In other words, Hue provides "a view on top of Hadoop": it makes the cluster and its components much easier to access and removes the need to use the command line. For people who are new to the command line and Linux, even basic operations can be tough from a command-line interface. Examples include viewing and editing files, or copying and moving files. These are also the operations users carry out most often day to day.

Alibaba Cloud E-MapReduce (EMR) provides a set of services that includes Hue, which can be accessed through Apache Knox. Hue uses the CherryPy web server, and its default port is 8888. To access Hue from the E-MapReduce console, you therefore need to open this port by adding a security group rule, as described below.

To add the rule, go to Security Group ID ---> Cluster Overview ---> Next Page ---> Add Security Group. Specify the port range "8888/8888", fill in the remaining details, and click OK. You can then open Hue from Quick Links in the left pane of the EMR cluster you created.
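Once the rule is in place, you can check whether port 8888 is reachable from your machine with a plain TCP connection test. The Python sketch below is a generic check. It is not part of the E-MapReduce console or SDK. It assumes that Hue listens on the default port 8888 and that the security group rule allows traffic from your client's IP address. The function name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host: str, port: int = 8888, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper; port 8888 is assumed to be Hue's default port.
    """
    try:
        # create_connection resolves the host name and performs a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures all raise OSError subclasses.
        return False
```

For example, `is_port_open("<master-public-ip>")` should return `True` after the rule takes effect and `False` while the port is still blocked. A successful connection only shows that the port is open. It does not confirm that Hue itself is serving requests.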

Clicking the quick link takes you to a page that asks for your credentials to log in to Hue. Before you log in, you will also see a warning asking you to enable WebHDFS. Please enable it. WebHDFS is the REST API that lets clients such as Hue access files in HDFS over the web (HTTP).
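WebHDFS exposes HDFS file operations as a REST API. For example, listing a directory is an HTTP GET on `/webhdfs/v1/<path>?op=LISTSTATUS`. The JSON response holds the entries under `FileStatuses.FileStatus`. The Python sketch below shows that request under several assumptions:

- Authentication is simple (non-Kerberos), so the caller is identified by the `user.name` query parameter.
- The client connects directly to the NameNode's HTTP port. This is commonly 50070 on Hadoop 2.x and 9870 on Hadoop 3.x; check your cluster's configuration.
- Requests are not routed through Apache Knox, which uses a different gateway URL.

The helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


def liststatus_url(host: str, port: int, path: str, user: str | None = None) -> str:
    """Build a WebHDFS LISTSTATUS URL for an absolute HDFS path (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /user/hadoop")
    params = {"op": "LISTSTATUS"}
    if user:
        # Assumption: simple auth, where WebHDFS identifies the caller via user.name.
        params["user.name"] = user
    # quote() keeps "/" unescaped, so the HDFS path maps directly onto the URL path.
    return f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(path)}?{urllib.parse.urlencode(params)}"


def parse_liststatus(body: str) -> list[tuple[str, str]]:
    """Extract (name, type) pairs from a LISTSTATUS JSON response body."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


def list_hdfs_dir(host: str, port: int, path: str, user: str | None = None) -> list[tuple[str, str]]:
    """Fetch and parse a directory listing; requires a reachable NameNode HTTP port."""
    with urllib.request.urlopen(liststatus_url(host, port, path, user), timeout=10) as resp:
        return parse_liststatus(resp.read().decode("utf-8"))
```

For example, `list_hdfs_dir("<namenode-host>", 50070, "/user/hadoop", user="hadoop")` returns `(name, type)` pairs, where the type is a value such as `FILE` or `DIRECTORY`. URL building and JSON parsing are kept separate from the HTTP call, so both can be checked without a live cluster.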

Viewing and managing the data in HDFS is an important part of big data analytics. The HDFS Browser/File Browser in Hue will help us achieve this.

We have seen that logging in to the cluster with Cygwin and working with HDFS files through Linux commands can be difficult for someone unfamiliar with command-line interfaces. Being able to access, browse, and work with files in the Hadoop Distributed File System is one of the most important factors in using and analyzing big data. It is the basis for working with the many different big data tools. Hue provides a user interface for these tasks, and that interface can perform all of the required operations. If you prefer not to work with the command line, Hue may be a good option for you.

For step-by-step details, see Diving into Big Data: Hadoop User Experience.

Related Blog Posts

Diving into Big Data: Hadoop User Experience (Continued)

We are looking at various ways to make big data analytics more productive and efficient through better cluster management and easy-to-use interfaces. In this article, we continue to walk through Hue, or Hadoop User Experience, discussing its features and how you can make the best of them. In the previous article, Diving into Big Data: Hadoop User Experience, we looked at how to access Hue and at the prerequisites for accessing Hadoop components through it. This article focuses on making file operations easier with Hue, using its editors, and creating and scheduling workflows.

By reading this article and the previous one, you have gained a general understanding of Hue and several of its features, including how Hue can help you navigate the Hadoop ecosystem. If you take away one thing from these two articles, it should be this: if you are not used to command-line interfaces or want a simpler way to work with Hadoop and big data, the Hue interface can be a helpful alternative.

Combining Redis with Hadoop and ELK for Big Data

We are already living in the era of Big Data. Big Data technology and products are ubiquitous in every aspect of our lives. From online banking to smart homes, Big Data has proven to be enormously useful in each of these use cases.

Redis, a high-performance key-value database, has become an essential element in Big Data applications. As a NoSQL database, Redis helps enterprises make sense of their data by making database scaling more convenient and cost-effective. Cloud providers across the globe, including Alibaba Cloud, now offer a wide variety of Redis-related products for Big Data applications, such as Alibaba Cloud ApsaraDB for Redis.

This article introduces two methods of combining Redis with other Big Data technologies, specifically Hadoop and ELK.

Related Documentation

Big data instance type families

The big data instance type families d1 and d1ne are designed to compute and store massive amounts of data in the cloud, enabling enterprise-level big data solutions. These instance families can also be used to build a Hadoop distributed computing architecture that is supplemented by self-hosted storage in your on-premises data center. This way, you can build a Hadoop cluster at a cost similar to that of a self-hosted cluster in your on-premises data center, while also gaining more storage space and better performance.

Data interconnection between ES-Hadoop and Elasticsearch

You can directly write data to Alibaba Cloud Elasticsearch through ES-Hadoop based on Alibaba Cloud Elasticsearch and E-MapReduce.

Related Products

Elastic Compute Service

Alibaba Cloud Elastic Compute Service (ECS) provides fast memory and the latest Intel CPUs to help you power your cloud applications and achieve faster results with low latency. All ECS instances come with Anti-DDoS protection to safeguard your data and applications from DDoS and Trojan attacks.

Alibaba Cloud Elasticsearch

Elasticsearch is a Lucene-based data search and analysis tool that provides distributed services. Elasticsearch is an open-source product that complies with the Apache open standards. It is a mainstream search engine for enterprise data.

Alibaba Cloud Elasticsearch is available in multiple versions, including Elasticsearch 5.5.3 with Commercial Feature, Elasticsearch 6.3.2 with Commercial Feature, and Elasticsearch 6.7.0 with Commercial Feature, and it includes the X-Pack plug-in. You can use Alibaba Cloud Elasticsearch to analyze and search data. Built on the open-source Elasticsearch engine, it provides enterprise-class access control, security monitoring and alarms, and automatic reporting.

Alibaba Clouder
