Advantages of Hadoop over cloud
Created#More Posted time:Mar 17, 2017 13:52 PM
In the Hadoop Submit World, Doug Cutting, founder of Hadoop, talked about that the trend of Hadoop: the use of new hardware, particularly with big storage and cloud-based big data systems. Since cloud-based development is the trend, it must have some advantages. We previously talked about challenges of cloud-based Hadoop. Coincident with the challenges, there are certainly some advantages. Some readers may ask why there is no disadvantage. Actually, some disadvantages have been discussed about when we talked about challenges. Then how can we eliminate these disadvantages in the cloud? Some opinions are given below, but not purely based on technology, for careful reading.
Advantages of cloud-based Hadoop
Easy to use
Mainly reflected in aspects of creation, destroy, expansion and shrinkage of clusters. Currently, a cluster can be generally started within 4 minutes. Support job composing, warning of wrong job execution and so on. Hadoop provides basic software. Presently, systems such as Hue, Zeppelin, Ooize and others allow webpage-based interactive job composing, which, however, is not oriented to enterprises, nor has guaranteed high availability; for warning, integration with other colleagues’ accounts in the group is hard to be realized. EMapReduce will provide services in this aspect, but some services are still under development.
Mainly in contrast to the high cost for purchasing offline Hadoop and maintaining Hadoop clusters. There are better combinations in the cloud. For example, data can be put in OSS before EMapReduce cluster is started to run as needed. Based on the business performance of a customer, resources held constantly by the customer may be subject to annual or monthly contracts, and additional resources may be flexibly provided gearing with the business growth. For ETL required for several hours per day, it can be run based on needs with data stored in OSS.
Due to its deep integration with other products of Alibaba Cloud, you may not only use the big data system, but also need the simultaneous coordination of many systems for seamless integration with almost all other data storage services in the Alibaba Cloud EMR.
The platform provides operation maintenance tools, which may automatically work to make repairs in scenarios where automatic repair can be done. For example, re-start of failed data nodes; warning the user about over pressure of the master before inquiry.
VPC is provided by default isolation from other users; strategies with access subject to setup by the security group are provided; RAM provides parent/child accounts with independent access to resources.
An expert system is provided to analyze the performance of customers’ jobs. Expert services are also provided for consultation about big data solutions and solutions to various problems.
This article, as the final piece of this series, has briefly introduced some advantages of the cloud.
For other pieces in the cloud-based Hadoop series, see:
• Deployment structure of Hadoop over cloud
• Challenges of Hadoop over cloud